LIX, École Polytechnique, Palaiseau, France
April 6th, 2023
Background: Antibiotic resistance is a major global health concern, as bacteria can develop resistance to drugs, rendering them ineffective. To address this problem, it is crucial to identify and classify the genes responsible for antibiotic resistance, i.e., antibiotic resistant genes (ARGs). Previous methods for gene classification have mainly focused on the sequence of proteins and have ignored their structure. Recently, the AlphaFold model has made significant progress in predicting the 3D structure of proteins. Since sequence and structure can complement each other, having access to both can allow machine learning models to classify novel ARGs more accurately. In this paper, we therefore develop two deep learning models that classify novel ARGs using information from both protein sequence and structure. The first is a graph neural network (GNN) equipped with node features derived from a large language model, while the second is a convolutional neural network (CNN) applied to images extracted from the protein structure.
Results: Evaluation of the proposed models on a standard benchmark dataset of ARGs spanning 18 antibiotic resistance categories demonstrates that both models achieve high accuracy (>73%) in classifying ARGs. The model outperformed state-of-the-art methods and provided rich protein embeddings that could also be used in other tasks involving proteins. With larger datasets, performance is expected to increase further, given the nature of the underlying neural networks.
Conclusions: The proposed deep learning methods offer a more accurate approach for antibiotic resistance classification and hold significant potential for improving our understanding of the mechanisms underlying antibiotic resistance.
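As a rough illustration of the two structure-derived inputs mentioned in the Background, the sketch below turns residue coordinates into (a) a contact graph of the kind a GNN could consume and (b) a normalized distance map that could be treated as a one-channel image for a CNN. This is only a minimal sketch under stated assumptions: the abstract does not say how the graphs or images are actually built, so the use of C-alpha coordinates, the 8 Å contact cutoff, the 20 Å clipping value, and the function names `pairwise_distances`, `contact_edges`, and `distance_image` are all hypothetical choices made for illustration.

```python
import math

def pairwise_distances(coords):
    """Return the full N x N matrix of Euclidean distances between residues."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_edges(dist, threshold=8.0):
    """Undirected edges (i, j), i < j, for residue pairs closer than `threshold`.

    The 8 angstrom cutoff is an assumed value, not one taken from the paper.
    """
    n = len(dist)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] < threshold]

def distance_image(dist, max_dist=20.0):
    """Clip distances at `max_dist` and scale to [0, 1] (assumed normalization)."""
    return [[min(d, max_dist) / max_dist for d in row] for row in dist]

# Toy C-alpha coordinates (angstroms) for four residues; purely illustrative.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
dist = pairwise_distances(coords)
edges = contact_edges(dist)    # graph connectivity for a GNN-style model
image = distance_image(dist)   # one-channel "image" for a CNN-style model
```

In a setup like this, per-residue embeddings from a protein language model would be attached to the graph nodes as features; that step is omitted here because it depends on the specific model used.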