Approaches for accelerating microbial gene function discovery using artificial intelligence

  • Hutchison, C. A. I. et al. Design and synthesis of a minimal bacterial genome. Science 351, aad6253 (2016).

  • Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

  • Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  • Lim, Y. et al. In silico protein interaction screening uncovers DONSON’s role in replication initiation. Science 381, eadi3448 (2023).

  • Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).

  • Nijkamp, E., Ruffolo, J. A., Weinstein, E. N., Naik, N. & Madani, A. ProGen2: exploring the boundaries of protein language models. Cell Syst. 14, 968–978.e3 (2023).

  • Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).

  • Rhee, H. S. & Pugh, B. F. ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr. Protoc. Mol. Biol. 100, 21.24.1–21.24.14 (2012).

  • Gao, Y. et al. Unraveling the functions of uncharacterized transcription factors in Escherichia coli using ChIP-exo. Nucleic Acids Res. 49, 9696–9710 (2021).

  • Kim, G. B., Gao, Y., Palsson, B. O. & Lee, S. Y. DeepTFactor: a deep learning-based tool for the prediction of transcription factors. Proc. Natl Acad. Sci. USA 118, e2021171118 (2021).

  • Gao, Y. et al. Systematic discovery of uncharacterized transcription factors in Escherichia coli K-12 MG1655. Nucleic Acids Res. 46, 10682–10696 (2018).

  • Perez-Rueda, E. & Collado-Vides, J. The repertoire of DNA-binding transcriptional regulators in Escherichia coli K-12. Nucleic Acids Res. 28, 1838–1847 (2000).

  • Mejia-Almonte, C. et al. Redefining fundamental concepts of transcription initiation in bacteria. Nat. Rev. Genet. 21, 699–714 (2020).

  • Ishihama, A., Shimada, T. & Yamazaki, Y. Transcription profile of Escherichia coli: genomic SELEX search for regulatory targets of transcription factors. Nucleic Acids Res. 44, 2058–2074 (2016).

  • Sastry, A. V. et al. The Escherichia coli transcriptome mostly consists of independently regulated modules. Nat. Commun. 10, 5536 (2019).

  • Rodionova, I. A. et al. Identification of a transcription factor, PunR, that regulates the purine and purine nucleoside transporter punC in E. coli. Commun. Biol. 4, 991 (2021).

  • Poudel, S. et al. Revealing 29 sets of independently modulated genes in Staphylococcus aureus, their regulators, and role in key physiological response. Proc. Natl Acad. Sci. USA 117, 17228–17239 (2020).

  • Miller, H. K. et al. The extracytoplasmic function sigma factor σS protects against both intracellular and extracytoplasmic stresses in Staphylococcus aureus. J. Bacteriol. 194, 4342–4354 (2012).

  • Catoiu, E. A. et al. iModulonDB 2.0: dynamic tools to facilitate knowledge-mining and user-enabled analyses of curated transcriptomic datasets. Nucleic Acids Res. 53, D99–D106 (2025).

  • Yu, C., Zavaljevski, N., Desai, V. & Reifman, J. Genome-wide enzyme annotation with precision control: catalytic families (CatFam) databases. Proteins 74, 449–460 (2009).

  • Desai, D. K., Nandi, S., Srivastava, P. K. & Lynn, A. M. ModEnzA: accurate identification of metabolic enzymes using function specific profile HMMs with optimised discrimination threshold and modified emission probabilities. Adv. Bioinform. 2011, 743782 (2011).

  • Claudel-Renard, C., Chevalet, C., Faraut, T. & Kahn, D. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res. 31, 6633–6639 (2003).

  • Ryu, J. Y., Kim, H. U. & Lee, S. Y. Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers. Proc. Natl Acad. Sci. USA 116, 13996–14001 (2019).

  • Kim, G. B. et al. Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nat. Commun. 14, 7370 (2023).

  • Thumuluri, V., Almagro Armenteros, J. J., Johansen, A. R., Nielsen, H. & Winther, O. DeepLoc 2.0: multi-label subcellular localization prediction using protein language models. Nucleic Acids Res. 50, W228–W234 (2022).

  • Yu, T. et al. Enzyme function prediction using contrastive learning. Science 379, 1358–1363 (2023).

  • Zhang, C., Freddolino, L. & Zhang, Y. COFACTOR: improved protein function prediction by combining structure, sequence and protein–protein interaction information. Nucleic Acids Res. 45, W291–W299 (2017).

  • Sanderson, T., Bileschi, M. L., Belanger, D. & Colwell, L. J. ProteInfer, deep neural networks for protein functional inference. eLife 12, e80942 (2023).

  • Wang, T. et al. Discovery of diverse and high-quality mRNA capping enzymes through a language model-enabled platform. Sci. Adv. 11, eadt0402 (2025).

  • Mateus, A. et al. The functional proteome landscape of Escherichia coli. Nature 588, 473–478 (2020).

  • Kulmanov, M., Khan, M. A., Hoehndorf, R. & Wren, J. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics 34, 660–668 (2018).

  • Bileschi, M. L. et al. Using deep learning to annotate the protein universe. Nat. Biotechnol. 40, 932–937 (2022).

  • Abdin, O., Nim, S., Wen, H. & Kim, P. M. PepNN: a deep attention model for the identification of peptide binding sites. Commun. Biol. 5, 503 (2022).

  • Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).

  • Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).

  • Pavlopoulos, G. A. et al. Unraveling the functional dark matter through global metagenomics. Nature 622, 594–602 (2023).

  • Barrio-Hernandez, I. et al. Clustering predicted structures at the scale of the known protein universe. Nature 622, 637–645 (2023).

  • Dalkiran, A. et al. ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature. BMC Bioinform. 19, 334 (2018).

  • Shi, Z. et al. Enzyme Commission number prediction and benchmarking with hierarchical dual-core multitask learning framework. Research 6, 0153 (2023).

  • Nguyen, T. B., de Sá, A. G. C., Rodrigues, C. H. M., Pires, D. E. V. & Ascher, D. B. LEGO-CSM: a tool for functional characterization of proteins. Bioinformatics 39, btad402 (2023).

  • Buton, N., Coste, F. & Le Cunff, Y. Predicting enzymatic function of protein sequences with attention. Bioinformatics 39, btad620 (2023).

  • Han, S. R. et al. Evidential deep learning for trustworthy prediction of Enzyme Commission number. Brief. Bioinform. 25, bbad401 (2023).

  • Watanabe, N., Yamamoto, M., Murata, M., Kuriya, Y. & Araki, M. EnzymeNet: residual neural networks model for Enzyme Commission number prediction. Bioinform. Adv. 3, vbad173 (2023).
