A federated graph learning method to realize multi-party collaboration for molecular discovery
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).
Hartono, N. T. P. et al. How machine learning can help select capping layers to suppress perovskite degradation. Nat. Commun. 11, 4172 (2020).
Jiang, Y. et al. Coupling complementary strategy to flexible graph neural network for quick discovery of coformer in diverse co-crystal materials. Nat. Commun. 12, 5950 (2021).
Cao, Y. et al. Perovskite light-emitting diodes based on spontaneously formed submicrometre-scale structures. Nature 562, 249–253 (2018).
Gómez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 15, 1120–1127 (2016).
Müller, S. Small-molecule-mediated G-quadruplex isolation from human cells. Nat. Chem. 2, 1095–1098 (2010).
Raccuglia, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016).
Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).
Tran-Nguyen, V.-K., Jacquemard, C. & Rognan, D. LIT-PCBA: an unbiased data set for machine learning and virtual screening. J. Chem. Inf. Model. 60, 4263–4273 (2020).
Wishart, D. S. et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 36, D901–D906 (2008).
Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2020).
Tan, L. et al. Tackling assay interference associated with small molecules. Nat. Rev. Chem. 8, 319–339 (2024).
Durant, G. et al. The future of machine learning for small-molecule drug discovery will be driven by data. Nat. Comput. Sci. 4, 735–743 (2024).
Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10, 1–19 (2019).
Zhu, W. et al. Federated learning of molecular properties with graph neural networks in a heterogeneous setting. Patterns 3, 100521 (2022).
Xiong, Z. et al. Facing small and biased data dilemma in drug discovery with enhanced federated learning approaches. Sci. China Life Sci. 65, 529–539 (2022).
Heyndrickx, W. et al. MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J. Chem. Inf. Model. 64, 2331–2344 (2024).
Gilmer, J. et al. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1263–1272 (PMLR, 2017).
Veličković, P. et al. Graph attention networks. Preprint at (2018).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR, 2017).
Ning, Y. et al. GFedKRL: graph federated knowledge re-learning for effective molecular property prediction via privacy protection. In International Conference on Artificial Neural Networks. 426–438 (Springer, 2023).
Cao, X., Jia, J., Zhang, Z. & Gong, N. Z. FedRecover: recovering from poisoning attacks in federated learning using historical information. In Proc. 2023 IEEE Symposium on Security and Privacy (SP) 1366–1383 (IEEE, 2023).
Gupta, S. et al. Recovering private text in federated learning of language models. In 36th Conference on Neural Information Processing Systems (NeurIPS 2022) (2022).
Zhang, K. et al. Flip: a provable defense framework for backdoor mitigation in federated learning. In International Conference on Learning Representations (ICLR, 2022).
Chen, J. et al. FederEI: federated library matching framework for electron ionization mass spectrum based compound identification. Anal. Chem. 96, 15840–15845 (2024).
Lanczos, C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl Bur. Stand. 45, 255 (1950).
Liao, R., Zhao, Z., Urtasun, R. & Zemel, R. S. LanczosNet: multi-scale deep graph convolutional networks. In International Conference on Learning Representations (ICLR, 2019).
Olkin, I. & Rubin, H. Multivariate beta distributions and independence properties of the Wishart distribution. Ann. Math. Stat. 35, 261–269 (1964).
Alaggan, M., Gambs, S. & Kermarrec, A.-M. Heterogeneous differential privacy. J. Priv. Confid. 7(2) (2016).
McInnes, L. et al. UMAP: Uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).
Pelikan, M. Bayesian Optimization Algorithm. In Hierarchical Bayesian Optimization Algorithm Vol. 170, 31–48 (Springer, Berlin, Heidelberg, 2005).
Wu, C. et al. A federated graph neural network framework for privacy-preserving personalization. Nat. Commun. 13, 3091 (2022).
Liu, J., Lou, J., Xiong, L., Liu, J. & Meng, X. Projected federated averaging with heterogeneous differential privacy. Proc. VLDB Endow. 15, 828–840 (2021).
Wang, L. et al. Enhancing federated learning with in-cloud unlabeled data. In Proc. IEEE 38th International Conference on Data Engineering (ICDE) 136–149 (IEEE, 2022).
Lin, T. et al. Ensemble distillation for robust model fusion in federated learning. In 34th Conference on Neural Information Processing Systems (NeurIPS 2020) (2020).
Li, Q. et al. Practical one-shot federated learning for cross-silo setting. Preprint at (2020).
Shao, J., Wu, F. & Zhang, J. Selective knowledge sharing for privacy-preserving federated distillation without a good teacher. Nat. Commun. 15, 349 (2024).
Park, J. et al. Sageflow: robust federated learning against both stragglers and adversaries. In 35th Conference on Neural Information Processing Systems (NeurIPS 2021) (2021).
Xie, C. et al. Zeno++: robust fully asynchronous SGD. In Proc. 37th International Conference on Machine Learning (eds Daumé III, H. & Singh, A.) 10495–10503 (PMLR, 2020).
Huang, K., Xiao, C., Hoang, T. N., Glass, L. M. & Sun, J. CASTER: predicting drug interactions with chemical substructure representation. In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) 702–709 (2020).
Li, Y. et al. An adaptive graph learning method for automated molecular interactions and properties predictions. Nat. Mach. Intell. 4, 645–651 (2022).
Zeng, X. et al. Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework. Nat. Mach. Intell. 4, 1004–1016 (2022).
Zhang, X., Kang, Y., Chen, K., Fan, L. & Yang, Q. Trading off privacy, utility and efficiency in federated learning. ACM Trans. Intell. Syst. Technol. 14, 1–32 (2023).
Cai, H., Zhang, H., Zhao, D., Wu, J. & Wang, L. FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction. Brief. Bioinform. 23, bbac408 (2022).
Hanser, T. Federated learning for molecular discovery. Curr. Opin. Struct. Biol. 79, 102545 (2023).
Boiko, D. A. et al. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).
Mirza, A. et al. A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists. Nat. Chem. 17, 1027–1034 (2025).
McDuff, D. et al. Towards accurate differential diagnosis with large language models. Nature 642, 451–457 (2025).
Farayola, O. A. et al. Data privacy and security in IT: a review of techniques and challenges. Comput. Sci. IT Res. J. 5, 606–615 (2024).
Weber, R. H. Internet of things—new security and privacy challenges. Comput. Law Secur. Rev. 26, 23–30 (2010).
Smith, V., Chiang, C.-K., Sanjabi, M. & Talwalkar, A. S. Federated multi-task learning. In Advances in Neural Information Processing Systems 30 (NIPS 2017) (2017).
Liu, L. et al. GEM-2: next generation molecular property prediction network by modeling full-range many-body interactions. Preprint at (2022).
Hussain, M. S., Zaki, M. J. & Subramanian, D. Triplet interaction improves graph transformers: accurate molecular graph learning with triplet graph transformers. Preprint at (2024).
Wallach, I. & Heifets, A. Most ligand-based classification benchmarks reward memorization rather than generalization. J. Chem. Inf. Model. 58, 916–932 (2018).
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076 (1962).
Li, P. et al. TrimNet: learning molecular representation from triplet messages for biomedicine. Brief. Bioinform. 22, bbaa266 (2021).
Gao, W., Tang, Z., Zhao, J. & Chelikowsky, J. R. Efficient full-frequency GW calculations using a Lanczos method. Phys. Rev. Lett. 132, 126402 (2024).
Ma, W., Lou, Q., Kazemi, A., Faraone, J. & Afzal, T. Super efficient neural network for compression artifacts reduction and super resolution. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision 460–468 (2024).
Wang, S., Zhang, Z. & Zhang, T. Improved analyses of the randomized power method and block Lanczos method. Preprint at (2015).
Bergstra, J., Yamins, D. & Cox, D. Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In Proc. 30th International Conference on Machine Learning (eds Dasgupta, S. & McAllester, D.) 115–123 (PMLR, 2013).
Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019) (2019).
Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. Preprint at (2019).
Zhang, L. et al. A federated graph learning method to realize multi-party collaboration for molecular discovery. Zenodo (2025).
