A federated graph learning method to realize multi-party collaboration for molecular discovery

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).

Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).

Hartono, N. T. P. et al. How machine learning can help select capping layers to suppress perovskite degradation. Nat. Commun. 11, 4172 (2020).

Jiang, Y. et al. Coupling complementary strategy to flexible graph neural network for quick discovery of coformer in diverse co-crystal materials. Nat. Commun. 12, 5950 (2021).

Cao, Y. et al. Perovskite light-emitting diodes based on spontaneously formed submicrometre-scale structures. Nature 562, 249–253 (2018).

Gómez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 15, 1120–1127 (2016).

Müller, S. Small-molecule-mediated G-quadruplex isolation from human cells. Nat. Chem. 2, 1095–1098 (2010).

Raccuglia, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016).

Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).

Tran-Nguyen, V.-K., Jacquemard, C. & Rognan, D. LIT-PCBA: an unbiased data set for machine learning and virtual screening. J. Chem. Inf. Model. 60, 4263–4273 (2020).

Wishart, D. S. et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 36, D901–D906 (2008).

Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2020).

Tan, L. et al. Tackling assay interference associated with small molecules. Nat. Rev. Chem. 8, 319–339 (2024).

Durant, G. et al. The future of machine learning for small-molecule drug discovery will be driven by data. Nat. Comput. Sci. 4, 735–743 (2024).

Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10, 1–19 (2019).

Zhu, W. et al. Federated learning of molecular properties with graph neural networks in a heterogeneous setting. Patterns 3, 100521 (2022).

Xiong, Z. et al. Facing small and biased data dilemma in drug discovery with enhanced federated learning approaches. Sci. China Life Sci. 65, 529–539 (2022).

Heyndrickx, W. et al. MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J. Chem. Inf. Model. 64, 2331–2344 (2024).

Gilmer, J. et al. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1263–1272 (PMLR, 2017).

Veličković, P. et al. Graph attention networks. Preprint at (2018).

Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR, 2017).

Ning, Y. et al. GFedKRL: graph federated knowledge re-learning for effective molecular property prediction via privacy protection. In International Conference on Artificial Neural Networks 426–438 (Springer, 2023).

Cao, X., Jia, J., Zhang, Z. & Gong, N. Z. FedRecover: recovering from poisoning attacks in federated learning using historical information. In Proc. 2023 IEEE Symposium on Security and Privacy (SP) 1366–1383 (IEEE, 2023).

Gupta, S. et al. Recovering private text in federated learning of language models. In 36th Conference on Neural Information Processing Systems (NeurIPS 2022) (2022).

Zhang, K. et al. FLIP: a provable defense framework for backdoor mitigation in federated learning. In International Conference on Learning Representations (ICLR, 2022).

Chen, J. et al. FederEI: federated library matching framework for electron ionization mass spectrum based compound identification. Anal. Chem. 96, 15840–15845 (2024).

Lanczos, C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl Bur. Stand. 45, 255 (1950).

Liao, R., Zhao, Z., Urtasun, R. & Zemel, R. S. LanczosNet: multi-scale deep graph convolutional networks. In International Conference on Learning Representations (ICLR, 2019).

Olkin, I. & Rubin, H. Multivariate beta distributions and independence properties of the Wishart distribution. Ann. Math. Stat. 35, 261–269 (1964).

Alaggan, M., Gambs, S. & Kermarrec, A.-M. Heterogeneous differential privacy. J. Priv. Confid. 7(2) (2016).

McInnes, L. et al. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).

Pelikan, M. Bayesian optimization algorithm. In Hierarchical Bayesian Optimization Algorithm Vol. 170, 31–48 (Springer, Berlin, Heidelberg, 2005).

Wu, C. et al. A federated graph neural network framework for privacy-preserving personalization. Nat. Commun. 13, 3091 (2022).

Liu, J., Lou, J., Xiong, L., Liu, J. & Meng, X. Projected federated averaging with heterogeneous differential privacy. Proc. VLDB Endow. 15, 828–840 (2021).

Wang, L. et al. Enhancing federated learning with in-cloud unlabeled data. In Proc. IEEE 38th International Conference on Data Engineering (ICDE) 136–149 (IEEE, 2022).

Lin, T. et al. Ensemble distillation for robust model fusion in federated learning. In 34th Conference on Neural Information Processing Systems (NeurIPS 2020) (2020).

Li, Q. et al. Practical one-shot federated learning for cross-silo setting. Preprint at (2020).

Shao, J., Wu, F. & Zhang, J. Selective knowledge sharing for privacy-preserving federated distillation without a good teacher. Nat. Commun. 15, 349 (2024).

Park, J. et al. Sageflow: robust federated learning against both stragglers and adversaries. In 35th Conference on Neural Information Processing Systems (NeurIPS 2021) (2021).

Xie, C. et al. Zeno++: robust fully asynchronous SGD. In Proc. 37th International Conference on Machine Learning (eds Daumé III, H. & Singh, A.) 10495–10503 (PMLR, 2020).

Huang, K., Xiao, C., Hoang, T. N., Glass, L. M. & Sun, J. CASTER: predicting drug interactions with chemical substructure representation. In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) 702–709 (2020).

Li, Y. et al. An adaptive graph learning method for automated molecular interactions and properties predictions. Nat. Mach. Intell. 4, 645–651 (2022).

Zeng, X. et al. Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework. Nat. Mach. Intell. 4, 1004–1016 (2022).

Zhang, X., Kang, Y., Chen, K., Fan, L. & Yang, Q. Trading off privacy, utility and efficiency in federated learning. ACM Trans. Intell. Syst. Technol. 14, 1–32 (2023).

Cai, H., Zhang, H., Zhao, D., Wu, J. & Wang, L. FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction. Brief. Bioinform. 23, bbac408 (2022).

Hanser, T. Federated learning for molecular discovery. Curr. Opin. Struct. Biol. 79, 102545 (2023).

Boiko, D. A. et al. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).

Mirza, A. et al. A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists. Nat. Chem. 17, 1027–1034 (2025).

McDuff, D. et al. Towards accurate differential diagnosis with large language models. Nature 642, 451–457 (2025).

Farayola, O. A. et al. Data privacy and security in IT: a review of techniques and challenges. Comput. Sci. IT Res. J. 5, 606–615 (2024).

Weber, R. H. Internet of things—new security and privacy challenges. Comput. Law Secur. Rev. 26, 23–30 (2010).

Smith, V., Chiang, C.-K., Sanjabi, M. & Talwalkar, A. S. Federated multi-task learning. In Advances in Neural Information Processing Systems 30 (NIPS 2017) (2017).

Liu, L. et al. GEM-2: next generation molecular property prediction network by modeling full-range many-body interactions. Preprint at (2022).

Hussain, M. S., Zaki, M. J. & Subramanian, D. Triplet interaction improves graph transformers: accurate molecular graph learning with triplet graph transformers. Preprint at (2024).

Wallach, I. & Heifets, A. Most ligand-based classification benchmarks reward memorization rather than generalization. J. Chem. Inf. Model. 58, 916–932 (2018).

Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076 (1962).

Li, P. et al. TrimNet: learning molecular representation from triplet messages for biomedicine. Brief. Bioinform. 22, bbaa266 (2021).

Gao, W., Tang, Z., Zhao, J. & Chelikowsky, J. R. Efficient full-frequency GW calculations using a Lanczos method. Phys. Rev. Lett. 132, 126402 (2024).

Ma, W., Lou, Q., Kazemi, A., Faraone, J. & Afzal, T. Super efficient neural network for compression artifacts reduction and super resolution. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision 460–468 (2024).

Wang, S., Zhang, Z. & Zhang, T. Improved analyses of the randomized power method and block Lanczos method. Preprint at (2015).

Bergstra, J., Yamins, D. & Cox, D. Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In Proc. 30th International Conference on Machine Learning (eds Dasgupta, S. & McAllester, D.) 115–123 (PMLR, 2013).

Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006).

Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019) (2019).

Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. Preprint at (2019).

Zhang, L. et al. A federated graph learning method to realize multi-party collaboration for molecular discovery. Zenodo (2025).