Volume 32, Issue 28
  • ISSN: 0929-8673
  • E-ISSN: 1875-533X

Abstract

Structure-based drug discovery methods, such as molecular docking and virtual screening, have become invaluable tools in developing novel drugs. At the core of these methods are Scoring Functions (SFs), which predict the binding affinity between ligands and protein targets. This study aims to review and contextualize the challenges and best practices in training novel scoring functions to improve their accuracy and generalizability in predicting protein-ligand binding affinities. Effective training of scoring functions requires careful attention to the quality of training data and methodologies. We emphasize the need for robust training strategies to produce consistent and generalizable SFs. Key considerations include addressing hidden biases and overfitting in machine-learning models, as well as ensuring the use of high-quality, unbiased datasets for both training and evaluation of SFs. Innovative hybrid methods, combining the advantages of empirical and machine-learning approaches, hold promise for outperforming current scoring functions while displaying greater generalizability and versatility.

/content/journals/cmc/10.2174/0109298673334469241017053508
2024-10-30
2025-10-05
