Volume 21, Issue 6
  • ISSN: 1570-1646
  • E-ISSN: 1875-6247

Abstract

Background

Identifying disease-protein associations is a key step in treating disease, understanding pathomechanisms, and developing drugs. Experimental methods can identify such associations, but they are often time-consuming, laborious, and expensive. There is therefore a strong need for computational methods that identify potential disease-protein associations.

Objective

This work aimed to study the effect of a graph embedding algorithm and a reliable negative sample screening method on the prediction of disease-protein associations.

Methods

In our study, information on disease similarity, disease-protein association, and protein-protein interaction was used to construct a heterogeneous network comprising a protein-protein interaction subnetwork, a disease similarity subnetwork, and a disease-protein association subnetwork. A graph embedding algorithm was then used to obtain network node features that characterize the disease-protein relationships. The support vector data description algorithm was applied to screen reliable negative samples. Finally, a random forest algorithm was employed to construct a model for identifying potential disease-protein associations.
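The first and third steps above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not name the graph embedding algorithm, so uniform random walks of the kind used by DeepWalk-style methods are assumed here, and the walks would still need to be passed to a skip-gram model to obtain node vectors. The screening function is a deliberately simplified stand-in for support vector data description: it draws a fixed sphere around the centroid of the positive samples instead of optimizing a kernelized boundary. All node names, vectors, function names, and parameters are hypothetical.

```python
import math
import random


def build_heterogeneous_network(ppi_edges, disease_sim_edges, association_edges):
    """Merge the three subnetworks into one undirected adjacency map."""
    adjacency = {}
    for u, v in [*ppi_edges, *disease_sim_edges, *association_edges]:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    """Generate fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adjacency):
            walk = [start]
            while len(walk) < walk_length:
                # Sort neighbors so results are reproducible for a given seed.
                walk.append(rng.choice(sorted(adjacency[walk[-1]])))
            walks.append(walk)
    return walks


def screen_reliable_negatives(positive_vectors, unlabeled_vectors, quantile=0.9):
    """Simplified stand-in for SVDD (assumption): keep unlabeled samples that
    fall outside a sphere centered on the positive centroid."""
    n = len(positive_vectors)
    center = [sum(column) / n for column in zip(*positive_vectors)]
    distances = sorted(math.dist(center, v) for v in positive_vectors)
    # Radius = distance of the positive sample at the chosen quantile.
    radius = distances[min(n - 1, int(quantile * n))]
    return [i for i, v in enumerate(unlabeled_vectors)
            if math.dist(center, v) > radius]


# Hypothetical toy data: P* are proteins, D* are diseases.
ppi_edges = [("P1", "P2"), ("P2", "P3")]
disease_sim_edges = [("D1", "D2")]
association_edges = [("D1", "P1"), ("D2", "P3")]

network = build_heterogeneous_network(ppi_edges, disease_sim_edges, association_edges)
walks = random_walks(network)

# Hypothetical 2-D feature vectors for disease-protein pairs.
positive_pairs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
unlabeled_pairs = [[0.5, 0.5], [5.0, 5.0], [0.9, 0.9]]
reliable_negative_indices = screen_reliable_negatives(positive_pairs, unlabeled_pairs)
```

In a fuller pipeline, the screened negatives and the known positive pairs would form the training set for the random forest classifier.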

Results

On a constructed benchmark dataset under 10-fold cross-validation, the present method achieved an accuracy of 94.55%, a specificity of 98.49%, a precision of 98.36%, a Matthews correlation coefficient of 0.8938, an area under the receiver operating characteristic curve of 0.9815, and an area under the precision-recall curve of 0.9591. Results on a series of non-redundant datasets and an independent test dataset showed that the method is robust to data redundancy and can accurately identify disease-related proteins, protein-related diseases, and potential disease-protein associations. Based on the constructed model, a large-scale prediction study identified more than 1.7 million potential disease-protein association pairs with a probability greater than 99%. The top five predicted disease-protein association pairs were further confirmed by literature and molecular docking simulations.

Conclusion

Extensive experimental results showed that the proposed method can effectively identify potential disease-protein associations. It is expected that the current method can help not only in understanding disease mechanisms at the protein level, but also in discovering new protein targets and potential small molecule drugs.

DOI: 10.2174/0115701646352055250219101309
2025-02-24
2025-10-30