Volume 20, Issue 8
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Introduction

Ubiquitylation, a key post-translational modification (PTM), strongly influences the structure, activity, and function of proteins and is linked to various diseases. Experimental methods for identifying and characterizing ubiquitylation sites (Ubsites) are time-consuming, expensive, and labor-intensive, particularly when no prior knowledge of a protein's ubiquitylation is available. However, most computational methods reported for Ubsite prediction rely on traditional machine learning. With the growing availability of genomic and proteomic data, deep learning-based methods for recognizing Ubsites are becoming increasingly popular.

Methods

In this study, we propose pKcode, a new feature extraction method based on only seven physicochemical features of amino acids (AAs). pKcode captures both the biochemical context and the precise sequence positions of the AAs surrounding each Ubsite, improving the predictive capability for ubiquitination. By integrating pKcode with PSDAAP, AAC, and PWAA, we created the pKPAP encoding scheme, a comprehensive feature representation. We also developed the PKE-Ubsite model.
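
The abstract does not list the seven physicochemical features behind pKcode or give the formulas for PSDAAP and PWAA, so the sketch below only illustrates the general idea of fusing a position-specific property encoding with amino acid composition (AAC) for a sequence window centred on a candidate lysine. The single property used (Kyte-Doolittle hydropathy), the window half-width, the padding character `X`, and every helper name are assumptions for illustration; this is not the authors' pKcode or pKPAP implementation.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy: a stand-in ASSUMPTION for the seven
# (unspecified) physicochemical features behind pKcode.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def extract_window(seq: str, center: int, half_width: int, pad: str = "X") -> str:
    """Return the 2*half_width+1 residues centred on `center`, padding past the ends."""
    padded = pad * half_width + seq + pad * half_width
    # `center` sits at padded index center + half_width, so the window starts at `center`.
    return padded[center:center + 2 * half_width + 1]


def aac(window: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue (padding ignored)."""
    counts = Counter(ch for ch in window if ch in AMINO_ACIDS)
    total = sum(counts.values()) or 1
    return [counts[aa] / total for aa in AMINO_ACIDS]


def position_property(window: str) -> list[float]:
    """One property value per window position (0.0 for padding), so the vector
    records where each residue sits relative to the candidate site."""
    return [KD_HYDROPATHY.get(ch, 0.0) for ch in window]


def encode_site(seq: str, lysine_index: int, half_width: int = 3) -> list[float]:
    """Hypothetical fused encoding: position-specific property values followed by AAC."""
    if seq[lysine_index] != "K":
        raise ValueError("candidate site must be a lysine")
    window = extract_window(seq, lysine_index, half_width)
    return position_property(window) + aac(window)
```

For example, `encode_site("MAKLVE", 2, half_width=2)` encodes the window `MAKLV` as 5 position-specific values followed by 20 composition values. The position-specific part is what lets a model distinguish a residue two positions upstream of the lysine from the same residue downstream, which composition alone cannot do.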

Results

The PKE-Ubsite model is a new ensemble prediction framework that combines the classifiers of five pipelines: three bidirectional long short-term memory (BiLSTM) networks, one convolutional neural network (CNN), and one random forest (RF) classifier. Each classifier uses an optimized combination of encoding features, and the final classification is obtained by voting over the outputs of the five classifiers.
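
The abstract does not say whether PKE-Ubsite uses hard (majority) or soft (probability-averaging) voting, so the sketch below shows both under that assumption. The function names, the 0/1 label convention (1 = Ubsite), and the 0.5 threshold are illustrative choices, not details taken from the study.

```python
from collections import Counter
from typing import Sequence


def majority_vote(votes: Sequence[int]) -> int:
    """Hard voting over binary labels (1 = Ubsite, 0 = non-site).

    With an odd number of voters, such as five pipelines, a binary vote cannot tie.
    """
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    counts = Counter(votes)
    return 1 if counts[1] > counts[0] else 0


def ensemble_predict(per_model_votes: Sequence[Sequence[int]]) -> list[int]:
    """per_model_votes[m][i] is model m's predicted label for sample i."""
    n_samples = len(per_model_votes[0])
    return [majority_vote([model[i] for model in per_model_votes]) for i in range(n_samples)]


def soft_vote(probabilities: Sequence[float], threshold: float = 0.5) -> int:
    """Soft-voting alternative: average the models' positive-class probabilities."""
    return 1 if sum(probabilities) / len(probabilities) >= threshold else 0
```

With five hypothetical models (say three BiLSTMs, a CNN, and an RF) voting on three samples, `ensemble_predict([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1]])` returns `[1, 0, 1]`: sample 0 gets four positive votes, sample 1 only two, and sample 2 three. Soft voting instead lets a confident model outweigh several uncertain ones, which is why the choice between the two matters.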

Conclusion

Compared with existing models on an independent test set, our model achieved an accuracy of 0.8368, an F1-score of 0.8430, a precision of 0.8124, a recall of 0.8760, and an AUC of 0.9103, outperforming all of the previously reported methods included in the comparison. Overall, PKE-Ubsite may facilitate a more thorough understanding of ubiquitylation.
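
Because the F1-score is the harmonic mean of precision and recall, the reported F1-score can be checked against the reported precision and recall. The helpers below are a generic sketch of the standard confusion-matrix definitions; any counts passed to `classification_metrics` in an example are illustrative, not the study's data. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check on the reported test-set precision and recall.
print(round(f1_from_pr(0.8124, 0.8760), 4))  # prints 0.843, matching the reported 0.8430
```

The check gives 2 × 0.8124 × 0.8760 / (0.8124 + 0.8760) ≈ 0.84300, so the reported precision, recall, and F1-score are mutually consistent.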

DOI: 10.2174/0115748936347236241119045342
2025-01-15
2026-02-20