Prediction of Interleukin-binding Sites Combining Multi-Source Features with Integrated Algorithm

Abstract

Introduction

Interleukins (ILs) are important immune cytokines involved in immune regulation, inflammatory responses, and metabolic control. They are closely associated with various diseases, such as rheumatoid arthritis, atherosclerosis, diabetes, and asthma. However, the specific binding mechanisms of interleukins remain unclear. Studying the binding mechanisms between proteins and interleukins can help clarify interleukin function and disease pathogenesis, and can support the development of new drugs. This study aims to systematically analyze the characteristics of interleukin family binding sites, uncover their shared features and specific mechanisms, provide new perspectives on their functional roles in ligand-receptor interactions, and elucidate the potential impact of binding sites on signal transduction and immune responses.

Methods

We constructed a dataset containing both binding and non-binding sites and extracted eight features based on the sequence, structure, and functional information of the proteins. Six machine learning algorithms, along with an integrated algorithm, were then applied to these features to predict binding sites.
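
The abstract does not specify how the integrated algorithm combines the individual classifiers. As a minimal sketch, assuming simple majority voting over binary labels (1 = binding site, 0 = non-binding site), the combination step might look like the following. The function name and the toy predictions are hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by majority vote.

    predictions_per_model: list of equal-length lists, one list per classifier.
    With an odd number of classifiers and binary labels there are no ties.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        # Count how many classifiers voted for each label on sample i.
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical outputs of three base classifiers on four candidate sites.
toy_predictions = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
print(majority_vote(toy_predictions))
```

Other schemes, such as averaging predicted probabilities or stacking, would fit the same description; voting is chosen here only because it is the simplest to show.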

Results

Among the machine learning algorithms, prediction using energy features performed best, achieving the highest accuracy (ACC) and area under the ROC curve (AUC). Feature fusion and ensemble algorithm models further improved predictive performance significantly, with a maximum accuracy (ACC) of 98.4% and an ensemble algorithm accuracy of up to 99.2%.
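
For readers less familiar with AUC, it can be read as the probability that a randomly chosen binding site receives a higher score than a randomly chosen non-binding site. The sketch below computes it directly from that definition, counting ties as half; the function name and the toy scores are illustrative assumptions, not values from the study.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    y_true: list of 0/1 labels; scores: list of classifier scores.
    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 if the scores are tied, and 0 otherwise.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: two positives, two negatives.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is acceptable for small datasets; rank-based formulas give the same value more efficiently.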

Discussion

The proposed method outperforms existing methods, achieving an MCC score of 0.984 with the Gradient Boosting algorithm. However, the small sample size and dataset imbalance are limitations, and future research should collect larger and more diverse datasets to improve the model's generalization ability and predictive accuracy. Future studies will aim to verify the method's applicability and develop an online prediction tool to assist in studying small molecule drugs, antibodies, and interleukin binding sites, supporting targeted drug design and treatment of immune-related diseases.
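
Because dataset imbalance is named as a limitation, it helps to see why MCC is reported alongside accuracy. The sketch below computes both from confusion-matrix counts using the standard definitions; the toy labels are hypothetical and chosen only to show that accuracy can look high on an imbalanced set while MCC is noticeably lower.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy imbalanced set: 2 binding sites, 8 non-binding sites, one binding site missed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts))       # 0.9
print(round(mcc(*counts), 3))  # 0.667
```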

Conclusion

This study demonstrates that the developed predictive model for interleukin binding sites effectively utilizes geometric and biochemical features. It also validates the effectiveness of the SMOTETomek sampling method in enhancing model performance and provides a basis for targeted drug design and for understanding immune response mechanisms.
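
SMOTETomek combines SMOTE oversampling of the minority class with removal of Tomek links (pairs of opposite-class samples that are each other's nearest neighbour). The following is a simplified, standard-library sketch of that idea, not the implementation used in the study. It assumes binary labels, Euclidean distance, at least two minority samples, and removal of both members of each link; all function names and parameter defaults are hypothetical.

```python
import math
import random


def nearest(i, X):
    """Index of the point closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j) of opposite-class points that are mutual nearest neighbours."""
    links = []
    for i in range(len(X)):
        j = nearest(i, X)
        if i < j and y[i] != y[j] and nearest(j, X) == i:
            links.append((i, j))
    return links


def smote(X_min, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        others = sorted((j for j in range(len(X_min)) if j != i),
                        key=lambda j: math.dist(X_min[i], X_min[j]))
        j = rng.choice(others[:k])  # one of the k nearest minority neighbours
        gap = rng.random()          # position along the segment from i to j
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(X_min[i], X_min[j])))
    return synthetic


def smote_tomek(X, y, minority=1, seed=0):
    """Oversample the minority class to parity, then drop Tomek-link members."""
    X_min = [x for x, label in zip(X, y) if label == minority]
    n_new = max((len(X) - len(X_min)) - len(X_min), 0)  # binary classes assumed
    new_points = smote(X_min, n_new, seed=seed)
    X_all = list(X) + new_points
    y_all = list(y) + [minority] * len(new_points)
    drop = {idx for pair in tomek_links(X_all, y_all) for idx in pair}
    keep = [i for i in range(len(X_all)) if i not in drop]
    return [X_all[i] for i in keep], [y_all[i] for i in keep]


# Toy data: 3 minority points near the origin, 6 majority points far away.
X = [(0, 0), (1, 0), (0, 1),
     (10, 10), (11, 10), (10, 11), (11, 11), (12, 10), (10, 12)]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]
X_res, y_res = smote_tomek(X, y)
print(y_res.count(1), y_res.count(0))
```

In practice, resampling of this kind is normally applied to the training split only, so that evaluation data keep their original class distribution.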

/content/journals/cbio/10.2174/0115748936372515250801173324
2025-08-28
2026-01-02
