Volume 20, Issue 9
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background

Phage therapy is a promising novel therapeutic approach. Phage virion proteins (PVPs) recognize the host and bind to its surface receptors, which makes them highly relevant to the development of antimicrobial drugs for bacterial infectious diseases. Several machine learning-based PVP predictors have been developed in recent years, but they usually train the learner on a single feature type, whereas higher-dimensional feature representations tend to contain more potential sequence information.

Methods

In this work, we construct PredPVP, a stacking model for PVP prediction that combines multiple features with feature selection. Each sequence is first encoded using seven feature descriptors. Three feature selection methods are then applied to this high-dimensional representation to remove redundant features, and each resulting feature set is paired with eight machine learning algorithms, giving 24 base models. Finally, the probability features and class features (PCFs) generated by these 24 base models are fed into logistic regression (LR) to train the final model.
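
The stacking step can be illustrated with a minimal sketch. It is not the PredPVP implementation: the helper names (`pcf_features`, `fake_base_probs`), the 0.5 class threshold, the hand-written gradient-descent logistic regression, and the random toy data standing in for base-model outputs are all assumptions made for illustration. It only shows how probability and class features from 24 base models could be concatenated and passed to a logistic-regression meta-learner.

```python
import math
import random


def pcf_features(base_probs, threshold=0.5):
    """Concatenate probability features and class features for one sample.

    base_probs: predicted PVP probability from each base model.
    threshold: assumed cut-off used to turn a probability into a class label.
    """
    classes = [1.0 if p >= threshold else 0.0 for p in base_probs]
    return list(base_probs) + classes


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression meta-learner with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy stand-in for the outputs of 24 base models (3 selectors x 8 learners).
# Real base models would be trained on the selected sequence features.
random.seed(0)
N_BASE = 24


def fake_base_probs(label):
    centre = 0.7 if label == 1 else 0.3
    return [min(1.0, max(0.0, random.gauss(centre, 0.15))) for _ in range(N_BASE)]


labels = [1] * 20 + [0] * 20
X_meta = [pcf_features(fake_base_probs(lab)) for lab in labels]
w, b = train_logistic_regression(X_meta, labels)

# Training-set accuracy of the meta-learner on the toy data.
predictions = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X_meta]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

Each sample ends up with 48 meta-features here (24 probabilities plus 24 class labels). In practice, the base-model outputs used to train the meta-learner are usually obtained by cross-validation so that the meta-learner does not see predictions made on the base models' own training data.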

Results

On the independent test set, PredPVP outperforms existing predictors, achieving an AUC of 93.4%.
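
For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive sample (a PVP) receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the four toy scores are illustrative assumptions and are unrelated to the reported 93.4%.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample is scored higher; ties count as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])  # 0.75
```

The pairwise form is quadratic in the number of samples, so rank-based implementations are preferred for large test sets.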

Conclusion

We expect PredPVP to serve as a tool for large-scale PVP recognition, providing a new route to the development of novel antimicrobials and accelerating their application in clinical treatment. The datasets and source code used in this study are available at https://github.com/caoqian23/PredPVP.

/content/journals/cbio/10.2174/0115748936330198240924110742
2024-10-28
2025-10-31