Volume 21, Issue 5
  • ISSN: 1570-1646
  • E-ISSN: 1875-6247

Abstract

Background

The identification and classification of natural products are vital in drug discovery and bioactive compound exploration. Traditional methods are laborious and time-consuming, necessitating innovative tools for accurate predictions using advanced AI techniques.

Objectives

This paper presents NaturePred, a user-friendly tool that predicts the class of natural products and calculates eight physicochemical properties of protein sequences. It targets five classes of natural product biosynthetic gene clusters (BGCs): Polyketide Synthases (PKS), Non-ribosomal Peptide Synthetases (NRPS), Ribosomally Synthesized and Post-Translationally Modified Peptides (RiPPs), Terpenes, and PKS-NRPS Hybrids. To improve reliability in multi-class classification, it applies a 90% confidence score threshold.

Methods

NaturePred accepts three input types: a single protein sequence, a CSV file, or a GenBank (.gbk) file. Its pipeline combines a Natural Language Processing model based on TF-IDF (Term Frequency-Inverse Document Frequency) with a Logistic Regression classifier. A class is assigned only if the confidence score exceeds 90%; otherwise, the tool reports “None of the above class”. Evaluation on unseen data from the MIBiG database shows high accuracy (~96%) in assigning BGCs.
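The two ideas in this pipeline, TF-IDF weighting of sequence "words" and a confidence cut-off on the classifier output, can be illustrated with a minimal sketch. The abstract does not say how sequences are tokenized or how the classifier is configured, so the overlapping 3-mer tokenization, the smoothed IDF formula, and the function names below are assumptions made for illustration only; the class probabilities would in practice come from the trained Logistic Regression model, which is not reproduced here.

```python
import math
from collections import Counter

# Label reported when no class is confident enough (wording from the abstract).
FALLBACK_LABEL = "None of the above class"


def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mers (assumed tokenization)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_vectors(sequences, k=3):
    """Return one {kmer: tf-idf weight} dict per sequence.

    tf  = k-mer count / total k-mers in the sequence
    idf = ln((1 + N) / (1 + df)) + 1   (a common smoothed variant, assumed here)
    """
    docs = [Counter(kmer_tokens(s, k)) for s in sequences]
    n_docs = len(docs)
    # Document frequency: in how many sequences each k-mer occurs at least once.
    doc_freq = Counter(kmer for doc in docs for kmer in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            kmer: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[kmer])) + 1)
            for kmer, count in doc.items()
        })
    return vectors


def predict_with_threshold(class_probabilities, threshold=0.90):
    """Return the top class only if its probability exceeds the threshold."""
    label, prob = max(class_probabilities.items(), key=lambda kv: kv[1])
    return label if prob > threshold else FALLBACK_LABEL


if __name__ == "__main__":
    # Toy sequences, not real BGC proteins.
    print(tfidf_vectors(["MKVL", "MKVA"]))
    print(predict_with_threshold({"PKS": 0.95, "NRPS": 0.05}))
    print(predict_with_threshold({"PKS": 0.60, "NRPS": 0.40}))
```

The strict comparison mirrors the wording "exceeds 90%": a top probability of exactly 0.90 falls back to “None of the above class”.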

Results

NaturePred provides accurate predictions with high confidence scores, demonstrating reliability across different datasets. It calculates eight physicochemical properties of protein sequences, offering valuable insights for further analysis.
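The abstract does not name the eight physicochemical properties. As a hedged illustration of what such sequence-level descriptors look like, the sketch below computes three common ones: length, GRAVY (mean Kyte-Doolittle hydropathy), and aromaticity (fraction of F, W, and Y residues). Whether NaturePred reports these particular properties is an assumption, and `describe_protein` is a hypothetical helper name.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FWY")


def describe_protein(sequence):
    """Compute a few illustrative descriptors for one protein sequence."""
    seq = sequence.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or non-standard residues: {sorted(unknown)}")
    n = len(seq)
    return {
        "length": n,
        # GRAVY: average hydropathy over all residues.
        "gravy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        # Aromaticity: relative frequency of Phe, Trp, and Tyr.
        "aromaticity": sum(aa in AROMATIC for aa in seq) / n,
    }


if __name__ == "__main__":
    print(describe_protein("MKVLAFWY"))  # toy sequence
```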

Conclusion

NaturePred's integrated features, including versatile input options, accurate predictions, and physicochemical property calculations, make it an indispensable tool in natural product research. By addressing classification challenges, NaturePred facilitates drug discovery and bioactive compound exploration. The tool is available at http://login1.cabgrid.res.in:5101/.

DOI: 10.2174/0115701646322417241101055512
2025-01-01
2025-08-24