Volume 20, Issue 9
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background

The root system plays an irreplaceable role in plant growth, and improving it can increase crop productivity. However, the system remains poorly understood, and its underlying mechanisms have not been fully uncovered. Investigating root-related proteins is an important way to address this gap. Previously, the scarcity of known root-related proteins made it impossible to use machine learning methods to build efficient models for discovering novel ones. Recently, a public database of root-related proteins was established, making machine learning methods applicable in this field.

Objective

The purpose of this study was to design an efficient computational method to predict root-associated proteins in three plants: maize, sorghum, and soybean.

Methods

In this study, we proposed a machine learning-based model, named Graph-Root, for identifying root-related proteins in maize, sorghum, and soybean. Features were derived from protein sequences, functional domains, and one network: the sequence features were processed by a graph convolutional neural network and multi-head attention, the functional domain features reflected the essential functions of proteins, and the network features captured the linkage between proteins. These features were fed into a fully connected layer to make predictions.
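As a rough illustration of the graph convolution step mentioned above, the following minimal sketch applies one propagation rule of the form ReLU(D^-1/2 (A + I) D^-1/2 X W) to a toy graph. It is not the Graph-Root implementation: the three-node chain graph, the feature values, and the identity weight matrix are hypothetical placeholders chosen only to make the arithmetic easy to follow.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Symmetric normalisation by node degree (degrees include the self-loop).
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features, apply the linear map, then ReLU.
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical toy input: a 3-node chain graph with 2 features per node.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 0.0]]
weight = [[1.0, 0.0],
          [0.0, 1.0]]  # identity weights, assumed for readability

out = gcn_layer(adj, feats, weight)
for row in out:
    print([round(v, 4) for v in row])
```

In a trained model the weight matrix would be learned, and several such layers could be stacked before the attention and fully connected stages.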

Results

Five-fold cross-validation and independent tests indicated that Graph-Root achieved acceptable performance. It also outperformed the only previous model, SVM-Root. Furthermore, the importance of each feature type and each component of the proposed model was investigated.
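For readers unfamiliar with the evaluation scheme, the sketch below shows how a 5-fold cross-validation split can be generated: the samples are shuffled once, divided into five folds, and each fold serves as the test set exactly once. The function name `k_fold_indices`, the sample count, and the seed are assumptions for illustration and do not reflect the study's actual data partition.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)      # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        test = folds[i]
        # Training set is every fold except the held-out one.
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Hypothetical dataset of 23 proteins.
splits = list(k_fold_indices(23, k=5))
for fold_no, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train)} train, {len(test)} test")
```

Performance measures are then computed on each held-out fold and averaged across the five folds.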

Conclusion

Graph-Root performed well and can be a useful tool for identifying novel root-related proteins. BLOSUM62 features were found to be important for identifying root-related proteins.

/content/journals/cbio/10.2174/0115748936343410241008103219
2024-10-29
2025-11-01