Volume 20, Issue 7
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background

The accurate recognition of the polyadenylation signal (PAS) from DNA sequences is essential for understanding gene transcriptional regulation. A variety of machine learning-based computational methods have been developed to predict PAS in recent years; however, their performance and generalization ability remain unsatisfactory. More effective computational approaches for PAS prediction are therefore highly desirable.

Methods

In this work, we developed an integrated framework, MGCN-PolyA, for PAS prediction across four species: Homo sapiens, Bos taurus, Mus musculus, and Drosophila melanogaster. MGCN-PolyA benefits from the diversity of its feature engineering and the effectiveness of its model architecture. We combined features from different perspectives, namely word embedding, one-hot encoding, k-mer frequency, and Enhanced Nucleic Acid Composition (ENAC), which complement each other and provide rich and comprehensive information for model learning. In terms of architecture, MGCN-PolyA leverages a two-channel multi-scale gated convolutional network to learn high-level feature representations at different scales, and then combines these with the statistical features to predict PAS using a random forest algorithm. These designs not only speed up network training but also improve the generalization ability.
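To make the feature descriptions above more concrete, the following is a minimal sketch of three of the four encodings (one-hot, k-mer frequency, and ENAC) in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of k, the ENAC window size, and the handling of non-ACGT bases are all hypothetical, and the word-embedding features are omitted because they require a trained embedding model.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide order; the paper may use a different one


def one_hot(seq):
    """One 4-element 0/1 row per nucleotide; non-ACGT bases become all zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def kmer_frequency(seq, k=3):
    """Normalised frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    total = len(seq) - k + 1  # number of overlapping k-mers in the sequence
    counts = Counter(seq[i:i + k] for i in range(total))
    kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    return [counts[kmer] / total if total > 0 else 0.0 for kmer in kmers]


def enac(seq, window=5):
    """Nucleotide composition inside each sliding window, concatenated."""
    seq = seq.upper()
    features = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        features.extend(chunk.count(letter) / window for letter in ALPHABET)
    return features


if __name__ == "__main__":
    example = "AATAAA"  # the canonical PAS hexamer, used only as toy input
    print(one_hot(example)[2])
    print(kmer_frequency(example, k=2)[:4])
    print(enac(example, window=5))
```

One-hot encoding preserves position-level information suited to a convolutional channel, whereas k-mer frequency and ENAC are fixed-length or window-level statistics of the kind that could be concatenated with learned representations before a classifier such as a random forest.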

Results

The benchmarking experiments on the independent test datasets demonstrate that MGCN-PolyA outperforms other state-of-the-art algorithms in identifying PAS. MGCN-PolyA has the highest accuracy on all test datasets, and its excellent performance on cross-species validation also demonstrates the robustness of our model.

Conclusion

Extracting features from different perspectives is important for PAS recognition, and integrating deep neural networks (DNNs) with shallow machine learning algorithms can improve model performance.

/content/journals/cbio/10.2174/0115748936289520240828050951
2024-09-19
2025-09-06
Supplements

Supplementary material is available on the publisher’s website along with the published article.
