Ensemble Regression-Based Identification of Signatures for Cancer Prognosis in RNA Expression Profiles

Abstract

Introduction

Many feature selection methods have been reported for identifying cancer signatures from RNA expression profiles. However, these methods often produce unreliable signatures for four main reasons. First, classifiers rather than regression models are often applied to prognostic survival analysis, where they are inappropriate. Second, because the distribution of the samples is unknown, an unsuitable regression model may be selected. Third, high-dimensional expression profiles with small sample sizes typically lead to poor predictive performance of the selected regression model. Fourth, control of clinical variables is usually overlooked.

Methods

To address these problems, we propose a novel feature selection framework that uses ensemble regression to identify cancer prognostic signatures. Ensemble regression avoids a key limitation of classification models, which reduce survival time to categorical labels and thereby discard its continuous information. To cope with high-dimensional data and small sample sizes, the framework up-samples the data to increase the sample size and applies a bagging strategy that randomly selects both samples and features. In addition, it controls for clinical variables to stabilize feature selection and make predictions more reliable.
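The paragraph above names the ingredients of the framework but not their exact form. The following is a minimal sketch, under stated assumptions, of how they could fit together in plain Python: up-sampled patients, bootstrap bags with a random subset of genes per bag, clinical covariates kept in every model so that gene effects are adjusted for them, and genes ranked by their mean absolute coefficient across bags. The ridge base learner, the jitter-based `upsample`, the mean-|coefficient| score, and every function name here are hypothetical choices for illustration, not the authors' implementation.

```python
import random
import statistics


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(rows, y, lam):
    """Hypothetical base learner: ridge regression; column 0 is an unpenalized intercept."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def upsample(samples, target, rng, noise=0.05):
    """Hypothetical up-sampling: resample patients and jitter their expression values."""
    out = list(samples)
    while len(out) < target:
        genes, clin, t = rng.choice(samples)
        out.append(([g + rng.gauss(0.0, noise) for g in genes], clin, t))
    return out


def design_row(genes, clin, feats):
    """Intercept, the bag's randomly chosen genes, then all clinical covariates."""
    return [1.0] + [genes[f] for f in feats] + list(clin)


def fit_ensemble(samples, n_bags=200, n_feat=3, lam=1.0, seed=0):
    """Bagged ridge regressors on random sample and gene subsets.

    Each sample is (genes, clinical, survival_time); genes are assumed standardized.
    Returns genes ranked by mean |coefficient| and the fitted (feats, coef) models.
    """
    rng = random.Random(seed)
    p = len(samples[0][0])
    data = upsample(samples, 2 * len(samples), rng)
    total, seen, models = [0.0] * p, [0] * p, []
    for _ in range(n_bags):
        bag = [rng.choice(data) for _ in data]   # bootstrap the (up-sampled) patients
        feats = rng.sample(range(p), n_feat)     # random gene subset for this bag
        rows = [design_row(g, c, feats) for g, c, _ in bag]
        coef = ridge_fit(rows, [t for _, _, t in bag], lam)
        models.append((feats, coef))
        for k, f in enumerate(feats):
            total[f] += abs(coef[1 + k])         # clinical coefficients are not scored
            seen[f] += 1
    mean_abs = [s / c if c else 0.0 for s, c in zip(total, seen)]
    ranking = sorted(range(p), key=lambda f: mean_abs[f], reverse=True)
    return ranking, models


def predict(models, genes, clin):
    """Average the survival-time predictions of all bagged regressors."""
    return statistics.mean(
        sum(c * x for c, x in zip(coef, design_row(genes, clin, feats)))
        for feats, coef in models)


if __name__ == "__main__":
    # Synthetic cohort: gene 0 drives survival; age is a clinical covariate.
    gen = random.Random(1)
    patients = []
    for _ in range(40):
        genes = [gen.gauss(0.0, 1.0) for _ in range(8)]
        age = gen.gauss(0.0, 1.0)
        months = 24 + 10 * genes[0] + 3 * age + gen.gauss(0.0, 1.0)
        patients.append((genes, [age], months))
    ranking, models = fit_ensemble(patients)
    print("top genes:", ranking[:3])
    print("prediction:", predict(models, [1.0] + [0.0] * 7, [0.0]))
```

Keeping the clinical covariates in every bag, rather than sampling them with the genes, is one simple way to read "controls for clinical variables": each gene coefficient is then estimated conditional on those covariates. The sketch also regresses directly on survival time and ignores censoring; how censored patients are handled is left open here.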

Results

Experimental results show that the method addresses the issues above and yields reliable prognostic signatures. Ensemble regression significantly improves predictive performance and adapts robustly to unknown sample distributions.

Discussion

The proposed ensemble regression model outperforms both classifiers and single regression models in prognostic survival analysis because it preserves continuous survival information, adapts to the sample distribution, and benefits from controlling clinical variables. On TCGA-GBM data, six prognostic miRNAs were validated as reliable biomarkers, whereas mRNA-based models showed limited robustness owing to high dimensionality and small sample size.

Conclusion

The proposed feature selection framework offers a robust approach to improving the identification of cancer prognostic signatures, enhancing predictive accuracy in prognostic survival analysis.

DOI: 10.2174/0115748936374758250702145027
2025-07-15
2025-12-18