Single-cell RNA sequencing (scRNA-seq) is crucial for unraveling gene expression complexity. However, existing feature selection methods often overlook the biological significance of co-expressed gene regions, leading to the omission of potential biomarkers.
We propose RF-SCGFS, a co-expressed gene region and gene joint selection method based on random forests. The method identifies co-expressed gene regions within homologous cell populations and builds a random forest model using cell type labels generated by the Scalable and Efficient speCtral clUstERing algorithm (Secuer). Feature importance evaluation is applied to select key co-expressed gene regions and genes.
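The gene-level part of this pipeline (cluster to obtain pseudo-labels, fit a random forest on them, rank features by importance) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: scikit-learn's `SpectralClustering` stands in for Secuer, the function names `rank_genes_by_pseudo_labels` and `make_toy_expression` are hypothetical, the hyperparameters are arbitrary, and the data are synthetic. The construction of co-expressed gene regions is not covered, since the abstract does not describe it.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.ensemble import RandomForestClassifier


def rank_genes_by_pseudo_labels(X, n_clusters, n_top, seed=0):
    """Return indices of the n_top most important genes and the pseudo-labels.

    X is a cells-by-genes expression matrix.
    """
    # Assumption: scikit-learn's SpectralClustering stands in for Secuer here.
    clusterer = SpectralClustering(
        n_clusters=n_clusters,
        affinity="nearest_neighbors",
        n_neighbors=10,
        random_state=seed,
    )
    pseudo_labels = clusterer.fit_predict(X)

    # Train a random forest to predict the cluster-derived labels from genes.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, pseudo_labels)

    # Impurity-based importances; higher means the gene separates clusters better.
    ranking = np.argsort(forest.feature_importances_)[::-1]
    return ranking[:n_top], pseudo_labels


def make_toy_expression(n_cells_per_type=40, n_genes=30, n_informative=5, seed=0):
    """Synthetic two-population matrix; only the first n_informative genes differ."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_cells_per_type, n_genes))
    X[n_cells_per_type:, :n_informative] += 6.0  # shift the second population
    return X


if __name__ == "__main__":
    X = make_toy_expression()
    top_genes, labels = rank_genes_by_pseudo_labels(X, n_clusters=2, n_top=3)
    print("selected gene indices:", top_genes)
```

Because the labels come from clustering rather than from annotation, the importances measure how well a gene separates the discovered clusters, so the quality of the selection depends on the quality of the clustering step.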
Experiments on 13 public scRNA-seq datasets demonstrate that RF-SCGFS outperforms traditional methods with average improvements of 0.15 and 0.19 in normalized mutual information (NMI) and adjusted Rand index (ARI), respectively. When combined with mainstream unsupervised algorithms, RF-SCGFS achieves excellent performance (NMI > 0.91 on Yan and Biase datasets). In the PBMC-ctrl dataset, the method successfully identifies genes associated with immune system processes (GO:0006955, p = 2.02E-37).
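For readers unfamiliar with the two reported metrics, the short sketch below shows how NMI and ARI are commonly computed with scikit-learn. The helper name `clustering_scores` and the label vectors are illustrative assumptions, not data or code from the study.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def clustering_scores(true_labels, predicted_labels):
    """Compare a clustering against reference labels with NMI and ARI."""
    return {
        "NMI": normalized_mutual_info_score(true_labels, predicted_labels),
        "ARI": adjusted_rand_score(true_labels, predicted_labels),
    }


if __name__ == "__main__":
    reference = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    # Same partition with cluster IDs renamed: both metrics ignore the naming.
    renamed = [2, 2, 2, 0, 0, 0, 1, 1, 1]
    print(clustering_scores(reference, renamed))
```

Both scores depend only on how cells are grouped, not on the cluster IDs, so an identical partition scores 1 on each; ARI is additionally corrected for chance agreement and can fall below 0.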
RF-SCGFS addresses key challenges in single-cell analysis by reducing computational burden through efficient feature selection while maintaining biological relevance through unsupervised clustering-guided selection.
RF-SCGFS provides an interpretable framework for feature selection in single-cell data, successfully identifying relevant disease genes and revealing the potential value of co-expressed gene regions in analyzing cellular heterogeneity.