The rapid development of single-cell RNA sequencing (scRNA-seq) technology has provided unprecedented opportunities to explore cell heterogeneity and function. However, the high dimensionality, sparsity, and noise inherent in scRNA-seq data pose significant challenges for traditional clustering methods. This review summarizes machine learning-based clustering techniques for scRNA-seq data, grouped into traditional, graph-based, ensemble, deep learning, and other methods, and discusses the advantages, limitations, and open challenges of each group. We first discuss key preprocessing steps such as normalization, dropout imputation, and dimensionality reduction, which are essential for addressing data sparsity and improving clustering performance. We then introduce commonly used clustering evaluation metrics, including the Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), the silhouette score, and marker gene validation. We also compare six clustering methods across six datasets to evaluate how consistently each method achieves accurate clustering. Our findings indicate that deep learning-based methods generally outperform the other methods in capturing complex relationships within the data, especially in high-dimensional and noisy datasets. However, challenges remain in computational efficiency, scalability to large-scale datasets, and the handling of batch effects. This work offers insights for the development of new scRNA-seq clustering tools and helps address the challenges faced in the downstream analysis of single-cell sequencing data.
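As an illustration of one of the evaluation metrics named above, the following is a minimal sketch of the Adjusted Rand Index computed from its standard pair-counting definition. It is not taken from the review itself: the function name `adjusted_rand_index` and the toy label lists are assumptions made for this example, and in practice a library routine such as scikit-learn's `adjusted_rand_score` would normally be used instead.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI between two flat clusterings (hypothetical helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the partitions agree trivially

    # Contingency counts: cells shared by each (true cluster, predicted cluster) pair.
    contingency = Counter(zip(labels_true, labels_pred))
    true_sizes = Counter(labels_true)
    pred_sizes = Counter(labels_pred)

    # Number of cell pairs that are co-clustered in both, in truth, and in prediction.
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_true = sum(comb(c, 2) for c in true_sizes.values())
    sum_pred = sum(comb(c, 2) for c in pred_sizes.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both partitions
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster IDs differ, but the grouping of cells is identical.
reference = [0, 0, 1, 1, 2, 2]
predicted = [1, 1, 0, 0, 2, 2]
score = adjusted_rand_index(reference, predicted)
```

Because ARI depends only on which cells are grouped together, relabeling the clusters does not change the score. This is why it is suited to comparing a clustering result against annotated cell types.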