Volume 20, Issue 8
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background

Single-cell RNA sequencing (scRNA-seq) has opened new horizons in the study of cellular diversity, allowing researchers to resolve the gene expression profile of each cell, identify rare cell types, and explore how gene expression in specific cells changes across environments. Clustering plays a central role in revealing unknown cell types and in downstream scRNA-seq analysis. However, the high dimensionality, high noise, and frequent missing values (dropout events) of scRNA-seq data significantly limit clustering performance. Traditional embedding algorithms often ignore the underlying distribution of scRNA-seq data.

Aims

In this study, we aim to cluster scRNA-seq data by developing and applying a variational graph attention autoencoder model based on the zero-inflated negative binomial (ZINB) distribution.

Methods

We propose scZIGVAE, a clustering method for scRNA-seq data that integrates the zero-inflated negative binomial (ZINB) model with a variational graph attention autoencoder. The method strengthens the learning of complex topological structures between cells while modeling missing (dropout) events. By jointly optimizing the ZINB loss and the cell graph reconstruction loss to estimate missing data, scZIGVAE generates cell representations that are better suited to clustering. In addition, self-optimizing embedded clustering iteratively updates the cluster centers to further fine-tune the clustering performance of the model.
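
The two ingredients named above, the ZINB loss and self-optimizing embedded clustering, can be illustrated with a minimal NumPy sketch. It is not the scZIGVAE implementation. It assumes the common mean/dispersion parameterization of the negative binomial (`mu`, `theta`) with a dropout probability `pi`, and a DEC-style Student's t soft assignment for the clustering step. The function names `zinb_nll`, `soft_assign`, and `target_distribution` are hypothetical.

```python
import math
import numpy as np

# Element-wise log-gamma built from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma, otypes=[float])

def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Mean negative log-likelihood of counts x under a ZINB model.

    Assumed parameterization: mu = NB mean, theta = NB dispersion,
    pi = probability that an entry is a structural (dropout) zero.
    """
    x = np.asarray(x, dtype=float)
    # log NB(x; mu, theta)
    log_nb = (
        _lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
        + theta * (np.log(theta + eps) - np.log(theta + mu + eps))
        + x * (np.log(mu + eps) - np.log(theta + mu + eps))
    )
    # Zeros come either from dropout (pi) or from the NB itself.
    log_zero = np.log(pi + (1.0 - pi) * np.exp(log_nb) + eps)
    # Non-zero counts can only come from the NB component.
    log_nonzero = np.log(1.0 - pi + eps) + log_nb
    log_lik = np.where(x < 0.5, log_zero, log_nonzero)
    return float(-np.mean(log_lik))

def soft_assign(z, centers):
    """Student's t kernel soft assignment of embeddings z to cluster centers."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: square q, normalize by soft cluster frequency, renormalize rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(2.0, size=(5, 4))
    print("ZINB NLL:", zinb_nll(counts, mu=2.0, theta=1.0, pi=0.1))
    z = rng.normal(size=(5, 2))
    centers = rng.normal(size=(3, 2))
    p = target_distribution(soft_assign(z, centers))
    print("target rows sum to:", p.sum(axis=1))
```

In a DEC-style training loop, the KL divergence between the target distribution and the soft assignments would be minimized together with the ZINB and graph reconstruction losses. How scZIGVAE weights these terms is not stated here.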

Results

Extensive testing on twelve datasets from different scRNA-seq platforms demonstrates that scZIGVAE outperforms current state-of-the-art clustering techniques.

Conclusion

In summary, our findings demonstrate that incorporating the zero-inflated negative binomial (ZINB) distribution into the variational graph autoencoder (VGAE) architecture yields better estimation of missing values during decoding. Furthermore, applying multiple loss constraints to the learned latent representations makes them more conducive to downstream analyses.

/content/journals/cbio/10.2174/0115748936348851241230113213
2025-02-11
2025-12-28


Supplements

Supplementary material is available on the publisher’s website along with the published article.
