Current Bioinformatics - Online First
32 results
-
-
Genome-wide Analysis of Ovarian Cancer-specific circRNAs in Alternative Splicing Regulation
Authors: Minhui Zhuang, Meng Zhang, Yulan Wang, Lingxiao Zou, Shan He, Jingjing Liu, Jian Zhao, Ping Han, Xiaofeng Song and Jing Wu
Available online: 26 May 2025
Introduction: Ovarian cancer (OC) is a cancer of the female reproductive system with a high mortality rate and is hard to detect at an early stage. Recent studies have indicated that alternative splicing plays an important role in OC progression by activating genes and pathways involved in tumorigenesis. Circular RNAs (circRNAs) have also been found to play a regulatory role in tumor progression and show potential for regulating alternative splicing. However, the underlying mechanism by which circRNAs regulate alternative splicing events (ASEs) in OC remains unclear.
Methods: In this study, we performed a comprehensive transcriptomic analysis of RNA-seq data from tumor and normal samples that we collected from OC patients, aiming to investigate the regulatory roles of OC-specific circRNAs in aberrant splicing events and their underlying pathways in tumorigenesis.
Results: We constructed a genome-wide regulatory network of strong correlations among 300 differentially expressed (DE) circRNAs and 1,150 aberrant ASEs, mediated by 31 DE splicing factors (SFs). Analyses of this network revealed that dysregulation of circRNAs may lead to aberrant ASEs that are closely involved in ovarian tumorigenesis. In addition, two crucial circRNAs, circ_AKT3 (hsa_circ_0000199) and circ_GSK3B (hsa_circ_0008797), were identified due to their significant roles in the network and associations with multiple tumor-related functional pathways.
Discussion: These findings suggest that OC-specific circRNAs may participate in tumor progression by indirectly regulating groups of ASEs through multiple SFs, rather than through direct interaction. Subnetwork analyses centered on the two hub circRNAs revealed that their associated ASEs are functionally clustered and involved in coordinated biological processes relevant to tumor biology.
Conclusion: This study provides novel insights into the regulatory pathways by which circRNAs are involved in OC progression, offering clues for discovering diagnostic biomarkers and therapeutic targets.
-
-
-
A Survey of Trends in Biomolecule Recognition for Sensing and Machine Learning Combined with Heterogeneous Information
Authors: Huiyu Ren, Cong Shen, Lingzhu Hu, Jijun Tang, Zhijun Liao and Wenyan Tian
Available online: 14 May 2025
Biomolecule sensing for recognition is the fundamental upstream step in target identification during the metabolism of individual life. Nevertheless, it is always a complicated task that leverages both in vitro and in vivo experiments to discriminate the interaction, affinity, structure, activity, and toxicity of target biomolecules. At the same time, biological investigation with intelligent computing has extended to bio-sequence analysis and biomedical image processing, especially biomolecule identification in multi-view and multi-modal settings. This review presents a panorama of contemporary developments in biomolecular omics and computational biological sensing, machine learning scenarios, and heterogeneous information, including multi-view, multi-modal, structured, and unstructured text and biomedical images. After the background is given, the concepts and databases of biomolecule interaction, affinity, and structure are introduced. Then, the machine learning paradigms in bioinformatics and biomedical engineering are presented according to whether they are epigenetics-centered or pharmacogenomics-centered. Next, the multi-view or multi-modal learning algorithms and optimization strategies for structured and unstructured data formats, including texts and biomedical images, are listed in detail. By comparing and analyzing state-of-the-art works, this study summarizes the advantages of existing methods in target biomolecule identification as well as the remaining challenges. Finally, future developments are discussed, including research trends in robustness, data augmentation, model generalization, and acceleration.
-
-
-
Analysis of Alternative Splicing Heterogeneity during Early Stages of Mouse Embryonic Development
Authors: Hongxia Chi, Yu Zhang, Anqi Li, Pengwei Hu, Wuritu Yang and Yongqiang Xing
Available online: 14 May 2025
Introduction: Pre-mRNA alternative splicing (AS) is a prevalent phenomenon in mammals, playing a crucial role in various biological processes such as embryonic development, tissue differentiation, and disease pathogenesis. Despite the advancements in single-cell RNA sequencing (scRNA-seq) technology, the extent of AS heterogeneity at the transcript level during early mouse embryonic development remains largely unexplored.
Methods: BRIE2 and Expedition were employed to identify and quantify splicing events. Cell clustering was performed with Scanpy based on Percent Spliced In (PSI) values and gene expression levels. Then, marker AS events and differential AS events were detected by the Wilcoxon rank-sum test and BRIE2's Mode-2 quantification mode. GO and KEGG enrichment analyses were conducted with clusterProfiler.
Results: The results suggested substantial heterogeneity in AS events and established PSI values as a critical index of cell heterogeneity during early mouse embryonic development, shedding light on the regulatory mechanisms underlying these processes. By examining marker and differential AS events, the study provided a comprehensive understanding of the dynamic changes in splicing patterns throughout early mouse embryonic development.
Discussion: This study revealed the heterogeneity of AS and elucidated its implications during early mouse embryonic development by analyzing AS at the single-cell level. However, the results are theoretical and lack experimental validation.
Conclusion: The findings offer critical insights into studying mouse embryonic development from the perspective of RNA cellular heterogeneity, emphasizing the importance of AS in shaping cellular diversity and developmental processes.
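As a rough illustration of the PSI index mentioned in the abstract above, the following sketch computes Percent Spliced In as the fraction of reads supporting exon inclusion. This is only the naive read-count ratio; tools such as BRIE2 estimate PSI with statistical models rather than this raw ratio, and the function name and arguments are hypothetical, not taken from the study.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Naive PSI: share of informative reads that support exon inclusion.

    Hypothetical helper for illustration only; it is not the estimator
    used by BRIE2 or by the study summarized above.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # No informative reads in this cell: PSI is undefined.
        return None
    return inclusion_reads / total


# One splicing event in three hypothetical cells: (inclusion, exclusion) reads.
cells = [(30, 10), (5, 15), (0, 0)]
psi_values = [percent_spliced_in(inc, exc) for inc, exc in cells]
print(psi_values)
```

Returning `None` for cells without informative reads keeps missing values distinct from a true PSI of zero, which matters when PSI values are later used for clustering.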
-
-
-
Investigating the Unique Transcriptional miRNA-mRNA Regulatory Network of ALK-positive Lung Adenocarcinoma Using Machine Learning Methods
Authors: Xiandong Lin, YuSheng Bao, Shaoli Wang, Hongyu Yu, Wei Guo, KaiYan Feng, Tao Huang and Yu-Dong Cai
Available online: 08 May 2025
Introduction: Non-small Cell Lung Cancer (NSCLC) is characterized by key gene mutations, such as EGFR, KRAS, and ALK. ALK rearrangement occurs in 3–5% of patients with non-small cell lung adenocarcinoma and is related to different clinical characteristics. Although ALK tyrosine kinase inhibitors have shown efficacy, drug resistance remains a challenge. This study aims to determine the unique molecular characteristics of ALK-positive lung adenocarcinoma to improve detection and prognosis.
Methods: GSE128311 integrates expression profiling data by array from GSE128309 and noncoding RNA profiling data by array from GSE128310, including 42 patients with ALK-positive lung adenocarcinoma and 35 patients with ALK-negative lung adenocarcinoma. These data were analyzed by eight feature ranking algorithms, yielding eight feature lists. These lists were fed into incremental feature selection to extract essential features.
Results: Key differentially expressed genes and miRNAs were identified, and functional enrichment analysis was carried out.
Discussion: Imbalances in the cell cycle pathway, the FOXM1 transcription factor network, and the immune response process in ALK-positive tumors were highlighted. It is worth noting that CX3CL1, MMS22L, DSG3, RUFY1, miR-652-5p, and miR-1288 are potentially important markers. Gene set enrichment analysis revealed low expression of the cell cycle pathway in ALK-positive samples.
Conclusion: This comprehensive computational analysis provides new insights into the molecular basis of ALK-positive lung adenocarcinoma and identifies promising biomarkers for further research.
-
-
-
Single-Cell RNA Sequencing to Identify Natural Killer Cell-Linked Genetic Markers and Regulatory Biomolecules in Coronary Heart Disease
Available online: 25 April 2025
Introduction: Bacterial and viral infections have been linked to an increased risk of coronary heart disease (CHD), potentially through natural killer (NK) cell-mediated innate immune mechanisms. This study aimed to integrate single-cell RNA sequencing (scRNA-seq) and bulk transcriptomics data to identify NK cell-associated genetic biomarkers that could aid in the diagnosis and assessment of CHD.
Methods: Publicly available single-cell and bulk RNA-seq datasets were analyzed to identify differentially expressed genes (DEGs). Functional enrichment analysis, protein-protein interaction (PPI) network construction, and biomarker validation were performed using standard bioinformatics pipelines.
Results: A total of 106 shared DEGs were identified through integrated cross-comparative analysis. Enrichment analysis revealed involvement in immune activation, signal transduction, T-cell receptor signaling, and TYROBP signaling pathways. PPI network analysis identified key hub proteins, including CDK1 and PTPRC, as potential biomarkers. Regulatory analysis revealed transcription factors (TP53, YY1, and RELA) and post-transcriptional miRNAs (hsa-miR-195-5p, hsa-miR-34a-5p, and hsa-miR-16-5p) that may influence CHD-associated gene expression. Several small molecules were also predicted to interact with these targets, suggesting potential therapeutic applications.
Discussion: The findings underscore the role of NK cell-mediated immune pathways in CHD pathogenesis. Hub genes such as CDK1 (involved in cell cycle regulation) and PTPRC (an immune signaling regulator) show promise as diagnostic biomarkers. The discovery of regulatory factors and druggable targets supports a complex, multi-level mechanism involving transcriptional and immune modulation.
Conclusion: This integrative study identifies novel NK cell-related molecular signatures and therapeutic targets, offering promising avenues for CHD diagnosis and the development of personalized treatment strategies.
-
-
-
Innovative Insights into Liver Cancer: Multi-Omics Reveals Critical Subtypes and Hub Genes
Authors: Jin-Yuan Cheng, Zi Liu, Xin Liu, Muhammad Kabir and Wang-Ren Qiu
Available online: 24 April 2025
Introduction/Objective: Hepatocellular carcinoma (HCC) is a highly heterogeneous malignant tumor, characterized by elevated mortality rates and poor diagnostic outcomes. Accurate identification of cancer subtypes is crucial for guiding personalized treatment and improving patient prognosis.
Methods: A method for precisely identifying HCC subtypes by integrating multi-omics data was presented. This approach combines the GRACES dimensionality reduction technique with the hMKL subtype identification model to analyze data from 266 HCC patients.
Results: We identified two subtypes more accurately, both significantly associated with overall survival. Their respective three-year mortality rates were 55.9% and 27.9%. Additionally, we observed significant differences in the activity of five pathways between these two subtypes, along with notable variations in the abundance and status of seven types of immune cells. Through further determination of the PPI network and centrality indicators, 13 up-regulated hub genes and 14 down-regulated hub genes were identified.
Discussion: Based on the above results, we compared and discussed the hub genes with the textual data, examined differences in gene upregulation and downregulation, and evaluated findings from other bioinformatics analyses to identify potential biomarkers.
Conclusion: Limited research on ENPP3 and C3 in HCC suggests their potential as biomarkers. Additionally, low expression levels of PIK3R1, KDR, and CYP3A5, along with high expression levels of EGLN3 and EPO, may indicate a higher risk of liver cancer in patients. Single-gene survival analysis highlighted the significant impact of highly expressed genes on HCC prognosis, with PKM, RRM2, and EPO playing crucial roles in the risk scores.
-
-
-
Explainable Colon Cancer Stage Prediction with Multimodal Biodata through the Attention-based Transformer and Squeeze-Excitation Framework
Authors: Olalekan Ogundipe, Bing Zhai, Zeyneb Kurt and Wai Lok Woo
Available online: 12 March 2025
Introduction: The heterogeneity in tumours poses significant challenges to the accurate prediction of cancer stages, necessitating the expertise of highly trained medical professionals for diagnosis. Over the past decade, the integration of deep learning into medical diagnostics, particularly for predicting cancer stages, has been hindered by the black-box nature of these algorithms, which complicates the interpretation of their decision-making processes.
Method: This study seeks to mitigate these issues by leveraging the complementary attributes found within functional genomics datasets (including mRNA, miRNA, and DNA methylation) and stained histopathology images. We introduced the Extended Squeeze-and-Excitation Multiheaded Attention (ESEMA) model, designed to harness these modalities. This model efficiently integrates and enhances the multimodal features, capturing biologically pertinent patterns that improve both the accuracy and interpretability of cancer stage predictions.
Result: Our findings demonstrate that the explainable classifier utilised the salient features of the multimodal data to achieve an area under the curve (AUC) of 0.9985, significantly surpassing the baseline AUCs of 0.8676 for images and 0.995 for genomic data.
Conclusion: Furthermore, the extracted genomics features were the most relevant for cancer stage prediction, suggesting that these identified genes are promising targets for further clinical investigation.
-
-
-
Multiple Approaches to Identifying Key Genes Linked to the Anti-inflammatory Effects of Ginsenosides
Authors: Gui-Fang Xiang, Fei-Ran Zhou, Chun-Yan Cui, Qing Liu, An-Qiong Mao and Ying Zhang
Available online: 10 March 2025
Ginsenoside is a naturally occurring active ingredient in ginseng that mainly consists of four components, Rb1, Rb2, Rc, and Rd, which are considered to be an important part of ginseng's medicinal effects. Ginsenosides can enhance the anti-fatigue ability of the body, regulate immune function, improve cardiovascular function, and have anti-aging, antioxidant, and neuroprotective effects. In recent years, many studies have found that ginsenosides have anti-inflammatory properties and are used in the treatment of many inflammatory diseases, such as endodontitis and bronchitis. Ginsenosides reduce inflammation by suppressing the release of inflammatory mediators, modulating inflammatory signaling pathways, scavenging free radicals, and modulating the immune system in a variety of ways. However, existing studies have not investigated the specific genes underlying the inflammation-reducing properties of ginsenosides. In this study, we analyzed two publicly accessible datasets from the GEO database (GSE255672 and GSE173990) to investigate the molecular basis of the anti-inflammatory effects of ginsenosides. This study aims to advance our understanding of how ginsenosides exert their anti-inflammatory properties, providing preliminary findings for identifying gene targets for their anti-inflammatory effects, thereby enhancing our understanding of their biological function and identifying new therapeutic pathways in the management of inflammation. It paves the way for further research on ginsenosides and their therapeutic application in inflammation-related diseases.
-
-
-
Single-Cell RNA Sequence Analysis to Identify Lymphatic Cell-Specific Biomarkers of Guillain-Barre Syndrome by Using Bioinformatics Approaches
Available online: 28 February 2025
Background: Guillain-Barre syndrome (GBS) is an uncommon neurological condition that develops when the body's immune system mistakenly targets peripheral nerves.
Aim: This work aimed to compare scRNA-seq and transcriptome data to find novel gene biomarkers linked to CD4+ T cells and B cells that might potentially be utilized for the diagnosis and assessment of GBS. It aimed to employ scRNA-seq data and bioinformatics tools to identify cell-specific biomarkers for GBS diagnosis and prognosis.
Methodology: scRNA-seq and microarray datasets from the GEO database were utilized to identify differentially expressed genes (DEGs). Pathway enrichment, identification of potential hub genes, and gene regulatory studies were performed using the FunRich, DAVID, STRING, and NetworkAnalyst tools.
Results: After integrating the DEGs and performing a comparative analysis, 84 DEGs were found to be shared between the scRNA-seq and microarray datasets. Signal transduction, immune system, cytokine signaling, NOD-like receptor signaling, and focal adhesion were detected among the most significant gene ontology terms and metabolic pathways. After generating a protein-protein interaction (PPI) network, we used eleven topological algorithms of the cytoHubba plugin to identify six key hub genes: CDC42, PTPRC, SRSF1, HNRNPA2B1, NIPBL, and FOS. Several crucial transcription factors (CHD1, IRF1, FOXC1, GATA2, YY1, E2F1, and CREB1) and two significant microRNAs (hsa-mir-20a-5p and hsa-mir-16-5p) were also discovered as hub gene regulators. The receiver operating characteristic (ROC) curve was used to evaluate the prognostic, expression, and diagnostic capabilities of the six major hub genes, indicating a good scoring value.
Conclusion: Functional enrichment pathway analysis, PPI, and regulatory network analysis demonstrated the critical functions of the identified key hub genes. Pending validation by further wet-lab research, this work may offer useful predicted biomarkers for the diagnosis and prognosis of GBS.
-
-
-
Integrative Analysis of Single Cell and Bulk RNA Sequencing Data Reveals T-Cell Specific Biomarkers for Diagnosis and Assessment of Celiac Disease: A Comprehensive Bioinformatics Approach
Available online: 10 February 2025
Background: Celiac Disease (CD) is a common autoimmune disorder caused by the activation of CD4+ T cells that specifically target gluten and of CD8+ T cells, which further cause cell death inside the epithelial layer. However, no established biomarkers are available for CD diagnosis.
Objective: This work aimed to compare scRNA-seq and transcriptome data to find novel gene biomarkers linked to T cells that might potentially be utilized for the diagnosis and assessment of CD.
Methods: scRNA-seq and RNA-seq datasets were collected from the NCBI database, and the Seurat package in RStudio and the statistical analysis tool GREIN server were employed to identify Differentially Expressed Genes (DEGs). Then, DAVID, FunRich, STRING, and NetworkAnalyst tools were utilized to explore significant pathways, key hub proteins, and gene regulators.
Results: After integrating genes and conducting a comparative analysis, a total of 115 genes were identified as DEGs. Exosomes, MHC class II receptor activity, immune response, interferon gamma signaling, and bystander B cell activation within the immune system pathways were the significant Gene Ontology (GO) terms and metabolic pathways identified. In addition, eleven topological algorithms identified two hub proteins, HLA-DRA and HLA-DRB1, from the PPI network. Through the analysis of the regulatory network, we identified four crucial Transcription Factors (TFs), YY1, FOXC1, GATA2, and USF2, and seven significant miRNAs (including hsa-mir-129-2-3p and hsa-mir-155-5p) as transcriptional and post-transcriptional regulators. Validation of hub proteins and transcription factors using Receiver Operating Characteristic (ROC) analysis indicated acceptable values of the Area Under the Curve (AUC).
Conclusion: This study utilized single-cell RNA sequencing and transcriptomics data analysis to define unique protein biomarkers associated with T cells throughout the progression of CD. Further wet lab studies will be needed to validate the potential hub proteins, TFs, and miRNAs as clinical biomarkers.
-
-
-
An Analysis of the Interactions between the 5' UTR and Introns in Mitochondrial Ribosomal Protein Genes
Authors: Junchao Deng, Ruifang Li, Xinwei Song, Shan Gao, Shiya Peng and Xu Tian
Available online: 10 February 2025
Background: The 5' UTR plays a crucial role in gene regulation, possibly through its interaction with introns. Hence, there is a need to further study this interaction.
Objective: This study aimed to investigate the interactions between 5' UTR and introns and their correlation with species evolution.
Methods: The optimally matched segments between 5' UTR and introns were identified using Smith-Waterman local similarity matching, and biostatistical methods were applied to compare the optimally matched segments between different species.
Results: The interactions between 5' UTR and introns were found to be primarily mediated by weak bonds and demonstrated a directional change with species evolution. Additionally, a large proportion of the optimally matched segments were very similar to miRNA and siRNA in terms of length and matching rate characteristics.
Conclusion: The weak bonds in the interactions between the 5' UTR and the introns could enhance the flexibility of expression regulation, and an important correlation was found between the characteristic distributions of the optimally matched segments and species evolution. Additionally, the length and matching rate of a large proportion of optimally matched segments were very similar to those of miRNA and siRNA. It is therefore highly probable that quite a few of the optimally matched segments are functional non-coding RNAs of some kind.
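The Smith-Waterman local matching named in the Methods of the abstract above can be sketched as follows. This is a minimal textbook version with a linear gap penalty and simple identity scoring; the scoring values, the function name, and the use of identity rather than base-pairing complementarity are assumptions made for illustration, not the settings of the study.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring parameters are illustrative assumptions, not those of the study.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; local alignment floors every cell at zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences (made up): the shared segment ACGT is recovered.
print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

The length of the returned segment and its fraction of matching positions correspond to the "length and matching rate" characteristics the abstract compares with miRNA and siRNA.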
-
-
-
PDTDAHN: Predicting Drug-Target-Disease Associations using a Heterogeneous Network
Authors: Lei Chen and Jingdong Li
Available online: 10 February 2025
Background: Disease is a major threat to life, and extensive efforts have been made over the past centuries to develop effective treatments. Identifying drug-disease and disease-target associations is crucial for therapeutic advancements, whereas drug-target associations facilitate the design of more effective treatment strategies. However, traditional experimental approaches for identifying these associations are costly and time-consuming. Numerous computational models have been developed to predict drug-target, drug-disease, and disease-target associations. However, these models are designed individually and cannot directly predict drug-target-disease associations, which involve interconnections among drugs, targets, and diseases. Such triple associations provide deeper insights into disease mechanisms and therapeutic interventions by capturing high-order associations.
Objective: This study proposes a computational model named PDTDAHN to predict drug-target-disease triple associations.
Method: Six association types retrieved from public databases are used to construct a heterogeneous network comprising drugs, targets, and diseases. The network embedding algorithm Mashup is applied to extract features for drugs, targets, and diseases, which are then combined to represent each drug-target-disease association. The classification model is trained using LightGBM.
Results: Cross-validation on eight datasets demonstrates the high performance of PDTDAHN, with AUROC and AUPR exceeding 0.9. This model outperforms previous models based on pairwise association predictions.
Conclusion: The proposed model effectively predicts drug-target-disease triple associations.
-
-
-
Integrative Multi-Omics Approaches for Personalized Medicine and Health
Authors: Prateek Tiwari, Raghvendra Pandey and Sonia Chadha
Available online: 10 February 2025
Introduction: Multi-omics data integration has transformed personalized medicine, providing a comprehensive understanding of disease mechanisms and informing precision therapeutic options. Multi-omics data generated for the same samples or patients can give insights into the flow of biological information at several levels, thereby providing in-depth information regarding the molecular mechanisms underlying pathological conditions. Multi-omics integration plays a pivotal role in personalized medicine by providing comprehensive insights into the complex biological systems of individual patients. This review provides a comprehensive account of current and future progress in multi-omics methodologies, which promise to refine diagnostics and therapeutic strategies by integrating genomic and transcriptomic analyses, proteomics approaches, and metabolome screens.
Methods: A literature search was performed in PubMed using keywords such as genomics, proteomics, transcriptomics, metabolomics, multi-omics, and precision medicine to identify published research articles. A thorough review of all results was then conducted, and their results and conclusions were compiled and summarized.
Result: By analyzing various omics layers, such as genomics, transcriptomics, proteomics, and metabolomics, multi-omics approaches enable the identification of patient-specific molecular traits and the discovery of new clinical therapeutics for diseases. Integration of various data types augments diagnostics, optimizes therapeutic regimens, and supports personalized medicine according to an individual patient profile.
Conclusion: Integration of multi-omics data and its applications in various fields, such as cancer research, helps optimize patient-specific treatment and improve patient health. With time, as these technologies reach more people, they stand to democratize precision medicine and hopefully bridge health disparities. In conclusion, the present review highlights multi-omics data integration as a transformative step towards personalized medicine, ultimately changing patient care from empirical to precision or individualized care.
-
-
-
Exploiting Gene Expression Signatures in Breast Cancer Cell Lines to Unveil Novel Drug Candidates and Synergistic Combinations
Authors: Hsueh-Chuan Liu, Chia-Wei Weng and Ka-Lok Ng
Available online: 04 February 2025
Aim: This study aimed to investigate breast cancer, the most common cancer affecting women worldwide, using one primary and two metastatic breast tumor cell lines to identify therapeutic drugs.
Background: Investigating the changes in gene expression triggered by drugs offers a robust method for uncovering potential new treatments. Through the analysis of the impacts of drugs on gene activity, scientists can unravel the molecular mechanisms within cells, comprehend the effects of drugs, identify opportunities for drug repositioning, and predict patient responses to treatments.
Objective: Our approach has involved two main strategies: analyzing drug-perturbed gene expression profiles and leveraging drug-induced gene expression profiles. Firstly, we have assessed how drugs affect the expression of target genes in a dose-dependent manner, determining whether they inhibit or activate gene expression. This analysis could inform the identification of new potential drugs. Secondly, we have grouped drugs based on their expression profiles to explore potential synergistic effects.
Methods: Our methodology has involved quantifying gene profile changes relative to drug dosage, categorizing effects as up-regulating or down-regulating, and employing functional enrichment with cancer hallmark annotations to predict drugs with potential for cancer treatment. Additionally, we have determined the optimal number of drug groups with similar effects on gene expression and explored their mechanisms of action through cancer hallmark annotations.
Results: By analyzing dose-dependent gene expression, we have found that seven, three, and five drugs may induce similar sets of up-regulated and down-regulated genes in Hs-578-T, MCF7, and MDA-MB-231 cell lines, respectively. Clustering and functional enrichment analyses have suggested a shared molecular mechanism of action among these drug candidates.
Conclusion: We have thus categorized drugs with opposing gene expression profiles and proposed new drug candidates for breast cancer treatment based on cancer hallmark annotations. Moreover, our study has uncovered synergistic drug combinations, including those utilizing FDA-approved drugs, for primary and metastatic breast cancer cell lines.
-
-
-
An Overview of Spatial Transcriptomics Methodologies in Traversing the Biological System
Available online: 30 January 2025
Transcriptomics covers the in-depth analysis of RNA molecules in cells or tissues and plays an essential role in understanding cellular functions and disease mechanisms. Recent advances in spatial transcriptomics (ST) have revolutionized the field by combining gene expression data with spatial information, enabling the analysis of RNA molecules within their tissue context. The evolution of spatial transcriptomics, particularly the integration of artificial intelligence (AI) in data analysis, and its diverse applications have been found to be superior methods in developmental research. Spatial transcriptomics technologies, along with single-cell RNA sequencing (scRNA-seq), offer unprecedented possibilities to unravel intricate cellular interactions within tissues. This review emphasizes the importance of accurate cell localization for in-depth discoveries and developments via high-throughput spatial transcriptome profiling. The integration of artificial intelligence in spatial transcriptomics analysis is a key focus, showcasing its role in detecting spatially variable genes, clustering cell populations, analyzing cell communication, and enhancing data interpretation. The evolution of AI methods tailored for spatial transcriptomics is highlighted, addressing the unique challenges posed by spatially resolved transcriptomic data. Spatial transcriptomics integrated with other omics data, such as genomics, proteomics, and metabolomics, provides a detailed view of molecular processes within tissues and is emerging in diverse applications. Integrating spatial transcriptomics with AI represents a transformative approach to understanding tissue architecture and cellular interactions. This synergy not only enhances our understanding of gene expression patterns but also offers a holistic view of molecular processes within tissues, with profound implications for disease mechanisms and therapeutic development.
-
-
-
Exploring Coding Sequence Length Distributions Across Taxonomic Kingdoms Based on Maximum Information Principle
Available online: 30 January 2025
Background: Genetic information about organisms' traits is stored and encoded in deoxyribonucleic acid (DNA) sequences. How this genetic information is stored within genomes has long been of interest to geneticists and biophysicists.
Objective: The objective of this study was to investigate the distribution of coding sequence (CDS) lengths in species genomes across different kingdoms.
Methods: We applied the maximum entropy principle and the gamma distribution model to a comprehensive dataset including viruses, archaea, bacteria, and eukaryotes.
Results: CDS length distributions showed kingdom-specific patterns. CDS lengths exhibit a right-skewed distribution, with varying preferences among kingdoms. Eukaryotes displayed bimodal distributions, with CDSs longer than those of prokaryotes. Fitting the gamma distribution model revealed differences in shape and scale parameters among kingdoms, with eukaryotes exhibiting larger scale parameters, indicating longer CDSs. Additionally, analysis of moments highlighted the complexity of eukaryotic genomes relative to prokaryotes.
Conclusion: This study deepens our understanding of genome evolution and provides valuable insights for biological research.
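The gamma fit mentioned in the Methods can be illustrated with a minimal sketch. It uses a method-of-moments estimate, which is an assumption made here for simplicity and not necessarily the fitting procedure the authors used; the function name and the toy CDS lengths are hypothetical.

```python
from statistics import fmean, pvariance


def gamma_moments_fit(lengths):
    """Method-of-moments estimate of gamma shape (k) and scale (theta).

    For a gamma distribution, mean = k * theta and variance = k * theta**2,
    so k = mean**2 / variance and theta = variance / mean.
    """
    mean = fmean(lengths)
    var = pvariance(lengths, mu=mean)
    if var == 0:
        raise ValueError("all lengths are identical; gamma fit is undefined")
    return mean * mean / var, var / mean


# Hypothetical CDS lengths in base pairs (illustrative values, not real data).
toy_prokaryote = [300, 600, 900, 1200, 1500]
toy_eukaryote = [600, 1200, 1800, 2400, 3000]

prok_shape, prok_scale = gamma_moments_fit(toy_prokaryote)
euk_shape, euk_scale = gamma_moments_fit(toy_eukaryote)
```

In this toy case both sets share the same shape parameter, and the longer sequences show up only as a larger scale parameter, which mirrors how a larger scale is read as longer CDSs in the Results.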
-
-
-
A Review of Biosequences Alignment, Matching, and Mining Based on GPU
Authors: Xianghua Kong, Cong Shen and Jijun Tang
Available online: 28 January 2025
Sequence alignment, pattern matching, and mining are cornerstones of bioinformatics, with applications that include identifying genome structure, protein function, and biological metabolic regulatory networks. As data volumes have grown, parallel sequential pattern recognition methods have gained attention because they speed up processing. This review summarizes GPU-based sequence alignment, pattern matching, and mining, together with the corresponding tools and their applications in bioinformatics. After giving an overview of the background, this review first introduces the concepts and databases of sequence alignment, pattern matching, and mining. Then, the basic architecture and parallel computing principles of the GPU are briefly described. Next, the design of GPU-based algorithms and optimization strategies in sequence alignment, pattern matching, and mining are described in detail. By comparing and analyzing existing research, the advantages and challenges of GPU applications in bioinformatics are summarized. Finally, future research directions are discussed, including the further development of algorithms combined with machine learning and deep learning.
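As a rough illustration of why alignment suits parallel hardware, the sketch below fills a Smith-Waterman score matrix one anti-diagonal at a time. Cells on the same anti-diagonal depend only on earlier anti-diagonals, so a GPU could compute them concurrently. The example is plain sequential Python, the scoring values are arbitrary assumptions, and it is not taken from any tool covered by the review.

```python
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, computed in anti-diagonal (wavefront) order."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    # All cells with i + j == d form one anti-diagonal.
    for d in range(2, rows + cols - 1):
        i_lo = max(1, d - (cols - 1))
        i_hi = min(rows - 1, d - 1)
        # The iterations of this inner loop are independent of each other,
        # which is the part a GPU kernel would run in parallel.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # diagonal: match or mismatch
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            best = max(best, score[i][j])
    return best


hit = smith_waterman_wavefront("GATTACA", "TTAC")
```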
-
-
-
A Deep Learning Method for Identifying G-Protein Coupled Receptors based on a Feature Pyramid Network and Attention Mechanism
Authors: Zhe Lv, Siqin Hu, Xin Wei and Wangren Qiu
Available online: 08 January 2025
Background: G-protein coupled receptors (GPCRs) represent a large family of membrane proteins, distinguished by their seven-transmembrane helical structures. These receptors play a pivotal role in numerous physiological processes. Many computational methods have been proposed to identify GPCRs. We previously introduced a powerful method, EMCBOW-GPCR, designed for this purpose. However, the feature extraction technique it employs is susceptible to out-of-vocabulary errors, which leaves room to improve the accuracy of GPCR identification.
Methods: To address this limitation, we propose a novel approach termed GPCR-AFPN. This method leverages the FastText algorithm to effectively extract features from protein sequences. Additionally, it employs a powerful deep neural network as the predictive model to improve prediction accuracy.
Results: To validate the efficacy of the proposed GPCR-AFPN method, we conducted five-fold cross-validation and independent tests. The experimental results indicate that GPCR-AFPN outperforms existing methods.
Conclusion: Overall, our proposed method, GPCR-AFPN, can improve the accuracy of GPCR identification. For the convenience of researchers interested in applying our latest advancements, a user-friendly webserver for GPCR-AFPN is available at www.lzzzlab.top/gpcrafpn/, and the corresponding code can be downloaded at https://github.com/454170054/GPCR-AFPN.
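The out-of-vocabulary point can be made concrete with a small sketch of FastText-style subword hashing: a sequence is broken into character n-grams, and each n-gram is hashed into a fixed number of buckets, so an unseen n-gram still maps to a valid index. This is only an illustration of the general idea; the n-gram range, bucket count, CRC32 hash, and function names are assumptions made here, not details of GPCR-AFPN.

```python
import zlib


def char_ngrams(seq, n_min=3, n_max=5):
    """Character n-grams of a sequence padded with boundary markers."""
    padded = "<" + seq + ">"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def bucket_counts(seq, n_buckets=1024):
    """Count n-grams per hash bucket; no vocabulary lookup is needed."""
    counts = [0] * n_buckets
    for gram in char_ngrams(seq):
        # CRC32 is used only because it is deterministic and in the stdlib.
        counts[zlib.crc32(gram.encode("ascii")) % n_buckets] += 1
    return counts


# Hypothetical short peptide, used purely as an example input.
features = bucket_counts("MKTAYIAK")
```

Because every n-gram lands in some bucket, a previously unseen fragment changes the feature vector instead of raising a lookup error.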
-
-
-
Screening of Candidate Chemical Regulators for the m6A Writer MTA in Arabidopsis
Authors: Beilei Lei, Chengchao Jia, Cuixia Tan, Pengjun Ding, Zenglin Li, Jing Yang, Jiyuan Liu, XiaoMin Wei, Shiheng Tao and Chuang Ma
Available online: 07 January 2025
Background: The MTA gene encodes a core component of the m6A methyltransferase complex, which plays a crucial role in the post-transcriptional modification of RNA and influences many vital processes in plants. However, because MTA knockout mutations are embryonic lethal, the molecular function of the MTA gene has yet to be comprehensively investigated.
Objective: The aim of this study is to investigate the expression and regulation of MTA in Arabidopsis.
Methods: Large-scale transcriptome and genome analyses were carried out for the expression and nsSNP (non-synonymous single nucleotide polymorphism) studies. Structure-based virtual screening, molecular dynamics simulation, binding free energy calculation, and an m6A modification level assay were employed to mine and validate MTA regulators from the COCONUT natural product database.
Results: Tissue-specific and stress-responsive expression patterns of MTA were observed in Arabidopsis. nsSNPs from the 1,001 Arabidopsis project were not detected in the binding site of the methyl-donor substrate S-adenosylmethionine (SAM) in MTA. Ten small molecules were identified as potential regulators, among which CNP0251613 (adenosine diphosphate glucose, ADPG) was selected and validated to decrease m6A levels at 10 µM relative to the control in Arabidopsis.
Conclusion: Our results provide new insight and a new chemical entity for the in-depth study of the RNA m6A writer MTA in plants.
-
-
-
DSPE: An End-to-End Drug Synergy Combination Prediction Algorithm for Echinococcosis
Authors: Haitao Li, Liyuan Jiang, Yuanyuan Chu, Yuansheng Liu, Chunhou Zheng and Yansen Su
Available online: 07 January 2025
Background: Echinococcosis, a parasitic disease caused by the larvae of the Echinococcus parasite, poses a serious threat to human health. Medication is an indispensable means of treatment for Echinococcosis; however, because single drugs show unsatisfactory efficacy, identifying effective drug combinations for the treatment of Echinococcosis is essential. Yet, current predictive models for drug synergy in Echinococcosis face accuracy challenges due to data scarcity, method limitations, and insufficient feature representation.
Objective: This work aims to design an end-to-end method to predict synergistic drug combinations, enabling efficient and accurate identification of drug combinations against Echinococcosis.
Methods: In this work, an end-to-end method, named DSPE, is proposed for predicting synergistic anti-Echinococcosis drug combinations. In DSPE, a dataset of synergistic drug combinations for Echinococcosis is constructed by retrieving and extracting information from related scientific articles. Further, DSPE employs a residual graph attention network to deeply analyze drug characteristics and their interactions, thereby enhancing the performance of deep learning models. It also explores the protein-protein interaction network related to Echinococcosis, using node2vec combined with an attention mechanism to efficiently encode disease features. Finally, it predicts the synergy of drug combinations based on the Bliss score by integrating drug combination and disease features.
Results: Experimental evidence shows that DSPE outperforms five state-of-the-art algorithms in predicting drug combination effects by leveraging disease-target information and single-agent treatment information.
Conclusion: DSPE improves prediction accuracy and addresses the issue of data scarcity for new diseases, offering new insights and methods for the future development of treatment plans for parasitic diseases.
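For readers unfamiliar with the Bliss score used as the prediction target, the sketch below computes the standard Bliss independence excess from fractional effects in [0, 1]. It illustrates the general definition only; how DSPE derives or scales its labels is not stated in the abstract, and the function name and numbers are hypothetical.

```python
def bliss_excess(effect_a, effect_b, effect_combo):
    """Observed combination effect minus the Bliss independence expectation.

    Effects are fractional responses in [0, 1] (for example, fraction of
    parasites inhibited). A positive excess suggests synergy, a negative
    excess suggests antagonism.
    """
    for value in (effect_a, effect_b, effect_combo):
        if not 0.0 <= value <= 1.0:
            raise ValueError("effects must be fractions between 0 and 1")
    expected = effect_a + effect_b - effect_a * effect_b
    return effect_combo - expected


# Hypothetical example: single drugs inhibit 40% and 50%, the pair inhibits 80%.
excess = bliss_excess(0.4, 0.5, 0.8)
```

Here the expected effect under independence is 0.7, so the pair exceeds it by about 0.1.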
-
-
-
PredART: Uncertainty-quantified Machine Learning Prediction of Androgen Receptor Agonists Overcoming Imbalanced Dataset
Authors: Jidon Jang, Dokyun Na and Kwang-Seok Oh
Available online: 02 January 2025
Aim: This study aims to develop and validate a machine learning-based model for the accurate prediction of androgen receptor (AR) agonistic toxicity, addressing the challenges posed by data imbalance in existing predictive models.
Background: Anomalous agonistic activity of the androgen receptor is a known major indicator of reproductive toxicity, which can lead to prostate cancer. Machine learning-based models have been developed for the rapid prediction of such agonists. However, the existing models have exhibited biased learning outcomes and low sensitivity due to the imbalance in the available training data. In the early screening process of drug discovery, low sensitivity caused by data imbalance can hinder the detection of potentially toxic compounds.
Objective: The objective of this study is to develop a machine learning prediction model that classifies whether a drug candidate is an androgen receptor agonist, with more balanced performance than existing models.
Methods: PredART is a bootstrap aggregated k-nearest neighbor model for the balanced prediction of androgen receptor agonistic toxicity, built from 381 active and 8,089 inactive data points and their structural features.
Result: In this work, we propose an advanced model that combines the bootstrap aggregating algorithm with machine learning binary classifiers to identify androgen receptor-based reproductive toxicity while avoiding biased prediction results. The optimal model using k-nearest neighbor classifiers achieved an accuracy of 0.831, a positive predictive value (PPV) of 0.882, a sensitivity of 0.625, a specificity of 0.951, and a Matthews correlation coefficient (MCC) of 0.633 on external test data, demonstrating a significant improvement in sensitivity compared to the previous study and achieving balanced learning. Furthermore, by calculating the standard deviation among the outputs of the classifiers and employing this prediction uncertainty as a screening metric to select reliable predictions, the model's performance could be further enhanced.
Conclusion: Based on the bootstrap aggregating algorithm, our prediction model effectively addressed data imbalance while evaluating the performance of various machine learning and deep learning classifiers as a benchmark. Additionally, by quantifying uncertainty, our model provided an intuitive assessment of prediction reliability during large-scale screening processes.
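The combination of bagging, k-nearest neighbors, and a standard-deviation uncertainty can be sketched as follows. This is a toy illustration under stated assumptions: each bag is drawn with equal numbers of actives and inactives (one plausible way to counter imbalance, not confirmed by the abstract), features are plain numeric tuples, and all names and data are hypothetical.

```python
import random
from statistics import fmean, pstdev


def knn_predict(train, query, k):
    """Majority vote (1 = active, 0 = inactive) among the k nearest points."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0


def bagged_knn(actives, inactives, query, n_models=25, k=3, seed=0):
    """Return (mean vote, standard deviation of votes) over bootstrap models."""
    rng = random.Random(seed)
    per_class = min(len(actives), len(inactives))
    votes = []
    for _ in range(n_models):
        # Assumed balancing step: resample both classes to the same size.
        bag = [(x, 1) for x in rng.choices(actives, k=per_class)]
        bag += [(x, 0) for x in rng.choices(inactives, k=per_class)]
        votes.append(knn_predict(bag, query, k))
    return fmean(votes), pstdev(votes)


# Hypothetical two-dimensional descriptors for a handful of compounds.
toy_actives = [(1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
toy_inactives = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                 (-0.1, 0.0), (0.0, -0.1), (0.1, 0.1)]

confident = bagged_knn(toy_actives, toy_inactives, (1.0, 1.0))
borderline = bagged_knn(toy_actives, toy_inactives, (0.5, 0.5))
```

A standard deviation near zero means the bootstrap models agree, so a screening step could keep only predictions whose spread falls below a chosen cutoff.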
-
-
-
A Method of Enhancing Heterogeneous Graph Representation for Predicting the Associations between lncRNAs and Diseases
Authors: Dengju Yao, Yuehu Wu and Xiaojuan Zhan
Available online: 06 November 2024
Background: Long non-coding RNAs (lncRNAs) are a class of long RNA molecules that lack protein-coding ability. Although they are not translated into proteins, studies have shown that they play essential regulatory roles in cells, regulating gene expression and cell biological processes. However, it is both costly and inefficient to determine the associations between lncRNAs and diseases through biological experiments. Therefore, there is an urgent need to develop convenient and fast computational methods to predict lncRNA-disease associations (LDAs) more efficiently.
Objective: Predicting disease-associated lncRNAs can help explore the mechanisms of action of lncRNAs in diseases, which is crucial for early intervention and treatment.
Methods: In this paper, we propose an enhanced heterogeneous graph representation method for predicting LDAs, named GCGALDA. GCGALDA first obtains the topological structure features of nodes by a biased random walk. Based on this, the neighboring nodes of a node are weighted using the attention mechanism to further mine the semantic association relationships between nodes in the graph data. Then, a graph convolution network (GCN) is used to transfer the neighborhood features of the node to the central node and combine them with the node's own features, so that the final node representation contains not only structural information but also semantic association information. Finally, the association score between lncRNA and disease is obtained by a multilayer perceptron (MLP).
Results: Experiments on openly accessible databases show that GCGALDA outperforms other advanced models in prediction accuracy. In addition, case studies on several human diseases further confirm the predictive ability of GCGALDA.
Conclusion: The proposed GCGALDA model extracts multi-perspective features, such as topology, semantic association, and node attributes, obtains high-quality heterogeneous graph node representations, and effectively improves the performance of the LDA prediction model.
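The GCN step that passes neighborhood features to a central node and combines them with the node's own features can be sketched as one symmetric-normalized aggregation with self-loops. The learnable weight matrix and nonlinearity of a real GCN layer are deliberately omitted, and the function name and toy graph are assumptions for illustration, not the GCGALDA implementation.

```python
from math import sqrt


def gcn_propagate(adj, feats):
    """One GCN-style aggregation: D^-1/2 (A + I) D^-1/2 applied to features.

    adj is a square 0/1 adjacency matrix (list of lists) and feats holds one
    feature vector per node. Adding the identity keeps each node's own
    features in the mix.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    out = []
    for i in range(n):
        row = [0.0] * len(feats[0])
        for j in range(n):
            if a_hat[i][j]:
                weight = a_hat[i][j] / sqrt(degree[i] * degree[j])
                for c, value in enumerate(feats[j]):
                    row[c] += weight * value
        out.append(row)
    return out


# Hypothetical graph: one lncRNA node linked to one disease node.
smoothed = gcn_propagate([[0, 1], [1, 0]], [[1.0], [3.0]])
```

With two connected nodes, each ends up with the average of its own feature and its neighbor's, which is the smoothing effect the Methods describe.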
-
-
-
Identification and Analysis of Plant miRNAs: Evolution of In-silico Resources and Future Challenges
Authors: Abhishek Kushwaha, Hausila Prasad Singh and Noopur Singh
Available online: 04 November 2024
MicroRNAs (miRNAs), a class of endogenous small RNAs, are key regulators in numerous eukaryotic lineages and play an important role in a broad range of plant developmental processes. Computational analysis of miRNAs facilitates the understanding of miRNA-based regulation in plants. The discovery of small non-coding RNAs has led to a greater understanding of gene regulation, and the development of bioinformatic tools has enabled the identification of miRNAs and their targets. The need for comprehensive miRNA analysis is being met by the development of advanced computational tools, algorithms, and databases. Each resource has its own specificity and limitations for the analysis. This review provides a comprehensive overview of various algorithms used by computational tools, software, and databases for plant miRNA analysis. Over a period of about two decades, a great deal of knowledge has been added to our understanding of the biogenesis and functioning of miRNAs in other plants. Several parameters have already been integrated, and others need to be incorporated in order to give more accurate and efficient results. Computational resources based on old algorithms need to be reassessed in light of new miRNA research and development. Generally, computational methods, including ab-initio and homology search-based methods, are used for miRNA identification and target prediction. This review presents the new challenges faced by existing computational methods, highlights their limitations, and emphasizes the need for new tools, advanced algorithms, and a comprehensive platform for miRNA gene exploration.
-
-
-
GVNNVAE: A Novel Microbe-Drug Association Prediction Model based on an Improved Graph Neural Network and the Variational Auto-Encoder
Authors: Yiming Chen, Zhen Zhang, Xin Liu, Bin Zeng and Lei Wang
Available online: 31 October 2024
Microorganisms play a crucial role in human health and disease. Identifying potential microbe-drug associations is essential for drug discovery and clinical treatment. In this manuscript, we propose a novel prediction model named GVNNVAE that combines an improved Graph Neural Network (GNN) and the Variational Auto-Encoder (VAE) to infer potential microbe-drug associations. In GVNNVAE, we first established a heterogeneous microbe-drug network N by integrating multiple similarity metrics of microbes, drugs, and diseases. Subsequently, we introduced an improved GNN and the VAE to extract topological and attribute representations, respectively, for nodes in N. Finally, by combining various original attributes of microbes and drugs with the two kinds of newly obtained topological and attribute representations, predicted scores of potential microbe-drug associations were calculated. To evaluate the prediction performance of GVNNVAE, extensive experiments were conducted, and comparative results showed that GVNNVAE achieved an AUC value of 0.9688, outperforming existing competitive state-of-the-art methods. Moreover, case studies of known microbes and drugs confirmed the effectiveness of GVNNVAE, highlighting its potential for predicting latent microbe-drug associations.
-
-
-
Graph-Root: Prediction of Root-Associated Proteins in Maize, Sorghum, And Soybean Based on Graph Convolutional Network and Network Embedding Method
Authors: Bo Zhou, Siyang Liu, Lei Chen and Qi Dai
Available online: 29 October 2024
Background: The root system plays an irreplaceable role in plant growth, and its improvement can increase crop productivity. However, the underlying mechanism of this system has not been fully uncovered. Investigating proteins related to the root system is an important means of doing so. Previously, the lack of known root-related proteins made it impossible to adopt machine learning methods to design efficient models for discovering novel root-related proteins. Recently, a public database of root-related proteins was set up, so machine learning methods can now be applied in this field.
Objective: The purpose of this study was to design an efficient computational method to predict root-associated proteins in three plants: maize, sorghum, and soybean.
Method: In this study, we proposed a machine learning-based model, named Graph-Root, for the identification of root-related proteins in maize, sorghum, and soybean. Features were derived from protein sequences, functional domains, and one network: the first type of features was processed by a graph convolutional neural network and multi-head attention, the second type reflected the essential functions of proteins, and the third type abstracted the linkage between proteins. These features were fed into a fully connected layer to make predictions.
Results: The 5-fold cross-validation and independent tests suggested acceptable performance. Graph-Root also outperformed the only previous model, SVM-Root. Furthermore, the importance of each feature type and component in the proposed model was investigated.
Conclusion: Graph-Root performed well and can be a useful tool to identify novel root-related proteins. BLOSUM62 features were found to be important in determining root-related proteins.
-
-
-
Robust Somatic Copy Number Estimation using Coarse-to-fine Segmentation
Available online: 28 October 2024
Introduction: Cancers routinely exhibit chromosomal instability that results in copy number variants (CNVs), namely changes in the abundance of genomic material. Unfortunately, the detection of these variants in cancer genomes is difficult.
Methods: We present Ploidetect, a software package that effectively identifies CNVs within whole-genome sequenced tumors. Ploidetect utilizes a coarse-to-fine segmentation approach which yields highly contiguous segments while allowing for focal CNVs to be detected with high sensitivity.
Results: We benchmark Ploidetect against popular CNV tools using synthetic data, cell line data, and real-world metastatic tumor data and demonstrate strong performance in all tests. We show that high-quality CNVs from Ploidetect enable the identification of recurrent homozygous deletions and genes associated with chromosomal instability in a multi-cancer cohort of 687 patients. Using highly contiguous CNV calls afforded by Ploidetect, we also demonstrate the use of segment N50 as a novel metric for the measurement of chromosomal instability within tumor biopsies.
Conclusion: We propose that the increasingly accurate determination of CNVs is critical for their productive study in cancer, and our work demonstrates advances made possible by progress in this regard.
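The segment N50 metric can be sketched by borrowing the usual genome-assembly definition: the segment length at which the longest segments, taken in descending order, cover at least half of the total length. Whether Ploidetect computes it in exactly this way is an assumption, and the function name and lengths below are hypothetical.

```python
def segment_n50(lengths):
    """Length L such that segments of length >= L cover half the total."""
    if not lengths:
        raise ValueError("at least one segment length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical segment lengths in megabases for two tumors.
stable_genome = segment_n50([50, 30, 10, 5, 5])
fragmented_genome = segment_n50([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
```

Under this definition, a genome broken into many short copy number segments yields a lower N50, which is how the metric can track chromosomal instability.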
-
-
-
PredPVP: A Stacking Model for Predicting Phage Virion Proteins Based on Feature Selection Methods
Authors: Qian Cao, Xufeng Xiao, Yannan Bin, Jianping Zhao and Chunhou Zheng
Available online: 28 October 2024
Background: Phage therapy has broad application prospects as a novel therapeutic method. Phage Virion Proteins (PVPs) can recognize the host and bind to surface receptors, which is of great significance for the development of antimicrobial drugs for the treatment of infectious diseases caused by bacteria. In recent years, several PVP predictors based on machine learning have been developed, which usually use a single feature to train the learner. In contrast, higher-dimensional feature representations tend to contain more potential sequence information.
Methods: In this work, we construct a stacking model, PredPVP, for PVP prediction by combining multiple features and using feature selection methods. Specifically, the sequence is first encoded using seven features. For this high-dimensional feature representation, three feature selection methods were utilized to remove redundant features and were then integrated with eight machine learning algorithms. Finally, probability features and class features (PCFs) generated by the 24 base models were put into logistic regression (LR) to train the model.
Results: The results on the independent test set indicate that PredPVP performs better than other existing predictors, with an AUC of 93.4%.
Conclusion: We expect PredPVP to be used as a tool for large-scale PVP recognition, providing a new way for the development of novel antimicrobials and accelerating its application in actual treatment. The datasets and source codes used in this study are available at https://github.com/caoqian23/PredPVP.
-
-
-
A Low Transformed Tubal Rank Tensor Model Using a Spatial-Tubal Constraint for Sample Clustering with Cancer Multi-omics Data
Authors: Sheng-Nan Zhang, Ying-Lian Gao, Yu-Lin Zhang, Junliang Shang, Chun-Hou Zheng and Jin-Xing Liu
Available online: 21 October 2024
Background: Since each dimension of a tensor can store different types of genomics data, utilizing tensor structure can, compared to matrix methods, provide a deeper understanding of multi-dimensional data while also facilitating the discovery of more useful information related to cancer. However, existing approaches make insufficient use of prior knowledge in multi-omics data and are limited in recovering low-tubal-rank tensors. The method proposed in this article was developed to address these issues.
Objective: In this paper, we proposed a low transformed tubal rank tensor model (LTTRT) using a spatial-tubal constraint to accurately partition different types of cancer samples and provide reliable theoretical support for the identification, diagnosis, and treatment of cancer.
Method: In the LTTRT method, the transformed tensor nuclear norm based on the transformed tensor singular value decomposition is used to characterize the low-rank tensor, which can explore the global low-rank property of the tensor, resolving the challenge of the tensor nuclear norm-based method not achieving the lowest tubal rank. Additionally, the introduction of weighted total variation regularization is conducive to extracting more information from sequencing data in both spatial and tubal dimensions, exploring cross-correlation features of multiple genomic data, and addressing the problem of overlooking prior knowledge from various perspectives. In addition, the L1-norm is used to improve sparsity. A symmetric Gauss-Seidel-based alternating direction method of multipliers (sGS-ADMM) is used to update the LTTRT model iteratively.
Results: Sample clustering experiments on multiple integrated cancer multi-omics datasets show that the proposed LTTRT method is better than existing methods. Experimental results validate the effectiveness of LTTRT in accurately partitioning different types of cancer samples.
Conclusion: The LTTRT method achieves precise segmentation of different types of cancer samples.
-
-
-
Predicting Molecular Subtypes of Breast Cancer Using Gene Expression Profiling and Random Forest Classifier
Available online: 14 October 2024
Background: One of the main causes of cancer-related mortality in women is breast cancer [BC]. There are four molecular subtypes of this malignancy, and adjuvant therapy efficacy differs based on these subtypes. Gene expression profiles provide valuable information that is helpful for patients whose prognosis is not clear from clinical markers and immunohistochemistry.
Objective: In this study, we aim to predict molecular types of BC using a gene expression dataset of patients with BC and normal samples using six well-known ensemble machine-learning techniques.
Methods: Two microarray datasets, [GSE45827] and [GSE140494], were downloaded from the Gene Expression Omnibus [GEO] database. These datasets comprise 21 samples of normal tissues that were part of a cohort analysis of primary invasive breast cancer [57 basal, 36 HER2, 56 Luminal A, and 66 Luminal B]. We used AdaBoost, Random Forest [RF], Artificial Neural Network [ANN], Naïve Bayes [NB], Classification and Regression Tree [CART], and Linear Discriminant Analysis [LDA] classifiers.
Result: The results of the data analysis show that the RF and NB classifiers outperform the other models in the prediction of the BC subtype. The RF shows superior performance, with an accuracy range between 0.89 and 1.0, in contrast to its competitor NB, which has an average accuracy of 0.91. Our approach perfectly discriminates unaffected cases [normal] from the carcinoma; in this case, the RF provides perfect prediction with zero errors. Additionally, we used PCA, DHWT low-frequency, and DHWT high-frequency to perform dimensionality reduction for the numerous gene expression values. Consequently, the LDA achieves up to 95% improvement in performance through data reduction. Moreover, feature selection allowed for the best performance, which is recorded by the RF with a classification accuracy of 98%.
Conclusion: Overall, we provide a successful framework that leads to shorter computation times and smaller ML models, especially where memory and time restrictions are crucial.
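The DHWT low-frequency and high-frequency reductions mentioned in the Result can be illustrated, assuming DHWT denotes a discrete Haar wavelet transform, with a single-level Haar step. Each output has half the input length, so keeping only one of them halves the number of features. The function name and the toy expression values are hypothetical, and the paper's actual preprocessing may differ.

```python
from math import sqrt


def haar_step(values):
    """One level of the discrete Haar wavelet transform.

    Returns (low, high): scaled pairwise sums (low-frequency part) and
    scaled pairwise differences (high-frequency part).
    """
    if len(values) % 2:
        raise ValueError("input length must be even")
    scale = sqrt(2.0)
    low = [(values[i] + values[i + 1]) / scale
           for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / scale
            for i in range(0, len(values), 2)]
    return low, high


# Hypothetical expression values for four probes from one sample.
low_part, high_part = haar_step([4.0, 4.0, 2.0, 6.0])
```

The transform preserves the total sum of squares, so the low and high parts together carry the same energy as the original vector.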
-
-
-
NEXT-GEN Medicine: Designing Drugs to Fit Patient Profiles
Authors: Raj Kamal, Diksha, Priyanka Paul, Ankit Awasthi and Amandeep Singh
Available online: 14 October 2024
Background: Personalized medicine, with its focus on tailoring drug formulations to individual patient profiles, has made significant strides in healthcare. The integration of genomics, biomarkers, nanotechnology, 3D printing, and real-time monitoring provides a comprehensive approach to optimizing drug therapies on an individual basis. This review aims to highlight the recent advancements in personalized medicine and its applications in various diseases, such as cancer, cardiovascular diseases, diabetes mellitus, and neurodegenerative diseases. The review explores the integration of multiple technologies in the field of personalized medicine, including genomics, biomarkers, nanotechnology, 3D printing, and real-time monitoring. As these technologies continue to evolve, we are entering an era of truly personalized medicine that promises improved treatment outcomes, reduced adverse effects, and a more patient-centric approach to healthcare. The advancements in personalized medicine hold great promise for improving patient outcomes and reducing adverse effects, heralding a new era in patient-centric healthcare.
-
-
-
Artificial Intelligence in Diabetes Mellitus Prediction: Advancements and Challenges - A Review
Authors: Rohit Awasthi, Anjali Mahavar, Shraddha Shah, Darshana Patel, Mukti Patel, Drashti Shah and Ashish Patel
Available online: 11 October 2024
Poor dietary habits and a lack of understanding are contributing to the rapid global increase in the number of people with diabetes. Therefore, a framework that can accurately forecast the condition for a large number of patients based on clinical details is needed. Artificial intelligence (AI) is a rapidly evolving field, and its application to diabetes, a worldwide pandemic, has the potential to revolutionize the strategy of diagnosing and forecasting this chronic condition. Algorithms based on artificial intelligence fundamentals have been developed to support predictive models for the risk of developing diabetes or its complications. In this review, we discuss AI-based diabetes prediction. Thus far, AI-based new-onset diabetes prediction has not outperformed traditional, statistically based risk stratification models. Despite this, it is anticipated that in the near future, a vast quantity of well-organized data and an abundance of processing power will optimize AI's predictive capabilities, greatly enhancing the accuracy of diabetes prediction models.
-
-
-
scADCA: An Anomaly Detection-Based scRNA-seq Dataset Cell Type Annotation Method for Identifying Novel Cells
Authors: Yongle Shi, Yibing Ma, Xiang Chen and Jie Gao
Available online: 10 October 2024
Background: With the rapid evolution of single-cell RNA sequencing technology, the study of cellular heterogeneity in complex tissues has reached an unprecedented resolution. One critical task of the technology is cell-type annotation. However, challenges persist, particularly in annotating novel cell types.
Objective: Current methods rely heavily on well-annotated reference data, using correlation comparisons to determine cell types. However, identifying novel cells remains unstable due to the inherent complexity and heterogeneity of scRNA-seq data and cell types. To address this problem, we propose scADCA, a method based on anomaly detection, for identifying novel cell types and annotating the entire dataset.
Methods: Convolutional modules and fully connected networks are integrated into an autoencoder, which is trained on the reference dataset to obtain reconstruction errors. A threshold based on these errors distinguishes between novel and known cells in the query dataset. After novel cells are identified, a multinomial logistic regression model fully annotates the dataset.
Results: Using a simulation dataset, three real scRNA-seq pancreatic datasets, and a real scRNA-seq lung cancer cell line dataset, we compare scADCA with six other cell-type annotation methods, demonstrating competitive performance in terms of distinguished accuracy, full accuracy, -score, and confusion matrix.
Conclusion: The scADCA method can be further improved and expanded to achieve better performance and application effects in cell type annotation, which helps to improve the accuracy and reliability of cytology research and promote the development of single-cell omics.
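The thresholding step in the Methods can be sketched as follows: reconstruction errors from the reference data set a cutoff, and query cells whose error exceeds it are flagged as novel. The abstract does not say how the threshold is chosen, so the percentile rule, function names, and numbers below are assumptions for illustration only; the autoencoder itself is not reproduced.

```python
def novelty_threshold(reference_errors, percentile=95):
    """Cutoff at a given percentile of reference reconstruction errors.

    The percentile rule is a hypothetical choice, not the published one.
    """
    if not reference_errors:
        raise ValueError("reference errors must not be empty")
    ordered = sorted(reference_errors)
    index = min(len(ordered) - 1, (percentile * len(ordered)) // 100)
    return ordered[index]


def flag_novel(query_errors, threshold):
    """True for query cells reconstructed worse than the threshold."""
    return [error > threshold for error in query_errors]


# Hypothetical reconstruction errors (arbitrary units) for 100 reference cells.
reference = list(range(1, 101))
cutoff = novelty_threshold(reference)
flags = flag_novel([50, 96, 97], cutoff)
```

Cells flagged as known would then go to the downstream classifier, while flagged cells are labeled as novel.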
-