Current Bioinformatics - Volume 18, Issue 4, 2023
scTSSR-D: Gene Expression Recovery by Two-side Self-Representation and Dropout Information for scRNA-seq Data
Authors: Meng Liu, Wenhao Chen, Jianping Zhao, Chunhou Zheng and Feilong Guo
Background: Single-cell RNA sequencing (scRNA-seq) is an advanced technology that makes it possible to unravel cellular heterogeneity and to analyze gene expression at single-cell resolution. However, owing to technical limitations, many dropout events occur during sequencing, and they adversely affect downstream analysis. Methods: To address the dropout events in scRNA-seq data, we propose scTSSR-D, an imputation method that recovers gene expression through two-side self-representation and dropout information. scTSSR-D is the first global method that incorporates a partial imputation method to impute dropout values. In other words, it makes full use of genes, cells, and dropout information when recovering gene expression. Results: scTSSR-D outperforms existing methods in the following experiments: capturing the Gini coefficients and gene-to-gene correlations observed in single-molecule RNA fluorescence in situ hybridization, down-sampling experiments, differential expression analysis, and cell clustering accuracy. Conclusion: scTSSR-D is a more stable and reliable method for recovering gene expression. Its advantage over existing methods is even more pronounced on large datasets.
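One of the benchmarks above compares the Gini coefficient of each gene's expression across cells with the value observed by smFISH. The coefficient measures how unevenly expression is spread over cells. The sketch below uses the standard sorted-values formula and assumes non-negative expression values for a single gene. The function name `gini` and the example values are hypothetical and are not taken from scTSSR-D.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values: 0 = perfectly even, near 1 = concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # Convention assumed here: an empty or all-zero profile counts as perfectly even.
        return 0.0
    # Rank-weighted sum over ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical expression of one gene across six cells, with dropout zeros.
print(round(gini([0.0, 0.0, 3.0, 5.0, 0.0, 8.0]), 3))  # 0.604
```

Dropout zeros inflate the coefficient. A good imputation should therefore bring it closer to the smFISH reference instead of simply smoothing every gene toward zero inequality.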
DEGoldS: A Workflow to Assess the Accuracy of Differential Expression Analysis Pipelines through Gold-standard Construction
Background: Non-model species lacking public genomic resources face an additional handicap in bioinformatics analyses, which could be mitigated by parameter tuning and the use of alternative software. For RNA-seq-based differential gene expression analysis in particular, parameter tuning can strongly affect the final results, and this effect should be evaluated. However, the lack of gold-standard datasets with known expression patterns hampers robust evaluation of pipelines and parameter combinations. Objective: The presented workflow aims to identify the most accurate differential expression analysis pipeline among several alternatives. To this end, it implements an automatic procedure for constructing gold standards for simulation-based benchmarking. Methods: The workflow consists of four steps. It simulates read libraries with known expression values, enabling the construction of gold standards for benchmarking pipelines in terms of true and false positives. We validated the workflow with a case study of real RNA-seq libraries from radiata pine, a forest tree species with no publicly available reference genome. Results: The workflow is available as a freeware application (DEGoldS) consisting of sequential Bash and R scripts that can run on any UNIX platform. The workflow was able to construct a valid gold standard from real count data. Additionally, benchmarking showed that slight pipeline modifications produced remarkable differences in the outcome of differential expression analysis. Conclusion: The presented workflow solves the issues associated with robust gold-standard construction for benchmarking differential expression experiments and can accommodate a wide range of pipelines and parameter combinations.
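Benchmarking against a gold standard comes down to comparing each pipeline's differential expression calls with the genes simulated as truly differentially expressed. DEGoldS itself is written in Bash and R. The Python sketch below only illustrates the true/false positive bookkeeping. The helper `score_pipeline` and the `Confusion` class are hypothetical names, and the gene sets are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Confusion:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        # True positive rate: share of gold-standard DE genes the pipeline recovered.
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    @property
    def false_positive_rate(self) -> float:
        # Share of gold-standard non-DE genes wrongly called DE.
        return self.fp / (self.fp + self.tn) if self.fp + self.tn else 0.0


def score_pipeline(all_genes: Iterable[str], gold_de: Set[str], called_de: Set[str]) -> Confusion:
    """Compare one pipeline's DE calls with a simulated gold standard (hypothetical helper)."""
    universe = set(all_genes)
    gold = gold_de & universe
    called = called_de & universe
    tp = len(called & gold)
    fp = len(called - gold)
    fn = len(gold - called)
    tn = len(universe) - tp - fp - fn
    return Confusion(tp, fp, fn, tn)


genes = [f"g{i}" for i in range(1, 7)]
result = score_pipeline(genes, gold_de={"g1", "g2", "g3"}, called_de={"g1", "g2", "g4"})
print(result)  # Confusion(tp=2, fp=1, fn=1, tn=2)
```

Running the same scoring over every pipeline and parameter combination gives the comparable accuracy figures that the workflow uses to rank alternatives.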
CCRMDA: MiRNA-disease Association Prediction Based on Cascade Combination Recommendation Method on a Heterogeneous Network
Authors: Yuan-Lin Ma, Dong-Ling Yu, Ya-Fei Liu and Zu-Guo Yu
Background: MicroRNAs (miRNAs) are a class of short, endogenous, single-stranded non-coding RNAs, 21-25 nt in length. Many studies have shown that miRNAs are closely related to human diseases, and many algorithms based on network structure have been proposed to predict potential miRNA-disease associations. Methods: In this work, we develop CCRMDA, a cascade combination method based on network topology, to explore disease-related miRNAs. First, a hybrid recommendation algorithm produces a coarse recommendation. Then, the structural perturbation method refines it into a precise recommendation. A special perturbation set is constructed to predict new miRNA-disease associations in the miRNA-disease heterogeneous network. Results: To verify the effectiveness of CCRMDA, experiments are performed on the HMDD V2.0 and V3.2 datasets. On HMDD V2.0, CCRMDA is compared with several state-of-the-art algorithms based on network structure and shows the best performance. On HMDD V3.2, CCRMDA also performs well, with an average AUC of 0.953. In addition, case studies further support the effectiveness of CCRMDA. Conclusion: CCRMDA is a reliable method for predicting miRNA-disease associations.
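The AUC reported above can be read as the probability that a randomly chosen known association receives a higher score than a randomly chosen unobserved pair, with ties counted as one half. The sketch below implements that pairwise definition directly. The abstract does not say how the average is taken (for example, across cross-validation folds), so that step is not shown. The function name and scores are illustrative only.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for held-out known associations vs. unobserved miRNA-disease pairs.
print(auc([0.9, 0.3], [0.5, 0.1]))  # 0.75
```

The double loop is O(n·m) and is meant for clarity. For large candidate sets, a rank-based (Mann-Whitney) computation gives the same value more efficiently.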
Identification of Potential Biomarkers in Stomach Adenocarcinoma using Machine Learning Approaches
Background: Stomach adenocarcinoma (STAD) is a common cancer with poor clinical outcomes worldwide. Because early diagnostic markers are lacking, most patients are diagnosed at an advanced stage. Objective: This study aims to provide new insights into biomarkers for patients with STAD using bioinformatics. Methods: RNA-sequencing and other relevant data from patients with STAD in The Cancer Genome Atlas (TCGA) database were analyzed to identify differentially expressed genes (DEGs). Machine learning algorithms were then applied to predict biomarkers, and Kaplan-Meier analysis was used to detect prognostic biomarkers. Furthermore, Gene Ontology and Reactome pathways, protein-protein interactions (PPI), multiple sequence alignment, phylogenetic mapping, and correlations between clinical parameters were evaluated. Results: The analysis identified 61 DEGs. The key dysregulated genes associated with STAD are MTHFD1L (Methylenetetrahydrofolate dehydrogenase 1-like), ZWILCH (Zwilch Kinetochore Protein), RCC2 (Regulator of chromosome condensation 2), DPT (Dermatopontin), GCOM1 (GRINL1A complex locus 1), and CLEC3B (C-Type Lectin Domain Family 3 Member B). Moreover, survival analysis identified ASPA (Aspartoacylase) as a prognostic marker. Conclusion: Our study provides a proof of concept for the potential value of ASPA as a prognostic factor in STAD. Further functional investigations are required to explore the value of these emerging markers.
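Kaplan-Meier analysis estimates the survival curve as a product, over each distinct death time, of (1 - deaths / patients still at risk). Censored patients leave the risk set without counting as deaths. The sketch below shows only this estimator. Splitting patients into high- and low-ASPA-expression groups and testing the difference (for example, with a log-rank test) are not shown. The times and events are invented for illustration.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is True if patient i died (event observed), False if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
print([(t, round(s, 3)) for t, s in curve])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
```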
MaxDEL: Accurate and Efficient Calling of Genomic Deletions from Single Molecular Real-time Sequencing Using Integrated Method
Authors: Xinyu Yu, Yaoxian Lv, Lei Cai and Jingyang Gao
Background: Single-molecule real-time (SMRT) sequencing data are characterized by long read length and high read depth. Compared with next-generation sequencing (NGS), SMRT sequencing can reveal more structural variations (SVs) and has greater advantages in variant calling. However, SMRT sequencing data contain high sequencing error rates and noise, which reduce the accuracy of SV calling. Most existing tools cannot overcome these sequencing errors when detecting genomic deletions. Objective: In this investigation, we propose MaxDEL, a new method for calling deletions from SMRT sequencing data. Methods: First, MaxDEL uses a machine learning method to calibrate the deletion regions from the variant call format (VCF) file. Second, it applies a novel feature visualization method that converts variant features to images, and uses these images to call deletions accurately with a convolutional neural network (CNN). Results: MaxDEL achieves better accuracy and recall in variant calling than existing methods on both real and simulated data. Conclusion: MaxDEL effectively overcomes the noise in SMRT sequencing data by integrating new machine learning and deep learning techniques. The method captures the variant features of deletions and establishes a learning model between the images and gene data. In our experiments, MaxDEL outperformed NextSV, SVIM, Sniffles, Picky and SMRT-SV, especially in recall and F1-score.
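To compute precision, recall, and F1-score for deletion calls, each call must first be matched to a true deletion. The abstract does not state MaxDEL's matching criterion. The sketch below assumes a 50% reciprocal-overlap rule with greedy one-to-one matching, a convention often used in SV benchmarking. The interval tuples, function names, and coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """True if the overlap covers at least min_frac of both intervals (assumed criterion)."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return overlap >= min_frac * (a[2] - a[1]) and overlap >= min_frac * (b[2] - b[1])


def evaluate(calls: List[Interval], truth: List[Interval], min_frac: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching of calls to true deletions; returns (precision, recall, F1)."""
    matched_truth = set()
    tp_calls = 0
    for c in calls:
        hit = next(
            (j for j, t in enumerate(truth)
             if j not in matched_truth and reciprocal_overlap(c, t, min_frac)),
            None,
        )
        if hit is not None:
            matched_truth.add(hit)
            tp_calls += 1
    precision = tp_calls / len(calls) if calls else 0.0
    recall = len(matched_truth) / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


truth = [("chr1", 100, 200), ("chr1", 1000, 1500)]
calls = [("chr1", 110, 210), ("chr2", 100, 200), ("chr1", 5000, 5100)]
print(tuple(round(v, 3) for v in evaluate(calls, truth)))  # (0.333, 0.5, 0.4)
```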
Identification of Membrane Protein Types Using Hypergraph Neural Network
Authors: Weizhong Lu, Meiling Qian, Yu Zhang, Hongjie Wu, Yijie Ding, Jiawei Shen, Xiaoyi Chen, Haiou Li and Qiming Fu
Introduction: Membrane proteins, as one of the main components of biological membranes, play an important role in living organisms. Membrane protein classification and prediction is an important topic in membrane proteomics research, because protein function can be determined quickly once membrane protein types are discriminated. Methods: Most current methods for classifying membrane proteins are labor-intensive and resource-demanding. In this study, five methods were used to extract features for large-scale membrane protein prediction: Average Block (AvBlock), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), Histogram of Oriented Gradients (HOG), and Pseudo-PSSM (PsePSSM). The five resulting feature matrices were then combined, and the corresponding hypergraph association matrix was constructed. Finally, the feature matrices and hypergraph association matrices were integrated to identify membrane protein types using a hypergraph neural network (HGNN) model. Results: The proposed method was tested on four membrane protein benchmark datasets and achieved accuracies of 92.8%, 88.6%, 88.2%, and 99.0%, respectively. Conclusion: HGNN achieved better prediction performance than traditional machine learning classifiers such as Random Forest (RF) and Support Vector Machine (SVM).
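A hypergraph neural network propagates features through an incidence matrix H (proteins × hyperedges). The standard HGNN convolution is X' = ReLU(Dv^(-1/2) H W De^(-1) Hᵀ Dv^(-1/2) X Θ). The abstract does not say how the hypergraph association matrix is built. This sketch therefore assumes one common choice: each hyperedge groups a protein with its k nearest neighbours in feature space, and all hyperedge weights are 1 (W = I). All function names are hypothetical, and plain lists stand in for a tensor library.

```python
import math
from typing import List

Matrix = List[List[float]]


def knn_incidence(features: Matrix, k: int) -> Matrix:
    """Incidence matrix H (vertices x hyperedges): hyperedge j = vertex j plus its k nearest neighbours."""
    n = len(features)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        dists = sorted((math.dist(features[j], features[i]), i) for i in range(n) if i != j)
        H[j][j] = 1.0
        for _, i in dists[:k]:
            H[i][j] = 1.0
    return H


def hgnn_propagation(H: Matrix) -> Matrix:
    """G = Dv^-1/2 H W De^-1 H^T Dv^-1/2 with unit hyperedge weights (W = I)."""
    n, m = len(H), len(H[0])
    dv = [sum(H[v]) for v in range(n)]                       # vertex degrees
    de = [sum(H[v][e] for v in range(n)) for e in range(m)]  # hyperedge degrees
    G = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            shared = sum(H[a][e] * H[b][e] / de[e] for e in range(m))
            G[a][b] = shared / math.sqrt(dv[a] * dv[b])
    return G


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def hgnn_layer(G: Matrix, X: Matrix, theta: Matrix) -> Matrix:
    """One HGNN layer: ReLU(G X Theta), with theta standing in for learned weights."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(G, X), theta)]


# Hypothetical 2-D features for three proteins.
H = knn_incidence([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]], k=1)
G = hgnn_propagation(H)
print(hgnn_layer(G, [[1.0], [0.0], [-1.0]], [[1.0]]))
```

In the paper's setting, the rows of `features` would be the combined AvBlock/DCT/DWT/HOG/PsePSSM vectors, and Θ would be trained against the membrane protein type labels. Both points are assumptions about how the pieces fit together.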
An Iterative Model for Identifying Essential Proteins Based on the Whole Process Network of Protein Evolution
Authors: Zhen Zhang, Yaocan Zhu, Hongjing Pei, Xiangyi Wang and Lei Wang
Introduction: Essential proteins play important roles in cell growth and regulation. Traditional biological experiments for identifying essential proteins are costly and inefficient. With the development of high-throughput technologies and bioinformatics, a growing number of computational models have therefore been proposed in recent years to infer essential proteins from Protein-Protein Interaction (PPI) networks. Methods: In this manuscript, we proposed a novel prediction model named MWPNPE (Model based on the Whole Process Network of Protein Evolution). First, a whole process network of protein evolution was constructed from known PPI data and gene expression data downloaded from benchmark databases. Then, because protein interactions are a dynamic process, a new measure was designed to estimate the relationships between proteins. On this basis, an improved iterative algorithm was put forward to evaluate the importance of proteins. Results: To verify the predictive performance of MWPNPE, we compared it with state-of-the-art representative computational methods. Among the top 100, 200, and 300 candidate proteins ranked by MWPNPE, 89, 166, and 233, respectively, were correctly identified as essential, significantly more than the competing methods achieved. Conclusion: MWPNPE may therefore be a useful tool for essential protein recognition in the future.
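The figures 89, 166, and 233 count how many known essential proteins appear among a method's top 100, 200, and 300 ranked candidates. The sketch below shows that top-k counting, given any importance score per protein. The function name, the alphabetical tie-breaking rule, and the toy scores are assumptions for illustration. The sketch does not reproduce MWPNPE's iterative scoring.

```python
from typing import Dict, List, Set


def top_k_hits(scores: Dict[str, float], essential: Set[str], ks: List[int]) -> Dict[int, int]:
    """Number of known essential proteins among the top-k candidates, for each k."""
    # Rank by descending score; ties broken alphabetically (an assumption).
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return {k: sum(1 for p in ranked[:k] if p in essential) for k in ks}


toy_scores = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1}
print(top_k_hits(toy_scores, essential={"A", "C"}, ks=[1, 2, 3]))  # {1: 1, 2: 1, 3: 2}
```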