Volume 18, Issue 4

Current Bioinformatics - Volume 18, Issue 4, 2023

- scTSSR-D: Gene Expression Recovery by Two-side Self-Representation and Dropout Information for scRNA-seq Data
  
  Authors: Meng Liu, Wenhao Chen, Jianping Zhao, Chunhou Zheng and Feilong Guo
  
  https://doi.org/10.2174/1574893618666230217085543
  
  Background: Single-cell RNA sequencing is an advanced technology that makes it possible to unravel cellular heterogeneity and conduct single-cell analysis of gene expression. However, owing to technical defects, many dropout events occur during sequencing, which adversely affect downstream analysis. Methods: To address the dropout events in single-cell RNA sequencing data, we propose an imputation method, scTSSR-D, which recovers gene expression by two-side self-representation and dropout information. scTSSR-D is the first global method that incorporates a partial imputation method to impute dropout values. In other words, we make full use of genes, cells, and dropout information when recovering the gene expression. Results: The results show that scTSSR-D outperforms other existing methods in the following experiments: capturing the Gini coefficient and gene-to-gene correlations observed in single-molecule RNA fluorescence in situ hybridization, down-sampling experiments, differential expression analysis, and the accuracy of cell clustering. Conclusion: scTSSR-D is a more stable and reliable method to recover gene expression. Moreover, its improvement over existing methods is even more pronounced on large datasets.
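
The Gini coefficient mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `gini` and the toy expression vectors are assumptions made for illustration, and the formula is the standard rank-weighted form for non-negative values.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means expression is spread evenly across cells; values near 1 mean
    expression is concentrated in a few cells.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where i is the 1-based rank of x_i in ascending order.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical expression of one gene across four cells.
even_expression = [1, 1, 1, 1]      # every cell expresses the gene equally
sparse_expression = [0, 0, 0, 4]    # zeros everywhere except one cell

print(gini(even_expression))
print(gini(sparse_expression))
```

Excess zeros push a gene's Gini coefficient upward, which is presumably why agreement with an independent measurement is a useful check on an imputation method.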
  

- DEGoldS: A Workflow to Assess the Accuracy of Differential Expression Analysis Pipelines through Gold-standard Construction
  
  Authors: Mikel Hurtado, Fernando Mora-Márquez, Álvaro Soto, Daniel Marino, Pablo G. Goicoechea and Unai López de Heredia
  
  https://doi.org/10.2174/1574893618666230222122054
  
  Background: Non-model species lacking public genomic resources have an extra handicap in bioinformatics that could be mitigated by parameter tuning and the use of alternative software. Indeed, for RNA-seq-based differential gene expression analysis, parameter tuning could have a strong impact on the final results, and this impact should be evaluated. However, the lack of gold-standard datasets with known expression patterns hampers robust evaluation of pipelines and parameter combinations. Objective: The aim of the presented workflow is to identify the most accurate differential expression analysis pipeline among several alternatives. To achieve this objective, an automatic procedure of gold-standard construction for simulation-based benchmarking is implemented. Methods: The workflow, which is divided into four steps, simulates read libraries with known expression values to enable the construction of gold-standards for benchmarking pipelines in terms of true and false positives. We validated the workflow with a case study consisting of real RNA-seq libraries of radiata pine, a forest tree species with no publicly available reference genome. Results: The workflow is available as a freeware application (DEGoldS) consisting of sequential Bash and R scripts that can run on any UNIX OS platform. The presented workflow proved to be able to construct a valid gold-standard from real count data. Additionally, benchmarking showed that slight pipeline modifications produced remarkable differences in the outcome of differential expression analysis. Conclusion: The presented workflow solves the issues associated with robust gold-standard construction for benchmarking in differential expression experiments and can accommodate a wide range of pipelines and parameter combinations.
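
Benchmarking "in terms of true and false positives", as described in the abstract above, can be sketched as a set comparison between a gold standard and each pipeline's calls. This is a hedged illustration and not part of DEGoldS, which the abstract describes as Bash and R scripts. The function `benchmark_calls`, the gene identifiers, and the pipeline names are all hypothetical.

```python
def benchmark_calls(gold_de_genes, called_de_genes):
    """Compare one pipeline's differentially expressed (DE) calls to a gold standard."""
    gold = set(gold_de_genes)
    called = set(called_de_genes)
    tp = len(gold & called)   # true positives: called and truly DE
    fp = len(called - gold)   # false positives: called but not DE
    fn = len(gold - called)   # false negatives: DE but missed
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if gold else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Hypothetical gold standard: genes simulated with a known expression change.
gold = {"g1", "g2", "g3", "g4"}

# Hypothetical DE calls from two pipeline/parameter combinations.
pipelines = {
    "pipeline_a": ["g1", "g2", "g5"],
    "pipeline_b": ["g1", "g2", "g3", "g6", "g7"],
}

results = {name: benchmark_calls(gold, calls) for name, calls in pipelines.items()}
for name, metrics in results.items():
    print(name, metrics)
```

In this toy case neither pipeline dominates: one has fewer false positives, the other recovers more true positives, so the choice depends on which error matters more.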
  

- CCRMDA: MiRNA-disease Association Prediction Based on Cascade Combination Recommendation Method on a Heterogeneous Network
  
  Authors: Yuan-Lin Ma, Dong-Ling Yu, Ya-Fei Liu and Zu-Guo Yu
  
  https://doi.org/10.2174/1574893618666230222124311
  
  Background: MicroRNAs (miRNAs) are a class of short, endogenous, single-stranded non-coding RNAs with a length of 21-25 nt. Many studies have shown that miRNAs are closely related to human diseases. Many algorithms based on network structure have been proposed to predict potential miRNA-disease associations. Methods: In this work, a cascade combination method based on network topology is developed to explore disease-related miRNAs. We name our method CCRMDA. First, the hybrid recommendation algorithm is used for a rough recommendation, and then the structural perturbation method is used for a precise recommendation. A special perturbation set is constructed to predict new miRNA-disease associations in the miRNA-disease heterogeneous network. Results: To verify the effectiveness of CCRMDA, experimental analysis is performed on the HMDD V2.0 and V3.2 datasets. On the HMDD V2.0 dataset, CCRMDA is compared with several state-of-the-art algorithms based on network structure, and the results show that CCRMDA has the best performance. The CCRMDA method also achieves excellent performance on the HMDD V3.2 dataset, with an average AUC of 0.953. In addition, case studies further support the effectiveness of CCRMDA. Conclusion: CCRMDA is a reliable method for predicting miRNA-disease associations.
  

- Identification of Potential Biomarkers in Stomach Adenocarcinoma using Machine Learning Approaches
  
  Authors: Elham Nazari, Ghazaleh Pourali, Majid Khazaei, Alireza Asadnia, Mohammad Dashtiahangar, Reza Mohit, Mina Maftooh, Mohammadreza Nassiri, Seyed Mahdi Hassanian, Majid Ghayour-Mobarhan, Gordon A. Ferns, Soodabeh Shahidsales and Amir Avan
  
  https://doi.org/10.2174/1574893618666230227103427
  
  Background: Stomach adenocarcinoma (STAD) is a common cancer with poor clinical outcomes globally. Due to a lack of early diagnostic markers of disease, the majority of patients are diagnosed at an advanced stage. Objective: The aim of the present study is to provide some new insights into the available biomarkers for patients with STAD using bioinformatics. Methods: RNA-Sequencing and other relevant data of patients with STAD from The Cancer Genome Atlas (TCGA) database were evaluated to identify differentially expressed genes (DEGs). Then, machine learning algorithms were applied to predict biomarkers. Additionally, Kaplan-Meier analysis was used to detect prognostic biomarkers. Furthermore, the Gene Ontology and Reactome pathways, protein-protein interactions (PPI), multiple sequence alignment, phylogenetic mapping, and correlation between clinical parameters were evaluated. Results: The results showed 61 DEGs, and the key dysregulated genes associated with STAD are MTHFD1L (Methylenetetrahydrofolate dehydrogenase 1-like), ZWILCH (Zwilch Kinetochore Protein), RCC2 (Regulator of chromosome condensation 2), DPT (Dermatopontin), GCOM1 (GRINL1A complex locus 1), and CLEC3B (C-Type Lectin Domain Family 3 Member B). Moreover, the survival analysis identified ASPA (Aspartoacylase) as a prognostic marker. Conclusion: Our study provides a proof of concept of the potential value of ASPA as a prognostic factor in STAD; further functional investigations are required to explore the value of emerging markers.
  

- MaxDEL: Accurate and Efficient Calling of Genomic Deletions from Single Molecular Real-time Sequencing Using Integrated Method
  
  Authors: Xinyu Yu, Yaoxian Lv, Lei Cai and Jingyang Gao
  
  https://doi.org/10.2174/1574893618666230224160716
  
  Background: Single-molecule real-time (SMRT) sequencing data are characterized by long read length and high read depth. Compared to next-generation sequencing (NGS), SMRT sequencing data can reveal more structural variations (SVs) and have greater advantages in variant calling. However, SMRT sequencing data contain high levels of sequencing error and noise, which cause inaccuracy in calling SVs. Most existing tools cannot overcome sequencing errors and detect genomic deletions. Objective: In this investigation, we propose a new method, called MaxDEL, for calling deletions from SMRT sequencing data. Methods: Firstly, MaxDEL uses a machine learning method to calibrate the deletion regions from the variant call format (VCF) file. Secondly, it develops a novel feature visualization method to convert the variant features to images and uses these images to accurately call the deletions based on a convolutional neural network (CNN). Results: The results show that MaxDEL performs better in terms of accuracy and recall for calling variants when compared to existing methods on both real and simulated data. Conclusion: MaxDEL can effectively overcome the noise in SMRT sequencing data and integrates new machine learning and deep learning technologies. The method can capture the variant features of the deletions and establish the learning model between images and gene data. In our experiment, the MaxDEL method is superior to NextSV, SVIM, Sniffles, Picky and SMRT-SV, especially in recall and F1-score.
  

- Identification of Membrane Protein Types Based Using Hypergraph Neural Network
  
  Authors: Weizhong Lu, Meiling Qian, Yu Zhang, Hongjie Wu, Yijie Ding, Jiawei Shen, Xiaoyi Chen, Haiou Li and Qiming Fu
  
  https://doi.org/10.2174/1574893618666230224143726
  
  Introduction: Membrane proteins play an important role in living organisms as one of the main components of biological membranes. Membrane protein classification and prediction is an important topic of membrane proteomics research because the function of proteins can be quickly determined if membrane protein types can be discriminated. Methods: Most current methods to classify membrane proteins are labor-intensive and require a lot of resources. In this study, five methods, Average Block (AvBlock), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), Histogram of Oriented Gradients (HOG), and Pseudo-PSSM (PsePSSM), were used to extract features in order to predict membrane proteins on a large scale. Then, we combined the five obtained feature matrices and constructed the corresponding hypergraph association matrix. Finally, the feature matrices and hypergraph association matrices were integrated to identify the types of membrane proteins using a hypergraph neural network model (HGNN). Results: The proposed method was tested on four membrane protein benchmark datasets to evaluate its performance. The results showed accuracies of 92.8%, 88.6%, 88.2%, and 99.0% on the four datasets, respectively. Conclusion: Compared to traditional machine learning classifiers, such as Random Forest (RF) and Support Vector Machine (SVM), HGNN was found to have better prediction performance.
  

- An Iterative Model for Identifying Essential Proteins Based on the Whole Process Network of Protein Evolution
  
  Authors: Zhen Zhang, Yaocan Zhu, Hongjing Pei, Xiangyi Wang and Lei Wang
  
  https://doi.org/10.2174/1574893618666230315154807
  
  Introduction: Essential proteins play important roles in cell growth and regulation. Because traditional biological experiments for identifying essential proteins are costly and inefficient, and with the development of high-throughput technologies and bioinformatics, more and more computational models have been proposed in recent years to infer key proteins based on Protein-Protein Interaction (PPI) networks. Methods: In this manuscript, a novel prediction model named MWPNPE (Model based on the Whole Process Network of Protein Evolution) is proposed. First, a whole process network of protein evolution is constructed based on known PPI data and gene expression data downloaded from benchmark databases. Then, considering that the interaction between proteins is a dynamic process, a new measure is designed to estimate the relationships between proteins, on the basis of which an improved iterative algorithm is put forward to evaluate the importance of proteins. Results: To verify the predictive performance of MWPNPE, we compared it with state-of-the-art representative computational methods. Experimental results demonstrated that the recognition accuracy of MWPNPE in the top 100, 200, and 300 candidate key proteins can reach 89, 166, and 233, respectively, which is significantly better than the predictive accuracies achieved by these competing methods. Conclusion: MWPNPE may be a useful tool for key protein recognition in the future.
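
The top-100/200/300 figures in the abstract above read most naturally as counts of known essential proteins among the highest-ranked candidates, though that reading is an assumption here. A minimal sketch of such a top-k evaluation follows. The function `top_k_hits`, the protein identifiers, and the scores are hypothetical and do not reproduce MWPNPE's scoring.

```python
def top_k_hits(scores, essential, ks=(100, 200, 300)):
    """Count known essential proteins among the k highest-scoring candidates.

    scores: dict mapping protein id -> predicted importance score
    essential: set of protein ids known to be essential
    """
    # Rank proteins from highest to lowest predicted importance.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {k: sum(1 for protein in ranked[:k] if protein in essential) for k in ks}


# Toy example with hypothetical proteins and scores.
toy_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.4, "p4": 0.1}
toy_essential = {"p1", "p3"}

print(top_k_hits(toy_scores, toy_essential, ks=(1, 2, 3)))
```

Reporting raw hit counts at several cutoffs, rather than one threshold-dependent accuracy, makes ranking methods comparable without fixing a decision boundary.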
  

Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan