Current Bioinformatics - Volume 14, Issue 1, 2019
-
-
Brief Survey of Biological Network Alignment and a Variant with Incorporation of Functional Annotations
Authors: Fang Jing, Shao-Wu Zhang and Shihua Zhang
Background: Biological network alignment has been widely studied in bioinformatics in the context of protein-protein interaction (PPI) networks, metabolic networks and others. Existing methods generally rely on the topological structure of the networks and on genomic sequence to accomplish this task. Objective and Method: Here we briefly survey the methods commonly used for this task and introduce a variant that incorporates functional annotations based on similarity in Gene Ontology (GO). Making full use of GO information helps provide insights into precise biological network alignment. Results and Conclusion: We analyze the effect of incorporating GO information into network alignment. Finally, we give a brief summary and discuss future directions for this topic.
-
-
-
Clustering Count-based RNA Methylation Data Using a Nonparametric Generative Model
Authors: Lin Zhang, Yanling He, Huaizhi Wang, Hui Liu, Yufei Huang, Xuesong Wang and Jia Meng
Background: The RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data has unique features, such as low read coverage, which call for novel clustering approaches. Objective: Besides handling the low read coverage, clustering analysis of count-based RNA methylation sequencing data also needs to preserve the integer nature of the measurements. Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level using the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters, thereby avoiding the common model selection problem in clustering analysis. Results: When tested on a simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, two known regulatory components of the RNA m6A methyltransferase complex. Conclusion: Our proposed DPBBM method not only properly handles count-based measurements of RNA methylation data from sites with very low read coverage, but also learns an optimal number of clusters adaptively from the data analyzed. Availability: The source code and documentation of the DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
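To illustrate why a beta-binomial model suits raw read counts, the following minimal Python sketch scores a site with k methylated reads out of n total reads under candidate clusters and picks the most likely one. It is an illustration only and not the DPBBM implementation: the function names and the (alpha, beta) cluster parameters are hypothetical, the assignment is a hard maximum-likelihood choice, and the Dirichlet process prior and Gibbs sampler described in the abstract are not reproduced.

```python
from math import lgamma, exp


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def beta_binomial_logpmf(k, n, alpha, beta):
    """Log-probability of k methylated reads out of n total reads
    under a beta-binomial distribution with shape parameters alpha, beta."""
    return (log_choose(n, k)
            + log_beta(k + alpha, n - k + beta)
            - log_beta(alpha, beta))


def assign_cluster(k, n, clusters):
    """Index of the (alpha, beta) pair giving the highest likelihood.
    Hard assignment only; a sampler would draw from these scores instead."""
    scores = [beta_binomial_logpmf(k, n, a, b) for a, b in clusters]
    return max(range(len(clusters)), key=scores.__getitem__)


# Hypothetical clusters: one lowly methylated, one highly methylated.
clusters = [(2.0, 8.0), (8.0, 2.0)]
low_site = assign_cluster(1, 10, clusters)   # 1 of 10 reads methylated
high_site = assign_cluster(9, 10, clusters)  # 9 of 10 reads methylated
total_prob = sum(exp(beta_binomial_logpmf(k, 10, 2.0, 8.0)) for k in range(11))
```

Because the likelihood is defined on the integer pair (k, n), a site covered by only a few reads contributes a flatter, less decisive score than a deeply covered site with the same methylation ratio, which is the property a ratio-based method loses.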
-
-
-
A Computational Approach to Identify Novel Potential Precursor miRNAs and their Targets from Hepatocellular Carcinoma Cells
Background: Recent advances in next-generation sequencing technology allow high-throughput RNA sequencing (RNA-Seq) to be widely applied to profiling coding and non-coding RNA in cells. RNA-Seq data usually contain functional transcriptomic sequences as well as small and larger non-coding (nc) RNA sequences. Objective: MicroRNAs (miRNAs), a class of small ncRNAs, act as epigenetic markers and regulate the expression of target genes and pathways involved in hepatocellular carcinoma (HCC), a primary malignancy of the liver. Previously unreported, potentially novel miRNAs targeting HCC pathways can be identified from sequencing data. Methods: In this study, we performed a computational identification of novel putative miRNAs and their targets from publicly available high-throughput sequencing FASTQ data of the human HCC cell lines HepG2, NorHep and SKHep1, retrieved from NCBI-SRA. Results: In total, 572 unique known precursor miRNAs and 1062 unique novel miRNAs were identified from the HepG2, NorHep and SKHep1 HCC cell lines. Interestingly, 140 novel miRNAs were predicted to be extensively involved in targeting genes of HCC-related pathways such as apoptosis, cell signaling, cell division, cell-cycle arrest, GPCR, MAPK cascade, TOR signaling, TNFSF11 signaling and liver development. Conclusion: The predicted novel miRNAs reported in the paper might play a vital role in regulating the molecular mechanisms of HCC; further studies on these miRNAs may therefore provide significant clues to the complex biological processes of liver cancer.
-
-
-
In Silico Identification of Conserved MiRNAs from Physcomitrella patens ESTs and their Target Characterization
Authors: Behzad Hajieghrari, Naser Farrokhi, Bahram Goliaei and Kaveh Kavousi
Background: MicroRNAs (miRNAs) are small, non-protein-coding, endogenous single-stranded RNAs approximately 18-24 nucleotides in length. The high evolutionary sequence conservation of miRNAs among plant species and the availability of powerful computational tools allow the identification of new orthologs and paralogs. Methods: New conserved miRNAs in P. patens were found by EST-based homology search approaches. All candidates were screened according to a series of miRNA filtering criteria. The Unigene and DFCI Gene Index (PpspGI) databases and the psRNATarget algorithm were applied to identify target transcripts using the putative conserved P. patens miRNA sequences. Results: Nineteen conserved P. patens miRNAs were identified. The sequences were homologous to known reference plant mature miRNAs from 10 miRNA families and could be folded into typical miRNA secondary structures. The RepeatMasker algorithm showed that ppt-miR2919e and ppt-miR1533 contained simple sequence repeats. Target sites (49 genes) were identified for 7 of the 19 miRNAs. GO and KEGG analysis of the targets indicated that some are involved in multiple important biological and metabolic processes. Conclusion: The majority of the miRNAs registered in databases were predicted by computational approaches, while many more remain unknown. Because miRNAs are conserved across plant species, from closely to distantly related, homology search-based approaches between plant species could lead to the identification of novel miRNAs in other plant species, providing baseline information for further research on the biological functions and evolution of miRNAs.
-
-
-
A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis
Authors: Nishith Kumar, Md. A. Hoque, Md. Shahjaman, S.M. Shahinul Islam and Md. N. H. Mollah
Background: Metabolomics data generation and quantification differ from those of other types of molecular "omics" data in bioinformatics. Mass spectrometry (MS) based metabolomics data (gas chromatography mass spectrometry (GC-MS), liquid chromatography mass spectrometry (LC-MS), etc.) frequently contain missing values that complicate some quantitative analyses. Metabolomics datasets typically contain 10% to 20% missing values, which arise for analytical, computational and biological reasons. Imputation of missing values is therefore an important issue for downstream metabolomics data analysis. Objective: This paper introduces a new algorithm for missing value imputation in the presence of outliers for metabolomics data analysis. Method: Currently, the best-known missing value imputation techniques for metabolomics data are k-nearest neighbours (kNN), random forest (RF) and zero imputation. However, these techniques are sensitive to outliers. In this paper, we propose an outlier-robust missing value imputation technique that minimizes a two-way empirical mean absolute error (MAE) loss function to impute missing values in metabolomics data. Results: We investigated the performance of the proposed technique in comparison with the traditional imputation techniques using both simulated and real data, in the absence and presence of outliers. Conclusion: Results of both simulated and real data analyses show that the proposed outlier-robust imputation technique performs better than the traditional missing value imputation methods in both the absence and presence of outliers.
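The robustness argument behind an MAE loss can be seen in a small example: the value that minimizes mean absolute error over a set of observations is their median, which a single extreme value barely moves, whereas the mean (the minimizer of squared error) is dragged toward the outlier. The Python sketch below is a simplified one-way illustration under that assumption; it is not the authors' two-way algorithm, and the helper names and intensity values are made up for the example.

```python
from statistics import mean, median


def impute(values, estimator):
    """Replace None entries with estimator(observed values)."""
    observed = [v for v in values if v is not None]
    fill = estimator(observed)
    return [fill if v is None else v for v in values]


def mean_abs_error(values, center):
    """Average absolute deviation of values from a candidate fill value."""
    return sum(abs(v - center) for v in values) / len(values)


# Hypothetical metabolite intensities: one missing value, one outlier (500.0).
intensities = [10.0, 11.0, None, 9.0, 10.0, 500.0]
observed = [v for v in intensities if v is not None]

by_mean = impute(intensities, mean)      # fill value pulled up by the outlier
by_median = impute(intensities, median)  # fill value stays near the bulk

mae_at_mean = mean_abs_error(observed, mean(observed))
mae_at_median = mean_abs_error(observed, median(observed))
```

Here the mean-based fill lands at 108.0, far from every typical observation, while the median-based fill is 10.0, and the MAE evaluated at the median is the smaller of the two.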
-
-
-
Genome-wide Analysis of the Distribution of Riboswitches and Function Analyses of the Corresponding Downstream Genes in Prokaryotes
Authors: Xinfeng Li, Fang Chen, Jinfeng Xiao, Shan-Ho Chou, Xuming Li and Jin He
Background: Riboswitches are structured elements that usually reside in the noncoding regions of mRNAs, where they bind various ligands to control the expression of a wide variety of downstream genes. To date, more than twenty different classes of riboswitches have been characterized that sense various metabolites, including purines and their derivatives, coenzymes, amino acids and metal ions. Objective: This study aims to analyze the genome-wide distribution of riboswitches and the functions of the corresponding downstream genes in prokaryotes. Results: In this study, we completed a genome context analysis of 27 riboswitches across 3,079 completely sequenced prokaryotic genomes to elucidate the metabolic capacities of riboswitch-mediated gene regulation. Furthermore, Cluster of Orthologous Groups of proteins (COG) annotation was applied to predict and classify the possible functions of the downstream genes of these riboswitches. We found that they could all be successfully annotated and grouped into 20 different COG functional categories, of which the two main clusters, "coenzyme metabolism [H]" and "amino acid transport and metabolism [E]", were the most significantly enriched. Conclusion: Riboswitches were found to be widespread in bacteria, with the three main classes, TPP-, cobalamin- and SAM-riboswitches, being the most widely distributed. We found that a wide variety of functions were associated with the corresponding downstream genes, suggesting that these riboswitches mediate a wide range of regulatory roles in prokaryotes.
-
-
-
Identification of Bone Metastasis-associated Genes of Gastric Cancer by Genome-wide Transcriptional Profiling
Authors: Mingzhe Lin, Xin Li, Haizhou Guo, Faxiang Ji, Linhan Ye, Xuemei Ma and Wen Cheng
Background: Gastric cancer is one of the leading causes of cancer-related mortality worldwide. Genome-wide transcriptional profiling has provided valuable insights into the molecular basis underlying processes involved in gastric cancer initiation and progression. Objective: To understand the pathological and biological mechanisms of gastric cancer metastasis in a genome-wide context. Method: In this study, we constructed libraries from blood of gastric cancer patients with, and without, bone metastasis. High-throughput sequencing combined with differential expression analysis was used to investigate transcriptional changes. Results: We identified a total of 425 significantly differentially expressed genes. Protein-protein interaction network analysis suggested that most of these genes are involved in DNA replication, DNA damage response, collagen homeostasis and cell adhesion. Furthermore, our data suggested that NFkappaB and DNA damage response pathways were the key regulators of the bone metastasis associated with gastric cancer. Finally, most of these target genes were involved in pathways such as extracellular matrix organization and extracellular structure organization as revealed by gene set enrichment assay. Conclusion: Our study provides a comprehensive analysis of the transcriptional alterations involved in gastric cancer bone metastasis, which provides greater insights into the complexity of regulatory changes during tumorigenesis and offers novel diagnostic as well as therapeutic avenues.
-
-
-
Protein Stability Determination (PSD): A Tool for Proteomics Analysis
Authors: Anindya S. Panja, Akash Nag, Bidyut Bandopadhyay and Smarajit Maiti
Background: Protein Stability Determination (PSD) is a sequence-based bioinformatics tool that was developed using large datasets of protein sequences in FASTA format. PSD can be used to analyze meta-proteomics data, which helps to predict and design thermozymes and mesozymes for academic and industrial purposes. PSD can also analyze a protein sequence and predict whether it will be stable in a thermophilic or a mesophilic environment. Method and Results: The tool is written in Java, runs on any operating system and provides a user-friendly graphical interface. It is a simple program and can predict the thermostability of proteins with >90% accuracy. PSD can also predict the nature of the constituent amino acids, i.e. acidic or basic and polar or nonpolar. Conclusion: PSD can determine the thermostability status of hypothetical or unknown peptides as well as of meta-proteomics data from any established database. The uses of PSD-driven analyses include predictions for the functional assignment of a protein. PSD also helps in designing peptides with flexible combinations of amino acids for functional stability. PSD is freely available at https://sourceforge.net/projects/protein-sequence-determination.
-
-
-
Mining Gene Expression Profile with Missing Values: An Integration of Kernel PCA and Robust Singular Values Decomposition
Background: Gene expression profiling and transcriptomics provide valuable information about the role of genes that are differentially expressed between two or more samples. Analysing high-throughput DNA microarray data that contain missing values under various experimental conditions is both important and challenging. Objectives: Graphical visualizations of the expression of all genes in a particular cell provide holistic views of gene expression patterns, which improve our understanding of cellular systems under normal and pathological conditions. However, current visualization methods are sensitive to missing values, which are frequently observed in microarray-based gene expression profiling and can affect the subsequent statistical analyses. Methods: In this study, we addressed the problem of missing values by comparing different imputation methods using the gene expression biplot (GE biplot), one of the most popular gene visualization techniques. The effects of missing values on mining differentially expressed genes in gene expression data were evaluated using four well-known imputation methods: Robust Singular Value Decomposition (Robust SVD), Column Average (CA), Column Median (CM) and K-nearest Neighbors (KNN). The Frobenius norm and absolute distances were used to measure the accuracy of the methods. Results: Three numerical experiments were performed using (i) simulated data, (ii) publicly available colon cancer data and (iii) leukemia data to analyze the performance of each method. The results showed that CM and KNN performed better than Robust SVD and CA at identifying the index gene profile in the biplot visualization, both in the simulation study and in the colon cancer and leukemia microarray datasets. Conclusion: The impact of missing values on the GE biplot was smaller when the data matrix was imputed by KNN than by CM. This study concluded that KNN performed satisfactorily in generating a GE biplot in the presence of missing values in microarray data.
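As a rough illustration of how such a comparison can be scored, the Python sketch below applies Column Median (CM) imputation to a small matrix with missing entries and measures its distance from the complete matrix with the Frobenius norm. It is a minimal sketch under stated assumptions, not the study's pipeline: the function names and the toy matrix are hypothetical, missing entries are represented as None, and a column with no observed values would raise an error.

```python
from math import sqrt
from statistics import median


def column_median_impute(matrix):
    """Fill each None with the median of the observed values in its column."""
    n_cols = len(matrix[0])
    fills = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        fills.append(median(observed))  # raises if a column is fully missing
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in matrix]


def frobenius_distance(a, b):
    """Frobenius norm of the element-wise difference between two matrices."""
    return sqrt(sum((x - y) ** 2
                    for row_a, row_b in zip(a, b)
                    for x, y in zip(row_a, row_b)))


# Toy expression matrix (rows and columns are arbitrary for the example).
complete = [[1.0, 2.0],
            [3.0, 4.0],
            [5.0, 9.0]]
with_missing = [[1.0, 2.0],
                [None, 4.0],
                [5.0, None]]

imputed = column_median_impute(with_missing)
error = frobenius_distance(complete, imputed)
```

A smaller Frobenius distance means the imputed matrix is closer to the complete one; running the same scoring with a different imputation function in place of `column_median_impute` gives a like-for-like comparison between methods.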
