Current Bioinformatics - Volume 11, Issue 2, 2016

- Meet Our Editorial Board Member:
  
  By Ren Zhang
  
  https://doi.org/10.2174/157489361102160401164311

- Editorial (Thematic Issue: Nonlinear Science and Network Methods for Prediction Problems in Bioinformatics and Systems Biology)
  
  Authors: Jian-Xin Wang, Min Li and Zu-Guo Yu
  
  https://doi.org/10.2174/157489361102160401165256

- Improved Prediction of DNA-Binding Proteins Using Chaos Game Representation and Random Forest
  
  Authors: Xiaohui Niu and Xuehai Hu
  
  https://doi.org/10.2174/1574893611666160223213853
  
  DNA-binding proteins (DNA-BPs) play an important role in many biological processes. Next-generation sequencing technologies are now widely used to obtain the genomes of many organisms, so accurate and rapid identification of DNA-BPs will significantly help genome annotation. Chaos game representation (CGR) can reveal information hidden in protein sequences, and the fractal dimension is a key index for measuring the compactness of complex, irregular geometric objects. In this study, to extract features intrinsically correlated with the DNA-binding property from protein sequences, the CGR algorithm and fractal dimension, together with amino acid composition, are applied to formulate the protein samples. We employ a random forest classifier to predict DNA-BPs from these sequence-derived features (amino acid composition and fractal dimension). The resulting predictor is compared with three important existing methods, DNA-Prot, iDNA-Prot and DNAbinder, on the same datasets. On the two benchmark datasets from DNA-Prot and iDNA-Prot, the average accuracies (ACC) reach 82.07% and 84.91%, respectively, and the average Matthews correlation coefficients (MCC) reach 0.6085 and 0.6981, respectively. Point-to-point comparisons demonstrate that our fractal approach yields some improvement.

- Analysis of Differential Gene Expression Based on Bayesian Estimation of Variance
  
  Authors: Jiyuan An, John Lai, Lingzao Zeng and Colleen C. Nelson
  
  https://doi.org/10.2174/1574893611666160125221655
  
  Gene expression is arguably the most important indicator of biological function. Thus, identifying differentially expressed genes is one of the main aims of high-throughput studies that use microarray and RNA-seq platforms to study deregulated cellular pathways. There are many tools for analysing differential gene expression from transcriptomic datasets. The major challenge is estimating gene expression variance, given the high level of ‘background noise’ generated by the experimental equipment and the lack of biological replicates. Bayesian inference has been widely used in bioinformatics. In this work, we show that the prior knowledge employed in the Bayesian framework also helps to improve the accuracy of differential gene expression analysis when only a small number of replicates is available. We have developed a differential analysis tool that uses Bayesian estimation of gene expression variance for use with small numbers of biological replicates. Our method gives more consistent results than the widely used Cyber-T tool, which successfully introduced the Bayesian framework to differential analysis. We also provide a user-friendly web-based graphical user interface for biologists to use with microarray and RNA-seq data. By using pseudo-replicates as prior knowledge, Bayesian inference can compensate for the instability of variance estimates obtained from a small number of biological replicates. We also show that our new strategy for selecting pseudo-replicates improves the performance of the analysis.

- Enhanced Prediction of Small Non-coding RNA in Bacterial Genomes Based on Improved Inter-Nucleotide Distances of Genomes
  
  Authors: Li-Qian Zhou, Rui Li and Liu Hu
  
  https://doi.org/10.2174/1574893611666160223201114
  
  Small non-coding RNA (ncRNA) genes have become an important field of study in the life sciences in recent years, as they play important regulatory roles in cellular processes. However, predicting non-coding RNA genes is a great challenge because non-coding RNAs are small, are not translated into proteins and show variable stability. In this paper, we propose an improved inter-nucleotide distance model for deriving sequence features and combine it with a support vector machine (SVM) to predict small non-coding RNAs in bacterial genomes. The prediction result on the mixed bacterial ncRNA dataset is 95.38%, which shows that our method can effectively predict bacterial ncRNAs.

- Protein Folding Kinetic Order Prediction from Amino Acid Sequence Based on Horizontal Visibility Network
  
  Authors: Zhi-Qin Zhao, Zu-Guo Yu, Vo Anh, Jing-Yang Wu and Guo-Sheng Han
  
  https://doi.org/10.2174/1574893611666160125221326
  
  Protein folding is one of the most important problems in molecular biology, and the kinetic order of protein folding is one of the main aspects of the folding process. Previous methods for predicting protein folding kinetic order require information on the tertiary or predicted secondary structure of a protein. In this paper, based on the physicochemical properties of amino acids, we propose an approach that predicts protein folding kinetic order from the primary structure of a protein using a support vector machine (SVM) combined with principal component analysis (PCA). The horizontal visibility network, Hilbert-Huang transform, global descriptor and Lempel-Ziv complexity are used to extract features. To evaluate our approach, leave-one-out cross-validation is employed on two widely used data sets (“IvankovData” and “ZhengData”) consisting of two-state and multi-state proteins. The overall prediction accuracies reach 83.87% on the “IvankovData” data set and 85% on the “ZhengData” data set. Comparisons with existing methods show that the present approach performs better on the “IvankovData” data set. These results indicate that the present approach is effective and valuable for predicting protein folding kinetic order. Based on factor analysis, we find that protein sequence length and the hydrophobicity and hydrophilicity of amino acids are important features in our approach.

- Global Propagation Method for Predicting Protein Function by Integrating Multiple Data Sources
  
  Authors: Jun Meng, Xin Zhang and Yushi Luan
  
  https://doi.org/10.2174/1574893611666160125221828
  
  Protein function prediction is one of the most important tasks in bioinformatics. High-throughput experiments have generated large-scale genomic and proteomic data, and annotating proteins accurately requires integrating these heterogeneous data sources. In this paper, we propose a multi-source protein global propagation (MS-PGP) algorithm, which integrates multiple data sources and combines them with the protein global propagation with label correlation (PGP) algorithm to predict functions for unannotated proteins. Specifically, we use three data sources to predict protein functions: sequence data, microarray gene expression data and protein-protein interaction data. A naïve Bayes-style method is adopted to fuse the three data sources into a combined network. Gene Ontology biological process annotations are used to calculate the association scores between unannotated proteins and functions. Experimental results on yeast show that the proposed method achieves higher accuracy than other multiple-network methods and efficiently predicts the functions of unannotated proteins.

- Prioritizing Disease Genes by Using Search Engine Algorithm
  
  Authors: Min Li, Ruiqing Zheng, Qi Li, Jianxin Wang, Fang-Xiang Wu and Zhuohua Zhang
  
  https://doi.org/10.2174/1574893611666160125220905
  
  Identifying disease genes from a large number of candidates for a specific disease is a fundamental challenge. As experiment-based methods are generally time-consuming and laborious, computational approaches have become a new strategy for identifying disease candidates. In this paper, we propose an algorithm based on a search engine ranking method, named PDGTR, to prioritize disease candidates. First, we constructed a weighted human disease network by calculating the topological similarity and phenotype similarity of each pair of diseases. Then, we calculated the similarities among all genes using the protein-protein interaction network and the edge clustering coefficient. For a specific disease, a logistic regression model was used to generate prior knowledge for each gene. Finally, the search engine ranking based algorithm PDGTR was applied to prioritize the disease candidates. PDGTR was tested on five typical diseases (breast cancer, colorectal cancer, hepatocellular carcinoma, gastric cancer and osteoporosis) and compared with four state-of-the-art algorithms: RWR, DADA, PRINCE and PRP. Experimental results based on leave-one-out cross-validation, precision, ROC curves and enrichment show that PDGTR outperforms RWR, DADA, PRINCE and PRP. Moreover, some of the potential disease genes predicted by PDGTR have already been reported in the literature.

- Network Propagation Reveals Novel Features Predicting Drug Response of Cancer Cell Lines
  
  Authors: Jiguang Wang, Judith Kribelbauer and Raul Rabadan
  
  https://doi.org/10.2174/1574893611666160125222144
  
  Translating data derived from cancer genomes into personalized cancer therapy is a holy grail of computational biology. An important, yet challenging, question in this undertaking is how to relate features of tumor cells to the clinical outcomes of anticancer drugs. Recent progress in large pharmacogenomic studies has provided a wealth of data about cancer cell lines, indicating that many genetic and gene expression candidates might predict the drug response of cancer cells. Unfortunately, most of the predicted features are inconsistent with current clinical knowledge and lack the mutual dependencies that could explain their molecular mode of action. To address this problem, we have developed a new method, named dNetFS, that prioritizes genetic and gene expression features of cancer cell lines that predict drug response. It does so by integrating genomic/pharmaceutical data, the protein-protein interaction network, and prior knowledge of drug-target interactions using network propagation techniques. Compared with previous methods, dNetFS is more accurate in cross-validation analysis, and it can reveal the key pathways involved in drug response. It therefore provides a basis for identifying the underlying molecular mechanism of a given compound in different genomic backgrounds.

- Applications of Random Walk Model on Biological Networks
  
  Authors: Wei Peng, Jianxin Wang, Zhen Zhang and Fang-Xiang Wu
  
  https://doi.org/10.2174/1574893611666160223200823
  
  Biological networks play a significant role in addressing biological problems. The random walk model is a highly efficient way to study networks and has been widely used to solve network-based biological problems. In this work, these problems are classified into four categories: ranking nodes in biological networks, measuring similarity or distance between nodes in biological networks, detecting modules in biological networks, and finding interrelationships between nodes from different biological networks. We then survey recent advances in applying random walk models to each of these types of problems on biological networks.

- A Markov Clustering Based Link Clustering Method to Identify Overlapping Modules in Protein-Protein Interaction Networks
  
  Authors: Yan Wang, Guishen Wang, Di Meng, Lan Huang, Enrico Blanzieri and Juan Cui
  
  https://doi.org/10.2174/1574893611666160125222017
  
  Previous studies have indicated that many overlapping structures exist among the modular structures in protein-protein interaction (PPI) networks, which may reflect common functional components shared by different biological processes. In this paper, a Markov clustering based Link Clustering (MLC) method for identifying overlapping modular structures in PPI networks is proposed. First, the MLC method calculates the extended link similarity and derives a similarity matrix to represent the relevance among protein interactions. It then employs Markov clustering to partition the link similarity matrix and obtains overlapping network modules with significantly fewer parameters and threshold constraints than most current methods. Experiments on two networks with known reference classes and on two biological PPI networks, of Escherichia coli and Saccharomyces cerevisiae, show that MLC outperforms the original Link Clustering method and the classical Clique Percolation Method in accurately identifying the core modules in each test network. We therefore consider MLC highly promising for identifying important pathways by studying the interplay between functional processes in different organisms.

- Detecting Non-Trivial Protein Structure Relationships
  
  By Aleksandar Poleksic
  
  https://doi.org/10.2174/1574893610666150624171116
  
  Automated methods for protein three-dimensional structure comparison play an important role in understanding protein function, evolution and biochemical reaction mechanisms. Since the tertiary structure of proteins is more conserved than their amino acid sequences, accurately aligning three-dimensional structures makes it possible to detect homology between proteins in the “twilight zone”, those sharing less than ~25% sequence identity. Unfortunately, existing methods for protein structure comparison are often unable to properly compare and align proteins related by complex structural modifications, such as circular permutations, large conformational changes and large residue insertions and deletions. In this paper, we present an algorithm capable of computing biologically meaningful alignments from structurally homologous but spatially distant fragments. Accurate alignments of proteins that have undergone large conformational variations are derived from multiple spatial superpositions. For mild to moderate conformational variations, approximate rigid-body superpositions are recursively relaxed to allow matching of spatially distant regions. The algorithm incorporates an exact procedure for computing alignments of proteins related by circular permutations. We used two benchmark datasets to demonstrate that our algorithm compares favorably with some of the most accurate methods available today. On the most difficult test set, RIPC, the median accuracy of our method is 100%. The algorithm is freely available as a Web service at http://bioinfo.cs.uni.edu.

- Reconstruction, Topological and Gene Ontology Enrichment Analysis of Cancerous Gene Regulatory Network Modules
  
  By Khalid Raza
  
  https://doi.org/10.2174/1574893611666160115212806
  
  The availability of large sets of high-throughput biological data calls for algorithms that automatically reconstruct gene regulatory networks from these datasets. When analyzed critically, cancerous regulatory network modules may reveal the underlying mechanisms of cancer, which may help improve diagnosis. Identifying cancerous genes and their regulation is an important research area in cancer systems biology. In this paper, we introduce an algorithm to infer cancerous gene regulatory network modules from gene expression profiles. The proposed algorithm has been applied to a gene expression dataset of colon cancer patients, and several network modules have been identified. We performed topological analysis of the inferred network modules in terms of network density, degree distribution, clustering coefficient, average path length, network heterogeneity and centrality measures. We further performed GO-based enrichment analysis of the inferred network. To validate the proposed algorithm, we tested it on a benchmark dataset from the DREAM3 challenge.

- ORFpred: A Machine Learning Program to Identify Translatable Small Open Reading Frames in Intergenic Regions of the Plasmodium falciparum Genome
  
  Authors: Vivek Srinivas, Mayank Kumar, Santosh Noronha and Swati Patankar
  
  https://doi.org/10.2174/1574893611666160122221757
  
  Motivation: Small Open Reading Frames (smORFs) are involved in a variety of cellular processes ranging from metabolism to gene regulation, and eukaryotic genomes have been predicted to contain a large number of smORFs. Only a meager 174 smORFs have been annotated in the genome of the human malaria parasite Plasmodium falciparum. Although millions of smORFs can be extracted from the parasite genome, identifying translatable smORFs in the P. falciparum genome is challenging because existing smORF predictors have low accuracy when applied to an AT-biased genome. Result: We developed ORFpred, a machine learning algorithm that calculates the probability of translation initiation and elongation of ORFs in the P. falciparum genome. ORFpred identified 2204 translatable smORFs and showed higher accuracy than available predictors. We believe that ORFpred will help identify probable protein-coding smORFs in other eukaryotic genomes. Availability and Implementation: The database used for training and testing the algorithm and the source code are freely available at http://www.bio.iitb.ac.in/~patankar/software/ORFpred.

- Ubipredictor: A New Tool for Species-Specific Prediction of Ubiquitination Sites Using Linear Discriminant Analysis
  
  Authors: Muhammad Saeed, Wajya Ajmal, Anum Masood, M. Rizwan Riaz and Malik Nadeem Akhtar
  
  https://doi.org/10.2174/1574893611666160122221505
  
  Ubiquitination is involved in various cellular processes such as protein degradation and stability, cell cycle progression, transcriptional regulation, antigen processing, DNA repair, inflammation and regulation of apoptosis. In silico prediction of candidate lysine (K) residues for ubiquitination not only saves time and money but also generates valuable data for further scientific research. We developed the Ubipredictor tool (http://chemdp.com/ubipredictor.php) to predict potential ubiquitinated lysines in human, mouse and yeast protein sequences using linear discriminant analysis (LDA). The statistically significant features selected through LDA were amino acid dimers, the position specific score matrix (PSSM) and physicochemical properties of amino acids such as electrostatic charge, heat capacity, codon diversity and secondary structure. Testing on datasets from three model organisms (human, mouse, yeast) showed that the predictive performance of Ubipredictor was better than that of two existing tools. On the human and mouse datasets, Ubipredictor was more sensitive than Ubipred and Ubpred. Unlike previously designed tools, Ubipredictor was trained specifically on experimentally verified ubiquitination datasets for each species (human, mouse and yeast).

- Understanding Effects of Psychological Stress on Physiology and Disease Through Human Stressome - An Integral Algorithm
  
  Authors: Sushri Priyadarshini and Palok Aich
  
  https://doi.org/10.2174/157489361102160401163021
  
  Psychological stress perturbs normal physiological function or homeostasis, and restoring normalcy demands additional energy. A physiological mechanism acting through the activated stress response system is aimed at providing quick energy to deal with such emergencies. If the stress response system remains activated for a longer period, maintaining physiological homeostasis becomes difficult because of the higher energy demand. This eventually leads to increased susceptibility to infection or disease. Although there are reports associating psychological stress with physiological functions and diseases, a clear understanding of the mechanism of stress manifestation is yet to be established. To facilitate extensive exploration and prediction of possible mechanisms, molecular (gene-level) data on psychological stress, physiological processes and stress-associated diseases need to be integrated. We report the power of text mining, in combination with our data-integration methods and mathematical formulation, for developing integrated gene-association networks. These networks can be analyzed to gain holistic insights into the relationship between psychological stress-associated genes (the stressome) and related physiological functions and diseases. We built human psychostressome networks to understand and predict pathways and candidate genes responsible for perturbing the balance among various physiological functions and for disease manifestation. Using this methodology, we were able to predict the involvement of serotonin receptors and uridine 5'-diphospho-glucuronosyltransferases in mediating the effects of psychological stress.

- Suitability of Sequence-Based Feature Vector for Classification Algorithm Improves Accuracy of Human Protein-Protein Interaction Prediction: A Red Blood Cell Case Study
  
  Authors: Afsaneh Maali, Mahmood A. Mahdavi and Reza Gheshlaghi
  
  https://doi.org/10.2174/1574893610666151026215233
  
  To classify human protein-protein interaction information and consolidate existing data, supervised learning algorithms are implemented. These algorithms require a feature vector to generate a prediction model, and feature vectors can be constructed from various input data. Matching the feature vector to the classification algorithm results in a more predictive model and higher-accuracy predictions based on low-dimensional vectors. To investigate the proper combination of feature sets and algorithms, three feature vectors (AA Frequency, AA Graphical Parameter and AA Triplex) were constructed. All three are based solely on the primary structure of human red blood cell proteins and were then applied to five different classification methods. The results indicated that the support vector machine (SVM) algorithm produced the highest accuracy, 84.65%, with the AA Graphical Parameter feature set, while it reached 80.65% with the AA Triplex feature set. Random forest (RF) achieved a high average accuracy of 83.69% across all three feature sets. The tree-augmented naïve Bayes (TAN) classifier performed better than naïve Bayes (NB) with all three feature sets. The artificial neural network (ANN) classifier showed the lowest average accuracy, 76%. However, its performance was comparable to TAN when the AA Triplex feature set was used, with an accuracy of 77.90%. These figures demonstrate that selecting an appropriate feature set for a classification task yields higher accuracy, with the advantage of using low-dimensional feature vectors constructed from simpler data.
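
The Niu and Hu abstract above names chaos game representation (CGR), the fractal dimension and amino acid composition as its feature set, but it does not give the construction. The Python sketch below assumes one common protein CGR variant. The 20 standard amino acids sit on the vertices of a regular 20-gon inscribed in the unit circle, in alphabetical order (an arbitrary assumption), and each residue moves the current point halfway towards its vertex. The fractal dimension is estimated by box counting as the least-squares slope of log N(s) against log s. The grid scales, the 21-value feature layout and all function names are hypothetical, not the authors' settings.

```python
import math
from collections import Counter

# Assumption: the 20 standard residues in alphabetical one-letter order,
# placed on the vertices of a regular 20-gon inscribed in the unit circle.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VERTICES = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}


def protein_cgr(seq):
    """Chaos game points: start at the centre, move halfway to each residue's vertex."""
    x, y = 0.0, 0.0
    points = []
    for aa in seq.upper():
        if aa not in VERTICES:  # skip non-standard residues such as X or U
            continue
        vx, vy = VERTICES[aa]
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


def box_counting_dimension(points, scales=(2, 4, 8, 16, 32)):
    """Least-squares slope of log N(s) against log s, where an s-by-s grid covers [-1, 1]^2."""
    if not points:
        raise ValueError("the sequence contains no standard residues")
    log_s, log_n = [], []
    for s in scales:
        size = 2.0 / s
        occupied = {
            (min(int((px + 1) / size), s - 1), min(int((py + 1) / size), s - 1))
            for px, py in points
        }
        log_s.append(math.log(s))
        log_n.append(math.log(len(occupied)))
    mean_s = sum(log_s) / len(log_s)
    mean_n = sum(log_n) / len(log_n)
    cov = sum((a - mean_s) * (b - mean_n) for a, b in zip(log_s, log_n))
    var = sum((a - mean_s) ** 2 for a in log_s)
    return cov / var


def aa_composition(seq):
    """Fraction of each standard residue (20 values summing to 1)."""
    counts = Counter(aa for aa in seq.upper() if aa in VERTICES)
    total = sum(counts.values()) or 1
    return [counts[aa] / total for aa in AMINO_ACIDS]


def feature_vector(seq):
    """Hypothetical 21-value layout: composition followed by the CGR fractal dimension."""
    return aa_composition(seq) + [box_counting_dimension(protein_cgr(seq))]


print(feature_vector("MKVLAAGIVGLLLAKKDERRST"))
```

Placing the vertices in a different order changes the point pattern, and therefore the estimated dimension. For short sequences the box-counting slope is only a rough estimate.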
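
The ACC and MCC figures quoted in the Niu and Hu abstract follow from the binary confusion matrix:

- ACC = (TP + TN) / (TP + TN + FP + FN)
- MCC = (TP·TN - FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))

A small reference helper follows. Returning an MCC of 0 when a row or column total is zero is a common convention assumed here, and the counts in the usage line are made up.

```python
import math


def acc_and_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from a binary confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): MCC is 0 when any row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return acc, mcc


# Hypothetical counts, for illustration only.
print(acc_and_mcc(tp=40, tn=45, fp=5, fn=10))
```

Unlike accuracy, MCC uses all four cells of the matrix, so it stays informative when the positive and negative classes are unbalanced.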
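
The An et al. abstract builds on the Bayesian variance estimate used by Cyber-T. In that estimate, a gene's sample variance s² from n replicates is shrunk towards a background variance σ₀² with prior weight ν₀: σ² = (ν₀σ₀² + (n - 1)s²) / (ν₀ + n - 2). The Python sketch below plugs that estimate into a two-sample t-statistic. It rests on these assumptions:

- σ₀² is taken as the mean variance of the `window` genes with the closest mean expression, a hypothetical pseudo-replicate rule. The paper's own selection strategy is not described in the abstract.
- The defaults ν₀ = 10 and window = 101 are illustrative.
- Every gene has at least two replicates in each condition.

```python
import math
import statistics


def bayes_variance(s2, n, prior_var, nu0):
    """Cyber-T-style posterior variance (nu0 > 0 assumed): shrink the sample
    variance s2 from n replicates towards prior_var with weight nu0."""
    return (nu0 * prior_var + (n - 1) * s2) / (nu0 + n - 2)


def background_variance(gene, means, variances, window):
    """Hypothetical pseudo-replicate prior: mean sample variance of the `window`
    genes whose mean expression is closest to that of `gene` (itself included)."""
    ranked = sorted(means, key=lambda g: abs(means[g] - means[gene]))
    return statistics.mean(variances[g] for g in ranked[:window])


def condition_summary(expr, nu0, window):
    """Per-gene mean and Bayesian variance for one condition.
    expr: dict gene -> list of replicate values (at least two per gene)."""
    means = {g: statistics.mean(v) for g, v in expr.items()}
    sample_var = {g: statistics.variance(v) for g, v in expr.items()}
    post_var = {
        g: bayes_variance(sample_var[g], len(v),
                          background_variance(g, means, sample_var, window), nu0)
        for g, v in expr.items()
    }
    return means, post_var


def regularized_t(expr_a, expr_b, nu0=10, window=101):
    """Regularized two-sample t-statistic per gene (condition A minus condition B)."""
    mean_a, var_a = condition_summary(expr_a, nu0, window)
    mean_b, var_b = condition_summary(expr_b, nu0, window)
    return {
        g: (mean_a[g] - mean_b[g])
        / math.sqrt(var_a[g] / len(expr_a[g]) + var_b[g] / len(expr_b[g]))
        for g in expr_a
    }
```

The sorting step makes this O(G² log G) for G genes. It is fine for illustration, but a real tool would sort genes by mean expression once and slide a window over them.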
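
The Zhou et al. abstract does not define its "improved" inter-nucleotide distance model. As a baseline only, the Python sketch below computes a classic inter-nucleotide distance profile that could feed an SVM. For each base, the gaps between successive occurrences are collected and summarised as relative frequencies over distances 1..k. The cut-off k = 10 and the folding of longer gaps into the last bin are illustrative assumptions.

```python
from collections import Counter

BASES = "ACGT"


def internucleotide_distances(seq, base):
    """Gaps between consecutive occurrences of `base` (RNA 'U' is read as 'T')."""
    seq = seq.upper().replace("U", "T")
    positions = [i for i, ch in enumerate(seq) if ch == base]
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_profile(seq, k=10):
    """4*k features: for each base, the relative frequency of gaps 1..k.
    Gaps longer than k are folded into the k-th bin (an assumption)."""
    features = []
    for base in BASES:
        gaps = [min(d, k) for d in internucleotide_distances(seq, base)]
        counts = Counter(gaps)
        total = len(gaps) or 1
        features.extend(counts[d] / total for d in range(1, k + 1))
    return features


print(distance_profile("AACGTAUGCA", k=5))
```

Because the profile has a fixed length (4·k) regardless of sequence length, sequences of different sizes can be compared directly by a standard classifier.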
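
The horizontal visibility network used by Zhao et al. turns a numeric series along the sequence into a graph. In the standard construction, positions i < j are linked when every value strictly between them is lower than both endpoints. The Python sketch below shows that construction with a simple scan that stops once a barrier at least as high as the left endpoint is met. The series uses Kyte-Doolittle hydropathy values for five residues purely as an example. The abstract does not say which physicochemical scales the authors used or which network features they extracted.

```python
def horizontal_visibility_edges(series):
    """Edges (i, j) of the horizontal visibility graph: i and j are linked
    if every value strictly between them is below both series[i] and series[j]."""
    edges = []
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.append((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # this barrier blocks i's view of everything further right
    return edges


def degree_sequence(n, edges):
    """Node degrees of an undirected graph with nodes 0..n-1."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return degree


# Kyte-Doolittle hydropathy values for a few residues (example input only).
KYTE_DOOLITTLE = {"A": 1.8, "G": -0.4, "K": -3.9, "L": 3.8, "V": 4.2}
series = [KYTE_DOOLITTLE[aa] for aa in "LAGKV"]
edges = horizontal_visibility_edges(series)
print(edges, degree_sequence(len(series), edges))
```

Adjacent positions are always linked, so the graph is connected. The degree distribution is one of the simplest features that can be read from it.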
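
Among the random-walk applications surveyed by Peng et al., node ranking is often done with random walk with restart (RWR). A walker moves to a random neighbour with probability 1 - r and jumps back to the seed nodes with probability r. The stationary vector p = (1 - r)·W·p + r·p₀, with W column-normalised, scores every node by its proximity to the seeds. The Python sketch below iterates that update on an unweighted adjacency list. The restart probability 0.3 and the toy path network are illustrative assumptions.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change < tol.

    adjacency: dict node -> list of neighbours (undirected; no isolated nodes assumed).
    seeds: set of nodes over which the restart vector p0 is uniform.
    """
    nodes = list(adjacency)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            # Column-normalised transition: u spreads its mass evenly over its neighbours.
            share = (1 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = random_walk_with_restart(path, seeds={"a"})
print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b', 'c', 'd']
```

Each step shrinks the L1 error by a factor of 1 - r, so a larger restart probability converges faster and keeps scores more local to the seeds.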
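
MLC (Wang et al.) applies Markov clustering (MCL) to a link-similarity matrix. The standard MCL loop alternates two steps until the matrix stops changing:

- Expansion: squaring the column-stochastic matrix.
- Inflation: element-wise powering followed by column normalisation.

Clusters are then read from the rows of attractor nodes. The plain-Python sketch below runs that loop on a generic symmetric similarity matrix and does not reproduce the extended link similarity. The inflation value 2.0, the self-loop weight and the 1e-6 cut-off are illustrative assumptions, not MLC's settings.

```python
def normalize_columns(m):
    """Scale each column of a square matrix to sum to 1 (all-zero columns stay zero)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_clustering(similarity, inflation=2.0, self_loop=1.0, max_iter=100, tol=1e-9):
    """MCL on a symmetric, non-negative similarity matrix given as a list of lists."""
    n = len(similarity)
    # Add self-loops, then turn similarities into column-stochastic transition probabilities.
    m = normalize_columns(
        [[similarity[i][j] + (self_loop if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random-walk flow
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row that keeps mass is an attractor; its non-zero columns form one cluster.
    # A column split across several attractors yields overlapping clusters.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return [sorted(c) for c in clusters if c]
```

Higher inflation values sharpen the flow faster and generally give smaller, more numerous clusters. In an MLC-style pipeline the matrix rows and columns would be interactions (links) rather than proteins, so a protein can end up in several modules.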
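
The topological measures listed in the Raza abstract have standard definitions for an undirected network. The Python helpers below compute network density, the average clustering coefficient and the average shortest-path length for a network stored as a dict of neighbour sets. Two conventions are assumed here: nodes with fewer than two neighbours get a clustering coefficient of 0, and path lengths are averaged only over mutually reachable pairs. The toy network is made up.

```python
from collections import deque


def density(adj):
    """Edges present divided by edges possible in a simple undirected graph."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nb = list(adj[v])
    k = len(nb)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nb[j] in adj[nb[i]])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def average_path_length(adj):
    """Mean BFS shortest-path length over ordered pairs of distinct, reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(d for node, d in dist.items() if node != s)
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(density(toy), average_clustering(toy), average_path_length(toy))
```
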
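
ORFpred scores candidate smORFs, which first have to be enumerated from the genome. The Python sketch below covers only that enumeration step, under these assumptions:

- It scans the forward strand in all three frames.
- Each ORF starts at the first ATG after the previous in-frame stop and ends at the next in-frame stop (TAA, TAG or TGA). Nested ATGs are not reported separately.
- ORFs are kept if they are 10 to 100 codons long, excluding the stop codon. These hypothetical bounds are not taken from the paper.

The translation-probability model that ORFpred applies afterwards is not reproduced.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq, min_codons=10, max_codons=100):
    """Forward-strand ATG...stop ORFs with min_codons..max_codons codons before the stop.
    Returns (start, end) pairs; `end` is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the last stop in this frame
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


example = "CC" + "ATG" + "AAA" * 10 + "TAA" + "GG"
print(find_small_orfs(example))
```

A full scan would also process the reverse complement. In an AT-rich genome such as that of P. falciparum, short ATG-to-stop stretches arise frequently by chance, which is why a downstream scoring model is needed.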

Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan