Volume 8, Issue 2

Current Bioinformatics - Volume 8, Issue 2, 2013

- Editorial (Hot Topic: Intelligent Methodology Development in Computational Biology)
  
  By Hong-Bin Shen
  
  https://doi.org/10.2174/1574893611308020001

- A Review on the Techniques for Characterizing and Predicting Human Genomic DNA Methylation
  
  Authors: Hao Zheng, Shi-Wen Jiang and Hongwei Wu
  
  https://doi.org/10.2174/1574893611308020002
  
  Epigenetic modification refers to heritable changes in genotypes or phenotypes through biochemical modifications without altering the underlying DNA sequence. In humans, DNA methylation is a major epigenetic modification that adds a methyl group mainly to the carbon-5 position of the cytosine pyrimidine ring in the cytosine-guanine (CpG) dinucleotide. This epigenetic modification is crucial to normal development and cellular differentiation as well as a number of key processes including genomic imprinting, X-chromosome inactivation, suppression of repetitive elements, and tumorigenesis. Recently, considerable resources and effort have been devoted to DNA methylation profiling of the human genome based on biochemical experiments or computational prediction. Here, we review these experimental and computational techniques, as well as large-scale DNA methylation data sets and databases. While the description of the biochemical techniques mainly provides an overview of the biological background, we focus more on the computational techniques, particularly on the data resources, methodologies and problems that have been studied. Our goal is to provide guidance for future bioinformatics research on DNA methylation.
  

- Identifying Coevolution Between Amino Acid Residues in Protein Families: Advances in the Improvement and Evaluation of Correlated Mutation Algorithms
  
  Authors: Haisong Xu, Xiaoqin Li, Ziding Zhang and Jiangning Song
  
  https://doi.org/10.2174/1574893611308020003
  
  Correlated mutation is regarded as a phenomenon induced by the need to maintain the structure and/or function of a protein during its biological evolution. Since it is closely related to the underlying mechanism of protein structure and function, tremendous efforts have been made to reveal the relationship between correlated mutations and the structure and function of the protein. In the past few decades, different coevolutionary analysis algorithms have been developed. They have been applied to study various aspects of protein structure and function, such as the prediction of disulfide bonds, functionally important residues, residue-residue contacts and protein-protein interactions. Although considerable progress has been achieved so far, obstacles remain in many aspects, such as the identification, evaluation and interpretation of correlated mutations. In this review, we discuss several essential issues related to overcoming these obstacles in coevolution analysis, including the alignment size bias, phylogenetic bias, algorithm evaluation and coevolution interpretation. In particular, we focus on the inconsistent results generated by different algorithms and discuss possible reasons for this discrepancy. We also discuss future challenges and research directions in coevolution analysis.
  

- Recent Advances in Predicting Functional Impact of Single Amino Acid Polymorphisms: A Review of Useful Features, Computational Methods and Available Tools
  
  Authors: Mingjun Wang, Zhongwei Sun, Tatsuya Akutsu and Jiangning Song
  
  https://doi.org/10.2174/1574893611308020004
  
  Owing to the biological significance of single amino acid polymorphisms (SAPs), there has been increasing interest in understanding how certain amino acid substitutions give rise to functional change and consequent disease association, while others remain neutral polymorphisms. With the increasing availability of biological data, our knowledge regarding functional elements of the proteome continues to expand. As experimental approaches to characterize specific genetic variants are expensive and time-consuming, it is highly desirable to develop effective computational methods that are capable of accurately predicting the functional impact of SAPs. In this review, we summarize 22 previously developed in silico tools and also discuss related work on the functional impact prediction of SAPs that did not specifically develop webservers/tools. Procedures for extracting annotations of SAPs, selecting the relevant useful features, and choosing appropriate algorithms are also described. Finally, a case study is given as an illustration to assess the ability of available tools to predict the functional consequences of SAPs. It is our hope that this review could serve as useful guidance for developing next-generation in silico approaches for identifying the functional impacts of SAPs.
  

- Current Status of Machine Learning-Based Methods for Identifying Protein-Protein Interaction Sites
  
  Authors: Bing Wang, Wenlong Sun, Jun Zhang and Peng Chen
  
  https://doi.org/10.2174/1574893611308020005
  
  High-throughput experimental technologies continue to alter the study of current systems biology. Investigators are understandably eager to harness the power of these new technologies. Protein-protein interactions on these platforms, however, present numerous production and bioinformatics challenges. Issues such as feature extraction, feature representation, prediction algorithms and results analysis have become increasingly problematic in the prediction of protein-protein interaction sites. The development of powerful, efficient prediction methods for inferring protein interface residues based on protein primary sequence and/or 3D structure is critical for the research community to accelerate research and publications. Currently, machine learning-based approaches are drawing the most attention in predicting protein interaction sites. This review aims to describe the state of the whole pipeline when machine learning strategies are applied to infer protein interaction sites.
  

- Predicting Protein N-Terminal Signal Peptides Using Position-Specific Amino Acid Propensities and Conditional Random Fields
  
  Authors: Yong-Xian Fan, Jiangning Song, Chen Xu and Hong-Bin Shen
  
  https://doi.org/10.2174/1574893611308020006
  
  Protein signal peptides play a vital role in targeting and translocation of most secreted proteins and many integral membrane proteins in both prokaryotes and eukaryotes. Consequently, accurate prediction of signal peptides and their cleavage sites is an important task in molecular biology. In the present study, we first develop a novel discriminative scoring method for classifying proteins with or without signal peptides. This method successfully captures the characteristics of signal peptides and non-signal peptides by integrating hydrophobicity alignment and position-specific amino acid propensities based on the highest average positions. As a result, this method is capable of discriminating proteins with signal peptides at overall accuracies of 96.3%, 97.0% and 97.2% by leave-one-out jackknife tests on the constructed benchmark datasets for three different organism groups, i.e. Eukaryotic, Gram-negative, and Gram-positive, respectively. Secondly, we treat the prediction of signal peptide cleavage sites as a sequence labeling problem and apply the Conditional Random Fields (CRFs) algorithm to solve it. Experimental results demonstrate that the proposed CRFs-based cleavage site finding approach can achieve prediction success rates of 80.8%, 89.4%, and 74.0%, respectively, for the secretory proteins from the three organism groups. An online tool, LnSignal, is established for labeling the N-terminal signal cleavage sites and is freely available for academic use at http://www.csbio.sjtu.edu.cn/bioinf/LnSignal.
  

- SubChlo-GO: Predicting Protein Subchloroplast Locations with Weighted Gene Ontology Scores
  
  Authors: Pufeng Du, Tingting Li, Xin Wang and Chao Xu
  
  https://doi.org/10.2174/1574893611308020007
  
  Chloroplasts are subcellular organelles found only in green plants and eukaryotic algae. Chloroplasts are of central importance in the photosynthesis process. The subchloroplast localizations of chloroplast proteins are critical for understanding their functions and important for fully deciphering the photosynthesis process. Although there are several existing methods that computationally determine protein subchloroplast localizations, prediction performance and software availability can still be improved. We propose a novel computational method, namely the Weighted Gene Ontology Scores, to predict protein subchloroplast locations. This method can achieve at least 88% prediction accuracy on the benchmarking dataset, which is significantly higher than existing methods. SubChlo-GO, an easy-to-use web-based online service, has been constructed based on the proposed method. We hope that SubChlo-GO could be helpful in chloroplast proteome research.
  

- Prediction of Metabolic Pathway Using Graph Property, Chemical Functional Group and Chemical Structural Set
  
  Authors: Lei Chen, Wei-Ming Zeng, Yu-Dong Cai and Tao Huang
  
  https://doi.org/10.2174/1574893611308020008
  
  In systems biology, it is a great challenge for researchers to identify whether a given set of organic compounds can combine to form a meaningful pathway. Fortunately, it is becoming more and more feasible to address such a problem with the rapidly accumulating information on various organisms. Based on the attainable information, a novel computational approach is proposed to investigate this problem, adopting the metabolic pathway of yeast as the subject of the study. We produced a benchmark dataset with 13,736 pathways, consisting of both valid and invalid pathways, and identified the valid pathways among them. Each of these pathways was encoded into a numeric vector consisting of three parts: graph property, chemical functional group, and chemical structural set. Minimum Redundancy Maximum Relevance and Incremental Feature Selection were utilized to select an optimal feature set, the Nearest Neighbor Algorithm was adopted as the classification model, and the Jackknife Test was used to evaluate the model. As a result, an optimal feature set consisting of 16 features, which were able to identify the valid pathways most successfully, was obtained.
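
The evaluation scheme named in this abstract (a nearest neighbor classifier scored by a jackknife, i.e. leave-one-out, test) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy vectors, the Euclidean distance, and the function names are hypothetical and are not taken from the paper, and the feature selection step is omitted.

```python
import math


def nearest_neighbor_predict(train, query):
    """Return the label of the training vector closest to `query`.

    Euclidean distance is an assumption made for this sketch.
    """
    best_label, best_dist = None, math.inf
    for vector, label in train:
        dist = math.dist(vector, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(samples):
    """Leave-one-out test: each sample is predicted from all the others."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_predict(rest, vector) == label:
            correct += 1
    return correct / len(samples)


# Toy feature vectors (hypothetical): label 1 = valid pathway, 0 = invalid.
toy = [
    ((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.3), 0),
    ((1.0, 0.9), 1), ((0.9, 1.1), 1), ((1.2, 1.0), 1),
]
accuracy = jackknife_accuracy(toy)
print(accuracy)
```

Because every sample is held out exactly once, the jackknife score does not depend on a random train/test split, which is presumably why it is a common choice for small benchmark sets.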
  

- Non-Binary Coding for Texture Descriptors in Sub-Cellular and Stem Cell Image Classification
  
  Authors: Michelangelo Paci, Loris Nanni, Anna Lahti, Katriina Aalto-Setala, Jari Hyttinen and Stefano Severi
  
  https://doi.org/10.2174/1574893611308020009
  
  In recent years, binary codings of image features, such as local binary patterns and local phase quantization, have become popular in a large variety of image quantification tasks. Lately, some non-binary codings, such as the local ternary pattern, have been proposed to improve the performance of these binary-based approaches. In these methods it is very important to correctly choose the thresholds applied for building the coding used to represent a given image and its features by a feature vector. In this work we compare several approaches for extracting local ternary/quinary pattern image features and ternary coding for local phase quantization on various types of biological microscope images, using six image databases for sub-cellular and stem cell image classification. We use these image features for training a stand-alone support vector machine and a random subspace of support vector machines to separate the different classes present in each dataset. Moreover, several distance measures are tested. Our results show that, on the chosen datasets, the best approach uses a multi-threshold local quinary coding. The use of a more discriminating coding scheme than the binary one, combined with a pool of thresholds, helps in distinguishing descriptive features from noise, thus improving classification results. The Matlab code is available at bias.csr.unibo.it/nanni/TernaryCoding.rar.
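
To make the role of the threshold concrete, here is a minimal Python sketch of a local ternary pattern for a single 3x3 patch. It follows the usual textbook definition of the coding, not the authors' Matlab code; the patch values, the neighbour ordering, and the function name are assumptions for illustration.

```python
def local_ternary_pattern(patch, threshold):
    """Encode a 3x3 patch as the (upper, lower) binary codes of a ternary pattern.

    A neighbour is coded +1 if it exceeds the centre by at least `threshold`,
    -1 if it falls below the centre by at least `threshold`, and 0 otherwise.
    The ternary code is then split into two binary patterns:
    `upper` marks the +1 positions and `lower` marks the -1 positions.
    """
    center = patch[1][1]
    # Neighbours read clockwise from the top-left pixel (arbitrary but fixed).
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    upper = lower = 0
    for bit, (r, c) in enumerate(coords):
        diff = patch[r][c] - center
        if diff >= threshold:
            upper |= 1 << bit
        elif diff <= -threshold:
            lower |= 1 << bit
    return upper, lower


# Hypothetical grey-level patch; the centre pixel is 55.
patch = [
    [52, 60, 70],
    [48, 55, 58],
    [40, 55, 66],
]
codes = local_ternary_pattern(patch, threshold=5)
print(codes)  # (22, 192)
```

Neighbours within 5 grey levels of the centre (52, 58, 55) fall into the "0" band and set no bit, which is how the threshold keeps small fluctuations from changing the code. A multi-threshold variant would repeat the call for several threshold values and concatenate the resulting histograms.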
  

- J-TM Align: Efficient Comparison of Protein Structure Based on TMAlign
  
  Authors: Pietro H. Guzzi, Pierangelo Veltri and Mario Cannataro
  
  https://doi.org/10.2174/1574893611308020010
  
  Proteins interact with one another, and these interactions are represented as graphs called protein-protein interaction (PPI) networks. From a physical point of view, interactions occur through contacts between protein structures. Consequently, the study and comparison of protein structures is an important field in bioinformatics and computational biology. The TM-Align algorithm is among the best-performing methods, but it is currently available only as a stand-alone application with a simple command-line interface on Linux platforms. We provide a comprehensive tool, J-TMAlign, which offers an easy-to-use graphical interface to access J-TMAlign functions and the possibility to visualize the compared structures. Finally, J-TMAlign is based on a multi-threaded architecture that enables users to submit multiple jobs, which are executed in a concurrent and time-efficient way.
  

- Progress in Gene Prediction: Principles and Challenges
  
  Authors: Srabanti Maji and Deepak Garg
  
  https://doi.org/10.2174/1574893611308020011
  
  Bioinformatics is a promising and innovative research field in the 21st century. Automatic gene prediction has been an actively researched area of bioinformatics. Despite the large number of techniques specifically dedicated to bioinformatics problems, as well as many successful applications, we are at the beginning of a process to massively integrate the aspects and experiences of different core subjects such as biology, medicine, computer science, engineering, chemistry, physics, and mathematics. Presently, a large number of gene identification tools are based on computational intelligence approaches. Here, we discuss the existing conventional and computational methods for identifying genes and compare various gene predictors. The paper also describes some drawbacks of the presently available methods and discusses probable guidelines for future directions.
  

- Analysis of Gene Logic Networks for Arabidopsis
  
  Authors: Yansen Su, Shudong Wang, Eryan Li, Tao Song, Hui Yu and Dazhi Meng
  
  https://doi.org/10.2174/1574893611308020012
  
  External stimuli may activate the stress response in Arabidopsis thaliana. The molecules that play important roles in the stress response have been widely studied. However, the interactions among these molecules, especially logic interactions, still need to be studied. In this paper, logic networks are constructed based on gene expression profiles of Arabidopsis under the normal condition and under four different stimulus conditions. It is found that the distribution of different types of 2-order logics in the gene logic network under the normal condition differs from the others. Furthermore, the logic networks of genes that play important roles are constructed and their dynamics are simulated. It is then observed that the number of attractors in the logic network for Arabidopsis under the normal condition is smaller than those under the four external stimuli. It is also observed that the number of attractors with a large attraction domain in the logic network for Arabidopsis under the normal condition is greater than those under the four external stimuli. The results show that the distribution of different types of 2-order logics and the number of attractors clearly distinguish the logic network under the normal condition from those under external stimulus conditions. Our studies will provide a theoretical basis for experimental studies on the stress response of Arabidopsis.
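
For readers unfamiliar with attractors and attraction domains, the following Python sketch enumerates them for a tiny synchronous Boolean network. It is a generic illustration rather than the authors' method: the three-gene network, its two-input rules, and the function names are hypothetical, and exhaustive enumeration is only practical for small networks.

```python
from itertools import product


def find_attractors(update_rules):
    """Enumerate the attractors of a synchronous Boolean network.

    `update_rules[i]` maps the current state (a tuple of 0/1 values)
    to the next value of node i. Returns a dict that maps each attractor
    (a frozenset of states) to the size of its attraction domain.
    """
    n = len(update_rules)

    def step(state):
        return tuple(rule(state) for rule in update_rules)

    basins = {}
    for start in product((0, 1), repeat=n):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = step(state)
        # States from the first revisited state onward form the cycle.
        cycle = frozenset(seen[seen.index(state):])
        basins[cycle] = basins.get(cycle, 0) + 1
    return basins


# Hypothetical three-gene network built from two-input logics.
rules = [
    lambda s: s[1] & s[2],  # gene 0 = gene 1 AND gene 2
    lambda s: s[0] | s[2],  # gene 1 = gene 0 OR gene 2
    lambda s: s[0] ^ s[1],  # gene 2 = gene 0 XOR gene 1
]
attractors = find_attractors(rules)
for cycle, basin_size in attractors.items():
    print(sorted(cycle), basin_size)
```

In this toy network there are two attractors: the fixed point (0, 0, 0), reached only from itself, and a two-state cycle whose attraction domain covers the remaining seven states. Comparing such counts and domain sizes across conditions is the kind of summary the abstract reports.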
  

- A Review of Computational Approaches for In Silico Metabolic Engineering for Microbial Fuel Production
  
  Authors: Weng H. Chan, Mohd S. Mohamad, Safaai Deris and Rosli M. Illias
  
  https://doi.org/10.2174/1574893611308020013
  
  High energy consumption, together with concerns about the environment, has led to rising demand for synthetic alternative fuels. These include biofuels that can be produced by a variety of engineered microbes such as Escherichia coli. In metabolic engineering, this is done by genetically modifying the target microbes to obtain optimal production of a particular biochemical. Conventional metabolic engineering approaches are often intuitive, but with advances in modern biology, vast amounts of informative data are being generated that describe the metabolic systems of microbes more thoroughly. Discoveries from interpreting these data using computational approaches are highly beneficial to metabolic engineers, especially those working in in silico metabolic engineering. Within the past decade, many computational approaches and routines have been proposed and developed to provide a platform for discovering rational strategies that aid biologists in engineering the metabolic network. Here, we discuss efforts to find the optimal butanol production route in E. coli, as well as several optimization algorithms currently available for finding optimal solutions to enhance biochemical production in a designated target microbe. This review aims to show the different optimization algorithms developed for in silico metabolic engineering and their applications in microbial fuel production.
  

- Matrix Decomposition Methods in Bioinformatics
  
  Authors: Li-Ping Tian, Lizhi Liu and Fang-Xiang Wu
  
  https://doi.org/10.2174/1574893611308020014
  
  With advances in biotechnology, a huge amount of high-throughput biological data has been produced and will continue to be produced. The information contained in such data is very useful for understanding the biological process from which the data is collected. Generally, high-throughput biological data such as gene expression data is presented in a data matrix. Through matrix decomposition methods, we can often discover some very useful information. In bioinformatics, principal component analysis (PCA), independent component analysis (ICA), nonnegative matrix factorization (NMF) and network component analysis (NCA) are widely used to help understand and utilize high-throughput data. They are all matrix decomposition methods, but subject to different constraints. In this paper, each of these methods is introduced and its applications to high-throughput biological data are discussed. We also compare these methods and discuss their pros and cons.
  

- Periodic Correlation Structures in Bacterial and Archaeal Complete Genomes
  
  By Zuo-Bing Wu
  
  https://doi.org/10.2174/1574893611308020015
  
  The periodic transference of nucleotide strings in bacterial and archaeal complete genomes is investigated by using the metric representation and the recurrence plot method. The generated periodic correlation structures exhibit four kinds of fundamental transferring characteristics: a single increasing period, several increasing periods, an increasing quasi-period, and almost no increasing period. The mechanism of the periodic transference is further analyzed by determining all long periodic nucleotide strings in the bacterial and archaeal complete genomes and is explained as follows: both the repetition of basic periodic nucleotide strings and the transference of non-periodic nucleotide strings would form the periodic correlation structures with approximately the same increasing periods.
  

Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan
