Volume 9, Issue 3

Current Bioinformatics - Volume 9, Issue 3, 2014


- Editorial (Thematic Issue: Nonlinear Science Methods in Bioinformatics and Systems Biology)
  
  By Zu-Guo Yu
  
  https://doi.org/10.2174/157489360903140619113217

- A 2D Pattern Matching Algorithm for Comparing Primary Protein Sequences
  
  Authors: Guohua Huang, Weiping Huang, Wenping Xie, Yongfan Li, Lixin Xu and Houqing Zhou
  
  https://doi.org/10.2174/1574893609666140516005556
  
  Sequence comparison in the form of alignment plays a crucial role in bioinformatics. However, alignment is commonly restricted by the number of sequences that can be aligned. To address this problem, we present a 2D pattern matching algorithm for comparing protein sequences. The new algorithm, which is alignment-free, allows fast comparison even among a large number of protein sequences. Simulation on artificial sequences indicated that the method is robust, and experiments on real protein sequences showed that it is effective.
  

- Dissimilarities in Alignment-Free Methods for Phylogenetic Analysis Based on Genomes
  
  Authors: Xiao-Su Chen, Zu-Guo Yu and Juan Zheng
  
  https://doi.org/10.2174/1574893609999140523124702
  
  Whole genome sequences are generally accepted as excellent tools for studying evolutionary relationships. Because of the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignments cannot be directly applied to whole-genome comparison and phylogenomic studies. There has been growing interest in alignment-free methods for phylogenetic analysis using complete genome data. The “distances” used in these alignment-free methods are not proper distance metrics in the strict mathematical sense. In this study, we first review them within a more general framework: dissimilarity. Then we propose some new dissimilarities for phylogenetic analysis. Finally, three genome datasets are employed to evaluate these dissimilarities from a biological point of view.
  
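
The distinction this abstract draws between a dissimilarity and a true metric can be illustrated with a minimal Python sketch that checks the triangle inequality on a set of points. Squared Euclidean distance is used here only as a familiar example of a dissimilarity that is not a metric; it is an assumption made for illustration, not one of the measures reviewed or proposed in the paper, and the function names are hypothetical.

```python
from itertools import permutations

def squared_euclidean(x, y):
    """Squared Euclidean distance: symmetric and non-negative, but not a metric."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def euclidean(x, y):
    """Ordinary Euclidean distance, which is a proper metric."""
    return squared_euclidean(x, y) ** 0.5

def violates_triangle_inequality(points, d, tol=1e-12):
    """Return the first triple (x, y, z) with d(x, z) > d(x, y) + d(y, z), or None."""
    for x, y, z in permutations(points, 3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return (x, y, z)
    return None

# Three collinear toy points (illustrative data only).
points = [(0.0,), (1.0,), (2.0,)]
print(violates_triangle_inequality(points, squared_euclidean))
print(violates_triangle_inequality(points, euclidean))
```

With these points the squared distance from 0 to 2 is 4, which exceeds the sum 1 + 1 of the two intermediate steps, so the first call reports a violating triple while the second reports none.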

- Influenza Pandemic Early Warning Research on HA/NA Protein Sequences
  
  Authors: Jie Gao, Ling Zhang and Peixuan Jin
  
  https://doi.org/10.2174/1574893609999140523124205
  
  Using the CGR-walk model, this paper studies influenza virus HA/NA protein sequences from 1914 to 2012 and identifies multiple early-warning signals of influenza pandemic outbreaks. The variances and lag-2 autocorrelation coefficients of protein sequences, obtained according to the detailed HP model, are significantly higher for the epidemic outbreak years and the previous two years than for the preceding adjacent years, whereas non-epidemic years show no such feature.
  
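
The two indicators named in this abstract, variance and lag-2 autocorrelation, can be sketched with their standard sample estimators. The abstract does not specify the exact estimators used, nor how the CGR-walk and detailed HP model turn a protein sequence into a numeric series, so the code below is only a generic illustration on a toy series; the function names are hypothetical.

```python
def variance(series):
    """Population variance of a numeric sequence."""
    n = len(series)
    mean = sum(series) / n
    return sum((x - mean) ** 2 for x in series) / n

def autocorrelation(series, lag):
    """Lag-k sample autocorrelation: lag-k autocovariance over lag-0 autocovariance."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

# Toy numeric series standing in for a sequence-derived walk (not real data).
walk = [1, 2, 3, 4, 5, 6]
print(variance(walk))
print(autocorrelation(walk, 2))
```

In an early-warning setting, these two quantities would be computed per year and compared across years; a rise in both is the kind of signal the abstract describes.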

- Optimizing I/O Cost and Managing Memory for Composition Vector Method Based on Correlation Matrix Calculation in Bioinformatics
  
  Authors: Anaththa P.D. Krishnajith, Wayne Kelly and Yu-Chu Tian
  
  https://doi.org/10.2174/1574893609666140516005327
  
  The generation of a correlation matrix for a set of genomic sequences is a common requirement in many bioinformatics problems such as phylogenetic analysis. Each sequence may be millions of bases long and there may be thousands of such sequences which we wish to compare, so not all sequences may fit into main memory at the same time. Each sequence needs to be compared with every other sequence, so we will generally need to page some sequences in and out more than once. In order to minimize execution time we need to minimize this I/O. This paper develops an approach for faster and scalable computing of large-size correlation matrices through the maximal exploitation of available memory and a reduction in the number of I/O operations. The approach is scalable in the sense that the same algorithms can be executed on different computing platforms with different amounts of memory and can be applied to different bioinformatics problems with different correlation matrix sizes. The significant performance improvement of the approach over previous work is demonstrated through benchmark examples.
  
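
The I/O problem described in this abstract can be made concrete with a small Python sketch of blocked all-pairs comparison. This is a generic blocking scheme chosen for illustration, not the algorithm developed in the paper: sequences are grouped into blocks of half the memory capacity, so that two blocks fit in memory together, and a simple simulation counts how many sequence loads the schedule needs. All names and the eviction policy are assumptions.

```python
from itertools import combinations

def blocked_pair_schedule(n_items, capacity):
    """Yield (needed_ids, pairs) steps that cover every unordered pair exactly once.

    capacity is the number of sequences that fit in memory at once (>= 2).
    Items are split into blocks of capacity // 2 so two blocks fit together.
    """
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    size = capacity // 2
    blocks = [list(range(s, min(s + size, n_items))) for s in range(0, n_items, size)]
    for i, a in enumerate(blocks):
        # Pairs inside block a: only block a has to be resident.
        yield a, list(combinations(a, 2))
        for b in blocks[i + 1:]:
            # Pairs across a and b: a stays resident, only b is paged in.
            yield a + b, [(x, y) for x in a for y in b]

def simulate(n_items, capacity):
    """Run the schedule, counting loads; anything not needed by a step is evicted."""
    resident, loads, compared = set(), 0, []
    for needed, pairs in blocked_pair_schedule(n_items, capacity):
        needed = set(needed)
        loads += len(needed - resident)  # page in only what is missing
        resident = needed
        compared.extend(pairs)
    return loads, compared

loads_large, pairs_large = simulate(6, 4)
loads_small, pairs_small = simulate(6, 2)
print(loads_large, loads_small)
```

For six sequences, allowing four in memory needs 10 loads in this simulation while allowing only two needs 20, which shows on a toy scale why exploiting the available memory reduces paging.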

- Robustness of Link-Prediction Algorithm Based on Similarity and Application to Biological Networks
  
  Authors: Liang Wang, Ke Hu and Yi Tang
  
  https://doi.org/10.2174/1574893609666140516005740
  
  Many algorithms have been proposed to predict missing links in a variety of real networks. Emphasis is put on raising both the accuracy and the efficiency of these algorithms. However, less attention is paid to their robustness against noise or irrational links, which exist in almost all real networks. In this paper, we investigate the robustness of several typical node-similarity-based algorithms and find that these algorithms are sensitive to the strength of noise. Moreover, we find that robustness also depends on the structural properties of networks, especially on network efficiency, clustering coefficient and average degree. In addition, we attempt to enhance robustness by using a link weighting method to transform an unweighted network into a weighted one and then using the link weights to characterize their reliability. The results show that a proper link weighting scheme can significantly enhance both the robustness and the accuracy of these algorithms in biological networks.
  
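
To show what a node-similarity-based link predictor looks like, here is a minimal Python sketch using the common-neighbours index, one widely used similarity score. The abstract does not name the specific indices it studies, so this choice, the toy network and the function name are all assumptions made for illustration.

```python
from itertools import combinations

def common_neighbor_scores(adjacency):
    """Score every non-adjacent node pair by its number of shared neighbours."""
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v not in adjacency[u]:
            scores[(u, v)] = len(adjacency[u] & adjacency[v])
    return scores

# Toy undirected network as node -> set of neighbours (illustrative data only).
network = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
scores = common_neighbor_scores(network)
# Highest score first; ties are broken alphabetically by node pair.
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
```

The pair A and D shares two neighbours and is ranked as the most likely missing link. Adding or removing a single spurious edge changes these counts directly, which gives an intuition for the noise sensitivity the abstract reports.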

- Secondary Structure Element Alignment Kernel Method for Prediction of Protein Structural Classes
  
  Authors: Guo-Sheng Han, Zu-Guo Yu and Vo Anh
  
  https://doi.org/10.2174/1574893609999140523124847
  
  In this paper, we aim to predict protein structural classes for low-homology data sets based on predicted secondary structures. We propose a new and simple kernel method, named SSEAKSVM, to predict protein structural classes. The secondary structures of all protein sequences are obtained by using the tool PSIPRED, and then a linear kernel based on secondary structure element alignment scores is constructed for training a support vector machine classifier without parameter adjustment. Our method SSEAKSVM was evaluated on two low-homology datasets, 25PDB and 1189, with sequence homology of 25% and 40%, respectively. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies on these two data sets are 86.3% and 84.5%, respectively, which are higher than those obtained by other existing methods. In particular, our method achieves higher accuracies (88.1% and 88.5%) for differentiating the α + β class and the α/β class compared to other methods. This suggests that our method is valuable for predicting protein structural classes, particularly for low-homology protein sequences. The source code of the method in this paper can be downloaded at http://math.xtu.edu.cn/myphp/math/research/source/SSEAK_source_code.rar.
  

- Semi-Supervised Transductive Hot Spot Predictor Working on Multiple Assumptions
  
  Authors: Jim Jing-Yan Wang, Islam Khaleel Almasri, Yuexiang Shi and Xin Gao
  
  https://doi.org/10.2174/1574893609999140523124421
  
  Protein-protein interactions are critically dependent on just a few residues (“hot spots”) at the interfaces. Hot spots make a dominant contribution to the binding free energy and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental efforts, there exists a need for accurate and reliable computational hot spot prediction methods. Compared to the supervised hot spot prediction algorithms, the semi-supervised prediction methods can take into consideration both the labeled and unlabeled residues in the dataset during the prediction procedure. The transductive support vector machine has been utilized for this task and demonstrated a better prediction performance. To the best of our knowledge, however, none of the transductive semi-supervised algorithms takes all three semi-supervised assumptions, i.e., the smoothness, cluster and manifold assumptions, into account together during learning. In this paper, we propose a novel semi-supervised method for hot spot residue prediction that considers all three semi-supervised assumptions using nonlinear models. Our algorithm, IterPropMCS, works in an iterative manner. In each iteration, the algorithm first propagates the labels of the labeled residues to the unlabeled ones, along the shortest path between them on a graph, assuming that they lie on a nonlinear manifold. Then it selects the most confident residues as the labeled ones for the next iteration, according to the cluster and smoothness criteria, which are implemented by a nonlinear density estimator. Experiments on a benchmark dataset, using protein structure-based features, demonstrate that our approach is effective in predicting hot spots and compares favorably to other available methods. The results also show that our method outperforms the state-of-the-art transductive learning methods.
  

- RNA Secondary Structure Prediction Algorithms Including Pseudoknots
  
  Authors: Dolly Sharma, Shailendra Singh and Trilok Chand
  
  https://doi.org/10.2174/15748936113086660010
  
  The pseudoknot is an important motif in RNA secondary structure. Early research on RNA secondary structure prediction ignored pseudoknots, but pseudoknots are now a focus of the field. Several kinds of algorithms, such as dynamic programming, comparative algorithms, heuristic algorithms and formal grammar algorithms, have so far been used for pseudoknot prediction, but the prediction of arbitrary pseudoknots is still an open problem. Also, there is no standard categorization of pseudoknot types. This article provides a brief description and comparison of various algorithms used in pseudoknot prediction, along with an overview of various forms of pseudoknots and their representations.
  

- Prediction of Vanillin and Glutamate Productions in Yeast Using a Hybrid of Continuous Bees Algorithm and Flux Balance Analysis (CBAFBA)
  
  Authors: Leang Huat Yin, Yee Wen Choon, Mohd Saberi Mohamad, Lian En Chai, Chuii Khim Chong, Afnizanfaizal Abdullah, Safaai Deris and Rosli M Illias
  
  https://doi.org/10.2174/1574893608666131120233937
  
  Most foods and beverages contain artificial flavor compounds. Creating artificial flavors is not easy and is hardly ever completely effective. In this paper, we introduce an in silico method for optimizing microbial strains for flavor compound synthesis. Several existing algorithms, such as Genetic Algorithm, Evolutionary Algorithm, the OptKnock tool and other related techniques, are widely used to predict the yield of a target compound by suggesting gene knockouts. These algorithms and tools can predict the production yield without resorting to trial-and-error gene deletions. Without in silico methods, direct experimental methods are costly and time consuming: chemicals are expensive, and not all flavorists can afford them. However, the main limitations of previous algorithms are that they fail to optimize the predicted yield and suggest unrealistic flux distributions. Therefore, this paper proposes a hybrid of the continuous Bees algorithm and Flux Balance Analysis. The target compounds in this research are vanillin and glutamate. The aim of the study is to identify optimum gene knockouts. The results reported are the predicted yield and growth rate values of the model. The predictive results show an improvement in yield that may help in food flavoring.
  

- A Comprehensive View on Metabolic Pathway Analysis Methodologies
  
  Authors: Namrata Tomar and Rajat K. De
  
  https://doi.org/10.2174/1574893609666140516005147
  
  Advances in ‘omics’ high-throughput technologies have greatly increased the amount and quality of available biological data. This has fostered the development of bioinformatics methods to interpret these data. In this regard, characterization of cellular metabolism is useful for understanding the phenotypic capabilities of an organism. Several in silico approaches have emerged for the analysis of metabolic pathways, including structural and stoichiometric analysis, metabolic flux analysis, metabolic control analysis, and several analyses based on kinetic modeling. The present article provides a comprehensive survey of existing metabolic pathway analysis methodologies.
  

- Select Cluster Features for Better Layered Protein Function Prediction
  
  Authors: Wei Zhu, Jingyu Hou and Yi-Ping Phoebe Chen
  
  https://doi.org/10.2174/1574893608666131121224631
  
  Background: High-throughput protein-protein interaction (PPI) datasets make it possible to exploit the interaction relationship between proteins to predict functions for those proteins that are still functionally unannotated. Although the clustering based approach has proved to be one of the effective methods for protein function prediction in some cases, in most cases the prediction results are unsatisfactory. How to define a better similarity/distance measurement between proteins, how to choose proper clustering methods and how to select feature functions from clusters for better predictions remain challenges to the improvement of the clustering based prediction approach. On the other hand, predicting functions at different functional layers for the unannotated proteins, to provide more meaningful information about protein functions, has rarely been investigated by the existing algorithms. Results: In this paper, we propose algorithms that address the selection of feature functions from clusters to increase the prediction quality of clustering based prediction methods. Meanwhile, clustering based protein function prediction methods can effectively predict protein functions at different functional layers when incorporating our algorithms of cluster feature function selection. Evaluations on real PPI datasets demonstrated the effectiveness of the proposed algorithms. Conclusion: The proposed algorithms of cluster feature function selection reasonably reflect the intrinsic relationship among proteins. The multi-layered function prediction supported by our proposed algorithms provides more meaningful information for better understanding protein functions.
  

- Trends in Genome Compression
  
  Authors: Sebastian Wandelt, Marc Bux and Ulf Leser
  
  https://doi.org/10.2174/1574893609666140516010143
  
  Technological advancements in high throughput sequencing have led to a tremendous increase in the amount of genomic data produced. With the cost down to 2,000 USD for a single human genome, sequencing dozens of individuals is feasible even for smaller projects or organizations. However, generating the sequence is only one issue; another is storing, managing, and analyzing it. These tasks become more and more challenging due to the sheer size of the data sets and are increasingly considered to be the major bottlenecks in larger genome projects. One possible countermeasure is to compress the data; compression reduces costs by requiring less hard disk storage and less bandwidth if data is shipped to large compute clusters for parallel analysis. Accordingly, sequence compression has recently attracted much interest in the scientific community. In this paper, we explain the different basic techniques for sequence compression, point to distinctions between different compression tasks (e.g., genome compression versus read compression), and present a comparison of current approaches and tools. To further stimulate progress in genome compression research, we also identify key challenges for future systems.
  
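
As a small illustration of why sequence data compresses well, the Python sketch below packs a DNA string into two bits per base, a naive fixed-width encoding. The abstract does not list the techniques the paper covers, so this example is an assumption chosen for simplicity; it handles only the uppercase symbols A, C, G and T, and the function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for ch in chunk:
            byte = (byte << 2) | CODE[ch]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return len(seq), bytes(out)

def unpack(length, data):
    """Invert pack(); length says how many bases of the last byte are real."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASE[(byte >> shift) & 3])
    return "".join(bases[:length])

seq = "ACGTTGCAAC"  # toy sequence, not real genomic data
length, data = pack(seq)
print(len(seq), "characters ->", len(data), "bytes")
assert unpack(length, data) == seq
```

Storing the original length alongside the bytes is what lets the decoder discard the padding in the final byte. Real genome compressors go well beyond this, for example by handling ambiguous bases and exploiting similarity between genomes.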

- Molecular Modeling and Assessing the Catalytic Activity of Glucose Dehydrogenase of Gluconobacter suboxydans with a New Approach for Power Generation in a Microbial Fuel Cell
  
  Authors: R. Navanietha Krishnaraj, Saravanan Chandran, Parimal Pal and Sheela Berchmans
  
  https://doi.org/10.2174/1574893608666131217234633
  
  Microbial fuel cells (MFCs) are electrochemical energy systems that transform organic substrates for bioelectricity generation using the immense catalytic potential of the electrigens. Quinoprotein glucose dehydrogenase of Gluconobacter plays a key role in the oxidation of glucose in MFCs. The structure of the quinoprotein glucose dehydrogenase of Gluconobacter suboxydans is still unexplored. Herein, the modeled structure of quinoprotein glucose dehydrogenase of Gluconobacter suboxydans is reported. The modeled structure is validated with Ramachandran plot analysis. The active sites of the modeled protein are identified using the Q site finder. The catalytic activity of the modeled glucose dehydrogenase of G. suboxydans is analyzed based on its binding energy with the substrate. The experimental results show that the modeled structure has excellent stereochemical and electrocatalytic activity. The good electrocatalytic activity of glucose dehydrogenase offers higher electrogenic activity to Gluconobacter for its use as electrigens in MFCs.
  

- Review of Protein Subcellular Localization Prediction
  
  Authors: Zhen Wang, Quan Zou, Yi Jiang, Ying Ju and Xiangxiang Zeng
  
  https://doi.org/10.2174/1574893609666140212000304
  
  Protein subcellular localization is closely related to protein function. Proteins can work only in specific subcellular positions, so protein localization in a cell is very important in studies on cytobiology, proteomics, and drug design. Protein subcellular localization prediction based on machine learning is timely and has generated great interest in the field of bioinformatics. This paper reviews the research status of this problem in recent years from four aspects: protein dataset construction, feature extraction from protein sequences, machine learning algorithms, and web server construction. Finally, we analyze the challenges in predicting protein subcellular localization and identify possible future research trends.
  

- Prediction of miRNA in Human MHC that Encodes Different Immunological Functions Using Support Vector Machines
  
  Authors: Archana Prabahar and Jeyakumar Natarajan
  
  https://doi.org/10.2174/1574893608666131120002036
  
  MicroRNAs (miRNAs) are short non-coding RNAs known to be involved in gene regulatory functions in humans. The major histocompatibility complex (MHC), located on the short arm of chromosome 6, remains one of the most important regions associated with several human diseases. The complex spans ~4 Mb and covers >120 expressed genes. Gene expression at the transcriptional and post-transcriptional levels is modulated by microRNA (miRNA) in collision with sequence polymorphism and epigenetic factors. In this study, we aim to predict the miRNAs responsible for different immunological functions and disorders in the MHC region. Sequential and structural features of microRNAs were used for the classification of miRNA and other non-coding RNA data. A support vector machine (SVM) classifier was used for prediction and evaluated by the jackknife validation technique. Overall accuracy was found to be 97.56% using the leave-one-out cross validation technique. These experimental results confirm that our classification method predicts immune-related miRNA with high accuracy.
  
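
The leave-one-out (jackknife) evaluation mentioned in this abstract can be sketched in a few lines of Python. To keep the example self-contained, a one-nearest-neighbour classifier stands in for the support vector machine used in the paper; the feature vectors, labels and function names are invented for illustration and have nothing to do with the paper's data.

```python
def nearest_neighbor_predict(train, query):
    """Predict the label of the training point closest to query (squared Euclidean)."""
    def dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))
    _, label = min(train, key=lambda item: dist(item[0], query))
    return label

def leave_one_out_accuracy(samples, predict):
    """Hold out each sample once, train on the rest, return the fraction predicted correctly."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if predict(train, features) == label:
            correct += 1
    return correct / len(samples)

# Toy two-feature samples with made-up labels (illustrative data only).
samples = [
    ((0.0, 0.1), "miRNA"), ((0.2, 0.0), "miRNA"), ((0.1, 0.3), "miRNA"),
    ((1.0, 1.1), "other"), ((0.9, 1.2), "other"), ((0.4, 0.5), "other"),
]
accuracy = leave_one_out_accuracy(samples, nearest_neighbor_predict)
print(accuracy)
```

Each sample is classified by a model that never saw it, so with n samples the classifier is trained n times. In this toy set five of the six held-out samples are predicted correctly.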

- A Partial Least Squares Algorithm for Microarray Data Analysis Using the VIP Statistic for Gene Selection and Binary Classification
  
  Authors: Francisco J. Burguillo, Luis A. Corchete, Javier Martin, Inmaculada Barrera and William G. Bardsley
  
  https://doi.org/10.2174/15748936113086660011
  
  An important application of microarray technology is the assignment of new subjects to known clinical groups (class prediction), but the huge number of screened genes and the small number of samples make this task difficult. To overcome this problem, the usual approach has been to extract a small subset of significant genes (gene selection) or to use the whole set of genes to build latent components (dimension reduction), and then to apply a standard multivariate classification procedure. Alternatively, both aims (gene selection and class prediction) can be achieved at the same time by using methods based on Partial Least Squares (PLS), as reported in the present work. We present an iterative PLS algorithm based on backward variable elimination through the “Variable Influence on Projection” (VIP) statistic, which finds an optimal PLS model through training and test sets. It simultaneously reduces the number of selected genes by an iterative procedure and finds the best number of PLS factors to reach an optimal classification performance. It is a simple approach that uses only one mathematical method, maintains the identification of discriminatory genes, and builds an optimal predictive model with fast computation. The algorithm runs as a module of the SIMFIT statistical package, where the optimal model and datasets can be re-run to further interpret the system through additional PLS options, such as scores and loadings plots, or class assignment of new samples. The proposed algorithm was tested under different scenarios occurring in microarray analysis using simulated data. The results are also compared against different classification methods such as KNN, PAM, SVM, RF and standard PLS.
  
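
For readers unfamiliar with the VIP statistic named in this abstract, its commonly used definition is given below. The notation is an assumption, since the abstract gives no formula: $p$ is the number of variables (genes), $A$ the number of PLS components, $w_{ja}$ the weight of variable $j$ in component $a$, $\mathbf{w}_a$ the full weight vector of component $a$, and $SS_a$ the sum of squares of the response explained by component $a$.

```latex
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} SS_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^2}{\sum_{a=1}^{A} SS_a}}
```

Because the normalized squared weights of each component sum to one, the squared VIP values sum to $p$, so their average is 1. This is why variables with VIP above about 1 are often treated as influential, although the cutoff used by the authors is not stated in the abstract.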

