Current Bioinformatics - Volume 9, Issue 3, 2014
A 2D Pattern Matching Algorithm for Comparing Primary Protein Sequences
Authors: Guohua Huang, Weiping Huang, Wenping Xie, Yongfan Li, Lixin Xu and Houqing Zhou
Sequence comparison in the form of alignment plays a crucial role in bioinformatics. However, alignment is commonly restricted by the number of aligned sequences. To address this problem, we present a 2D pattern matching algorithm for comparing protein sequences. The new algorithm, which is alignment-free, allows fast comparison even among a large number of protein sequences. Simulation on artificial sequences indicated that our method is robust, and an experiment on real protein sequences showed that it is effective.
-
-
-
Dissimilarities in Alignment-Free Methods for Phylogenetic Analysis Based on Genomes
Authors: Xiao-Su Chen, Zu-Guo Yu and Juan Zheng
Whole genome sequences are generally accepted as excellent tools for studying evolutionary relationships. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignments cannot be directly applied to whole-genome comparison and phylogenomic studies. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. The “distances” used in these alignment-free methods are not proper distance metrics in the strict mathematical sense. In this study, we first review them in a more general framework: dissimilarity. Then we propose some new dissimilarities for phylogenetic analysis. Finally, three genome datasets are employed to evaluate these dissimilarities from a biological point of view.
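To make the idea of an alignment-free dissimilarity concrete, here is a minimal sketch that compares two sequences through their k-mer count vectors. It is an illustration only and not one of the dissimilarities proposed in the article; the function names and the default k = 3 are assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_dissimilarity(seq_a, seq_b, k=3):
    """Return 1 - cosine similarity of the two k-mer count vectors."""
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    # Only k-mers present in both sequences contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each sequence must contain at least k characters")
    return 1.0 - dot / (norm_a * norm_b)
```

A quantity like this is symmetric and is zero for identical count vectors, but it need not satisfy the triangle inequality, which is why "dissimilarity" is a more accurate label than "distance".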
-
-
-
Influenza Pandemic Early Warning Research on HA/NA Protein Sequences
Authors: Jie Gao, Ling Zhang and Peixuan Jin
Using the CGR-walk model, this paper studies influenza virus HA/NA protein sequences from 1914 to 2012 and identifies multiple early-warning signal values for influenza pandemic outbreaks. The variances and lag-2 autocorrelation coefficients of protein sequences, obtained according to the detailed HP model, are significantly higher for the epidemic outbreak years and the two years before them than for the preceding adjacent years, whereas this feature is absent in non-epidemic years.
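The two statistics named in the abstract can be sketched for any numeric series as follows. How a protein sequence is turned into such a series (the CGR-walk and HP-model steps) is not described in the abstract, so the input here is simply assumed to be a list of numbers, and the function names are hypothetical.

```python
def variance(xs):
    """Population variance of a numeric series."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def autocorrelation(xs, lag):
    """Lag-`lag` autocorrelation coefficient of a numeric series."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Sum of products of deviations that are `lag` steps apart.
    num = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return num / denom
```

Calling `autocorrelation(series, 2)` gives the lag-2 coefficient; rising variance and rising autocorrelation are the kind of signals the abstract refers to.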
-
-
-
Optimizing I/O Cost and Managing Memory for Composition Vector Method Based on Correlation Matrix Calculation in Bioinformatics
Authors: Anaththa P.D. Krishnajith, Wayne Kelly and Yu-Chu Tian
The generation of a correlation matrix for a set of genomic sequences is a common requirement in many bioinformatics problems such as phylogenetic analysis. Each sequence may be millions of bases long and there may be thousands of such sequences which we wish to compare, so not all sequences may fit into main memory at the same time. Each sequence needs to be compared with every other sequence, so we will generally need to page some sequences in and out more than once. In order to minimize execution time we need to minimize this I/O. This paper develops an approach for faster and more scalable computing of large correlation matrices by maximally exploiting available memory and reducing the number of I/O operations. The approach is scalable in the sense that the same algorithms can be executed on different computing platforms with different amounts of memory and can be applied to different bioinformatics problems with different correlation matrix sizes. The significant performance improvement of the approach over previous work is demonstrated through benchmark examples.
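The trade-off between memory and I/O described here can be illustrated with a simple blocked schedule. This is a generic sketch and not the article's algorithm: it assumes two blocks of sequences fit in memory at once, that one "row" block stays resident while later "column" blocks are streamed past it, and the function names are hypothetical.

```python
def blocked_pairs(n_items, block_size):
    """Yield every index pair (i, j) with i < j, grouped block by block."""
    blocks = [range(start, min(start + block_size, n_items))
              for start in range(0, n_items, block_size)]
    for bi, rows in enumerate(blocks):
        # Pair the resident row block with itself and with each later block.
        for cols in blocks[bi:]:
            for i in rows:
                for j in cols:
                    if i < j:
                        yield i, j


def count_block_loads(n_items, block_size):
    """Block reads needed when each row block is loaded once and every
    later column block is re-read once per row block."""
    n_blocks = -(-n_items // block_size)  # ceiling division
    return sum(1 + (n_blocks - bi - 1) for bi in range(n_blocks))
```

Under these assumptions, larger blocks mean fewer reads: 10 sequences need 55 block loads with blocks of one sequence but only 3 with blocks of five, while the set of compared pairs stays the same.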
-
-
-
Robustness of Link-Prediction Algorithm Based on Similarity and Application to Biological Networks
Authors: Liang Wang, Ke Hu and Yi Tang
Many algorithms have been proposed to predict missing links in a variety of real networks. Emphasis is put on raising both the accuracy and the efficiency of these algorithms. However, less attention is paid to their robustness against either noise or irrationality of a link, which exists in almost all real networks. In this paper, we investigate the robustness of several typical node-similarity-based algorithms and find that these algorithms are sensitive to the strength of noise. Moreover, we find that robustness also depends on the structural properties of networks, especially on network efficiency, clustering coefficient and average degree. In addition, we attempt to enhance robustness by using a link weighting method to transform an unweighted network into a weighted one and then using the weights of links to characterize their reliability. The results show that a proper link weighting scheme can significantly enhance both the robustness and the accuracy of these algorithms in biological networks.
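As an illustration of what a node-similarity-based link predictor computes, the sketch below scores every unconnected node pair by its number of common neighbors. Common neighbors is one widely used similarity index; the abstract does not say which indices were studied, so this choice and the function names are assumptions.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbor_scores(edges):
    """Score each unconnected node pair by its number of shared neighbors."""
    adj = build_adjacency(edges)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:  # only pairs without an observed link are candidates
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores
```

Pairs with higher scores are ranked as more likely missing links. Spurious edges in the input change the neighbor sets directly, which gives an intuition for why such scores are sensitive to noise.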
-
-
-
Secondary Structure Element Alignment Kernel Method for Prediction of Protein Structural Classes
Authors: Guo-Sheng Han, Zu-Guo Yu and Vo Anh
In this paper, we aim to predict protein structural classes for low-homology data sets based on predicted secondary structures. We propose a new and simple kernel method, named SSEAKSVM, to predict protein structural classes. The secondary structures of all protein sequences are obtained using the tool PSIPRED, and then a linear kernel based on secondary structure element alignment scores is constructed for training a support vector machine classifier without parameter adjustment. Our method SSEAKSVM was evaluated on two low-homology datasets, 25PDB and 1189, with sequence homology of 25% and 40%, respectively. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies on these two data sets are 86.3% and 84.5%, respectively, which are higher than those obtained by other existing methods. In particular, our method achieves higher accuracies (88.1% and 88.5%) for differentiating the α + β class and the α/β class compared to other methods. This suggests that our method is valuable for predicting protein structural classes, particularly for low-homology protein sequences. The source code of the method in this paper can be downloaded at http://math.xtu.edu.cn/myphp/math/research/source/SSEAK_source_code.rar.
-
-
-
Semi-Supervised Transductive Hot Spot Predictor Working on Multiple Assumptions
Authors: Jim Jing-Yan Wang, Islam Khaleel Almasri, Yuexiang Shi and Xin Gao
Protein-protein interactions are critically dependent on just a few residues (“hot spots”) at the interfaces. Hot spots make a dominant contribution to the binding free energy, and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental effort, there is a need for accurate and reliable computational hot spot prediction methods. Compared to supervised hot spot prediction algorithms, semi-supervised prediction methods can take into consideration both the labeled and unlabeled residues in the dataset during the prediction procedure. The transductive support vector machine has been utilized for this task and demonstrated better prediction performance. To the best of our knowledge, however, none of the transductive semi-supervised algorithms takes all three semi-supervised assumptions, i.e., the smoothness, cluster and manifold assumptions, into account together during learning. In this paper, we propose a novel semi-supervised method for hot spot residue prediction that considers all three semi-supervised assumptions using nonlinear models. Our algorithm, IterPropMCS, works in an iterative manner. In each iteration, the algorithm first propagates the labels of the labeled residues to the unlabeled ones, along the shortest path between them on a graph, assuming that they lie on a nonlinear manifold. Then it selects the most confident residues as the labeled ones for the next iteration, according to the cluster and smoothness criteria, which are implemented by a nonlinear density estimator. Experiments on a benchmark dataset, using protein structure-based features, demonstrate that our approach is effective in predicting hot spots and compares favorably to other available methods. The results also show that our method outperforms the state-of-the-art transductive learning methods.
-
-
-
RNA Secondary Structure Prediction Algorithms Including Pseudoknots
Authors: Dolly Sharma, Shailendra Singh and Trilok Chand
The pseudoknot is an important motif in RNA secondary structure. Early research on RNA secondary structure prediction ignored pseudoknots, but the pseudoknot is now a focus of RNA secondary structure prediction. Several kinds of algorithms, such as dynamic programming, comparative algorithms, heuristic algorithms and formal grammar algorithms, have so far been used for pseudoknot prediction, but the prediction of arbitrary pseudoknots is still an open problem. Also, there is no standard categorization of pseudoknot types. This article provides a brief description and comparison of various algorithms used in pseudoknot prediction, along with an overview of various forms of pseudoknots and their representations.
-
-
-
Prediction of Vanillin and Glutamate Productions in Yeast Using a Hybrid of Continuous Bees Algorithm and Flux Balance Analysis (CBAFBA)
Most foods and beverages contain artificial flavor compounds. Creating artificial flavors is not easy and is hardly ever completely effective. In this paper, we introduce an in silico method for optimizing microbial strains for flavor compound synthesis. Several existing algorithms, such as the Genetic Algorithm, the Evolutionary Algorithm, the OptKnock tool and other related techniques, are widely used to predict the yield of a target compound by suggesting gene knockouts. These algorithms and tools can predict the production yield without resorting to trial-and-error gene deletions. Without in silico methods, direct experimental methods are costly and time consuming: chemicals are expensive, and not all flavorists can afford them. However, the main limitations of previous algorithms are that they fail to optimize the yield prediction and suggest unrealistic flux distributions. Therefore, this paper proposes a hybrid of the continuous Bees algorithm and Flux Balance Analysis. The target compounds in this research are vanillin and glutamate. The aim of the study is to identify optimum gene knockouts. The results reported are the predicted yield and growth rate values of the model. The predictive results show that the improvement in yield may help in food flavoring.
-
-
-
A Comprehensive View on Metabolic Pathway Analysis Methodologies
Authors: Namrata Tomar and Rajat K. De
Advances in ‘omics’ high-throughput technologies have greatly increased the amount and quality of available biological data. This has fostered the development of bioinformatics methods to interpret these data. In this regard, characterization of cellular metabolism is a useful task for understanding the phenotypic capabilities of an organism. Several in silico approaches have emerged for the analysis of metabolic pathways, including structural and stoichiometric analysis, metabolic flux analysis, metabolic control analysis, and several kinetic-modeling-based analyses. The present article provides a comprehensive survey of existing metabolic pathway analysis methodologies.
-
-
-
Select Cluster Features for Better Layered Protein Function Prediction
Authors: Wei Zhu, Jingyu Hou and Yi-Ping Phoebe Chen
Background: High-throughput protein-protein interaction (PPI) datasets make it possible to exploit the interaction relationships between proteins to predict functions for proteins that are still functionally unannotated. Although the clustering-based approach has proved to be an effective method for protein function prediction in some cases, in most cases the prediction results are unsatisfactory. How to define a better similarity/distance measure between proteins, how to choose proper clustering methods and how to select feature functions from clusters for better predictions remain challenges for improving the clustering-based prediction approach. On the other hand, predicting functions at different functional layers for unannotated proteins, to provide more meaningful information about protein functions, has rarely been investigated by existing algorithms. Results: In this paper, we propose algorithms that address the selection of feature functions from clusters to increase the prediction quality of clustering-based prediction methods. Moreover, clustering-based protein function prediction methods can effectively predict protein functions at different functional layers when incorporating our algorithms for cluster feature function selection. Evaluations on real PPI datasets demonstrated the effectiveness of the proposed algorithms. Conclusion: The proposed algorithms for cluster feature function selection reasonably reflect the intrinsic relationships among proteins. The multi-layered function prediction supported by our proposed algorithms provides more meaningful information for better understanding protein functions.
-
-
-
Trends in Genome Compression
Authors: Sebastian Wandelt, Marc Bux and Ulf Leser
Technological advancements in high-throughput sequencing have led to a tremendous increase in the amount of genomic data produced. With the cost down to 2,000 USD for a single human genome, sequencing dozens of individuals is feasible even for smaller projects or organizations. However, generating the sequence is only one issue; another is storing, managing, and analyzing it. These tasks become more and more challenging due to the sheer size of the data sets and are increasingly considered to be the major bottlenecks in larger genome projects. One possible countermeasure is to compress the data; compression reduces costs by requiring less hard disk storage and less bandwidth if data is shipped to large compute clusters for parallel analysis. Accordingly, sequence compression has recently attracted much interest in the scientific community. In this paper, we explain the different basic techniques for sequence compression, point to distinctions between different compression tasks (e.g., genome compression versus read compression), and present a comparison of current approaches and tools. To further stimulate progress in genome compression research, we also identify key challenges for future systems.
-
-
-
Molecular Modeling and Assessing the Catalytic Activity of Glucose Dehydrogenase of Gluconobacter suboxydans with a New Approach for Power Generation in a Microbial Fuel Cell
Authors: R. Navanietha Krishnaraj, Saravanan Chandran, Parimal Pal and Sheela Berchmans
Microbial fuel cells (MFCs) are electrochemical energy systems that transform organic substrates for bioelectricity generation using the immense catalytic potential of the electrigens. Quinoprotein glucose dehydrogenase of Gluconobacter plays a key role in the oxidation of glucose in MFCs. The structure of the quinoprotein glucose dehydrogenase of Gluconobacter suboxydans is still unexplored. Herein, the modeled structure of quinoprotein glucose dehydrogenase of Gluconobacter suboxydans is reported. The modeled structure is validated with Ramachandran plot analysis. The active sites of the modeled protein are identified using Q-SiteFinder. The catalytic activity of the modeled glucose dehydrogenase of G. suboxydans is analyzed based on its binding energy with the substrate. The experimental results show that the modeled structure has excellent stereochemical and electrocatalytic activity. The good electrocatalytic activity of glucose dehydrogenase offers higher electrogenic activity to Gluconobacter for its use as electrigens in MFCs.
-
-
-
Review of Protein Subcellular Localization Prediction
Authors: Zhen Wang, Quan Zou, Yi Jiang, Ying Ju and Xiangxiang Zeng
Protein subcellular localization is closely related to protein function. A protein can work only in specific subcellular positions, so protein localization in a cell is very important in studies on cytobiology, proteomics, and drug design. Protein subcellular localization prediction based on machine learning is timely and has generated great interest in the field of bioinformatics. This paper reviews the research status of this problem in recent years from the following four aspects: protein dataset construction, feature extraction from protein sequences, machine learning algorithms, and web server construction. Finally, we analyze the challenges in predicting protein subcellular localization and identify possible future research trends.
-
-
-
Prediction of miRNA in Human MHC that Encodes Different Immunological Functions Using Support Vector Machines
Authors: Archana Prabahar and Jeyakumar Natarajan
MicroRNAs (miRNAs) are short non-coding RNAs known to be involved in gene regulatory functions in humans. The major histocompatibility complex (MHC), located on the short arm of chromosome 6, remains one of the most important regions associated with several human diseases. The complex spans ~4 Mb and covers >120 expressed genes. Gene expression at the transcriptional and post-transcriptional levels is modulated by miRNA in conjunction with sequence polymorphism and epigenetic factors. In this study, we aim to predict miRNAs responsible for different immunological functions and disorders in the MHC region. Sequential and structural features of miRNAs were used for the classification of miRNA and other non-coding RNA data. A support vector machine (SVM) classifier was used for prediction and evaluated by the jackknife validation technique. Overall accuracy was found to be 97.56% using leave-one-out cross-validation. These experimental results confirm that our classification method predicts immune-related miRNAs with high accuracy.
-
-
-
A Partial Least Squares Algorithm for Microarray Data Analysis Using the VIP Statistic for Gene Selection and Binary Classification
An important application of microarray technology is the assignment of new subjects to known clinical groups (class prediction), but the huge number of screened genes and the small number of samples make this task difficult. To overcome this problem, the usual approach has been to extract a small subset of significant genes (gene selection) or to use the whole set of genes to build latent components (dimension reduction), and then to apply a standard multivariate classification procedure. Alternatively, both aims (gene selection and class prediction) can be achieved at the same time by using methods based on Partial Least Squares (PLS), as reported in the present work. We present an iterative PLS algorithm based on backward variable elimination through the “Variable Influence on Projection” (VIP) statistic, which finds an optimal PLS model through training and test sets. It simultaneously reduces the number of selected genes by an iterative procedure and finds the best number of PLS factors to reach optimal classification performance. It is a simple approach that uses only one mathematical method, maintains the identification of discriminatory genes, and builds an optimal predictive model with fast computation. The algorithm runs as a module of the SIMFIT statistical package, where the optimal model and datasets can be re-run to further interpret the system through additional PLS options, such as scores and loadings plots, or class assignment of new samples. The proposed algorithm was tested under different scenarios occurring in microarray analysis using simulated data. The results are also compared against different classification methods such as KNN, PAM, SVM, RF and standard PLS.
