Current Bioinformatics - Volume 9, Issue 1, 2014
A 2-Layer Web Server for Enzyme and Multifunctional Enzyme Identification
Authors: Leyi Wei, Guilin Li, Xing Gao, Weicheng Chen, Ping Xuan and Minghong Liao
With the explosive growth of protein sequence databanks, there is an increasing need to annotate newly discovered enzyme sequences. Given a protein sequence, how can we identify whether it is an enzyme or a non-enzyme? If it is an enzyme, which main functional class does it belong to? Since experimental methods are both time-consuming and expensive, it is highly desirable to develop an in silico method to address these problems. In this paper, two methods are combined into a 2-layer predictor: the 1st-layer prediction engine extracts 188-D features based on the composition and physicochemical properties of the protein, and separately 20-D features from the position-specific scoring matrix (PSSM), to determine whether a query protein is an enzyme or a non-enzyme; the 2nd-layer prediction engine extracts 20-D features from the PSSM and is designed to predict the main family class of the enzyme. In our experiments, multifunctional enzymes, owing to their specific characteristics, are treated as the 7th category of enzyme. The accuracy of the 1st-layer prediction reaches 98.99% (188-D) and 98.25% (20-D) using 10-fold cross-validation, and for the 2nd-layer prediction, accuracies of 97.12% with Random Forest and 98.39% with IB1 are obtained. These high accuracies indicate that the method could be an effective and promising high-throughput tool for enzyme research. Furthermore, we developed an online web server, which can be accessed via http://datamining.xmu.edu.cn:8080/PredictE/.
Bioinformatics: A Molecular Microbiologist’s Perspective
Authors: Sílvia A. Sousa, Joana R. Feliciano, Andre M. Grilo and Jorge H. Leitao
Research in the biological sciences, and particularly in molecular microbiology, now generates vast amounts of data. The development of faster and cheaper sequencing methods has contributed substantially to this volume of data, which in turn requires fast and reliable tools for its analysis. In recent years, many easy-to-use, reliable and powerful bioinformatics tools have been developed. In this work, we review the fundamentals of some of these tools and describe how we have been using them to analyze data from our research, which aims to identify and characterize virulence factors and determinants of the human opportunistic pathogens of the Burkholderia cepacia complex. The examples given illustrate the user-friendliness of these tools and their power both in analyzing information and in guiding future experimental work.
A Review on Missing Value Imputation Algorithms for Microarray Gene Expression Data
Authors: Kohbalan Moorthy, Mohd Saberi Mohamad and Safaai Deris
Many bioinformatics analytical tools, especially those for cancer classification and prediction, require complete data matrices. Missing values in gene expression studies significantly influence the interpretation of the final data. To most analysts' dismay, this is a common problem, so missing value imputation algorithms have to be developed and/or refined to address it. This paper reviews preferred and available missing value imputation methods for gene expression data. The focus is on the ability of the algorithms to use local or global data correlation to estimate the missing values. The approaches are categorized as global, local, hybrid, and knowledge-assisted. The methods presented are accompanied by a suitable performance evaluation. The aim of this review is to highlight possible improvements to existing research techniques, rather than to recommend new algorithms with the same functional aim.
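To make the idea of a "local approach" concrete, here is a minimal sketch of k-nearest-neighbour imputation, a well-known local method. The abstract does not name specific algorithms, so the choice of KNN, the function name knn_impute, and the use of None for missing entries are assumptions made only for illustration.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k most similar rows (genes)."""

    def distance(a, b):
        # Root-mean-square difference over the columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = [list(row) for row in matrix]  # leave the input untouched
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Neighbours must have the missing column observed.
            candidates = [
                (distance(row, other), other[j])
                for r, other in enumerate(matrix)
                if r != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                result[i][j] = sum(v for _, v in nearest) / len(nearest)
    return result


# Hypothetical toy expression matrix: rows are genes, columns are samples.
expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]
imputed = knn_impute(expression, k=2)
```

A global approach would instead estimate the missing entry from structure shared by the whole matrix rather than from a handful of similar rows.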
Medherb: An Interactive Bioinformatics Database and Analysis Resource for Medicinally Important Herbs
Authors: Muhammad Ibrahim Rajoka, Sobia Idrees, Sana Khalid and Beenish Ehsan
Ethnopharmacological findings are spread over several databases but are not well connected to other biomedical databases. Consequently, the utility of these sources as knowledge resources is limited. Herbal medicines have long been used to treat different ailments and have attracted the attention of scientists and patients because of their easy availability and low cost. Research is being performed on the active constituents, protein availability, and healing benefits of medicinal herbs, and on their implications for the treatment of multidrug-resistant pathogens. Thus, cataloging information on medicinal herbs along with their DNA/protein sequences has become a fundamental step in the development of new medicinal drugs. However, assembling this information requires proper storage, management and analysis tools. The MedHerb database provides comprehensive information on the medicinal properties of herbs through a web interface, with quick, one-click access to detailed information on medicinal herbs, genes, proteins, plant species, statistics, and published literature. Although several medicinal plants have been tested against different diseases, detailed information on the above aspects is available for only 9 medicinal plant species. Studies are under way to explore other plant species and assemble all their characteristics, which will be added to the database when available. The availability of primers is another unique feature of the database, assisting in Polymerase Chain Reaction (PCR) applications. The database aims to expand by adding more plant species, expressed sequence tags, information on active constituents, and new tools to assist researchers from different backgrounds.
The database is available at http://medicinalherbs.comule.com/.
Predicting Recombination Hotspots in Yeast Based on DNA Sequence and Chromatin Structure
Authors: Bingjie Zhang and Guoqing Liu
Meiotic recombination does not occur evenly across the genome, but instead occurs at relatively high frequencies in some genomic regions (hotspots) and relatively low frequencies in others (coldspots). Recombination depends not only on sequence features, but also on chromatin structure. Identifying and characterizing hotspots and coldspots is important because this information sheds light on the mechanism of recombination and on genome evolution. In this study, we analyzed the correlation between recombination and nucleosome occupancy, and present a model for predicting recombination hotspots in yeast based on sequence and nucleosome occupancy. Our results show that regions with high nucleosome occupancy have a high recombination rate in the yeast genome, and that an improved prediction accuracy of 81.6% is achieved when nucleosome occupancy is used as an additional feature.
SMOTER, A Structured Motif Finder Based on an Exhaustive Tree-Based Algorithm
Authors: Siavash Sheikhizadeh and Samin Hosseini
In this paper, an exhaustive algorithm for extracting structured motifs is presented. A structured motif is defined as an ordered set of highly conserved, over-represented patterns that occur near each other in a set of DNA sequences. The presented algorithm is based on an innovative data structure called the l-mer trie. In contrast to other existing motif finders, this algorithm offers more flexibility: a range can be specified for the length of the single patterns, for their substitution rates, and for the spacing between them. The possibility of defining a minimum bound for substitution rates saves considerable time and space when searching for weak motifs that occur with many mutations. The efficiency of the algorithm has been verified on artificial sequences as well as on real DNA sequences of some plant viruses. The results have been compared with those achieved by RISO, another tree-based algorithm, which is claimed to have notable time and space gains over the best known exact algorithms.
Reference Alignment Based Methods for Quality Evaluation of Multiple Sequence Alignment - A Survey
Authors: Pawel Wojciechowski, Piotr Formanowicz and Jacek Blazewicz
The rapid development of new heuristic methods for the multiple sequence alignment (MSA) problem calls for benchmarks that evaluate the quality of these methods. One of the most reliable benchmarks compares an alignment created by the tested method with a reference alignment stored in a database (usually created on the basis of additional knowledge, such as tertiary structure). This paper surveys methods for evaluating the quality of multiple sequence alignments based on databases of reference alignments. Several classes of reference alignments are defined, and applicable quality measures for these classes are described. We also present applications that can be used to evaluate the quality of a tested alignment. The survey includes our remarks and observations concerning reference alignments, the quality measures, and the applications for quality evaluation. Based on our experience, we propose a new nomenclature for quality measures. To clarify the described problems, two example calculations of various measures for several classes of reference alignments are presented in the Appendix.
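As an illustration of a reference-based quality measure, the sketch below computes a sum-of-pairs style score: the fraction of residue pairs aligned in the reference that are also aligned in the tested alignment. The abstract does not list the measures it surveys, so this particular measure and the function names aligned_pairs and sum_of_pairs_score are assumptions chosen for illustration.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    Each residue is identified by (sequence index, position in the
    ungapped sequence), so the result does not depend on column order.
    """
    positions = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for s, row in enumerate(alignment):
            if row[col] != gap:
                residues.append((s, positions[s]))
                positions[s] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


# Hypothetical two-sequence example (rows are gapped sequences).
reference = ["AC-G", "ACTG"]
tested = ["ACG-", "ACTG"]
score = sum_of_pairs_score(tested, reference)
```

In this toy case the tested alignment reproduces two of the three reference pairs, so the score is 2/3; an alignment identical to the reference scores 1.0.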
Improved Prediction of Protein Crystallization, Purification and Production Propensity Using Hybrid Sequence Representation
Authors: Jianzhao Gao, Gang Hu, Zhonghua Wu, Jishou Ruan, Shiyi Shen, Michelle Hanlon and Kui Wang
Production of high-quality crystals is one of the main bottlenecks in X-ray crystallography-based protein structure determination. In this paper we introduce PPCinter, a novel method to predict the propensity for production of diffraction-quality crystals, production of crystals, purification, and production of protein material. PPCinter uses not only intra-molecular factors but also inter-molecular factors. Our method outperforms several current crystallization predictors, obtaining an overall accuracy of 57.5% and an average MCC of 0.39. Our method also reveals several factors that influence the success of the crystallization process, including the unfold-based index, energy-based, solvent accessibility, and hydrophobicity-based indices, amino acid composition, the isoelectric point, and disorder-based features. PPCinter could provide useful input for the target selection procedures used by structural genomics centers.
Parameter Estimation by Using an Improved Bee Memory Differential Evolution Algorithm (IBMDE) to Simulate Biochemical Pathways
Estimating the essential parameters of a metabolic pathway with a mathematical model is a significant step in Systems Biology. However, the estimation process often faces numerous obstacles: the number of unknown parameters grows, the data are noisy, the search gets trapped in local minima, or poor solutions are explored repeatedly. This study therefore proposes an improved Bee Memory Differential Evolution algorithm (IBMDE), which combines the Differential Evolution algorithm (DE), the Kalman Filter (KF), the Artificial Bee Colony algorithm (ABC), and a memory feature to address these problems. The algorithm was applied to the glycerol and pyruvate synthesis pathways. IBMDE successfully generated estimated optimal kinetic parameter values with a noticeable reduction in errors (81.36% and 99.46%, respectively) and faster convergence times (6.19% and 15.72%, respectively) compared with DE, the Genetic Algorithm (GA), the Nelder-Mead method (NM), and Simulated Annealing (SA). Most importantly, the results indicated that the kinetic parameters produced by IBMDE enhanced the production of the desired metabolites relative to the other estimation algorithms. The results also demonstrated the reliability of IBMDE as an estimation algorithm in terms of lower error.
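For readers unfamiliar with the DE component, the following is a minimal sketch of plain Differential Evolution (the classic rand/1/bin variant) fitting two parameters of a toy model by minimizing squared error. It is not IBMDE: the Kalman Filter, Artificial Bee Colony, and memory components are omitted, and the toy linear model, the function names, and the default settings (f, cr, population size) are assumptions for illustration only.

```python
import random


def differential_evolution(cost, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimize cost(params) within bounds using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            # Greedy selection: keep the trial only if it is no worse.
            trial_score = cost(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical stand-in for a kinetic model: y = a * x + b with a = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def squared_error(params):
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


best_params, best_error = differential_evolution(
    squared_error, bounds=[(-5.0, 5.0), (-5.0, 5.0)]
)
```

In a real pathway model the cost function would compare simulated metabolite concentrations with measured ones; the greedy selection step is what guarantees the best error never increases from one generation to the next.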
Hybrid Approach Using SVM and MM2 in Splice Site Junction Identification
Authors: Srabanti Maji and Deepak Garg
Prediction of coding regions from genomic DNA sequence is the first step in gene identification. In eukaryotic organisms, the gene structure consists of the promoter, introns, start codon, exons, stop codon, and so on. In the prediction of splice sites, which are the boundaries between exons and introns, the accuracy is lower than 90% even though the sequences adjacent to the splice sites are highly conserved. Therefore, the algorithms used for splice site identification must be improved in order to raise prediction accuracy. This article proposes an efficient method, MM2F-SVM, which consists of three stages: an initial stage, in which a second-order Markov Model (MM2) is used for feature extraction; an intermediate stage, in which principal feature analysis (PFA) is used for feature selection; and a final stage, in which a support vector machine (SVM) with a Gaussian kernel performs the classification. When compared with other existing splice site prediction programs, the proposed MM2F-SVM model showed superior performance.
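The first stage can be pictured with a small sketch of a second-order Markov model over DNA, in which each base is conditioned on the two bases before it. The abstract does not specify how MM2 probabilities are turned into features, so the per-position encoding, the pseudocount smoothing, and the names train_mm2 and mm2_features are assumptions for illustration; the PFA and SVM stages are not shown.

```python
from collections import defaultdict


def train_mm2(sequences, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(base | previous two bases) from training sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(2, len(seq)):
            counts[seq[i - 2:i]][seq[i]] += 1.0

    def prob(context, base):
        # Additive smoothing so unseen transitions keep a non-zero probability.
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(alphabet)
        return (row.get(base, 0.0) + pseudocount) / total

    return prob


def mm2_features(seq, prob):
    """Encode a sequence as its per-position conditional probabilities."""
    return [prob(seq[i - 2:i], seq[i]) for i in range(2, len(seq))]


# Hypothetical training windows around known splice sites.
model = train_mm2(["AAGT", "AAGT"])
features = mm2_features("AAGT", model)
```

A window of length n yields n - 2 features under this encoding; a feature-selection step would then keep the most informative positions before classification.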