Current Bioinformatics - Volume 9, Issue 1, 2014

- Editorial
  
  By Alessandro Giuliani
  
  https://doi.org/10.2174/157489360901140120160332

- A 2-Layer Web Server for Enzyme and Multifunctional Enzyme Identification
  
  Authors: Leyi Wei, Guilin Li, Xing Gao, Weicheng Chen, Ping Xuan and Minghong Liao
  
  https://doi.org/10.2174/1574893608999140109121259
  
  With the explosive growth of protein sequence databanks, there is an increasing need to annotate the many newly discovered enzyme sequences. Given a protein sequence, two questions arise: is it an enzyme or a non-enzyme, and, if it is an enzyme, to which main functional class does it belong? Since experimental methods are both time-consuming and expensive, it is highly desirable to develop an in silico method to address these problems. In this paper, two effective methods are combined into a 2-layer predictor. The 1st layer prediction engine extracts 188-D features based on the composition and physicochemical properties of the protein, and 20-D features from a position-specific scoring matrix (PSSM), to determine whether a query protein is an enzyme or a non-enzyme. The 2nd layer prediction engine extracts 20-D features from the PSSM and is designed to predict the main family class of the enzyme. In our experiment, multifunctional enzymes, owing to their specific characteristics, are treated as the 7th category of enzyme. The accuracy of the 1st layer prediction reaches 98.99% (188-D) and 98.25% (20-D) under 10-fold cross-validation; for the 2nd layer prediction, accuracies of 97.12% with Random Forest and 98.39% with IB1 are obtained. These high accuracies indicate that the method could be an effective and promising high-throughput tool for enzyme research. Furthermore, we developed an online web server, which can be accessed via http://datamining.xmu.edu.cn:8080/PredictE/.
  

- Bioinformatics: A Molecular Microbiologist’s Perspective
  
  Authors: Sílvia A. Sousa, Joana R. Feliciano, Andre M. Grilo and Jorge H. Leitao
  
  https://doi.org/10.2174/1574893608999140109121908
  
  Research activities in the biological sciences, and particularly in molecular microbiology, nowadays generate vast amounts of data. The development of faster and cheaper sequencing methods has definitely contributed to this volume of data, which in turn requires fast and reliable tools for its analysis. In recent years, many easy-to-use, reliable and powerful bioinformatics tools have been developed. In this work, we review the fundamentals of some of these tools and describe how we have been using them to analyze data from our research, which aims at the identification and characterization of virulence factors and determinants of the human opportunistic pathogens of the Burkholderia cepacia complex. The examples given illustrate the user-friendliness of these tools and their power both in analyzing information and in guiding future experimental work.
  

- A Review on Missing Value Imputation Algorithms for Microarray Gene Expression Data
  
  Authors: Kohbalan Moorthy, Mohd Saberi Mohamad and Safaai Deris
  
  https://doi.org/10.2174/1574893608999140109120957
  
  Many bioinformatics analytical tools, especially those for cancer classification and prediction, require a complete data matrix. Missing values in gene expression studies significantly influence the interpretation of the final data. To most analysts’ dismay, missing values have become a common problem, and relevant missing value imputation algorithms therefore have to be developed and/or refined. This paper reviews the preferred and available methods for imputing missing values in gene expression data. The focus is on the ability of the algorithms to exploit local or global data correlation when estimating the missing values. The algorithms are categorized into global, local, hybrid, and knowledge-assisted approaches. The methods presented are accompanied by a suitable performance evaluation. The aim of this review is to highlight possible improvements to existing techniques, rather than to recommend new algorithms with the same functional aim.
  

- Medherb: An Interactive Bioinformatics Database and Analysis Resource for Medicinally Important Herbs
  
  Authors: Muhammad Ibrahim Rajoka, Sobia Idrees, Sana Khalid and Beenish Ehsan
  
  https://doi.org/10.2174/1574893608999140109122052
  
  Ethnopharmacological findings are spread over several databases but are not well connected to other biomedical databases. Consequently, the utility of these sources as knowledge resources is limited. Herbal medicines have long been used for the treatment of different ailments and have attracted the attention of scientists and patients because they are easily available and affordable. Research is being performed on the active constituents, protein availability, and healing benefits of medicinal herbs, and on their use in the treatment of multidrug-resistant pathogens. Thus, cataloging information on medicinal herbs along with their DNA/protein sequences has become a fundamental step in the development of new medicinal drugs. However, assembling this information requires proper tools for storage, management and analysis. The MedHerb database provides comprehensive information on the medicinal properties of herbs through a stylish web interface. It gives quick access to medicinal herbs, genes, proteins, plant species, statistical vision, and published literature, with detailed information on each aspect available in a single click. Although several medicinal plants have been tested against different diseases, detailed information on the above aspects is available for only 9 medicinal plant species. Studies are under way to explore other plant species and assemble all their characteristics, which will be added to the database when available. The availability of primers is another unique feature of the database, assisting in Polymerase Chain Reaction (PCR) applications. The database is intended to expand with new features such as more plant species, expressed sequence tags, information on active constituents, and new tools to assist researchers from different backgrounds. The database is available at http://medicinalherbs.comule.com/.
  

- Predicting Recombination Hotspots in Yeast Based on DNA Sequence and Chromatin Structure
  
  Authors: Bingjie Zhang and Guoqing Liu
  
  https://doi.org/10.2174/1574893608999140109121444
  
  Meiotic recombination does not occur evenly across the genome, but instead occurs at relatively high frequencies in some genomic regions (hotspots) and at relatively low frequencies in others (coldspots). Recombination depends not only on sequence features, but also on chromatin structure. Identifying and characterizing hotspots and coldspots is of considerable significance, as information about them would shed light on the mechanisms of recombination and genome evolution. In this study, we analyzed the correlation between recombination and nucleosome occupancy, and presented a model for predicting recombination hotspots in yeast based on sequence and nucleosome occupancy. Our results show that regions with high nucleosome occupancy have a high recombination rate in the yeast genome, and that an improved prediction accuracy of 81.6% is achieved when nucleosome occupancy is used as an additional feature.
  

- SMOTER, A Structured Motif Finder Based on an Exhaustive Tree-Based Algorithm
  
  Authors: Siavash Sheikhizadeh and Samin Hosseini
  
  https://doi.org/10.2174/1574893608999140109122231
  
  In this paper, an exhaustive algorithm for extracting structured motifs is presented. A structured motif is defined as an ordered set of highly conserved, over-represented patterns that occur near each other in a set of DNA sequences. The presented algorithm is based on an innovative data structure called the l-mer trie. Unlike other existing motif finders, this algorithm offers more flexibility: the user can specify a range for the length of the single patterns, for their substitution rates, and for the spacing between them. The possibility of defining a minimum bound for substitution rates saves considerable time and space when searching for weak motifs that occur with many mutations. The efficiency of the algorithm has been verified on artificial sequences as well as on real DNA sequences of some plant viruses. The results have been compared with those achieved by RISO, another tree-based algorithm, which is claimed to have notable time and space gains over the best known exact algorithms.
  

- Reference Alignment Based Methods for Quality Evaluation of Multiple Sequence Alignment - A Survey
  
  Authors: Pawel Wojciechowski, Piotr Formanowicz and Jacek Blazewicz
  
  https://doi.org/10.2174/15748936113080990005
  
  The fast development of new heuristic methods for solving the multiple sequence alignment (MSA) problem calls for benchmarks to evaluate the quality of these methods. One of the most reliable benchmarks relies on comparing an alignment created by the tested method with a reference alignment stored in a database (the reference is usually created on the basis of additional knowledge, such as tertiary structure). This paper surveys methods for evaluating the quality of multiple sequence alignments based on databases of reference alignments. Several classes of reference alignments are defined, and the quality measures applicable to these classes are described. We also present applications that can be used to evaluate the quality of a tested alignment. The survey also contains our remarks and observations concerning reference alignments, the quality measures, and the applications for quality evaluation. Based on our experience, we propose a new nomenclature for quality measures. To clarify the problems described, two example calculations of various measures for several classes of reference alignments are presented in the Appendix.
  

- Improved Prediction of Protein Crystallization, Purification and Production Propensity Using Hybrid Sequence Representation
  
  Authors: Jianzhao Gao, Gang Hu, Zhonghua Wu, Jishou Ruan, Shiyi Shen, Michelle Hanlon and Kui Wang
  
  https://doi.org/10.2174/15748936113080990006
  
  Production of high-quality crystals is one of the main bottlenecks in X-ray crystallography-based protein structure determination. In this paper we introduce PPCinter, a novel method to predict the propensity for production of diffraction-quality crystals, production of crystals, purification and production of protein material. PPCinter utilizes not only intra-molecular factors, but considers inter-molecular factors as well. Our method outperforms several current crystallization predictors, obtaining an overall accuracy of 57.5% and an average MCC of 0.39. Our method also reveals several factors that influence the success of the crystallization process, including the unfold-based index, energy-based, solvent accessibility, and hydrophobicity-based indices, amino acid composition, the isoelectric point and disorder-based features. The proposed method, PPCinter, could provide useful input for the target selection procedures utilized by structural genomics centers.
  

- Parameter Estimation by Using an Improved Bee Memory Differential Evolution Algorithm (IBMDE) to Simulate Biochemical Pathways
  
  Authors: Chuii Khim Chong, Mohd Saberi Mohamad, Safaai Deris, Mohd Shahir Shamsir, Lian En Chai and Yee Wen Choon
  
  https://doi.org/10.2174/15748936113080990007
  
  Assessing and estimating essential parameters for a metabolic pathway by using a mathematical model is a significant step in Systems Biology. However, the estimation process often faces numerous obstacles, for example when the number of unknown parameters escalates or the data are noisy, when the search gets trapped in local minima, or when poor solutions are explored repeatedly during the search. This study therefore proposes an improved Bee Memory Differential Evolution algorithm (IBMDE), which combines the Differential Evolution algorithm (DE), the Kalman Filter (KF), the Artificial Bee Colony algorithm (ABC), and a memory feature to solve the aforementioned problems. The metabolic pathways used with this improved estimation algorithm were the glycerol and pyruvate synthesis pathways. IBMDE successfully generated estimated optimal kinetic parameter values with a noticeable reduction in errors (81.36% and 99.46% respectively) and faster convergence times (6.19% and 15.72% respectively) compared to DE, the Genetic Algorithm (GA), the Nelder-Mead method (NM), and Simulated Annealing (SA). Most importantly, the results indicated that the kinetic parameters produced by IBMDE enhanced the production of the desired metabolites more than those produced by the other estimation algorithms. The results also demonstrated the reliability of IBMDE as an estimation algorithm in terms of lower error.
  

- Hybrid Approach Using SVM and MM2 in Splice Site Junction Identification
  
  Authors: Srabanti Maji and Deepak Garg
  
  https://doi.org/10.2174/1574893608999140109121721
  
  Prediction of coding regions from genomic DNA sequence is the foremost step in gene identification. In eukaryotic organisms, the gene structure consists of a promoter, introns, a start codon, exons, a stop codon, etc. In the prediction of splice sites, which mark the boundaries between exons and introns, the accuracy is lower than 90% even though the sequences adjacent to the splice sites are highly conserved. The algorithms used for splice site identification must therefore be improved in order to raise the prediction accuracy. This article proposes an efficient method, MM2F-SVM, which consists of three stages: an initial stage, in which a second order Markov Model (MM2) is used for feature extraction; an intermediate stage, in which principal feature analysis (PFA) is used for feature selection; and a final stage, in which a support vector machine (SVM) with a Gaussian kernel is used for classification. In comparison with other existing splice site prediction programs, the proposed MM2F-SVM model showed superior performance.
  

Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan