Current Bioinformatics - Volume 11, Issue 4, 2016
-
GPU Acceleration of an Entropy-Based Model to Quantify Epistatic Interactions Between SNPs
Authors: Carlos Riveros, Manuel Ujaldon and Pablo Moscato
The process of characterizing naturally occurring variations in the human genome has captivated the high-performance computing community over the past few years. Changes known as biallelic Single-Nucleotide Polymorphisms (SNPs) have become essential biomarkers both of evolutionary relationships and of propensity to degenerative diseases. It is increasingly accepted that traditional statistical SNP analysis of Genome-Wide Association Studies (GWAS) reveals just a small part of the heritability in complex diseases. The study of interactions among SNPs has been suggested as a plausible approach to identify further SNPs that contribute to disease but either do not reach genome-wide significance or exhibit only epistatic effects. We have introduced a methodology for genome-wide screening of epistatic interactions that can be handled by state-of-the-art high-performance computing technology. Unlike standard software [1], our method computes all Boolean binary interactions between SNPs across the whole genome without assuming a particular model of interaction. Our extensive search for epistasis comes at the expense of higher computational complexity, which we tackled using graphics processors (GPUs) to reduce the computational time from several months on a cluster of CPUs to 3-4 days on a multi-GPU platform [2]. Our work also contributes a new entropy-based function for evaluating the interaction between SNPs; it does not compromise the findings about the most significant SNP interactions, yet it is more than 4000 times lighter in computational cost when running on GPUs and runs more than 100x faster than on a CPU of similar cost. We deploy a number of optimization techniques to tune the implementation of this function using CUDA and show how to enhance scalability on larger data sets. The role of our implementation as an accelerator is discussed on a wide variety of GPUs from Nvidia, including the three most popular profiles in graphics computing: high-end cards targeted at High Performance Computing (Tesla), top sellers among video gamers (GeForce) and emerging low-power devices for mobile computing (Tegra). We analyze the pros and cons of each approach and study their evolution from the Fermi (2012) to the Kepler (2014) generation, showing what they can contribute to speeding up computationally demanding biomedical codes like ours.
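The exact form of the entropy-based function is not given in the abstract, and the production code is CUDA; as a language-agnostic illustration of this kind of score, the Python sketch below computes the mutual information between the joint genotype of a SNP pair and a binary phenotype (the genotype coding, function name and formula are assumptions, not the authors' implementation):

```python
import numpy as np

def interaction_information(snp_a, snp_b, phenotype):
    """Illustrative entropy-based score for one SNP pair.

    snp_a, snp_b : arrays of genotypes coded 0/1/2
    phenotype    : array of case/control labels coded 0/1
    Returns the mutual information (in bits) between the joint
    genotype distribution of the pair and the phenotype.
    """
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # Encode the joint genotype of the pair as a single symbol (9 states).
    joint = snp_a * 3 + snp_b
    # I(joint; phenotype) = H(joint) + H(phenotype) - H(joint, phenotype)
    return entropy(joint) + entropy(phenotype) - entropy(joint * 2 + phenotype)

# Toy example: 8 samples, two SNPs, binary phenotype.
snp_a = np.array([0, 1, 2, 1, 0, 2, 1, 0])
snp_b = np.array([1, 1, 0, 2, 0, 2, 1, 1])
pheno = np.array([0, 1, 1, 1, 0, 1, 0, 0])
print(interaction_information(snp_a, snp_b, pheno))
```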
-
Enhancing Scoring Performance of Docking-Based Virtual Screening Through Machine Learning
Authors: Cândida G. Silva, Carlos J.V. Simoes, Pedro Carreiras and Rui M.M. Brito
Molecular docking may be reasonably successful at reproducing X-ray poses of a ligand in the binding site of a protein, but scoring functions are typically unsuccessful at correctly ranking ligands according to their binding affinity. Using a set of challenging target enzymes, we show how the use of support vector machines (SVMs), trained with the individual energy terms retrieved from docking-based virtual screening (VS) experiments, can improve the discrimination between active and decoy compounds. Actives and decoys were obtained from the Directory of Useful Decoys (DUD) and docked into target binding sites with AutoDock Vina. The energy parameters of Vina's scoring function were used to train classification models with SVM-light. The results show that although Vina offers acceptable pose prediction accuracy for most targets, its scoring function performs poorly compared to our SVM classification models. The superior overall VS performance of the trained classification models confirms the potential of machine learning methods to overcome the limitations of scoring functions in capturing the non-additive relationship between the individual energy terms involved in ligand binding. Altogether, the results illustrate the potential of SVM-based protocols for enabling efficient, fast and economic virtual high-throughput screening campaigns with freely available docking software.
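As a rough illustration of the re-scoring idea (using scikit-learn here rather than the SVM-light package used in the paper, and with placeholder feature columns), a linear SVM can be trained on the individual energy terms and its decision value used to re-rank a screening library:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one row per docked compound, one column per individual energy term
# exported from the docking run (e.g. steric, hydrophobic, H-bond terms);
# y: 1 = known active, 0 = decoy. Random data stands in for real values.
rng = np.random.default_rng(0)
X_train = rng.random((200, 5))
y_train = rng.integers(0, 2, 200)

model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

# Re-rank a screening library by the signed distance to the SVM
# hyperplane instead of the original docking score.
X_screen = rng.random((50, 5))
ranking = np.argsort(-model.decision_function(X_screen))
```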
-
Chaperone Therapy: New Molecular Therapy for Protein Misfolding Diseases with Brain Dysfunction
Authors: Yoshiyuki Suzuki, Kousaku Ohno and Aya Narita
Chaperone therapy was proposed as a new molecular therapeutic approach almost simultaneously for lysosomal diseases and cystic fibrosis, both caused by gene mutations that result in misfolding of the expressed proteins. In our original papers, we reported that unstable mutant lysosomal enzymes causing lysosomal diseases underwent rapid intracellular degradation and loss of catalytic function. However, some low-molecular-weight competitive inhibitors (chemical chaperones), after binding to the enzyme active site, paradoxically stabilized the mutant enzymes and enhanced their catalytic activities in somatic cells (proteostasis) by correcting the folding of the enzyme protein. After oral administration, they were transferred to the bloodstream, reached the brain tissue through the blood-brain barrier, and normalized the pathophysiology of the disease. Our reports of these inhibitory chaperones were followed by reports of non-competitive (or allosteric) chaperones without inhibitory bioactivity. Furthermore, heat shock proteins and other endogenous proteins were recognized as candidates for a third type of chaperone therapy. Theoretically, they could be utilized to handle abnormally accumulated intracellular mutant proteins, if they are overexpressed by small molecules, particularly in neurodegenerative diseases. These three types of chaperone therapy are expected to be promising approaches to a variety of diseases, genetic or nongenetic, and neurological or non-neurological, in addition to lysosomal diseases. Finally, in this article, possible chaperones for Gaucher disease are discussed, and preliminary clinical results of ambroxol therapy are summarized.
-
GPU-Based Acceleration of ECG Characterization Using High-Order Hermite Polynomials
Authors: Alberto Gil, David G. Márquez, Gabriel Caffarena, Ana Iriarte and Abraham Otero
In this paper, we address the acceleration of the Hermite function characterization of the heartbeat by means of massively parallel Graphics Processing Units. This characterization can be used to develop tools that help cardiologists study and diagnose heart disease. However, obtaining this characterization, especially when a large number of functions is used to achieve high accuracy in heartbeat representation, is very resource-intensive. This paper addresses off-line and on-line heartbeat characterization, assessing the acceleration capabilities of Graphics Processing Units for these tasks. Polynomials up to the 30th order are used in the study. The results show that off-line processing of long electrocardiogram recordings on a GPU can be up to 186x faster than on a standard CPU, while real-time processing can be up to 110x faster.
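A CPU-side sketch of the fitting step being accelerated, i.e. projecting a single beat onto the first N orthonormal Hermite functions by least squares, could look as follows (the width parameter, beat shape and NumPy implementation are illustrative; the paper's contribution is the GPU parallelization of this kind of computation):

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

def hermite_basis(t, n_funcs, sigma=1.0):
    """Return a (len(t), n_funcs) matrix of orthonormal Hermite functions."""
    x = t / sigma
    basis = np.empty((t.size, n_funcs))
    for n in range(n_funcs):
        coeffs = np.zeros(n + 1)
        coeffs[n] = 1.0
        norm = 1.0 / sqrt((2 ** n) * factorial(n) * sqrt(pi) * sigma)
        basis[:, n] = norm * hermval(x, coeffs) * np.exp(-x ** 2 / 2)
    return basis

# Toy beat: 300 samples centred on the R peak (a Gaussian stands in for
# a real, baseline-corrected heartbeat segment).
t = np.linspace(-1.0, 1.0, 300)
beat = np.exp(-50 * t ** 2)
H = hermite_basis(t, n_funcs=30, sigma=0.2)
coeffs, *_ = np.linalg.lstsq(H, beat, rcond=None)   # Hermite coefficients
reconstruction = H @ coeffs                          # approximated beat
```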
-
AutoFlow, a Versatile Workflow Engine Illustrated by Assembling an Optimised de novo Transcriptome for a Non-Model Species, such as Faba Bean (Vicia faba)
Authors: Pedro Seoane, Sara Ocaña, Rosario Carmona, Rocío Bautista, Eva Madrid, Ana M. Torres and M. Gonzalo Claros
The use of workflows to automate routine tasks is an absolute requirement in many bioinformatics fields. Current workflow manager systems usually compromise between providing a user-friendly interface and constructing complex, scalable pipelines. We present AutoFlow, a Ruby-based workflow engine devoid of a graphical interface and tool repository, that is useful on most computer systems and for most workflow requirements in any scientific field. It accepts any local or remote command-line software and converts one workflow into a series of independent tasks. It has been supplied with control patterns that allow for iterative tasks, supporting static and dynamic variables for decision-making or chaining workflows, as well as debugging utilities that include graphs, file searching, functional consistency and timing. Two proof-of-concept cases are presented to illustrate AutoFlow capabilities, and a use case illustrates the automated construction of the best transcriptome for a non-model species (Vicia faba) after analysis of several combinations of Illumina reads and Sanger sequences with different assemblers and different parameters, in a complex and repetitive workflow where branching and convergent tasks were used and internal, automated decisions were taken. The workflow finally produced an optimal transcriptome of 118,188 transcripts, of which 38,004 were annotated, 10,516 coded for a complete protein, 3,314 were putatively new faba-specific transcripts, and 23,727 were considered the representative transcriptome of V. faba.
-
Protein Fold Recognition Using Self-Organizing Map Neural Network
Authors: Ozlem Polat and Zümray Dokur
In this work, we propose a solution for the recognition of protein folds using a Self-Organizing Map (SOM) neural network and present a comparison among several approaches. We use SOM, Fisher's Linear Discriminant Analysis (FLD), K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) methods for the recognition of three SCOP folds with six attributes (amino acid composition, predicted secondary structure, hydrophobicity, normalized van der Waals volume, polarity and polarizability). Then we classify the 27 most common SCOP folds using 125-dimensional data formed by the six attributes. The novelty of this paper lies in applying SOM to these six attributes, and it also demonstrates the capabilities of SOM relative to the other methods in protein fold classification. Firstly, for the three-class problem, the methods are tested on 120 proteins using a 10-fold cross-validation technique, and a classification performance of 93.33% is obtained with SOM. Secondly, for the 27-class problem, SOM is tested on 694 proteins using a one-versus-others technique, and a classification performance of 73.37% is obtained.
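A compact NumPy sketch of a SOM training loop applied to fixed-length protein feature vectors, such as the 125-dimensional attribute vectors described above, is given below; the grid size, learning-rate and neighbourhood schedules are illustrative choices, not the settings used in the paper:

```python
import numpy as np

def train_som(X, grid=(10, 10), epochs=50, lr0=0.5, seed=0):
    """Train a rectangular SOM on feature vectors X (one row per protein)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows * cols, X.shape[1]))
    # (row, col) coordinate of every map unit, used for the neighbourhood.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)            # decaying learning rate
        radius = radius0 * (1.0 - epoch / epochs) + 1e-6
        for x in rng.permutation(X):
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            dist = np.linalg.norm(coords - coords[bmu], axis=1)
            influence = np.exp(-(dist ** 2) / (2 * radius ** 2))
            weights += lr * influence[:, None] * (x - weights)
    return weights

# After training, each map unit can be labelled with the majority fold of the
# training vectors it wins, and test proteins take the label of their best unit.
X = np.random.rand(120, 125)   # placeholder for real protein feature vectors
som_weights = train_som(X)
```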
-
Pair-End Inexact Mapping on Hybrid GPU Environments and Out-Of-Core Indexes
Authors: José Salavert, Andrés Tomás, Ignacio Medina, Kunihiko Sadakane and Ignacio Blanquer
Due to the NGS data deluge, sequence mapping has become an intensive task that, depending on the experiment, may demand high amounts of computing power or memory capacity. On the one hand, GPGPU architectures have become a cost-effective solution that outperforms common processors in specific tasks. On the other hand, out-of-core implementations allow data to be accessed directly from secondary memory, which may be useful when mapping against big indexes in systems with low memory configurations. In this paper, we discuss the implementation of backward search methods for inexact mapping in these two study cases. A hybrid CPU-GPU implementation of a backward search algorithm capable of obtaining the paired-end and one-error mappings of a read has been developed. This implementation can be used to increase the sensitivity and reduce the number of reads to be analysed with a dynamic programming approach. Also, a CPU out-of-core index using MMAP (provided by csalib) has been studied. Such an index can be used in memory-limited scenarios, in which the time to load many different big genomes into memory is greater than the time needed to map the reads.
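For readers unfamiliar with the underlying operation, a minimal sketch of exact backward search over an FM-index is shown below; the C and Occ tables are the standard FM-index structures, and the paper extends this step to one-error and paired-end mapping on GPUs and out-of-core indexes (the function below is illustrative, not the authors' code):

```python
def backward_search(read, C, occ, n):
    """Return the suffix-array interval [lo, hi) of suffixes prefixed by `read`.

    C[c]      : number of characters in the text smaller than c
    occ(c, i) : occurrences of c in BWT[0:i]
    n         : length of the BWT (text length + 1 for the sentinel)
    """
    lo, hi = 0, n
    for c in reversed(read):          # process the read right-to-left
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:                  # empty interval: no exact occurrence
            return None
    return lo, hi
```

Because each read updates its interval independently, many reads can run this loop in parallel, which is what makes the step a good fit for massively parallel hardware.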
-
State Grammar and Deep Pushdown Automata for Biological Sequences of Nucleic Acids
Authors: Nidhi Kalra and Ajay Kumar
In this paper, we represent deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) biological sequences using state grammar and deep pushdown automata. The major benefit of this approach is that the DNA and RNA sequences can be parsed in linear time O(n), where n is the length of the string, which is a significant improvement over the existing approaches. In the various existing approaches in the literature, these sequences are represented using context-sensitive or mildly context-sensitive grammars, with higher time complexities. To the best of the authors' knowledge, this is the first attempt to represent these sequences using state grammar and deep pushdown automata.
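As a much-simplified illustration of stack-based, linear-time recognition of a nucleic-acid pattern (an ordinary pushdown check, not the authors' state-grammar or deep pushdown construction), the following sketch verifies whether an RNA string is a single stem-loop whose stem halves are reverse complements:

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def is_stem_loop(seq, stem_len, loop_len):
    """Check that seq = stem + loop + reverse_complement(stem) in O(n) time."""
    if len(seq) != 2 * stem_len + loop_len:
        return False
    stack = []
    for base in seq[:stem_len]:             # push the 5' half of the stem
        stack.append(base)
    for base in seq[stem_len + loop_len:]:  # match the 3' half against the stack
        if not stack or COMPLEMENT.get(stack.pop()) != base:
            return False
    return not stack

print(is_stem_loop("GGGCAAAAGCCC", stem_len=4, loop_len=4))  # True
```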
-
Network Analysis of Protein Structures: The Comparison of Three Topologies
Authors: Wenying Yan, Guang Hu and Bairong Shen
Topology plays a central role in the structure of a protein. Network theoretical methods are being increasingly applied to investigate protein topology. In this paper, amino acid contact energy networks (AACENs) are constructed for globular, transmembrane and toroidal proteins. The effects of topology on proteins are investigated through the differences in various network parameters among the three kinds of protein topologies. Globular proteins are found to have the highest network density, average closeness and system vulnerability, while toroidal proteins have the lowest values of these parameters. Transmembrane proteins are found to have significantly higher assortativity values than globular and toroidal proteins. AACENs are also constructed and compared for proteins with different secondary structure compositions, whose influences on biological functions are discussed in terms of topological descriptors. Extracting sub-networks that include only the interfacial residues between different chains may provide a simple yet straightforward method to identify hot spots of toroidal proteins. This network study offers new insight into the overall topology and structural organization of different types of proteins.
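As a simplified illustration of the kind of network construction and descriptors involved, the sketch below builds an unweighted residue contact network from C-alpha coordinates with networkx and computes density, average closeness and degree assortativity; the distance cutoff and the lack of energy weighting are simplifying assumptions relative to the AACENs used in the paper:

```python
import numpy as np
import networkx as nx

def contact_network(ca_coords, cutoff=8.0):
    """Connect residue pairs whose C-alpha atoms lie within `cutoff` angstroms."""
    n = len(ca_coords)
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(ca_coords[i] - ca_coords[j]) <= cutoff:
                g.add_edge(i, j)
    return g

coords = np.random.rand(50, 3) * 30.0        # placeholder C-alpha coordinates
g = contact_network(coords)
density = nx.density(g)
closeness = np.mean(list(nx.closeness_centrality(g).values()))
assortativity = nx.degree_assortativity_coefficient(g)
```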
-
A Hybrid Binary Cuckoo Search and Genetic Algorithm for Feature Selection in Type-2 Diabetes
Data mining techniques are applied in bioinformatics to analyze biomedical data. When the data contain irrelevant features, classifiers produce unsatisfactory results. This paper addresses the need to analyze the data to extract relevant features. A number of feature selection algorithms have been developed for medical data. In this paper, an intelligent hybrid optimal feature selection algorithm, a Hybrid Binary Cuckoo Search (CS) and Genetic Algorithm (GA) named HBCS-GA, is proposed for selecting the important features of Type-2 diabetes with improved classification accuracy. In HBCS-GA, the exploration and exploitation of CS are improved using genetic operators to select relevant features with better accuracy. To validate the model, a 10-fold cross-validation strategy is used. The proposed algorithm achieves 99.31% accuracy in diagnosing the disease. The performance of HBCS-GA is also compared with other approaches. In addition, model validation with the reduced feature set is performed with Decision Tree (DT), Bayesian Network (BN), Support Vector Machine (SVM) and k-Nearest Neighbor (k-NN) classifiers, which obtain accuracies of 94.46%, 96.07%, 98.84% and 96.79%, respectively. The results also show that HBCS-GA achieves higher classification accuracy than the other approaches.
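The exact HBCS-GA procedure is not given in the abstract; the following Python sketch shows one plausible shape of such a hybrid, with a cuckoo-style heavy-tailed bit-flip step and GA crossover/mutation used to rebuild abandoned nests, scored by cross-validated k-NN accuracy (all parameter values, the flip rule and the fitness function are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y):
    """Cross-validated accuracy of a k-NN classifier on the selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=5).mean()

def hybrid_bcs_ga(X, y, n_nests=20, n_iter=30, p_abandon=0.25, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    nests = rng.integers(0, 2, (n_nests, n_feat))
    scores = np.array([fitness(m, X, y) for m in nests])
    for _ in range(n_iter):
        best = nests[scores.argmax()].copy()
        # Cuckoo step: move each nest towards the best one by flipping a
        # heavy-tailed fraction of bits (a binary stand-in for a Levy flight).
        for i in range(n_nests):
            p_flip = min(0.5, abs(rng.standard_cauchy()) * 0.05)
            candidate = np.where(rng.random(n_feat) < p_flip, best, nests[i])
            s = fitness(candidate, X, y)
            if s > scores[i]:
                nests[i], scores[i] = candidate, s
        # GA step: abandon the worst nests and rebuild them by uniform
        # crossover of two random nests plus a small bit-flip mutation.
        for i in scores.argsort()[: int(p_abandon * n_nests)]:
            a, b = nests[rng.integers(n_nests)], nests[rng.integers(n_nests)]
            child = np.where(rng.random(n_feat) < 0.5, a, b)
            child ^= (rng.random(n_feat) < 0.02).astype(child.dtype)
            nests[i], scores[i] = child, fitness(child, X, y)
    best_i = scores.argmax()
    return nests[best_i].astype(bool), scores[best_i]

# Toy usage on random data (a stand-in for a real Type-2 diabetes dataset).
X = np.random.default_rng(1).random((150, 20))
y = np.random.default_rng(2).integers(0, 2, 150)
selected, acc = hybrid_bcs_ga(X, y, n_iter=5)
```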