Current Bioinformatics - Volume 11, Issue 4, 2016
-
GPU Acceleration of an Entropy-Based Model to Quantify Epistatic Interactions Between SNPs
Authors: Carlos Riveros, Manuel Ujaldon and Pablo Moscato
The process of characterizing naturally occurring variations in the human genome has captivated the high-performance computing community over the past few years. Changes known as biallelic Single-Nucleotide Polymorphisms (SNPs) have become essential biomarkers both of evolutionary relationships and of propensity to degenerative diseases. It is increasingly accepted that traditional statistical SNP analysis of Genome-Wide Association Studies (GWAS) reveals just a small part of the heritability in complex diseases. The study of interactions among SNPs has been suggested as a plausible approach to identify further SNPs that contribute to disease but either do not reach genome-wide significance or exhibit only epistatic effects. We have introduced a methodology for genome-wide screening of epistatic interactions that can be handled by state-of-the-art high-performance computing technology. Unlike standard software [1], our method computes all Boolean binary interactions between SNPs across the whole genome without assuming a particular model of interaction. Our extensive search for epistasis comes at the expense of higher computational complexity, which we tackled using graphics processors (GPUs) to reduce the computational time from several months on a cluster of CPUs to 3-4 days on a multi-GPU platform [2]. Our work also contributes a new entropy-based function for evaluating the interaction between SNPs; it does not compromise the findings about the most significant SNP interactions, yet it is more than 4000 times lighter in computational cost when running on GPUs and runs more than 100x faster than on a CPU of similar cost. We deploy a number of optimization techniques to tune the implementation of this function using CUDA and show how to enhance scalability on larger data sets. The role of our implementation as an accelerator is discussed on a wide variety of GPUs from Nvidia, including the three most popular profiles in graphics computing: high-end cards targeted at High Performance Computing (Tesla), top sellers among video gamers (GeForce) and emerging low-power devices for mobile computing (Tegra). We analyze the pros and cons of each approach and study their evolution from the Fermi (2012) to the Kepler (2014) generation, showing what they can contribute to speeding up computationally demanding biomedical codes like ours.
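The exact form of the entropy-based function is not given in the abstract, and the production code is CUDA; as a language-agnostic illustration of this kind of score, the Python sketch below computes the mutual information between the joint genotype of a SNP pair and a binary phenotype (the genotype coding, function name and formula are assumptions, not the authors' implementation):

```python
import numpy as np

def interaction_information(snp_a, snp_b, phenotype):
    """Illustrative entropy-based score for one SNP pair.

    snp_a, snp_b : arrays of genotypes coded 0/1/2
    phenotype    : array of case/control labels coded 0/1
    Returns the mutual information (in bits) between the joint
    genotype distribution of the pair and the phenotype.
    """
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # Encode the joint genotype of the pair as a single symbol (9 states).
    joint = snp_a * 3 + snp_b
    # I(joint; phenotype) = H(joint) + H(phenotype) - H(joint, phenotype)
    return entropy(joint) + entropy(phenotype) - entropy(joint * 2 + phenotype)

# Toy example: 8 samples, two SNPs, binary phenotype.
snp_a = np.array([0, 1, 2, 1, 0, 2, 1, 0])
snp_b = np.array([1, 1, 0, 2, 0, 2, 1, 1])
pheno = np.array([0, 1, 1, 1, 0, 1, 0, 0])
print(interaction_information(snp_a, snp_b, pheno))
```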
-
Enhancing Scoring Performance of Docking-Based Virtual Screening Through Machine Learning
Authors: Cândida G. Silva, Carlos J.V. Simoes, Pedro Carreiras and Rui M.M. Brito
Molecular docking may be reasonably successful at reproducing X-ray poses of a ligand in the binding site of a protein, but scoring functions are typically unsuccessful at correctly ranking ligands according to their binding affinity. Using a set of challenging target enzymes, we show how the use of support vector machines (SVMs), trained with the individual energy terms retrieved from docking-based virtual screening (VS) experiments, can improve the discrimination between active and decoy compounds. Actives and decoys were obtained from the Directory of Useful Decoys (DUD) and docked into target binding sites with AutoDock Vina. The energy parameters of Vina's scoring function were used to train classification models with SVM-light. The results show that although Vina offers acceptable pose prediction accuracy for most targets, its scoring function performs poorly compared to our SVM classification models. The superior overall VS performance of the trained classification models confirms the potential of machine learning methods to overcome the limitations of scoring functions in capturing the non-additive relationship between the individual energy terms involved in ligand binding. Altogether, the results illustrate the potential of SVM-based protocols for enabling efficient, fast and economic virtual high-throughput screening campaigns with freely available docking software.
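As a rough illustration of the re-scoring idea (using scikit-learn here rather than the SVM-light package used in the paper, and with placeholder feature columns), a linear SVM can be trained on the individual energy terms and its decision value used to re-rank a screening library:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one row per docked compound, one column per individual energy term
# exported from the docking run (e.g. steric, hydrophobic, H-bond terms);
# y: 1 = known active, 0 = decoy. Random data stands in for real values.
rng = np.random.default_rng(0)
X_train = rng.random((200, 5))
y_train = rng.integers(0, 2, 200)

model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

# Re-rank a screening library by the signed distance to the SVM
# hyperplane instead of the original docking score.
X_screen = rng.random((50, 5))
ranking = np.argsort(-model.decision_function(X_screen))
```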
-
Chaperone Therapy: New Molecular Therapy for Protein Misfolding Diseases with Brain Dysfunction
Authors: Yoshiyuki Suzuki, Kousaku Ohno and Aya Narita
Chaperone therapy was proposed as a new molecular therapeutic approach almost simultaneously for lysosomal diseases and cystic fibrosis, both caused by gene mutations that result in misfolding of the expressed proteins. In our original papers, we reported that unstable mutant lysosomal enzymes causing lysosomal diseases underwent rapid intracellular degradation and loss of catalytic function. However, some low-molecular-weight competitive inhibitors (chemical chaperones), after binding to the enzyme active site, paradoxically stabilized the mutant enzymes and enhanced their catalytic activities in somatic cells (proteostasis) by correcting the folding of the enzyme protein. After oral administration, they were transferred to the bloodstream, reached the brain tissue through the blood-brain barrier, and normalized the pathophysiology of the disease. Our reports of these inhibitory chaperones were followed by reports of non-competitive (or allosteric) chaperones without inhibitory bioactivity. Furthermore, heat shock proteins and other endogenous proteins were recognized as candidates for a third type of chaperone therapy. Theoretically, they could be utilized to handle abnormally accumulated intracellular mutant proteins, if they are overexpressed by small molecules, particularly in neurodegenerative diseases. These three types of chaperone therapy are expected to be promising approaches to a variety of diseases, genetic or nongenetic, and neurological or non-neurological, in addition to lysosomal diseases. Finally, in this article, possible chaperones for Gaucher disease are discussed, and preliminary clinical results of ambroxol therapy are summarized.
-
GPU-Based Acceleration of ECG Characterization Using High-Order Hermite Polynomials
Authors: Alberto Gil, David G. Márquez, Gabriel Caffarena, Ana Iriarte and Abraham Otero
In this paper, we address the acceleration of the Hermite function characterization of the heartbeat by means of massively parallel Graphics Processing Units. This characterization can be used to develop tools that help cardiologists study and diagnose heart disease. However, obtaining this characterization, especially when a large number of functions is used to achieve high accuracy in heartbeat representation, is very resource-intensive. This paper addresses off-line and on-line heartbeat characterization, assessing the acceleration capabilities of Graphics Processing Units for these tasks. Polynomials up to the 30th order are used in the study. The results show that off-line processing of long electrocardiogram recordings on a GPU can be up to 186x faster than on a standard CPU, while real-time processing can be up to 110x faster.
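A CPU-side sketch of the fitting step being accelerated, i.e. projecting a single beat onto the first N orthonormal Hermite functions by least squares, could look as follows (the width parameter, beat shape and NumPy implementation are illustrative; the paper's contribution is the GPU parallelization of this kind of computation):

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

def hermite_basis(t, n_funcs, sigma=1.0):
    """Return a (len(t), n_funcs) matrix of orthonormal Hermite functions."""
    x = t / sigma
    basis = np.empty((t.size, n_funcs))
    for n in range(n_funcs):
        coeffs = np.zeros(n + 1)
        coeffs[n] = 1.0
        norm = 1.0 / sqrt((2 ** n) * factorial(n) * sqrt(pi) * sigma)
        basis[:, n] = norm * hermval(x, coeffs) * np.exp(-x ** 2 / 2)
    return basis

# Toy beat: 300 samples centred on the R peak (a Gaussian stands in for
# a real, baseline-corrected heartbeat segment).
t = np.linspace(-1.0, 1.0, 300)
beat = np.exp(-50 * t ** 2)
H = hermite_basis(t, n_funcs=30, sigma=0.2)
coeffs, *_ = np.linalg.lstsq(H, beat, rcond=None)   # Hermite coefficients
reconstruction = H @ coeffs                          # approximated beat
```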
-
AutoFlow, a Versatile Workflow Engine Illustrated by Assembling an Optimised de novo Transcriptome for a Non-Model Species, such as Faba Bean (Vicia faba)
Authors: Pedro Seoane, Sara Ocaña, Rosario Carmona, Rocío Bautista, Eva Madrid, Ana M. Torres and M. Gonzalo Claros
The use of workflows to automate routine tasks is an absolute requirement in many bioinformatics fields. Current workflow manager systems usually compromise between providing a user-friendly interface and constructing complex, scalable pipelines. We present AutoFlow, a Ruby-based workflow engine devoid of a graphical interface and tool repository, that is useful on most computer systems and for most workflow requirements in any scientific field. It accepts any local or remote command-line software and converts one workflow into a series of independent tasks. It has been supplied with control patterns that allow for iterative tasks, supporting static and dynamic variables for decision-making or chaining workflows, as well as debugging utilities that include graphs, file searching, functional consistency and timing. Two proof-of-concept cases are presented to illustrate AutoFlow capabilities, and a use case illustrates the automated construction of the best transcriptome for a non-model species (Vicia faba) after analysis of several combinations of Illumina reads and Sanger sequences with different assemblers and different parameters, in a complex and repetitive workflow where branching and convergent tasks were used and internal, automated decisions were taken. The workflow finally produced an optimal transcriptome of 118,188 transcripts, of which 38,004 were annotated, 10,516 coded for a complete protein, 3,314 were putatively new faba-specific transcripts, and 23,727 were considered the representative transcriptome of V. faba.
-
Protein Fold Recognition Using Self-Organizing Map Neural Network
Authors: Ozlem Polat and Zümray Dokur
In this work, we propose a solution for the recognition of protein folds using a Self-Organizing Map (SOM) neural network and present a comparison among several approaches. We use SOM, Fisher's Linear Discriminant Analysis (FLD), K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) methods for the recognition of three SCOP folds with six attributes (amino acid composition, predicted secondary structure, hydrophobicity, normalized van der Waals volume, polarity and polarizability). Then we classify the 27 most common SCOP folds using 125-dimensional data formed by the six attributes. The novelty of this paper lies in applying SOM to these six attributes, and it also demonstrates the capabilities of SOM relative to the other methods in protein fold classification. Firstly, for the three-class problem, the methods are tested on 120 proteins using a 10-fold cross-validation technique, and a classification performance of 93.33% is obtained with SOM. Secondly, for the 27-class problem, SOM is tested on 694 proteins using a one-versus-others technique, and a classification performance of 73.37% is obtained.
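A compact NumPy sketch of a SOM training loop applied to fixed-length protein feature vectors, such as the 125-dimensional attribute vectors described above, is given below; the grid size, learning-rate and neighbourhood schedules are illustrative choices, not the settings used in the paper:

```python
import numpy as np

def train_som(X, grid=(10, 10), epochs=50, lr0=0.5, seed=0):
    """Train a rectangular SOM on feature vectors X (one row per protein)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows * cols, X.shape[1]))
    # (row, col) coordinate of every map unit, used for the neighbourhood.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)            # decaying learning rate
        radius = radius0 * (1.0 - epoch / epochs) + 1e-6
        for x in rng.permutation(X):
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            dist = np.linalg.norm(coords - coords[bmu], axis=1)
            influence = np.exp(-(dist ** 2) / (2 * radius ** 2))
            weights += lr * influence[:, None] * (x - weights)
    return weights

# After training, each map unit can be labelled with the majority fold of the
# training vectors it wins, and test proteins take the label of their best unit.
X = np.random.rand(120, 125)   # placeholder for real protein feature vectors
som_weights = train_som(X)
```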
-
Pair-End Inexact Mapping on Hybrid GPU Environments and Out-Of-Core Indexes
Authors: José Salavert, Andrés Tomás, Ignacio Medina, Kunihiko Sadakane and Ignacio Blanquer
Due to the NGS data deluge, sequence mapping has become an intensive task that, depending on the experiment, may demand high amounts of computing power or memory capacity. On the one hand, GPGPU architectures have become a cost-effective solution that outperforms common processors in specific tasks. On the other hand, out-of-core implementations allow data to be accessed directly from secondary memory, which may be useful when mapping against big indexes in systems with low memory configurations. In this paper, we discuss the implementation of backward search methods for inexact mapping in these two study cases. A hybrid CPU-GPU implementation of a backward search algorithm capable of obtaining the paired-end and one-error mappings of a read has been developed. This implementation can be used to increase the sensitivity and reduce the number of reads to be analysed with a dynamic programming approach. Also, a CPU out-of-core index using MMAP (provided by csalib) has been studied. Such an index can be used in memory-limited scenarios, in which the time to load many different big genomes into memory is greater than the time needed to map the reads.
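For readers unfamiliar with the underlying operation, a minimal sketch of exact backward search over an FM-index is shown below; the C and Occ tables are the standard FM-index structures, and the paper extends this step to one-error and paired-end mapping on GPUs and out-of-core indexes (the function below is illustrative, not the authors' code):

```python
def backward_search(read, C, occ, n):
    """Return the suffix-array interval [lo, hi) of suffixes prefixed by `read`.

    C[c]      : number of characters in the text smaller than c
    occ(c, i) : occurrences of c in BWT[0:i]
    n         : length of the BWT (text length + 1 for the sentinel)
    """
    lo, hi = 0, n
    for c in reversed(read):          # process the read right-to-left
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:                  # empty interval: no exact occurrence
            return None
    return lo, hi
```

Because each read updates its interval independently, many reads can run this loop in parallel, which is what makes the step a good fit for massively parallel hardware.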
-
State Grammar and Deep Pushdown Automata for Biological Sequences of Nucleic Acids
Authors: Nidhi Kalra and Ajay Kumar
In this paper, we represent deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) biological sequences using state grammar and deep pushdown automata. The major benefit of this approach is that the DNA and RNA sequences can be parsed in linear time O(n), where n is the length of the string, which is a significant improvement over the existing approaches. In the various existing approaches in the literature, these sequences are represented using context-sensitive or mildly context-sensitive grammars, with higher time complexities. To the best of the authors' knowledge, this is the first attempt to represent these sequences using state grammar and deep pushdown automata.
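As a much-simplified illustration of stack-based, linear-time recognition of a nucleic-acid pattern (an ordinary pushdown check, not the authors' state-grammar or deep pushdown construction), the following sketch verifies whether an RNA string is a single stem-loop whose stem halves are reverse complements:

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def is_stem_loop(seq, stem_len, loop_len):
    """Check that seq = stem + loop + reverse_complement(stem) in O(n) time."""
    if len(seq) != 2 * stem_len + loop_len:
        return False
    stack = []
    for base in seq[:stem_len]:             # push the 5' half of the stem
        stack.append(base)
    for base in seq[stem_len + loop_len:]:  # match the 3' half against the stack
        if not stack or COMPLEMENT.get(stack.pop()) != base:
            return False
    return not stack

print(is_stem_loop("GGGCAAAAGCCC", stem_len=4, loop_len=4))  # True
```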
-
Network Analysis of Protein Structures: The Comparison of Three Topologies
Authors: Wenying Yan, Guang Hu and Bairong Shen
Topology plays a central role in the structure of a protein. Network theoretical methods are being increasingly applied to investigate protein topology. In this paper, amino acid contact energy networks (AACENs) are constructed for globular, transmembrane and toroidal proteins. The effects of topology on proteins are investigated through the differences in various network parameters among the three kinds of protein topologies. Globular proteins are found to have the highest network density, average closeness and system vulnerability, while toroidal proteins have the lowest values of these parameters. Transmembrane proteins are found to have significantly higher assortativity values than globular and toroidal proteins. AACENs are also constructed and compared for proteins with different secondary structure compositions, whose influences on biological functions are discussed in terms of topological descriptors. Extracting sub-networks that include only the interfacial residues between different chains may provide a simple yet straightforward method to identify hot spots of toroidal proteins. This network study offers new insight into the overall topology and structural organization of different types of proteins.
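As a simplified illustration of the kind of network construction and descriptors involved, the sketch below builds an unweighted residue contact network from C-alpha coordinates with networkx and computes density, average closeness and degree assortativity; the distance cutoff and the lack of energy weighting are simplifying assumptions relative to the AACENs used in the paper:

```python
import numpy as np
import networkx as nx

def contact_network(ca_coords, cutoff=8.0):
    """Connect residue pairs whose C-alpha atoms lie within `cutoff` angstroms."""
    n = len(ca_coords)
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(ca_coords[i] - ca_coords[j]) <= cutoff:
                g.add_edge(i, j)
    return g

coords = np.random.rand(50, 3) * 30.0        # placeholder C-alpha coordinates
g = contact_network(coords)
density = nx.density(g)
closeness = np.mean(list(nx.closeness_centrality(g).values()))
assortativity = nx.degree_assortativity_coefficient(g)
```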
-
A Hybrid Binary Cuckoo Search and Genetic Algorithm for Feature Selection in Type-2 Diabetes
Data mining techniques are applied in bioinformatics to analyze biomedical data. When the data contain irrelevant features, classifiers produce unsatisfactory results. This paper addresses the need to analyze the data to extract relevant features. A number of feature selection algorithms have been developed for medical data. In this paper, an intelligent hybrid optimal feature selection algorithm, a Hybrid Binary Cuckoo Search (CS) and Genetic Algorithm (GA) named HBCS-GA, is proposed for selecting the important features of Type-2 diabetes with improved classification accuracy. In HBCS-GA, the exploration and exploitation of CS are improved using genetic operators to select relevant features with better accuracy. To validate the model, a 10-fold cross-validation strategy is used. The proposed algorithm achieves 99.31% accuracy in diagnosing the disease. The performance of HBCS-GA is also compared with other approaches. In addition, model validation with the reduced feature set is performed with Decision Tree (DT), Bayesian Network (BN), Support Vector Machine (SVM) and k-Nearest Neighbor (k-NN) classifiers, which obtain accuracies of 94.46%, 96.07%, 98.84% and 96.79%, respectively. The results also show that HBCS-GA achieves higher classification accuracy than the other approaches.
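The exact HBCS-GA procedure is not given in the abstract; the following Python sketch shows one plausible shape of such a hybrid, with a cuckoo-style heavy-tailed bit-flip step and GA crossover/mutation used to rebuild abandoned nests, scored by cross-validated k-NN accuracy (all parameter values, the flip rule and the fitness function are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y):
    """Cross-validated accuracy of a k-NN classifier on the selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=5).mean()

def hybrid_bcs_ga(X, y, n_nests=20, n_iter=30, p_abandon=0.25, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    nests = rng.integers(0, 2, (n_nests, n_feat))
    scores = np.array([fitness(m, X, y) for m in nests])
    for _ in range(n_iter):
        best = nests[scores.argmax()].copy()
        # Cuckoo step: move each nest towards the best one by flipping a
        # heavy-tailed fraction of bits (a binary stand-in for a Levy flight).
        for i in range(n_nests):
            p_flip = min(0.5, abs(rng.standard_cauchy()) * 0.05)
            candidate = np.where(rng.random(n_feat) < p_flip, best, nests[i])
            s = fitness(candidate, X, y)
            if s > scores[i]:
                nests[i], scores[i] = candidate, s
        # GA step: abandon the worst nests and rebuild them by uniform
        # crossover of two random nests plus a small bit-flip mutation.
        for i in scores.argsort()[: int(p_abandon * n_nests)]:
            a, b = nests[rng.integers(n_nests)], nests[rng.integers(n_nests)]
            child = np.where(rng.random(n_feat) < 0.5, a, b)
            child ^= (rng.random(n_feat) < 0.02).astype(child.dtype)
            nests[i], scores[i] = child, fitness(child, X, y)
    best_i = scores.argmax()
    return nests[best_i].astype(bool), scores[best_i]

# Toy usage on random data (a stand-in for a real Type-2 diabetes dataset).
X = np.random.default_rng(1).random((150, 20))
y = np.random.default_rng(2).integers(0, 2, 150)
selected, acc = hybrid_bcs_ga(X, y, n_iter=5)
```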