Current Bioinformatics - Volume 10, Issue 2, 2015
Investigating Power and Limitations of Ensemble Motif Finders Using Metapredictor CE3
Authors: Mauro Leoncini, Manuela Montagnero and Karina Panucia Tillán
Ensemble methods represent a relatively new approach to motif discovery that combines the results returned by "third-party" finders with the aim of achieving better accuracy than that obtained by the single tools. Besides the choice of the external finders, another crucial element for the success of an ensemble method is the particular strategy adopted to combine the finders' results, also known as the learning function. Results that have appeared in the literature seem to suggest that ensemble methods can provide noticeable improvements over the quality of the most popular tools available for motif discovery. With the goal of better understanding the potential and limitations of ensemble methods, we developed a general software architecture whose main feature is flexibility with respect to the crucial aspects of ensemble methods mentioned above. The architecture provides facilities for easily adding virtually any third-party motif discovery tool whose code is publicly available, and for defining new learning functions. We present a prototype implementation of our architecture, called CE3 (Customizable and Easily Extensible Ensemble). Using CE3 and available ensemble methods, we performed experiments with three well-known datasets. The results presented here are mixed. On the one hand, they confirm that ensemble methods cannot simply be considered a universal remedy for "in-silico" motif discovery. On the other hand, we found some encouraging regularities that may help to find a general setup for CE3 (and other ensemble methods as well) able to guarantee substantial improvements over single finders in a systematic way.
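One simple kind of learning function for an ensemble of motif finders is position-wise voting: a nucleotide is kept as part of a predicted binding site if at least a given number of finders cover it. The sketch below illustrates only that idea and is not CE3's learning function; the half-open `(seq_id, start, end)` site format and the names `vote_positions`, `to_intervals` and `min_votes` are hypothetical.

```python
from collections import defaultdict


def vote_positions(predictions, min_votes):
    """Hypothetical voting learning function.

    predictions: one list per finder of (seq_id, start, end) half-open sites.
    Returns the set of (seq_id, position) pairs covered by >= min_votes finders.
    """
    votes = defaultdict(int)
    for finder_sites in predictions:
        # Use a set so overlapping sites from the same finder vote only once.
        covered = set()
        for seq_id, start, end in finder_sites:
            covered.update((seq_id, p) for p in range(start, end))
        for key in covered:
            votes[key] += 1
    return {key for key, v in votes.items() if v >= min_votes}


def to_intervals(positions):
    """Merge voted positions back into maximal (seq_id, start, end) sites."""
    result = []
    for seq_id, p in sorted(positions):
        last = result[-1] if result else None
        if last is not None and last[0] == seq_id and last[2] == p:
            result[-1] = (seq_id, last[1], p + 1)  # extend the current site
        else:
            result.append((seq_id, p, p + 1))      # start a new site
    return result
```

For example, with three finders predicting `("s1", 10, 20)`, `("s1", 15, 25)` and `("s1", 18, 30)`, requiring two votes keeps the consensus site `("s1", 15, 25)`; raising `min_votes` trades sensitivity for specificity.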
Protein Sequence Annotation by Means of Community Detection
Authors: Giuseppe Profiti, Damiano Piovesan, Pier Luigi Martelli, Piero Fariselli and Rita Casadio
In the postgenomic era, different electronic procedures are available for protein sequence annotation, the process of enriching any protein, after electronic translation from its corresponding gene or mRNA, with structural and functional features. The demand for reliable annotation systems is particularly urgent given the volume of genomic data produced daily by next-generation sequencing machines. In this paper we present a procedure that enhances the annotation performance of the previously described Bologna Annotation Resource (BAR+). BAR+ is based on clustering the graphs that represent the similarity among a large number of protein sequences, and here we apply community detection algorithms to detect subclusters within any graph. When a cluster is endowed with specific Gene Ontology (GO) terms associated with both Biological Process and Molecular Function, our procedure allows fine-tuning of the annotation process and generates subclusters in which proteins sharing closely related GO terms are grouped.
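Community detection on a protein similarity graph can be illustrated with weighted label propagation, which is only one of many community detection algorithms; the abstract does not state which algorithms are applied to BAR+ clusters, so this choice is an assumption. In the sketch, nodes are integer protein identifiers, edge weights stand for hypothetical similarity scores, and ties are broken toward the smallest label so that the result is deterministic.

```python
from collections import defaultdict


def label_propagation(edges, max_iter=100):
    """Weighted label propagation (deterministic variant, illustrative only).

    edges: iterable of (u, v, weight) with integer node ids.
    Returns a list of node sets (communities), ordered by smallest member.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    labels = {node: node for node in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            # Sum edge weights per neighbouring label.
            score = defaultdict(float)
            for nbr, w in adj[node].items():
                score[labels[nbr]] += w
            # Heaviest label wins; ties go to the smallest label.
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, lab in labels.items():
        communities[lab].add(node)
    return sorted(communities.values(), key=min)
```

On two tightly connected triangles joined by one weak edge, the sketch returns the two triangles as separate subclusters, which mirrors the idea of splitting a similarity cluster into more homogeneous groups.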
Discretization of Expression Quantitative Trait Loci in Association Analysis Between Genotypes and Expression Data
Expression quantitative trait loci (eQTLs) are used as a tool to identify genetic causes of natural variation in gene expression. Only in a few cases is the expression of a gene controlled by a variant at a single genetic marker. Interaction effects occur at many levels of complexity: within markers, within genes, and between markers and genes. This complexity challenges biostatisticians and bioinformaticians every day and makes findings hard to obtain. As a way to simplify analysis and better control confounders, we tried a new approach for association analysis between genotypes and expression data: we sought to determine whether discretizing expression data can be useful in genome-transcriptome association analyses. After discretizing the dependent variable, we used algorithms for learning classifiers from data and for performing block selection to help understand the relationship between the expression of a gene and genetic markers. We present the results of using this approach to detect new possible causes of expression variation of DRB5, a gene that plays an important role in the immune system. In addition to DRB5 expression obtained with classical microarray technology, we also measured DRB5 expression using the more recent next-generation sequencing technology. A supplementary website including a link to the software implementing the method can be found at http://bios.ugr.es/DRB5.
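A minimal sketch of discretizing an expression phenotype before testing association with a genetic marker, assuming equal-frequency (tertile) binning into low, medium and high classes and a Pearson chi-square statistic on the genotype-by-class contingency table. The abstract does not specify the discretization scheme or the classifiers used, so both choices and the function names are illustrative.

```python
from collections import Counter


def discretize_equal_frequency(values, labels=("low", "medium", "high")):
    """Assign each expression value a class so classes have (nearly) equal
    sizes. Ties in expression are broken by input order (an assumption)."""
    n, b = len(values), len(labels)
    order = sorted(range(n), key=lambda i: values[i])
    classes = [None] * n
    for rank, i in enumerate(order):
        classes[i] = labels[rank * b // n]
    return classes


def chi_square(genotypes, classes):
    """Pearson chi-square statistic of the genotype x expression-class table."""
    table = Counter(zip(genotypes, classes))
    rows = Counter(genotypes)
    cols = Counter(classes)
    total = len(genotypes)
    stat = 0.0
    for g in rows:
        for c in cols:
            expected = rows[g] * cols[c] / total
            stat += (table[(g, c)] - expected) ** 2 / expected
    return stat
```

The statistic can then be compared with a chi-square distribution with (rows − 1)(columns − 1) degrees of freedom; with realistic sample sizes, sparse cells would call for an exact test instead.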
Lexical Characterisation of Bio-Ontologies by the Inspection of Regularities in Labels
Hundreds of biomedical ontologies have been produced, and many of the significant, widely used ones have been developed in collaborative efforts following a set of construction principles, which include using a systematic naming convention for labels. Despite their success, many of these ontologies lack a foundation of axioms that would expose the wealth of knowledge in the ontologies to computational reasoning. Our previous results suggest that exploiting the structure of the labels may contribute to axiomatic enrichment. Hence, in this work we study the structure of the labels of the ontologies available in BioPortal in order to classify the ontologies by their potential interest for axiomatic enrichment.
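As an assumed illustration of inspecting regularities in labels, the sketch below counts token sequences (n-grams) that recur across several ontology labels. Whitespace tokenization, lower-casing, and the `min_labels` and `max_n` thresholds are simplifying assumptions, not the procedure used in the paper.

```python
from collections import Counter


def token_ngrams(tokens, max_n):
    """Yield all contiguous token sequences of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def lexical_regularities(labels, min_labels=2, max_n=3):
    """Map each token sequence to the number of labels containing it,
    keeping only sequences shared by at least `min_labels` labels."""
    counts = Counter()
    for label in labels:
        tokens = label.lower().split()
        # A set, so each label counts a given sequence at most once.
        counts.update(set(token_ngrams(tokens, max_n)))
    return {" ".join(g): c for g, c in counts.items() if c >= min_labels}
```

Sequences such as "cell growth" that recur inside longer labels ("regulation of cell growth") hint at a compositional naming pattern that could, in principle, be turned into axioms.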
Incremental Construction of Biological Networks by Relation Extraction from Literature
Authors: Dragana Miljkovic, Vid Podpečan, Tjasa Stare, Igor Mozetic, Kristina Gruden and Nada Lavrač
This work focuses on the automated incremental development of biological networks. The Bio3graph approach to information extraction from biological literature is extended with new features that allow periodic updates of network structures using newly published scientific literature. The incremental approach is demonstrated on two use cases. First, a simple plant defence network with 37 components and 49 relations, created manually by merging three existing structural models, is extended in two incremental steps, yielding a final model with 183 relations. Second, a complex published network of defence response in Arabidopsis thaliana, containing 175 nodes and 524 relations, is incrementally updated with information extracted from recently published articles, resulting in an enhanced network with 628 links. The results show that the incremental approach makes it possible to automatically recognise new knowledge about the selected biological relations published in recent literature. The newly implemented Bio3graph extension offers an effective way of merging and visually representing the initial networks and the networks generated from texts, thus enabling fast discovery of relations that can potentially enhance the existing models.
Efficient and Error-Tolerant Sequencing Read Mapping
Authors: Piotr Jaroszyński and Norbert Dojer
Most efficient read mappers build a Ferragina-Manzini index of a genome sequence and then process reads against it. To handle differences between reads and the corresponding genome fragments, approximate read occurrences are searched in the index. This technique is particularly efficient for mapping reads of length ∼30 bp with up to 2-3 errors, as required by the first massively parallel sequencers. However, within the last few years, read length in the most popular sequencing technologies has increased to 75–200 bp. Since the number of required index queries grows exponentially with the number of errors, it is hard to maintain the same allowed error rate for such longer reads with this method. We propose a new approach that overcomes this problem. The main idea is to use the Ferragina-Manzini index to filter potential approximate read occurrences. Filtering is based on the intermediate partitioning concept, i.e. reads are split into parts, which are searched in the index with a reduced number of errors. We implemented this method in the Bmap program. Our experiments show that Bmap outperforms current methods in efficiency without sacrificing mapping accuracy.
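The filtering idea rests on a pigeonhole argument: if a read aligns to a genome window with at most k errors and is split into j parts, at least one part has at most ⌊k/j⌋ errors, so searching each part with that reduced budget and then verifying the whole read finds every occurrence. The sketch below is a minimal illustration under two stated assumptions: a plain linear scan stands in for the Ferragina-Manzini index, and only substitutions (Hamming distance) are counted, not indels. It is not Bmap's implementation, and all function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_read(read, j):
    """Split `read` into j contiguous, nearly equal parts as (offset, part)."""
    n = len(read)
    bounds = [i * n // j for i in range(j + 1)]
    return [(bounds[i], read[bounds[i]:bounds[i + 1]]) for i in range(j)]


def approx_occurrences(genome, pattern, max_mm):
    """Start positions where `pattern` occurs with <= max_mm mismatches.
    A linear scan stands in for an FM-index search (assumption)."""
    m = len(pattern)
    return [p for p in range(len(genome) - m + 1)
            if hamming(genome[p:p + m], pattern) <= max_mm]


def map_read(genome, read, k, j):
    """All start positions where `read` aligns with <= k mismatches."""
    if not 1 <= j <= len(read):
        raise ValueError("j must be between 1 and len(read)")
    budget = k // j  # pigeonhole: some part has at most k // j mismatches
    candidates = set()
    for offset, part in split_read(read, j):
        for pos in approx_occurrences(genome, part, budget):
            start = pos - offset  # shift the part hit back to a read start
            if 0 <= start <= len(genome) - len(read):
                candidates.add(start)
    # Verification step: keep candidates whose full alignment is within k.
    return sorted(s for s in candidates
                  if hamming(genome[s:s + len(read)], read) <= k)
```

With j = k + 1 every part is searched exactly (budget 0), the classic pigeonhole filter; smaller j gives fewer, longer parts searched with a few errors each, which is the trade-off the intermediate partitioning concept exploits.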
A Hierarchical Classification for the Selection of the Most Suitable Multiple Sequence Alignment Methodology
Authors: Francisco M. Ortuño, Hector Pomares, Olga Valenzuela, Carolina Torres and Ignacio Rojas
Multiple sequence alignments (MSAs) are currently one of the most powerful procedures in bioinformatics, providing additional information useful to other techniques such as biological function analysis, structure prediction or next-generation sequencing. Nevertheless, current MSA methods produce quite different alignments for the same set of sequences, depending on particular biological features of these sequences. For this reason, selecting a suitable tool for aligning a specific set of sequences is an important task that has not yet been fully solved. In this work, we propose a hierarchical algorithm of several binary classifiers based on support vector machines (SVMs) to predict "a priori" the MSA tool that will provide the most accurate alignment. First, a set of heterogeneous biological features related to each set of sequences is retrieved from well-known databases. Next, the most significant features for each specific aligner are included in that aligner's classifier. Finally, the SVM classifiers are combined to decide the most suitable method according to the quality of each classification. This procedure was assessed on the BAliBASE v3.0 benchmark and compared against other similar tools, namely AlexSys and PAcAlCI.
Regulation of Meiosis Initiation before the Commitment Point in Budding Yeast: A Review of Biology, Molecular Mechanisms and Related Mathematical Models
Authors: Clampi T. Wannige, Don Kulasiri and Sandhya Samarasinghe
We present a review of the initiation of meiosis in budding yeast, Saccharomyces cerevisiae, focusing especially on the initiation stage before the commitment point. We discuss the molecular mechanisms involved in the tight regulation of the initiation process using experimental evidence and present a comprehensive discussion of the advantages and limitations of the available mathematical models of meiosis initiation in budding yeast. We also briefly review the biology of general meiosis initiation and the morphology of the model organism Saccharomyces cerevisiae, which help to place the molecular mechanisms in context. While explaining the key molecular mechanisms, we point out emerging questions raised by the available experimental literature and mathematical models for future investigation. Although the complete molecular network and mechanisms of meiosis initiation in budding yeast, arguably the best understood model organism for meiosis, are still not known, the current explanations can be beneficial for understanding key issues of meiosis in multicellular organisms.
Effect of Hubs in Amino Acid Network on Iron Superoxide Dismutase Stability
Authors: Yanrui Ding, Xueqin Wang and Zhaolin Mou
Based on structural information of iron superoxide dismutase (Fe-SOD), we constructed various types of Fe-SOD amino acid networks (Fe-SOD AANs). Analyses of the degree distribution and "rich clubs" of these Fe-SOD AANs indicated that they contain highly interacting nodes, namely hubs. Interestingly, most hubs are hydrophobic amino acids, including Ile, Leu, Phe, Ala and Val. These residues form a strong hydrophobic core that improves Fe-SOD structural stability. Most hubs are uniformly distributed in evolutionarily conserved regular secondary structures, where they help maintain Fe-SOD biological functionality. Moreover, a comparison of hubs in several Fe-SOD AANs with different thermostability revealed that hydrophobic amino acids such as Gly, Leu and Phe, but not Gln and Thr, have more interactions and form hubs, and are therefore conducive to a more hydrophobic core and denser packing in thermostable Fe-SOD. The total number of hubs, the numbers of hubs in 3₁₀ helices, turn structures and especially alpha helices, and the ratio of inner hubs in secondary structures are all important factors for Fe-SOD thermostability. Mutating hub residues on the Fe-SOD surface can improve Fe-SOD thermostability by increasing hydrogen-bond interactions. The results also show that hubs in AANs can be used to study the relationship between protein structural characteristics and stability.
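A minimal sketch of building an amino acid network and locating hubs, assuming residues are nodes connected when their representative atoms (for example Cα) lie within a distance cutoff (7.0 Å here, an illustrative value) and that hubs are residues whose degree reaches a chosen threshold; the abstract gives neither the exact edge definition nor the hub criterion. The rich-club coefficient follows the standard unnormalized definition φ(k) = 2E(>k) / (N(>k)(N(>k) − 1)), where N(>k) is the number of nodes with degree above k and E(>k) the number of edges among them.

```python
import math
from collections import Counter


def build_aan(coords, cutoff=7.0):
    """Residue contact network: residues i and j are linked when the distance
    between their coordinates is at most `cutoff` angstroms (assumed rule)."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree_distribution(adj):
    """Map each degree value to the number of residues with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


def find_hubs(adj, min_degree):
    """Residues whose degree is at least `min_degree` (hypothetical threshold)."""
    return sorted(node for node, nbrs in adj.items() if len(nbrs) >= min_degree)


def rich_club_coefficient(adj, k):
    """Unnormalized rich-club coefficient phi(k)."""
    rich = {node for node, nbrs in adj.items() if len(nbrs) > k}
    n = len(rich)
    if n < 2:
        return 0.0  # undefined for fewer than two rich nodes; report 0.0
    edges = sum(1 for u in rich for v in adj[u] if v in rich and u < v)
    return 2 * edges / (n * (n - 1))
```

In practice the hub threshold is often set relative to the degree distribution (for example, well above the mean degree); comparing `find_hubs` results across homologous structures is one way to relate hub content to thermostability.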
