Current Bioinformatics - Volume 5, Issue 1, 2010
Finding Recurrent Copy Number Alteration Regions: A Review of Methods
Authors: Oscar M. Rueda and Ramon Diaz-Uriarte
Copy number alterations (CNA) in genomic DNA are linked to a variety of human diseases. Although many methods have been developed to analyze data from a single subject, disease-critical genes are more likely to be found in regions that are common, or recurrent, among diseased subjects. Unfortunately, finding recurrent CNA regions remains a challenge. We review existing methods for the identification of recurrent CNA regions. Methods differ in their working definition of "recurrent region", the type of input data, the statistical and computational methods used to identify recurrence, and the biological considerations they incorporate (which play a role in the identification of "interesting" regions and in the details of the null models used to assess statistical significance). Very few approaches use and/or return probabilities, and code is not readily available for several methods. We emphasize that, when analyzing data from complex diseases with significant among-subject heterogeneity, methods should be able to identify CNAs that affect only a subset of subjects. We suggest that finding recurrent CNAs would benefit from clearly specifying the types of patterns to be detected and the intended use of the regions found (association of CNAs with disease, CNA effects on gene expression, clustering of subjects). We finish with suggestions for further methodological research.
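The simplest working definition of a recurrent region can be sketched as follows: a run of adjacent probes at which the proportion of subjects carrying the same alteration reaches a threshold. This is a minimal illustration, assuming per-subject calls already coded as -1 (loss), 0 (no change) or +1 (gain) on genome-ordered probes; the function name and threshold are hypothetical, it is not any specific method from the review, and it has no null model for significance.

```python
from typing import List, Tuple


def recurrent_regions(calls: List[List[int]], direction: int,
                      min_freq: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) probe-index ranges where at least
    min_freq of subjects carry the alteration `direction` (+1 gain, -1 loss).

    `calls` has one row per subject and one column per probe, with values
    -1, 0 or +1; probes are assumed to be in genomic order.
    """
    n_subjects = len(calls)
    n_probes = len(calls[0])
    # Proportion of subjects showing the requested alteration at each probe.
    freq = [sum(1 for row in calls if row[j] == direction) / n_subjects
            for j in range(n_probes)]
    regions: List[Tuple[int, int]] = []
    start = None
    for j, f in enumerate(freq):
        if f >= min_freq and start is None:
            start = j  # open a new run of recurrent probes
        elif f < min_freq and start is not None:
            regions.append((start, j - 1))  # close the current run
            start = None
    if start is not None:
        regions.append((start, n_probes - 1))
    return regions


# Hypothetical calls: 4 subjects x 6 probes.
calls = [
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, -1],
    [1, 1, 1, 0, 0, -1],
    [0, 0, 0, 0, 0, -1],
]
print(recurrent_regions(calls, +1, 0.5))  # [(1, 3)]
print(recurrent_regions(calls, -1, 0.5))  # [(5, 5)]
```

Raising min_freq shrinks or removes regions, so a high threshold of this kind can easily miss alterations that affect only a subset of subjects.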
Computational Models and Algorithms for the Single Individual Haplotyping Problem
Authors: Minzhu Xie, Jianxin Wang, Jianer Chen, Jingli Wu and Xucong Liu
Single nucleotide polymorphism (SNP) is the predominant form of human genetic variation and is widely used in disease association studies. A haplotype, i.e., the sequence of SNP alleles on a single chromosome, can provide more information than individual SNPs, and haplotype-based analysis is more powerful than SNP-based methods in complex disease association studies. However, it is very difficult to determine haplotypes by biological experiments alone. Single individual haplotyping uses computational techniques to infer the haplotypes of an individual from his or her DNA sequence fragments. As more and more individual genomes are sequenced, the single individual haplotyping problem has become a hotspot of bioinformatics research. This paper reviews the computational models and algorithms for the problem and discusses directions for future research.
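The abstract does not list the models it reviews; as an illustration, one widely studied formulation of single individual haplotyping is minimum error correction (MEC): choose a pair of haplotypes so that the fewest fragment alleles must be flipped for every fragment to match one of them. The sketch below assumes all SNPs are heterozygous (so the two haplotypes are complements), encodes fragments as strings over '0', '1' and '-' (uncovered), and solves MEC by brute force, which is only feasible for toy inputs.

```python
from itertools import product
from typing import List, Tuple


def flip(h: str) -> str:
    """Complementary haplotype (assumes every SNP is heterozygous)."""
    return "".join("1" if c == "0" else "0" for c in h)


def mec_cost(fragments: List[str], hap: str) -> int:
    """Minimum number of allele corrections so that each fragment matches
    either `hap` or its complement; '-' marks SNPs a fragment does not cover."""
    comp = flip(hap)
    total = 0
    for frag in fragments:
        d_hap = sum(1 for f, h in zip(frag, hap) if f != "-" and f != h)
        d_comp = sum(1 for f, h in zip(frag, comp) if f != "-" and f != h)
        total += min(d_hap, d_comp)  # assign fragment to the closer haplotype
    return total


def brute_force_mec(fragments: List[str]) -> Tuple[str, str, int]:
    """Exhaustive MEC solver: exponential in the number of SNPs, toy inputs only."""
    n_snps = len(fragments[0])
    best_hap = ""
    best_cost = n_snps * len(fragments) + 1  # larger than any possible cost
    for bits in product("01", repeat=n_snps):
        hap = "".join(bits)
        cost = mec_cost(fragments, hap)
        if cost < best_cost:
            best_hap, best_cost = hap, cost
    return best_hap, flip(best_hap), best_cost


# Hypothetical fragments over 3 SNPs; the last one contains one sequencing error.
fragments = ["01-", "-01", "10-", "0-0", "011"]
print(brute_force_mec(fragments))  # ('010', '101', 1)
```

Practical algorithms replace the exhaustive loop with heuristics or exact methods that exploit fragment structure, since MEC is computationally hard in general.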
On the Comparison of Classifiers for Microarray Data
Authors: Blaise Hanczar and Edward R. Dougherty
The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. A large number of supervised methods have been proposed in the literature for microarray-based classification. Model comparison, which is based on classification error estimation, is a critical issue. Previous studies have shown that error estimation is unreliable in high-dimensional, small-sample settings. This naturally calls into question the validity of the classification-rule comparison approaches used in the literature. In this paper we present a brief review of the different comparison methods used in bioinformatics. We then test these methods on a set of simulations based on both synthetic and real data. These simulations include different feature-label distributions, classification rules, error estimators and variance estimators. The results show that none of these methods can provide reliable comparisons across a wide spectrum of feature-label distributions and classification rules.
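The abstract does not specify which rules or estimators were simulated, so the following is only a minimal sketch, assuming two Gaussian classes, a nearest-centroid rule, a 1-nearest-neighbour rule and leave-one-out (LOO) error estimation, all chosen here for illustration. It repeatedly draws a small high-dimensional sample from one fixed distribution and records the estimated error difference between the two rules, so one can inspect how much a single small-sample comparison can vary. The function names and parameters are hypothetical.

```python
import numpy as np


def nearest_centroid_predict(X_tr, y_tr, X_te):
    """Assign each test point to the class with the closer mean."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = ((X_te - c0) ** 2).sum(axis=1)
    d1 = ((X_te - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def one_nn_predict(X_tr, y_tr, X_te):
    """Assign each test point the label of its nearest training point."""
    d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    return y_tr[d.argmin(axis=1)]


def loo_error(predict, X, y):
    """Leave-one-out error estimate of a classification rule."""
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        errors += int(predict(X[keep], y[keep], X[i:i + 1])[0] != y[i])
    return errors / n


def simulate(n_per_class=10, dim=20, shift=0.5, n_reps=200, seed=0):
    """Estimated error difference (nearest centroid minus 1-NN) per replicate."""
    rng = np.random.default_rng(seed)
    y = np.array([0] * n_per_class + [1] * n_per_class)
    diffs = []
    for _ in range(n_reps):
        X = np.vstack([rng.normal(0.0, 1.0, size=(n_per_class, dim)),
                       rng.normal(shift, 1.0, size=(n_per_class, dim))])
        diffs.append(loo_error(nearest_centroid_predict, X, y)
                     - loo_error(one_nn_predict, X, y))
    return np.array(diffs)


diffs = simulate()
print("nearest centroid estimated better:", np.mean(diffs < 0))
print("1-NN estimated better:", np.mean(diffs > 0))
print("tie:", np.mean(diffs == 0))
```

Because every replicate comes from the same distribution, the true errors of the two rules are fixed; any disagreement in sign across replicates reflects estimation variability rather than a real change in which rule is better.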
Phylogenetic Trees in Bioinformatics
By Tom Burr
Genetic data are often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTUs). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; inference goals that involve both tree topology and branch lengths; and the huge number of possible trees even for a modest number of OTUs, which makes finding the tree(s) that best describe the genetic data computationally demanding. Bioinformatics is too large a field to review here, so we focus on the aspect of bioinformatics that studies similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. This paper therefore reviews the role of phylogenetic tree estimation in bioinformatics and the available methods and software, and identifies areas for additional research and development.
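The "huge number of possible trees" can be made concrete with the standard counting result: for n labeled OTUs there are (2n-5)!! unrooted and (2n-3)!! rooted fully bifurcating trees, where k!! denotes the product of the odd numbers up to k. A short sketch (the function names are illustrative):

```python
def unrooted_trees(n: int) -> int:
    """Number of unrooted, fully bifurcating trees on n labeled OTUs: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # product of odd numbers 3, 5, ..., 2n-5
        count *= k
    return count


def rooted_trees(n: int) -> int:
    """Number of rooted, fully bifurcating trees on n labeled OTUs: (2n-3)!!."""
    if n < 2:
        raise ValueError("need at least 2 OTUs")
    return unrooted_trees(n + 1)  # (2(n+1)-5)!! == (2n-3)!!


print(unrooted_trees(10), rooted_trees(10))  # 2027025 34459425
print(unrooted_trees(20))  # 221643095476699771875
```

Already for 20 OTUs there are about 2.2 × 10^20 unrooted trees, which is why tree estimation relies on heuristic search rather than exhaustive enumeration.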
Performance of Error Estimators for Classification
Authors: Edward R. Dougherty, Chao Sima, Hua, Blaise Hanczar and Ulisses M. Braga-Neto
Classification in bioinformatics often suffers from small samples in conjunction with large numbers of features, which makes error estimation problematic. When a sample is small, there are insufficient data to split the sample, so the same data are used for both classifier design and error estimation. Error estimation can suffer from high variance, bias, or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to design the classifier, the feature-label distribution to which the classifier is to be applied, and the sample size. This paper reviews the performance of training-sample error estimators with respect to several criteria: estimation accuracy, variance, bias, correlation with the true error, regression on the true error, and accuracy in ranking feature sets. A number of error estimators are considered: resubstitution, leave-one-out cross-validation, 10-fold cross-validation, bolstered resubstitution, semi-bolstered resubstitution, .632 bootstrap, .632+ bootstrap, and optimal bootstrap. The paper illustrates these performance criteria for certain models and for two real data sets, referring to the literature for more extensive applications of these criteria. The results given here are consistent with those in the literature and lead to two conclusions: (1) much greater effort needs to be focused on error estimation, and (2) owing to the generally poor performance of error estimators on small samples, a conclusion based on a small-sample error estimator should be considered valid only if it is supported by evidence that the estimator can be expected to perform well enough under the circumstances to justify that conclusion.
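Three of the estimators named above can be sketched in a few lines. This is a minimal illustration, assuming a nearest-centroid classification rule chosen only for brevity (it is not taken from the paper), of resubstitution, leave-one-out cross-validation, and the .632 bootstrap, which combines two components as err(.632) = 0.368 × err(resub) + 0.632 × err0, where err0 is the error on points left out of each bootstrap sample. Pooling the out-of-bag errors across bootstrap samples, as done here, is one common way to compute err0; the function names are hypothetical.

```python
import numpy as np


def nc_fit_predict(X_tr, y_tr, X_te):
    """Nearest-centroid rule: design on (X_tr, y_tr), predict labels for X_te."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    return (((X_te - c1) ** 2).sum(axis=1)
            < ((X_te - c0) ** 2).sum(axis=1)).astype(int)


def resubstitution(X, y):
    """Error on the same data used to design the classifier (optimistic)."""
    return float(np.mean(nc_fit_predict(X, y, X) != y))


def leave_one_out(X, y):
    """Design on n-1 points, test on the held-out point, repeat for all n."""
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i
        wrong += int(nc_fit_predict(X[keep], y[keep], X[i:i + 1])[0] != y[i])
    return wrong / n


def bootstrap_632(X, y, n_boot=100, seed=0):
    """.632 bootstrap: weighted mix of resubstitution and out-of-bag error."""
    rng = np.random.default_rng(seed)
    n = len(y)
    wrong, tested = 0, 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # sample n points with replacement
        oob = np.setdiff1d(np.arange(n), idx)      # points not drawn this round
        # Skip samples that leave nothing out or lack one of the classes.
        if len(oob) == 0 or len(np.unique(y[idx])) < 2:
            continue
        pred = nc_fit_predict(X[idx], y[idx], X[oob])
        wrong += int(np.sum(pred != y[oob]))
        tested += len(oob)
    if tested == 0:
        raise RuntimeError("no usable bootstrap samples")
    err0 = wrong / tested
    return 0.368 * resubstitution(X, y) + 0.632 * err0


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (10, 5)), rng.normal(1.0, 1.0, (10, 5))])
y = np.repeat([0, 1], 10)
print(resubstitution(X, y), leave_one_out(X, y), bootstrap_632(X, y))
```

Running all three on the same small sample typically gives different numbers, which is the starting point for the comparisons of bias and variance that the paper discusses.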
Integration of Diverse Research Methods to Analyze and Engineer Ca2+-Binding Proteins: From Prediction to Production
Authors: Michael Kirberger, Xue Wang, Kun Zhao, Shen Tang, Guantao Chen and Jenny J. Yang
In recent years, increasingly sophisticated computational and bioinformatics tools have evolved for the analysis of protein structure, function, ligand interactions, modeling and energetics. These include algorithms that recursively evaluate side-chain rotamer permutations, identify regions in a 3D structure that meet a given set of search parameters, calculate and minimize energy values, and provide high-resolution visual tools for theoretical modeling. Here we discuss the interdependency between different areas of bioinformatics, the evolution of different approaches to algorithm design, and finally the transition from theoretical models to real-world design and application as they relate to Ca2+-binding proteins. Within this context, it has become evident that much of the pre-experimental design and calculation can be carried out with computational methods, thus eliminating potentially unproductive research and increasing our confidence in the correlation between real and theoretical models. Moving from prediction to production, bioinformatics tools are expected to play an increasingly significant role in research and development, improving our ability both to understand the physiological roles of Ca2+ and other metals and to extend that knowledge to the design of function-specific synthetic proteins capable of fulfilling different roles in medical diagnostics and therapeutics.
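Recursive evaluation of side-chain rotamer permutations can be illustrated with an exhaustive search over one discrete rotamer choice per position. This is a minimal sketch, assuming a pairwise-additive energy model (a self energy per rotamer plus an interaction energy per pair of rotamers) with hypothetical energy tables; it is not the algorithm of any specific tool discussed in the paper. Real design software typically prunes this exponential search, for example with dead-end elimination.

```python
from typing import Dict, List, Tuple

SelfE = Dict[Tuple[int, int], float]            # (position, rotamer) -> energy
PairE = Dict[Tuple[int, int, int, int], float]  # (pos_j, rot_j, pos_i, rot_i), j < i


def best_rotamers(n_rot: List[int], self_e: SelfE,
                  pair_e: PairE) -> Tuple[List[int], float]:
    """Recursively enumerate one rotamer per position and return the
    lowest-energy assignment. Missing table entries count as 0."""
    best: List = [None, float("inf")]

    def recurse(i: int, chosen: List[int], energy: float) -> None:
        if i == len(n_rot):
            if energy < best[1]:
                best[0], best[1] = list(chosen), energy
            return
        for r in range(n_rot[i]):
            # Energy added by placing rotamer r at position i, given earlier choices.
            e = self_e.get((i, r), 0.0)
            for j, rj in enumerate(chosen):
                e += pair_e.get((j, rj, i, r), 0.0)
            chosen.append(r)
            recurse(i + 1, chosen, energy + e)
            chosen.pop()

    recurse(0, [], 0.0)
    return best[0], best[1]


# Hypothetical 3-position problem with 2 rotamers per position.
self_e = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.0, (2, 0): 0.0, (2, 1): 0.3}
pair_e = {(0, 1, 1, 1): 2.0, (1, 0, 2, 0): -0.5}
rotamers, energy = best_rotamers([2, 2, 2], self_e, pair_e)
print(rotamers, round(energy, 6))  # [1, 0, 0] 0.2
```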
MicroRNA Target Prediction: Problems and Possible Solutions
Authors: Peter M. Szabo, Zsofia Tombol, Viktor Molnar, Andras Falus, Karoly Racz and Peter Igaz
MicroRNAs (miRNAs) are small non-coding RNA molecules involved in the posttranscriptional regulation of gene expression. miRNAs bind specifically to the 3' untranslated region of messenger RNA (mRNA) molecules and induce translational repression or mRNA degradation. Potential miRNA targets can be predicted by various computational algorithms that take several parameters into consideration and calculate probability scores for each miRNA-mRNA interaction. In this review, three of the most frequently used algorithms (TargetScan, PicTar and miRBase) are compared, and their strengths and weaknesses are highlighted. These algorithms use different input databases and mathematical models, which may lead to discrepancies in their outputs. As there is currently no unambiguous evidence favoring any one of these algorithms, analyzing the data with all of them simultaneously can be an effective approach. For this purpose, a novel software tool was developed that collects all available data about each miRNA-mRNA interaction retrieved from these databases and identifies common putative targets. Another major problem is the difficulty of experimental validation: only a minority of in silico predicted targets have been validated to date. Following a brief description of the available experimental target validation methods, the authors provide suggestions for designing in vitro validation approaches.
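The idea of identifying common putative targets across several prediction tools can be sketched as counting, for each miRNA-gene pair, how many tools predict it and keeping pairs supported by at least a chosen number of tools. This is a minimal illustration, not the authors' software: the tool names serve only as dictionary labels, and the miRNA and gene identifiers and predictions below are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (miRNA, target gene)


def consensus_targets(predictions: Dict[str, Set[Pair]], min_tools: int) -> Set[Pair]:
    """Return miRNA-gene pairs predicted by at least `min_tools` tools."""
    # Each tool contributes a set, so a pair is counted at most once per tool.
    counts = Counter(pair for pairs in predictions.values() for pair in pairs)
    return {pair for pair, c in counts.items() if c >= min_tools}


# Hypothetical predictions, keyed by tool name.
predictions = {
    "TargetScan": {("miR-a", "GENE1"), ("miR-a", "GENE2"), ("miR-b", "GENE3")},
    "PicTar": {("miR-a", "GENE1"), ("miR-b", "GENE3")},
    "miRBase": {("miR-a", "GENE1"), ("miR-a", "GENE4")},
}
print(consensus_targets(predictions, 3))  # {('miR-a', 'GENE1')}
```

Requiring agreement from all tools gives a short, conservative list, whereas a lower min_tools value trades specificity for more candidates to take forward to experimental validation.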