Volume 5, Issue 1

Current Bioinformatics - Volume 5, Issue 1, 2010

- Finding Recurrent Copy Number Alteration Regions: A Review of Methods
  
  Authors: Oscar M. Rueda and Ramon Diaz-Uriarte
  
  https://doi.org/10.2174/157489310790596402
  
  Copy number alterations (CNA) in genomic DNA are linked to a variety of human diseases. Although many methods have been developed to analyze data from a single subject, disease-critical genes are more likely to be found in regions that are common or recurrent among diseased subjects. Unfortunately, finding recurrent CNA regions remains a challenge. We review existing methods for the identification of recurrent CNA regions. Methods differ in their working definition of “recurrent region”, the type of input data, the statistical and computational methods used to identify recurrence, and the biological considerations they incorporate (which play a role in the identification of “interesting” regions and in the details of null models used to assess statistical significance). Very few approaches use and/or return probabilities, and code is not easily available for several methods. We emphasize that, when analyzing data from complex diseases with significant among-subject heterogeneity, methods should be able to identify CNAs that affect only a subset of subjects. We suggest that finding recurrent CNAs would benefit from clearly specifying the types of pattern to be detected and the intended usage of the regions found (CNA association with disease, CNA effects on gene expression, clustering of subjects). We finish with suggestions for further methodological research.
  

- Computational Models and Algorithms for the Single Individual Haplotyping Problem
  
  Authors: Minzhu Xie, Jianxin Wang, Jianer Chen, Jingli Wu and Xucong Liu
  
  https://doi.org/10.2174/157489310790596411
  
  Single nucleotide polymorphisms (SNPs) are the predominant form of human genetic variation and are widely used in disease association studies. A haplotype, i.e., a sequence of SNPs on a chromosome, can provide more information than single SNPs. Haplotype-based analysis is more powerful in complex disease association studies than SNP-based methods. However, it is very difficult to determine haplotypes using only biological experiments. Single individual haplotyping uses computational techniques to infer the haplotypes of an individual from his or her DNA sequence fragments. As more and more individual genomes are sequenced, the single individual haplotyping problem has become a hotspot of bioinformatics. This paper reviews the computational models and algorithms for the problem, and discusses directions for future research.
  

- On the Comparison of Classifiers for Microarray Data
  
  Authors: Blaise Hanczar and Edward R. Dougherty
  
  https://doi.org/10.2174/157489310790596376
  
  The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. A large number of supervised methods have been proposed in the literature for microarray-based classification. Model comparison, which is based on classification error estimation, is a critical issue. Previous studies have shown that error estimation is unreliable in high-dimensional small-sample settings. This leads naturally to questioning the validity of the classification-rule comparison approaches being used in the literature. In this paper we present a brief review of the different comparison methods used in bioinformatics. Then, we test these methods on a set of simulations based on both synthetic and real data. These simulations include different feature-label distributions, classification rules, error estimators and variance estimators. The results show that none of these methods can provide reliable comparison across a wide spectrum of feature-label distributions and classification rules.
  

- Phylogenetic Trees in Bioinformatics
  
  By Tom Burr
  
  https://doi.org/10.2174/157489310790596367
  
  Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTUs). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length; and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes the study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, surveys available methods and software, and identifies areas for additional research and development.
  

- Performance of Error Estimators for Classification
  
  Authors: Edward R. Dougherty, Chao Sima, Hua, Blaise Hanczar and Ulisses M. Braga-Neto
  
  https://doi.org/10.2174/157489310790596385
  
  Classification in bioinformatics often suffers from small samples in conjunction with large numbers of features, which makes error estimation problematic. When a sample is small, there is insufficient data to split the sample and the same data are used for both classifier design and error estimation. Error estimation can suffer from high variance, bias, or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to design the classifier, the feature-label distribution to which the classifier is to be applied, and the sample size. This paper reviews the performance of training-sample error estimators with respect to several criteria: estimation accuracy, variance, bias, correlation with the true error, regression on the true error, and accuracy in ranking feature sets. A number of error estimators are considered: resubstitution, leave-one-out cross-validation, 10-fold cross-validation, bolstered resubstitution, semi-bolstered resubstitution, .632 bootstrap, .632+ bootstrap, and optimal bootstrap. It illustrates these performance criteria for certain models and for two real data sets, referring to the literature for more extensive applications of these criteria. The results given in the present paper are consistent with those in the literature and lead to two conclusions: (1) much greater effort needs to be focused on error estimation, and (2) owing to the generally poor performance of error estimators on small samples, for a conclusion based on a small-sample error estimator to be considered valid, it should be supported by evidence that the estimator in question can be expected to perform sufficiently well under the circumstances to justify the conclusion.
  

- Integration of Diverse Research Methods to Analyze and Engineer Ca2+-Binding Proteins: From Prediction to Production
  
  Authors: Michael Kirberger, Xue Wang, Kun Zhao, Shen Tang, Guantao Chen and Jenny J. Yang
  
  https://doi.org/10.2174/157489310790596358
  
  In recent years, increasingly sophisticated computational and bioinformatics tools have evolved for the analyses of protein structure, function, ligand interactions, modeling and energetics. This includes the development of algorithms to recursively evaluate side-chain rotamer permutations, identify regions in a 3D structure that meet some set of search parameters, calculate and minimize energy values, and provide high-resolution visual tools for theoretical modeling. Here we discuss the interdependency between different areas of bioinformatics, the evolution of different algorithm design approaches, and finally the transition from theoretical models to real-world design and application as they relate to Ca2+-binding proteins. Within this context, it has become evident that significant pre-experimental design and calculations can be modeled through computational methods, thus eliminating potentially unproductive research and increasing our confidence in the correlation between real and theoretical models. Moving from prediction to production, it is anticipated that bioinformatics tools will play an increasingly significant role in research and development, improving our ability to both understand the physiological roles of Ca2+ and other metals and to extend that knowledge to the design of function-specific synthetic proteins capable of fulfilling different roles in medical diagnostics and therapeutics.
  

- MicroRNA Target Prediction: Problems and Possible Solutions
  
  Authors: Peter M. Szabo, Zsofia Tombol, Viktor Molnar, Andras Falus, Karoly Racz and Peter Igaz
  
  https://doi.org/10.2174/157489310790596394
  
  MicroRNAs (miRNAs) are small non-coding RNA molecules involved in the posttranscriptional regulation of gene expression. miRNAs bind specifically to the 3' untranslated region of messenger RNA (mRNA) molecules and induce translational repression or mRNA degradation. Potential miRNA targets can be predicted by various computational algorithms that take several parameters into consideration and calculate probability scores for each miRNA-mRNA interaction. In this review, three of the most frequently used algorithms (TargetScan, PicTar and miRBase) are compared, and their strengths and weaknesses are highlighted. These algorithms use different input databases and mathematical models, which may lead to discrepancies between their outputs. As there is currently no unambiguous evidence favoring any one of these algorithms, simultaneous analysis with all of them can be an effective approach. For this purpose, novel software was developed that is capable of collecting all available data about each miRNA-mRNA interaction retrieved from these databases and identifying common putative targets. Another major problem is the difficulty of experimental validation; as a result, only a minority of in silico predicted targets have been validated to date. Following a brief description of the available experimental target validation methods, the authors attempt to provide suggestions for designing in vitro validation approaches.
  