Current Bioinformatics - Volume 2, Issue 2, 2007
Advances in Exploration of Machine Learning Methods for Predicting Functional Class and Interaction Profiles of Proteins and Peptides Irrespective of Sequence Homology
Authors: Juan Cui, Lianyi Han, Honghuang Lin, Zhiqun Tang, Zhiliang Ji, Zhiwei Cao, Yixue Li and Yuzong Chen
Various computational methods have been used for predicting protein function from clues contained in protein sequence. A particular challenge is the functional prediction of proteins that show low or no sequence similarity to proteins of known function. Recently, machine learning methods have been explored for predicting the functional class of proteins from a variety of sequence-derived structural and physicochemical properties, independent of sequence similarity. These methods have shown promising potential for a broad spectrum of proteins, including those that show low or no similarity to other proteins. They can thus be explored as potential tools to complement similarity-based, clustering-based and structure-based methods for predicting protein function. This article reviews the strategies, algorithms, current progress, available software and web servers, and underlying difficulties in using machine learning methods for predicting the functional class of proteins and peptides, and protein-protein interactions. The reported prediction performances of these methods are also presented.
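As a rough illustration of what a sequence-derived property independent of sequence similarity can look like, the sketch below computes amino acid composition, one simple descriptor of this kind. It is not taken from the reviewed article; the function name `amino_acid_composition` and the example peptide are hypothetical, and a real predictor would typically combine many more descriptors before training a classifier.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Non-standard characters (e.g. X, gaps) are ignored. This is an
    illustrative descriptor, not the feature set of any specific method.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical peptide used only to exercise the function.
features = amino_acid_composition("MKTAYIAKQR")
```

The resulting fixed-length vector does not depend on any alignment to other proteins, which is why descriptors of this kind can be fed to a classifier even when no homolog of known function exists.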
A Decade of Computing to Traverse the Labyrinth of Protein Domains
Detection and characterization of the structural domains of a protein is crucial for determining its tertiary structure, elucidating its functions, and designing and producing its biologically active analogs. Identification of domain segments at the sequence level is also important in protein structural genomics and in evolutionary studies. The diversity of domain folds and sequences and the high structural flexibility of inter-domain linker regions pose great challenges for determining multi-domain protein structures, even from X-ray crystallographic or NMR spectroscopic data or by homology modeling. The problems multiply in the absence of any such data or sequence homologies. Interestingly, though, identification of protein domains is a unique research problem in which ab initio computational investigations supersede the experimental ones or offer better applications of the latter. Advances in bioinformatics and computational biology in post-genomic research have led to a plethora of approaches, algorithms and web servers for predicting protein domains from 3D coordinates, from partial structural information including secondary structure, or from the primary sequence alone. Here we assess the state-of-the-art developments in the field. Trend-setting as well as widely used computational methods and web servers/databases are reviewed with a focus on their applicability, novelty and strength in mining the multiple sequence/structure features that contribute to the formation, distinction and diversity of protein domains. Future possibilities of a unified system with optimal decision support are highlighted.
Gene Set Enrichment Analysis (GSEA) for Interpreting Gene Expression Profiles
Authors: Jing Shi and Michael G. Walker
Gene set enrichment analysis (GSEA) is a statistical method to determine if predefined sets of genes are differentially expressed in different phenotypes. Predefined gene sets may be genes in a known metabolic pathway, located in the same cytogenetic band, sharing the same Gene Ontology category, or any user-defined set. In microarray experiments where no single gene shows statistically significant differential expression between phenotypes, GSEA has identified significant differentially expressed sets of genes, even where the average difference in expression between two phenotypes is only 20% for genes in the gene set. The gene set identified in the first GSEA analysis (oxidative phosphorylation genes differentially expressed in diabetic versus non-diabetic patients) was subsequently confirmed by independent laboratory studies published in the New England Journal of Medicine. Since the first paper on GSEA was published, many extensions and alternative methods have been described in the literature. In this paper, we describe the original GSEA algorithm, subsequent extensions and alternatives, results of some of the applications, some limitations of the methods and caveats for users, and possible future research directions. GSEA and related methods are complementary to conventional single-gene methods. Single gene methods work best when individual genes have large effects and there is small variance within the phenotype. GSEA is likely to be more powerful than conventional single-gene methods for studying the large number of common diseases in which many genes each make subtle contributions. It is a tool that deserves to be in the toolbox of bioinformatics practitioners.
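To make the idea of set-level enrichment concrete, here is a minimal sketch of an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score. It is an assumption about the general shape of the statistic, not the exact formulation in the reviewed paper: the step sizes (1/hits and 1/misses), the function name `enrichment_score`, and the toy gene names are all illustrative, and the permutation test used to assess significance is omitted.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (illustrative sketch).

    ranked_genes: unique gene names ordered from most up-regulated in one
                  phenotype to most up-regulated in the other.
    gene_set:     the predefined set being tested.
    Returns the running-sum value with the largest absolute deviation
    from zero; positive means the set clusters near the top of the list.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        # Step up on a set member, down otherwise; steps are scaled so
        # the walk returns to zero at the end of the list.
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list with hypothetical gene names.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranked, {"g1", "g2"})
bottom_score = enrichment_score(ranked, {"g5", "g6"})
```

Because the score depends on where the whole set falls in the ranking rather than on any one gene passing a threshold, a set of genes with individually modest shifts can still produce a large deviation.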
Inference of Gene Regulatory Networks and its Validation
Genes encode proteins, some of which in turn regulate other genes. Such interactions make up a gene regulatory network. Understanding and unraveling gene regulatory networks has proven very useful in disease diagnosis and genomic drug design. Because gene regulatory networks are complex, a complete understanding of their dynamics is difficult to achieve through biological experiments alone, without computational aids. As a consequence, computational models of gene regulatory networks are indispensable. Recently, a wide variety of computational models have been proposed for inferring gene regulatory networks. This paper surveys some of the computational models for inferring large gene regulatory networks, in particular Boolean network models, differential/difference equation models, and state-space models. Some advantages and disadvantages of these models are discussed. Some criteria for validating inferred gene regulatory networks are also discussed from the bioinformatics perspective. Finally, several directions for future work on modeling gene regulatory networks are proposed.
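Of the model families named above, the Boolean network is the simplest to show in a few lines. The sketch below simulates a synchronous Boolean network and follows it until a state repeats. The three-gene network, its update rules, and the function names `step` and `find_attractor` are hypothetical and chosen only for illustration; inferring such rules from expression data, which is the subject of the survey, is not attempted here.

```python
def step(state, rules):
    """Synchronously update every gene from the current state."""
    return {gene: bool(rule(state)) for gene, rule in rules.items()}


def find_attractor(state, rules, max_steps=100):
    """Iterate the network until a state repeats and return the cycle.

    Returns the list of states forming the attractor, or None if no
    state repeats within max_steps updates.
    """
    seen = []
    for _ in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            return seen[seen.index(key):]
        seen.append(key)
        state = step(state, rules)
    return None


# Hypothetical three-gene network: C represses A, A activates B,
# and C is switched on only when both A and B are on.
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}

start = {"A": True, "B": False, "C": False}
attractor = find_attractor(start, rules)
```

Each gene is either on or off and all genes update at once, which keeps the state space finite, so every trajectory must eventually enter a fixed point or a cycle.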
Spectral Estimation Techniques for DNA Sequence and Microarray Data Analysis
Authors: Hong Yan and Tuan D. Pham
Spectral estimation techniques are widely used in modern signal processing systems. Recently, they have found important applications to the analysis of DNA data. In this paper, we review parametric and non-parametric spectral estimation methods for DNA sequence and microarray data analysis. The discrete Fourier transform (DFT) is the most commonly used technique for spectral analysis of digital signals. It can reveal the gene locations in a DNA sequence. The DFT can also be used to detect repetitive elements in a DNA sequence. The DFT produces the so-called windowing or data truncation artifacts when it is applied to a short data segment. Parametric spectral estimation methods, such as the autoregressive (AR) model, overcome this problem and can be used to obtain a high-resolution spectrum of the input signal. In this paper, we demonstrate the advantages of the AR model for the identification of protein coding regions and the detection of DNA repeats. We also review DFT and AR models and other spectral estimation techniques for the analysis of microarray time series data.
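One common way to apply the DFT to a DNA sequence, sketched below under stated assumptions, is to turn each nucleotide into a binary indicator sequence and measure the spectral power at a period of three bases, since coding regions tend to show a peak there. The indicator encoding, the normalization by the squared length, and the function name `period3_power` are illustrative choices, not details taken from the reviewed paper; the AR-based estimators it discusses are not shown.

```python
import cmath


def period3_power(dna):
    """Normalized DFT power at a period of 3 bases (illustrative sketch).

    Each nucleotide is mapped to a 0/1 indicator sequence; the DFT of
    each indicator is evaluated at frequency 1/3 cycles per base and the
    squared magnitudes are summed over A, C, G and T.
    """
    seq = dna.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")

    total = 0.0
    for base in "ACGT":
        # Only positions where the indicator is 1 contribute to the sum.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n ** 2


# Toy inputs: a perfect 3-periodic repeat versus a homopolymer.
periodic = period3_power("ATG" * 20)
flat = period3_power("A" * 60)
```

In practice the measure would be computed in a sliding window along a genomic sequence, and the window length is where the truncation artifacts mentioned above come into play.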