Current Bioinformatics - Volume 8, Issue 5, 2013
-
-
Using Multitask Learning Methods to Investigate Signal Peptides and Signal Anchors.
Authors: Ning Zhang, Shan Gao, Lei Chen and Jishou Ruan
The identification of signal peptides and signal anchors is critical to understanding the related biological mechanisms and to finding more effective vehicles for protein production. Instead of studying the signal peptides and signal anchors of single species separately, we developed a multi-task learning framework to investigate them across species simultaneously. Using a multi-task feature selection method, we identified 12 important features out of 560 amino acid indices. The effectiveness and classification ability of these 12 features were evaluated with another multi-task method, based on cross-validation and an independent test. Further analysis of the selected features brought some new insights into the physicochemical properties of signal peptides and signal anchors.
-
-
-
Similarity/Dissimilarity Analysis of Protein Sequences by a New Graphical Representation.
Authors: Guohua Huang and Jerry Hu
First, a novel graphical method without degeneracy for protein sequences is proposed, based mainly on the pKa (NH3+) values of amino acids; it assists in viewing, aligning and comparing multiple sequences visually. Then, a new algorithm is presented that extracts a 40-dimensional numerical vector from the graphical curves to characterize protein sequences. The similarity among sequences is computed as the Euclidean distance between the corresponding numerical vectors. Finally, the method is applied to similarity analysis of protein sequences on two data sets. The results agree with the accepted view, which is supported by a great deal of anatomical evidence, and hence demonstrate the validity of this approach.
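The distance step described in this abstract can be illustrated with a minimal sketch. It assumes each sequence has already been reduced to a fixed-length numerical vector; how the 40-dimensional vectors are extracted from the graphical curves is not given here, so the function names and the short example vectors below are hypothetical.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Pairwise distances for a dict mapping sequence name -> vector."""
    names = list(vectors)
    return {
        (p, q): euclidean_distance(vectors[p], vectors[q])
        for p in names
        for q in names
    }


if __name__ == "__main__":
    # Hypothetical 3-dimensional stand-ins for the 40-dimensional vectors.
    example = {
        "seq_a": [0.0, 1.0, 2.0],
        "seq_b": [0.0, 1.0, 2.5],
        "seq_c": [3.0, 4.0, 5.0],
    }
    for pair, d in sorted(distance_matrix(example).items()):
        print(pair, round(d, 3))
```

A smaller distance is read as greater similarity between the two sequences.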
-
-
-
Predicting Biological Functions of Protein Complexes Using Graphic and Functional Features.
Authors: Lei Chen, Bi-Qing Li and Kai-Yan Feng
Protein complexes are involved in most, if not all, essential biological processes in a living cell. Many attempts have been made to identify protein complexes using computational methods, most of which exploit protein-protein interaction networks to search for densely interacting proteins as a protein complex. Besides identifying protein complexes, knowing their biological functions may help unlock their molecular mechanisms and their roles in related biological processes. Therefore, it is also desirable to predict the functions of protein complexes computationally. However, no literature was found that addresses this problem. This paper attempts to address it by choosing yeast as the model organism, for which a total of 50 protein complexes were collected whose functions are validated by solid experiments. Each complex was encoded as a numeric vector based on its graphic and functional properties. Feature selection techniques, including Minimum Redundancy Maximum Relevance and Incremental Feature Selection, were adopted to extract core features for the prediction. Three prediction methods, Nearest Neighbor Algorithm, Bayesian network and Sequential Minimal Optimization, were used in this study and tested by jackknife cross-validation. The 22 core features coupled with the Nearest Neighbor Algorithm achieved the highest accuracy. These core features are regarded as the most important features for determining the biological functions of protein complexes. Of the 22 core features, 19 came from functional properties, indicating that the functions of each protein component probably constrain the overall functions of the protein complex.
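As an illustration of the jackknife test with a nearest neighbor classifier mentioned in this abstract, here is a minimal sketch. It assumes Euclidean distance and a single nearest neighbor, neither of which is stated in the abstract, and the function name and toy data are hypothetical.

```python
import math


def jackknife_nn_accuracy(samples, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier.

    Each sample is held out in turn and assigned the label of its
    closest remaining sample (Euclidean distance is an assumption).
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    correct = 0
    for i, x in enumerate(samples):
        nearest = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(x, samples[j]),
        )
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(samples)


if __name__ == "__main__":
    # Toy feature vectors standing in for encoded protein complexes.
    toy_samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    toy_labels = ["f1", "f1", "f2", "f2"]
    print(jackknife_nn_accuracy(toy_samples, toy_labels))
```

With only 50 complexes, holding out one sample at a time uses nearly all of the data for each prediction, which is one common reason to prefer the jackknife test on small data sets.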
-
-
-
PredictFold-PSS-3D1D: A Protein Fold Recognition Server for Predicting Folds from the Twilight Zone Sequences.
Authors: Kaliappan Ganesan and Subbiah Parthasarathy
PredictFold-PSS-3D1D is an online protein fold recognition web server used to predict the possible folds of twilight zone protein sequences. The server employs an improved 3D1D profile method (Ganesan and Parthasarathy, J. Struct. Funct. Genomics, 12, 181-189, 2011), in which the inclusion of predicted secondary structure information improves fold recognition. The server accepts amino acid sequences and their predicted secondary structure data as input and aligns them with the 3D1D profiles of known SCOP folds in a database. The alignments are ranked by their z-values and P-values. The top 5 ranked SCOP folds from the database are listed along with a link to ‘View SCOP details’. Folds with z-values ≥3.0 and P-values ≤0.05 are marked as ‘Predicted Fold’ for the given twilight zone query sequence. The server is available within our PredictFold web server at http://bioinfo.bdu.ac.in/pss3d1d/.
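The reporting rule in this abstract (top 5 hits, with ‘Predicted Fold’ assigned at z ≥ 3.0 and P ≤ 0.05) can be sketched as follows. This is not the server's code: the record layout, the function name, and the tie-breaking order (z-value descending, then P-value ascending) are assumptions made for illustration.

```python
def label_predicted_folds(hits, z_min=3.0, p_max=0.05, top_n=5):
    """Rank alignment hits and flag those passing both thresholds.

    `hits` is a list of dicts with keys "fold", "z" and "p"
    (a hypothetical layout). Ranking by z descending, then p
    ascending, is an assumption; the abstract only says hits are
    ranked by z-values and P-values.
    """
    ranked = sorted(hits, key=lambda h: (-h["z"], h["p"]))[:top_n]
    return [
        dict(h, predicted=(h["z"] >= z_min and h["p"] <= p_max))
        for h in ranked
    ]


if __name__ == "__main__":
    # Made-up scores for three hypothetical folds.
    example_hits = [
        {"fold": "fold_1", "z": 4.2, "p": 0.01},
        {"fold": "fold_2", "z": 3.5, "p": 0.20},
        {"fold": "fold_3", "z": 2.0, "p": 0.01},
    ]
    for row in label_predicted_folds(example_hits):
        print(row)
```

Both conditions must hold, so a hit with a high z-value but a P-value above 0.05 is listed without the ‘Predicted Fold’ label.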
-
-
-
A New Approach for Identifying Protein-Coding Regions by Combining Chirp z and Wavelet Transform.
Authors: Suping Deng, Liyun Yuan, Kanyan Feng, Guohui Ding and Yixue Li
Identifying the protein-coding regions of DNA sequences is an important step in gene annotation. It is well known that the protein-coding regions of most genomic sequences exhibit a period-3 pattern due to the non-uniform distribution of codons. To identify protein-coding regions more efficiently, a new identification approach based on the period-3 property is proposed that combines the chirp z transform and the wavelet transform. The method was applied to 17 DNA sequences from different organisms and achieved high sensitivity (>80%) in all of them. Because it requires no prior training sets, the approach is fast and could potentially be used widely and conveniently.
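The period-3 property itself can be illustrated with a short sketch. Note that this uses a plain discrete Fourier transform evaluated at frequency 1/3 on base indicator sequences, not the chirp z and wavelet combination proposed in the paper; the function name and test sequences are hypothetical.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four bases.

    For each base, the sequence is turned into a 0/1 indicator signal
    and its Fourier coefficient at frequency 1/3 is computed. The
    result is normalized by sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n


if __name__ == "__main__":
    codon_like = "ATG" * 30   # strong period-3 structure
    period_four = "ACGT" * 22  # repeats with period 4, not 3
    print(period3_power(codon_like), period3_power(period_four))
```

In practice such a measure is computed over a sliding window along the sequence, and windows with high power are candidate coding regions.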
-
-
-
Small Molecules' Multi-Metabolic Pathways Prediction Using Physico-Chemical Features and Multi-Task Learning Method.
Authors: Bing Niu, Lei Gu, Chunrong Peng, Juan Ding, Xiaochen Yuan and Wencong Lu
Knowledge of the mechanisms of small molecules in metabolic pathways is critical to designing specific and effective inhibitors for those pathways. As some small molecules are involved in more than one pathway, it is crucial to use an accurate and robust approach to map each small molecule correctly to the specific metabolic pathways in which it is involved. In this article, small molecules are studied using the Minimal-Redundancy-Maximal-Relevance-Forward Feature Search (mRMR-FFS) method combined with a multi-task learning method based on the K-nearest neighbor (KNN) algorithm. Forty-five important chemical features were selected, based on a 10-fold cross-validation test, from an original data set containing 61 features. Applying the KNN method with these forty-five selected features, the prediction model achieved an accuracy of 68.2% in the 10-fold cross-validation test. It is promising that our two-stage scheme can be a useful approach for searching for new, effective competitive drugs in metabolic pathways.
-
-
-
Development of an Engineering Method to Optimize Polyamine Metabolic Pathways.
Authors: Mouli Das, Subhasis Mukhopadhyay and Rajat K. De
In this article, we determine an optimal set of enzymes that need to be expressed at a specific level in order to maximize the production of the target metabolite, spermine, from the substrate, ornithine, in the polyamine metabolic pathway. The pathway thus obtained is compared with the optimal pathway obtained using the existing extreme pathway analysis method. The results are validated against the literature. Finally, we discuss potential applications of this approach in the field of metabolic and genetic engineering. It is hoped that engineering this pathway can produce secondary metabolites of commercial value and, ultimately, drugs.
-
-
-
Gene Sets of Gene Ontology are More Stable Diagnostic Biomarkers than Genes in Oral Squamous Cell Carcinoma
Authors: Tao Huang, Wei Wu, Honglai Jin and Yu-Dong Cai
Many people suffer from oral squamous cell carcinoma (OSCC), a cancer with high severity and prevalence. The usual diagnostic method for OSCC, visual examination of the mouth, is still very primitive. By the time patients receive a definitive diagnosis, they have usually missed the optimal therapeutic period. Therefore, it is essential to diagnose OSCC as early as possible. Microarray technology has been widely used for biomarker identification, but individual gene biomarkers derived from microarray studies are often limited by poor reproducibility and robustness, since there is little or no overlap between the results of different studies. Here, we used both a gene-based approach and a gene-set-based approach to identify biomarkers of OSCC using five independent data sets. We then evaluated the reproducibility of differentially expressed genes across the five data sets, quantified by t-test p values, and the reproducibility of Gene Ontology (GO) gene sets across the five data sets, quantified by the Matthews correlation coefficient (MCC) using leave-one-out cross-validation (LOOCV). Very weak correlation was found between the differentially expressed genes in most data set pairs: the average Pearson correlation coefficient of the ten data set pairs was merely 0.048. However, the GO gene sets are significantly correlated among data set pairs: the Pearson correlation test p value is 0 for all data set pairs and the average Pearson correlation coefficient is 0.510. Our study shows that it is feasible to identify stable and reproducible gene set biomarkers and paves the way for discovering diagnostic biomarkers of OSCC using GO gene sets.
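For reference, the Matthews correlation coefficient used in this abstract has a standard definition in terms of confusion-matrix counts, sketched below. The function name and the example counts are hypothetical, and returning 0 when the denominator is zero is a common convention rather than something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Made-up counts pooled over leave-one-out predictions.
    print(matthews_corrcoef(tp=18, tn=15, fp=3, fn=4))
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it less sensitive to class imbalance than plain accuracy.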
-
-
-
Assessing the Gender Differences of Adverse Effects in HIV Infection Treatment Based on FDA AERS Database.
Authors: Juanjuan Xiao, Ying Wang, Zuofeng Li, Xufeng Zhang, Kaiyan Feng and Lei Liu
The adverse effects of HIV infection treatment cannot be ignored, because undesired adverse effects result in low drug adherence and influence the outcome of HIV treatment. Adverse effects are significantly associated with gender. However, the role of gender in the occurrence of adverse effects has not been clearly explored. To provide additional guidance for prescription, we analyzed the adverse effects of HIV treatment using the FDA AERS database, aiming to find gender-based differences. Interestingly, women and men differed in almost all HIV-therapy adverse effects. Among these, adverse effects associated with skin and subcutaneous tissues occurred more frequently in males among younger adults but more frequently in females among older adults. The different occurrence of skin and subcutaneous adverse effects in men and women indicates that estrogen and androgen might act differently on skin during HIV therapy. Gender differences exist both between groups and within groups of HIV medications. NNRTIs induced more adverse effects in females, whereas PIs and INIs induced more in males. Two NNRTI medications, efavirenz and nevirapine, caused different adverse effects in the two genders. Efavirenz induced more adverse effects in males and mainly affected the nervous system, whereas nevirapine caused more adverse effects in females and mainly affected skin and subcutaneous tissues. Female gender is a risk factor for nevirapine, which is closely related to skin-associated adverse effects. Therefore, recognizing gender differences in adverse effects may be helpful in prescribing medications to HIV-infected patients; e.g., greater caution should be taken when prescribing nevirapine to women.
-
-
-
Recent Advances in Mathematical Modeling and Simulation of DNA Replication Process.
Authors: Guoli Ji, Yong Zeng, Jinting Guan, Qingshun Q. Li, Congting Ye and Yunlong Liu
DNA replication is the basis of biological inheritance and involves a series of sophisticated biochemical processes. Over the past decades, numerous in vitro and in vivo experiments have been carried out in a variety of organisms. However, these resource-intensive techniques can be costly, time-consuming, or unable to measure the replication process on a global scale. Recently, mathematical modeling and computational simulation of biochemical processes, which can be used for rapid testing of biological hypotheses, have attracted considerable attention. In this review, we outline some key mathematical and computational works recently proposed for the DNA replication process, with emphasis on the modeling and simulation of replication origin identification and characterization, replication initiation and regulation, and genome-wide profiling of DNA replication. Although much excellent work has been done, further iteration between mathematical modeling and biochemical experiments is still needed for deeper insight into the DNA replication process, and prospective research directions are discussed herein.
-
-
-
A Novel Unified Ab Initio and Template-Based Approach to GPCR Modeling: Case of EDG-LPA Receptors.
Authors: Olaposi I. Omotuyi and Hiroshi Ueda
G-protein-coupled receptors (GPCRs) mediate diverse biological functions through intracellular signal cascades initiated by intracellular G-protein coupling following extracellular agonist binding. GPCRs are quintessential targets for drug design due to their involvement in pathophysiological conditions. The difficulty of GPCR crystallization and the lack of accurate computational methods for GPCR modeling are the major setbacks for GPCR-based drug development. Here, we report a combination of previously known ab initio and template-based methods as a novel approach for modeling geometrically optimized full-length GPCRs. First, the geometry-optimized transmembrane helices (7Tms) of the full-length GPCR are modeled using the GPCR server (http://gpcr.usc.es), followed by loop refinement. A second structure is generated via the iterative approach implemented on the I-TASSER server (http://zhanglab.ccmb.med.umich.edu/I-TASSER/). The best structures are then selected from the two servers based on DOPE score (GPCR server) and C-score (I-TASSER server) and passed to the ModRefiner algorithm as the initial and reference models, respectively. ModRefiner drives the folding of the N- and C-terminal regions of the initial model towards the reference model without altering the local geometries of the 7Tms and the loop regions, as evaluated by the Local-Global Alignment (LGA) algorithm. Finally, atomic clashes in the ModRefiner output are resolved using Fragment-Guided Molecular Dynamics (FG-MD) simulation. The FG-MD output structures of our test proteins (Endothelial Differentiation Gene-class (EDG) lysophosphatidic acid receptors) have better model quality than the initial and reference structures, as evaluated by the QmeanScore6 algorithm.
-
