Protein and Peptide Letters - Volume 20, Issue 3, 2013
-
-
A Sequence-based Approach for Predicting Protein Disordered Regions
Authors: Tao Huang, Zhi-Song He, Wei-Ren Cui, Yu-Dong Cai, Xiao-He Shi, Le-Le Hu and Kuo-Chen Chou
Protein disordered regions are associated with critical cellular functions such as transcriptional regulation, translation and cellular signal transduction, and they are involved in various diseases. Although experimental methods have been developed to determine these regions, they are time-consuming and expensive. It is therefore highly desirable to develop computational methods that can provide this kind of information rapidly and inexpensively. Here we propose a sequence-based computational approach for predicting protein disordered regions by means of the Nearest Neighbor algorithm, in which the conservation, amino acid factors and secondary structure status of each amino acid in a fixed-length sliding window are taken as the encoding features. Feature selection based on mRMR (maximum Relevance Minimum Redundancy) is then applied to obtain an optimal 51-feature set that includes 39 conservation features and 12 secondary structure features. With the optimal 51 features, our predictor yielded promising MCC (Matthews correlation coefficient) values: 0.371 on a rigorous benchmark dataset tested by 5-fold cross-validation and 0.219 on an independent test dataset. Our results suggest that conservation and secondary structure play important roles in intrinsically disordered proteins.
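The sliding-window encoding and nearest neighbor prediction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, residues beyond either terminus are zero-padded, and squared Euclidean distance stands in for whatever distance the original Nearest Neighbor predictor uses.

```python
from typing import List, Sequence


def window_features(per_residue: Sequence[Sequence[float]], center: int,
                    half_width: int) -> List[float]:
    """Concatenate the feature vectors of the residues in a window centered on `center`.

    Positions beyond either terminus are zero-padded (an assumption; the abstract
    does not describe the padding scheme).
    """
    dim = len(per_residue[0])
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(per_residue):
            features.extend(per_residue[pos])
        else:
            features.extend([0.0] * dim)
    return features


def nearest_neighbor_label(query: Sequence[float],
                           train_vectors: Sequence[Sequence[float]],
                           train_labels: Sequence[str]) -> str:
    """Return the label of the closest training vector (squared Euclidean distance, assumed)."""
    def distance(i: int) -> float:
        return sum((q - t) ** 2 for q, t in zip(query, train_vectors[i]))
    best = min(range(len(train_vectors)), key=distance)
    return train_labels[best]
```

Each row of `per_residue` would hold one residue's features (for example a conservation score, amino acid factors and a secondary structure indicator); a window of half-width w around each residue yields a vector of (2w + 1) times the per-residue dimension, which mRMR would then prune.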
-
-
-
A Protein Block Based Fold Recognition Method for the Annotation of Twilight Zone Sequences
Authors: V. Suresh, K. Ganesan and S. Parthasarathy
The description of the protein backbone has recently been improved by a group of structural fragments called Structural Alphabets, used instead of the regular three-state (Helix, Sheet and Coil) secondary structure description. Protein Blocks are one of the Structural Alphabets and describe every region of the protein backbone, including the coil. According to de Brevern (2000), the Protein Blocks comprise 16 structural fragments, each 5 residues in length. Protein Block fragments are highly informative among the available Structural Alphabets and have been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using local pairwise alignment. Alignments with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Using Protein Block sequences predicted by pb_prediction, which is available at the Protein Block Export server, our method recognizes possible folds for nearly 35.5% of the twilight zone sequences.
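The local alignment and z-value filter can be illustrated with a short sketch. It assumes Protein Block strings written with the letters a to p, a uniform match/mismatch score in place of a proper PB substitution matrix, a linear gap penalty, and a z-value computed against shuffled versions of the query; all names and scoring parameters are illustrative assumptions, and no P-value is computed here.

```python
import random
import statistics


def local_alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                          gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def alignment_z_score(query: str, target: str, n_shuffles: int = 100,
                      seed: int = 0) -> float:
    """z = (observed score - mean shuffled score) / stdev of shuffled scores."""
    rng = random.Random(seed)
    observed = local_alignment_score(query, target)
    shuffled_scores = []
    for _ in range(n_shuffles):
        letters = list(query)
        rng.shuffle(letters)  # shuffling preserves the query's PB composition
        shuffled_scores.append(local_alignment_score("".join(letters), target))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd
```

A library fold would then be reported as a candidate when `alignment_z_score(query_pbs, fold_pbs)` reaches the 2.5 threshold quoted in the abstract.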
-
-
-
Prediction of Turnover Number of Cellulose 1,4-Beta-Cellobiosidase
Authors: Shaomin Yan and Guang Wu
The turnover number is an important parameter for determining whether an enzyme is practically workable. Predicting the turnover number of an enzyme would therefore reduce the need for time-consuming and costly experiments to determine it. However, no studies have so far attempted to predict the turnover number of cellulose 1,4-beta-cellobiosidase, an enzyme used in industry, especially the biofuel industry. It is important to develop methods to predict the turnover numbers of both wild-type and mutant cellulose 1,4-beta-cellobiosidases. In this study, we used neural network models with different amino acid properties, pH levels, temperatures and substrates as inputs to predict the turnover number. The results show that 11 of the 25 amino acid properties analyzed can serve as predictors, and that the amino acid distribution probability is the best one because it reaches smaller mean squared errors during convergence and higher correlation coefficients in two-layer neural network models. This study demonstrates the possibility that neural network models can approximately predict the turnover number of cellulose 1,4-beta-cellobiosidase.
-
-
-
Computational Studies on Enzyme-Substrate Complexes of Methanogenesis for Revealing their Substrate Binding Affinities to Direct the Reverse Reactions
In the present work, a combined approach of molecular modeling and systems biology was used to reveal how the structural dynamics of enzymes involved in methanogenesis contribute to reverse methanogenic reactions in methanotrophic archaea. The binding energies and molecular interaction distances between each enzyme (as homology models and as crystallographic structures) and its corresponding substrates were computed, and the resulting binding affinities were compared with experimental enzyme kinetic data. In each reaction, the binding energies of the enzyme model-substrate complexes favored the reverse reaction more than those of the PDB structure-substrate complexes, supporting the existence of structural motions that direct substrate specificity in the reverse order. Based on these results, a metabolic pathway for reverse methanogenesis in methanotrophic archaea was proposed, and its metabolic flux balance was analyzed with experimental data for each enzyme reaction step. Methyl-CoM reductase and methylene tetrahydromethanopterin reductase were assumed to determine the rate of the reverse methanogenesis reactions. The pathway model of this study should help in understanding the cellular behavior of reverse methanogenesis in response to methane consumption from the environment. Binding mode analysis of the enzymes is thus directly correlated with the molecular conservation and functional divergence of reverse methanogenesis, which lends strong support to the molecular evolutionary hypothesis for methanotrophic archaea.
-
-
-
In Silico Prediction of Cytochrome P450-Mediated Site of Metabolism (SOM)
Authors: Xian Liu, Qiancheng Shen, Jing Li, Shanshan Li, Cheng Luo, Weiliang Zhu, Xiaomin Luo, Mingyue Zheng and Hualiang Jiang
Drug metabolism is a major consideration in modifying drug clearance and also a primary source of drug metabolite-induced toxicity. Cytochromes P450 (CYPs) are the major enzymes involved in drug metabolism and bioactivation, accounting for almost 75% of total drug metabolism. Predicting the sites of cytochrome P450-mediated metabolism of drug-like molecules using in silico methods would be highly beneficial and time-efficient. An ideal system would enable researchers to make a confident elimination decision based purely on the structure of a new compound. In this review, several tools and models for predicting the probable site of metabolism (SOM) are compared and discussed. The methods are generally based on enzyme structure, ligand structure, or a combination of both. Although all the methods achieve a certain accuracy and considerable progress has been made, the results of the calculations still need careful inspection.
-
-
-
HIV-1 Protease Cleavage Site Prediction Based on Two-Stage Feature Selection Method
Authors: Bing Niu, Xiao-Cheng Yuan, Preston Roeper, Qiang Su, Chun-Rong Peng, Jing-Yuan Yin, Juan Ding, HaiPeng Li and Wen-Cong Lu
Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. An accurate, robust and rapid method to correctly predict cleavage sites in proteins is therefore crucial when searching for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with a Genetic Algorithm. Thirty important biochemical features were identified, based on a jackknife test, from an original data set containing 4,248 features. Using the AdaBoost method with these thirty selected features, the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, increases of 6.7% and 77.4%, respectively, over the original dataset. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.
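Correlation-based feature subset selection scores a candidate subset by rewarding correlation with the class and penalizing correlation among the chosen features. The sketch below computes the commonly used CFS merit, k·r̄cf / sqrt(k + k(k−1)·r̄ff), using absolute Pearson correlations; this correlation measure and the function names are assumptions, and the Genetic Algorithm that would search over subsets to maximize this merit is not shown.

```python
import math
from itertools import combinations
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns: Sequence[Sequence[float]], labels: Sequence[float]) -> float:
    """Merit = k * mean|r(feature, class)| / sqrt(k + k(k-1) * mean|r(feature, feature)|)."""
    k = len(columns)
    r_cf = sum(abs(pearson(col, labels)) for col in columns) / k
    pairs = list(combinations(columns, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

The design choice shows up in small cases: duplicating a perfectly predictive feature leaves the merit unchanged, while adding an uninformative feature lowers it, which is why CFS tends to return compact subsets.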
-
-
-
FRKAS: Knowledge Acquisition Using a Fuzzy Rule Base Approach to Insight of DNA-Binding Domains/Proteins
Authors: Hui-Ling Huang, Fang-Lin Chang, Shinn-Jang Ho, Li-Sun Shu, Wen-Lin Huang and Shinn-Ying Ho
Numerous methods for predicting DNA-binding domains/proteins have been proposed by identifying informative features and designing effective classifiers. These studies reveal that the DNA-protein binding mechanism is complicated, and existing accurate predictors such as the support vector machine (SVM) with position-specific scoring matrices (PSSMs) are regarded as black-box methods that are not easily interpretable by biologists. In this study, we propose an ensemble fuzzy rule base classifier consisting of a set of interpretable fuzzy rule classifiers (iFRCs) that use informative physicochemical properties as features. In designing the iFRCs, feature selection, membership function design and fuzzy rule base generation are all optimized simultaneously using an intelligent genetic algorithm (IGA). The IGA maximizes prediction accuracy while minimizing the number of selected features and the number of fuzzy rules, yielding an accurate and concise fuzzy rule base. Benchmark datasets of DNA-binding domains are used to evaluate the proposed ensemble classifier of 30 iFRCs. The individual iFRCs have a mean test accuracy of 77.46%, and the ensemble classifier has a test accuracy of 83.33%, compared with 82.81% for the SVM with PSSMs. The two physicochemical properties contributing most are positive charge and van der Waals volume. Charge complementarity between protein and DNA is thought to be important in the first step of recognition between protein and DNA. The amino acid residues of binding peptides have larger van der Waals volumes and more positive charges than those of non-binding ones. The proposed knowledge acquisition method, which establishes a fuzzy rule-based classifier, can also be applied to predict and analyze other protein functions from sequences.
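To show why a fuzzy rule base is readable, here is a minimal single-winner fuzzy classifier. It assumes features scaled to [0, 1], three triangular linguistic terms (low, medium, high), the min operator for combining conditions, and hypothetical rules over positive charge and van der Waals volume; the IGA that designs membership functions and rules in the original work is not reproduced.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical linguistic terms on a feature scaled to [0, 1]: (left, peak, right).
TERMS: Dict[str, Tuple[float, float, float]] = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# A rule maps feature names to linguistic terms (non-empty) and names the class it votes for.
Rule = Tuple[Dict[str, str], str]


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership degree of x (requires left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def classify(sample: Dict[str, float], rules: List[Rule]) -> Optional[str]:
    """Single-winner inference: the rule with the highest firing strength decides.

    Returns None when no rule fires.
    """
    best_label: Optional[str] = None
    best_strength = 0.0
    for antecedent, label in rules:
        # Firing strength = minimum membership over the rule's conditions (min t-norm).
        strength = min(triangular(sample[f], *TERMS[t]) for f, t in antecedent.items())
        if strength > best_strength:
            best_label, best_strength = label, strength
    return best_label
```

A rule such as "IF positive_charge is high AND vdw_volume is high THEN binding" can be read directly by a biologist, which is the interpretability argument made in the abstract.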
-
-
-
Virus-ECC-mPLoc: A Multi-Label Predictor for Predicting the Subcellular Localization of Virus Proteins with Both Single and Multiple Sites Based on a General Form of Chou's Pseudo Amino Acid Composition
Authors: Xiao Wang, Guo-Zheng Li and Wen-Cong Lu
Protein subcellular localization prediction aims to determine the location of a protein within a cell using computational methods. Knowledge of the subcellular localization of viral proteins in a host cell or virus-infected cell is important because it is closely related to their destructive tendencies and consequences. Prediction of viral protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular locations. Most existing subcellular localization methods specialized for viral proteins handle only single-location proteins. To better reflect the characteristics of multiplex proteins, a new predictor, called Virus-ECC-mPLoc, has been developed that can handle systems containing both singleplex and multiplex proteins. It introduces a powerful multi-label learning approach that exploits correlations between subcellular locations, and it hybridizes gene ontology information with dipeptide composition information. It can be used to identify viral proteins among the following six locations: (1) viral capsid, (2) host cell membrane, (3) host endoplasmic reticulum, (4) host cytoplasm, (5) host nucleus, and (6) secreted. Experimental results show that the overall success rates obtained by Virus-ECC-mPLoc are 86.9% for the jackknife test and 87.2% for the independent data set test, significantly higher than those of any existing predictor. As a user-friendly web server, Virus-ECC-mPLoc is freely accessible to the public at http://levis.tongji.edu.cn:8080/bioinfo/Virus-ECC-mPLoc/.
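The dipeptide composition part of the feature set is easy to make concrete. The sketch below computes the standard 400-dimensional dipeptide composition (the normalized frequency of each ordered pair of adjacent standard residues); skipping pairs with non-standard letters is an assumption, and the gene ontology features and the multi-label classifier are not shown.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence: str) -> Dict[str, float]:
    """Frequency of each of the 400 adjacent residue pairs in `sequence`.

    Pairs containing non-standard letters (e.g. X) are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {dp: (n / total if total else 0.0) for dp, n in counts.items()}
```

For example, "ACAC" contains the pairs AC, CA and AC, so AC receives 2/3 and CA 1/3, and every other entry is 0.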
-
-
-
Using WPNNA Classifier in Ubiquitination Site Prediction Based on Hybrid Features
Authors: Kai-Yan Feng, Tao Huang, Kai-Rui Feng and Xiao-Jun Liu
Ubiquitination, a reversible protein post-translational modification (PTM), occurs when an amide bond is formed between ubiquitin (a small protein) and the target protein. It is involved in a wide variety of cellular processes and is associated with various diseases such as Alzheimer's disease. To understand ubiquitination at the molecular level, it is important to identify the sites to which ubiquitin binds. Since experimental methods for determining ubiquitination sites are both expensive and time-consuming, it is necessary to develop in silico methods that predict ubiquitination sites based solely on the sequence information of the target protein. In this paper, we apply a new classifier called the weighted passive nearest neighbor algorithm (WPNNA) to predict ubiquitination sites. WPNNA was demonstrated to be insensitive to differing data densities between classes. A hybrid of features, including PSSM conservation scores, amino acid factors and disorder scores, is employed to encode the protein fragments centered on the possible ubiquitination sites. The Matthews correlation coefficient (MCC) of our predictor is 0.169 on a training dataset, with a sensitivity of 31.6% and a specificity of 82.9%, and 0.403 on an independent test dataset, with a sensitivity of 64.3% and a specificity of 75.7%. We compare our predictor with that of a recently published paper that made predictions on the same datasets. Our predictor achieves much better sensitivity than that paper on both datasets and a much better MCC on the independent test dataset, indicating that the WPNNA-based predictor is at least a good complement to the current state of the art in ubiquitination site prediction.
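Sensitivity, specificity and MCC are all computed from the confusion-matrix counts (true/false positives and negatives). The sketch below uses the standard formulas; the function names are illustrative, and returning 0.0 for MCC when a marginal sum is zero is a common convention rather than something stated in the abstract.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true sites that are predicted as sites."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-sites that are predicted as non-sites."""
    return tn / (tn + fp)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom
```

Because a predictor can trade sensitivity for specificity by shifting its decision threshold, the single MCC value is the more useful number for comparing predictors on imbalanced site/non-site data.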
-
-
-
Inter- and Intra-Chain Disulfide Bond Prediction Based on Optimal Feature Selection
Authors: Shen Niu, Tao Huang, Kai-Yan Feng, Zhisong He, Weiren Cui, Lei Gu, Haipeng Li, Yu-Dong Cai and Yixue Li
Protein disulfide bonds are formed during post-translational modification and have been implicated in various physiological and pathological processes. Proper localization of disulfide bonds also facilitates the prediction of protein three-dimensional (3D) structure. However, determining disulfide bonds with conventional experimental approaches is both time-consuming and labor-intensive, especially for large-scale data sets. Since disulfide bond prediction based on 3D structure features also has limitations, it is necessary to develop sequence-based, convenient and fast computational methods for both inter- and intra-chain disulfide bond prediction. In this study, we developed a computational method for predicting both types of disulfide bond, based on the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS), with the nearest neighbor algorithm as the prediction model. Features of sequence conservation, residue disorder and amino acid factors are used for inter-chain disulfide bond prediction. In addition to these features, the sequence distance between a pair of cysteines is also used for intra-chain disulfide bond prediction. Our approach achieves a prediction accuracy of 0.8702 for inter-chain disulfide bond prediction using 128 features and 0.9219 for intra-chain disulfide bond prediction using 261 features. Analysis of the optimal feature sets indicated key features and key sites for disulfide bond formation. Interestingly, comparison of the top features between inter- and intra-chain disulfide bonds revealed similarities and differences in the mechanisms that form these two types of disulfide bonds, which may help in understanding these mechanisms and provide clues for further experimental studies in this field.
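The mRMR ranking followed by incremental feature selection can be sketched on discretized features. This minimal version uses the difference form of mRMR (relevance minus mean redundancy, both measured by mutual information) and takes the IFS evaluation function as a parameter; in the described pipeline that function would be the accuracy of a nearest neighbor predictor on the candidate feature subset, which is an assumption here and is not implemented.

```python
import math
from collections import Counter
from typing import Callable, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (nats) between two discrete variables of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())


def mrmr_rank(features: Sequence[Sequence[Hashable]],
              target: Sequence[Hashable]) -> List[int]:
    """Rank feature indices greedily by relevance minus mean redundancy (mRMR, difference form)."""
    relevance = [mutual_information(f, target) for f in features]
    remaining = list(range(len(features)))
    selected: List[int] = []
    while remaining:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_information(features[j], features[s])
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(ranking: List[int],
                                  evaluate: Callable[[List[int]], float]) -> List[int]:
    """IFS: score the top-1, top-2, ... prefixes of the ranking and keep the best prefix."""
    best_subset: List[int] = []
    best_score = float("-inf")
    for k in range(1, len(ranking) + 1):
        subset = ranking[:k]
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset
```

The optimal feature counts quoted in the abstract (128 and 261) correspond to the prefix length at which such an IFS curve peaks.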
-
-
-
Prediction of Protein-protein Interactions Based on Feature Selection and Data Balancing
Authors: Liang Liu, Wen-Cong Lu, Yu-Dong Cai, Kai-Yan Feng, Chunrong Peng and Yubei Zhu
Computational approaches can analyze protein-protein interactions (PPIs) from a different angle, complementing experimental ones, and they are very efficient in determining whether two proteins can interact with each other. In this paper, KNN (K-nearest neighbors) is applied to predict PPIs by encoding each protein with the physical and chemical properties of its residues, predicted secondary structures and amino acid compositions. mRMR (minimum-redundancy maximum-relevance) feature selection is adopted to select a compact feature set whose features are considered important for determining whether two proteins interact. Because the negative dataset (containing non-interacting protein pairs) is much larger than the positive dataset (containing interacting protein pairs), the negative dataset is divided into 5 portions, and each portion is combined with the positive dataset for one prediction. Thus 5 predictions are performed, and the final results are obtained by voting. As a result, the prediction achieves an overall accuracy of 0.8369 with a sensitivity of 0.7356. The predictor developed in this research for predicting fruit fly PPIs is available for public use at http://chemdata.shu.edu.cn/ppip.
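The data-balancing scheme (split the large negative set into 5 portions, pair each portion with all positives, predict with KNN, then vote) can be sketched as below. The round-robin split, squared Euclidean distance and k = 3 are assumptions; the abstract does not specify how the portions are drawn or which distance is used.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def knn_predict(train: List[Tuple[Vector, int]], query: Vector, k: int = 3) -> int:
    """Majority label among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda item: sum((a - b) ** 2
                                                 for a, b in zip(item[0], query)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def balanced_vote(positives: List[Vector], negatives: List[Vector], query: Vector,
                  n_parts: int = 5, k: int = 3) -> int:
    """Train one KNN per negative portion (plus all positives) and return the majority vote.

    Label 1 means "interacting pair", 0 means "non-interacting pair".
    """
    votes = []
    for p in range(n_parts):
        portion = negatives[p::n_parts]  # round-robin split (an assumption)
        train = [(v, 1) for v in positives] + [(v, 0) for v in portion]
        votes.append(knn_predict(train, query, k))
    return Counter(votes).most_common(1)[0][0]
```

With an odd number of portions and two labels the vote cannot tie, and each sub-predictor sees a far less skewed class ratio than a single model trained on all negatives.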
-
-
-
Computational Methods for DNA-binding Protein and Binding Residue Prediction
Authors: Yao Lu, Xiang Wang, Xuesong Chen and Guijun Zhao
Protein-DNA interactions are involved in many essential biological processes such as transcription, splicing, replication and DNA repair. Identifying DNA-binding proteins and their binding sites is of great value for studying the mechanisms of these biological processes. A number of experimental methods have been developed for identifying DNA-binding proteins, such as DNase footprinting, EMSA, X-ray crystallography, NMR spectroscopy and ChIP-on-chip. However, with the increasing number of suspected protein-DNA interactions, identification by experimental methods is expensive, labor-intensive and time-consuming. Hence, over the past decades researchers have developed many computational approaches to predict protein-DNA interactions in silico. Machine learning has been widely used and has become dominant in this field. In this article, we review recent machine learning-based progress in DNA-binding protein and binding residue prediction, covering the most commonly used features, the comparison and selection of machine learning classifiers, the comparison of evaluation methods, and existing problems and future directions for the field.
-
-
-
MicroRNA Mediated Network and DNA Methylation in Colorectal Cancer
Authors: Bi-Qing Li, Hui Yu, Zhen Wang, Guo-Hui Ding and Lei Liu
Colorectal cancer (CRC) is one of the most malignant cancers. A growing number of studies have shown that both genetic and epigenetic factors play important roles in the etiology of CRC. Both microRNA (miRNA) and DNA methylation fall within the scope of epigenetics, and there are complex regulatory mechanisms involving miRNA and DNA methylation. We compiled 71 CRC-related genes and 134 CRC-related miRNAs. We then identified 417 feed-forward loops (FFLs) and 37 feedback loops (FBLs) among these genes, miRNAs and transcription factors (TFs), and used these FFLs and FBLs to construct a miRNA- and TF-mediated network for CRC. Statistical tests showed that these FFLs were significantly enriched in CRC compared with esophageal cancer, breast cancer and randomly selected CRC miRNA-gene pairs. Analysis of the network singled out 3 core genes, 2 core miRNAs and 5 core TFs. KEGG and GO enrichment analyses of the target genes of the 2 core miRNAs indicated that they were significantly enriched in CRC-related pathways (e.g., the MAPK pathway, TGF-β pathway and cell cycle). Through the investigation of methylation, we found that most of the CRC-related genes and miRNAs were prone to regulation by methylation. This study sheds light on the regulatory mechanisms in CRC and provides some insights into the epigenetics of CRC.
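Feed-forward and feedback loops are small directed motifs, so identifying them amounts to a graph search. The sketch below uses generic definitions as an assumption: an FFL is any triple of distinct nodes a, b, c with edges a→b, b→c and a→c, and an FBL is a pair with edges in both directions. The paper's exact node-type constraints (which positions must be TFs, miRNAs or genes) are not reproduced, and node names in the usage test are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Directed regulatory graph: regulator -> set of regulated nodes (TFs, miRNAs or genes).
Graph = Dict[str, Set[str]]


def feed_forward_loops(graph: Graph) -> List[Tuple[str, str, str]]:
    """All triples (a, b, c) of distinct nodes with edges a->b, b->c and a->c."""
    loops = []
    for a, a_targets in graph.items():
        for b in a_targets:
            if b == a:
                continue
            for c in graph.get(b, set()):
                if c not in (a, b) and c in a_targets:
                    loops.append((a, b, c))
    return sorted(loops)


def feedback_loops(graph: Graph) -> List[Tuple[str, str]]:
    """Two-node loops a->b and b->a (mutual regulation), each reported once."""
    found = set()
    for a, a_targets in graph.items():
        for b in a_targets:
            if b != a and a in graph.get(b, set()):
                found.add(tuple(sorted((a, b))))
    return sorted(found)
```

In a TF-miRNA-gene setting, a TF that activates both a miRNA and a gene that the miRNA represses would appear as one FFL, and a TF and miRNA that regulate each other would appear as one FBL.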
-
-
-
A Two-step Similarity-based Method for Prediction of Drug's Target Group
Authors: Lei Chen and Wei-Ming Zeng
Determining a drug's target proteins is very important for studying the drug-target interaction network, which is a key area in the drug discovery pipeline. Correct prediction of a drug's target proteins is thus very helpful in promoting drug discovery. In this study, we developed a two-step similarity-based method to predict a drug's target group. In each step, a similarity score (obtained from a graph representation in the first step and a chemical functional group representation in the second step) was employed to make predictions. Since some drugs target proteins belonging to more than one protein group, the method provides a series of candidate target groups for each drug. As a result, the first-order prediction accuracies on the training set and test set were 79.01% and 76.43%, respectively, much higher than the success rate of random guessing. The results show that using a graph representation to encode drugs is a good choice in this area. We expect that this contribution will help in understanding the drug-target interaction network.
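A similarity-based ranking of candidate target groups, and the first-order accuracy used to evaluate it, can be sketched as follows. This illustrates only a functional-group style step: drugs are represented as sets of functional group labels, Tanimoto similarity compares them, and each group is scored by its best-matching training drug. These choices, the function names and the group labels in the test are assumptions; the graph-based first step is not shown.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of functional groups."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_target_groups(query: Set[str],
                       training: List[Tuple[Set[str], Set[str]]]) -> List[str]:
    """Rank candidate target groups by the best similarity to any training drug in that group."""
    scores: Dict[str, float] = {}
    for drug_groups, target_groups in training:
        s = tanimoto(query, drug_groups)
        for g in target_groups:
            scores[g] = max(scores.get(g, 0.0), s)
    # Highest score first; ties broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda g: (-scores[g], g))


def first_order_accuracy(rankings: List[List[str]], truths: List[Set[str]]) -> float:
    """Fraction of drugs whose top-ranked candidate is one of their true target groups."""
    hits = sum(1 for ranked, true in zip(rankings, truths) if ranked and ranked[0] in true)
    return hits / len(truths)
```

Because the output is a ranked list, a drug with targets in several groups can still be credited when its top candidate matches any one of them, which is what "first-order" accuracy measures.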
-
-
-
Meta Genome-wide Network from Functional Linkages of Genes in Human Gut Microbial Ecosystems
Authors: Yan Ji, Yixiang Shi, Chuan Wang, Jianliang Dai and Yixue Li
The human gut microbial ecosystem (HGME) exerts an important influence on human health. In recent research, metagenomics has provided deep insights into the HGME in terms of gene content, metabolic processes and genome constitution. Here we present a novel methodology to investigate the HGME on the basis of a set of functionally coupled genes, regardless of their genome of origin, taking into account the co-evolutionary properties of genes. By analyzing these coupled genes, we showed that some basic properties of the HGME are significantly associated with each other, and we further constructed a protein interaction map of the human gut metagenome to discover functional modules that may relate to essential metabolic processes. Compared with other studies, our method provides a new way to extract basic functional elements from metagenomic systems and to investigate complex microbial environments by associating their biological traits with the co-evolutionary fingerprints encoded in them.
-
