Volume 19, Issue 1

Protein and Peptide Letters - Volume 19, Issue 1, 2012


- Preface
  
  By Ben M. Dunn
  
  https://doi.org/10.2174/092986612798472901
  
  This issue marks the start of Volume 19 of Protein & Peptide Letters. Since its beginning in 1994, the journal has continued to attract new readers and contributors. This has resulted in a steady increase in the Impact Factor of PPL to its current level of 1.849. We anticipate that this will continue to climb, especially with the large number of special Hot Topics issues we published in 2011 and the ones planned for 2012. Of equal importance are the increasingly high standards that are being applied by the Regional Editors and the referees. I want to thank each of the Regional Editors for their service to the journal: Prof. Anna M. Papini, Prof. John D. Wade, Dr. Kuo-Chen Chou, Prof. Liang Tong, Prof. Vladimir N. Uversky, Prof. Francisco A.P. Campos, and Dr. Zhan-yun Guo. Of great importance, Bentham Science Publishing has begun a new and greatly improved online system for manuscript submission and processing (http://bsp-cms.eurekaselect.com/). As with any new endeavor, there is an initial learning curve, but all Regional Editors are now working with the new system and we anticipate increased speed in the peer review process in the coming year. Next, I want to thank the many scientists around the world who have submitted their best work to PPL. It has been and continues to be a great pleasure for me to interact with scientists from many countries through the journal. This continues to be a very rewarding aspect of my professional career. With the continued loyalty of PPL's contributors, the journal will continue to grow in stature. Finally, and of great importance, I wish to thank the managers of PPL at Bentham Science: Sarwat Aziz Abbasi, Maleeha Naz, and Beenish Anwer. Their tireless service to the journal and their valuable help and advice to me have been an absolutely crucial part of the long-term development of PPL. I look forward to another productive year of PPL.
  

- Editorial [Hot Topic: The Application of Systems Biology and Bioinformatics Methods in Proteomics, Transcriptomics and Metabolomics (Guest Editor: Yu-Dong Cai)]
  
  By Yu-Dong Cai
  
  https://doi.org/10.2174/092986612798472947
  
  We are glad to offer readers this special issue of fifteen papers focused on the application of systems biology and bioinformatics methods in proteomics, transcriptomics, and metabolomics. More and more large-scale biological data, such as sequences, gene expression, and protein-protein interactions, are now stored in databases. Therefore, data analysis methods such as machine learning, graph theory, and statistical analysis are widely used in various areas of systems biology and bioinformatics to predict the functions of proteins, their genes and networks, as well as their interactions with compounds. The articles in this issue can be roughly categorized into the following four groups.
  
  1. Proteomics
  
  The first two papers focus on protein subcellular location prediction. Wu et al. report a novel web-server predictor, called “iLoc-Gpos”, built by introducing the multi-layer scale. It can be used to predict the subcellular localization of Gram-positive bacterial proteins with both single-location and multiple-location sites. The web-server for the iLoc-Gpos predictor is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Gpos. Wang et al. developed a new web-server named “PSCL” for plant protein subcellular localization prediction based on optimized functional domains. On the dataset they constructed, PSCL achieved a first-order prediction accuracy of 75.7% by jackknife test, a quite encouraging outcome.
  
  The next four papers focus on the prediction of protein structure and properties. Hu et al. report a new approach to classifying protein quaternary structure based on sequence information. The results thus obtained are important for identifying protein function. Yan et al. report the use of a neural network to predict the optimal pH and temperature of cellulases. The information thus acquired is particularly useful for finding the optimal working conditions in enzymatic reactions. Mizianty and Kurgan have developed a novel approach for high-throughput sequence-based prediction of the propensity of protein chains for X-ray crystallography-based structure determination. Their method is shown to outperform current approaches in empirical tests on three benchmark datasets. This methodology is useful in support of the target selection procedures implemented by structural genomics centers. Pugalenthi et al. have used the Random Forest method to predict residue solvent accessibility from protein sequence information, without the need for structural information.
  
  The next two papers address the relationship between amino acid variants and disease. Li et al. report a web-server called SCYPPred that was developed based on the SVM flanking sequence method and that can be used to predict human cytochrome P450 SNPs (Single Nucleotide Polymorphisms). Prostate cancer is a serious disease. Cai et al. report that UGT2B17 might be one of the risk factors for prostate cancer in men. The conclusion was based on a comprehensive meta-analysis of the correlation of prostate cancer with variants in CYP17 and UGT2B17.
  
  The last paper in this group concerns post-translational modification. He et al. developed a novel sequence-based method for serine, threonine, and tyrosine phosphorylation site prediction by applying a machine learning approach and a feature selection procedure. It can be used to predict whether a protein contains phosphorylation sites and their exact sites according to the sequence of the protein concerned.....
  

- iLoc-Gpos: A Multi-Layer Classifier for Predicting the Subcellular Localization of Singleplex and Multiplex Gram-Positive Bacterial Proteins
  
  Authors: Zhi-Cheng Wu, Xuan Xiao and Kuo-Chen Chou
  
  https://doi.org/10.2174/092986612798472839
  
  By introducing the “multi-layer scale”, as well as hybridizing gene ontology information with sequential evolution information, a novel predictor, called iLoc-Gpos, has been developed for predicting the subcellular localization of Gram-positive bacterial proteins with both single-location and multiple-location sites. To facilitate comparison, the same stringent benchmark dataset used to estimate the accuracy of Gpos-mPLoc was adopted to demonstrate the power of iLoc-Gpos. The dataset contains 519 Gram-positive bacterial proteins classified into the following four subcellular locations: (1) cell membrane, (2) cell wall, (3) cytoplasm, and (4) extracell; none of the proteins included has ≥25% pairwise sequence identity to any other in the same subset (subcellular location). The overall success rate of iLoc-Gpos by jackknife test on this stringent benchmark dataset was over 93%, which is about 11% higher than that of Gpos-mPLoc. As a user-friendly web-server, iLoc-Gpos is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Gpos or http://www.jci-bioinfo.cn/iLoc-Gpos. A step-by-step guide is provided on how to use the web-server to get the desired results. Furthermore, for the user’s convenience, the iLoc-Gpos web-server also accepts batch job submission, which is not available in the existing version of the Gpos-mPLoc web-server.
  

- PSCL: Predicting Protein Subcellular Localization Based on Optimal Functional Domains
  
  Authors: Kai Wang, Le-Le Hu, Xiao-He Shi, Ying-Song Dong, Hai-Peng Li and Tie-Qiao Wen
  
  https://doi.org/10.2174/092986612798472820
  
  It is well known that protein subcellular localizations are closely related to protein functions. Although many computational methods and tools are available on the Internet, it is still necessary to develop new algorithms in this field to gain a better understanding of the complex mechanism of plant subcellular localization. Here, we provide a new web server named PSCL for plant protein subcellular localization prediction that employs optimized functional domains. After feature optimization, 848 optimal functional domains from InterPro were obtained to represent each protein. By calculating the distances to each of the seven categories, PSCL shows the possibilities of a protein being located in each of those categories in ascending order. On our dataset, PSCL achieved a first-order prediction accuracy of 75.7% by jackknife test. Gene Ontology enrichment analysis shows that catalytic activity, cellular process, and metabolic process are strongly correlated with the localization of plant proteins. Finally, PSCL, a Linux-based web interface for the predictor, was designed and is accessible for public use at http://pscl.biosino.org/.
  

- Prediction of Protein Quaternary Structure with Feature Selection and Analysis Based on Protein Biological Features
  
  Authors: Le-Le Hu, Kai-Yan Feng, Lei Gu and Xiao-Jun Liu
  
  https://doi.org/10.2174/092986612798472866
  
  Information on protein quaternary structure can help in understanding the biological functions of proteins. Because wet-lab experiments are both time-consuming and costly, we adopt a novel computational approach to assign proteins to 10 kinds of quaternary structures. After coding each protein by its biochemical and physicochemical properties, feature selection was carried out using the Incremental Feature Selection (IFS) method. The optimal feature set thus obtained consisted of 97 features, with which the prediction model was built. The overall prediction success rate evaluated by jackknife test is 74.90%, much higher than the 10% (1/10) expected from a random guess. Further feature analysis indicates that protein secondary structure is the feature that contributes most to the prediction of protein quaternary structure.
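
The incremental feature selection step named in this abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: it assumes the features have already been ranked by some criterion, and that `evaluate` (a hypothetical callback) returns a score such as jackknife accuracy for a candidate feature subset. The feature names and toy scores are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Grow the feature set one ranked feature at a time and keep the best prefix."""
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked_features:
        current.append(feature)
        score = evaluate(current)
        # Strict ">" keeps the smallest prefix when several prefixes tie.
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy scorer (hypothetical): the score depends only on how many features are used.
toy_scores = {1: 0.60, 2: 0.70, 3: 0.75, 4: 0.74, 5: 0.75}
subset, score = incremental_feature_selection(
    ["f1", "f2", "f3", "f4", "f5"],
    lambda feats: toy_scores[len(feats)],
)
```

In this toy run the three-feature prefix is kept, because the later tie at five features does not strictly improve the score.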
  

- Prediction of Optimal pH and Temperature of Cellulases Using Neural Network
  
  Authors: Shao-Min Yan and Guang Wu
  
  https://doi.org/10.2174/092986612798472794
  
  Cellulase is an important enzyme widely used in various industries, and now in the fermentation of biomass into biofuels. The enzymatic function of cellulase is closely related to pH, temperature, substrate concentration, etc. For a newly found cellulase, it would be more cost-effective to predict its optimal pH and temperature before conducting costly experiments. In this study, we used a 20-2 feedforward backpropagation neural network to relate information obtained from the primary structure of cellulase to its optimal pH and temperature, and thereby to predict both values for cellulases. The results show that the amino-acid distribution probability representing the primary structure of cellulase can predict both optimal pH and temperature, whereas various properties of amino acids related to the primary structure cannot do so.
  

- CRYSpred: Accurate Sequence-Based Protein Crystallization Propensity Prediction Using Sequence-Derived Structural Characteristics
  
  Authors: Marcin J. Mizianty and Lukasz A. Kurgan
  
  https://doi.org/10.2174/092986612798472910
  
  The relatively low success rates of X-ray crystallography, which is the most popular method for solving protein structures, motivate the development of novel methods that support the selection of tractable protein targets. This aspect is particularly important in the context of current structural genomics efforts, which allow for a certain degree of flexibility in target selection. We propose CRYSpred, a novel in-silico crystallization propensity predictor that uses a set of 15 novel features which utilize a broad range of inputs, including charge, hydrophobicity, and amino acid composition derived from the protein chain, and the solvent accessibility and disorder predicted from the protein sequence. Our method outperforms seven modern crystallization propensity predictors on three benchmark test datasets that are independent of the training dataset. The strong predictive performance of CRYSpred is attributed to the careful design of the features, the use of a comprehensive set of inputs, and the use of the Support Vector Machine classifier. The inputs utilized by CRYSpred are well aligned with the existing rules of thumb used in structural genomics studies.
  

- RSARF: Prediction of Residue Solvent Accessibility from Protein Sequence Using Random Forest Method
  
  Authors: Ganesan Pugalenthi, Krishna Kumar Kandaswamy, Kuo-Chen Chou, Saravanan Vivekanandan and Prasanna Kolatkar
  
  https://doi.org/10.2174/092986612798472875
  
  Prediction of protein structure from amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights for protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57%, and 72.07%, respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets, and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
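
As a rough illustration of the two-state labelling at the five thresholds, the Python sketch below assigns "buried" or "exposed" to each residue from a relative solvent accessibility value in percent. The abstract does not say how boundary values are handled, so the strict "greater than" rule here is an assumption, and the function name and example values are hypothetical.

```python
from typing import Dict, List, Sequence

THRESHOLDS_PERCENT = (0, 5, 10, 25, 50)  # thresholds listed in the abstract


def label_states(
    relative_accessibility: Sequence[float],
    thresholds: Sequence[float] = THRESHOLDS_PERCENT,
) -> Dict[float, List[str]]:
    """Map each threshold to per-residue 'buried'/'exposed' labels."""
    labels: Dict[float, List[str]] = {}
    for t in thresholds:
        # Assumed convention: a residue is exposed only if its value exceeds t.
        labels[t] = [
            "exposed" if rsa > t else "buried" for rsa in relative_accessibility
        ]
    return labels


# Hypothetical relative solvent accessibility values (percent) for four residues.
example = label_states([0.0, 7.5, 25.0, 60.0])
```

Under this convention the same residue can change state as the threshold rises, which is why a separate accuracy is reported per threshold.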
  

- SCYPPred: A Web-Based Predictor of SNPs for Human Cytochrome P450
  
  Authors: Li Li, Dong-Qing Wei, Jing-Fang Wang and Kuo-Chen Chou
  
  https://doi.org/10.2174/092986612798472785
  
  Human cytochrome P450 (CYP450) enzymes mediate over 60% of the phase I-dependent metabolism of clinical drugs. They are also known for polymorphisms that have significant impacts on enzyme activity. In this study, a web-server called SCYPPred was developed for predicting human cytochrome P450 SNPs (Single Nucleotide Polymorphisms) based on the SVM flanking sequence method; SCYPPred can rapidly yield the desired results by using amino acid sequence information alone. The web-server is accessible to the public at http://snppred.sjtu.edu.cn. We hope that SCYPPred will be a useful bioinformatics tool for elucidating the mutation probability of a specific CYP450 enzyme.
  

- Prostate Cancer with Variants in CYP17 and UGT2B17 Genes: A Meta-Analysis
  
  Authors: Lai Cai, Wei Huang and Kuo-Chen Chou
  
  https://doi.org/10.2174/092986612798472848
  
  Both CYP17 and UGT2B17 have been suggested as potential risk factors for prostate cancer (PCa). To date, many studies have evaluated the relationship between the CYP17 T-34C and UGT2B17 Del polymorphisms and prostate cancer, with conflicting results. Here, we performed comprehensive meta-analyses of over 25 studies, including results from about 17,000 subjects, on the association of the CYP17 T-34C and UGT2B17 Del polymorphisms with prostate cancer. Overall, no significant associations between the CYP17 T-34C polymorphism and prostate cancer risk were found for T versus C (P=0.63), TT versus CC (P=0.52), TT+TC versus CC (P=0.40), or TT versus TC+CC (P=0.98), though there was a marginally significant association with the UGT2B17 Del polymorphism under Del/Del versus Ins/Ins + Ins/Del (P=0.05). In an analysis of various subgroups, there were no substantially significant associations with the CYP17 T-34C polymorphism, while there was a significant association for the UGT2B17 Del/Del genotype in a subgroup of men-based controls (P<0.0001). The current meta-analysis results suggest that the CYP17 T-34C polymorphism may not be associated with prostate cancer, while the UGT2B17 Del polymorphism may significantly contribute to prostate cancer susceptibility in men. These findings also support the idea that CYP17 has no significant effects on androgen levels, while UGT2B17 does.
  

- A Novel Sequence-Based Method for Phosphorylation Site Prediction with Feature Selection and Analysis
  
  Authors: Zhi-Song He, Xiao-He Shi, Xiang-Ying Kong, Yu-Bei Zhu and Kuo-Chen Chou
  
  https://doi.org/10.2174/092986612798472893
  
  Phosphorylation is one of the most important post-translational modifications, and the identification of protein phosphorylation sites is particularly important for studying disease diagnosis. However, experimental detection of phosphorylation sites is labor intensive. It would be beneficial if computational methods were available to provide an extra reference for phosphorylation sites. Here we developed a novel sequence-based method for serine, threonine, and tyrosine phosphorylation site prediction. The Nearest Neighbor algorithm was employed as the prediction engine. Peptides around the phosphorylation sites, with a fixed length of thirteen amino acid residues, were extracted via a sliding window along the protein chains concerned. Each such peptide was coded into a vector with 6,072 features, derived from the Amino Acid Index (AAIndex) database, for classification/detection. Incremental Feature Selection, a feature selection algorithm based on the Maximum Relevancy Minimum Redundancy (mRMR) method, was used to select a compact feature set for further improvement of the classification performance. Three predictors were established for identifying the three types of phosphorylation sites, achieving overall accuracies of 66.64%, 66.11%, and 66.69%, respectively. These rates were obtained by rigorous jackknife cross-validation tests.
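
The sliding-window extraction of thirteen-residue peptides can be sketched in Python as follows. The abstract gives only the window length, so two choices here are assumptions rather than the authors' stated procedure: the candidate residue is placed at the center of the window (six residues on each side), and chain ends are padded with a hypothetical "X" symbol. The example sequence is invented.

```python
from typing import List, Tuple

WINDOW = 13          # peptide length stated in the abstract
FLANK = WINDOW // 2  # 6 residues on each side (assumes a centered site)
PAD = "X"            # hypothetical padding symbol for chain ends


def extract_peptides(sequence: str, targets: str = "STY") -> List[Tuple[int, str]]:
    """Return (1-based position, 13-residue peptide) for every S, T, or Y."""
    padded = PAD * FLANK + sequence + PAD * FLANK
    peptides = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # In the padded string the residue sits at index i + FLANK,
            # so the window starts at i and spans WINDOW characters.
            peptides.append((i + 1, padded[i : i + WINDOW]))
    return peptides


# Invented 15-residue chain with one serine and one threonine.
peptides = extract_peptides("MKS" + "A" * 10 + "TG")
```

Each extracted peptide would then be encoded as a feature vector; that encoding is outside the scope of this sketch.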
  

- Nucleosome Positioning Based on the Sequence Word Composition
  
  Authors: Xian-Fu Yi, Zhi-Song He, Kuo-Chen Chou and Xiang-Yin Kong
  
  https://doi.org/10.2174/092986612798472811
  
  The DNA of all eukaryotic organisms is packaged into nucleosomes, the basic repeating units of chromatin. A nucleosome consists of a histone octamer wrapped by core DNA and linker histone H1 associated with linker DNA. It has profound effects on all DNA-dependent processes by affecting sequence accessibility. Understanding the factors that influence nucleosome positioning greatly helps the study of genomic control mechanisms. Among many determinants, the inherent DNA sequence has been suggested to have a dominant role in nucleosome positioning in vivo. Here, we used minimum redundancy maximum relevance (mRMR) feature selection and the nearest neighbor algorithm (NNA), combined with the incremental feature selection (IFS) method, to identify the most important sequence features that either favor or inhibit nucleosome positioning. We analyzed the words of 53,021 nucleosome DNA sequences and 50,299 linker DNA sequences of Saccharomyces cerevisiae. Thirty-two important features were selected from 5,460 features, and the overall prediction accuracy in the jackknife cross-validation test was 76.5%. Our results support the view that sequence-dependent DNA flexibility plays an important role in positioning nucleosome core particles and that genome sequence facilitates rapid nucleosome reassembly rather than nucleosome depletion. In addition, our results suggest that some additional features play a considerable role in discriminating nucleosome-forming from nucleosome-inhibiting sequences. These results confirm that the underlying DNA sequence plays a major role in nucleosome positioning.
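
A sequence word composition can be sketched in Python as the frequency of every short DNA word in a sequence. The abstract does not define the words it uses. The sketch assumes all words of length 1 to 6 over A, C, G, and T, chosen because 4 + 16 + 64 + 256 + 1,024 + 4,096 = 5,460 matches the feature count reported. That construction, the per-length normalization, and the function names are assumptions, not the authors' stated method.

```python
from itertools import product
from typing import Dict, List

ALPHABET = "ACGT"
MAX_WORD_LENGTH = 6  # assumption: 4 + 16 + ... + 4096 = 5,460 words


def all_words(max_len: int = MAX_WORD_LENGTH) -> List[str]:
    """List every DNA word of length 1..max_len in a fixed order."""
    return [
        "".join(p)
        for k in range(1, max_len + 1)
        for p in product(ALPHABET, repeat=k)
    ]


def word_composition(sequence: str, max_len: int = MAX_WORD_LENGTH) -> Dict[str, float]:
    """Frequency of each word among the overlapping windows of the same length."""
    features = {w: 0.0 for w in all_words(max_len)}
    for k in range(1, max_len + 1):
        n_windows = len(sequence) - k + 1
        if n_windows <= 0:
            continue
        for i in range(n_windows):
            word = sequence[i : i + k]
            if word in features:  # skips windows with ambiguous bases such as N
                features[word] += 1.0 / n_windows
    return features


features = word_composition("ACGTACGT")
```

Vectors of this kind could then be passed to a feature selection step and a classifier, as the abstract describes.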
  

- A Nearest Neighbor Algorithm Based Predictor for the Prediction of Enzyme - Small Molecule Interaction
  
  Authors: Le-Le Hu, Zhi-Song He, Xiao-He Shi, Xiang-Ying Kong, Hai-Peng Li and Wen-Cong Lu
  
  https://doi.org/10.2174/092986612798472938
  
  Identifying and clarifying the interactions between enzymes and small molecules is of great use for understanding the molecular and cellular functions of organisms. In this study, we developed a novel method for the prediction of enzyme-small molecule interactions based on a machine learning approach. The biochemical and physicochemical description of proteins and the functional group composition of small molecules are used to represent enzyme-small molecule pairs. Tested by jackknife cross-validation, our predictor achieved an overall accuracy of 87.47%, showing an acceptable efficiency. The 39 features selected by feature selection were analyzed for further understanding of enzyme-small molecule interactions.
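
The jackknife (leave-one-out) test with a nearest neighbor predictor, which recurs throughout this issue, can be sketched in Python as below. This is a generic illustration, not the authors' code: the Euclidean distance, the two-dimensional toy vectors, and the class labels are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def nearest_neighbor_label(query: Sequence[float], training: List[Sample]) -> str:
    """Label of the training sample closest to the query (Euclidean distance)."""
    _, label = min(training, key=lambda s: math.dist(query, s[0]))
    return label


def jackknife_accuracy(samples: List[Sample]) -> float:
    """Leave each sample out once, predict it from the rest, and score the hits."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1 :]
        if nearest_neighbor_label(vector, rest) == label:
            correct += 1
    return correct / len(samples)


# Invented toy data: five enzyme-small molecule pairs in a 2-D feature space.
toy = [
    ((0.0, 0.0), "interacting"),
    ((0.1, 0.0), "interacting"),
    ((1.0, 1.0), "non-interacting"),
    ((0.9, 1.0), "non-interacting"),
    ((0.5, 0.4), "non-interacting"),
]
accuracy = jackknife_accuracy(toy)
```

In the toy data only the last sample is misclassified, since its closest neighbor carries the other label, so four of five predictions are correct.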
  

- Analysis of Metabolic Pathway Using Hybrid Properties
  
  Authors: Lei Chen, Yu-Dong Cai, Xiao-He Shi and Tao Huang
  
  https://doi.org/10.2174/092986612798472857
  
  Given a compounds-forming system, i.e., a system consisting of some compounds and their relationships, can it form a biologically meaningful pathway? This is a fundamental problem in systems biology. Nowadays, a lot of information on different organisms, at both genetic and metabolic levels, has been collected and stored in specific databases. Based on these data, it is feasible to address such an essential problem. A metabolic pathway is one kind of compounds-forming system, and we analyzed such pathways in yeast by extracting different (biological and graph) features from each of 13,736 compounds-forming systems, of which 136 are positive pathways, i.e., known metabolic pathways from KEGG, while 13,600 are negative. Each of these compounds-forming systems was represented by 144 features, of which 88 are graph features and 56 are biological features. “Minimum Redundancy Maximum Relevance” and “Incremental Feature Selection” were utilized to analyze these features, and 16 optimal features were selected as being able to predict a query compounds-forming system most successfully. Jackknife cross-validation showed that the overall success rate of identifying the positive pathways was 74.26%. It is anticipated that this novel approach and encouraging result may provide useful insight for investigating this important topic.
  

- Prediction of the Functional Roles of Small Molecules in Lipid Metabolism Based on Ensemble Learning
  
  Authors: Chun-Rong Peng, Wen-Cong Lu, Bing Niu, Ya-Jun Li and Le-Le Hu
  
  https://doi.org/10.2174/092986612798472802
  
  As many diseases, such as high cholesterol, are related to lipid metabolism, studying the lipid metabolic pathway helps build knowledge about the interactions between different elements within highly complex living systems. Here, we employed a typical ensemble learning method, the Bagging learner, to study and predict the possible lipid metabolic sub-pathway of small molecules based on physical and chemical features of the compounds. The model reached 89.85% and 91.46% in the jackknife cross-validation test and the independent set test, respectively. Therefore, our predictor may be used for finding new compounds that participate in lipid metabolic processes.
  

- Selection of Reprogramming Factors of Induced Pluripotent Stem Cells Based on the Protein Interaction Network and Functional Profiles
  
  Authors: Tao Huang, Yu-Dong Cai, Lei Chen, Le-Le Hu, Xiang-Yin Kong, Yi-Xue Li and Kuo-Chen Chou
  
  https://doi.org/10.2174/092986612798472884
  
  Induced pluripotent stem cells have displayed great potential in disease investigation and drug development applications. However, selection of reprogramming factors for each cell type or disease state is both expensive and time consuming. To deal with this situation, a fast computational framework was developed to optimize the reprogramming factors via the protein interaction network and gene functional profiles. It can be used to select reprogramming factors from millions of possibilities. It is anticipated that the novel approach will become a very useful tool for both basic research and drug development.
  

- Improved Candidate Biomarker Detection Based on Mass Spectrometry Data Using the Hilbert-Huang Transform
  
  Authors: Li-Ching Wu, Ping-Heng Hsieh, Jorng-Tzong Horng, Yu-Jen Jou, Chia-Der Lin, Kuang-Fu Cheng, Cheng-Wen Lin and Shih-Yin Chen
  
  https://doi.org/10.2174/092986612798472929
  
  Biomarker discovery by mass spectrometry may assist timely patient diagnosis and help characterize new diseases. Our previous work built a preprocessing method called HHTMass, which is capable of removing noise, but HHTMass was only a proof of principle for peak detection and was not tested for peak reappearance rate or applied to medical data. We developed a modified biomarker discovery method called Enhance HHTMass (E-HHTMass) for MALDI-TOF and SELDI-TOF mass spectrometry data, which improves the old HHTMass method by removing the interpolation and the biomarker discovery process. E-HHTMass integrates the preprocessing and classification functions to identify significant peaks. The results show that most known biomarkers can be found and that a high peak appearance rate is achieved compared with MSCAP and the old HHTMass2. E-HHTMass is able to adapt to spectra with a small increasing interval. In addition, new peaks are detected that may be potential biomarkers after further validation.
  
