Protein and Peptide Letters - Volume 27, Issue 4, 2020
Volume 27, Issue 4, 2020
-
-
Recent Advances of Computational Methods for Identifying Bacteriophage Virion Proteins
More LessPhage Virion Proteins (PVP) are essential materials of bacteriophage, which participate in a series of biological processes. Accurate identification of phage virion proteins is helpful to understand the mechanism of interaction between the phage and its host bacteria. Since experimental method is labor intensive and time-consuming, in the past few years, many computational approaches have been proposed to identify phage virion proteins. In order to facilitate researchers to select appropriate methods, it is necessary to give a comprehensive review and comparison on existing computational methods on identifying phage virion proteins. In this review, we summarized the existing computational methods for identifying phage virion proteins and also assessed their performances on an independent dataset. Finally, challenges and future perspectives for identifying phage virion proteins were presented. Taken together, we hope that this review could provide clues to researches on the study of phage virion proteins.
-
-
-
Analysis of Protein-Protein Interaction Networks through Computational Approaches
Authors: Ying Han, Liang Cheng and Weiju SunThe interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at protein or gene levels can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. Compared with the limited experimental techniques, various computational tools make it possible to analyze, filter, and combine the interaction data to get comprehensive information about the biological pathways. By the efficient way of integrating experimental findings in discovering PPIs and computational techniques for prediction, the researchers have been able to gain many valuable data on PPIs, including some advanced databases. Moreover, many useful tools and visualization programs enable the researchers to establish, annotate, and analyze biological networks. We here review and list the computational methods, databases, and tools for protein−protein interaction prediction.
-
-
-
Prediction of the Disordered Regions of Intrinsically Disordered Proteins Based on the Molecular Functions
Authors: WeiXia Xie and Yong E. FengBackground: Intrinsically disordered proteins lack a well-defined three dimensional structure under physiological conditions while possessing the essential biological functions. They take part in various physiological processes such as signal transduction, transcription and posttranslational modifications and etc. The disordered regions are the main functional sites for intrinsically disordered proteins. Therefore, the research of the disordered regions has become a hot issue. Objective: In this paper, our motivation is to analysis of the features of disordered regions with different molecular functions and predict of different disordered regions using valid features. Methods: In this article, according to the different molecular function, we firstly divided intrinsically disordered proteins into six classes in DisProt database. Then, we extracted four features using bioinformatics methods, namely, Amino Acid Index (AAIndex), codon frequency (Codon), three kinds of protein secondary structure compositions (3PSS) and Chemical Shifts (CSs), and used these features to predict the disordered regions of the different functions by Support Vector Machine (SVM). Results: The best overall accuracy was 99.29% using the chemical shift (CSs) as feature. In feature fusion, the overall accuracy can reach 88.70% by using CSs+AAIndex as features. The overall accuracy was up to 86.09% by using CSs+AAIndex+Codon+3PSS as features. Conclusion: We predicted and analyzed the disordered regions based on the molecular functions. The results showed that the prediction performance can be improved by adding chemical shifts and AAIndex as features, especially chemical shifts. Moreover, the chemical shift was the most effective feature in the prediction. We hoped that our results will be constructive for the study of intrinsically disordered proteins.
-
-
-
A Novel Amino Acid Properties Selection Method for Protein Fold Classification
Authors: Lichao Zhang and Liang KongBackground: Amino acid physicochemical properties encoded in protein primary structure play a crucial role in protein folding. However, it is not yet clear which of the properties are the most suitable for protein fold classification. Objective: To avoid exhaustively searching the total properties space, an amino acid properties selection method was proposed in this study to rapidly obtain a suitable properties combination for protein fold classification. Methods: The proposed amino acid properties selection method was based on sequential floating forward selection strategy. Beginning with an empty set, variable number of features were added iteratively until achieving the iteration termination condition. Results: The experimental results indicate that the proposed method improved prediction accuracies by 0.26-5% on a widely used benchmark dataset with appropriately selected amino acid properties. Conclusion: The proposed properties selection method can be extended to other biomolecule property related classification problems in bioinformatics.
-
-
-
SpliVert: A Protein Multiple Sequence Alignment Refinement Method Based on Splitting-Splicing Vertically
Authors: Qing Zhan, Yilei Fu, Qinghua Jiang, Bo Liu, Jiajie Peng and Yadong WangBackground: Multiple Sequence Alignment (MSA) is a fundamental task in bioinformatics and is required for many biological analysis tasks. The more accurate the alignments are, the more credible the downstream analyses. Most protein MSA algorithms realign an alignment to refine it by dividing it into two groups horizontally and then realign the two groups. However, this strategy does not consider that different regions of the sequences have different conservation; this property may lead to incorrect residue-residue or residue-gap pairs, which cannot be corrected by this strategy. Objective: In this article, our motivation is to develop a novel refinement method based on splitting- splicing vertically. Methods: Here, we present a novel refinement method based on splitting-splicing vertically, called SpliVert. For an alignment, we split it vertically into 3 parts, remove the gap characters in the middle, realign the middle part alone, and splice the realigned middle parts with the other two initial pieces to obtain a refined alignment. In the realign procedure of our method, the aligner will only focus on a certain part, ignoring the disturbance of the other parts, which could help fix the incorrect pairs. Results: We tested our refinement strategy for 2 leading MSA tools on 3 standard benchmarks, according to the commonly used average SP (and TC) score. The results show that given appropriate proportions to split the initial alignment, the average scores are increased comparably or slightly after using our method. We also compared the alignments refined by our method with alignments directly refined by the original alignment tools. The results suggest that using our SpliVert method to refine alignments can also outperform direct use of the original alignment tools. Conclusion: The results reveal that splitting vertically and realigning part of the alignment is a good strategy for the refinement of protein multiple sequence alignments.
-
-
-
The Influences of Palindromes in mRNA on Protein Folding Rates
Authors: Ruifang Li, Hong Li, Sarula Yang and Xue FengBackground: It is currently believed that protein folding rates are influenced by protein structure, environment and temperature, amino acid sequence and so on. We have been working for long to determine whether and in what ways mRNA affects the protein folding rate. A large number of palindromes aroused our attention in our previous research. Whether these palindromes do have important influences on protein folding rates and what’s the mechanism? Very few related studies are focused on these problems. Objective: In this article, our motivation is to find out if palindromes have important influences on protein folding rates and what’s the mechanism. Methods: In this article, the parameters of the palindromes were defined and calculated, the linear regression analysis between the values of each parameter and the experimental protein folding rates were done. Furthermore, to compare the results of different kinds of proteins, proteins were classified into the two-state proteins and the multi-state proteins. For the two kinds of proteins, the above linear regression analysis were performed respectively. Results: Protein folding rates were negatively correlated to the palindrome frequencies for all proteins. An extremely significant negative linear correlation appeared in the relationship between palindrome densities and protein folding rates. And the repeatedly used bases by different palindromes simultaneously have an important effect on the relationship between palindrome density and protein folding rate. Conclusion: The palindromes have important influences on protein folding rates, and the repeatedly used bases in different palindromes simultaneously play a key role in influencing the protein folding rates.
-
-
-
A Novel Prediction of Quaternary Structural Type of Proteins with Gene Ontology
Authors: Xuan Xiao, Wei-Jie Chen and Wang-Ren QiuBackground: The information of quaternary structure attributes of proteins is very important because it is closely related to the biological functions of proteins. With the rapid development of new generation sequencing technology, we are facing a challenge: how to automatically identify the four-level attributes of new polypeptide chains according to their sequence information (i.e., whether they are formed as just as a monomer, or as a hetero-oligomer, or a homo-oligomer). Objective: In this article, our goal is to find a new way to represent protein sequences, thereby improving the prediction rate of protein quaternary structure. Methods: In this article, we developed a prediction system for protein quaternary structural type in which a protein sequence was expressed by combining the Pfam functional-domain and gene ontology. turn protein features into digital sequences, and complete the prediction of quaternary structure through specific machine learning algorithms and verification algorithm. Results: Our data set contains 5495 protein samples. Through the method provided in this paper, we classify proteins into monomer, or as a hetero-oligomer, or a homo-oligomer, and the prediction rate is 74.38%, which is 3.24% higher than that of previous studies. Through this new feature extraction method, we can further classify the four-level structure of proteins, and the results are also correspondingly improved. Conclusion: After the applying the new prediction system, compared with the previous results, we have successfully improved the prediction rate. We have reason to believe that the feature extraction method in this paper has better practicability and can be used as a reference for other protein classification problems.
-
-
-
An Effective Cumulative Torsion Angles Model for Prediction of Protein Folding Rates
Authors: Yanru Li, Ying Zhang and Jun LvBackground: Protein folding rate is mainly determined by the size of the conformational space to search, which in turn is dictated by factors such as size, structure and amino-acid sequence in a protein. It is important to integrate these factors effectively to form a more precisely description of conformation space. But there is no general paradigm to answer this question except some intuitions and empirical rules. Therefore, at the present stage, predictions of the folding rate can be improved through finding new factors, and some insights are given to the above question. Objective: Its purpose is to propose a new parameter that can describe the size of the conformational space to improve the prediction accuracy of protein folding rate. Methods: Based on the optimal set of amino acids in a protein, an effective cumulative backbone torsion angles (CBTAeff) was proposed to describe the size of the conformational space. Linear regression model was used to predict protein folding rate with CBTAeff as a parameter. The degree of correlation was described by the coefficient of determination and the mean absolute error MAE between the predicted folding rates and experimental observations. Results: It achieved a high correlation (with the coefficient of determination of 0.70 and MAE of 1.88) between the logarithm of folding rates and the (CBTAeff)0.5 with experimental over 112 twoand multi-state folding proteins. Conclusion: The remarkable performance of our simplistic model demonstrates that CBTA based on optimal set was the major determinants of the conformation space of natural proteins.
-
-
-
A Computational Method for the Identification of Endolysins and Autolysins
Authors: Lei Xu, Guangmin Liang, Baowen Chen, Xu Tan, Huaikun Xiang and Changrui LiaoBackground: Cell lytic enzyme is a kind of highly evolved protein, which can destroy the cell structure and kill the bacteria. Compared with antibiotics, cell lytic enzyme will not cause serious problem of drug resistance of pathogenic bacteria. Thus, the study of cell wall lytic enzymes aims at finding an efficient way for curing bacteria infectious. Compared with using antibiotics, the problem of drug resistance becomes more serious. Therefore, it is a good choice for curing bacterial infections by using cell lytic enzymes. Cell lytic enzyme includes endolysin and autolysin and the difference between them is the purpose of the break of cell wall. The identification of the type of cell lytic enzymes is meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of cell lytic enzyme. Cell lytic enzyme is helpful for killing bacteria, so it is meaningful for study the type of cell lytic enzyme. However, it is time consuming to detect the type of cell lytic enzyme by experimental methods. Thus, an efficient computational method for the type of cell lytic enzyme prediction is proposed in our work. Methods: We propose a computational method for the prediction of endolysin and autolysin. First, a data set containing 27 endolysins and 41 autolysins is built. Then the protein is represented by tripeptides composition. The features are selected with larger confidence degree. At last, the classifier is trained by the labeled vectors based on support vector machine. The learned classifier is used to predict the type of cell lytic enzyme. Results: Following the proposed method, the experimental results show that the overall accuracy can attain 97.06%, when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9%). The performance of our proposed method is stable, when the selected feature number is from 40 to 70. The overall accuracy of tripeptides optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptides optimal feature set. Conclusion: The paper proposed an efficient method for identifying endolysin and autolysin. In this paper, support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy for identification of the type of cell lytic enzyme. Support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
-
-
-
NeuroCS: A Tool to Predict Cleavage Sites of Neuropeptide Precursors
Authors: Ying Wang, Juanjuan Kang, Ning Li, Yuwei Zhou, Zhongjie Tang, Bifang He and Jian HuangBackground: Neuropeptides are a class of bioactive peptides produced from neuropeptide precursors through a series of extremely complex processes, mediating neuronal regulations in many aspects. Accurate identification of cleavage sites of neuropeptide precursors is of great significance for the development of neuroscience and brain science. Objective: With the explosive growth of neuropeptide precursor data, it is pretty much needed to develop bioinformatics methods for predicting neuropeptide precursors’ cleavage sites quickly and efficiently. Methods: We started with processing the neuropeptide precursor data from SwissProt and NueoPedia into two sets of data, training dataset and testing dataset. Subsequently, six feature extraction schemes were applied to generate different feature sets and then feature selection methods were used to find the optimal feature subset of each. Thereafter the support vector machine was utilized to build models for different feature types. Finally, the performance of models were evaluated with the independent testing dataset. Results: Six models are built through support vector machine. Among them the enhanced amino acid composition-based model reaches the highest accuracy of 91.60% in the 5-fold cross validation. When evaluated with independent testing dataset, it also showed an excellent performance with a high accuracy of 90.37% and Area under Receiver Operating Characteristic curve up to 0.9576. Conclusion: The performance of the developed model was decent. Moreover, for users’ convenience, an online web server called NeuroCS is built, which is freely available at http://i.uestc.edu.cn/NeuroCS/dist/index.html#/. NeuroCS can be used to predict neuropeptide precursors’ cleavage sites effectively.
-
Volumes & issues
-
Volume 32 (2025)
-
Volume 31 (2024)
-
Volume 30 (2023)
-
Volume 29 (2022)
-
Volume 28 (2021)
-
Volume 27 (2020)
-
Volume 26 (2019)
-
Volume 25 (2018)
-
Volume 24 (2017)
-
Volume 23 (2016)
-
Volume 22 (2015)
-
Volume 21 (2014)
-
Volume 20 (2013)
-
Volume 19 (2012)
-
Volume 18 (2011)
-
Volume 17 (2010)
-
Volume 16 (2009)
-
Volume 15 (2008)
-
Volume 14 (2007)
-
Volume 13 (2006)
-
Volume 12 (2005)
-
Volume 11 (2004)
-
Volume 10 (2003)
-
Volume 9 (2002)
-
Volume 8 (2001)
Most Read This Month
