Combinatorial Chemistry & High Throughput Screening - Volume 20, Issue 7, 2017
Recognizing and Predicting Thioether Bridges Formed by Lanthionine and β-Methyllanthionine in Lantibiotics Using a Random Forest Approach with Feature Selection
Authors: ShaoPeng Wang, Yu-Hang Zhang, Ning Zhang, Lei Chen, Tao Huang and Yu-Dong Cai
Background: Lantibiotics, which are usually produced by Gram-positive bacteria, are regarded as a special type of bacteriocin. Lantibiotics have unsaturated amino acid residues formed by lanthionine (Lan) and β-methyllanthionine (MeLan) residues as a ring structure in the peptide. They are derived from serine and threonine residues and are essential to preventing the growth of other similar strains.
Method: In this pioneering work, we proposed, for the first time, a machine learning method to recognize and predict the Lan and MeLan residues in the protein sequences of lantibiotics. We adopted maximal relevance minimal redundancy (mRMR) and incremental feature selection (IFS) to select optimal features, and random forest (RF) to build classifiers that identify the Lan and MeLan residues. A 10-fold cross-validation test was performed on the classifiers to evaluate their predictive performance.
Results: The Matthews correlation coefficient (MCC) values for predicting the Lan and MeLan residues were 0.813 and 0.769, respectively. Our RF classifiers were shown to recognize Lan and MeLan residues in lantibiotic sequences reliably. For comparison, three other methods, Dagging, the nearest neighbor algorithm (NNA) and sequential minimal optimization (SMO), were also used to build classifiers predicting Lan and MeLan residues. The optimal features were also analyzed, and the relationships between these features and their biological importance are described.
Conclusion: The selected optimal features and the analysis in this work will contribute to a better understanding of the sequence and structural features around the Lan and MeLan residues. This work could provide useful information and practical suggestions for experimental and computational methods aimed at exploring the biological features of such special residues in lantibiotics.
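As an illustration of the evaluation metric reported in this abstract, the following minimal sketch computes the Matthews correlation coefficient from the four counts of a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the paper; returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a marginal total is zero,
    # because the coefficient is undefined in that case.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, for illustration only (not results from the paper).
print(matthews_corrcoef(tp=45, tn=40, fp=10, fn=5))
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it more informative than accuracy when the two classes differ greatly in size.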
 
Prediction and Identification of Krüppel-Like Transcription Factors by Machine Learning Method
Authors: Zhijun Liao, Xinrui Wang, Xingyong Chen and Quan Zou
Aim and Objective: The Krüppel-like factors (KLFs) are a family of transcription factors containing zinc finger (ZF) motifs, with 18 members in the human genome; among them, KLF18 was predicted by bioinformatics. KLFs have various physiological functions and are involved in a number of cancers and other diseases. Here we perform a binary classification of KLFs and non-KLFs using machine learning methods.
Material and Method: The protein sequences of KLFs and non-KLFs were retrieved from UniProt and randomly separated into a training dataset (containing positive and negative sequences) and a test dataset (containing only negative sequences). After extracting 188-dimensional (188D) feature vectors, we carried out classification with four classifiers (GBDT, libSVM, RF, and k-NN). For the human KLFs, we further examined the evolutionary relationships and motif distribution, and finally analyzed the conserved amino acid residues of the three zinc fingers.
Results: The classifier models built from the training dataset were well constructed; the highest specificity (Sp) was 99.83%, obtained with a library for support vector machines (libSVM), and all correct classification rates were over 70% for 10-fold cross-validation on the test dataset. The 18 human KLFs can be further divided into 7 groups, and the zinc finger domains are located at the carboxyl terminus. Many conserved amino acid residues, including cysteine and histidine, were found, and the span and interval between them were consistent across the three ZF domains.
Conclusion: Two classification models for KLF prediction have been built using novel machine learning methods.
 
Identification of Cell Cycle-Regulated Genes by Convolutional Neural Network
Authors: Chenglin Liu, Peng Cui and Tao Huang
Background: Cell cycle-regulated genes are expressed periodically across the cell cycle stages, and identifying and studying these genes can provide a deep understanding of the cell cycle process. Large numbers of false positives and low overlaps between studies are major problems in cell cycle-regulated gene detection.
Methods: Here, a computational framework called DLGene was proposed for cell cycle-regulated gene detection. It is based on the convolutional neural network, a deep learning algorithm that represents patterns in raw data without assumptions about their distribution. First, the expression data were transformed into categorical state data to denote the changing state of gene expression, and four different expression patterns were revealed for the reported cell cycle-regulated genes. Then, DLGene was applied to discriminate between non-cell cycle genes and the four subtypes of cell cycle genes. Its performance was compared with that of six traditional machine learning methods. Finally, the biological functions of representative cell cycle genes for each subtype were analyzed.
Results: Our method showed better and more balanced sensitivity and specificity than the other machine learning algorithms. The cell cycle genes had very different expression patterns from the non-cell cycle genes, and among the cell cycle genes there were four subtypes. Our method not only detects cell cycle genes but also describes their expression patterns, such as when a gene reaches its highest expression level and how its expression changes with time. For each subtype, we analyzed the biological functions of the representative genes, and these results provided novel insight into cell cycle mechanisms.
 
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class
Aim and Objective: Correct prediction of protein structural class benefits the investigation of protein functions, regulation and interactions. In recent years, several computational methods have been proposed for this purpose. However, given the variety of available features, it remains a great challenge to select a proper classification algorithm and to extract the essential features for classification.
Material and Methods: In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. Amino acid compositions and physicochemical features were adopted to represent proteins, and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features from the list one by one. For each feature set, the thirty-eight algorithms were executed on a dataset in which proteins were represented by the features in the set. The classes predicted by these algorithms and the true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. Algorithms were then taken from the algorithm list one by one to build ensemble prediction models. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model.
Results: Experimental results indicate that the constructed model is much superior to models using a single algorithm and to models that adopt only the feature selection procedure or only the algorithm selection procedure.
Conclusion: The feature selection procedure and the algorithm selection procedure are helpful for building an ensemble prediction model that yields better performance.
 
Prediction of Lysine Malonylation Sites Based on Pseudo Amino Acid
Authors: Qilin Xiang, Kaiyan Feng, Bo Liao, Yuewu Liu and Guohua Huang
Aim and Objective: Protein malonylation is a newly discovered post-translational modification. Malonylation is known to be closely associated with type 2 diabetes and to play a regulatory role in fatty acid oxidation and the associated genetic disease. Identifying protein malonylation sites might lay a solid foundation for exploring malonylation function. Due to the limitations of experimental techniques, it is a great challenge to identify malonylation sites quickly and accurately.
Methods: We proposed a computational method to predict malonylation sites and to analyze malonylation patterns. We first extracted protein segments so that the lysine is at the center of each segment. Then, each segment was encoded by its pseudo amino acid composition. A support vector machine classifier was trained on a training dataset to distinguish malonylation sites from non-malonylation ones.
Results: The leave-one-out test on the training dataset reached an accuracy of 0.7733, and the independent test on the testing dataset reached 0.8889. Furthermore, the classifier also successfully identified 144 of 160 putative malonylation sites. Analyses of the differences between malonylation and non-malonylation segments implied that lysine malonylation follows a specific pattern, e.g. a lysine whose neighbors are glycine and alanine might be more likely to be malonylated. Therefore, the proposed method is expected to be a promising tool for identifying malonylation sites.
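The segment extraction step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name, the flank length of 3 and the padding character "X" for windows that run past the sequence ends are all hypothetical choices, since the abstract gives neither the window size nor the handling of terminal lysines.

```python
from typing import List, Tuple


def lysine_segments(sequence: str, flank: int = 3, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, segment) for every lysine (K) in the sequence.

    Each segment has length 2 * flank + 1 with the lysine at its center.
    """
    # Pad both ends so that lysines near the termini still sit at the center.
    padded = pad * flank + sequence + pad * flank
    segments = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Index i in the sequence is index i + flank in the padded string,
            # so the window starts at i and spans 2 * flank + 1 characters.
            segments.append((i + 1, padded[i : i + 2 * flank + 1]))
    return segments


# Toy sequence, for illustration only.
for position, segment in lysine_segments("MAGKAKLV", flank=2):
    print(position, segment)
```

Each resulting segment would then be passed to whatever encoding is chosen (pseudo amino acid composition in the abstract) before classification.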
 
Computational Prediction of Protein Epsilon Lysine Acetylation Sites Based on a Feature Selection Method
Authors: JianZhao Gao, Xue-Wen Tao, Jia Zhao, Yuan-Ming Feng, Yu-Dong Cai and Ning Zhang
Aim and Objective: Lysine acetylation, one type of post-translational modification (PTM), plays key roles in cellular regulation and can be involved in a variety of human diseases. However, identifying lysine acetylation sites with traditional experimental approaches is often costly and time-consuming. Therefore, effective computational methods should be developed to predict acetylation sites. In this study, we developed a position-specific method for epsilon lysine acetylation site prediction.
Material and Methods: Sequences of acetylated proteins were retrieved from the UniProt database. Various kinds of features, such as position-specific scoring matrix (PSSM), amino acid factors (AAF), and disorder, were incorporated. A feature selection method based on mRMR (Maximum Relevance Minimum Redundancy) and IFS (Incremental Feature Selection) was employed.
Results: Finally, 319 optimal features were selected from a total of 541 features. Using the 319 optimal features to encode peptides, a predictor was constructed based on Dagging. As a result, an accuracy of 69.56% with an MCC of 0.2792 was achieved. We analyzed the optimal features, which suggested some important factors determining lysine acetylation sites.
Conclusion: We developed a position-specific method for epsilon lysine acetylation site prediction. A set of optimal features was selected. Analysis of the optimal features provided insights into the mechanism of lysine acetylation and guidance for experimental validation.
 
Prediction of the Ebola Virus Infection Related Human Genes Using Protein-Protein Interaction Network
Authors: HuanHuan Cao, YuHang Zhang, Jia Zhao, Liucun Zhu, Yi Wang, JiaRui Li, Yuan-Ming Feng and Ning Zhang
Background: Ebola hemorrhagic fever (EHF) is caused by Ebola virus (EBOV). It is reported that humans can be infected by EBOV with a high fatality rate. However, the factors associating EBOV with its host remain unclear.
Objective: According to the "guilt by association" (GBA) principle, proteins that interact with each other are very likely to have similar or identical functions. Based on this assumption, we tried to obtain EBOV infection-related human genes in a protein-protein interaction network using the Dijkstra algorithm.
Conclusion: Finally, 15 genes were selected as potential EBOV infection-related human genes. We hope this could contribute to the discovery of novel effective treatments.
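To make the network step in this abstract concrete, here is a minimal sketch of Dijkstra's shortest-path algorithm on a small weighted graph. The node names, edge weights and the `shortest_path` helper are hypothetical, and the abstract does not say how the paths were turned into candidate genes; treating the interior nodes of a shortest path between two known genes as candidates is a common approach that is assumed here, not taken from the paper.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def shortest_path(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm; returns (distance, path), or (inf, []) if unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry for an already finalized node
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk back from the target to the source to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Toy undirected network with hypothetical nodes and weights.
toy: Graph = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 5.0, "B": 1.0},
    "D": {},
}
distance, path = shortest_path(toy, "A", "C")
interior = path[1:-1]  # assumed candidate nodes lying between the two endpoints
print(distance, path, interior)
```

Dijkstra's algorithm requires non-negative edge weights, so interaction confidence scores usually need to be converted into costs (lower cost for stronger evidence) before use.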
 
The Safety of Ovarian Preservation in Stage I Endometrial Endometrioid Adenocarcinoma Based on Propensity Score Matching
Background: Most patients with early stage endometrial endometrioid adenocarcinoma (EEAC) are treated with hysterectomy and bilateral oophorectomy. However, the resulting surgical menopause leads to long-term sequelae for premenopausal women, especially young women of childbearing age. This population-based study aimed to evaluate the safety of ovarian preservation in young women with stage I EEAC.
Methods: Patients aged 50 or younger with stage I EEAC were identified in the Surveillance, Epidemiology and End Results program database from 2004 to 2013. Propensity score matching was used to randomize the data set and reduce selection bias by doctors. Univariate analysis and a multivariate Cox proportional hazards model were used to estimate the safety of ovarian preservation.
Results: A total of 7183 patients were identified, and ovarian preservation was performed in 863 (12%) patients. Compared with women treated with oophorectomy, patients with ovarian preservation tended to be significantly younger at diagnosis (P-value < 0.001), were more likely to be diagnosed with stage IA EEAC, had better differentiated tumor tissues and smaller tumors, and were less likely to undergo radiation and lymphadenectomy. By propensity score matching, 863 patients treated with oophorectomy were selected. After propensity score matching, the differences in all characteristics between the ovarian preservation and oophorectomy groups were not significant, and potential confounders in the two groups decreased. In univariate analysis of the matched population, ovarian preservation had no effect on overall (P-value=0.928) or cancer-specific (P-value=0.390) mortality. In propensity-adjusted multivariate analysis, ovarian preservation was not significantly associated with overall (HR=0.69, 95%CI=0.41-1.68, P-value=0.611) or cancer-specific (HR=1.65, 95%CI=0.54-5.06, P-value=0.379) survival.
Conclusion: Ovarian preservation is safe for young women with stage I EEAC and is not significantly associated with overall or cancer-specific mortality.
 