Current Proteomics - Volume 16, Issue 5, 2019
The Applications of Clustering Methods in Predicting Protein Functions
Authors: Weiyang Chen, Weiwei Li, Guohua Huang and Matthew Flavel
Background: Understanding protein function is essential to the study of biological processes. However, predicting protein function remains a difficult task in bioinformatics, which has led many researchers to develop computational methods to address it. Objective: In this review, we introduce recently developed computational methods for protein function prediction and assess their validity. We then introduce the applications of clustering methods in predicting protein functions.
ML-rRBF-ECOC: A Multi-Label Learning Classifier for Predicting Protein Subcellular Localization with Both Single and Multiple Sites
Authors: Guo-Sheng Han and Zu-Guo Yu
Background: The subcellular localization of a protein is closely related to its functions and interactions. Increasing evidence shows that proteins may simultaneously exist at, or move between, two or more different subcellular localizations. Therefore, predicting protein subcellular localization is an important but challenging problem. Observation: Most existing methods for predicting protein subcellular localization assume that a protein is located at a single site. Although a few methods have been proposed to deal with proteins with multiple sites, correlations between subcellular localizations are not efficiently taken into account. In this paper, we propose an integrated method for predicting protein subcellular localizations with both single and multiple sites. Methods: First, we extend the Multi-Label Radial Basis Function (ML-RBF) method to a regularized version and augment the first layer of ML-RBF to take local correlations between subcellular localizations into account. Second, we embed the modified ML-RBF into a multi-label Error-Correcting Output Codes (ECOC) method in order to further consider the dependency between subcellular localizations. We name our method ML-rRBF-ECOC. Finally, the performance of ML-rRBF-ECOC is evaluated on three benchmark datasets. Results: The results demonstrate that ML-rRBF-ECOC performs competitively with the related multi-label learning method and with some state-of-the-art methods for predicting protein subcellular localizations with multiple sites. Considering the dependency between subcellular localizations can improve prediction performance. Conclusion: This also indicates that correlations between different subcellular localizations really exist. Our method at least plays a complementary role to existing methods for predicting protein subcellular localizations with multiple sites.
A Method for Analyzing Two-locus Epistasis of Complex Diseases based on Decision Tree and Mutual Entropy
Authors: Xiong Li, Hui Yang, Kaifu Wen, Xiaoming Zhong, Xuewen Xia, Liyue Liu and Dehao Qin
Background: Epistasis makes complex diseases difficult to understand, especially when heterogeneity also exists. Heterogeneity of complex diseases further complicates the distribution of the case population. However, traditional methods for detecting epistasis often ignore heterogeneity, resulting in low power of association studies. Methods: In this study, we first use rank information in the Classification Decision Tree and Mutual Entropy (CTME) to construct two different evaluation scores, which serve as multiple objectives. In addition, we improve the calculation of joint entropy between SNPs and the disease label, which raises the efficiency of CTME. Then, the ant colony algorithm is applied to search the space of two-locus epistatic combinations. To handle potential heterogeneity, all candidate two-locus SNPs are merged to recognize multiple different epistatic combinations. Finally, all these solutions are tested by the χ² test. Results and Conclusion: Experiments show that our method, CTME, improves the power of association studies. More importantly, CTME also detects multiple epistatic SNPs contributing to heterogeneity. The experimental results show that CTME has advantages in power and efficiency.
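The final χ² test mentioned in this abstract can be illustrated with a minimal sketch. It computes the Pearson χ² statistic for a contingency table whose rows are genotype combinations of a two-locus SNP pair and whose columns are case and control counts. The function name, the table layout and the toy counts are assumptions made for illustration; they are not taken from the paper, and CTME's own test setup may differ.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows; each row holds observed counts per column
    (here: hypothetical case and control counts for one genotype combination).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # skip empty rows or columns
                statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Toy counts (invented): two genotype combinations x (cases, controls).
toy_table = [[10, 20],
             [30, 40]]
chi2, dof = chi_square_statistic(toy_table)
print(f"chi2 = {chi2:.4f}, dof = {dof}")
```

Turning the statistic into a p-value requires the χ² distribution with `dof` degrees of freedom, which a statistics library can provide.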
A Novel Gene Selection Algorithm based on Sparse Representation and Minimum-redundancy Maximum-relevancy of Maximum Compatibility Center
Authors: Min Chen, Yi Zhang, Zejun Li, Ang Li, Wenhua Liu, Liubin Liu and Zheng Chen
Background: Tumor classification is important for accurate diagnosis and personalized treatment and has recently received great attention. Analysis of gene expression profiles has shown relevant biological significance and has thus become a research hotspot and a new challenge for bio-data mining. Among existing methods, some algorithms identify a small number of genes but at high time complexity, while others have low time complexity but unsatisfactory classification accuracy. This article proposes a new extraction method for gene expression profiles. Methods: In this paper, we propose a classification method for tumor subtypes based on the Minimum-Redundancy Maximum-Relevancy (MRMR) of the maximum compatibility center. First, we performed a fuzzy clustering of gene expression profiles based on the compatibility relation. Next, we used the sparse representation coefficient to assess the importance of each gene for the category, extracted the top-ranked genes, and removed the uncorrelated genes. Finally, the MRMR search strategy was used to select the characteristic genes, reject the redundant genes, and obtain the final subset of characteristic genes. Results: Our method and four others were tested on four different datasets to verify its effectiveness. Results show that the classification accuracy and standard deviation of our method are better than those of the other methods. Conclusion: Our proposed method is robust, adaptable, and superior in classification. This method can help us discover the susceptibility genes associated with complex diseases and understand the interactions between these genes. Our technique provides a new way of thinking and is important for understanding the pathogenesis of complex diseases and for disease prevention, diagnosis and treatment.
A Binary Classifier for the Prediction of EC Numbers of Enzymes
Background: Identifying the Enzyme Commission (EC) number of enzymes is important for understanding the metabolic processes that produce enough energy to sustain life. Previous studies mainly focused on predicting the six main functional classes or the sub-functional classes, i.e., the first two digits of the EC number. Objective: In this study, a binary classifier was proposed to identify the full EC number (four digits) of enzymes. Methods: Enzymes and their known EC numbers were paired as positive samples, and an equal number of negative samples was produced at random. The associations between any two samples were evaluated by integrating the linkages between enzymes and EC numbers. The classic machine learning algorithm, Support Vector Machine (SVM), was adopted as the prediction engine. Results: The five-fold cross-validation test on five datasets indicated that the overall accuracy, Matthews correlation coefficient and F1-measure were about 0.786, 0.576 and 0.771, respectively, suggesting the utility of the proposed classifier. In addition, the effectiveness of the classifier was elaborated by comparing it with classifiers based on other classic machine learning algorithms. Conclusion: The proposed classifier was quite effective for predicting the EC number of enzymes and was specially designed for the problem addressed in this study, as shown by testing it on five datasets containing randomly produced samples.
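For readers less familiar with the three measures reported in this abstract, the sketch below shows how accuracy, the Matthews correlation coefficient (MCC) and the F1-measure follow from a binary confusion matrix. These are the standard definitions; the function name and the counts are invented for illustration and do not reproduce the paper's data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F1-measure and MCC from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # MCC is conventionally set to 0 when any marginal total is zero.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts for a balanced positive/negative sample set.
metrics = binary_metrics(tp=40, fp=10, tn=35, fn=15)
print(metrics)
```

Because MCC uses all four cells of the confusion matrix, it stays informative even when a classifier favors one class, which is one reason it is often reported alongside accuracy and F1.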
Pathogenic Genes Selection Model of Genetic Disease based on Network Motifs Slicing Feedback
Authors: Shengli Zhang, Zekun Tong, Haoyu Yin and Yifan Feng
Background: Finding pathogenic genes is very important for understanding the pathogenesis of a disease, locating effective drug targets and improving the clinical level of medical treatment. However, existing methods for finding pathogenic genes still have limitations: for instance, their computational complexity is high, and the combination of multiple genes and pathways has not been considered when searching for highly related pathogenic genes. Methods: We propose a pathogenic gene selection model for genetic disease based on Network Motifs Slicing Feedback (NMSF). We find a point set that minimizes the conductivity of the motif and then use it to substitute for the original gene pathway network. Based on NMSF, we propose a new pathogenic gene selection model to expand the pathogenic gene set. Results: With the gene set obtained, the selection of key genes becomes more accurate and convincing. Finally, we use our model to screen the pathogenic genes and key pathways of liver cancer and lung cancer, and compare the results with those of existing methods. Conclusion: The main contribution is a method called NMSF, which simplifies the gene pathway network to make the selection of pathogenic genes simple and feasible. The results show that our approach has wide coverage and high accuracy and that our model is fast and robust.
Protein Subcellular Localization Prediction based on PSI-BLAST Profile and Principal Component Analysis
Authors: Yuhua Yao, Manzhi Li, Huimin Xu, Shoujiang Yan, Pingan He, Qi Dai, Zhaohui Qi and Bo Liao
Background: Prediction of protein subcellular location is a meaningful task that has attracted much attention in recent years. In particular, the number of new protein sequences yielded by high-throughput sequencing technology in the post-genomic era has increased explosively. Objective: Protein subcellular localization prediction based solely on sequence data remains a challenging problem in computational biology. Methods: In this paper, three sets of evolutionary features are derived from the position-specific scoring matrix, which has shown great potential in other bioinformatics problems. A fusion model is built from the optimal combination of parameters. Finally, principal component analysis and a support vector machine classifier are applied to predict protein subcellular localization on the NNPSL dataset and the Cell-PLoc 2.0 dataset. Results: Our experimental results show that the proposed method remarkably improved the prediction accuracy, and that features derived from the PSI-BLAST profile alone are appropriate for protein subcellular localization prediction.
Identification of Novel Breast Cancer Genes based on Gene Expression Profiles and PPI Data
Authors: Cheng-Wen Yang, Huan-Huan Cao, Yu Guo, Yuan-Ming Feng and Ning Zhang
Background: Breast cancer is one of the most common malignancies and a threat to female health all over the world. However, the molecular mechanism of breast cancer has not yet been fully discovered. Objective: It is crucial to identify breast cancer-related genes, which could provide new biomarkers for breast cancer diagnosis as well as potential treatment targets. Methods: Here we used the minimum redundancy-maximum relevance (mRMR) method to select significant genes, then mapped the transcripts of the genes onto the Protein-Protein Interaction (PPI) network and traced the shortest path between each pair of proteins. Results: As a result, we identified 24 breast cancer-related genes whose betweenness was over 700. The GO enrichment analysis indicated that transcription and oxygen level are very important in breast cancer. The pathway analysis indicated that most of these 24 genes are enriched in prostate cancer, endocrine resistance, and pathways in cancer. Conclusion: We hope these 24 genes might be useful for the diagnosis, prognosis and treatment of breast cancer.
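The shortest-path step in this abstract can be sketched as follows. The code traces one shortest path between each pair of nodes in a toy, unweighted network and counts how often each node lies strictly inside such a path. This is a simplified reading of "betweenness" adopted here as an assumption: the paper's PPI network may be weighted, and its exact definition may differ. The graph, node names and function names are all hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Return one shortest path from source to target by breadth-first search."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor in sorted(graph[node]):  # sorted for reproducible ties
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    if target not in parents:
        return None  # target unreachable from source
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]


def path_betweenness(graph):
    """Count, per node, the traced shortest paths that pass through it."""
    counts = {node: 0 for node in graph}
    for source, target in combinations(sorted(graph), 2):
        path = shortest_path(graph, source, target)
        if path is None:
            continue
        for node in path[1:-1]:  # endpoints are not counted
            counts[node] += 1
    return counts


# Toy undirected network standing in for a PPI graph.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")]
toy_ppi = {}
for u, v in edges:
    toy_ppi.setdefault(u, set()).add(v)
    toy_ppi.setdefault(v, set()).add(u)

betweenness = path_betweenness(toy_ppi)
print(betweenness)
```

Nodes with high counts sit on many of the traced paths, which is the intuition behind selecting genes above a betweenness threshold.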
An Improved Scatter Search Algorithm for Parameter Estimation in Large-Scale Kinetic Models of Biochemical Systems
Authors: Muhammad A. Remli, Mohd Saberi Mohamad, Safaai Deris, Richard Sinnott and Suhaimi Napis
Background: Mathematical models play a central role in helping researchers better understand and comprehensively analyze various processes in biochemical systems. Their usage is beneficial in metabolic engineering, as they help predict and improve desired products. However, one of the primary challenges in model building is parameter estimation, the process of finding near-optimal values of kinetic parameters that give the best fit of model prediction to experimental data. Methods: This paper proposes an improved scatter search algorithm to address the challenging parameter estimation problem. The improved algorithm is based on the hybridization of quasi opposition-based learning with the enhanced scatter search method (QOBLESS). The algorithm is tested using a large-scale metabolic model of Chinese Hamster Ovary (CHO) cells. Results: The experimental results show that the proposed algorithm performs better than other algorithms in terms of convergence speed and the minimum value of the objective function (log-likelihood). The estimated parameters produce a better model, with a reasonably good fit of model prediction to the experimental data. Conclusion: The kinetic parameter values obtained in our work resulted in a reasonably good fit of model prediction to the experimental data, which contributes to a better understanding and a more accurate model. Based on the results, the QOBLESS method can be used as an efficient parameter estimation method in large-scale kinetic model building.
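As a rough illustration of the quasi opposition-based learning idea named in this abstract, the sketch below uses one common definition from the optimization literature: for a parameter x bounded by [a, b], the opposite point is a + b - x, and the quasi-opposite point is drawn uniformly between the interval center (a + b) / 2 and that opposite point. Whether QOBLESS uses exactly this rule is an assumption; the function name, bounds and values are hypothetical.

```python
import random


def quasi_opposite(point, lower, upper, rng=random):
    """Draw a quasi-opposite point for a candidate parameter vector.

    For each coordinate x in [lo, hi], sample uniformly between the interval
    center and the opposite point lo + hi - x (one common definition; the
    paper's variant may differ).
    """
    result = []
    for x, lo, hi in zip(point, lower, upper):
        center = (lo + hi) / 2.0
        opposite = lo + hi - x
        low, high = min(center, opposite), max(center, opposite)
        result.append(rng.uniform(low, high))
    return result


rng = random.Random(0)          # seeded only to make the sketch repeatable
candidate = [0.2, 8.0]          # hypothetical kinetic parameter values
lower_bounds = [0.0, 0.0]
upper_bounds = [1.0, 10.0]

quasi = quasi_opposite(candidate, lower_bounds, upper_bounds, rng)
print(quasi)
```

In a population-based search, a candidate and its quasi-opposite point would typically both be evaluated on the objective function and the better one kept, which is how such schemes aim to speed up convergence.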