Current Bioinformatics - Volume 15, Issue 6, 2020
Integrating Second-order Moving Average and Over-sampling Algorithm to Predict Apoptosis Protein Subcellular Localization
Authors: Yunyun Liang and Shengli Zhang
Background: Apoptosis proteins play a key role in the development and homeostasis of the organism and are important for understanding the mechanisms of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Prediction of apoptosis protein subcellular localization is therefore a meaningful task. Methods: In this study, we predict apoptosis protein subcellular location by using a PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence, and over-sampling algorithms. The model is named SOMAPKLNMF-OS and is constructed on the ZD98, ZW225 and CL317 benchmark datasets. A support vector machine is adopted as the classifier, and the bias-free jackknife test is used to evaluate accuracy. Results: Our prediction system achieves favorable and promising overall accuracy on the three datasets and outperforms the other listed models. Conclusion: The results show that our model offers a high-throughput tool for the identification of apoptosis protein subcellular localization.
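The jackknife test mentioned in this abstract holds out each sample once, trains on the rest, and reports the fraction of held-out samples classified correctly. The following is a minimal sketch of that procedure only. The function names are hypothetical, and a 1-nearest-neighbor rule stands in for the support vector machine so that the example needs no third-party library; it is not the authors' implementation.

```python
from typing import Callable, List, Sequence


def nearest_neighbor_predict(
    train_x: List[Sequence[float]], train_y: List[int], query: Sequence[float]
) -> int:
    """Stand-in classifier (1-nearest neighbor); the abstract uses an SVM."""

    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    best = min(
        range(len(train_x)), key=lambda i: squared_distance(train_x[i], query)
    )
    return train_y[best]


def jackknife_accuracy(
    x: List[Sequence[float]],
    y: List[int],
    predict: Callable[..., int] = nearest_neighbor_predict,
) -> float:
    """Leave each sample out once, predict it from the rest, return accuracy."""
    correct = 0
    for i in range(len(x)):
        # Training set is every sample except the i-th one.
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)
```

Any classifier with the same call signature can be passed as `predict`, so the evaluation loop stays independent of the model being tested.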
-
-
-
Identification of Cancerlectins By Using Cascade Linear Discriminant Analysis and Optimal g-gap Tripeptide Composition
Authors: Liangwei Yang, Hui Gao, Keyu Wu, Haotian Zhang, Changyu Li and Lixia Tang
Background: Lectins are a diverse group of glycoproteins or glycoconjugate proteins that can be extracted from plants, invertebrates and higher animals. Cancerlectins, a kind of lectin, play a key role in the interactions between tumor cells and are being employed as therapeutic agents. A full understanding of cancerlectins is significant because it provides a tool for the future direction of cancer therapy. Objective: To develop an accurate, practical and time-saving tool to identify cancerlectins. A novel sequence-based method is proposed, along with a web server that provides access to the tool. Methods: First, protein features were extracted using a new feature construction scheme termed g-gap tripeptide composition. Then, a proposed cascade linear discriminant analysis (Cascade LDA) was used to alleviate the difficulties of high dimensionality, with Analysis Of Variance (ANOVA) as the feature importance criterion. Finally, a Support Vector Machine (SVM) was used as the classifier to identify cancerlectins. Results: The proposed method achieved an accuracy of 91.34%, a sensitivity of 89.89%, a specificity of 92.48% and a Matthews correlation coefficient of 0.8318, based on only 13 fusion features in jackknife cross-validation. This result is superior to other published methods in this domain. Conclusion: In this study, a new method based only on the primary structure of proteins is proposed, and experimental results show that it could be a promising tool to identify cancerlectins. An open-access web server is made available in this work to facilitate other related work.
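The four figures reported in this abstract (accuracy, sensitivity, specificity and Matthews correlation coefficient) all derive from a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary metrics from confusion-matrix counts.

    Assumes at least one positive and one negative sample, so that the
    sensitivity and specificity denominators are non-zero.
    """
    total = tp + tn + fp + fn
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        # By convention, MCC is set to 0 when its denominator vanishes.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }
```

Unlike accuracy, the Matthews correlation coefficient stays informative when the two classes differ in size, which is one reason it is commonly reported alongside sensitivity and specificity.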
-
-
-
Improving Multi-type Gram-negative Bacterial Secreted Protein Prediction via Protein Evolutionary Information and Feature Ranking
Authors: Liang Kong, Lichao Zhang and Shiqian He
Background: Gram-negative bacteria interact with their environment by secreting a wide range of particular substrates (such as proteins) across two lipid bilayers from the cytoplasm to the extracellular space. Determining the types of secreted proteins is beneficial for further research on secreted proteins and secretion systems. Objective: As an essential alternative to experimental methods, an accurate machine learning-based method for multi-type Gram-negative bacterial secreted protein prediction is proposed in this study. Methods: The main contribution is the combination of auto-cross-correlation analysis and feature ranking to build an effective support vector machine-based predictor of multi-type Gram-negative bacterial secreted proteins. The specifically designed auto-cross-correlation descriptor captures evolutionary correlation information between amino acid pairs along the protein sequence from position-specific scoring matrices. Feature ranking was used to analyze and select the most informative features for building the prediction model. Results: Several kinds of prediction accuracy obtained by independent dataset tests are reported on two benchmark datasets. Compared with state-of-the-art prediction methods, the proposed method improves overall accuracy by 2.91% and 2.25%, respectively. Conclusion: Our study provides an important guide to utilizing protein evolutionary information for further research on bacterial secreted proteins.
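As a rough illustration of how correlation features can be drawn from a position-specific scoring matrix, the sketch below computes a commonly used auto-cross-covariance transform: column scores are centered, then multiplied at positions separated by a fixed lag and averaged. This formulation is an assumption made for illustration; the abstract does not give the exact definition of the authors' descriptor, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def auto_cross_covariance(
    pssm: List[Sequence[float]], max_lag: int
) -> Dict[Tuple[int, int, int], float]:
    """Map (column_a, column_b, lag) to the lagged covariance of two columns.

    pssm has one row per sequence position and one column per amino acid.
    Entries with column_a == column_b are auto-covariances; the others are
    cross-covariances.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must be at least 1 and shorter than the sequence")

    # Center every column on its mean over all sequence positions.
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in pssm]

    features: Dict[Tuple[int, int, int], float] = {}
    for lag in range(1, max_lag + 1):
        span = length - lag  # number of position pairs separated by this lag
        for a in range(n_cols):
            for b in range(n_cols):
                total = sum(
                    centered[j][a] * centered[j + lag][b] for j in range(span)
                )
                features[(a, b, lag)] = total / span
    return features
```

For a real PSSM with 20 columns this yields 400 values per lag, which is why a feature ranking step such as the one described in the abstract is useful afterwards.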
-
-
-
An In Silico Immunogenicity Analysis for PbHRH: An Antiangiogenic Peptibody by Fusing HRH Peptide and Human IgG1 Fc Fragment
Authors: Lin Ning, Jiang Huang, Bifang He and Juanjuan Kang
Background: Peptibodies, hybrids of peptides and antibodies, represent a novel therapeutic strategy. Previously, we computationally designed an antiangiogenic peptibody, PbHRH, which fuses the angiogenesis-suppressing HRH peptide with the human IgG1 Fc fragment, using Romiplostim as the template. Molecular modeling and simulation results indicated that it would be a potential drug for the treatment of angiogenesis-related pathological disorders. However, its immunogenicity is not known. Methods: Several bioinformatics tools are used to predict potential epitopes in order to evaluate the immunogenicity of PbHRH, with Romiplostim as the control. The IEDB-recommended method is used for MHC-I and MHC-II binding prediction, and the IEDB web server (http://tools.iedb.org/immunogenicity/) is used to determine the MHC-I immunogenicity of each peptide. Results: In both PbHRH and Romiplostim, some peptides are predicted to be able to bind MHC-I and MHC-II molecules and are therefore potential epitopes. Most of these selected peptides are exactly the same. Allele frequency analysis shows a low population distribution. Combined with the MHC-I immunogenicity prediction, both HRH and PbHRH show low immunogenicity. Conclusions: Some potential epitopes that could bind to both MHC-I and MHC-II molecules are predicted using bioinformatics tools. The comparative analysis with Romiplostim and the results of MHC-I immunogenicity prediction indicate the low immunogenicity of both HRH and PbHRH. Thus, we establish a strategy for evaluating the immunogenicity of peptibodies to support their future improvement.
-
-
-
Predicting LncRNA Subcellular Localization Using Unbalanced Pseudo-k Nucleotide Compositions
Authors: Xiao-Fei Yang, Yuan-Ke Zhou, Lin Zhang, Yang Gao and Pu-Feng Du
Background: Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that function in the regulation of gene expression. Increasing evidence shows that the biological functions of lncRNAs are intimately related to their subcellular localizations. Therefore, it is very important to determine lncRNA subcellular localization. Methods: In this paper, we propose a novel method to predict the subcellular localization of lncRNAs. To utilize lncRNA sequence information more comprehensively, we exploit both k-mer nucleotide composition and sequence-order correlated factors to represent lncRNA sequences. A feature selection technique based on Analysis Of Variance (ANOVA) is applied to obtain the optimal feature subset. Finally, a support vector machine (SVM) is used to perform the prediction. Results: The AUC value of the proposed method reaches 0.9695, which indicates that the proposed predictor is an efficient and reliable tool for determining lncRNA subcellular localization. Furthermore, the predictor reaches a maximum overall accuracy of 90.37% in leave-one-out cross-validation, which clearly outperforms the existing state-of-the-art method. Conclusion: The proposed predictor is shown to be feasible and powerful for the prediction of lncRNA subcellular localization. To facilitate subsequent genetic sequence research, we share the source code at https://github.com/NicoleYXF/lncRNA.
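The k-mer nucleotide composition named in this abstract is the normalized frequency of every length-k substring of a sequence. The sketch below covers only that plain composition, not the sequence-order correlated factors or the unbalanced pseudo-k variant of the title. The function name is hypothetical, and mapping U to T and skipping windows with ambiguous bases are choices made here, not taken from the paper.

```python
from itertools import product
from typing import Dict


def kmer_composition(sequence: str, k: int) -> Dict[str, float]:
    """Return the frequency of each of the 4**k possible k-mers in a sequence."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    seq = sequence.upper().replace("U", "T")  # treat RNA and DNA alike

    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[kmer] += 1
            total += 1

    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: count / total for kmer, count in counts.items()}
```

Because the vector always has 4**k entries regardless of sequence length, transcripts of different lengths map to feature vectors of the same size, which is what a classifier such as an SVM requires.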
-
-
-
Using the Chou’s Pseudo Component to Predict the ncRNA Locations Based on the Improved K-Nearest Neighbor (iKNN) Classifier
Authors: Chengyan Wu, Qianzhong Li, Ru Xing and Guo-Liang Fan
Background: Identifying non-coding RNA at the organelle genome level is a challenging task. In our previous work, an ncRNA dataset with less than 80% sequence identity was built, and a method combining the increment of diversity with a support vector machine was proposed. Objective: Based on the ncRNA_361 dataset, a novel decision-making method, an improved KNN (iKNN) classifier, is proposed. Methods: In this paper, based on the iKNN algorithm, the physicochemical features of nucleotides, the degeneracy of genetic codons, and the topological secondary structure were selected to represent the effective ncRNA characteristics. Then, the incremental feature selection method was used to optimize the feature set. Results: The results of iKNN indicated that the decision-making method of mean value is distinctly superior to the traditional decision-making method of majority vote and to the Increment of Diversity Combining Support Vector Machine (ID-SVM). The iKNN algorithm achieved an overall accuracy of 97.368% in the jackknife test when k=3. Conclusion: It should be noted that the triplets of the structure-sequence mode under reading frames not only contain the entire sequence information but also reflect whether each base is paired, and the secondary structural topological parameters further describe the ncRNA secondary structure at the spatial level. The ncRNA dataset and the iKNN classifier are freely available at http://202.207.14.87:8032/fuwu/iKNN/index.asp.
-
-
-
An Information Gain-based Method for Evaluating the Classification Power of Features Towards Identifying Enhancers
Authors: Tianjiao Zhang, Rongjie Wang, Qinghua Jiang and Yadong Wang
Background: Enhancers are cis-regulatory elements on DNA sequences that enhance gene expression. Since most enhancers are located far from transcription start sites, they are difficult to identify. As with other regulatory elements, the regions around enhancers contain a variety of features that can help in enhancer recognition. Objective: Because the classification power of features differs significantly, the performance of existing methods that use one or a few features to identify enhancers varies greatly. Therefore, evaluating the classification power of each feature can improve the predictive performance of enhancer identification. Methods: We present an evaluation method based on Information Gain (IG) that captures the entropy change of enhancer recognition according to features. To validate the performance of our method, experiments using the Single Feature Prediction Accuracy (SFPA) were conducted on each feature. Results: The average IG values of the sequence, transcriptional and epigenetic features are 0.068, 0.213, and 0.299, respectively. Through SFPA, the average AUC values of the sequence, transcriptional and epigenetic features are 0.534, 0.605, and 0.647, respectively. The verification results are consistent with our evaluation results. Conclusion: This IG-based method can effectively evaluate the classification power of features for identifying enhancers. Compared with sequence features, epigenetic features are more effective for recognizing enhancers.
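Information gain measures how much knowing a feature reduces the entropy of the class label: IG = H(label) - H(label | feature). The sketch below implements the textbook definition for a discrete feature. The function names are hypothetical, and a continuous feature would first need to be binned, a step the abstract does not describe and which is therefore not shown.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n) for count in Counter(labels).values()
    )


def information_gain(
    feature_values: Sequence[Hashable], labels: Sequence[Hashable]
) -> float:
    """Entropy of the labels minus their entropy conditioned on the feature."""
    n = len(labels)

    # Group the labels by the feature value they co-occur with.
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)

    conditional = sum(len(group) / n * entropy(group) for group in groups.values())
    return entropy(labels) - conditional
```

A feature that splits the two classes perfectly yields a gain equal to the label entropy, while a feature unrelated to the label yields a gain of zero, so the values can be used to rank features by classification power.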
-
-
-
The Regulation of Target Genes by Co-occupancy of Transcription Factors, c-Myc and Mxi1 with Max in the Mouse Cell Line
Authors: Hui Wang, Yuan Liu, Hua Guan and Guo-Liang Fan
Background: The regulatory function of transcription factors on genes is related not only to the location of the bound genes and their related functions, but also to the methods of binding. Objective: It is necessary to study the regulatory effects of different binding methods on target genes. Methods: In this study, we provided a reliable theoretical basis for studying gene expression regulation by co-binding transcription factors and further revealed the specific regulation of transcription factor co-binding in cancer cells. Results: Transcription factors tend to combine with other transcription factors in the regulatory region to form a competitive or synergistic relationship to regulate target genes accurately. Conclusion: We found that up-regulated genes in cancer cells were involved in the regulation of their own immune system related to the normal cells.
-
-
-
Analysis of the Epigenetic Signature of Cell Reprogramming by Computational DNA Methylation Profiles
Authors: Yongchun Zuo, Mingmin Song, Hanshuang Li, Xing Chen, Pengbo Cao, Lei Zheng and Guifang Cao
Background: DNA methylation plays an important role in the reprogramming process. Understanding the underlying molecular mechanism of reprogramming is crucial for answering fundamental questions regarding the transition of cell identity. Methods: In this study, based on genome-wide DNA methylation data from different cell lines, comparative methylation profiles were proposed to identify the epigenetic signature of cell reprogramming. Results: The density profile of CpG methylation showed that pluripotent cells are more polarized than human dermal fibroblast (HDF) cells. The heterogeneity of iPS cells shows a greater deviation in the DNA hypermethylation pattern. The regional distribution showed that the differential CpG sites between pluripotent cells and HDFs tend to accumulate in the gene body and CpG shelf regions, whereas the internal differentially methylated CpG sites (DMCs) of the three types of pluripotent cells tend to accumulate in the TSS1500 region. Furthermore, a series of endogenous markers of cell reprogramming were identified based on the integrative analysis, including focal adhesion, pluripotency maintenance and transcription regulation. The calcium signaling pathway was detected as one of the signatures distinguishing NT cells from iPS cells. Finally, the regional bias of DNA methylation for key pluripotency factors was discussed. Our studies provide new insight into the identification of barriers to cell reprogramming. Conclusion: Our studies analyzed some epigenetic markers and barriers of nuclear reprogramming, hoping to provide new insight into the underlying molecular mechanism of reprogramming.
-
-
-
Docking Techniques in Toxicology: An Overview
Authors: Meenakshi Gupta, Ruchika Sharma and Anoop Kumar
A variety of environmental toxicants, such as heavy metals, pesticides and organic chemicals, produce harmful effects in living systems. Various reports in the literature have indicated the detrimental effects of toxicants, such as immunotoxicity, cardiotoxicity and nephrotoxicity. Experimental animals are generally used to investigate the safety profile of environmental chemicals, but research on animals has some limitations. Thus, there is a need for alternative approaches. Docking is one such alternative technique: it predicts the binding affinity of molecules in the active site of a particular receptor without using animals. These techniques can also be used to check the interactions of environmental toxicants with biological targets. A variety of user-friendly software is available on the market for molecular docking, but very few toxicologists use these techniques. To increase their use in toxicology, toxicological scientists need an understanding of the basic concepts behind them. This article summarizes the fundamental concepts of docking in the context of its role in toxicology. Furthermore, these promising techniques are also discussed in this study.
-
-
-
Rosetta and the Journey to Predict Proteins’ Structures, 20 Years on
Authors: Jad Abbass and Jean-Christophe Nebel
For two decades, Rosetta has consistently been at the forefront of protein structure prediction. While it has become a very large package comprising programs, scripts, and tools for different types of macromolecular modelling, such as ligand docking, protein-protein docking, protein design, and loop modelling, it started as the implementation of an algorithm for ab initio protein structure prediction. The term ‘Rosetta’ appeared for the first time in the literature twenty years ago to describe that algorithm and its contribution to the third edition of the community-wide Critical Assessment of techniques for protein Structure Prediction (CASP3). Similar to the Rosetta stone that allowed deciphering the ancient Egyptian civilisation, David Baker and his co-workers have been contributing to deciphering ‘the second half of the genetic code’. Although the focus of Baker’s team has expanded to de novo protein design in the past few years, Rosetta’s ‘fame’ is associated with its fragment-assembly approach to protein structure prediction. Following a presentation of the main concepts underpinning its foundation, especially sequence-structure correlation and the usage of fragments, we review the main stages of its development and highlight the milestones it has achieved in protein structure prediction, particularly in CASP.
-
-
-
Complex Networks, Gene Expression and Cancer Complexity: A Brief Review of Methodology and Applications
Authors: A.C. Iliopoulos, G. Beis, P. Apostolou and I. Papasotiriou
This brief survey describes various aspects of cancer complexity and how this complexity can be confronted using modern complex network theory and gene expression datasets. In particular, the causes and basic features of cancer complexity, as well as the challenges it poses, are underlined, while the importance of gene expression data in cancer research and in the reverse engineering of gene co-expression networks is highlighted. In addition, an introduction to the corresponding theoretical and mathematical framework of graph theory and complex networks is provided. The basics of network reconstruction are described, along with the limitations of gene network inference, enrichment and survival analysis, and evolution, robustness-resilience and cascades in complex networks. Finally, an indicative example of the inference and analysis of a cancer gene co-expression network is given.
