Current Bioinformatics - Volume 11, Issue 1, 2016
-
-
An Empirical Study of Features Fusion Techniques for Protein-Protein Interaction Prediction
Authors: Jiancang Zeng, Dapeng Li, Yunfeng Wu, Quan Zou and Xiangrong Liu
With recent developments in bioinformatics, the importance of understanding protein function has been widely acknowledged. Most proteins perform their functions by interacting with other proteins, so exploring protein-protein interactions (PPIs) is an urgent task. At present, the prediction of PPIs is still a hard problem: although a variety of computational methods have been proposed to identify PPIs, most of them are complex and have low accuracy. Traditional methods extract features in two steps: first, they extract features from each of the two proteins of a PPI; second, they treat the two feature sets as strings and concatenate them. Concatenation is the result of an addition operation on strings. The concatenation operator adds redundant features, and its result depends on the order of concatenation. On this basis, we study feature fusion and feature selection. The presented framework consists of three stages. In the first stage, we obtain the negative data set from an off-the-shelf database. The reliability of the negative data sets of previous studies has not been of concern to us. In the second stage, the n-gram frequency method is used to preprocess the PPI sequences. In the third stage, the final feature is spliced together, and the features are then selected to find the optimal feature set. Finally, an effective parameter for the Random Forest classifier is selected. Experiments carried out on a real data set showed that our feature fusion method outperformed traditional methods in protein-protein interaction prediction. The encouraging results can be helpful for future research on protein function. The web server for protein-protein interaction prediction is accessible at http://datamining.xmu.edu.cn/~zjcdm/Home.html.
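The n-gram frequency step and the order dependence of concatenation can be illustrated with a minimal Python sketch. The abstract does not specify the authors' fusion operator, so `fuse_symmetric` below is a hypothetical order-independent alternative chosen purely for illustration; the function names, the 20-letter alphabet, and the example sequences are likewise assumptions.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def ngram_frequencies(sequence, n=2):
    """Relative frequency of every possible n-gram, reported in a fixed order.

    N-grams containing non-standard letters are counted in the total but have
    no slot in the output vector.
    """
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero on short input
    return [counts["".join(gram)] / total for gram in product(AMINO_ACIDS, repeat=n)]


def concatenate(a, b):
    """Traditional pair encoding: the result depends on which protein comes first."""
    return a + b


def fuse_symmetric(a, b):
    """Hypothetical order-independent fusion (not the authors' method):
    element-wise sum followed by element-wise absolute difference."""
    return [x + y for x, y in zip(a, b)] + [abs(x - y) for x, y in zip(a, b)]


# Two made-up sequences, used only to show the order effect.
p = ngram_frequencies("MKVLA")
q = ngram_frequencies("GGSAC")
print(concatenate(p, q) == concatenate(q, p))        # order matters
print(fuse_symmetric(p, q) == fuse_symmetric(q, p))  # order does not matter
```

With concatenation, the pairs (A, B) and (B, A) yield different vectors for the same interaction, which is the order dependence the abstract describes.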
-
-
-
A Novel Boolean Network for Analyzing the p53 Gene Regulatory Network
Authors: Qinbin He and Zengrong Liu
Boolean networks are a powerful tool for the study of gene regulatory networks, and the dynamics of a Boolean network are mainly determined by its attractors. In this study, a new approach to constructing Boolean networks is proposed based on biochemical reaction differential equations. We investigate gene regulatory networks by comparing experimental results from the relevant literature with the attractors obtained from the Boolean network. The proposed Boolean regulatory network is simple and robust to alteration of the network parameters. The model is applied to the p53 gene regulatory network by analyzing the interplay of the key components, including p53, Mdm2 (murine double minute 2) and the external signal of DNA damage. The attractors obtained from the p53 Boolean network are consistent with the experimental results in the relevant literature. Furthermore, we speculate that there is an unknown protein XXXp in the p53 gene regulatory network, where p53p promotes the DNA of XXXp to form the mRNA mXXXp. mXXXp generates protein XXXp, which promotes p53 phosphorylation to form p53p, creating a positive feedback loop of XXXp-p53p.
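How attractors are read off a Boolean network can be sketched in a few lines of Python. The update rules below are a toy three-node example (DNA damage, p53, Mdm2) written for illustration only; they are an assumption and are not the network constructed by the authors. The sketch uses synchronous updates and enumerates every initial state.

```python
from itertools import product


def step(state):
    """One synchronous update of a toy network (hypothetical rules).

    damage is an external input and keeps its value; p53 is active when there
    is damage and Mdm2 is absent; Mdm2 follows p53.
    """
    damage, p53, mdm2 = state
    return (damage, damage and not mdm2, p53)


def find_attractors(step_fn, n_nodes):
    """Follow every initial state until it revisits a state, then keep the cycle."""
    attractors = set()
    for start in product((False, True), repeat=n_nodes):
        seen = {}
        trajectory = []
        state = start
        while state not in seen:
            seen[state] = len(trajectory)
            trajectory.append(state)
            state = step_fn(state)
        cycle = trajectory[seen[state]:]
        # Rotate the cycle so the smallest state comes first; this makes the
        # same attractor compare equal regardless of where it was entered.
        pivot = cycle.index(min(cycle))
        attractors.add(tuple(cycle[pivot:] + cycle[:pivot]))
    return attractors


for attractor in sorted(find_attractors(step, 3), key=len):
    print(len(attractor), attractor)
```

In this toy model the undamaged input settles into a single fixed point, while the damaged input produces a four-state cycle in which p53 and Mdm2 alternate.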
-
-
-
Prediction of Linear B-Cell Epitopes with mRMR Feature Selection and Analysis
Authors: Bi-Qing Li, Lu-Lu Zheng, Kai-Yan Feng, Le-Le Hu, Guo-Hua Huang and Lei Chen
A B-cell epitope, also known as an antigenic determinant, is the part of an antigen recognized by B-cells. The capability of an antibody to recognize epitopes is widely utilized in numerous biomedical applications, including immunodetection and immunotherapeutics. Identification of immunogenic regions helps to understand the mechanisms of the immune system and to guide the related applications. In contrast with laborious and time-consuming experimental approaches, predicting B-cell epitopes by computational methods is more convenient and efficient. In this study, a novel predictor with feature selection was developed by combining the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). The predictor was then trained and tested on three B-cell epitope datasets. Eight types of features were used to code the peptides: physicochemical and biochemical properties, residual disorder, sequence conservation, solvent accessibility, secondary structures, propensity of amino acids to be conserved at the protein-protein interface and protein surface, deviation of side chain carbon atom number, and gain/loss of amino acids during evolution. It was shown that sequence conservation, physicochemical and biochemical properties of amino acids, solvent accessibility and secondary structure contributed most to the identification of epitope sites, and that the features from the sites surrounding the central residue are critical for the prediction. The findings of this study may shed light on the prediction of epitopes and the mechanisms of antigen-antibody interactions.
-
-
-
RNA Sequencing and Transcriptome Analyses for Cercis gigantea
Authors: Liucun Zhu, Min Jiang, Ying Zhang, Qi Yu, Fanyuan Zhu and Qiang Wang
Cercis gigantea is one of the most beautiful garden trees. It belongs to the genus Cercis in the subfamily Caesalpinioideae of the Leguminosae. However, little genetic information on C. gigantea is available. In the present study, the C. gigantea transcriptome was subjected to RNA sequencing, which generated large expression datasets suitable for functional genomic analysis. Some 55.5 million high-quality clean reads were collected. These reads were then assembled into 44,660 unigenes and 77,024 unique transcripts. The unigenes, with an average length of 998 bp, were annotated by comparison with all known proteins in four public databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG), the National Center for Biotechnology Information (NCBI) non-redundant protein database (NR), the Cluster of Orthologous Groups (COG), and Swiss-Prot, using the NCBI BLAST procedure. Out of the 44,660 unigenes, 28,884 (64.7%) were annotated. In addition, an interaction network of unigenes in C. gigantea was constructed. The current study provides the first screen of a transcriptome not only for C. gigantea but for any Caesalpinioideae plant, and serves as an important platform for research on functional genomics, gene expression, and protein-protein interaction.
-
-
-
Feature Classification and Analysis of Lung Cancer Related Genes through Gene Ontology and KEGG Pathways
Authors: You Zhou, Biqing Li, Yuchao Zhang, Lei Chen and Xiangyin Kong
Characterization of cancer-related genes is important and challenging in both biomedicine and computational biology. As one of the leading causes of cancer mortality worldwide, lung cancer accounts for over one million deaths each year. Generally, lung cancer can be divided into small-cell lung cancer (SCLC) and non-small-cell lung cancer (NSCLC). Although great advances have been made in lung cancer detection and treatment, the 5-year survival rate of patients is still less than 15%. Hence, it is very important to identify all the potential lung cancer-related genes as well as their interaction networks. In this research, we present a novel computational framework based on the support vector machine (SVM) to predict lung cancer-related genes. A total of 59 NSCLC-related genes and 89 SCLC-related genes were retrieved from KEGG pathways, while 2950 non-NSCLC and 4450 non-SCLC genes were randomly selected from the Ensembl database. Ten datasets were constructed by dividing the genes into 10 groups. Each gene was encoded by a 13,126-dimensional vector comprising 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. A feature extraction strategy was applied to obtain an optimal feature set of 400 GO terms and 47 KEGG pathways for NSCLC, and of 458 GO terms and 27 KEGG pathways for SCLC. Further feature analysis showed that these optimal features were actively involved in lung tumorigenesis. It also confirms that our method is an effective tool for predicting cancer-related genes and has the potential to be applied extensively to the prediction of genes for other types of cancer.
-
-
-
Application of the Shortest Path Algorithm for the Discovery of Breast Cancer-Related Genes
Authors: Lei Chen, Zhi Hao Xing, Tao Huang, Yang Shu, GuoHua Huang and Hai-Peng Li
Breast cancer, the most prevalent cancer in women, develops from breast tissue. Its incidence has increased in recent years due to environmental risk factors. Thus, it is urgent to uncover the mechanism underlying breast cancer in order to design effective treatments. Identification of all breast cancer-related genes is one way to help elucidate this mechanism. In this study, a computational method was built and applied to discover new candidate breast cancer-related genes. Starting from the known breast cancer-related genes retrieved from public databases, the shortest path algorithm was applied to discover new candidate genes in the protein-protein interaction network. Analysis of the selected genes suggests that some of them are deemed breast cancer-related genes according to the most recently published literature, while others have direct or indirect associations with the initiation and development of breast cancer.
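One common way to use shortest paths for candidate discovery is to collect the interior nodes on shortest paths between pairs of known genes. The Python sketch below follows that reading under stated assumptions: the network is unweighted (so breadth-first search suffices), only one shortest path per pair is kept, and the node names are placeholders rather than real genes. The abstract does not give these details, so this should not be taken as the authors' exact procedure.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) interaction pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def candidate_genes(graph, known):
    """Interior nodes of shortest paths between every pair of known genes."""
    candidates = set()
    for a, b in combinations(sorted(known), 2):
        path = shortest_path(graph, a, b)
        if path:
            candidates.update(set(path[1:-1]) - set(known))
    return candidates


# Placeholder network: K* are "known" genes, the rest are unlabeled proteins.
edges = [("K1", "X"), ("X", "K2"), ("K2", "Y"), ("Y", "Z"), ("Z", "K3"), ("K1", "W")]
print(sorted(candidate_genes(build_graph(edges), {"K1", "K2", "K3"})))
```

A weighted interaction network would call for Dijkstra's algorithm in place of breadth-first search.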
-
-
-
The Integrative Network of Gene Expression, MicroRNA, Methylation and Copy Number Variation in Colon and Rectal Cancer
Authors: Tao Huang, Bi-Qing Li and Yu-Dong Cai
Changes in gene expression level in cancer patients have been studied for a long time and have proven useful in classifying patients, predicting drug response, and so on. However, the factors that control gene expression in pathological conditions are still unclear. Identifying the putative causal factors could greatly help in understanding the mechanisms of cancer development and progression. MicroRNA, methylation and copy number variation (CNV) are believed to be possible regulators of gene expression. We analyzed the profiles of gene expression, microRNA, methylation and CNV in colon and rectal cancer patients. A multiple regression based method was developed to construct an integrative network including different levels of components. The regulatory effects of microRNA, methylation and CNV on gene expression were evaluated and compared. The cis-regulation effects of methylation on gene expression are very strong: of the cis methylation-expression relationships, 43.2% appear as the most important methylation-expression regulator and 80.0% are within the top five most important methylation-expression regulators. The functions of microRNA-dominated genes, methylation-dominated genes and CNV-dominated genes were analyzed. The CNV-dominated genes were involved in gene expression, protein modification and protein kinase activity. The methylation-dominated genes were notable for immune response and calcium ion binding. The microRNA-dominated genes were notable for biological regulation and signal transduction. By decomposing the networks of gene, microRNA, methylation and CNV, network modules were identified. Some modules provided useful clues about the mechanisms of gene expression regulation. Our methods provide a general framework for studying the integrative network derived from multiple large-scale biological data sets. The R script for our integrative network construction method is available in the supporting information.
-
-
-
Structure Based Virtual Screening for the Identification of Potential Inhibitors for Penicillin Binding Protein 2B of the Resistant 5204 Strain of Streptococcus pneumoniae
Authors: S. Suvaithenamudhan and S. Parthasarathy
In this paper, we performed virtual screening of compounds to identify potential inhibitors of the Penicillin Binding Protein 2B (PBP2B) of the resistant 5204 strain of Streptococcus pneumoniae. We considered 1,677,620 compounds from the ZINC database in the virtual screening workflow of the Schrödinger software suite to identify potential inhibitors capable of binding to the mutated resistant 5204-PBP2B. Initially, we obtained 1,247 hits, which were prioritized based on protein-ligand contacts, resulting in 99 compounds. These 99 compounds were further clustered to obtain 25 structurally diverse compounds, of which the top scoring compound, 5-[(6-hydroxy-1,2,3,4-tetrahydroisoquinolin-1-yl)methyl]benzene-1,2,3-triol (ID: ZINC59376795), may be identified as the potential inhibitor. Molecular dynamics simulations were performed for the wild sensitive R6-PBP2B and mutated resistant 5204-PBP2B complexes with this top scoring compound, ZINC59376795. The binding patterns, RMSD calculations and protein-ligand contact analysis provide deeper insights into the interaction patterns of this novel inhibitor with the sensitive R6-PBP2B and resistant 5204-PBP2B of S. pneumoniae.
-
-
-
Prediction of an Interaction between Bakuchiol and Acetylcholinesterase using Adaboost
Authors: Can Zhang, Xueyuan Wang, Lei Gu, Li Jiang, Qiang Su, Manman Zhao, Linfeng Zheng, Ling Tang, Fuxue Chen and Bing Niu
A structure-activity relationship (SAR) dataset was generated for a set of acetylcholinesterase inhibitors using Adaboost and physicochemical parameters. The accuracy (ACC) of the SAR model was found to be 99.51% in a 10-fold cross-validation test and 99.35% on an independent test set. Based on the SAR prediction model, bakuchiol is predicted to be an acetylcholinesterase inhibitor. Fluorescence was used to investigate the binding between bakuchiol and acetylcholinesterase, which can provide valuable qualitative and quantitative information about the interaction between acetylcholinesterase and bakuchiol.
-
-
-
Predicating Candidate Cancer-Associated Genes in the Human Signaling Network Using Centrality
Authors: Xueming Liu and Linqiang Pan
According to the somatic mutation theory, the development of cancer involves gene mutations. The identification and prediction of cancer-associated genes is one of the most important aims in cancer research. We apply four centrality metrics (degree, betweenness, closeness and PageRank) to prioritize and predict candidate cancer-associated genes in the human signaling network. We find that genes with higher centrality scores are more likely to be cancer-associated. Taking the top 47 genes for each centrality measure, we obtain 89 central genes. Among these 89 central genes, 58 are known to be cancer-associated, 4 encode non-protein products and 27 are inferred genes. For the 27 inferred genes, literature mining shows that 21 have been confirmed to be cancer-associated and the other 6 (CAMP, GSK3A, MTG1, GNGT1, ISGF3G and DYT10) are strong candidates for cancer research. These results show that the four centrality metrics are effective in predicting candidate cancer-associated genes for further experimental analysis.
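The "top k per measure, then take the union" step can be sketched as follows. To keep the example short, only degree and closeness are implemented (betweenness and PageRank are omitted), the graph is treated as undirected, and the node names are placeholders; all of these are simplifying assumptions rather than details taken from the paper.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_centrality(graph):
    """Number of neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """(reachable nodes) / (sum of shortest-path distances), via BFS per node."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def top_k(scores, k):
    """The k highest-scoring nodes; ties are broken alphabetically."""
    return set(sorted(scores, key=lambda n: (-scores[n], n))[:k])


def central_genes(graph, k):
    """Union of the top-k sets of each implemented centrality measure."""
    return top_k(degree_centrality(graph), k) | top_k(closeness_centrality(graph), k)


# Placeholder network: a hub (H) at one end of a chain.
edges = [("H", "L1"), ("H", "L2"), ("H", "L3"), ("H", "P"),
         ("P", "M"), ("M", "Q"), ("Q", "R"), ("R", "S")]
print(sorted(central_genes(build_graph(edges), k=1)))
```

Because different measures favor different nodes, the union is usually larger than k, which is consistent with the abstract's 89 central genes obtained from four top-47 lists.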
-
-
-
Iterative Multi Level Calibration of Metabolic Networks
Authors: Max Conway, Claudio Angione and Pietro Liò
Frameworks for metabolic engineering have been successfully applied in combination with pre- and post-processing algorithms on genome-wide metabolic models. However, genetic engineering methods with a particular focus on understanding results from multiple perspectives and combining automated and human design are still lacking. To this end, we adopt a multi-objective genetic design technique to find the optimal gene expression levels in genome-scale metabolic reconstructions. Then, we analyse the optimized network by introducing a new multi-omic, multi-level post-processing and visualization procedure, Metabex, which uses Cytoscape for network visualization. These two components are connected together to form a feedback loop that establishes a continual process of machine optimization and human analysis and guidance. To benchmark our framework, we optimize two species of Geobacter for electricity production and biomass synthesis; we achieve increases in electricity production for only a slight decrease in biomass. Many regulatory strategies contributed to this value, locally and globally. For instance, a direct, local strategy was a down-regulation of Cytochrome C Oxidase, while there was simultaneously a global reduction in cofactor and prosthetic group biosynthesis. Finally, we discuss multiple applications of our tool, including model exploration, model engineering, comparative modelling, meta-analysis and model refinement.
-
-
-
Characterization of Graphs for Protein Structure Modeling and Recognition of Solubility
Authors: Lorenzo Livi, Alessandro Giuliani and Alireza Sadeghian
This paper deals with the relations among structural, topological, and chemical properties of the E. coli proteome from the vantage point of the solubility/aggregation propensity of proteins. Each E. coli protein is initially represented according to its known folded 3D shape. This step involves representing the available E. coli proteins as graphs. We first analyze those graphs by considering purely topological characterizations, i.e., by analyzing the mass fractal dimension and the distributions underlying both shortest paths and vertex degrees. The results confirm the general architectural principles of proteins. Next, we focus on the statistical properties of a representation of such graphs as vectors composed of several numerical features, which we extracted from their structural representation. We found that protein size is the main discriminator for solubility, although other factors help explain the degree of solubility. Finally, we analyze these data with a novel one-class classifier, with the aim of discriminating between highly and poorly soluble proteins. The results are encouraging and consolidate the potential of pattern recognition techniques when employed to describe complex biological systems.
-
-
-
VFP: A Visual Algorithm for Predicting Gene Fusion in RNA-Seq Data
Gene fusion is a key factor in sarcomas, lymphomas, leukemias and other cancers. To help biologists discover treatment targets, we developed VFP to predict gene fusions from single-end RNA-sequencing reads. VFP employs a seed index strategy and octal encoding operations for sequence alignment. By using several rules to score and filter the potential fusion genes, VFP could detect known and novel fusions in a series of tests on lymphoma and melanoma RNA-sequencing datasets.
-
-
-
In Sight to the Identification and Analysis of Simple Sequence Repeats (SSRs) in Monoterpene Biosynthesizing Plant Species
Authors: Anand Mishra, Sanchita, Sunita Singh Dhawan and Ashok Sharma
Terpenes are the major component of the essential oils biosynthesized in various plant species. Essential oils are widely used as natural flavouring agents and food additives, and in perfumery and pharmaceuticals. Terpenes are classified by the number of isoprene units they contain, i.e. mono (C10), sesqui (C15), di (C20), tri (C30) and tetra (C40). Monoterpenes are diverse in nature and useful to plants and humans. As natural products, they constitute the major portion of the essential oils in the flowers and leaves of plants. Plants use these compounds for defence and messaging. A total of 665 sequences related to monoterpene biosynthesis were retrieved from the public domain. The sequences were assembled into 159 contigs and singletons, which were used for in silico identification of microsatellites. The SSRs were identified as short repeats of di-, tri-, tetra-, penta- and hexanucleotides, with hexanucleotide repeats making up the largest percentage. The analysis revealed a total of 215 SSRs; their frequency of distribution among monoterpene-related sequences and functional domain markers (FDM) were also analysed. Homologous sequences lacking any terpene synthase group were retrieved as an outgroup to validate the study.
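In silico microsatellite identification of the kind described can be sketched with a regular expression that looks for a short motif repeated in tandem. The minimal Python example below assumes perfect repeats only, a single minimum repeat count for all motif lengths, and a made-up test sequence; the abstract does not state which tool or thresholds the authors used.

```python
import re


def is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif (e.g. ATAT, AA)."""
    return unit not in (unit + unit)[1:-1]


def find_ssrs(sequence, min_repeats=3, unit_lengths=range(2, 7)):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    unit_lengths 2..6 covers di- to hexanucleotide motifs. Matches whose motif
    is a repeat of a shorter motif are dropped, so AT x 6 is reported once as a
    dinucleotide repeat and not again as ATAT x 3.
    """
    sequence = sequence.upper()
    hits = []
    for size in unit_lengths:
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_repeats - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            if is_primitive(unit):
                hits.append((match.start(), unit, len(match.group(0)) // size))
    return sorted(hits)


# Made-up sequence containing an (AT)6 and an (AGC)4 repeat.
example = "GGC" + "AT" * 6 + "CCGT" + "AGC" * 4 + "TT"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

Dedicated SSR tools typically apply a different minimum repeat count to each motif length; the single `min_repeats` parameter here is a simplification.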
-
-
-
Prediction and Analysis of the Protein-Protein Interaction Networks for Chickens, Cattle, Dogs, Horses and Rabbits
Authors: Fen Wang, Baoxing Song, Xing Zhao, Yaotian Miao, Dengyun Li, Na Zhou, Pengfei Jiang, Qing Sang, Jingfei Huang and Deli Zhang
High-throughput screening for protein-protein interactions (PPIs) is currently utilized for detailed experimentation and the formulation of biological hypotheses. Comprehensive and concrete PPI networks of domestic animals are urgently needed because of their considerable economic value. We constructed the PPI networks of chickens, cattle, dogs, horses, and rabbits using two methods: the InParanoid method, an interolog method that depends on protein ortholog algorithms, and the domain-motif interactions from structural topology (D-MIST) method, which states that if a motif in protein A matches a domain in protein B in the position-specific scoring matrices, protein A likely interacts with protein B. Up to 328,590 PPIs were found in chickens, 447,014 in cattle, 129,386 in dogs, 93,414 in horses and 115,296 in rabbits. Furthermore, a large number of novel PPIs with diverse biological roles were discovered. Gene Ontology (GO) annotation and subcellular localization were applied to verify the results. In GO annotation, 30.28% to 50.08% of the PPIs predicted by InParanoid and 35.32% to 39.97% of those predicted by D-MIST shared GO terms. In subcellular localization, 55.04% to 59.97% of the PPIs predicted by InParanoid and 35.43% to 40.26% of those predicted by D-MIST had the same subcellular localization. Comparison with randomized networks showed that the predicted networks were considerably reliable. This work, to some extent, establishes the biological significance of these PPI networks and provides clues to further decipher the molecular mechanism of circadian rhythmicity. The PPI networks of the five species are freely available at http://dppin.songbx.me.
-
-
-
Comparison of Kernel and Decision Tree-based Algorithms for the Prediction of microRNAs Associated with Cancer
Authors: Ram Kothandan and Sumit Biswas
The discovery of microRNAs (miRs) in the 1990s spawned a genre of research that has thrown light on the involvement of these small non-coding RNAs in several developmental pathways and diseases, one of which is cancer. While algorithms that predict the binding of miRNAs to their targets are abundant, the same is not true for the association of miRNAs with targets that can be implicated in cancer. Machine learning approaches, which have been implemented in target prediction, need to be extended with proper feature selection to reach an acceptable level of accuracy in predicting associations of miRNAs with cancer. In this study we present a comparison of three learning algorithms, namely the kernel-based Support Vector Machine (SVM) and the decision tree-based Random Forest (RF) and C4.5, for predicting miRNAs associated with cancer. Sixty informative features, based on sequence, the thermodynamics of miRNA-mRNA binding and their hybridization, were extracted from a dataset of experimentally validated miRNAs. Initially, features were ranked by F-score, and a two-stage Recursive Feature Elimination (RFE) process was employed to select the optimal subset of features for each classifier. Class imbalance in the training set was overcome by employing a cost-sensitive approach. The performance of each learning algorithm was evaluated in terms of precision, recall, F-measure and AUC. Subsequently, the learning algorithm with the better performance would be used to construct a two-step binary classifier, namely miRSEQ and miRINT, which identifies whether a miRNA is associated with the cancer pathway. Our comparative analysis showed that the decision tree-based RF model performed well in terms of precision and AUC for miRSEQ, but only moderately for miRINT.
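Ranking features by F-score can be sketched as below. The abstract does not define the score, so the sketch assumes the commonly used two-class form: the squared deviations of each class mean from the overall mean, divided by the sum of the within-class sample variances. The function names and the toy data are illustrative assumptions.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of one feature (assumed definition, see lead-in).

    pos and neg hold the feature's values in the positive and negative class;
    each class needs at least two values for the sample variance.
    """
    overall = mean(list(pos) + list(neg))
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(pos_rows, neg_rows):
    """Feature indices ordered from most to least discriminative."""
    n_features = len(pos_rows[0])
    scores = [
        f_score([row[j] for row in pos_rows], [row[j] for row in neg_rows])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: -scores[j])


# Toy data: feature 0 separates the classes, feature 1 does not.
pos_rows = [(1, 5), (2, 4), (3, 6)]
neg_rows = [(4, 5), (5, 6), (6, 4)]
print(rank_features(pos_rows, neg_rows))
```

A ranking like this is typically only a first filter; the abstract describes following it with recursive feature elimination for each classifier.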
