Volume 11, Issue 1

Current Bioinformatics - Volume 11, Issue 1, 2016

- Meet Our Regional Editor:
  
  By Youlian Pan
  
  https://doi.org/10.2174/157489361101160308182919

- Editorial (Thematic Issue: Application of Novel Computational Methods in Molecular Biology, Biomedicine and Biopharmacy)
  
  By Yudong Cai
  
  https://doi.org/10.2174/157489361101160308183223

- An Empirical Study of Features Fusion Techniques for Protein-Protein Interaction Prediction
  
  Authors: Jiancang Zeng, Dapeng Li, Yunfeng Wu, Quan Zou and Xiangrong Liu
  
  https://doi.org/10.2174/1574893611666151119221435
  
  With the recent development of bioinformatics, the importance of understanding protein function has been widely acknowledged. Most proteins perform their functions by interacting with other proteins, so exploring protein-protein interactions (PPIs) is an urgent task. At present, predicting PPIs remains a hard problem. Although a variety of computational methods have been proposed to identify PPIs, most of them are complex and have low accuracy. Traditional methods extract features in two steps: first, they extract features from the two proteins of a PPI; second, they treat the two features as strings and apply the concatenation operator. Concatenation is the outcome of an addition operation on strings. The concatenation operator introduces redundant features, and its result depends on the order of concatenation. Motivated by this, we study feature fusion and feature selection in this paper. The presented framework consists of three stages. In the first stage, we obtained the negative data set from an off-the-shelf database. The reliability of the negative data sets of previous studies has not been of concern to us. In the second stage, the n-gram frequency method was used to preprocess the PPI sequences. In the third stage, the final features were spliced together and then selected to find the optimal feature set. Finally, an effective parameter for the Random Forest classifier was selected. Experiments on a real data set showed that our feature fusion method outperformed traditional methods in protein-protein interaction prediction. These encouraging results can be helpful for future research on protein function. The web server for protein-protein interaction prediction is accessible at http://datamining.xmu.edu.cn/~zjcdm/Home.html.
  

- A Novel Boolean Network for Analyzing the p53 Gene Regulatory Network
  
  Authors: Qinbin He and Zengrong Liu
  
  https://doi.org/10.2174/1574893611666151119215249
  
  The Boolean network is a powerful tool for the study of gene regulatory networks, and the dynamics of a Boolean network are mainly determined by its attractors. In this study, a new approach to constructing Boolean networks is proposed based on biochemical reaction differential equations. We investigate gene regulatory networks by comparing experimental results from the relevant literature with the attractors obtained from the Boolean network. The proposed Boolean regulatory network is simple and robust to changes in the network parameters. The model is applied to the p53 gene regulatory network by analyzing the interplay of its key components, including p53, Mdm2 (murine double minute 2) and the external signal of DNA damage. The attractors obtained from the p53 Boolean network are consistent with the experimental results in the relevant literature. Furthermore, we speculate that there is an unknown protein XXXp in the p53 gene regulatory network, where p53p promotes the DNA of XXXp to form the mRNA mXXXp. mXXXp generates protein XXXp, which promotes p53 phosphorylation to form p53p, creating a positive feedback loop of XXXp-p53p.
  

- Prediction of Linear B-Cell Epitopes with mRMR Feature Selection and Analysis
  
  Authors: Bi-Qing Li, Lu-Lu Zheng, Kai-Yan Feng, Le-Le Hu, Guo-Hua Huang and Lei Chen
  
  https://doi.org/10.2174/1574893611666151119215131
  
  A B-cell epitope, also known as an antigenic determinant, is the part of an antigen recognized by B-cells. The capability of an antibody to recognize epitopes is widely utilized in numerous biomedical applications, including immunodetection and immunotherapeutics. Identification of immunogenic regions helps to understand the mechanisms of the immune system and to guide the related applications. In contrast with laborious and time-consuming experimental approaches, predicting B-cell epitopes by computational methods is more convenient and efficient. In this study, a novel predictor with feature selection was developed by combining the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). The predictor was then trained and tested on three B-cell epitope datasets. Eight types of features were used to code the peptides: physicochemical and biochemical properties, residual disorder, sequence conservation, solvent accessibility, secondary structures, propensity of an amino acid to be conserved at the protein-protein interface and protein surface, deviation of side chain carbon atom number, and gain/loss of amino acids during evolution. It was shown that sequence conservation, physicochemical and biochemical properties of amino acids, solvent accessibility and secondary structure contributed most to the identification of epitope sites. In addition, the features from the sites surrounding the central residue are critical for the prediction. The findings of this study may shed light on the prediction of epitopes and the mechanisms of antigen-antibody interactions.
  

- RNA Sequencing and Transcriptome Analyses for Cercis gigantea
  
  Authors: Liucun Zhu, Min Jiang, Ying Zhang, Qi Yu, Fanyuan Zhu and Qiang Wang
  
  https://doi.org/10.2174/1574893611666151119221213
  
  Cercis gigantea is one of the most beautiful garden trees. It belongs to the genus Cercis in the subfamily Caesalpinioideae of Leguminosae. However, little genetic information on C. gigantea is available. In the present study, the C. gigantea transcriptome was subjected to RNA sequencing. This generated large expression datasets suitable for functional genomic analysis. Some 55.5 million high-quality clean reads were collected. These reads were then assembled into 44,660 unigenes and 77,024 unique transcripts. The unigenes, with an average length of 998 bp, were annotated by comparison with all known proteins in four public databases, namely the Kyoto Encyclopedia of Genes and Genomes (KEGG), the National Center for Biotechnology Information (NCBI) non-redundant protein database (NR), the Cluster of Orthologous Groups (COG), and Swiss-Prot, using the NCBI BLAST procedure. Out of the 44,660 unigenes, 28,884 (64.7%) were annotated. In addition, an interaction network of unigenes in C. gigantea was constructed. The current study provides the first screen of a transcriptome not only for C. gigantea but for any Caesalpinioideae plant, serving as an important platform for research on functional genomics, gene expression, and protein-protein interaction.
  

- Feature Classification and Analysis of Lung Cancer Related Genes through Gene Ontology and KEGG Pathways
  
  Authors: You Zhou, Biqing Li, Yuchao Zhang, Lei Chen and Xiangyin Kong
  
  https://doi.org/10.2174/1574893611666151119220803
  
  Characterization of cancer related genes is important and challenging in both biomedicine and computational biology. As one of the leading causes of cancer mortality worldwide, lung cancer accounts for over one million deaths each year. Generally, lung cancer is divided into small-cell lung cancer (SCLC) and non-small-cell lung cancer (NSCLC). Although great advances have been made in lung cancer detection and treatment, the 5-year survival rate of patients is still less than 15%. Hence, it is very important to identify all the potential lung cancer related genes as well as their interaction networks. In this research, we present a novel computational framework to predict lung cancer related genes based on the support vector machine (SVM). A total of 59 NSCLC related genes and 89 SCLC related genes were retrieved from KEGG pathways, while 2950 non-NSCLC and 4450 non-SCLC genes were randomly selected from the Ensembl database. Ten datasets were constructed by dividing the genes into 10 groups. Each gene was encoded by a 13,126-dimensional vector composed of 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. A feature extraction strategy was applied to obtain an optimal feature set comprising 400 GO terms and 47 KEGG pathways for NSCLC, and 458 GO terms and 27 KEGG pathways for SCLC. Further feature analysis showed that these optimal features were actively involved in lung tumorigenesis. It also confirmed that our method is an effective tool for predicting cancer related genes and has the potential to be applied extensively to the prediction of other types of cancer genes.
  

- Application of the Shortest Path Algorithm for the Discovery of Breast Cancer-Related Genes
  
  Authors: Lei Chen, Zhi Hao Xing, Tao Huang, Yang Shu, GuoHua Huang and Hai-Peng Li
  
  https://doi.org/10.2174/1574893611666151119220024
  
  Breast cancer, the most prevalent cancer in women, develops from breast tissue. Its incidence has increased in recent years due to environmental risk factors. Thus, it is urgent to uncover the mechanism underlying breast cancer to design effective treatments. Identification of all breast cancer-related genes is one way to help elucidate the underlying breast cancer mechanism. In this study, a computational method was built and applied to discover new candidate breast cancer-related genes. Based on the known breast cancer-related genes retrieved from public databases, the shortest path algorithm was applied to discover new candidate genes in the protein-protein interaction network. The analysis results of the selected genes suggest that some of them are deemed breast cancer-related genes according to the most recent published literature, while others have direct or indirect associations with the initiation and development of breast cancer.
  

- The Integrative Network of Gene Expression, MicroRNA, Methylation and Copy Number Variation in Colon and Rectal Cancer
  
  Authors: Tao Huang, Bi-Qing Li and Yu-Dong Cai
  
  https://doi.org/10.2174/1574893611666151119215823
  
  Gene expression level changes in cancer patients have been studied for a long time and have proven useful in classifying patients, predicting drug response, and similar tasks. However, the factors that control gene expression in pathological conditions are still unclear. Identifying the putative causal factors could greatly help in understanding the mechanisms of cancer development and progression. It is believed that microRNA, methylation and copy number variation (CNV) are possible regulators of gene expression. We analyzed the profiles of gene expression, microRNA, methylation and CNV in colon and rectal cancer patients. A multiple regression based method was developed to construct an integrative network including different levels of components. The regulatory effects of microRNA, methylation and copy number variation on gene expression were evaluated and compared. The cis-regulation effects from methylation to gene expression are very strong. Some 43.2% of cis methylation-expression relationships appear as the most important methylation-expression regulator, and 80.0% are within the top five most important methylation-expression regulators. The functions of microRNA dominated genes, methylation dominated genes and CNV dominated genes were analyzed. The CNV dominated genes were involved in gene expression, protein modification and protein kinase activity. The methylation dominated genes showed notable involvement in immune response and calcium ion binding. The microRNA dominated genes were notable with regard to biological regulation and signal transduction. By decomposing the networks of gene, microRNA, methylation and CNV, the network modules were identified. Some modules provided useful clues about the mechanisms of gene expression regulation. Our methods provide a general framework for studying the integrative network derived from multiple large-scale biological data. The R script for our integrative network construction method is available in the supporting information.
  

- Structure Based Virtual Screening for the Identification of Potential Inhibitors for Penicillin Binding Protein 2B of the Resistant 5204 Strain of Streptococcus pneumoniae
  
  Authors: S. Suvaithenamudhan and S. Parthasarathy
  
  https://doi.org/10.2174/1574893611666151119220500
  
  In this paper, we performed virtual screening of compounds to identify potential inhibitors of the Penicillin Binding Protein 2B (PBP2B) of the resistant 5204 strain of Streptococcus pneumoniae. We considered 1,677,620 compounds from the ZINC database in the virtual screening workflow of the Schrödinger suite to identify potential inhibitors that are capable of binding to the mutated resistant 5204-PBP2B. Initially, we obtained 1,247 hits, which were prioritized based on protein-ligand contacts, resulting in 99 compounds. These 99 compounds were further clustered to obtain 25 structurally diverse compounds, of which the top scoring compound, 5-[(6-hydroxy-1,2,3,4-tetrahydroisoquinolin-1-yl)methyl]benzene-1,2,3-triol (ID: ZINC59376795), may be identified as the potential inhibitor. Molecular dynamics simulations were performed for the wild-sensitive R6-PBP2B and mutated resistant 5204-PBP2B complexes with this top scoring compound ZINC59376795, and the binding patterns, RMSD calculations and protein-ligand contact analysis provide deeper insights into the interaction patterns of this novel inhibitor against the sensitive R6-PBP2B and resistant 5204-PBP2B of S. pneumoniae.
  

- Prediction of an Interaction between Bakuchiol and Acetylcholinesterase using Adaboost
  
  Authors: Can Zhang, Xueyuan Wang, Lei Gu, Li Jiang, Qiang Su, Manman Zhao, Linfeng Zheng, Ling Tang, Fuxue Chen and Bing Niu
  
  https://doi.org/10.2174/1574893611666151119220248
  
  A structure-activity relationship (SAR) dataset was generated for a set of acetylcholinesterase inhibitors using Adaboost and physicochemical parameters. The accuracy (ACC) of the SAR model was found to be 99.51% in the 10-fold cross-validation test and 99.35% on the independent test set. Based on the SAR prediction model, bakuchiol is predicted to be an acetylcholinesterase inhibitor. Fluorescence was used to investigate the binding between bakuchiol and acetylcholinesterase, which can provide valuable qualitative and quantitative information about the interaction between acetylcholinesterase and bakuchiol.
  

- Predicating Candidate Cancer-Associated Genes in the Human Signaling Network Using Centrality
  
  Authors: Xueming Liu and Linqiang Pan
  
  https://doi.org/10.2174/1574893611888160106154456
  
  According to the somatic mutation theory, the development of cancer involves gene mutations. The identification and prediction of cancer-associated genes is one of the most important aims in cancer research. We apply four centrality metrics (degree, betweenness, closeness and PageRank) to prioritize and predict candidate cancer-associated genes in the human signaling network. We find that genes with higher centrality scores are more likely to be cancer-associated. Taking the top 47 genes for each centrality measure, we get 89 central genes. Among these 89 central genes, 58 genes are known to be cancer-associated, 4 genes encode non-protein products and 27 genes are inferred genes. For the 27 inferred genes, literature mining shows that 21 genes have been confirmed to be cancer-associated and the other 6 genes (CAMP, GSK3A, MTG1, GNGT1, ISGF3G and DYT10) are strong candidates for cancer research. These results show that the four centrality metrics are effective in predicting candidate cancer-associated genes for further experimental analysis.
  

- Iterative Multi Level Calibration of Metabolic Networks
  
  Authors: Max Conway, Claudio Angione and Pietro Liò
  
  https://doi.org/10.2174/1574893611666151203222505
  
  Frameworks for metabolic engineering have been successfully applied in combination with pre- and post-processing algorithms on genome-wide metabolic models. However, genetic engineering methods with a particular focus on understanding results from multiple perspectives and combining automated and human design are still lacking. To this end, we adopt a multi-objective genetic design technique to find the optimal gene expression levels in genome-scale metabolic reconstructions. Then, we analyse the optimized network by introducing a new multi-omic, multi-level post-processing and visualization procedure, Metabex, which uses Cytoscape for network visualization. These two components are connected together to form a feedback loop that establishes a continual process of machine optimization and human analysis and guidance. To benchmark our framework, we optimize two species of Geobacter for electricity production and biomass synthesis; we achieve increases in electricity production for only a slight decrease in biomass. Many regulatory strategies contributed to this value, locally and globally. For instance, a direct, local strategy was a down-regulation of Cytochrome C Oxidase, while there was simultaneously a global reduction in cofactor and prosthetic group biosynthesis. Finally, we discuss multiple applications of our tool, including model exploration, model engineering, comparative modelling, meta-analysis and model refinement.
  

- Characterization of Graphs for Protein Structure Modeling and Recognition of Solubility
  
  Authors: Lorenzo Livi, Alessandro Giuliani and Alireza Sadeghian
  
  https://doi.org/10.2174/1574893611666151109175216
  
  This paper deals with the relations among structural, topological, and chemical properties of the E. coli proteome from the vantage point of the solubility/aggregation propensity of proteins. Each E. coli protein is initially represented according to its known folded 3D shape. This step involves representing the available E. coli proteins in terms of graphs. We first analyze those graphs by considering purely topological characterizations, i.e., by analyzing the mass fractal dimension and the distributions underlying both shortest paths and vertex degrees. Results confirm the general architectural principles of proteins. Subsequently, we focus on the statistical properties of a representation of such graphs in terms of vectors composed of several numerical features, which we extracted from their structural representation. We found that protein size is the main discriminator for solubility, although other factors help explain the degree of solubility. We finally analyze such data through a novel one-class classifier, with the aim of discriminating between highly and poorly soluble proteins. Results are encouraging and consolidate the potential of pattern recognition techniques when employed to describe complex biological systems.
  

- VFP: A Visual Algorithm for Predicting Gene Fusion in RNA-Seq Data
  
  Authors: Ye Yang and Juan Liu
  
  https://doi.org/10.2174/1574893611888151222162228
  
  Gene fusion is a key factor in sarcomas, lymphomas, leukemias and other cancers. To help biologists discover treatment targets, we developed VFP to predict gene fusions from single-end RNA-sequencing reads. VFP employs a seed index strategy and octal encoding operations for sequence alignment. By using several rules to score and filter the potential fusion genes, VFP detected known and novel fusions in a series of tests on lymphoma and melanoma RNA-sequencing datasets.
  

- In Sight to the Identification and Analysis of Simple Sequence Repeats (SSRs) in Monoterpene Biosynthesizing Plant Species
  
  Authors: Anand Mishra, Sanchita, Sunita Singh Dhawan and Ashok Sharma
  
  https://doi.org/10.2174/1574893609666140911010738
  
  Terpenes are the major component of the essential oils biosynthesized in various plant species. Essential oils are widely used as natural flavouring agents and food additives, and in perfumery and pharmaceuticals. Terpenes are classified by the number of isoprene units they contain, i.e., mono (C10), sesqui (C15), di (C20), tri (C30) and tetra (C40). Monoterpenes are diverse in nature and useful to plants and humans. Monoterpenes constitute the major portion of the essential oils in the flowers and leaves of plants as natural products. Plants use these compounds for defence and messaging. A total of 665 sequences related to monoterpene biosynthesis were retrieved from the public domain. The sequences were assembled, resulting in 159 contigs and singletons. The contigs and singletons were used for in silico identification of microsatellites. The SSRs were identified as short sequences of di-, tri-, tetra-, penta- and hexanucleotides, with hexanucleotide repeats having the highest percentage. The analysis revealed a total of 215 SSRs, and their frequency of distribution among monoterpene-related sequences and functional domain markers (FDM) was also analysed. Homologous sequences not having any terpene synthase group were retrieved as an outgroup for validation of the study.
  

- Prediction and Analysis of the Protein-Protein Interaction Networks for Chickens, Cattle, Dogs, Horses and Rabbits
  
  Authors: Fen Wang, Baoxing Song, Xing Zhao, Yaotian Miao, Dengyun Li, Na Zhou, Pengfei Jiang, Qing Sang, Jingfei Huang and Deli Zhang
  
  https://doi.org/10.2174/1574893611666151203221255
  
  High-throughput screening for protein-protein interactions (PPIs) is currently utilized for detailed experimentation and the formulation of biological hypotheses. Comprehensive and concrete PPI networks of domestic animals are urgently needed because of their considerable economic value. We constructed the PPI networks of chickens, cattle, dogs, horses, and rabbits using two methods: the InParanoid method, an interolog method that depends on protein ortholog algorithms, and the domain-motif interactions from structural topology (D-MIST) method, which assumes that if a motif in protein A matches a domain in protein B in the position-specific scoring matrices, protein A likely interacts with protein B. Up to 328,590 PPIs were found in chickens, 447,014 in cattle, 129,386 in dogs, 93,414 in horses and 115,296 in rabbits. Furthermore, a large number of novel PPIs with diverse biological roles were discovered. Gene ontology (GO) annotation and subcellular localization were applied to verify the results. In GO annotations, 30.28% to 50.08% and 35.32% to 39.97% of the predicted PPIs were found to share GO terms using InParanoid and D-MIST, respectively. In subcellular localization, 55.04% to 59.97% and 35.43% to 40.26% of the predicted PPIs had the same subcellular localization using InParanoid and D-MIST, respectively. Comparison with randomized networks revealed that the predicted networks were considerably reliable. This work, to some extent, establishes the biological significance of these PPI networks and provides clues to further decipher the molecular mechanism of circadian rhythmicity. The PPI networks of the five species are freely available at http://dppin.songbx.me.
  

- Comparison of Kernel and Decision Tree-based Algorithms for the Prediction of microRNAs Associated with Cancer
  
  Authors: Ram Kothandan and Sumit Biswas
  
  https://doi.org/10.2174/1574893611666151120102307
  
  The discovery of microRNAs (miRs) in the 1990s spawned a genre of research which has thrown light on the involvement of these small non-coding RNAs in several developmental pathways and diseases, one of which is cancer. While algorithms which predict the binding of miRNAs to their targets are abundant, the same is not true for the association of miRNAs with targets which can be implicated in cancer. Machine learning approaches, which have been implemented in target prediction, need to be extrapolated with proper feature selection to reach an acceptable level of accuracy in predicting associations of miRNAs with cancer. In this study we present a comparison of three different learning algorithms, viz., the kernel-based Support Vector Machine (SVM) and the decision tree-based Random Forest (RF) and C4.5, to predict miRNAs associated with cancer. Sixty informative features were extracted from a dataset of experimentally validated miRNAs based on sequence, thermodynamics of miRNA-mRNA binding and their hybridization. Initially, features were ranked based on F-score, and a two-stage Recursive Feature Elimination (RFE) process was employed to select the optimal subset of features for each classifier. Class imbalance in the training set was overcome by employing a cost-sensitive approach. The performance of each learning algorithm was evaluated in terms of precision, recall, F-measure and AUC. Subsequently, the learning algorithm with the better performance measures would be used to construct a two-step binary classifier, viz., miRSEQ and miRINT, which identifies whether a miRNA is associated with the cancer pathway. Based on our comparative analysis, the decision tree-based RF model performed well in terms of precision and AUC for miRSEQ, but was moderate for miRINT.
  

