Volume 12, Issue 6

Current Bioinformatics - Volume 12, Issue 6, 2017

- Meet Our Editorial Board Member
  
  By Stefano Toppo
  
  https://doi.org/10.2174/157489361206171226121057

- MetalExplorer, a Bioinformatics Tool for the Improved Prediction of Eight Types of Metal-Binding Sites Using a Random Forest Algorithm with Two-Step Feature Selection
  
  Authors: Jiangning Song, Chen Li, Cheng Zheng, Jerico Revote, Ziding Zhang and Geoffrey I. Webb
  
  https://doi.org/10.2174/2468422806666160618091522
  
  Background: Metalloproteins are involved in many biological processes, including catalysis, recognition, transport, transcription, and signal transduction. The metal ions they bind usually play enzymatic or structural roles in mediating these diverse functions. Thus, the systematic analysis and prediction of metal-binding sites using sequence and/or structural information are crucial for understanding their sequence-structure-function relationships. Objective: The objective of this work is to develop a new computational algorithm for improved prediction of major types of metal-binding sites. Method: We propose MetalExplorer (http://metalexplorer.erc.monash.edu.au/), a new machine learning-based method for predicting eight different types of metal-binding sites (Ca, Co, Cu, Fe, Ni, Mg, Mn, and Zn) in proteins. Our approach combines heterogeneous sequence-, structure-, and residue contact network-based features in a random forest machine-learning framework. Results: The predictive performance of MetalExplorer was tested by cross-validation and independent tests using non-redundant datasets of known structures. This method applies a two-step feature selection approach based on maximum relevance minimum redundancy and forward feature selection to identify the most informative features that contribute to the prediction performance. With a precision of 60%, MetalExplorer achieved high recall values, which ranged from 59% to 88% for the eight metal ion types in fivefold cross-validation tests. Moreover, the common and type-specific features in the optimal subsets of all metal ions were characterized in terms of their contributions to the overall performance. Conclusion: On both benchmark and independent datasets at the 60% precision control level, MetalExplorer compared favorably with an existing metalloprotein prediction tool, SitePredict. MetalExplorer is expected to be a powerful tool for the accurate prediction of potential metal-binding sites and should facilitate the functional analysis and rational design of novel metalloproteins.
  

- Information Content Estimate of Model Proteomes: A Primary Structure Perspective
  
  By Sertac Eroglu
  
  https://doi.org/10.2174/1574893612666161215165052
  
  Background: The mathematical foundation of information theory in communication engineering was developed by Claude Shannon in 1948. Since then, information theory has been used to investigate various information-carrying systems, including biomolecules such as DNA and proteins. Objective: In this study, a measure for estimating the structural information content of proteomes is proposed. The primary structure feature considered in the information content investigation is the sequence length organization of proteomic proteins, as opposed to the amino acid order in individual protein sequences. Method: We analyzed and compared the information content estimates of a representative set of ten proteomes for measured, model-predicted (linguistic distribution model) and simulated (random sequence length) cases. Results: Excellent agreement was observed between the measured and model-predicted information contents of the proteomes. The overall average information per proteomic protein was 8 bits for the measured/model-predicted data and 7 bits for the simulated proteomic collection data. Conclusion: The study suggests that biological interaction mechanisms may rely more on the number of amino acids than on the amino acid order of an interaction-initiating protein sequence. The approach presented here may serve as a practical tool for studying and comparing biological processes taking place in an organism or in a collection of organisms, and is anticipated to be useful for exploring the proteomic information characteristics present in different structural hierarchies, such as secondary and tertiary structures.
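
The abstract does not state the exact estimator, so the following is only a minimal sketch under one assumption: that the information content of a proteome is taken as the Shannon entropy, in bits, of the empirical distribution of its protein sequence lengths. The function name and the toy lengths are hypothetical and are not data from the paper.

```python
import math
from collections import Counter


def length_entropy_bits(lengths):
    """Shannon entropy (in bits) of the empirical distribution of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    counts = Counter(lengths)
    total = len(lengths)
    # H = -sum(p * log2(p)) over the distinct lengths that actually occur
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy, made-up lengths: four distinct values, each seen twice.
toy_lengths = [120, 120, 350, 350, 87, 87, 410, 410]
print(length_entropy_bits(toy_lengths))
```

Under this assumption, a proteome whose proteins all share one length carries zero bits, and the value grows as the lengths spread over more distinct values.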
  

- A Novel Method for Better Bacterial Genome Assembly from Illumina Data
  
  Authors: Peixiang Ni, Wenkui Dai, Yongfeng Liu, Zhenyu Yang, Tao Zhou, Shuqing Liang, Tong Wang, Jing Xu and Yun Zhao
  
  https://doi.org/10.2174/1574893610666150624171516
  
  Background: With the rapid development of next-generation sequencing technology, a great many individual genomes have been generated. Bacterial genome sequences, as the foundation of microbiology research, are of great value. Due to the diversity and complexity of bacteria, assembling genomes from short reads is still challenging. Objective: A new solution based on the SOAPdenovo assembler has been developed to increase the fineness and accuracy of bacterial genome sequences. Method: The method comprises four main steps: preliminary genome assembly via SOAPdenovo, super-scaffold construction, gap closure, and final sequence revision. Results: Seventeen fine genomes have been generated with this solution. In addition, 23 sequenced strains were chosen to evaluate the advantage of this method, and the assembly results show that 16 of them are better than the original assemblies in contiguity and accuracy. Conclusion: With more and more individual bacterial genomes generated by this method, we can infer that this work provides a cost-effective and time-saving method for the acquisition of bacterial genomes.
  

- Computational Modeling of Small Molecule Inhibitors of Mitochondrial Fusion
  
  Authors: Sonam Arora, Salma Jamal, Sonam Gaba, Yasha Hasija and Vinod Scaria
  
  https://doi.org/10.2174/1574893611666161103152934
  
  Background: Mitochondria are membrane-bound structures found in most eukaryotic cells. The most prominent function of this essential organelle is the generation of ATP and the regulation of cell metabolism. Because mitochondria are a vital part of the cell, mitochondrial dysfunction has been associated with many diseases through its influence on cellular metabolism. A range of disorders and diseases have been reported as a result of damage and dysfunction in mitochondria, including cancer, diabetes mellitus and neurodegenerative diseases that affect millions of people worldwide. This has made mitochondrial processes an attractive and novel target for potential therapeutic intervention. The application of cheminformatics tools has made possible the prioritization and in-depth understanding of small molecules with mitochondrial phenotypes at a much faster rate and reduced cost compared to traditional high-throughput screening. Methods: We used a publicly available dataset of inhibitors of mitochondrial fusion to build accurate predictive cheminformatics models. We used machine learning-based classification algorithms and further enhanced this approach using a maximum common substructure (MCS) approach. Three classification algorithms, namely Naive Bayes, Random Forest and J48, were used in the present study. Results: The Random Forest-based model was found to be the most accurate, with an accuracy of about 80%. As a proof of application, the model was further used to prioritize a subset of drug-like molecules from a large chemical library, ZINC, and to annotate potential new mechanisms of action of molecules with anti-cancer activities. Conclusions: We show that machine learning approaches can be effectively used to build highly accurate classification models for high-throughput screen datasets. We show as a proof of concept that such models can be used to screen and prioritize large datasets in silico for further experimental validation, and also to assign potential mechanisms of action to molecules.
  

- Molecular Beacon Based Biosensing for Detection of Pathogenic Water Borne Multiple Fungal Strains: An In-Silico Approach
  
  Authors: Sonali Mishra and Krishna Misra
  
  https://doi.org/10.2174/1574893611666160922153324
  
  Background: Water-borne pathogenic fungi have recently become a major threat and a leading cause of many hazardous infectious diseases in immuno-challenged people. Objective: A universal standardized method needs to be developed for instant, specific and easier detection and diagnosis of water-borne pathogenic fungi. Method: None of the methods known so far for the detection of pathogenic microorganisms is as handy, economical, specific and sensitive as molecular beacons. A computational approach was employed in the present work to detect conserved and frequently repeated sequence patterns in the rRNA sequences of twenty-three water-borne pathogenic fungal species. These species were classified into three groups, and models of probes containing secondary stem-loop structures were designed for each group. Finally, a common model capable of specifically detecting all species in one hit was designed. Molecular beacons were proposed by attaching donor and acceptor dyes to the designed probes. Results and Conclusion: In the present work, a molecular beacon-based probe has been modelled that is capable of detecting 23 pathogenic water-borne fungal strains. This approach of designing molecular beacons with specific rRNA sequences can prove to be a sensitive diagnostic technique for detecting water-borne pathogenic fungal strains in minuscule amounts.
  

- Identification of Drug-Drug Interactions Using Chemical Interactions
  
  Authors: Lei Chen, Chen Chu, Yu-Hang Zhang, Mingyue Zheng, LiuCun Zhu, XiangYin Kong and Tao Huang
  
  https://doi.org/10.2174/1574893611666160618094219
  
  Background: One drug can affect the activity of another when they are administered together, which can cause adverse drug reactions or sometimes improve therapeutic effects. Therefore, correct identification of drug-drug interactions (DDIs) can help medical workers use various drugs effectively, avoiding adverse effects and improving therapeutic effects. Methods: This study proposed a novel prediction model to identify DDIs. A new metric was constructed to evaluate the similarity of two pairs of drugs using chemical interaction information retrieved from STITCH. Validated DDIs retrieved from DrugBank were employed, from which we constructed all possible pairs of drugs that were deemed as negative samples. The whole dataset was divided into one training dataset and one test dataset. To address the imbalanced data, a complicated dataset compilation strategy was adopted to construct nine training datasets from the original training dataset, reducing the ratio of positive samples and negative samples. Nine predictors based on the nearest neighbor algorithm were built based on these training datasets. The proposed model integrated the above nine predictors by majority voting and its performance was evaluated on the test dataset. Results: The predicted results indicate that the method is quite effective for identification of DDIs. Finally, we also discussed the ability of the method for identifying novel DDIs by investigating the likelihood of some negative samples in the test dataset that were predicted as DDIs being novel DDIs. Conclusion: The proposed method has a good ability for identification of potential DDIs.
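
The ensemble step described above, one nearest neighbor predictor per training dataset combined by majority voting, can be sketched roughly as follows. This is an illustration only: the paper's STITCH-based similarity metric is not given in the abstract, so `toy_similarity` is a hypothetical stand-in, samples are plain numbers rather than drug pairs, and three toy training sets are used instead of nine.

```python
from collections import Counter


def nearest_neighbor_predict(query, training_set, similarity):
    """Return the label of the training sample most similar to `query`."""
    _, best_label = max(training_set, key=lambda item: similarity(query, item[0]))
    return best_label


def majority_vote_predict(query, training_sets, similarity):
    """Run one nearest neighbor predictor per training set and take the majority label."""
    votes = [nearest_neighbor_predict(query, ts, similarity) for ts in training_sets]
    return Counter(votes).most_common(1)[0][0]


def toy_similarity(a, b):
    # Hypothetical placeholder: closer numbers count as more similar.
    return -abs(a - b)


# Each training set is a list of (sample, label) pairs; 1 = interaction, 0 = none.
toy_training_sets = [
    [(0.1, 1), (0.9, 0)],
    [(0.2, 1), (0.8, 0)],
    [(0.6, 1), (0.4, 0)],
]
print(majority_vote_predict(0.3, toy_training_sets, toy_similarity))
```

Using an odd number of predictors with binary labels, as with the nine predictors in the abstract, means a vote can never end in a tie.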
  

- Implication on the Function of Novel Xn-relE Toxin Structure of Xenorhabdus nematophila Using Homology Modeling
  
  Authors: Lalit K. Gautam, Ragothaman M. Yennamalli and Jitendra S. Rathore
  
  https://doi.org/10.2174/1574893611666160620093520
  
  Background: Bacterial chromosomal toxin-antitoxin (TA) systems are involved in various cell functions such as stress response, promoting cell cycle arrest and bringing about the onset of programmed cell death. Unlike the RelBE TA module of Escherichia coli, the genome of Xenorhabdus nematophila has two separate TA modules for RelB and RelE. Here the RelE toxin has its own antitoxin, and the RelB antitoxin has its own separate toxin counterpart. More interestingly, these modules are located far apart in the genome. Objective: In this study, the Xn-relE toxin model structure from X. nematophila is explored for the first time. The toxic effect of Xn-relE has already been shown by endogenous killing in our earlier report. Methods: Since no crystallographic structure of the Xn-relE toxin is available to date, models of the X. nematophila Xn-relE toxin and its antitoxin Xn-relEAT were developed using the I-TASSER server and analyzed to define gene ontology. The models were validated using VERIFY-3D. Results: Homology models for the X. nematophila Xn-relE toxin and its antitoxin Xn-relEAT were obtained and interactions were established. The structural and functional annotation of this TA system designates it as a Type II TA module. Conclusion: The present study sheds light on the structure and function of the Xn-relE toxin of the Xn RelE TA module, whose applicability in the area of agricultural sciences is pronounced.
  

- Improved Algorithm for the Detection of Cancerous Cells Using Discrete Wavelet Transformation of Genomic Sequences
  
  Authors: Inbamalar T. Mariapushpam and Sivakumar Rajagopal
  
  https://doi.org/10.2174/1574893611666160712222525
  
  Background: Cancer is the leading cause of mortality worldwide. Cancer occurs due to anomalous mutations in a cell. Precise cancer diagnosis and a specific course of treatment are essential for saving human lives. Objective: The main aim is to use digital signal processing techniques for the detection of cancer cells. Method: A method to classify normal and cancerous cells using the discrete wavelet transform has been developed. Here, the deoxyribonucleic acid (DNA) sequences are converted into numeric sequences using electron-ion interaction potential values. The wavelet transform is then computed. The cross-correlation values of the wavelet coefficients of normal and cancerous cells are calculated. The maximum cross-correlation amplitude in the transformed domain is calculated in order to detect the abnormality present in the nucleotides of the cells. Results: The test was conducted on 82 cancerous DNA sequences and 82 normal DNA sequences. Standard performance metrics were evaluated, and the values obtained are sensitivity - 98.78%, specificity - 100%, accuracy - 99.39%, positive precision - 98.78% and negative precision - 100%. Conclusion: Comparing the performance metrics obtained with those of methods in the literature, the wavelet transformation method is found to perform better. Hence, this approach can be considered an efficient solution for cancer detection. This method aids in early cancer detection and cancer therapeutics.
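
The processing chain described above (numeric encoding, wavelet transform, cross-correlation) can be sketched as below. This is a rough illustration, not the authors' implementation: the abstract does not name the wavelet, the decomposition level, or which coefficients are correlated, so a single-level Haar transform on the detail coefficients is assumed here. The EIIP values are the commonly tabulated ones for the four nucleotides, and the two short sequences are made up.

```python
import math

# Commonly tabulated electron-ion interaction potential (EIIP) values.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(dna):
    """Map a DNA string to its numeric EIIP sequence."""
    return [EIIP[base] for base in dna.upper()]


def haar_dwt(signal):
    """One level of the Haar DWT; returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        signal = signal[:-1]  # simplification: drop the last sample so pairs line up
    root2 = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def max_cross_correlation(x, y):
    """Largest absolute value of the cross-correlation of x and y over all lags."""
    best = 0.0
    for lag in range(-(len(y) - 1), len(x)):
        total = 0.0
        for j, y_value in enumerate(y):
            i = j + lag
            if 0 <= i < len(x):
                total += x[i] * y_value
        best = max(best, abs(total))
    return best


# Made-up sequences that differ at one position (assumption, not real data).
toy_reference = "ATGCGTACGTTAGC"
toy_variant = "ATGCGTACGATAGC"
_, detail_reference = haar_dwt(eiip_encode(toy_reference))
_, detail_variant = haar_dwt(eiip_encode(toy_variant))
print(max_cross_correlation(detail_reference, detail_variant))
```

How the resulting amplitude is thresholded to separate normal from cancerous sequences is not stated in the abstract and is left out of the sketch.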
  

- DeepInteract: Deep Neural Network Based Protein-Protein Interaction Prediction Tool
  
  Authors: Sunil Patel, Rashmi Tripathi, Vandana Kumari and Pritish Varadwaj
  
  https://doi.org/10.2174/1574893611666160815150746
  
  Background: Proteins form specific molecular complexes, and the specificity of their interactions is essential for discovering and analyzing cellular mechanisms. Aim: The development of large-scale high-throughput experiments using in silico approaches has resulted in the production of accurate data, which has accelerated the uncovering of novel protein-protein interactions (PPIs). Method: In this work we present an integrative domain-based method, ‘DeepInteract’, for predicting PPIs using a Deep Neural Network (DNN). The interacting set of PPIs was extracted from the Database of Interacting Proteins (DIP) and the Kansas University Proteomics Service (KUPS). Results: When validated on an independent dataset of 34100 PPIs of Saccharomyces cerevisiae, the proposed classifier achieved promising prediction results, with accuracy, precision, sensitivity and specificity of 92.67%, 98.31%, 86.85% and 98.51%, respectively. Similar classifiers were implemented on protein complexes for Escherichia coli, Drosophila melanogaster, Homo sapiens and Caenorhabditis elegans, with prediction accuracies of 97.01%, 90.85%, 94.47% and 88.91%, respectively. Conclusion: The performance of the proposed method is found to be better than that of existing domain-based machine learning PPI prediction approaches. Recommendation: The DeepInteract server interface, along with the train/test datasets, source code and supplementary files, is freely available at: http://bioserver.iiita.ac.in/deepinteract.
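
For reference, the four metrics reported above follow from a binary confusion matrix under their standard definitions. The sketch below assumes those definitions; the function name is hypothetical and the counts are made-up toy values, not the paper's confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),      # fraction of predicted positives that are correct
        "sensitivity": tp / (tp + fn),    # fraction of true positives recovered (recall)
        "specificity": tn / (tn + fp),    # fraction of true negatives recovered
    }


# Toy counts for illustration only.
for name, value in classification_metrics(tp=8, fp=2, tn=6, fn=4).items():
    print(f"{name}: {value:.2%}")
```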
  

- Identification of Robust Clustering Methods in Gene Expression Data Analysis
  
  Authors: Md. B. Hossen and Md. Siraj-Ud-Doulah
  
  https://doi.org/10.2174/1574893611666160610103926
  
  Background: Cluster analysis of gene expression microarray data is of increasing interest in bioinformatics. One of the reasons for this is the need for molecular-based refinement of broadly defined biological classes, with implications for cancer diagnosis, prognosis and treatment. Many algorithms have been developed for this problem. Objective: However, microarray data frequently include outliers, and how to treat the effects of these outliers in the subsequent clustering analysis remains an open question. Method: In this paper, we present a large-scale analysis of seven different agglomerative hierarchical clustering methods and five proximity measures for the analysis of 33 cancer gene expression datasets. As a case study, we used two types of experimental datasets, Affymetrix and cDNA, and different percentages of outliers were artificially added to these datasets. Results: We found that Ward's method gives the highest corrected Rand index value with the Spearman proximity measure, both when the datasets contain outliers and when they do not. Conclusion: This study shows that Ward's method is the most robust of the clustering methods compared for gene expression data analysis.
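
The corrected Rand index used above to score clusterings is assumed here to be the chance-corrected (adjusted) Rand index in its usual Hubert-Arabie form. A minimal sketch of that formula, with a hypothetical function name and toy labels, could look like this:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Pairs of samples grouped together in both, in the first, and in the second labeling.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy labels: the same partition with the cluster names swapped.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index equals 1 for identical partitions regardless of label names, and is close to 0 (or negative) for partitions that agree no better than chance.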
  

- Acknowledgements to Reviewers
  
  https://doi.org/10.2174/157489361206171226142836

Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan