Current Bioinformatics - Volume 12, Issue 5, 2017

- Meet Our Editorial Board Member
  
  By Walter Filgueira de Azevedo
  
  https://doi.org/10.2174/157489361205170926154721

- A Heterogeneous Networks Fusion Algorithm Based on Local Topological Information for Neurodegenerative Disease
  
  Authors: Xue Jiang, Han Zhang, Xiongwen Quan and Yanbin Yin
  
  https://doi.org/10.2174/1574893612666170613105120
  
  Background: Predicting disease-related genes from gene networks helps reveal the interactions between genes under complex disease phenotypes. Gene co-expression networks usually contain numerous noisy connections, which makes simulation results depart greatly from the real situation. Most research focuses on developing better similarity measures between genes to construct more accurate gene co-expression networks. However, with the emergence of various types of biological networks and the urgent needs of precision medicine, a single-source gene co-expression network can no longer meet the accuracy requirement for disease-related gene prediction. Objective: We propose a heterogeneous networks fusion algorithm based on local topological information (HNFLTI) to reconstruct a disease-specific gene network. We also design a novel framework based on HNFLTI to identify disease-related genes. Method: First, HNFLTI modifies the weight of each edge connecting two nodes in the gene co-expression network according to the topological structure similarity between the local sub-networks in different source networks. Second, HNFLTI removes redundant connections in a filtration step, obtaining the disease-specific gene network. Finally, we conduct label propagation on the disease-specific gene network to predict the disease-related genes. Results: Experimental results demonstrate that the prediction accuracy for disease-related genes is significantly improved using the disease-specific gene network compared with the gene co-expression network. Conclusion: Since the molecular mechanisms of neurodegenerative disease are very complex, it is difficult to identify the disease-related genes using traditional computational methods. We reconstruct a disease-specific gene network using HNFLTI to improve the prediction accuracy of disease-related genes and to conduct exploratory analysis of the molecular mechanism of the disease. The method might be one of the best choices when users want to obtain reliable interactions between genes under a complex disease phenotype.
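
The final step of the abstract above, label propagation on a weighted gene network, can be illustrated with a minimal sketch. It is a generic iteration (normalized-adjacency propagation with a restart to the seed labels), not the authors' HNFLTI implementation. The function name `propagate_labels`, the parameters `alpha` and `iterations`, and the toy network are all assumptions made for illustration.

```python
import math


def propagate_labels(weights, seeds, alpha=0.8, iterations=100):
    """Spread seed labels over a weighted, undirected network.

    weights: symmetric adjacency matrix as a list of lists (hypothetical input).
    seeds:   1.0 for known disease genes, 0.0 otherwise.
    alpha:   share of the score taken from neighbours at each step (assumed value).
    """
    n = len(weights)
    # Weighted degree of each node; isolated nodes get 1.0 to avoid division by zero.
    degree = [sum(row) or 1.0 for row in weights]
    # Symmetric normalisation: w_ij / sqrt(d_i * d_j).
    norm = [
        [weights[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    scores = list(seeds)
    for _ in range(iterations):
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * seeds[i]
            for i in range(n)
        ]
    return scores


# Toy network: a chain 0 - 1 - 2 plus an isolated node 3; node 0 is the only seed.
toy_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
toy_scores = propagate_labels(toy_weights, [1.0, 0.0, 0.0, 0.0])
```

In this toy case the scores decrease with distance from the seed, and the isolated node keeps a score of zero. Ranking genes by such scores is one common way to turn propagated labels into candidate lists.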
  

- Content-Based Search on Time-Series Microarray Databases Using Cluster-Based Fingerprints
  
  Authors: Esma Erguner Ozkoc and Hasan Ogul
  
  https://doi.org/10.2174/1574893611666160209222658
  
  Background: The rapid growth of gene expression databases has created a need for content-based searches as an alternative to unstructured database queries using keyword- or metadata-based searches. Content-based searching is the ability to retrieve all experiments with similar gene expression patterns in a database regardless of the biological annotations provided for these experiments. Objective: While this concept is still in its infancy in a general context, in this study we focus on applying it to a specific subset of gene expression datasets, by only querying experiments involving time-series expression profiles. Method: To this end, we propose a novel experiment fingerprinting scheme obtained by clustering expression profiles, for content-based searching of time-series microarray experiments. To determine the retrieval ability of the proposed scheme, we performed a simulated information retrieval task on a large set of microarray experiments gathered from a public repository. The relevance between any two experiments was then defined using their commonalities based on annotated disease associations. Results and Conclusion: The results showed that relevant experiments can be more successfully retrieved using this new method compared with traditional differential expression-based methods.
  

- Using a Machine-Learning Approach to Predict Discontinuous Antibody-Specific B-Cell Epitopes
  
  Authors: Yiqi Lin, Xiaoping Min, Liangliang Li, Hai Yu, Shengxiang Ge, Jun Zhang and Ningshao Xia
  
  https://doi.org/10.2174/1574893611666160815102521
  
  Background: Predicting B-cell epitopes is important for understanding disease pathogenesis, identifying potential autoantigens, and designing vaccines and immune-based cancer therapies. The experimental approaches used for detecting B-cell epitopes are often laborious and resource-intensive. Thus, several computational methods have been developed for predicting the epitopes of a given antigen. However, most of these methods are coarse binary classifications of antigen regions into epitopes or non-epitopes and do not specify antibodies. Therefore, we aim to solve this antibody-specified epitope prediction problem using a developed structure-based computational machine-learning method, Epitopia, to reflect this biological reality accurately. Result: We selected 60 non-redundant antibody-antigen protein complexes for training and applied leave-one-out cross-validation to test the accuracy of our proposed methods; we compared the results with Epitopia, Discotope 2.0, PEASE and the state-of-the-art tool SEPPA 2.0. We considered the role of both complementarity determining region residues and antigen surface residues in antigen-antibody interactions and assigned a score to each antigen surface residue. We considered a prediction successful if the average score of “epitope residues” exceeded the average score over all surface residues. By this criterion, the success rates of our proposed methods were all higher than that of Epitopia, and the best one was 83.3%, which is 7% higher than the rate obtained with Epitopia (76%). The results show that antibody-specific methods are competitive with, and sometimes even better than, Epitopia, which only considers the antigen residues. Conclusion: Our antibody-specific methods provide sufficient accuracy in locating epitopes because whether an antigen surface residue is an epitope depends on whether the antibody recognizes it. This approach is a new method for identifying B-cell epitopes. Based on our results, we believe that the prediction of antibody-specific B-cell epitopes is more meaningful and efficient than methods that only consider antigen residues.
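
The success criterion stated in the abstract above (mean score of the epitope residues must exceed the mean score over all surface residues) is simple enough to sketch directly. The function names, the dictionary layout, and the toy scores below are assumptions for illustration and are not taken from the paper.

```python
from statistics import mean


def prediction_successful(surface_scores, epitope_residues):
    """Return True if epitope residues score higher on average than the whole surface.

    surface_scores:   dict mapping residue id -> predicted score (hypothetical layout).
    epitope_residues: ids of the residues known to form the epitope.
    """
    epitope_mean = mean(surface_scores[r] for r in epitope_residues)
    surface_mean = mean(surface_scores.values())
    return epitope_mean > surface_mean


def success_rate(cases):
    """Fraction of (surface_scores, epitope_residues) cases judged successful."""
    outcomes = [prediction_successful(scores, epitope) for scores, epitope in cases]
    return sum(outcomes) / len(outcomes)


# Toy antigen with four surface residues; the scores are invented for the example.
toy_scores = {"R1": 0.9, "R2": 0.8, "R3": 0.1, "R4": 0.2}
good_case = (toy_scores, ["R1", "R2"])  # epitope mean 0.85 vs surface mean 0.5
bad_case = (toy_scores, ["R3", "R4"])   # epitope mean 0.15 vs surface mean 0.5
toy_rate = success_rate([good_case, bad_case])
```

Under leave-one-out cross-validation, each complex would be scored by a model trained on the remaining complexes, and the success rate is then the fraction of held-out complexes that pass this test.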
  

- Investigating Key Genes in Type 2 Diabetes Mellitus via Combining mAP-KL and Mutual Information Network
  
  Authors: Guiyan Chen, Weihai Qiu, Shuze Xia and Lijuan Wang
  
  https://doi.org/10.2174/1574893611666160916171028
  
  Background: The molecular mechanism of type 2 diabetes mellitus (T2DM) remains unclear. Objective: This research aimed to investigate key genes in T2DM by combining mAP-KL and a mutual information network (MIN), and to provide insight into the pathological mechanism underlying this disease. Methods: First, gene expression profile data for T2DM were collected and preprocessed; then mAP-KL was applied to identify clusters and exemplars in T2DM; next, a support vector machine (SVM) model was used to evaluate the classification performance of mAP-KL; finally, MIN construction and topological analysis were performed to investigate key genes. Results: A total of 20,541 gene symbols were obtained from the expression profile of T2DM. By applying mAP-KL, 12 clusters were identified. From Cluster 1 to Cluster 12, their exemplars were OGT, TTC22, LIMCH1, NENF, ROMO1, RGL2, TCF7L1, KRTAP4-4, POLR2F, KIF22, NDUFB11, and AGL, respectively. The SVM evaluation indicated that the mAP-KL methodology was feasible and suitable for identifying exemplars of T2DM. Finally, MIN construction and topological analysis indicated that there were four hub genes (degree centrality ≥ 100): TCF7L1 (degree = 104), LIMCH1 (degree = 102), NENF (degree = 101), and TTC22 (degree = 101), which might be potentially novel predictive and prognostic markers for T2DM. Conclusion: We predict that these hub genes (such as TCF7L1 and LIMCH1) might play key roles during the occurrence and development of T2DM and are potentially novel predictive and prognostic markers for T2DM.
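
The hub-gene selection described above (degree centrality at or above a threshold) can be sketched as follows. How the mutual information network itself is built is not specified in the abstract, so the sketch starts from an edge list. The function names, the placeholder node names, and the toy threshold are assumptions.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count distinct neighbours per node in an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def hub_genes(edges, min_degree):
    """Return nodes whose degree is >= min_degree, highest degree first."""
    degrees = degree_centrality(edges)
    hubs = [node for node, degree in degrees.items() if degree >= min_degree]
    return sorted(hubs, key=lambda node: (-degrees[node], node))


# Toy network with placeholder node names (not real gene symbols).
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
toy_degrees = degree_centrality(toy_edges)
toy_hubs = hub_genes(toy_edges, min_degree=3)
```

With the threshold reported in the abstract, the call would be `hub_genes(edges, min_degree=100)` on the full network's edge list.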
  

- Integration of DNA Methylation Data and Gene Expression Data for Prostate Adenocarcinoma: A Proof of Concept
  
  Authors: Arpit Singh, Razia Rahman and Yasha Hasija
  
  https://doi.org/10.2174/1574893612666170328171106
  
  Background: Epigenetics is gaining rapid recognition as it accounts for heritable changes that do not involve changes in the coding sequence but do influence gene expression. DNA methylation is the most extensively studied epigenetic mechanism and has been observed to play a significant role in gene regulation and silencing. Objective: In our present work, we focused on understanding the relationship between DNA methylation and gene expression. As a proof of concept, Prostate Adenocarcinoma (PRAD), the second leading cause of death in men, was extensively studied to unravel the epigenetic abnormalities associated with disease pathogenesis, which may contribute to better diagnosis and prevention of prostate cancer. Method: DNA methylation data (level 1) and gene expression data (level 3) were taken from The Cancer Genome Atlas (TCGA). A total of 36 samples comprising 18 normal samples and 18 tumor samples were collected from a batch of 184 and matched with tumor samples and normal samples, respectively. The differentially methylated regions were identified and statistical analysis was carried out for the gene expression data among the normal and tumor samples. Further, functional enrichment analysis and pathway analysis were carried out for the filtered genes. Results: Our analysis indicated 453 differentially methylated regions at a p-value threshold of 0.05, an FDR (false discovery rate) threshold of 0.05 and a beta value (methylation) > 0.2. The integration of gene expression data with methylation data resulted in 180 significant correlations, from which 112 genes were filtered under stringent conditions. Out of these 112 genes, 74 genes were filtered through visual inspection of results, and their functional enrichment analysis resulted in a total of 27 clusters with a maximum enrichment score of ~1.86. Conclusion: The genes "GSTP1" and "FGFR2" were present in our prioritized filtered significant correlations, and these genes are known to play a primary role in the prostate cancer pathway and progression. Therefore, this approach may help to prioritize other novel genes and suggest their involvement in the prostate cancer pathway.
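
The threshold filtering reported in the Results can be sketched as a simple record filter. The abstract gives the cut-off values (0.05, 0.05, 0.2) but not the exact comparison used for the p-value and FDR, so the strict `<` below is an assumption, as are the field names `p_value`, `fdr` and `beta` and the toy records.

```python
def filter_regions(regions, p_max=0.05, fdr_max=0.05, beta_min=0.2):
    """Keep regions that pass all three thresholds.

    regions: list of dicts with hypothetical keys "p_value", "fdr" and "beta".
    The strict comparisons are an assumption; the source only lists the cut-offs.
    """
    return [
        region
        for region in regions
        if region["p_value"] < p_max
        and region["fdr"] < fdr_max
        and region["beta"] > beta_min
    ]


# Invented records for illustration only.
toy_regions = [
    {"id": "region_1", "p_value": 0.001, "fdr": 0.01, "beta": 0.35},
    {"id": "region_2", "p_value": 0.030, "fdr": 0.20, "beta": 0.40},  # fails FDR
    {"id": "region_3", "p_value": 0.002, "fdr": 0.02, "beta": 0.10},  # fails beta
]
kept_ids = [region["id"] for region in filter_regions(toy_regions)]
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the region count is to each cut-off.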
  

- An Efficient Prediction of HPV Genotypes from Partial Coding Sequences by Chaos Game Representation and Fuzzy k-Nearest Neighbor Technique
  
  Authors: Watcharaporn Tanchotsrinon, Chidchanok Lursinsap and Yong Poovorawan
  
  https://doi.org/10.2174/1574893611666161110112006
  
  Background: Human Papillomavirus is considered a necessary cause of cervical cancer, which is the second most common cancer in women around the world. At present, individual genotyping of Human Papillomavirus can provide essential information for improving the diagnosis and medical treatment of infected patients. Objective: For this purpose, our paper focuses on predicting the significant Human Papillomavirus genotypes mainly associated with cervical cancers. Method: In this experiment, partial coding sequences of genotypes were transformed into coordinates in chaos game representations, which were subsequently partitioned into 8×8 equal sub-regions. Probabilities of distribution in sub-regions were extracted in the form of tri-nucleotide frequencies. Then, a two-fold cross-validation technique was employed for separating training and testing sets. For each fold, feature selection by the RReliefF algorithm was conducted to select significant features, followed by prediction of the corresponding genotypes by the fuzzy k-nearest neighbor technique. Results: The experimental results showed that our proposed method can achieve higher performance than two related methods, while the RReliefF algorithm can successfully reduce the 64 extracted features to 29 significant features. Additionally, we also found that our experimental results are significantly different from those of the method of Nair et al. in almost all genotypes. Conclusion: Therefore, the algorithm based on chaos game representation and the fuzzy k-nearest neighbor technique can efficiently predict Human Papillomavirus genotypes.
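
The feature extraction step above can be sketched with the standard chaos game representation (CGR): each base pulls the current point halfway toward its corner of the unit square, and the points are then binned into an 8×8 grid. The corner assignment, the starting point (0.5, 0.5) and the function names are assumptions, since the abstract does not state the conventions the authors used.

```python
from collections import Counter

# One common corner layout for CGR; the paper's layout may differ (assumption).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the CGR coordinates of a DNA sequence, skipping ambiguous bases."""
    x, y = 0.5, 0.5
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue
        corner_x, corner_y = CORNERS[base]
        x, y = (x + corner_x) / 2, (y + corner_y) / 2
        points.append((x, y))
    return points


def grid_frequencies(points, cells=8):
    """Bin CGR points into a cells x cells grid and return relative frequencies."""
    if not points:
        raise ValueError("no CGR points to bin")
    counts = Counter()
    for x, y in points:
        column = min(int(x * cells), cells - 1)
        row = min(int(y * cells), cells - 1)
        counts[(row, column)] += 1
    total = len(points)
    return [[counts[(r, c)] / total for c in range(cells)] for r in range(cells)]


toy_grid = grid_frequencies(cgr_points("AAAA"))
```

From the third base onward, the grid cell a point falls into is determined by the last three bases, which is why the 64 cells of an 8×8 partition correspond to tri-nucleotide frequencies. Flattening the grid gives the 64-element feature vector that a feature selector and classifier would consume.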
  

- Adaptive Genetic Algorithm with Exploration-Exploitation Tradeoff for Preprocessing Microarray Datasets
  
  Authors: Sivaraj Rajappan and DeviPriya Rangasamy
  
  https://doi.org/10.2174/1574893611666161118142801
  
  Background: Microarray gene expression datasets contain a huge volume of gene data to be used for cancer analysis but often suffer from the “curse of dimensionality” and “missing values”. These problems prevent analysts from extracting the right knowledge and often lead to unstable results. Objective: To address both of these issues, the paper proposes a novel algorithm based on the Genetic Algorithm (GA). Method: GA is commonly used for feature selection and for treating missing values in microarray datasets. However, it often converges prematurely because of insufficient exploration and exploitation. In the proposed Adaptive Genetic Algorithm (AGA), genetic parameters are dynamically determined based on the values in the current generation in order to improve the optimality of the solution. The population is divided into two sub-populations, and crossover and mutation are performed in parallel on these sub-populations in order to speed up execution and to give the population modularity for performing these operations. In this paper, the missing values are first imputed using AGA, and AGA is then used again to select significant features. Results: The proposed methodology is applied to different real microarray datasets to impute values at different missing proportions and to select prominent features. It is found that the datasets processed with AGA provide better results than the standard methods. Conclusion: AGA can be implemented successfully in all datasets where the number of features is large and missing values are present. AGA preprocesses the datasets and prepares them for better classification.
  

- Codon Usage of Expansin Genes in Populus trichocarpa
  
  Authors: Jian Li, Haoyang Li, Junkai Zhi, Chuzhao Shen, Xuesong Yang and Jichen Xu
  
  https://doi.org/10.2174/1574893611666161008195145
  
  Background: Expansin, which has a wall-loosening function, is present in all plant species. Genome sequencing showed that the poplar (Populus trichocarpa) genome contains 36 expansin genes belonging to four subfamilies. Objective: This report dissects the codon usage pattern of poplar expansin genes to help us understand their expression model in poplar and heterologous plants, and to suggest their appropriate use in future molecular breeding. Method: CodonW was used to analyze the base and amino acid composition, the second and third base arrangement of codons, and the codon usage frequency of each poplar expansin gene. Further statistical analysis of optimal codons and high-frequency codons was conducted for each subfamily and each domain of expansin. Results: The poplar expansin genes have a low GC and GC3 content, a high effective number of codons value (> 50), and few high-frequency codons (6 only). Statistical analysis revealed that each expansin subfamily and each domain has its own characteristics in terms of amino acid composition, high-frequency codon distribution, and codon base combination. In particular, subfamilies A, B, LA, and LB contained 6, 14, 13, and 11 high-frequency codons, while the signal peptide, catalytic domain, and binding domain contained 16, 8, and 10 high-frequency codons, respectively. Conclusion: The poplar expansin genes have low codon usage bias. Each subfamily and each domain of poplar expansin also displayed characteristic codon features.
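
Two of the quantities named in the Results, GC content and GC3 content (the GC fraction at third codon positions), can be computed directly from a coding sequence. The sketch below is a plain reimplementation for illustration and is not CodonW. The function names and the toy sequence are assumptions, and stop codons are not treated specially.

```python
def gc_content(coding_sequence):
    """Fraction of G or C bases in the whole sequence."""
    sequence = coding_sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in sequence) / len(sequence)


def gc3_content(coding_sequence):
    """Fraction of G or C bases at the third position of each complete codon."""
    sequence = coding_sequence.upper()
    # Every third base starting at index 2; a trailing partial codon is ignored.
    usable_length = len(sequence) - len(sequence) % 3
    third_bases = sequence[2:usable_length:3]
    if not third_bases:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Toy sequence with codons ATG, GCA, TTT (invented for the example).
toy_gc = gc_content("ATGGCATTT")
toy_gc3 = gc3_content("ATGGCATTT")
```

Comparing GC3 with overall GC is a common first look at codon bias, because third positions are the least constrained by the encoded amino acid.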
  

- Alterations in Structural and Biological Activities of Merozoite Surface Protein 2 Due to O-GlcNAc Modification: In Silico Approach
  
  Authors: Jawaria Munir, Zeeshan Iqbal, Wajahat M. Qazi, Daniel C. Hoessli, Zahid Mahmood and Nasir Uddin
  
  https://doi.org/10.2174/1574893612666170206112054
  
  Background: The complex life cycle of the malarial parasite limits the efficacy of vaccines based on a single protein. The optimal strategy to design a viable vaccine should consider a selection of antigens from different stages of the parasite’s life cycle. In addition to multi-stage complexity, a synthetic or recombinant vaccine may fail to elicit a suitable immune response because it lacks important elements of the native architecture, such as post-translational modifications. Objective: The Plasmodium falciparum Merozoite Surface Protein 2 (PfMSP2) is one of the suitable vaccine candidates, due to its presence on the merozoite surface after the invasion of erythrocytes. The humoral response against O-linked glycosylated PfMSP2 is expected to interfere with erythrocyte invasion and propagation of merozoites. Preventing parasite invasion of human erythrocytes can efficiently reduce the spread and development of this infectious agent. Method: PfMSP2 bears potential glycosylation sites, and the human erythrocytic O-linked N-acetylglucosamine transferase (OGT) could glycosylate PfMSP2 through combinatorial metabolism. This hypothesis was tested by generating binding models of PfMSP2 and human OGT complexed with UDP-GlcNAc via protein-protein docking. The docking experiment was followed by the binding of O-GlcNAc at the hydroxyl group of potential modification sites in PfMSP2. Results: The binding complex of PfMSP2 with human OGT shows the interaction between key residues and affirms the transfer of O-GlcNAc to the N-terminal domain of PfMSP2. The potential glycosylation site in PfMSP2 is Ser19, which is present in the conserved N-terminal domain. The glycosylated Ser19 exhibits a different orientation and induces structural alteration in the immunogenic region of PfMSP2. Conclusion: Our findings suggest that PfMSP2 shows potential for glycosylation at Ser19 and that this modified PfMSP2 may constitute a more appropriate antigen for developing a protective immune response against malaria.
  

- The Complexity of Promoter Regions Based on a Vector Topological Entropy
  
  Authors: Shuilin Jin, Zhuo Wang, Junyu Lin, Jia Wang, Xiurui Zhang, Renjie Tan, Chuanbin Zhang, Zhe Wang, Wanqian Guo, Yang Hu, Li Xu, Lejun Zhang, Guiyou Liu and Qinghua Jiang
  
  https://doi.org/10.2174/1574893611666160527101340
  
  Background: Entropy can be used to detect the complexity of sequences. Various concepts of entropy have appeared, such as metric entropy, Kolmogorov-Sinai entropy, Renyi entropy and topological entropy. Topological entropy is difficult to apply when deciphering the structure of DNA sequences, due to finite-dimensional problems. Method: Unlike the generalized topological entropy, a vector topological entropy is presented, which is based on the idea of multi-scale analysis of DNA sequences. Subsequently, the complexity of promoter regions between Chromosome X and Y is detected using a quantity topological entropy. Results: It is shown that the quantity topological entropy of promoters is lower than that of the coding regions in all the chromosomes. The mean topological entropy of coding regions is 3 standard deviations higher than the mean of promoters in the chromosomes. The results show that the quantity topological entropy of coding regions is significantly higher than that of promoters. Conclusion: Topological entropy is a useful tool for detecting the structure of DNA sequences, and the comparisons show that the promoter regions are more regular, which implies that the promoters are more functionally important.
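
For orientation, the sketch below computes a scalar topological entropy for a finite DNA sequence using a subword-complexity formulation: choose the largest word length n with 4^n + n - 1 ≤ |w|, count the distinct n-mers in the first 4^n + n - 1 bases, and take log base 4 of that count divided by n. This formulation is an assumption chosen for illustration. It is not the vector topological entropy the abstract proposes, whose definition is not given here.

```python
import math


def topological_entropy(sequence):
    """Scalar topological entropy of a finite DNA sequence (illustrative formulation).

    Assumes the sequence contains only A, C, G and T; other symbols are not filtered.
    """
    word = sequence.upper()
    # Largest n such that a window of 4**n + n - 1 bases fits in the sequence.
    n = 0
    while 4 ** (n + 1) + (n + 1) - 1 <= len(word):
        n += 1
    if n == 0:
        raise ValueError("sequence too short: at least 4 bases are needed")
    window = word[: 4 ** n + n - 1]
    # The window holds exactly 4**n overlapping n-mers, so the count is at most 4**n.
    distinct = {window[i : i + n] for i in range(len(window) - n + 1)}
    return math.log(len(distinct), 4) / n


low_complexity = topological_entropy("A" * 17)  # one distinct 2-mer
full_complexity = topological_entropy("ACGT")   # all four 1-mers present
```

Values lie between 0 (a single repeated word) and 1 (every possible word of length n appears), so a lower value indicates a more regular sequence.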
  

- Finger Base - An Algorithm to Predict the Incidence of Zinc Finger Motif from Uncharacterized Proteins
  
  Authors: Mohan Ajitha and Subramanian Arumugam
  
  https://doi.org/10.2174/1574893611666160728114256
  
  Background: One of the basic problems in the in silico approach is to identify zinc finger motifs in uncharacterized proteins. Existing algorithms such as Zif Base, Zifibi and ZiFiT can identify the presence of zinc finger motifs only in characterized proteins. Objective: This paper focuses on developing a solution to overcome this limitation and to identify zinc finger motifs in uncharacterized proteins. Method: The tool consists of two algorithms, PATTERN and FINGER. The PATTERN algorithm generates templates for all the characterized proteins that are available in various databases. The FINGER algorithm then compares the query sequence of an uncharacterized protein with the templates identified through the PATTERN algorithm. Results: If a template is present in the query sequence, the tool infers that the query sequence has a transcriptional role. Moreover, the veracity of the algorithm is validated by comparing its results with characterized data derived from experimental methods. Conclusion: The precision and recall of the algorithm were found to be 86% and 89%, respectively. Furthermore, this algorithm achieves higher accuracy than any other prevailing computational approach.
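
The template-matching idea behind the FINGER step, and the precision and recall figures in the Conclusion, can be illustrated with a small sketch. The single template used here is a simplified form of the classical C2H2 zinc finger spacing (C, 2 to 4 residues, C, 12 residues, H, 3 to 5 residues, H). It stands in for the templates that PATTERN would generate, which the abstract does not list. All names and the toy sequence are assumptions.

```python
import re

# Hypothetical template set; a real run would use templates derived from known proteins.
TEMPLATES = {"C2H2_simplified": re.compile(r"C.{2,4}C.{12}H.{3,5}H")}


def find_templates(sequence, templates=TEMPLATES):
    """Return (template name, start index, matched text) for every template hit."""
    hits = []
    for name, pattern in templates.items():
        for match in pattern.finditer(sequence.upper()):
            hits.append((name, match.start(), match.group()))
    return hits


def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Invented query: a C2H2-like spacing embedded in filler residues.
toy_query = "MK" + "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H" + "GG"
toy_hits = find_templates(toy_query)
```

A sequence with at least one hit would be flagged as a zinc finger candidate. Comparing such flags against experimentally characterized proteins yields the true positive, false positive and false negative counts that `precision_recall` expects.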
  

Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan