Current Bioinformatics - Volume 12, Issue 5, 2017
-
-
A Heterogeneous Networks Fusion Algorithm Based on Local Topological Information for Neurodegenerative Disease
Authors: Xue Jiang, Han Zhang, Xiongwen Quan and Yanbin Yin
Background: Predicting disease-related genes from gene networks helps reveal the interactions between genes under complex disease phenotypes. Gene co-expression networks usually contain numerous noisy connections, which make simulation results depart substantially from the real situation. Most research focuses on developing better similarity measures between genes to construct more accurate gene co-expression networks. However, with the emergence of various types of biological networks and the urgent needs of precision medicine, a single-source gene co-expression network can no longer meet the accuracy requirements of disease-related gene prediction.
Objective: We propose a heterogeneous networks fusion algorithm based on local topological information (HNFLTI) to reconstruct a disease-specific gene network. We also design a novel framework based on HNFLTI to identify disease-related genes.
Method: First, HNFLTI modifies the weight of each edge connecting two nodes in the gene co-expression network according to the topological similarity between the corresponding local sub-networks in the different source networks. Second, HNFLTI removes redundant connections in a filtration step, yielding the disease-specific gene network. Finally, we conduct label propagation on the disease-specific gene network to predict disease-related genes.
Results: Experimental results demonstrate that the prediction accuracy for disease-related genes is significantly improved when the disease-specific gene network is used instead of the gene co-expression network.
Conclusion: Because the molecular mechanisms of neurodegenerative disease are very complex, it is difficult to identify disease-related genes using traditional computational methods. We reconstruct a disease-specific gene network using HNFLTI to improve the prediction accuracy of disease-related genes and to conduct exploratory analysis of the molecular mechanism of the disease. The method might be one of the best choices when users want to obtain reliable interactions between genes under a complex disease phenotype.
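The final step of the method, label propagation, can be sketched as follows. The abstract does not state which propagation scheme or parameters were used, so this is only a minimal illustration assuming a symmetric-normalised update F <- alpha * S * F + (1 - alpha) * Y; the function name, the `alpha` value, and the toy network are all hypothetical.

```python
def propagate_labels(weights, seeds, alpha=0.8, iterations=100):
    """Spread seed scores over a weighted, undirected gene network.

    weights: symmetric adjacency matrix as a list of lists (hypothetical input).
    seeds:   initial scores Y, e.g. 1.0 for known disease genes, 0.0 otherwise.
    alpha:   assumed damping factor; not a value taken from the abstract.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Symmetric normalisation S = D^-1/2 W D^-1/2; isolated nodes keep zero rows.
    norm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if weights[i][j] and degree[i] > 0 and degree[j] > 0:
                norm[i][j] = weights[i][j] / (degree[i] * degree[j]) ** 0.5
    scores = list(seeds)
    for _ in range(iterations):
        # One propagation step: neighbours' scores plus a pull back to the seeds.
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * seeds[i]
            for i in range(n)
        ]
    return scores


# Toy path network 0 - 1 - 2 - 3 with gene 0 as the only known disease gene.
toy_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
toy_scores = propagate_labels(toy_weights, [1.0, 0.0, 0.0, 0.0])
```

In this toy case the scores fall off with distance from the seed gene, which is the behaviour one would rely on when ranking candidate genes.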
-
-
-
Content-Based Search on Time-Series Microarray Databases Using Cluster-Based Fingerprints
Authors: Esma Erguner Ozkoc and Hasan Ogul
Background: The rapid growth of gene expression databases has created a need for content-based searches as an alternative to unstructured database queries using keyword- or metadata-based searches. Content-based searching is the ability to retrieve all experiments with similar gene expression patterns in a database, regardless of the biological annotations provided for these experiments.
Objective: While this concept is still in its infancy in a general context, in this study we focus on applying it to a specific subset of gene expression datasets by querying only experiments involving time-series expression profiles.
Method: To this end, we propose a novel experiment fingerprinting scheme, obtained by clustering expression profiles, for content-based searching of time-series microarray experiments. To determine the retrieval ability of the proposed scheme, we performed a simulated information retrieval task on a large set of microarray experiments gathered from a public repository. The relevance between any two experiments was then defined using their commonalities based on annotated disease associations.
Results and Conclusion: The results showed that relevant experiments can be retrieved more successfully using this new method than with traditional differential expression-based methods.
-
-
-
Using a Machine-Learning Approach to Predict Discontinuous Antibody-Specific B-Cell Epitopes
Authors: Yiqi Lin, Xiaoping Min, Liangliang Li, Hai Yu, Shengxiang Ge, Jun Zhang and Ningshao Xia
Background: Predicting B-cell epitopes is important for understanding disease pathogenesis, identifying potential autoantigens, and designing vaccines and immune-based cancer therapies. The experimental approaches used for detecting B-cell epitopes are often laborious and resource-intensive. Thus, several computational methods have been developed for predicting the epitopes of a given antigen. However, most of these methods are coarse binary classifications of antigen regions into epitopes or non-epitopes and do not specify antibodies. Therefore, we aim to solve this antibody-specified epitope prediction problem using a developed structure-based computational machine-learning method, Epitopia, to reflect this biological reality accurately.
Results: We selected 60 non-redundant antibody-antigen protein complexes for training and applied leave-one-out cross-validation to test the accuracy of our proposed methods; we compared the results with Epitopia, Discotope 2.0, PEASE and the state-of-the-art tool SEPPA 2.0. We considered the role of both complementarity determining region residues and antigen surface residues in antigen-antibody interactions and assigned a score to each antigen surface residue. We considered a prediction successful if the average score of the “epitope residues” exceeded the average score over all surface residues. Under this criterion, the success rates of our proposed methods were all higher than that of Epitopia; the best was 83.3%, about 7 percentage points higher than the 76% obtained with Epitopia. The results show that antibody-specific methods are competitive with, and sometimes even better than, Epitopia, which considers only the antigen residues.
Conclusion: Our antibody-specific methods provide sufficient accuracy in locating epitopes, because whether an antigen surface residue is an epitope depends on whether the antibody recognizes it. This approach is a new method for identifying B-cell epitopes. Based on our results, we believe that the prediction of antibody-specific B-cell epitopes is more meaningful and efficient than methods that consider only antigen residues.
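The success criterion described in the results (mean score of the epitope residues compared with the mean over all surface residues) can be written down directly. The sketch below is only an illustration of that criterion; the residue identifiers, score values, and function names are hypothetical and are not taken from the paper.

```python
from statistics import mean


def prediction_succeeds(surface_scores, epitope_residues):
    """Return True if epitope residues score higher on average than the surface.

    surface_scores:   dict mapping every antigen surface residue to its score.
    epitope_residues: the residues known to form the epitope (a subset of the keys).
    """
    epitope_mean = mean(surface_scores[r] for r in epitope_residues)
    surface_mean = mean(surface_scores.values())
    return epitope_mean > surface_mean


def success_rate(cases):
    """Fraction of (surface_scores, epitope_residues) cases judged successful."""
    outcomes = [prediction_succeeds(scores, epitope) for scores, epitope in cases]
    return sum(outcomes) / len(outcomes)


# Hypothetical scores for four surface residues of one antigen.
example_scores = {"A10": 0.9, "A11": 0.8, "A12": 0.1, "A13": 0.2}
example_ok = prediction_succeeds(example_scores, ["A10", "A11"])
```

Because the criterion compares two averages, it rewards methods that rank true epitope residues above the surface background without requiring a fixed score threshold.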
-
-
-
Investigating Key Genes in Type 2 Diabetes Mellitus via Combining mAP-KL and Mutual Information Network
Authors: Guiyan Chen, Weihai Qiu, Shuze Xia and Lijuan Wang
Background: The molecular mechanism of type 2 diabetes mellitus (T2DM) remains unclear.
Objective: This research aimed to investigate key genes in T2DM by combining mAP-KL and a mutual information network (MIN), and to provide insights into the pathological mechanism underlying this disease.
Methods: First, gene expression profile data for T2DM were collected and preprocessed; then mAP-KL was applied to investigate clusters and exemplars in T2DM; next, a support vector machine (SVM) model was used to evaluate the classification performance of mAP-KL; finally, MIN construction and topological analysis were performed to investigate key genes.
Results: A total of 20,541 gene symbols were obtained from the expression profile of T2DM. By applying mAP-KL, 12 clusters were identified. From Cluster 1 to Cluster 12, the exemplars were OGT, TTC22, LIMCH1, NENF, ROMO1, RGL2, TCF7L1, KRTAP4-4, POLR2F, KIF22, NDUFB11, and AGL, respectively. The SVM evaluation indicated that the mAP-KL methodology was feasible and suitable for identifying exemplars of T2DM. Finally, MIN construction and topological analysis identified four hub genes (degree centrality ≥ 100): TCF7L1 (degree = 104), LIMCH1 (degree = 102), NENF (degree = 101), and TTC22 (degree = 101), which might be novel predictive and prognostic markers for T2DM.
Conclusion: We predict that these hub genes (such as TCF7L1 and LIMCH1) might play key roles in the occurrence and development of T2DM and are potentially novel predictive and prognostic markers for T2DM.
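The hub-gene step (keeping genes whose degree in the network meets a threshold) can be sketched as below. This is a minimal illustration assuming the network is available as an undirected edge list; the gene labels and the threshold in the toy example are hypothetical, whereas the abstract uses a threshold of 100 on the real network.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count distinct neighbours per gene in an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {gene: len(adjacent) for gene, adjacent in neighbours.items()}


def hub_genes(edges, min_degree):
    """Genes with degree >= min_degree, highest degree first (ties by name)."""
    degrees = degree_centrality(edges)
    hubs = [gene for gene, degree in degrees.items() if degree >= min_degree]
    return sorted(hubs, key=lambda gene: (-degrees[gene], gene))


# Toy network with placeholder gene names.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
toy_hubs = hub_genes(toy_edges, min_degree=2)
```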
-
-
-
Integration of DNA Methylation Data and Gene Expression Data for Prostate Adenocarcinoma: A Proof of Concept
Authors: Arpit Singh, Razia Rahman and Yasha Hasija
Background: Epigenetics is gaining rapid recognition because it accounts for heritable changes that do not involve changes in the coding sequence but do influence gene expression. DNA methylation is the most extensively studied epigenetic mechanism and has been observed to play a significant role in gene regulation and silencing.
Objective: In the present work, we focused on understanding the relationship between DNA methylation and gene expression. As a proof of concept, Prostate Adenocarcinoma (PRAD), the second leading cause of death in men, was extensively studied to unravel the epigenetic abnormalities associated with disease pathogenesis, which may contribute to better diagnosis and prevention of prostate cancer.
Method: DNA methylation data (level 1) and gene expression data (level 3) were taken from The Cancer Genome Atlas (TCGA). A total of 36 samples, comprising 18 normal samples and 18 tumor samples, were collected from a batch of 184 and matched with tumor samples and normal samples, respectively. The differentially methylated regions were identified, and statistical analysis was carried out on the gene expression data for the normal and tumor samples. Further, functional enrichment analysis and pathway analysis were carried out for the filtered genes.
Results: Our analysis indicated 453 differentially methylated regions with p-value < 0.05, FDR (false discovery rate) < 0.05 and beta value (methylation) > 0.2. The integration of gene expression data with methylation data resulted in 180 significant correlations, from which 112 genes were filtered under stringent conditions. Of these 112 genes, 74 were retained after visual inspection of the results, and their functional enrichment analysis yielded 27 clusters with a maximum enrichment score of ~1.86.
Conclusion: The genes "GSTP1" and "FGFR2" were present in our prioritized filtered significant correlations, and these genes are known to play a primary role in the prostate cancer pathway and progression. Therefore, this approach may help to prioritize other novel genes and suggest their involvement in the prostate cancer pathway.
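The filtering of differentially methylated regions by p-value, FDR and beta value can be sketched as follows. The abstract does not say how the FDR was computed or exactly how the beta threshold was applied, so this sketch assumes Benjamini-Hochberg adjustment and an absolute difference in mean beta between tumor and normal samples; the region names and numbers are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_regions(regions, alpha=0.05, min_delta_beta=0.2):
    """Keep regions passing the p-value, FDR and beta-difference cut-offs.

    regions: list of (name, p_value, delta_beta) tuples, where delta_beta is the
             assumed tumor-minus-normal difference in mean methylation.
    """
    q_values = benjamini_hochberg([p for _, p, _ in regions])
    return [
        name
        for (name, p, delta_beta), q in zip(regions, q_values)
        if p < alpha and q < alpha and abs(delta_beta) > min_delta_beta
    ]


# Hypothetical regions: (name, p-value, difference in mean beta).
toy_regions = [("r1", 0.001, 0.35), ("r2", 0.02, 0.10), ("r3", 0.04, -0.30), ("r4", 0.5, 0.40)]
toy_selected = select_regions(toy_regions)
```

In the toy data only `r1` survives: `r2` has too small a methylation difference, `r3` fails the adjusted cut-off, and `r4` is not significant.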
-
-
-
An Efficient Prediction of HPV Genotypes from Partial Coding Sequences by Chaos Game Representation and Fuzzy k-Nearest Neighbor Technique
Authors: Watcharaporn Tanchotsrinon, Chidchanok Lursinsap and Yong Poovorawan
Background: Human Papillomavirus is considered a necessary cause of cervical cancer, which is the second most common cancer in women around the world. At present, individual genotyping of Human Papillomavirus can provide essential information for improving the diagnosis and medical treatment of infected patients.
Objective: For this purpose, our paper focuses on predicting the significant Human Papillomavirus genotypes mainly associated with cervical cancers.
Method: In this experiment, partial coding sequences of genotypes were transformed into coordinates in chaos game representations, which were subsequently partitioned into 8×8 equal sub-regions. Probabilities of distribution in the sub-regions were extracted in the form of tri-nucleotide frequencies. Then, two-fold cross-validation was employed to separate training and testing sets. For each fold, feature selection by the RReliefF algorithm was conducted to select significant features, followed by prediction of the corresponding genotypes by the fuzzy k-nearest neighbor technique.
Results: The experimental results showed that our proposed method achieves higher performance than two related methods, while the RReliefF algorithm successfully reduces the 64 extracted features to 29 significant features. Additionally, our experimental results differ significantly from those of the method of Nair et al. for almost all genotypes.
Conclusion: Therefore, the algorithm based on chaos game representation and the fuzzy k-nearest neighbor technique can efficiently predict Human Papillomavirus genotypes.
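The feature-extraction step can be illustrated with a small sketch. It assumes the conventional chaos game representation on the unit square (start at the centre, move halfway towards the corner of each base) and an arbitrary corner assignment; the abstract does not specify these details, and the function names are hypothetical. With an 8×8 grid, each cell is determined by the last three bases, which is why the 64 cell frequencies correspond to tri-nucleotide frequencies.

```python
# Assumed corner assignment; other conventions only permute the grid cells.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
GRID = 8  # 8 x 8 sub-regions, i.e. 64 features


def cgr_grid_frequencies(sequence):
    """Return 64 cell frequencies of the CGR points of a DNA sequence.

    Assumes the sequence contains only A, C, G and T (other symbols raise KeyError).
    """
    counts = [0] * (GRID * GRID)
    x, y = 0.5, 0.5
    total = 0
    for step, base in enumerate(sequence.upper(), start=1):
        corner_x, corner_y = CORNERS[base]
        x, y = (x + corner_x) / 2, (y + corner_y) / 2
        if step >= 3:  # the cell is fixed by the last three bases
            col = min(int(x * GRID), GRID - 1)
            row = min(int(y * GRID), GRID - 1)
            counts[row * GRID + col] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * (GRID * GRID)


def trinucleotide_counts(sequence):
    """Direct overlapping tri-nucleotide counts, for cross-checking the grid."""
    seq = sequence.upper()
    counts = {}
    for i in range(len(seq) - 2):
        counts[seq[i:i + 3]] = counts.get(seq[i:i + 3], 0) + 1
    return counts


toy_sequence = "ACGTACGTTGCAACGT"
toy_features = cgr_grid_frequencies(toy_sequence)
```

The resulting 64-element vector is the kind of input that a feature selector and a nearest-neighbour classifier would then consume.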
-
-
-
Adaptive Genetic Algorithm with Exploration-Exploitation Tradeoff for Preprocessing Microarray Datasets
Authors: Sivaraj Rajappan and DeviPriya Rangasamy
Background: Microarray gene expression datasets contain a huge volume of gene data to be used for cancer analysis but often suffer from the “curse of dimensionality” and “missing values”. These problems prevent analysts from extracting the right knowledge and often lead to unstable results.
Objective: To address both issues, the paper proposes a novel algorithm based on the Genetic Algorithm (GA).
Method: GA is commonly used for feature selection and for treating missing values in microarray datasets, but it often converges prematurely because of insufficient exploration and exploitation. In the proposed Adaptive Genetic Algorithm (AGA), genetic parameters are determined dynamically from the values in the current generation in order to improve the optimality of the solution. The population is divided into two sub-populations, and crossover and mutation are performed in parallel on these sub-populations, both to speed up execution and to introduce modularity in the population for these operations. In this paper, the missing values are first imputed using AGA, and AGA is then used again to select significant features.
Results: The proposed methodology is applied to different real microarray datasets to impute values at different missing proportions and to select prominent features. The datasets processed with AGA provide better results than those processed with the standard methods.
Conclusion: AGA can be applied successfully to all datasets in which the number of features is large and missing values are present. AGA preprocesses the datasets and prepares them for better classification.
-
-
-
Codon Usage of Expansin Genes in Populus trichocarpa
Authors: Jian Li, Haoyang Li, Junkai Zhi, Chuzhao Shen, Xuesong Yang and Jichen Xu
Background: Expansins are wall-loosening proteins present in all plant species. Genome sequencing showed that the poplar (Populus trichocarpa) genome contains 36 expansin genes belonging to four subfamilies.
Objective: This report dissects the codon usage pattern of poplar expansin genes to help understand their expression patterns in poplar and in heterologous plants, and to suggest their appropriate use in future molecular breeding.
Method: CodonW was used to analyze the base and amino acid composition, the second and third base arrangement of codons, and the codon usage frequency of each poplar expansin gene. Further statistical analysis of optimal codons and high-frequency codons was conducted for each subfamily and each domain of expansin.
Results: The poplar expansin genes have low GC and GC3 content, a high effective number of codons (> 50), and few high-frequency codons (only 6). Statistical analysis revealed that each expansin subfamily and each domain has its own characteristics in terms of amino acid composition, high-frequency codon distribution, and codon base combination. In particular, subfamilies A, B, LA, and LB contained 6, 14, 13, and 11 high-frequency codons, while the signal peptide, catalytic domain, and binding domain contained 16, 8, and 10 high-frequency codons, respectively.
Conclusion: The poplar expansin genes have low codon usage bias. Each subfamily and each domain of poplar expansin also displays characteristic codon features.
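The basic quantities mentioned here (GC content, GC content at the third codon position, and codon counts) are simple to compute. The sketch below is a plain illustration and does not reproduce CodonW; in particular it computes GC3 over all third positions, whereas tools may restrict it to synonymous positions. The function names and the toy coding sequence are hypothetical.

```python
from collections import Counter


def gc_content(cds):
    """Fraction of G or C bases in a coding sequence."""
    seq = cds.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds):
    """Fraction of G or C at the third position of each complete codon."""
    third_positions = cds.upper()[2::3]
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def codon_counts(cds):
    """Count complete, in-frame codons starting from the first base."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


toy_cds = "ATGGCCAAATAA"  # codons: ATG GCC AAA TAA
toy_gc = gc_content(toy_cds)
toy_gc3 = gc3_content(toy_cds)
```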
-
-
-
Alterations in Structural and Biological Activities of Merozoite Surface Protein 2 Due to O-GlcNAc Modification: In Silico Approach
Authors: Jawaria Munir, Zeeshan Iqbal, Wajahat M. Qazi, Daniel C. Hoessli, Zahid Mahmood and Nasir Uddin
Background: The complex life cycle of the malarial parasite limits the efficacy of vaccines based on a single protein. The optimal strategy for designing a viable vaccine should consider a selection of antigens from different stages of the parasite’s life cycle. In addition to multi-stage complexity, a synthetic or recombinant vaccine may fail to elicit a suitable immune response because it lacks important elements of the native architecture, such as post-translational modifications.
Objective: The Plasmodium falciparum Merozoite Surface Protein 2 (PfMSP2) is one of the suitable vaccine candidates because of its presence on the merozoite surface after the invasion of erythrocytes. The humoral response against O-linked glycosylated PfMSP2 is expected to interfere with erythrocyte invasion and propagation of merozoites. Preventing parasite invasion of human erythrocytes can efficiently reduce the spread and development of this infectious agent.
Method: PfMSP2 bears potential glycosylation sites, and the human erythrocytic O-linked N-acetylglucosamine transferase (OGT) could glycosylate PfMSP2 through combinatorial metabolism. This hypothesis was tested by generating binding models of PfMSP2 and human OGT complexed with UDP-GlcNAc via protein-protein docking. The docking experiment was followed by the binding of O-GlcNAc at the hydroxyl group of potential modification sites in PfMSP2.
Results: The binding complex of PfMSP2 with human OGT shows the interaction between key residues and supports the transfer of O-GlcNAc to the N-terminal domain of PfMSP2. The potential glycosylation site in PfMSP2 is Ser19, which is located in the conserved N-terminal domain. The glycosylated Ser19 exhibits a different orientation and induces structural alteration in the immunogenic region of PfMSP2.
Conclusion: Our findings suggest that PfMSP2 has potential for glycosylation at Ser19 and that this modified PfMSP2 may constitute a more appropriate antigen for developing a protective immune response against malaria.
-
-
-
The Complexity of Promoter Regions Based on a Vector Topological Entropy
Authors: Shuilin Jin, Zhuo Wang, Junyu Lin, Jia Wang, Xiurui Zhang, Renjie Tan, Chuanbin Zhang, Zhe Wang, Wanqian Guo, Yang Hu, Li Xu, Lejun Zhang, Guiyou Liu and Qinghua Jiang
Background: Entropy can be used to detect the complexity of sequences. Various concepts of entropy have appeared, such as metric entropy, Kolmogorov-Sinai entropy, Renyi entropy and topological entropy. Topological entropy is difficult to apply when deciphering the structure of DNA sequences, owing to finite-dimensional problems.
Method: In contrast to the generalized topological entropy, a vector topological entropy is presented, based on the idea of multi-scale analysis of DNA sequences. Subsequently, the complexity of promoter regions between Chromosome X and Y is detected using a quantity topological entropy.
Results: The quantity topological entropy of promoters is shown to be lower than that of the coding regions in all the chromosomes. The mean topological entropy of promoters is 3 standard deviations lower than the mean of the coding regions in the chromosomes. The results show that the quantity topological entropy of coding regions is significantly higher than that of promoters.
Conclusion: Topological entropy is a useful tool for detecting the structure of DNA sequences, and the comparisons show that the promoter regions are more regular, which implies that the promoters are more functionally important.
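The abstract does not define the vector or quantity topological entropy, so the following is not the authors' measure. It is a sketch of one finite-sequence formulation of topological entropy for DNA, assumed here for illustration: choose the largest subword length n with 4^n + n - 1 <= |w|, take the first 4^n + n - 1 bases, and compute log_4 of the number of distinct length-n subwords divided by n. The function name is hypothetical.

```python
import math


def topological_entropy(sequence):
    """Finite-sequence topological entropy of a DNA string, in [0, 1].

    Assumed formulation: with n the largest integer such that
    4**n + n - 1 <= len(sequence), count the distinct length-n subwords in the
    first 4**n + n - 1 bases and return log_4(count) / n.
    """
    seq = sequence.upper()
    if len(seq) < 4:
        raise ValueError("need at least 4 bases")
    n = 1
    while 4 ** (n + 1) + (n + 1) - 1 <= len(seq):
        n += 1
    window = seq[: 4 ** n + n - 1]  # contains exactly 4**n subwords of length n
    distinct = {window[i:i + n] for i in range(len(window) - n + 1)}
    return math.log(len(distinct), 4) / n


entropy_mixed = topological_entropy("ACGT")
entropy_repeat = topological_entropy("A" * 17)
```

Under this formulation a highly repetitive sequence scores near 0 and a sequence that uses every possible subword scores 1, which matches the reading of lower entropy as "more regular".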
-
-
-
Finger Base - An Algorithm to Predict the Incidence of Zinc Finger Motif from Uncharacterized Proteins
Authors: Mohan Ajitha and Subramanian Arumugam
Background: One of the basic problems in the in silico approach is identifying zinc finger motifs in uncharacterized proteins. Existing algorithms such as Zif Base, Zifibi and ZiFiT can identify the presence of zinc finger motifs only in characterized proteins.
Objective: This paper focuses on developing a solution that overcomes this limitation and identifies zinc finger motifs in uncharacterized proteins.
Method: The tool consists of two algorithms, PATTERN and FINGER. The PATTERN algorithm generates templates for all the characterized proteins available in various databases. The FINGER algorithm then compares the query sequence of an uncharacterized protein with the templates identified by the PATTERN algorithm.
Results: If a template is present in the query sequence, the tool infers that the query sequence has a transcriptional role. Moreover, the veracity of the algorithm is validated by comparing its results with characterized data derived from experimental methods.
Conclusion: The precision and recall of the algorithm were estimated at 86% and 89%, respectively. Furthermore, this algorithm achieves higher accuracy than other prevailing computational approaches.
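The template-comparison idea and the two reported metrics can be illustrated with a small sketch. The abstract does not give the templates produced by PATTERN or the matching rule used by FINGER, so the single template below (a commonly cited C2H2-style consensus written as a regular expression) and all function names are assumptions for illustration only.

```python
import re

# Illustrative template only; the tool's real templates are not given in the abstract.
# Pattern: C, 2-4 residues, C, 3 residues, a hydrophobic residue, 8 residues,
# H, 3-5 residues, H.
TEMPLATES = {
    "C2H2-like": re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H"),
}


def find_templates(query):
    """Return (template name, start index, matched text) for each hit in a protein."""
    hits = []
    for name, pattern in TEMPLATES.items():
        for match in pattern.finditer(query.upper()):
            hits.append((name, match.start(), match.group()))
    return hits


def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return true_pos / (true_pos + false_pos), true_pos / (true_pos + false_neg)


# Synthetic query built to contain one motif-like stretch after a two-residue prefix.
toy_motif = "C" + "AA" + "C" + "AAA" + "F" + "A" * 8 + "H" + "AAA" + "H"
toy_hits = find_templates("MK" + toy_motif + "GG")
```

A hit list like `toy_hits` is what would be compared against experimentally characterized proteins to obtain the counts that feed `precision_recall`.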
