Current Bioinformatics - Volume 14, Issue 6, 2019
Computational Approaches to Predict the Non-canonical DNAs
Authors: Nazia Parveen, Amen Shamim, Seunghee Cho and Kyeong K. Kim
Background: Although most nucleotides in the genome form canonical double-stranded B-DNA, many repeated sequences transiently adopt non-canonical conformations (non-B DNA) such as triplexes, quadruplexes, Z-DNA, cruciforms, and slipped structures/hairpins. These non-canonical DNAs (ncDNAs) are not only associated with many genetic events such as replication, transcription, and recombination, but are also related to the genetic instability that results in predisposition to disease. Due to the crucial roles of ncDNAs in cellular and genetic functions, various computational methods have been implemented to predict sequence motifs that generate ncDNA. Objective: Here, we review strategies for the identification of ncDNA motifs across the whole genome, which is necessary for further understanding and investigation of the structure and function of ncDNAs. Conclusion: There is a great demand for computational prediction of non-canonical DNAs that play key functional roles in gene expression and genome biology. In this study, we review the currently available computational methods for predicting non-canonical DNAs in the genome. Current studies not only provide insight into the computational methods for predicting the secondary structures of DNA but also increase our understanding of the roles of non-canonical DNA in the genome.
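As a rough illustration of what motif-based ncDNA prediction can look like, the sketch below scans a sequence for putative G-quadruplex motifs with the widely used pattern of four runs of three or more guanines separated by loops of 1 to 7 nucleotides. The pattern choice, the loop-length limits, and the function name `find_g4_motifs` are assumptions made for illustration; the abstract does not say which tools or rules the review covers, and the sketch only scans the given strand.

```python
import re

# Assumed pattern: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (a common G-quadruplex heuristic).
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(seq):
    """Return (start, motif) pairs for non-overlapping matches on the given strand."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical test sequence, not taken from the review.
    print(find_g4_motifs("TTGGGAGGGTGGGAGGGTT"))
```

A real genome-wide scan would also need to search the reverse complement (C-rich runs) and handle overlapping candidates, which this sketch leaves out.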
A New Approach for Predicting the Value of Gene Expression: Two-way Collaborative Filtering
Authors: Tuncay Bayrak and Hasan Oğul
Background: Predicting the value of gene expression in a given condition is a challenging topic in computational systems biology. Only a limited number of studies in this area have provided solutions to predict the expression in a particular pattern, whether or not it can be done effectively. However, the value of expression for the measurement is usually needed for further meta-data analysis. Methods: The problem is treated as a regression task in which a feature representation of the gene under consideration is fed into a trained model to predict a continuous variable that refers to its exact expression level. We introduce a novel feature representation scheme for this task based on two-way collaborative filtering. Our main argument is that the expressions of other genes in the current condition are as important as the expression of the current gene in other conditions. For regression analysis, linear regression and a recently popularized method called the Relevance Vector Machine (RVM) are used. Pearson and Spearman correlation coefficients and Root Mean Squared Error are used for evaluation. The effects of regression model type, RVM kernel functions, and parameters have been analysed on a gene expression profiling dataset comprising a set of prostate cancer samples. Results: According to the findings of this study, in addition to promising results from the experimental studies, integrating data from another disease type, such as colon cancer in our case, can significantly improve the prediction performance of the regression model. Conclusion: The results also showed that the proposed feature representation approach and RVM regression model are promising for many machine learning problems in microarray and high-throughput sequencing analysis.
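The two-way idea described above can be sketched as a feature builder: for a target entry (gene, condition) in an expression matrix, the features combine the same gene's values in the other conditions with the other genes' values in the same condition. This is only one plausible reading of the abstract; the function name `two_way_features` and the plain concatenation of the two parts are assumptions, and the authors' actual scheme may differ.

```python
def two_way_features(matrix, gene, cond):
    """Build a feature vector and target for one (gene, condition) entry.

    matrix: list of rows, one row per gene, one column per condition.
    Returns (features, target), where the target is matrix[gene][cond].
    """
    # Expression of the current gene in all other conditions.
    same_gene = [v for j, v in enumerate(matrix[gene]) if j != cond]
    # Expression of all other genes in the current condition.
    same_cond = [row[cond] for i, row in enumerate(matrix) if i != gene]
    return same_gene + same_cond, matrix[gene][cond]


if __name__ == "__main__":
    # Hypothetical 3-gene x 3-condition matrix.
    expr = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(two_way_features(expr, gene=1, cond=2))
```

The resulting (features, target) pairs could then be passed to any regression model, such as the linear regression or RVM mentioned in the abstract.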
Performance Improvement of Gene Selection Methods using Outlier Modification Rule
Authors: Md. Shahjaman, Nishith Kumar and Md. N. H. Mollah
Background: DNA microarray technology allows researchers to measure the expression levels of thousands of genes simultaneously. The main objective of microarray gene expression (GE) data analysis is to detect biomarker genes that are Differentially Expressed (DE) between two or more experimental groups/conditions. Objective: There are some popular statistical methods in the literature for the selection of biomarker genes. However, most of them often produce misleading results in the presence of outliers. Therefore, in this study, we introduce a robust approach to overcome the problems of classical methods. Methods: We use the median and median absolute deviation (MAD) for our robust procedure. In this procedure, a gene is considered an outlying gene if at least one of its expression values falls outside a certain interval defined by the proposed outlier detection rule. Otherwise, the gene is considered a non-outlying gene. Results: We investigate the performance of the proposed method in comparison with traditional methods using both simulated and real gene expression data analysis. In a real colon cancer gene expression data analysis, the proposed method detected an additional fourteen (14) DE genes that were not detected by the traditional methods. Using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, we observed that these additional 14 DE genes are involved in three important metabolic pathways of cancer disease. The proposed method also detected nine (9) additional DE genes in another head-and-neck cancer gene expression data analysis; these are involved in the top ten metabolic pathways obtained from the KEGG pathway database. Conclusion: The results from the simulated as well as the real cancer gene expression datasets show better performance with our proposed procedure. Therefore, the additional genes detected by the proposed procedure require further wet lab validation.
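A minimal sketch of a median/MAD outlier rule of the kind described above follows. The abstract does not give the interval, so the form median ± k·MAD and the default k = 3 are assumptions, as are the function names; the unscaled MAD is used here for simplicity.

```python
from statistics import median


def mad(values):
    """Median absolute deviation (unscaled) of a list of numbers."""
    m = median(values)
    return median([abs(v - m) for v in values])


def is_outlying_gene(expressions, k=3.0):
    """Flag a gene if any expression value lies outside median +/- k * MAD.

    The interval and k = 3 are assumed for illustration, not taken from the paper.
    """
    m = median(expressions)
    spread = mad(expressions)
    lower, upper = m - k * spread, m + k * spread
    return any(v < lower or v > upper for v in expressions)


if __name__ == "__main__":
    # Hypothetical expression values for two genes.
    print(is_outlying_gene([5.0, 5.1, 4.9, 5.2, 5.0, 12.0]))
    print(is_outlying_gene([5.0, 5.1, 4.9, 5.2, 5.0]))
```

Note that when more than half of the values are identical the MAD is zero, so any differing value is flagged; a practical rule would need to handle that case explicitly.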
GMSA: A Data Sharing System for Multiple Sequence Alignment Across Multiple Users
Background: In recent years, the rapid growth of biological datasets in bioinformatics has made the computation of Multiple Sequence Alignment (MSA) extremely slow. Using the GPU to accelerate MSA has been shown to be an effective approach. Moreover, many bioinformatics researchers and institutes now set up a shared server so that remote users can submit MSA jobs via provided web pages or tools. Objective: Because different MSA jobs submitted by users often process similar datasets, users have an opportunity to share their computation results with each other, which avoids redundant computation and thereby reduces the overall computing time. Furthermore, on heterogeneous CPU/GPU platforms, many existing applications assign their computation to GPU devices only, which wastes CPU resources. Co-run computation can increase the utilization of computing resources on both CPUs and GPUs by dispatching workloads onto them simultaneously. Methods: In this paper, we propose an efficient MSA system called GMSA for multiple users on shared heterogeneous CPU/GPU platforms. To accelerate the computation of jobs from multiple users, GMSA uses data sharing, because different MSA jobs often have a percentage of the same data and tasks. Additionally, we propose a scheduling strategy based on the similarity in datasets or tasks between MSA jobs. Furthermore, a co-run computation model is adopted to make full use of both CPUs and GPUs. Results: We use four protein datasets that were redesigned according to different degrees of similarity. We compare GMSA with ClustalW and CUDA-ClustalW in multi-user scenarios. Experimental results showed that GMSA can achieve a speedup of up to 32X. Conclusion: GMSA is a system designed for accelerating the computation of MSA jobs with shared input datasets on heterogeneous CPU/GPU platforms. In this system, a strategy was proposed and implemented to find the common datasets among jobs submitted by multiple users, and a scheduling algorithm based on it is presented. To utilize the overall resources of both CPU and GPU, GMSA employs the co-run computation model. Results showed that it can speed up the total computation of jobs efficiently.
Combining Sequence Entropy and Subgraph Topology for Complex Prediction in Protein Protein Interaction (PPI) Network
Authors: Aisha Sikandar, Waqas Anwar and Misba Sikandar
Background: Complex prediction from protein interaction networks has become a challenging task. Most computational approaches focus on the topological structures of protein complexes, and fewer of them consider the important biological information contained within amino acid sequences. Objective: To capture the essence of the information contained within protein sequences, we have computed sequence entropy and length. Proteins interact with each other and form different subgraph topologies. Methods: We integrate biological features with subgraph topological features and model complexes by using a Logistic Model Tree. Results: The experimental results demonstrated that our method outperforms four other state-of-the-art computational methods in terms of the number of known protein complexes correctly detected. Conclusion: In addition, our framework provides insights for future biological study and might be helpful in predicting other types of subgraph topologies.
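The two sequence features named above, entropy and length, can be sketched as follows. The abstract does not define its entropy measure, so Shannon entropy over the amino acid composition (in bits) is an assumption here, as are the function names.

```python
from collections import Counter
from math import log2


def sequence_entropy(seq):
    """Shannon entropy (bits) of the residue composition of a sequence."""
    n = len(seq)
    counts = Counter(seq)
    # Sum of p * log2(1/p) over observed residues; an empty sequence gives 0.
    return sum((c / n) * log2(n / c) for c in counts.values())


def sequence_features(seq):
    """Return the (entropy, length) pair used as biological features in this sketch."""
    return sequence_entropy(seq), len(seq)


if __name__ == "__main__":
    # Hypothetical peptide, not taken from the paper.
    print(sequence_features("MKTAYIAKQR"))
```

In a pipeline like the one described, such per-protein features would be combined with subgraph topology features before being passed to a classifier.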
RMDB: An Integrated Database of Single-cytosine-resolution DNA Methylation in Oryza Sativa
Authors: Tiansheng Zhu, Jihong Guan, Hui Liu and Shuigeng Zhou
Background: Previous studies have revealed that DNA methylation plays a crucial role in eukaryotic growth and development via involvement in the regulation of gene expression and chromosomal instability. With the advancement of biotechnology, next-generation sequencing (NGS) is emerging as a popular method to explore the functions of DNA methylation, and an increasing number of genome-scale DNA methylation datasets have been published. Several DNA methylation databases, including MethDB, NGSmethDB and MENT, have been developed for storing and analyzing DNA methylation data. However, no public resource dedicated to DNA methylation of Oryza sativa is available to date. Methods & Results: We built a comprehensive database (RMDB) for integration and analysis of DNA methylation data of Oryza sativa. A couple of functional modules were developed to identify the connections between DNA methylation and phenotypes. Moreover, rich graphical visualization tools were employed to facilitate data presentation and interpretation. Conclusion: RMDB is an integrated database dedicated to rice DNA methylation. To the best of our knowledge, this is the first integrated rice DNA methylation database. We believe that RMDB will be helpful for understanding the epigenetic mechanisms of Oryza sativa. RMDB is freely available at http://admis.fudan.edu.cn/rmdb.
Proteome Mining for the Identification of Putative Drug Targets For Human Pathogen Clostridium Tetani
Authors: Anum Munir, Shaukat I. Malik and Khalid Akhtar Malik
Background: Clostridium tetani is a rod-shaped, anaerobic pathogenic bacterium of the genus Clostridium. It is Gram-positive and resembles a tennis racket or drumstick when stained. Tetanus is a neuromuscular disease in which the Clostridium tetani exotoxin produces muscle spasms in the host. Tetanus is the second leading cause of death worldwide among immunization-preventable diseases. Methods: In this research, subtractive proteome analysis of C. tetani was performed to identify putative drug targets. The proteins were subjected to BLAST analysis against Homo sapiens to exclude homologous proteins. The Database of Essential Genes was used to determine the essential proteins of the pathogen. These essential proteins were further analyzed to predict the corresponding metabolic pathways. Results: Cellular localization analysis was carried out to determine the possibility of the protein presence in the outer membrane. The study has recognized 29 essential genes and 20 unique pathways of 2314 proteins as potential drug targets. Of the 29 essential proteins, 3 membrane proteins were also identified as putative drug targets. Conclusion: Virtual screening against these proteins can be valuable in the identification of novel clinical compounds for C. tetani infections in Homo sapiens.
A Therapeutic Approach Against Leishmania donovani by Predicting RNAi Molecules Against the Surface Protein, gp63
Authors: Farhana T. Chowdhury, Mohammad U.S. Shohan, Tasmia Islam, Taisha T. Mimu and Parag Palit
Background: Leishmaniasis is a disease caused by Leishmania sp. and can be classified into two major types: cutaneous and visceral leishmaniasis. Visceral leishmaniasis is the deadlier type; it is mediated by Leishmania donovani, involves the establishment of persistent infection, and causes damage to the liver, spleen and bone marrow. With no vaccine yet available against leishmaniasis and the current therapeutic drugs being toxic and expensive, an alternative treatment is necessary. Objective: The surface glycocalyx protein gp63 plays a major role in the virulence and resulting pathogenicity associated with the disease. Hence, silencing the gp63 mRNA through the RNA interference system was the aim of this study. Methods: In this study, two competent siRNAs and three miRNAs were designed against gp63 for five different strains of L. donovani by using various computational methods. Target-specific siRNAs were designed using siDirect 2.0, and possible miRNAs were designed using another tool, IDT (Integrated DNA Technology). Screening for off-target similarity was done by BLAST, and the GC contents and the secondary structures of the designed RNAs were determined. RNA-RNA interaction was calculated by RNAcofold and IntraRNA, followed by the determination of heat capacity and the concentration of duplex by the DNAmelt web server. Results: The selected RNAi molecules, two siRNAs and three miRNAs, had no off-targets in the human genome, and the ones with lower GC content were selected for efficient RNAi function. The selected molecules showed proper thermodynamic characteristics to suppress the expression of the pathogenic gp63 gene.
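The GC-content step mentioned above amounts to a simple calculation, sketched below together with a ranking that puts lower-GC candidates first. The function names and the example sequences are hypothetical, and no GC cutoff is applied because the abstract does not state one.

```python
def gc_content(seq):
    """Fraction of G and C bases in an RNA or DNA sequence (0.0 to 1.0)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def rank_by_gc(candidates):
    """Sort candidate sequences from lowest to highest GC content."""
    return sorted(candidates, key=gc_content)


if __name__ == "__main__":
    # Hypothetical candidate sequences, not the designed siRNAs/miRNAs.
    for s in rank_by_gc(["GGCC", "AAUU", "GCAU"]):
        print(s, gc_content(s))
```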
Adaptive Elman Model of Gene Regulation Network Based on Time Series Data
Authors: Shengxian Cao, Yu Wang and Zhenhao Tang
Background: Time series expression data of genes contain relations among different genes, which are difficult to model precisely. Slime-forming bacteria are one of the three major harmful bacteria types in industrial circulating cooling water systems. Objective: This study aimed at constructing a gene regulation network (GRN) for slime-forming bacteria to understand the microbial fouling mechanism. Methods: For this purpose, an Adaptive Elman Neural Network (AENN) is proposed to reveal the relationships among genes using gene expression time series. The parameters of the Elman neural network were optimized adaptively by a Genetic Algorithm (GA). A Pearson correlation analysis is applied to discover the relationships among genes. In addition, gene expression data of slime-forming bacteria obtained by transcriptome gene sequencing are presented. Results: To evaluate our proposed method, we compared it with several alternative data-driven approaches, including a Neural Fuzzy Recurrent Network (NFRN), a basic Elman Neural Network (ENN), and an ensemble network. The experimental results on simulated and real datasets demonstrate that the proposed approach has a promising performance for modeling Gene Regulation Networks (GRNs). We also applied the proposed method to the GRN construction of slime-forming bacteria, and a GRN for 6 genes was constructed. Conclusion: The proposed GRN construction method can effectively extract the regulations among genes. This is also the first report to construct the GRN for slime-forming bacteria.
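The Pearson correlation step mentioned above can be sketched as a pairwise comparison of expression time series that keeps strongly correlated gene pairs as candidate relationships. The threshold of 0.9, the function names, and the example data are assumptions for illustration; the abstract does not describe how the correlation results feed into the AENN model.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series has no defined correlation; treat as 0 here
    return cov / (sx * sy)


def correlated_pairs(series, threshold=0.9):
    """Return (gene_a, gene_b, r) for pairs with |r| >= threshold (assumed cutoff)."""
    pairs = []
    for a, b in combinations(series, 2):
        r = pearson(series[a], series[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


if __name__ == "__main__":
    # Hypothetical time series for three genes.
    data = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 2, 4]}
    print(correlated_pairs(data))
```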