Current Bioinformatics - Volume 12, Issue 4, 2017
-
-
Metaheuristic Optimization for Parameter Estimation in Kinetic Models of Biological Systems - Recent Development and Future Direction
Background: Kinetic models with predictive ability are important in industrial biotechnology. However, the most challenging task in kinetic modeling is parameter estimation, which can be addressed using metaheuristic optimization methods. These methods are used to minimize the scalar distance between model output and experimental data. Owing to the highly nonlinear nature of biological systems and the large number of kinetic parameters, parameter estimation is difficult and time consuming. Methods: This paper reviews recent developments in parameter estimation methods, which have received increasing attention in the field of systems biology. The review focuses mainly on the development of metaheuristic optimization methods, along with the development of large-scale kinetic models. Results: Although a plethora of methods have been applied to the problem of parameter estimation, recent results show that most of the successful approaches are based on hybrid methods and parallel strategies. In addition, this review describes the current software used for parameter estimation and the sources of biological data for kinetic modeling. It also presents future directions in parameter estimation to meet current industrial demands, especially in systems biology applications. Conclusion: The development of numerous optimization methods for parameter estimation in kinetic models has brought much advancement in the application of systems biology. Currently, there appears to be high demand for further development of efficient optimization methods to address the expansion of systems biology applications.
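As an illustration of the idea of minimizing a scalar distance between model output and experimental data, the sketch below fits a single rate constant by stochastic local search. It is not taken from the review: the first-order decay model, the sum-of-squared-errors distance, the synthetic noise-free data and the function names (`model`, `sse`, `random_search`) are all assumptions chosen for brevity, and the search is a much simpler stand-in for the metaheuristics the review surveys.

```python
import math
import random


def model(k, times, x0=1.0):
    """Hypothetical one-parameter kinetic model: x(t) = x0 * exp(-k * t)."""
    return [x0 * math.exp(-k * t) for t in times]


def sse(k, times, data):
    """Scalar distance between model output and data (sum of squared errors)."""
    return sum((m - d) ** 2 for m, d in zip(model(k, times), data))


def random_search(times, data, k_init=1.0, step=0.5, iters=2000, seed=0):
    """Stochastic local search: accept a perturbed candidate only if it lowers the SSE."""
    rng = random.Random(seed)
    best_k, best_cost = k_init, sse(k_init, times, data)
    for _ in range(iters):
        # Perturb the current best estimate; abs() keeps the rate constant non-negative.
        cand = abs(best_k + rng.gauss(0.0, step))
        cost = sse(cand, times, data)
        if cost < best_cost:
            best_k, best_cost = cand, cost
    return best_k, best_cost


# Synthetic, noise-free "measurements" generated from an assumed true rate constant.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
true_k = 0.8
data = model(true_k, times)

k_hat, cost = random_search(times, data)
print(f"estimated k = {k_hat:.4f}, SSE = {cost:.2e}")
```

Realistic kinetic models have many parameters and multimodal cost surfaces, which is where population-based, hybrid and parallel strategies of the kind discussed in the review become relevant.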
-
-
-
Correlations Between Experimentally-Determined Melting Temperatures and GC-Content for Short DNA Strands
Authors: Dan Tulpan, Roberto Montemanni and Derek H. Smith
Background: The hybridization stability of single- and double-stranded DNA sequences has been studied extensively, and its impact on bio-computing, bio-sensing and bio-quantification technologies such as microarrays, real-time PCR and DNA sequencing is significant. In many bioinformatics applications, DNA duplex hybridization is traditionally estimated using GC-content and melting temperature calculations based on the sequence base composition. Objective: In this study we explore the equivalence of the two approaches when estimating DNA sequence hybridization, and we show that GC-content is a far from perfect predictor of DNA strand hybridization strength compared to experimentally determined melting temperatures. Method: To test the assumption that the GC-content of a DNA strand is a good indicator of its melting temperature, we formulate a research hypothesis and apply the Pearson product-moment correlation statistical model to measure the strength of the linear association between GC-content and melting temperature. Results: We built a manually curated set of 373 experimental data points collected from 21 publications, each point representing a DNA strand with a length between 4 and 35 nucleotides and its corresponding experimentally determined melting temperature measured under specific sequence and salt concentrations. For each data point we calculated the corresponding GC-content, and we separated the set into 12 subsets to minimize the variability of experimental conditions. Conclusion: Based on the calculated Pearson product-moment correlation coefficients, we conclude that GC-content only seldom correlates well with experimentally determined melting temperatures, and thus it is not a strictly necessary constraint when used to control the uniformity of DNA strands.
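The two quantities compared in this abstract can be computed in a few lines. The following is a minimal sketch, not the authors' code: the function names are assumptions, and the sequences and melting temperatures in the usage example are made-up placeholders rather than values from the curated data set.

```python
import math


def gc_content(seq):
    """Fraction of bases in a DNA strand that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder strands and melting temperatures (illustrative only, not measured data).
strands = ["ATGCATGC", "GGCCGGCC", "ATATATAT", "GCGCATAT"]
melting_temps_c = [30.0, 48.0, 18.0, 29.0]

gc_values = [gc_content(s) for s in strands]
print(gc_values, pearson_r(gc_values, melting_temps_c))
```

A coefficient near 1 or -1 would indicate a strong linear association, and a value near 0 a weak one. Computing the coefficient separately within subsets measured under similar conditions mirrors the subset analysis described in the abstract.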
-
-
-
HIV-1 Nucleotide Sequence Comprehensive Analysis: A Computational Approach
Background: Acquired Immunodeficiency Syndrome (AIDS) is a large-scale pandemic caused by infection with the Human Immunodeficiency Virus (HIV). This virus infects over 40 million people worldwide. In the search for pandemic control, many drug resistance tests have been performed, generating a large amount of genomic data. These data are stored in biological databases that grow on a daily basis. However, the majority of genomic data in the primary databases, e.g. GenBank, lacks important information regarding virus subtype distribution. Objective: To present a novel software tool to obtain, index and analyze highly mutational virus data, such as all HIV-1 sequence data from GenBank. Method: The software aligns all sequences containing a complete genome (HXB2) for mapping purposes. In addition, all sequences with subtype references are locally aligned to classify all data into genotypic niches. Results: Our results detail the prevalence of every subtype from a global HIV-1 sequence perspective, highlighting increases in the number of sequences related to recombinant subtypes. We were also able to identify the country-based distribution of sequences according to geographical data distribution. All data were analyzed on a reasonable timescale, particularly in comparison to classic methods. Conclusion: Our software represents an important contribution to HIV molecular epidemiology and offers a technique to rapidly classify new sequences, in addition to providing insight into sequence coverage density, subtype and country distribution. These data, together with cross-referencing, will aid in the generation of a novel, comprehensive and updated HIV-1 database.
-
-
-
Circadian Clock Gene of Grass Carp (Ctenopharyngodon idellus): Genomic Structure and Tissue Expression Pattern of Period1 Gene
Authors: Yuhui He, Xu-fang Liang, Shan He, Xiaochen Yuan, Qingchao Wang, Wenjing Cai and Longfang Sun
Background: Complex organisms require a sophisticated communication network to maintain circadian rhythmicity. The period (per) gene is an important circadian clock gene in vertebrates, playing important roles in several physiological processes, including locomotor activity, cell growth, reproduction, feeding behavior and hormonal secretion. However, little is known about the genomic structure and function of the per gene in fish. Objective: The present study characterized the genomic structure and tissue expression of the per1 gene in grass carp (Ctenopharyngodon idellus) for the first time. Method: The genomic structure of per1 was determined according to the NCBI and Ensembl databases. The expression level of per1 mRNA was evaluated by real-time PCR in different tissues of grass carp. Results and Conclusion: In grass carp, the obtained cDNA of per1a was 5003 bp and consisted of 20 exons and 19 introns, and that of per1b was 5083 bp and consisted of 19 exons and 18 introns. Multiple sequence alignment and phylogenetic analyses indicated the orthology of mammalian, amphibian, reptile and fish per1. However, different gene arrangements upstream and downstream of the per genes were found among grass carp, zebrafish, medaka, stickleback, mouse and human. Tissue expression detection revealed a wide distribution of per1a and per1b in grass carp. The protein structure conserved with mammals and the highest expression of grass carp per1b in the eye (p < 0.05) suggest that it is involved in rhythmic transcriptional regulation and tropism movement toward light in fish. These results could shed new light on the function and evolution of the per gene in teleosts.
-
-
-
Active Subnetwork GA: A Two Stage Genetic Algorithm Approach to Active Subnetwork Search
Authors: Ozan Ozisik, Burcu Bakir-Gungor, Banu Diri and Osman Ugur Sezerman
Background: A group of interconnected genes in a protein-protein interaction network that contains most of the disease-associated genes is called an active subnetwork. Active subnetwork search is an NP-hard problem. In the last decade, methods based on simulated annealing, greedy search, color coding, genetic algorithms and mathematical programming have been proposed for this problem. Method: In this study, we employed a novel genetic algorithm method for the active subnetwork search problem. We used an active node list chromosome representation, a branch swapping crossover operator, multicombination of branches in crossover, mutation on duplicate individuals, pruning, and a two-stage genetic algorithm approach. The proposed method was tested on simulated datasets and on the Wellcome Trust Case Control Consortium rheumatoid arthritis genome-wide association study dataset. Our results were compared with the results of a simple genetic algorithm implementation and with those of the simulated annealing method proposed by Ideker et al. in their seminal paper. Results and Conclusion: The comparative study demonstrates that our genetic algorithm approach outperforms the simple genetic algorithm implementation on all datasets, and simulated annealing on all but one dataset, in terms of obtained scores, although our method is slower. Functional enrichment results show that the presented approach can successfully extract high-scoring subnetworks in simulated datasets and identify significant rheumatoid arthritis associated subnetworks in the real dataset. This method can easily be used on datasets of other complex diseases to detect disease-related active subnetworks. Our implementation is freely available at https://www.ce.yildiz.edu.tr/personal/ozanoz/file/6611/ActSubGA.
-
-
-
A Fast Algorithm for Reconstructing Multiple Sequence Alignment and Phylogeny Simultaneously
Authors: Chi-Tim Ng, Chun Li and Xiaodan Fan
Background: There is an increasing need to routinely and quickly compare multiple sequences of, for example, bird flu virus genomes to infer their evolutionary relationship. This entails a fast simultaneous inference of both sequence alignment and phylogeny. Current methods cannot meet the speed requirement, although they maintain high phylogeny accuracy in such scenarios. Objective: We propose a Fast Algorithm for constructing Multiple sequence Alignment and Phylogeny (FAMAP) from closely related DNA sequences. Method: FAMAP is essentially a sequentially-inputting algorithm and can be implemented in a progressive fashion, i.e., adding a new sequence into an existing tree or multiple sequence alignment. Its time complexity is O[NP(L)] + O(NG) and its space complexity is O(N) + O(G) + O[Q(L)], where N is the number of sequences, G is the number of mutations on the phylogeny, L is the maximum length of the sequences, and P(L) and Q(L) are the time and space complexity of aligning a pair of sequences of length L, depending on the pairwise alignment algorithm employed. Results: Intensive simulation studies show that our method is superior in terms of speed to other popular methods and has comparable accuracy in both the multiple sequence alignment and the phylogeny. Conclusion: Our new algorithm might be one of the best choices when the user wants to quickly obtain a reliable phylogeny estimate from dozens of closely related long sequences.
-
-
-
Reverse Vaccinology to Computationally Screen Antigenic Epitopes as Potential Vaccine Candidates from Clostridium botulinum Strain Hall A
Authors: Mehak Dangi, Bharat Singh and Anil Kumar Chhillar
Background: Clostridium botulinum, which can produce botulinum toxin (a potent bioterrorism agent), causes the fatal food-borne disease botulism, a major threat to public health globally. Most of the treatments and vaccines available for Clostridium botulinum are not effective enough to cure these infections. Therefore, there is an urgent need to look for more potential vaccine candidates by employing quick and cost-effective methods. In past decades, enormous effort has been put into the development of computational tools, including reverse vaccinology, which have been used successfully for landmark discoveries in relevant scientific fields. Objective: The present article deals with the use of in silico reverse vaccinology approaches to identify antigenic epitopes/peptides of the Clostridium botulinum Hall A strain from its sequenced genome and to consider them as potential vaccine candidates. Method and Results: After screening the whole proteome on merit, epitopes belonging to the top five proteins, with accession numbers YP_001387668.1 (NlpC/P60 family protein), YP_001386461.1 (NlpC/P60 family protein), YP_001388839.1 (N-acetylmuramoyl-L-alanine amidase), YP_001387894.1 (hypothetical protein CLC_2047) and YP_001386160.1 (spore cortex-lytic enzyme), were selected for further computational characterization. These proteins were predicted to be surface exposed, nonallergic, essential, virulent and antigenic in nature. The binding efficiencies of these epitopes with HLA-A*0201 were also visualized using docking tools. Conclusion: The predicted epitopes may be good candidates for synthesizing a peptide vaccine that can elicit an efficient immune response in the host against the pathogen. Detailed characterization of such pathogen-specific molecules as vaccine candidates (immuno-protective agents) would be beneficial for designing better treatment strategies against botulism.
-
-
-
ECMSRC: A Sparse Learning Approach for the Prediction of Extracellular Matrix Proteins
Authors: Imran Naseem, Shujaat Khan, Roberto Togneri and Mohammed Bennamoun
Background: The extracellular matrix (ECM) is a dynamic, physiologically active component of all living tissues. It plays a vital role in the functionality of living tissues. Mutations in ECM genes have been shown to cause several diseases, including cancer. A reliable prediction of the ECM is therefore of prognostic significance. Objective: Since the ECM proteins are closely related to secretory proteins, a number of researchers have investigated the secretory proteins to explore the extensive properties of the ECM, but only a few of them focus on the classification of ECM and non-ECM proteins. In this research we propose a novel approach for the prediction of ECM proteins from protein sequences. Method: Essentially, the most discriminant features are selected by maximizing the class relevance and minimizing the redundancy (mRMR) in an information-theoretic sense. The sparsity of these discriminant features is harnessed to employ sparse representation classification (SRC) for the prediction of ECM proteins. Results: The proposed algorithm achieves a test accuracy of 81.06% on a standard dataset, which is superior to the EcmPred approach. For the prediction of experimentally verified ECM proteins from humans, we report a verification accuracy of 80%, which outperforms the EcmPred approach by a margin of 5%. Conclusion: ECMSRC outperforms the EcmPred method in test accuracy and Youden's index. Notably, it uses fewer features than the EcmPred method (40 features) to achieve this superior performance. The MATLAB implementation of ECMSRC is available at http://sp.gsse.pafkiet.edu.pk/downloads.
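Youden's index, one of the two measures on which the methods are compared, is defined as sensitivity plus specificity minus one. The sketch below computes it from confusion-matrix counts; the function name and the counts in the usage example are hypothetical and are not taken from the paper.

```python
def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1.0


# Hypothetical counts for an ECM (positive) versus non-ECM (negative) classifier.
j = youden_index(tp=80, fn=20, tn=90, fp=10)
print(round(j, 3))
```

The index ranges from -1 to 1. A value of 0 corresponds to a classifier that is no better than chance, and a value of 1 to perfect separation of the two classes.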
-
-
-
Genomic Islands of Mannheimia haemolytica – In Silico Analysis
Authors: Relangi Tulasi Rao and Kannan Jayakumar
Background: Genomic Islands (GIs) are commonly believed to be relics of horizontal transfer and are associated with specific metabolic capacities, including the virulence of the strain. Mannheimia haemolytica is a commensal, Gram-negative bacterium found in the upper respiratory tract of livestock. Under stress or immune-compromised conditions of the host, the bacterium is known to cause disease. Aim: This study aimed to predict Genomic Islands in M. haemolytica genomes and the virulence factors associated with these islands. Further, we aimed to trace the donors of these horizontally transferred islands. Methodology: Genomic Islands were predicted with the Ensemble algorithm for Genomic Island Detection (EGID) tool, and data analysis and visualization were done with self-written shell scripts. The putative donors of the horizontally transferred islands were predicted with the aid of a framework developed in our laboratory. Results and Conclusion: The study identified distinct regions of the GIs of M. haemolytica M42548 that are shared with the other strains. A cluster consisting of six ORFs of M. haemolytica M42548 was common to all strains. About 22% of the ORFs in the predicted GIs of M. haemolytica M42548 were unique to that strain. Data mining for the association of virulence factors with GIs suggested that horizontal transfer played an important role in the acquisition of virulence factors such as adhesins, outer membrane proteins (ompA) and Type IV secretion systems. Acquired antibacterial resistance genes were also found in the GIs of the M. haemolytica M42548 strain. Insights from the GIs provide evidence for the recent evolution of M. haemolytica M42548, the most virulent strain in the study. Our donor prediction framework revealed that Haemophilus sp. is the major donor of GI regions. Further GI knockout studies are needed to evaluate the role of GIs in transforming a commensal into a virulent strain.
-
-
-
On the Temporal Effects of Features on the Prediction of Breast Cancer Survivability
Authors: Doaa M. Shawky and Ahmed F. Seddik
Objectives: Breast cancer is the uncontrolled growth of breast cells. It is the second leading cause of cancer deaths in women worldwide. Thus, predicting the survivability of breast cancer is of great importance. The goal of this paper is to study the temporal effects of some features that describe breast cancer on the prediction of survivability. Methods: In the present study, several artificial intelligence (AI)-based approaches are implemented for predicting the survivability of breast cancer based on the rate of change of variables in the patient’s record. These values are used as features that characterize each patient and are employed in the learning process instead of the original variables. Four prediction models are built and compared using a proposed set of features. The models include artificial neural networks (ANN), K-nearest neighbors (KNN), support vector machines (SVM), and logistic regression (LR). The approach is applied to a large publicly available dataset of breast cancers. For each model, the performance measures obtained when the model is built using the proposed features are compared against those obtained when the original values are used instead. Results: For all four models, improved prediction accuracy was obtained when the rate of change of variables, rather than the raw values, was used. ANN yielded the best prediction accuracy, sensitivity, and specificity. The LR model yielded the worst performance measures. Conclusion: The results of this study provide a foundation for future research on medical decision making, where the temporal effect of patients’ data should be taken into consideration when an AI-based system is used.
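One simple way to turn a variable recorded over time into rate-of-change features is a finite difference between consecutive records. The sketch below is only an assumption about what such a feature could look like; the abstract does not specify the authors' exact definition, and the function name, variable and values are hypothetical.

```python
def rates_of_change(times, values):
    """Finite-difference rate of change between consecutive records of one variable."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    rates = []
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("record times must be strictly increasing")
        rates.append((values[i] - values[i - 1]) / dt)
    return rates


# Hypothetical record: a measured quantity at three visits (time in months).
visit_months = [0, 1, 3]
measurements = [10.0, 12.0, 18.0]
features = rates_of_change(visit_months, measurements)
print(features)
```

The resulting list would replace the raw measurements as the input features of a classifier such as those compared in the study.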
