Current Bioinformatics - Volume 14, Issue 8, 2019
-
Relevance of Machine Learning Techniques and Various Protein Features in Protein Fold Classification: A Review
Authors: Komal Patil and Usha Chouhan
Background: Protein fold prediction is a fundamental step in structural bioinformatics. The tertiary structure of a protein determines its function, and fold prediction plays an important role in predicting that tertiary structure. A protein fold is simply the arrangement of the secondary structure elements relative to each other in space. Many studies have been carried out to date by research groups worldwide, using combinations of different benchmark datasets, descriptors, features, and classification techniques. Objective: In this study, we bring these contributions together, analyze them, and compare the techniques they use. Methods: Different features are derived from the protein sequence, its secondary structure, physicochemical properties of amino acids, domain composition, the Position-Specific Scoring Matrix, profiles, and threading techniques. Conclusion: Combining these different features can improve classification accuracy to a large extent. This survey helps identify the most suitable feature/attribute set and classification technique for the multi-class protein fold classification problem.
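As a small illustration of a sequence-derived feature, the sketch below computes the amino acid composition of a protein sequence, one of the simplest descriptors of this kind. It is not taken from the review itself: the function name `amino_acid_composition`, the 20-letter alphabet ordering, and the example sequence are assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard amino acid.

    Non-standard characters (e.g. 'X' or gaps) are ignored; this is an
    assumption of the sketch, not a rule from the review.
    """
    seq = sequence.upper()
    counts = Counter(res for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical 10-residue sequence used only to show the call.
features = amino_acid_composition("MKVLAAGIVK")
```

A vector like `features` could be concatenated with other descriptors (secondary structure, physicochemical properties, and so on) before being passed to a classifier.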
-
-
-
A New Comprehensive Index for Evaluating the Quality of Infant Formula under the Framework of Chinese Food Standards
Authors: Ming Zhang, Li Zhang and Hongsheng Liu
Objective: We proposed a new comprehensive index, the food quality index (FQI), to effectively evaluate food quality. Methods: The food quality index is based on chemical and biological indicators of the Chinese food standards framework. We evaluated the limit value regulations for infant formula standards and then established a comprehensive index and grading standard. Finally, we applied the index to evaluate data obtained from the Shenyang Product Quality Supervision and Inspection Institute. Results: The results showed that the quality of older infant and young children formula (OF) was good, and that of infant formula (IF) was acceptable. Conclusion: The quality of OF was higher than that of IF, and the difference was significant (p-value < 0.01). The most influential factor for both IF and OF was minerals, followed by major components, optional ingredients, and vitamins; pathogenic bacteria had no effect. The number of main influencing single indicators was 36 in IF and 20 in OF. Statistical analysis showed that index values of vitamins in the two kinds of milk powders were significantly different, with a p-value < 0.01. Optional ingredients were significantly correlated, with a p-value < 0.05.
-
-
-
Predicting Drug Side Effects with Compact Integration of Heterogeneous Networks
Authors: Xian Zhao, Lei Chen, Zi-Han Guo and Tao Liu
Background: The side effects of drugs are not only harmful to humans but also a major reason for withdrawing approved drugs, which brings greater risks for pharmaceutical companies. However, detecting the side effects of a given drug via traditional experiments is time-consuming and expensive. In recent years, several computational methods have been proposed to predict the side effects of drugs. However, most of these methods cannot effectively integrate the heterogeneous properties of drugs. Methods: In this study, we adopted a network embedding method, Mashup, to extract essential and informative drug features from several heterogeneous drug networks representing different properties of drugs. A network was also built for side effects, from which side effect features were extracted. These features capture essential information about drugs and side effects at the network level. Drug and side effect features were combined to represent each drug and side effect pair, which was treated as a sample in this study. These samples were then fed into a random forest (RF) algorithm to construct the prediction model, called the RF network model. Results: The RF network model was evaluated by several tests. The average Matthews correlation coefficients on the balanced and unbalanced datasets were 0.640 and 0.641, respectively. Conclusion: The RF network model was superior to models incorporating other machine learning algorithms and to one previous model. Finally, we also investigated the influence of two feature dimension parameters on the RF network model and found that our model was not very sensitive to these parameters.
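For readers unfamiliar with the evaluation metric reported above, the following sketch shows how a Matthews correlation coefficient (MCC) is computed from a binary confusion matrix. The function name and the example counts are illustrative assumptions; they are not data from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    adopted here as an assumption).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, chosen only to demonstrate the call.
mcc = matthews_corrcoef(tp=90, tn=80, fp=20, fn=10)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it stays informative on unbalanced datasets, which is presumably why it is reported for both dataset types.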
-
-
-
The “Gene Cube”: A Novel Approach to Three-dimensional Clustering of Gene Expression Data
Authors: George I. Lambrou, Maria Sdraka and Dimitrios Koutsouris
Background: A very popular technique for isolating significant genes from cancerous tissues is the application of various clustering algorithms to data obtained from DNA microarray experiments. Aim: The objective of the present work is to take into consideration the chromosomal identity of every gene before clustering, by creating a three-dimensional structure of the form Chromosomes×Genes×Samples. The k-means algorithm and a triclustering technique called δ-TRIMAX are then applied independently to the structure. Materials and Methods: The present algorithm was developed using the Python programming language (v. 3.5.1). For this work, we used two distinct public datasets containing healthy control samples and tissue samples from bladder cancer patients. Background correction was performed by subtracting the median global background from the median local background from the signal intensity. The quantile normalization method was applied for sample normalization. Three known algorithms were applied for testing the “gene cube”: a classical k-means, a transformed 3D k-means and δ-TRIMAX. Results: Our proposed data structure consists of a 3D matrix of the form Chromosomes×Genes×Samples. Clustering analysis of that structure produced very good results, as we were able to identify gene expression patterns among samples, genes and chromosomes. Discussion: To the best of our knowledge, this is the first time that such a structure has been reported, and it constitutes a useful tool for gene classification from high-throughput gene expression experiments. Conclusions: Such approaches could prove useful for understanding disease mechanisms, and tumors in particular.
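The quantile normalization step mentioned in the methods can be sketched as follows for a single Genes×Samples matrix. This is a minimal pure-Python illustration of the standard technique, not the authors' code: the function name, the list-of-rows layout, and the tie handling (ties are broken by row order rather than averaged) are assumptions.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a Genes x Samples matrix given as a list of rows.

    Every sample (column) is forced onto the same distribution: the mean
    of the sorted columns at each rank. Ties are broken by row order,
    which is a simplification of this sketch.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # Sort each sample (column) independently.
    sorted_cols = [sorted(row[j] for row in matrix) for j in range(n_cols)]
    # Reference distribution: mean across samples at each rank.
    rank_means = [
        sum(col[r] for col in sorted_cols) / n_cols for r in range(n_rows)
    ]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        # Row indices of column j ordered from smallest to largest value.
        order = sorted(range(n_rows), key=lambda i: matrix[i][j])
        for rank, i in enumerate(order):
            normalized[i][j] = rank_means[rank]
    return normalized


# Hypothetical intensities: 4 genes (rows) x 3 samples (columns).
raw = [[5, 4, 3], [2, 1, 4], [3, 6, 6], [4, 2, 8]]
normalized = quantile_normalize(raw)
```

After normalization, every column contains the same set of values, so samples become directly comparable before any clustering is applied.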
-
-
-
Estimating Bifurcating Consensus Phylogenetic Trees Using Evolutionary Imperialist Competitive Algorithm
Authors: Vageehe Nikkhah, Seyed M. Babamir and Seyed S. Arab
Background: One of the important goals of phylogenetic studies is the estimation of species-level phylogeny. A phylogenetic tree is an evolutionary classification of different species. There are several methods for generating such trees, and each method may produce a number of different trees for the same species. Even when the same proteins are chosen for all species, the topology and arrangement of the trees may differ. Objective: Biologists use consensus methods to summarize different phylogenetic trees into a single tree, called a consensus tree. A consensus method combines gene trees to estimate a species tree. As phylogenetic trees grow and their number increases, estimating a consensus tree from the species-level phylogenetic trees becomes a challenge. Methods: The current study uses the Imperialist Competitive Algorithm (ICA) to estimate bifurcating consensus trees. Evolutionary algorithms like ICA are suitable for problems with a large space of candidate solutions. Results: The obtained consensus tree is more similar to the native phylogenetic tree than those of related studies. Conclusion: The proposed method provides mechanisms and policies that allow more tuning than other evolutionary algorithms. Thanks to these policies and mechanisms, the algorithm was efficient in obtaining the optimum consensus tree. The algorithm increased the possibility of selecting an optimum solution by imposing some changes in its parameters.
-
-
-
VirDB: Crowdsourced Database for Evaluation of Dynamical Viral Infection Models
Background: Open science is an emerging movement underlining the importance of transparent, high-quality research whose results can be verified and reused by others. However, one of the biggest problems in replicating experiments is the lack of access to the data used by the authors. This problem also occurs in the mathematical modeling of viral infections, a process that can provide valuable insights into viral activity or into a drug’s mechanism of action when conducted correctly. Objective: We present the VirDB database (virdb.cs.put.poznan.pl), which has two primary objectives. First, it is a tool for collecting data on viral infections that could be used to develop new dynamic models of infections, following the FAIR data sharing principles. Second, it allows storing references to descriptions of viral infection models, together with their evaluation results. Methods: To facilitate fast population of the database and easy exchange of scientific data, we decided to use crowdsourcing for collecting data. Such an approach has already proved very successful in projects such as Wikipedia. Conclusion: VirDB builds on the concepts and recommendations of Open Science and shares data using the FAIR principles. Thanks to this, the data required for designing and evaluating models of viral infections can be stored and made freely available on the Internet.
-
-
-
HS-MMGKG: A Fast Multi-objective Harmony Search Algorithm for Two-locus Model Detection in GWAS
Authors: Liyan Sun, Guixia Liu, Lingtao Su and Rongquan Wang
Background: Genome-Wide Association Study (GWAS) plays a very important role in identifying the causes of a disease. Because most of the existing methods for genetic-interaction detection in GWAS are designed for a single-correlation model, their performance varies considerably across disease models. These methods usually have high computational cost and low accuracy. Methods: We present a new multi-objective heuristic optimization methodology named HS-MMGKG for detecting genetic interactions. In HS-MMGKG, we use harmony search with five objective functions to improve efficiency and accuracy. A new strategy based on p-value and MDR is adopted to generate more reasonable results. The Boolean representation in BOOST is modified to calculate the five functions rapidly. These strategies reduce time complexity and achieve higher accuracy when detecting potential models. Results: We compared HS-MMGKG with CSE, MACOED and FHSA-SED using 26 simulated datasets. The experimental results demonstrate that our method outperforms the others in accuracy and computation time. Our method identified many two-locus SNP combinations that are associated with seven diseases in the WTCCC dataset. Some of the SNPs have direct evidence in the CTD database. The results may help to further explain the pathogenesis. Conclusion: It is anticipated that our proposed algorithm could be used in GWAS, which is helpful in understanding disease mechanisms, diagnosis and prognosis.
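To give a feel for the underlying heuristic, the sketch below implements a basic single-objective, continuous harmony search. It is deliberately not HS-MMGKG: the method described above uses five objective functions and searches over SNP combinations, whereas this example minimizes one toy function. All names and parameter values (`hms`, `hmcr`, `par`, `bw`, the sphere objective) are illustrative assumptions.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.1,
                   iterations=2000, seed=0):
    """Minimize `objective` over box `bounds` with a basic harmony search.

    hms: harmony memory size; hmcr: memory consideration rate;
    par: pitch adjustment rate; bw: pitch adjustment bandwidth.
    """
    rng = random.Random(seed)
    # Initialize the harmony memory with random candidate solutions.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Reuse a value from memory, optionally nudging it.
                value = rng.choice(memory)[d]
                if rng.random() < par:
                    value += rng.uniform(-bw, bw)
            else:
                # Otherwise draw a fresh random value.
                value = rng.uniform(lo, hi)
            new.append(min(max(value, lo), hi))
        new_score = objective(new)
        # Replace the worst stored harmony if the new one is better.
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


solution, score = harmony_search(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

A multi-objective variant would replace the single score comparison with a rule over several objectives (for example, Pareto dominance), and a GWAS application would encode candidate SNP pairs instead of real-valued vectors.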
-
-
-
A New Model of Identifying Differentially Expressed Genes via Weighted Network Analysis Based on Dimensionality Reduction Method
Authors: Mi-Xiao Hou, Jin-Xing Liu, Ying-Lian Gao, Junliang Shang, Sha-Sha Wu and Sha-Sha Yuan
Background: As a method for identifying Differentially Expressed Genes (DEGs), Non-Negative Matrix Factorization (NMF) has been widely praised in bioinformatics. Although NMF makes DEGs easy to identify, it cannot provide further associated information about these DEGs. Objective: Network analysis methods can be used to analyze the correlation of genes, but they introduce more data redundancy and great complexity in high-dimensional gene association analysis. Dimensionality reduction is worth considering in this situation. Methods: In this paper, we provide a new framework that combines the merits of the two: NMF is applied to select DEGs for dimensionality reduction, and then Weighted Gene Co-Expression Network Analysis (WGCNA) is introduced to cluster the DEGs into modules of similar function. The combination of NMF and WGCNA as a novel model accomplishes the analysis of DEGs for cholangiocarcinoma (CHOL). Results: Some hub genes among the DEGs are highlighted in the co-expression network. Candidate pathways and genes are also discovered in the module most relevant to CHOL. Conclusion: The experiments indicate that our framework is effective, and the work also provides some useful clues for research on CHOL.
-
-
-
Transcriptional Regulation Analysis of Alzheimer's Disease Based on FastNCA Algorithm
Authors: Qianni Sun, Wei Kong, Xiaoyang Mou and Shuaiqun Wang
Background: Understanding the relationship between genetic variation and gene expression is a central issue in genetics. Although many studies have identified genetic variations associated with gene expression, it is unclear how they perturb the underlying regulatory network of gene expression. Objective: To explore how genetic variations perturb potential transcriptional regulation networks of Alzheimer’s disease (AD), in order to paint a more complete picture of the complex landscape of transcription regulation. Methods: Fast network component analysis (FastNCA), which can capture genetic variations in the form of single nucleotide polymorphisms (SNPs), is applied to analyze the expression activities of transcription factors (TFs) and their regulatory strengths on target genes (TGs) using microarray and RNA-seq data of AD. Then, multi-data fusion analysis is used to analyze the different TGs regulated by the same TFs in the different datasets by constructing the transcriptional regulatory networks of differentially expressed genes. Results: Although the TGs regulated by a common TF are not necessarily identical across datasets, they may be involved in the same pathways that are closely related to the pathogenesis of AD, such as immune response, signal transduction and cytokine-cytokine receptor interaction pathways. Even when they are involved in different pathways, these pathways are also confirmed to have a potential link with AD. Conclusion: The study shows that the pathways of different TGs regulated by the same TFs in different datasets are all closely related to AD. Multi-data fusion analysis allows the datasets to complement one another to some extent and yields more comprehensive results when exploring the pathogenesis of AD.
-
-
-
Genome-wide Differential-based Analysis of the Relationship between DNA Methylation and Gene Expression in Cancer
Authors: Yuanyuan Zhang, Chuanhua Kou, Shudong Wang and Yulin Zhang
Background: DNA methylation is an epigenetic modification that plays an important role in regulating gene expression. There is evidence that the hypermethylation of promoter regions always causes gene silencing. However, how the methylation patterns of other regions in the genome, such as the gene body and 3’UTR, affect gene expression is unknown. Objective: The study aimed to fully explore the relationship between DNA methylation and expression through genome-wide analysis, which is important for understanding the function of DNA methylation. Methods: In this paper, we develop a heuristic framework to analyze the relationship between the methylation change in different regions and the change in the corresponding gene expression, based on differential analysis. Results: To understand the function of methylation in different genomic regions, a gene is divided into seven functional regions. By applying the method to five cancer datasets from the Synapse database, it was found that methylated regions with a significant difference between cases and controls were almost uniformly distributed across the seven regions. Also, the effect of DNA methylation on gene expression differed between regions. For example, there was a higher percentage of positive relationships in 1stExon, gene body and 3’UTR than in TSS1500 and TSS200. The functional analysis of genes with a significant positive or negative correlation between DNA methylation and gene expression demonstrated the epigenetic mechanism of cancer-associated genes. Conclusion: Differential-based analysis helps us recognize changes in DNA methylation and how these changes affect gene expression. It provides a basis for further integrating gene expression and DNA methylation data to identify disease-associated biomarkers.