Current Bioinformatics - Volume 10, Issue 4, 2015
-
-
Machine Learning Approaches for Cognitive State Classification and Brain Activity Prediction: A Survey
Authors: Shantipriya Parida, Satchidananda Dehuri and Sung-Bae Cho
The application of machine learning approaches to decode cognitive states through functional Magnetic Resonance Imaging (fMRI) has been an emerging field of research over the past decade. Multivoxel Pattern Analysis (MVPA) treats the activation of multiple voxels from the fMRI data as a pattern and decodes brain states using machine learning based classifiers. The potential of designing a classifier that accurately discriminates between cognitive states has attracted great attention from machine learning researchers. Particular interest has been shown in applying such classifiers to study brain functions, diagnose mental diseases, detect deception and develop brain-computer interfaces. This paper surveys recent developments in machine learning approaches to cognitive state classification and brain activity prediction. Various techniques are compared to assess their merits and demerits. Furthermore, feature selection is discussed as an important preprocessing step in MVPA, because it determines which features enter the classification of fMRI data and thereby improves the performance of the classifier. Features can be selected by restricting the analysis to specific anatomical regions or by computing univariate (voxel-wise) or multivariate statistics. Besides a summary and the future perspective of this field, an extensive bibliography is included for the community of interest.
-
-
-
A Review of Class Imbalance Learning Methods in Bioinformatics
Authors: Hualong Yu, Changyin Sun, Wankou Yang, Sen Xu and Yuanyuan Dan
In recent years, research in bioinformatics has increasingly focused on the problem of class imbalance. A classification task is said to be class-imbalanced when the number of instances belonging to one or several classes exceeds that of the other classes. Class imbalance often leads classifiers to perform poorly on the minority classes. This article reviews the most widely used class imbalance learning methods and their applications to various bioinformatics problems, including disease diagnosis based on gene expression data and protein mass spectrometry data, translation initiation site recognition based on DNA sequences, protein function classification using amino acid sequences, activity prediction of drug molecules, and recognition of precursor microRNAs (pre-miRNAs). This article also summarizes the current challenges and possible future trends of class imbalance learning methods in bioinformatics.
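As a rough illustration of one family of methods such a review covers, the following is a minimal sketch of random oversampling, which duplicates minority-class instances until all classes are the same size. It is not taken from the article; the function name `random_oversample`, the toy samples and the class labels are all assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many instances as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw extra copies (with replacement) only for under-represented classes.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Toy data (hypothetical): nine majority instances, one minority instance.
samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9], [5.0]]
labels = ["healthy"] * 9 + ["disease"]

balanced_x, balanced_y = random_oversample(samples, labels)
print(Counter(balanced_y))
```

Oversampling of this kind should be applied to the training split only; applying it before splitting would leak duplicated instances into the test set.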
-
-
-
High-Throughput Techniques for Identifying microRNA Target Genes
MicroRNAs (miRNAs) are important regulators of cell biological processes and are approximately 21–25 nucleotides in length. They are thought to suppress protein-coding gene expression through translation inhibition and/or mRNA decay by guiding the RNA-induced silencing complex (RISC) to complementary sites in the 3' untranslated region of the target mRNA. Studies have shown widespread regulation of protein levels by miRNAs in cellular and animal models. Some computationally predicted target genes have been validated using single-target identification methods, such as luciferase reporter assays or EGFP/RFP reporter assays. However, many other target genes remain unknown and need to be experimentally validated. High-throughput multiple-target screening assays, which enable large-scale, efficient and multi-level studies, are expected to greatly improve target identification. In this review, current high-throughput technologies for identifying miRNA target genes are introduced, together with perspectives on further developments in this field.
-
-
-
A Comparative Study Among Various Statistical Tests Using Microarray Gene Expression Data
Authors: Monalisa Mandal and Anirban Mukhopadhyay
This article reviews and compares existing statistical tests used in the significance analysis of gene expression from DNA microarray data. Microarray gene expression studies facilitate the investigation of genes that are differentially expressed across different classes of samples. Many gene expression studies also aim to find groups of genes that act together, or to find molecular similarities among a group of samples. In all cases, it is important to know whether there is a difference between two sets of results, and whether that difference is likely to be due to random variation in the datasets. Hence, rather than simple data analysis, statistical tests are the proper way to check statistical relevance. A statistical significance test is essential for providing the investigator with an ordered list of relevant genes in terms of differential expression. This article surveys a few important statistical tests, both parametric and nonparametric, and compares them with respect to different performance metrics. The tests have been applied to several artificial and real-life microarray gene expression datasets. Real-life datasets usually contain noise, and noisy data can have an undesirable effect on the results of any data mining analysis. Therefore, performance analysis of the different statistical tests on noisy datasets has also been performed to support the comparative study. Furthermore, correlation analysis of the genes selected by the different statistical methods has been carried out.
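To make the idea of ranking genes by a parametric test concrete, here is a minimal sketch that computes Welch's t-statistic per gene and orders genes by its absolute value. The abstract does not say which tests the article covers, so the choice of Welch's t-test is an assumption, and the gene names and expression values are invented toy data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Toy expression values (hypothetical): gene -> (class 1 samples, class 2 samples).
expression = {
    "gene_diff": ([5.1, 5.3, 4.9, 5.2], [2.0, 2.2, 1.9, 2.1]),
    "gene_flat": ([3.0, 3.2, 2.9, 3.1], [3.1, 2.9, 3.0, 3.2]),
}

# Rank genes from most to least differentially expressed by |t|.
ranked = sorted(
    ((gene, welch_t(a, b)) for gene, (a, b) in expression.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for gene, t in ranked:
    print(gene, round(t, 3))
```

A full significance analysis would also convert each statistic to a p-value and correct for multiple testing across genes; this sketch stops at the ranking step.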
-
-
-
Hybrid Particle Swarm Optimization with Iterative Local Search for DNA Sequence Assembly
Authors: Indumathy Rajagopal and Uma Maheswari Sankareswaran
This paper proposes a novel hybrid approach to the DNA sequence assembly problem that combines particle swarm optimization with iterative local search. One of the vital challenges in DNA sequence assembly is to arrange the millions of fragments of a long genome sequence in the correct order. This is an NP-hard combinatorial optimization problem. The main aim of this paper is to demonstrate how the hybrid scheme can improve the performance of the fragment assembly process. Incorporating iterative local search heuristics into the particle swarm optimization algorithm assembles the fragments efficiently by maximizing the overlap score. The performance of the proposed hybrid algorithm was compared with variants of Particle Swarm Optimization and other known methodologies. The experimental results show that the proposed hybrid approach produces better results than the other techniques when tested on well-known benchmark instances of different sizes.
-
-
-
A Review on Application of Particle Swarm Optimization in Bioinformatics
Authors: Shikha Agrawal and Sanjay Silakari
Bioinformatics is an emerging interdisciplinary research area that holds great promise for advancing research and development in complex areas such as medicine, biology, agriculture, environment, public health and drug design. It is a blend of computer science and molecular biology. Most problems in bioinformatics are NP-hard, so researchers have used soft computing and artificial intelligence techniques to solve them. Recently, Swarm Intelligence techniques for solving bioinformatics problems have been gaining attention because of their ability to generate good approximate solutions at low cost. Among the various Swarm Intelligence algorithms, Particle Swarm Optimization (PSO) is used in many applications and has proved to be very effective. This paper reviews and discusses some representative methods that illustrate the basic concept of PSO and how PSO has been applied to bioinformatics problems. These representative examples include RNA Secondary Structure Prediction, Gene Clustering, Phylogenetic Tree Construction, Energy Minimization and Protein Modeling. The aim of this paper is to provide an overall understanding of PSO and its place in bioinformatics so as to motivate researchers to develop new applications and concepts.
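For readers unfamiliar with the basic concept of PSO, the following is a minimal sketch of the standard global-best variant applied to a simple continuous test function. It is not drawn from the paper; the parameter values (`w`, `c1`, `c2`), the search bounds and the sphere objective are assumptions chosen only to keep the example small.

```python
import random


def pso(objective, dim, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Global-best particle swarm optimization (minimization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=2)
print(best, best_val)
```

Applications such as sequence assembly or tree construction work on discrete search spaces, so they typically need a problem-specific encoding of positions and velocities rather than the real-valued update shown here.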
-
-
-
Prediction of Gene Co-Expression by Quantifying Heterogeneous Features
Authors: Abbasali Emamjomeh, Bahram Goliaei, Javad Zahiri and Reza Ebrahimpour
Prediction of gene co-expression is of great importance because of its role in explaining the molecular and functional mechanisms of cells. For this reason, high-performance methods should be developed to reduce errors. We have developed a novel method that uses heterogeneous features, including gene expression values and various sequence-based features (SBF), with the random forest (RF) classifier to predict co-expressed genes. The proposed method, SeqNet, outperforms current state-of-the-art methods. Furthermore, the results indicate that the SBF are effective in detecting co-expressed genes. However, the highest performance in predicting co-expressed genes was achieved by combining sequence-based features with gene expression data. This may be due to the ability of heterogeneous features to reflect functional relationships between genes. We conclude that SBF improve the performance of co-expressed gene prediction methods. SeqNet can predict gene co-expression relationships when there is not enough gene expression data.
-
-
-
Multifractal Analysis of Muscular Tissue Cryofixed in a Cryostat Chamber
Authors: Giorgio Bianciardi, Francesca Pontenani and Sergio Tripodi
Cryofixation of tissues in a cryostat chamber is a routine technique for rapidly investigating the presence of tumours during a surgical procedure (intraoperative consultation). The tissue is placed, without cryoprotectant, in contact with a cooled metal block inside the cryostat used for cutting and preparing the specimen, with or without a heavy weight. Until now, quantitative studies of the damage produced by freezing in intraoperative consultation have been lacking. To obtain quantitative indexes, we performed fractal analysis (local fractal dimension, D0, and entropy, D1) of cryofixed muscular tissues in comparison with formalin-fixed samples. Seventy-two microscopic fields or 700 muscle fibres were automatically examined. After freezing at t = 20 °C using a heavy weight, large voids inside the cells (ice-tissue interfaces) were present, while without the weight the fibres collapsed (shrinkage). Fractal analysis revealed the presence of a multifractal structure. In the formalin-fixed samples, at large scale the muscle tissue D0 and D1 reached the values of the Diffusion-Limited Aggregation process. At large scale, after cryofixation using the weight, D0 and D1 increased significantly (p<0.01; p<0.01) with respect to the formalin-fixed samples, while without the weight the values were close to those of the formalin-fixed samples. At low scale, without the weight, D0 and D1 decreased significantly (p<0.01) compared to the formalin-fixed samples, while with the weight the values were close to those of the formalin-fixed samples. The large and low scales accurately quantified the amount of ice-tissue interfaces and cell shrinkage, respectively.
-
-
-
A Novel Information Theoretic Approach to Gene Selection for Cancer Classification Using Microarray Data
Authors: Imran Naseem, Roberto Togneri and Mohammed Bennamoun
In this research, an efficient gene selection method called the Discriminant Mutual Information (DMI) algorithm is proposed. The DMI algorithm sequentially induces discrimination and relevance to identify the most significant genes for tumor classification. In particular, in the first step the entire gene population is decorrelated by forming gene-sets such that genes with similar characteristics form a single gene-set. The mutual information criterion is then employed to identify the most representative gene of each gene-set. Extensive experiments have been conducted on six publicly available databases, where the proposed DMI algorithm has shown good results compared to a number of state-of-the-art approaches. Extensive computational analysis clearly reflects the computational efficiency of the proposed approach; typically, it requires only a few seconds for experimentation on standard microarray datasets.
-
-
-
HIV1-Human Protein-Protein Interaction Prediction (HHPPIP) Methodology: An FP-Growth Based Association Rule Mining Approach
Authors: R. Geetha Ramani and Shomona Gracia Jacob
Interactions between human and viral proteins have proved to be a major cause of several critical ailments, and recent research has focused on predicting potential viral-host protein relations through computational techniques. This research aimed at detecting probable interactions between HIV-1 and human proteins by generating all possible high-confidence (>80%) associations between the host and viral proteins and using the association rules to predict new interactions. The FP-Growth algorithm was analyzed and found to generate the exhaustive set of high-confidence association rules, which were mined further to isolate probable and significant HIV1-Human predictions with improved accuracy, sensitivity and specificity compared to previous work. The identified HIV1-Human protein interactions were further investigated using Gene Ontology based annotation and the DAVID functional annotation tool to establish their biological and therapeutic merits. The superiority of the proposed approach over previously applied computational techniques is discussed. AIDS is one of the most dreaded diseases, and we believe the proposed approach and the predicted interactions will be instrumental in helping biological and molecular researchers formulate drugs for AIDS therapy. The biological functionality of the predicted interactions would also enable timely diagnosis of the presence of the infectious viral protein and its replication in the host.
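The role of the confidence threshold can be illustrated with a small sketch that computes rule confidence directly and keeps rules above 80%. This is a brute-force enumeration over item pairs, not FP-Growth and not the authors' pipeline; the item names (`V1`, `H1`, ...) and transactions are synthetic placeholders rather than real protein data.

```python
from itertools import combinations


def rule_confidence(transactions, antecedent, consequent):
    """confidence(A -> C) = support(A and C) / support(A)."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return n_both / n_ante if n_ante else 0.0


def high_confidence_rules(transactions, min_conf=0.8):
    """Enumerate single-item rules A -> C whose confidence exceeds min_conf."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for ante, cons in ((a, b), (b, a)):
            conf = rule_confidence(transactions, [ante], [cons])
            if conf > min_conf:
                rules.append((ante, cons, conf))
    return rules


# Synthetic transactions: each set lists items observed together.
transactions = [
    {"V1", "H1", "H2"},
    {"V1", "H1"},
    {"V1", "H1", "H3"},
    {"V2", "H2"},
    {"V2", "H2", "H3"},
    {"V1", "H1", "H2"},
]

rules = high_confidence_rules(transactions)
for ante, cons, conf in rules:
    print(f"{ante} -> {cons} (confidence {conf:.2f})")
```

FP-Growth reaches the same frequent itemsets without enumerating every candidate pair, which matters once the number of items grows beyond a toy example like this one.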
-
-
-
Insights into sRNA Genes Regulated by Two-Component Systems in the Bacillus cereus Group
Authors: Han Mei, Qing Tang, Xinfeng Li, Yaxi Wang, Jieping Wang and Jin He
Two-component systems (TCSs) and small regulatory RNAs (sRNAs) form dense regulatory networks in bacteria. To expand the known repertoire of TCS regulons in the Bacillus cereus group, we employed an in silico strategy to identify sRNA genes that might be regulated by response regulators (RRs) of TCSs. Using the whole genomes of 21 fully sequenced strains of the B. cereus group, we identified 12 different types of novel sRNA genes. Using transcriptome data from B. thuringiensis CT-43, we confirmed the independent transcription of both the sRNA_bc4 gene and the sRNA_bc6 gene. Furthermore, the sRNA_bc6 and sRNA_bc12 genes were demonstrated to exist exclusively in the B. cereus group and thus have the potential to act as molecular markers. Finally, we modified the recognition motifs of PhoP and YclJ in the B. cereus group. These results significantly contribute to our understanding of TCS regulons in bacteria.
-
-
-
HaShRECA: Hadoop Based Short Read Error Correction Algorithm for Genome Assembly
Authors: Muhammad Tahir, Muhammad Sardaraz, Ataul Aziz Ikram and Hassan Bajwa
Next-generation high-throughput sequencing technologies have opened up new and challenging research opportunities. In particular, next-generation sequencers produce a massive amount of short-read data in a single run. However, this short-read data is highly susceptible to errors compared to shotgun sequencing. Therefore, there is a pressing need for fast and more accurate statistical and computational tools to analyze this data. We present HaShRECA, a new short-read error correction algorithm based on probabilistic analysis of potential read errors that uses the Hadoop MapReduce framework. Experimental results show that HaShRECA is more accurate, as well as more time and space efficient, than previous algorithms.
-
-
-
A Computer-Aided System for Automatic Mitosis Detection from Breast Cancer Histological Slide Images based on Stiffness Matrix and Feature Fusion
Authors: Ashkan Tashk, Mohammad Sadegh Helfroush, Habibollah Danyali and Mojgan Akbarzadeh
Background: Pathologists currently grade breast cancer histopathology slides under the microscope according to the Nottingham grading system, an international standard. In this standard, three factors are scored, one of which is the mitotic count. Mitotic counting is a tedious and time-consuming activity that suffers from inter- and intra-observer variability.
Objective: To avoid the drawbacks of manual mitotic counting, experts have concluded that well-designed computer-aided diagnosis (CAD) systems are essential. To this end, an innovative and operational automatic mitosis detection system is proposed in this paper.
Methods: The proposed system consists of several image processing stages: preprocessing, segmentation, feature extraction and supervised classification. Its main contribution is in the feature extraction stage. It efficiently discriminates mitotic objects from non-mitotic ones based on significant features of the stiffness matrix (SM) fused with other effective statistical texture features. The SM includes various types of features that are useful and conventional in the study of histopathology slide images.
Results: The proposed CAD system is implemented entirely in software. The datasets are given to a support vector machine classifier with non-linear kernels for different ratios of training and testing data. A maximum F-measure of 78.42% for the scanner A dataset and 83.33% for the scanner H dataset is achieved.
Conclusions: The experimental results demonstrate that the proposed CAD system outperforms the other automatic mitosis detection methods proposed in the current literature. It can also be applied to both scanner A and scanner H with no reduction in performance or throughput.
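For reference, the F-measure reported above combines precision and recall into a single score. The sketch below shows the standard definition; the detection counts in the usage example are illustrative values chosen for easy arithmetic, not figures from the paper.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives.

    With beta = 1 this is the harmonic mean of precision and recall.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts: 40 correctly detected mitoses, 10 false alarms, 10 misses.
score = f_measure(tp=40, fp=10, fn=10)
print(f"F-measure: {score:.2%}")
```

Because true negatives do not enter the formula, the F-measure is a common choice for detection tasks where the objects of interest are rare relative to the background.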
-
-
-
GenePlots: Large-Scale Gene Structures Visualization and Comparison Based on GenBank CDS Annotations
Authors: Haijun Liu, Keping Chen, Haifeng Shi, Keming Zhu and Xiaoyong Liu
Drawing and comparing the structures of homologous genes is an important way to study gene evolution. In recent years, several programs and web tools have been proposed to draw gene structures from genome sequences or GFF (General Feature Format) gene data. They are ill-suited to drawing a large number of gene structures because preparing the input data is time-consuming. Visualization tools for comparing the structures of hundreds or even thousands of genes are particularly lacking. We developed GenePlots, a comparative and interactive viewer for large-scale gene structures based on CDS annotations in GenBank. With CDS annotations formatted like FASTA, GenePlots can draw all exon-intron structures for numerous genes on a unified scale. The most important assets of GenePlots are: (i) the use of GenBank CDS annotations as input data; (ii) the capability to analyze thousands of genes simultaneously; (iii) the display of intron phase; (iv) the flexibility that allows users to change the resulting diagram. GenePlots provides a simple and highly efficient way to produce a large number of gene structures for large-scale gene comparison.
