Current Bioinformatics - Volume 8, Issue 3, 2013
-
-
Detection of Protein Complexes Using Hierarchical Link Clustering and Core-Attachment Structure§
Authors: Yinhai Liu, Chengjie Sun, Yang Yu, Lei Lin and Xiaolong Wang
Identifying protein complexes from protein-protein interaction (PPI) networks is an important issue in proteomics and bioinformatics, and various computational methods have been developed to address it. In this paper, an approach called Hierarchical Link Clustering and Core-Attachment (HLC-CA) is proposed to detect protein complexes by integrating an HLC algorithm with the inherent core-attachment structure of protein complexes. Compared with other methods, HLC-CA has a low time complexity and few parameters to tune. HLC-CA comprises four steps. First, an HLC algorithm is used to obtain candidate clusters. Second, a density threshold is applied to filter the clusters in order to identify complex cores. Third, attachments are recruited to each core by introducing a closeness measure. Finally, the cores chosen in the second step and their corresponding attachments are combined to form protein complexes. Evaluation results show that the proposed HLC-CA method outperforms most of the state-of-the-art methods.
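The density-filtering step (the second of the four steps) can be illustrated with a minimal sketch. It assumes the usual graph density, 2|E| / (|V|(|V| - 1)), computed on the subgraph induced by each candidate cluster. The abstract does not state the density definition or the threshold the authors use, so the function names, the toy network and the threshold of 0.7 below are all hypothetical. The HLC step and the closeness-based recruitment of attachments are not covered.

```python
from itertools import combinations


def cluster_density(cluster, edges):
    """Density of the subgraph induced by `cluster`: 2*|E| / (|V| * (|V| - 1)).

    `edges` is a set of frozenset({u, v}) pairs (undirected interactions).
    Clusters with fewer than two proteins get density 0.0 by convention here.
    """
    nodes = sorted(set(cluster))
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(1 for u, v in combinations(nodes, 2) if frozenset((u, v)) in edges)
    return 2.0 * internal / (n * (n - 1))


def filter_cores(clusters, edges, threshold):
    """Keep only the candidate clusters whose density reaches the threshold."""
    return [c for c in clusters if cluster_density(c, edges) >= threshold]


# Toy PPI network (hypothetical data, for illustration only).
ppi_edges = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]}
candidates = [{"A", "B", "C"}, {"C", "D", "E"}]

# Threshold 0.7 is an assumed value, not one taken from the paper.
cores = filter_cores(candidates, ppi_edges, threshold=0.7)
print(cores)
```

In this toy network the triangle A-B-C has density 1.0 and is kept as a core, while the chain C-D-E has density 2/3 and is filtered out.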
-
-
-
Metabolic Network Analysis: Current Status and Way Forward
Authors: Shweta Kolhi and Aahok S. Kolaskar
One of the fundamental aims of life science is to gain insights into the functioning of an organism at the systems level. The generation and systematic storage of enormous amounts of biological data in the post-genomic era has made systems-level studies a reality. For such studies, metabolic pathway data inferred from intricate interactions amongst genes, enzymes and proteins are best suited and are being used extensively, as they represent dynamic interactions in an organism. Consequently, research in comparative metabolomics and metabolic network analysis is advancing rapidly. Although efforts to analyze the metabolome have increased in recent years, our knowledge of its design principles is very limited. Various methodologies to compare and align metabolic pathways have been put forth and are discussed in this review. Further, graph-theoretic approaches are undertaken with the aim of unveiling the universal laws governing complex metabolic networks. New algorithms that negate the abstraction of earlier studies are the need of the hour. One such approach, termed “metabolic categorization”, which helps in understanding the functionality of each metabolic pathway at the systems level, is discussed in this review. Finally, extension of linguistic approaches from the genome and proteome to the metabolome is suggested in order to simplify the understanding of a living system.
-
-
-
Protein Modules Detection Based on Subcellular Information
Authors: Yang Yu, Lei Lin, Chengjie Sun, Xiaolong Wang and Xuan Wang
Detecting protein modules in protein-protein interaction networks is a hot topic in biological information processing. In this paper, we present a ranking strategy for deriving protein complexes that combines subcellular information with the topological information of the network. First, we obtain candidate clusters from the protein-protein interaction network using competing methods and rank these clusters by link density calculated from the localization matrix. Second, experimental results demonstrate that our ranking strategy improves the performance of the four original methods and is robust to all the similarity scores. Finally, integrating protein co-localization information can reduce the false positive percentage, especially for protein complexes extracted only from the protein-protein interaction network. Furthermore, detailed comparison with functional annotations confirms the value of the spatial information and indicates that this strategy is helpful for finding functional modules.
-
-
-
Mining of Network Markers for Brain Tumor from Transcriptome and Interactome Data
Glioblastoma multiforme (GBM: grade IV astrocytoma) is the most common but lethal form of brain cancer. The median survival time of GBM patients is only 15 months. Only a few predictive markers have been reported for prognosis and treatment. This study integrates gene expression and protein-protein interaction data to search for pathways that are differentially regulated between long-term and short-term survivors among GBM patients. A novel objective function for greedy search was introduced, and the search identified 47 significantly and differentially expressed sub-networks (SDES), or pathways. The resultant putative pathways (involving 156 genes) were tested for enrichment of known GBM cancer genes as well as GO terms related to “biological process.” Integrating gene expression profiles of GBM patients with a PPI network improves the recall rate of known GBM driver genes and shows better GO enrichment than the conventional gene-set approach, which is based solely on the expression data.
-
-
-
On the Discovery of Cellular Subsystems in Gene Correlation Networks Using Measures of Centrality
Authors: Kathryn M. Dempsey and Hesham H. Ali
Innovative models for analyzing high-throughput biological data are becoming of great significance in the post-genomic era. Correlation networks are rapidly becoming powerful models for representing various types of biological relationships, especially for extracting knowledge from gene expression data. Data analysis using other popular network models in biology has revealed that structures within a graph model, such as high-degree nodes and cliques, often correspond to cellular functions. Correlation networks, which can be used to measure the relationships between patterns of gene expression, are capable of representing entire-genome expression assays. In this study we build correlation networks from gene expression datasets available in the public domain; once built, we are able to identify graph-theoretic structures (critical nodes and dense subgraphs) and use measures of centrality to infer the biological impact of these structures within the network. We go on to validate the link between network components (such as critical nodes and degrees) and the biological function of the model by exploring the biological properties of a set of nodes with high centrality measures in the correlation network. In addition, we use network integration to identify essential genes in an integrated correlation network obtained by the union of networks of mice in different age groups. By examining clusters connected by highly central nodes in this integrated network, we were able to find a set of essential genes and identify several cellular subsystems that point towards aging-related mechanisms. The obtained results provide clear evidence that correlation networks represent a powerful tool for analyzing temporal biological data and consequently make use of the wealth of gene expression assays currently available.
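As a rough illustration of the general idea, the sketch below builds a small correlation network and computes node degree, the simplest centrality measure. It assumes that two genes are linked when the absolute Pearson correlation of their expression profiles reaches a cutoff. The cutoff of 0.9, the gene names and the expression values are hypothetical, and the abstract does not specify which correlation measure, threshold or centrality measures the authors actually use.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_network(expression, cutoff):
    """Return undirected edges between genes whose |Pearson r| >= cutoff."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expression), 2)
        if abs(pearson(expression[g1], expression[g2])) >= cutoff
    ]


def degree_centrality(genes, edges):
    """Count the number of edges incident to each gene."""
    degree = {g: 0 for g in genes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


# Hypothetical expression profiles across four samples.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with g1
    "g3": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with g1
    "g4": [1.0, 3.0, 2.0, 1.0],   # weakly correlated with the others
}

edges = correlation_network(expression, cutoff=0.9)  # assumed cutoff
degree = degree_centrality(expression, edges)
print(edges)
print(degree)
```

Here g1, g2 and g3 form a triangle, while g4 stays isolated because its correlation with every other gene is weak.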
-
-
-
Systematic Analysis of Interactomes in Sequence Properties Space
A number of representations of protein networks have been reported. Further, while the existence of multiple types of interactomes and relationships between proteins has been accepted and discussed extensively, these concepts and hypotheses have not yet been extensively explored using machine learning frameworks for protein interaction prediction in a multi-class setting. Essentially, this is due to two reasons: missing values in features, and the heterogeneity and sometimes unclear annotation of protein interaction data. This has motivated the attempt to build a set of universal features attributable to any set of protein pairs, generating a universal feature space where evolutionary constraints show their effects and play a central role. We have called this space and the features generating it the sequence properties space and the derived features, respectively. We have probed an integrated version of sequence properties space for its ability to properly represent the different kinds of available interactomes.
-
-
-
Improving Functional Modules Discovery by Enriching Interaction Networks with Gene Profiles
Authors: Saeed Salem, Rami Alroobi, Shadi Banitaan, Loqmane Seridi, Ibrahim Aljarah and James Brewer
Recent advances in proteomic and transcriptomic technologies have resulted in the accumulation of vast amounts of high-throughput data that span multiple biological processes and characteristics in different organisms. Much of the data come in the form of interaction networks and mRNA expression arrays. An important task in systems biology is functional module discovery, where the goal is to uncover well-connected sub-networks (modules). These discovered modules help to unravel the underlying mechanisms of the observed biological processes. While most existing module discovery methods use only the interaction data, in this work we propose CLARM, which discovers biological modules by incorporating gene profile data with protein-protein interaction networks. We demonstrate the effectiveness of CLARM on Yeast and Human interaction datasets, and on gene expression and molecular function profiles. Experiments on these real datasets show that the CLARM approach is competitive with well-established functional module discovery methods.
-
-
-
Predicting False Positives of Protein-Protein Interaction Data by Semantic Similarity Measures§
Authors: George Montanez and Young-Rae Cho
Recent technical advances in identifying protein-protein interactions (PPIs) have generated genome-wide interaction data, collectively referred to as the interactome. These interaction data give an insight into the underlying mechanisms of biological processes. However, the PPI data determined by experimental and computational methods include an extremely large number of false positives, which are not confirmed to occur in vivo. Filtering PPI data is thus a critical preprocessing step to improve analysis accuracy. This article proposes integrating Gene Ontology (GO) data to assess the reliability of PPIs. We evaluate the performance of various semantic similarity measures in terms of functional consistency. Protein pairs with high semantic similarity are considered highly likely to share common functions and are therefore more likely to interact. We also propose a combined semantic similarity method for predicting false positive PPIs. The experimental results show that the combined hybrid method performs better than the individual semantic similarity classifiers. The proposed classifier predicted that 58.6% of the S. cerevisiae PPIs from the BioGRID database are false positives.
-
-
-
Semantic Similarities as Discriminative Features of Protein Complexes
Authors: Pietro Hiram Guzzi, Marianna Milano, Pierangelo Veltri and Mario Cannataro
Biological data about genes, proteins and biologically relevant molecules that are stored in databases may be associated with biological information (knowledge) such as experiments, properties, functions and response to drugs. Such knowledge is formally structured into ontologies, which provide the best formalism to organize and store knowledge. In the biological field, Gene Ontology (GO) provides both a categorization of annotating terms and a source of annotation for genes and proteins. Consequently, it is possible to introduce novel analysis methodologies that are based on the use of ontologies. Recently, semantic similarities, i.e. the calculation of the similarity of two or more proteins starting from their annotations, have attracted growing interest. For instance, semantic measures have been used for the prediction of protein complexes. Despite the importance of this research, some problems remain unsolved: the assessment of semantic measures with respect to biological features, as well as a deep study of the impact of the chosen measure on the obtained results. This paper focuses on the use of semantic similarity measures in the protein complex prediction pipeline. To this end, we investigated whether there is a bias among different measures, as well as whether semantic similarity is higher among proteins that participate in the same complex. Results confirm that proteins belonging to the same complex have higher average semantic similarity values than the average values of the proteomes. This confirms a possible use of semantic similarity measures within protein complex prediction algorithms and suggests a way to choose the best one among them.
-
-
-
Fractal Analysis of Epithelial-Connective Tissue Interface in Basal Cell Carcinoma of the Skin
Authors: Giorgio Bianciardi, Clelia Miracco, Stefano Lazzi and Pietro Luzi
This paper investigates the use of computerized fractal analysis for objective characterization of the complexity of the epithelial-connective tissue interface in basal cell carcinoma and the ability of the technique to quantitatively discriminate among different diagnostic categories. Tumor boundaries were extracted by means of image analysis. The fractal dimension was calculated by using the box-counting method. The results showed that the shape of the boundaries between epithelium and stroma is significantly more complex in infiltrative high risk tumors than in circumscribed low risk ones (p<0.001), with 100% correct classifications. This study shows that the computerized fractal analysis of epithelial-connective tissue interface in basal cell carcinomas can provide an accurate, quantitative, inexpensive technique to help in tumor diagnosis.
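For readers unfamiliar with the box-counting method, here is a minimal sketch. It estimates the fractal dimension as the least-squares slope of log N(s) against log(1/s), where N(s) is the number of grid boxes of side s that contain at least one boundary point. The point sets and box sizes are hypothetical test shapes, not tumor boundaries, and the authors' actual image-analysis pipeline is not described in the abstract.

```python
import math


def box_count(points, box_size):
    """Number of grid boxes of side `box_size` containing at least one point."""
    return len({(math.floor(x / box_size), math.floor(y / box_size)) for x, y in points})


def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log N(s) versus log(1/s) over the given box sizes."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Sanity checks on shapes with known dimension (pixel coordinates).
line = [(i, 0) for i in range(1024)]                      # straight line
square = [(i, j) for i in range(64) for j in range(64)]   # filled square

line_dim = box_counting_dimension(line, [1, 2, 4, 8, 16])
square_dim = box_counting_dimension(square, [1, 2, 4, 8])
print(line_dim, square_dim)
```

A straight line yields a dimension of about 1 and a filled square about 2. An irregular boundary falls between these values, which is what makes the measure usable as a complexity score.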
-
-
-
An In Silico Identification of Human Promoters: A Soft Computing Based Approach
Authors: Sutapa Datta and Subhasis Mukhopadhyay
The promoter region of a eukaryotic gene sequence is very important, as it helps us to understand the mechanism of transcription regulation. Identifying this region is a complex problem because the signature used for identification turns out to be fuzzy. Several in silico methods are available for identifying the promoter region, but scope for new methods still exists. Reasonable prediction of promoter sequences (which can be tested by comparison with wet-lab data) from a mixed database of promoters and non-promoters is thus a challenge that any new method would have to face. In this communication we propose a composite method that clusters known promoter and non-promoter sequences into their respective clusters based on their relative distances, and then classifies the maximum similarity scores obtained between a group of new sequences and the clusters, to predict the true promoters among the new set of sequences. The in silico experiment is carried out on different databases that we constructed from the available primary sequence databanks to demonstrate the advantage of the proposed approach.
-
-
-
Hidden Markov Model for Splicing Junction Sites Identification in DNA Sequences
Authors: Srabanti Maji and Deepak Garg
Identification of the coding sequence in a genomic DNA sequence is the major step in gene identification. In eukaryotic organisms, the gene structure consists of the promoter, introns, start codon, exons, stop codon, etc., and identifying it requires accurate labeling of these segments. A splice site is the ‘separation’ between exons and introns; its prediction accuracy is generally lower than 90%, even though the sequences adjacent to splice sites are highly conserved. Because the accuracy of splice site recognition is not yet satisfactory, much attention has been paid to improving prediction accuracy, and improving the algorithms used is essential. In this manuscript, a Hidden Markov Model (HMM) based splice site predictor is developed and trained using a Modified Expectation Maximization (MEM) algorithm. A 12-fold cross-validation technique is also applied to check the reproducibility of the results obtained and to further increase the prediction accuracy. The proposed system is able to achieve an accuracy of 98% for true donor sites and 93% for true acceptor sites in standard DNA (nucleotide) sequences.
-
-
-
A New Integration-Centric Algorithm of Identifying Essential Proteins Based on Topology Structure of Protein-Protein Interaction Network and Complex Information
Authors: Jiawei Luo and Ling Ma
Essential proteins are necessary for the survival and development of an organism. Many computational approaches have been proposed for predicting essential proteins based on the protein-protein interaction (PPI) network. In this paper, we propose a new centrality algorithm for identifying essential proteins, named the CSC algorithm. The CSC algorithm integrates the topological characteristics of the PPI network with the in-degree of proteins in complexes. We use the CSC algorithm to identify the essential proteins in the PPI network of Saccharomyces cerevisiae. The results show that the ratio of essential proteins identified by the CSC algorithm is higher than that of ten other centrality methods: Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), Bottle Neck (BN), Local Average Connectivity-based method (LAC), Sum of ECC (SoECC) and PeC. In particular, the identification accuracy of the CSC algorithm is more than 40% higher than that of the six classic centrality measures (DC, BC, CC, SC, EC, IC).