Current Bioinformatics - Volume 16, Issue 1, 2021
Bayesian Functional Mixed-effects Models with Grouped Smoothness for Analyzing Time-course Gene Expression Data
Authors: Shangyuan Ye, Ye Liang and Bo Zhang
Objective: As a result of the development of microarray technologies, the expression levels of thousands of genes involved in a given biological process can be measured simultaneously, and studying their temporal behavior is important for understanding their mechanisms. Since the dependence between expression levels over time for a given gene is often too complicated to model parametrically, sparse functional data analysis has received increasing attention for analyzing such data. Methods: We propose a new functional mixed-effects model for analyzing time-course gene expression data. Specifically, the model groups individual functions with heterogeneous smoothness. The proposed method uses the mixed-effects model representation of penalized splines for both the mean function and the individual functions. Given noninformative or weakly informative priors, Bayesian inference on the proposed models was developed, and Bayesian computation was implemented using Markov chain Monte Carlo methods. Results: The performance of the new model was evaluated in two simulation studies and illustrated using a yeast cell cycle gene expression dataset. Simulation results suggest that the proposed methods can outperform previously used methods in terms of the mean integrated squared error. The yeast gene expression data application suggests that the proposed model with two latent groups should be used on this dataset. Conclusion: The new Bayesian functional mixed-effects model, which assumes multiple groups of functions with different smoothing parameters, provides an enhanced approach to analyzing time-course gene expression data.
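The mean integrated squared error named in the abstract above can be illustrated with a short sketch. It assumes that estimated and true curves are evaluated on a shared time grid and that the integral is approximated with the trapezoidal rule. The function names and the toy curves are hypothetical and are not taken from the paper.

```python
import math

def integrated_squared_error(times, estimate, truth):
    """Trapezoidal approximation of the integral of (estimate - truth)^2 over time."""
    sq = [(e - t) ** 2 for e, t in zip(estimate, truth)]
    return sum(
        0.5 * (sq[i] + sq[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )

def mise(times, estimates, truths):
    """Average the integrated squared error over all curves (for example, genes)."""
    errors = [
        integrated_squared_error(times, e, t) for e, t in zip(estimates, truths)
    ]
    return sum(errors) / len(errors)

# Toy data (hypothetical): one true curve and an estimate offset by a constant.
times = [i / 10 for i in range(11)]
truth = [math.sin(2 * math.pi * t) for t in times]
shifted = [v + 0.1 for v in truth]
```

A lower value indicates estimated curves that track the true curves more closely on average.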
-
Analysis of Codon Usage Patterns in the Human Papillomavirus Oncogenes
Authors: Myeongji Cho, Hayeon Kim, Mikyeong Je and Hyeon S. Son
Background: Persistent high-risk genital human papillomavirus (HPV) infection is a major cause of cervical cancer in women. The products of the viral transforming genes E6 and E7 in the high-risk HPVs are known to be similar in their amino acid composition and structure. We performed a comparative analysis of codon usage patterns in the E6 and E7 genes of HPVs. Methods: The E6 and E7 gene sequences of eight HPV subtypes were analyzed to determine their nucleotide composition, relative synonymous codon usage (RSCU), effective number of codons (ENC), neutrality, genetic variability, selection pressure, and codon adaptation index (CAI). Additionally, a correspondence analysis (CoA) was performed. Results: The analysis of the effects of compositional differences on codon usage patterns revealed that there might be a usage bias for ‘A’ nucleotides. This was consistent with the results of the RSCU analysis, which demonstrated that the selection of A/T-rich patterns and the preference for A/T-ended codons in HPVs are influenced by compositional constraints. Moreover, the CoA results for the RSCU values, Tajima’s D tests, and neutrality tests reveal that selection pressure plays an important role. Conclusion: The results of this study are consistent with previous findings that most papillomavirus genes are under purifying selection pressure, which limits changes to the encoded proteins. Natural selection and mutation pressures, which change the nucleotide composition and codon usage bias in the two tumor genes of HPV, act differently during the evolution of the HPV subtypes; thus, throughout the viral life cycle, HPV can constantly evolve to adapt to a new environment.
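As an illustration of the RSCU measure listed in the Methods above, the sketch below uses the common definition: a codon's observed count divided by the mean count of all synonymous codons for the same amino acid. It assumes the standard genetic code and an in-frame DNA coding sequence. The function name `rscu` and the example sequence in the note are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def rscu(cds):
    """Return {codon: RSCU} for an in-frame DNA coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by amino acid, skipping stop codons.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            # observed count divided by the mean count across the family
            values[c] = counts[c] * len(family) / total
    return values
```

For the sequence `GCTGCTGCAGCC` (four alanine codons), `rscu` gives 2.0 for GCT, 1.0 for GCA and GCC, and 0.0 for GCG. Values above 1 mark codons used more often than expected under equal synonymous usage.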
-
Identification of Genomic Islands in Synechococcus sp. WH8102 Using Genomic Barcode and Whole-Genome Microarray Analysis
Authors: Jiahui Pan, Xizi Luo, Jiang Bian, Tong Shao, Chaoying Li, Tingting Zhao, Shiwei Zhang, Fengfeng Zhou and Guoqing Wang
Background: Synechococcus sp. WH8102 is one of the most abundant photosynthetic organisms in many ocean regions. Objective: The aim of this study is to identify genomic islands (GIs) in Synechococcus sp. WH8102 with integrated methods. Methods: We applied a genomic barcode, which makes genomic regions of different origins visually apparent, to identify the GIs in Synechococcus sp. WH8102. Gene expression in the predicted GIs was analyzed using microarray data collected for functional analysis of the relevant genes. Results: Seven GIs were identified in Synechococcus sp. WH8102. Most of them are involved in cell surface modification, photosynthesis and drug resistance. In addition, our analysis revealed the functions of these GIs, which could be used for in-depth study of the evolution of this strain. Conclusion: Genomic barcodes provide a comprehensive and intuitive view of the target genome. They can be used to understand the intrinsic characteristics of the whole genome and to identify GIs or other similar elements.
-
Biomarker Identification for Liver Hepatocellular Carcinoma and Cholangiocarcinoma Based on Gene Regulatory Network Analysis
Authors: Qiuyan Huo, Yuying Ma, Yu Yin and Guimin Qin
Background: Liver hepatocellular carcinoma (LIHC) and cholangiocarcinoma (CHOL) are two main histological subtypes of primary liver cancer with a unified molecular landscape, and feed-forward loops (FFLs) have been shown to be relevant in these complex diseases. Objective: To date, there has been no comparative analysis of the pathogenesis of LIHC and CHOL based on regulatory relationships. Therefore, we investigated the common and distinct regulatory properties of LIHC and CHOL in terms of gene regulatory networks. Methods: Based on the identified FFLs and pathway enrichment analysis, we constructed pathway-specific co-expression networks and further predicted biomarkers for these cancers by network clustering. Results: We identified 20 and 36 candidate genes for LIHC and CHOL, respectively. Literature from PubMed supports the reliability of our results. Conclusion: Our results indicated that the hsa01522-Endocrine resistance pathway was associated with both LIHC and CHOL. Additionally, six genes (SPARC, CTHRC1, COL4A1, EDIL3, LAMA4 and OLFML2B) were predicted to be highly associated with both cancers; COL4A2, CSPG4, GJC1 and ADAMTS7 were predicted to be potential biomarkers of LIHC; and COL6A3, COL1A2, FAP and COL8A1 were predicted to be potential biomarkers of CHOL. In addition, we inferred that the collagen gene family, which appeared more frequently in our overall prediction results, might be closely related to cancer development.
-
Distinguishing Enzymes and Non-enzymes Based on Structural Information with an Alignment Free Approach
Authors: Lifeng Yang and Xiong Jiao
Background: Knowledge of protein function is crucial for understanding biological processes. Experimental methods for determining protein function cannot keep pace with the growing amount of protein sequence and structure data. Objective: To develop computational techniques for protein function prediction. Methods: Based on residue interaction network features and motion mode information, an SVM model was constructed and used as the predictor. The role of these features was analyzed. Results: An alignment-free method for classifying enzymes and non-enzymes was developed in this work. No single feature occupies a dominant position in the prediction process. The topological and information-theoretic residue interaction network features show better performance. Combining the fast mode and the slow mode gives a better explanation of the classification result. Conclusion: The method proposed in this paper can act as a classifier for enzymes and non-enzymes.
-
A Novel Method for Microsatellite Instability Detection by Liquid Biopsy Based on Next-generation Sequencing
Authors: Zheng Jiang, Hui Liu, Siwen Zhang, Jia Liu, Weitao Wang, Guoliang Zang, Bo Meng, Huixin Lin, Jichuan Quan, Shuangmei Zou, Dawei Yuan, Xishan Wang, Geng Tian and Jidong Lang
Background: Microsatellite instability (MSI) is a prognostic biomarker used to guide medication selection in multiple cancers, such as colorectal cancer. Traditional PCR with capillary electrophoresis and next-generation sequencing using paired tumor tissue and leukocyte samples are the main approaches for MSI detection due to their high sensitivity and specificity. Currently, patient tissue samples are obtained through puncture or surgery, which causes injury and risk of concurrent disease, further illustrating the need for MSI detection by liquid biopsy. Methods: We propose an analytic method that uses paired plasma/leukocyte samples and detects MSI using next-generation sequencing technology. Based on the theoretical progress of oncogenesis, we hypothesized that the microsatellite site length distribution in plasma is a combination of the distributions in tumor tissue and leukocytes. Thus, we defined a window-judgement method to identify whether biomarkers were stable. Results: Using traditional PCR as the reference standard, we evaluated three methods in 20 samples (MSI-H: 3, MSS: 17): the peak shifting method using tissue vs. leukocytes, the peak shifting method using plasma vs. leukocytes, and our method using plasma vs. leukocytes. The three methods achieved sensitivities of 100%, 0%, and 100% and specificities of 100.00%, 94.12%, and 88.24%, respectively. Conclusion: Our method has the advantage of possibly detecting MSI in a liquid biopsy and provides a novel direction for future studies to increase the specificity of the method.
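The reported percentages can be reproduced from the stated sample sizes (3 MSI-H, 17 MSS). The abstract does not give the confusion-matrix counts. The counts below are inferred as the only whole numbers consistent with the reported values and should be read as an assumption. The method labels are shorthand introduced here.

```python
def sensitivity(tp, fn):
    """Fraction of reference-positive (MSI-H) samples called positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of reference-negative (MSS) samples called negative."""
    return tn / (tn + fp)

# Inferred counts (assumption): (TP, FN, TN, FP) out of 3 MSI-H and 17 MSS.
methods = {
    "peak shifting, tissue vs. leukocytes": (3, 0, 17, 0),
    "peak shifting, plasma vs. leukocytes": (0, 3, 16, 1),
    "window judgement, plasma vs. leukocytes": (3, 0, 15, 2),
}

summary = {
    name: (
        round(100 * sensitivity(tp, fn), 2),
        round(100 * specificity(tn, fp), 2),
    )
    for name, (tp, fn, tn, fp) in methods.items()
}
```

Under these counts, one false positive among 17 MSS samples gives 16/17 = 94.12% specificity, and two give 15/17 = 88.24%.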
-
A Novel Hybrid Filter/Wrapper Feature Selection Approach Based on Improved Fruit Fly Optimization Algorithm and Chi-square Test for High Dimensional Microarray Data
Authors: Chaokun Yan, Bin Wu, Jingjing Ma, Ge Zhang, Junwei Luo, Jianlin Wang and Huimin Luo
Background: Microarray data is widely utilized for disease analysis and diagnosis. However, it is hard to process such data directly and achieve high classification accuracy because of its intrinsic high dimensionality and small sample sizes. As an important data preprocessing technique, feature selection is usually used to reduce the dimensionality of datasets. Methods: Given the limitations of employing filter or wrapper approaches individually for feature selection, in this study a novel hybrid filter-wrapper approach, CS-IFOA, is proposed for high-dimensional datasets. First, the Chi-square test is used to filter out irrelevant or redundant features. Next, an improved binary Fruit Fly Optimization algorithm is applied to search for the optimal feature subset without degrading the classification accuracy. Here, a KNN classifier with 10-fold cross-validation is used to evaluate the classification accuracy. Results: Extensive experimental results on six benchmark biomedical datasets show that the proposed CS-IFOA can achieve superior performance compared with other state-of-the-art methods. CS-IFOA selects a smaller number of features while achieving higher classification accuracy. Furthermore, the standard deviation of the experimental results is relatively small, which indicates that the proposed algorithm is relatively robust. Conclusion: The results confirm the efficiency of our approach in identifying important genes in high-dimensional biomedical datasets; it can serve as a pre-processing tool to help optimize the feature selection process and improve the efficiency of disease diagnosis.
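The filter stage described in the Methods above can be sketched as follows. This is only an illustration of chi-square feature ranking, under the assumption that expression values have already been discretized into categories. It does not implement the improved Fruit Fly Optimization wrapper or the KNN evaluation. The function names and toy data are hypothetical.

```python
from collections import Counter

def chi_square(feature, labels):
    """Pearson chi-square statistic for a categorical feature vs. class labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in feature_counts.items():
        for l, lc in label_counts.items():
            expected = fc * lc / n  # count expected under independence
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat

def filter_top_k(columns, labels, k):
    """Return the indices of the k features with the largest chi-square score."""
    scores = [chi_square(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy data (hypothetical): two discretized features, six samples.
labels = [0, 0, 0, 1, 1, 1]
noise = [0, 1, 0, 1, 0, 1]    # weakly related to the class
perfect = [0, 0, 0, 1, 1, 1]  # matches the class exactly
selected = filter_top_k([noise, perfect], labels, k=1)
```

In a hybrid scheme of this kind, the indices kept by the filter would then define the search space for the wrapper stage.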
-
Quantum Patterns of Genome Size Variation in Angiosperms
Authors: Liaofu Luo and Lirong Zhang
Aims: The discontinuous pattern of genome size variation in angiosperms is an unsolved problem related to genome evolution. In this study, we introduced a genome evolution operator and solved the related eigenvalue equation to deduce the discontinuous pattern. Background: The genome is a well-defined system for studying the evolution of species. One of the basic problems is genome size evolution. The DNA amounts of angiosperm species are highly variable, differing over 1000-fold. One big surprise is the discovery of the discontinuous distribution of nuclear DNA amounts in many angiosperm genera. Objective: The discontinuous distribution of nuclear DNA amounts has a certain regularity, much like a group of quantum states in atomic physics. This quantum pattern has not been explained by any evolutionary theory so far, and we interpret it through the quantum simulation of genome evolution. Methods: We introduced a genome evolution operator H to deduce the distribution of DNA amount. The nuclear DNA amount in angiosperms is studied from the eigenvalue equation of the genome evolution operator H. The operator H is introduced by physical simulation and is defined as a function of the genome size N and the derivative with respect to the size. Results: The discontinuity of the DNA size distribution and its synergetic occurrence in related angiosperm species are successfully deduced from the solution of the equation. The results agree well with the existing experimental data for Aloe, Clarkia, Nicotiana, Lathyrus, Allium and other genera. Conclusion: The success of our approach may imply the existence of a set of genomic evolutionary equations satisfying classical-quantum duality. The classical phase of evolution obeys the classical deterministic law, while the quantum phase obeys the quantum stochastic law. The discontinuity of the DNA size distribution provides novel evidence for the quantum evolution of angiosperms. It has been recognized that the discontinuous pattern is due to the existence of some unknown evolutionary constraints. Our study indicates that these constraints on the angiosperm genome are essentially quantum in origin.
-
Identification of Critical Functional Modules and Signaling Pathways in Osteoporosis
Authors: Xiaowei Jiang, Pu Ying, Yingchao Shen, Yiming Miu, Wenbin Kong, Tong Lu and Qiang Wang
Background: Osteoporosis is the most common bone metabolic disease. Abnormal osteoclast formation and resorption play a fundamental role in osteoporosis pathogenesis. Recent research has greatly broadened our understanding of the molecular mechanisms of osteoporosis. However, the molecular mechanisms leading to osteoporosis are still not entirely clear. Objective: The purpose of this work is to study the critical regulatory genes, functional modules, and signaling pathways. Methods: Differential expression analysis, network topology-based analysis, and overrepresentation enrichment analysis (ORA) were used to identify differentially expressed genes (DEGs), gene subnetworks, and signaling pathways related to osteoporosis, respectively. Results: Differential expression analysis identified DEGs, such as POGLUT1, DAPK3 and NFKBIA, associated with osteoclastogenesis, which highlighted the Notch, apoptosis and NF-kB signaling pathways. Network topology-based analysis identified an upregulated subnetwork characterized by EXOSC8 and DIS3L from the RNA exosome complex, and a downregulated subnetwork composed of histone deacetylases and the cofactors MORF4L1 and JDP2. Furthermore, the overrepresentation enrichment analysis highlighted that the corticotrophin-releasing hormone signaling pathway might affect osteoclastogenesis through its component NR4A1 and suppress osteoclast differentiation and osteoclast bone resorption through urocortin (UCN). Conclusion: Our systematic analysis not only discovered novel molecular mechanisms but also proposed potential drug targets for osteoporosis.
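The abstract does not state which test underlies the overrepresentation enrichment analysis. A common choice, assumed here, is the one-sided hypergeometric test, which asks how likely it is to see at least the observed overlap between a gene list and a pathway by chance. The function name and the numbers in the note are hypothetical.

```python
from math import comb

def ora_p_value(n_background, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, x) * comb(n_background - n_pathway, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total
```

For example, `ora_p_value(10, 5, 5, 5)` equals 1/252: with 10 background genes and 5 pathway genes, only one of the 252 possible 5-gene selections contains all 5 pathway genes. In practice such p-values are usually corrected for multiple testing across pathways.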
-
Bioinformatics Analysis Identifies CPZ as a Tumor Immunology Biomarker for Gastric Cancer
Authors: Yuan Gu, Ying Gao, Xiaodan Tang, Huizhong Xia and Kunhe Shi
Background: Gastric cancer (GC) is one of the most common malignancies worldwide. However, biomarkers for the prognosis and diagnosis of gastric cancer are still needed. Objective: The present study aimed to evaluate whether CPZ could be a potential biomarker for GC. Methods: Kaplan-Meier plotter (http://kmplot.com/analysis/) was used to determine the correlation between CPZ expression and overall survival (OS) and disease-free survival (DFS) time in GC. We analyzed CPZ expression in different types of cancer and the correlation of CPZ expression with the abundance of immune infiltrates, including B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells, via gene modules using the TIMER database. Results: The present study found that CPZ was overexpressed in multiple types of human cancer, including gastric cancer. Overexpression of CPZ correlated with poor prognosis in patients with stomach adenocarcinoma (STAD). Furthermore, our analyses show that immune infiltration levels and diverse immune marker sets are correlated with levels of CPZ expression in STAD. Bioinformatics analysis revealed that CPZ was involved in regulating multiple pathways, including the PI3K-Akt signaling pathway, cGMP-PKG signaling pathway, Rap1 signaling pathway, TGF-beta signaling pathway, regulation of cell adhesion, extracellular matrix organization, collagen fibril organization, and collagen catabolic process. Conclusion: This study, for the first time, provides useful information for understanding the potential roles of CPZ in tumor immunology and supports CPZ as a potential biomarker for GC.
-
Bioinformatics Analysis Reveals Centromere Protein K Can Serve as Potential Prognostic Biomarker and Therapeutic Target for Non-small Cell Lung Cancer
Background: Non-small cell lung carcinoma (NSCLC) accounts for 80% of all lung cancer cases, and lung cancer is a leading cause of morbidity and mortality worldwide. Previous studies demonstrated that centromere proteins are dysregulated and involved in regulating the tumorigenesis and development of human cancers. However, the roles of centromere protein family members in NSCLC remain to be further elucidated. Objective: The present study aimed to explore the roles of centromere protein family members in NSCLC. Methods: GEPIA (http://gepia.cancer-pku.cn/) was used to compare the expression of the targets between normal and cancer samples. We explored the prognostic value of centromere proteins in NSCLC using the Kaplan-Meier plotter (http://kmplot.com). Protein-protein interactions among centromere proteins were determined using GeneMANIA (http://www.genemania.org). The TISIDB database (http://cis.hku.hk/TISIDB) was used to detect the relationship between centromere protein expression and clinical stages, lymphocytes, immunomodulators and chemokines in NSCLC. The DAVID database (https://david.ncifcrf.gov) was used to detect potential roles of CENPK using its co-expressed genes. Results: The present study showed for the first time that centromere protein family members, including CENPA, CENPF, CENPH, CENPI, CENPK, CENPM, CENPN, CENPO, CENPQ and CENPU, were dysregulated and correlated with poor prognosis in patients with lung adenocarcinoma (LUAD). CENPK showed the greatest correlation with the prognosis of patients with NSCLC. We found that CENPK was significantly highly expressed in LUAD samples and that overexpression of CENPK was correlated with shorter OS and DFS in patients with different stages of NSCLC. Of note, this study showed for the first time, using the TISIDB database, that CENPK was significantly correlated with lymphocytes and immunomodulators. Conclusion: In summary, CENPK can serve as a novel biomarker for the diagnosis of patients with NSCLC.
-
Identification of Glioma Specific Genes as Diagnostic and Prognostic Markers for Glioma
Authors: Ming Tu, Ling Ye, ShaoBo Hu, Wei Wang, Penglei Zhu, XiangHe Lu and WeiMing Zheng
Background: Malignant gliomas are the most prevalent malignancy of the brain. However, there is still a lack of sensitive and accurate biomarkers for gliomas. Objective: To explore the mechanisms underlying glioma progression and identify novel diagnostic and prognostic markers for glioma. Methods: By analyzing the TCGA dataset, genome-wide gene expression levels were evaluated in 19 different types of human cancers. A protein-protein interaction network was constructed to reveal the potential roles of the glioma-specific genes. KEGG and GO analyses were used to reveal the potential effects of these genes. Results: We identified 698 glioma-specific genes by analyzing the TCGA dataset. A protein-protein interaction network was constructed to reveal the potential roles of these glioma-specific genes. KEGG and GO analyses showed that the glioma-specific genes were involved in regulating neuroactive ligand-receptor interaction, retrograde endocannabinoid signaling, glutamatergic synapse, chemical synaptic transmission, nervous system development, central nervous system development, and learning. Of note, GRIA1, GNAO1, GRIN1, CACNA1A, CAMK2A, and SYP were found to be down-regulated and associated with poor prognosis in gliomas. Conclusion: GRIA1, GNAO1, GRIN1, CACNA1A, CAMK2A, and SYP were found to be down-regulated and associated with poor prognosis in gliomas. We believe this study provides novel biomarkers for gliomas.
-
Comprehensive Analysis Reveals GPRIN1 is a Potential Biomarker for Non-small Cell Lung Cancer
Authors: Jian Li, Zheng Gong, Haicheng Jiang, Jie Gao, Jianwei Liang, Peng Chang and Yulong Hou
Background: Non-small cell lung cancer (NSCLC) is one of the leading causes of tumor-related mortality worldwide. However, the prognosis of NSCLC remains poor and the mechanisms remain to be further investigated. Objective: This study aimed to evaluate whether GPRIN1 could be a potential biomarker for NSCLC. Methods: The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/) and the GEO database (http://www.ncbi.nlm.nih.gov/geo) were used to compare GPRIN1 expression between normal and cancer samples. The protein-protein interaction among centromere proteins was determined using the STRING database (http://www.bork.embl-heidelberg.de/STRING/). GraphPad Prism 5.0 software was used to perform independent and paired samples t-tests or ANOVA to analyze differences in GPRIN1 expression between groups. Results: This study showed that GPRIN1 was overexpressed and correlated with shorter OS time in human cancers. By analyzing TCGA and GEO datasets, we found that GPRIN1 was up-regulated in NSCLC samples compared to normal lung tissues. Bioinformatics analysis indicated that this gene was involved in regulating cancer proliferation and metabolism. Finally, by constructing PPI networks, we identified key targets of GPRIN1 in NSCLC, including MCM3, KIF20A, UHRF1, BRCA1, KIF4A, HMMR, KIF18B, KIFC1, ASPM, and NCAPG2. Conclusion: These analyses showed that GPRIN1 could act as a prognostic biomarker in patients with NSCLC.
-
Comprehensive Analysis of Key Proteins Involved in Radioresistance of Prostate Cancer by Integrating Protein-protein Interaction Networks
Authors: Duocheng Qian, Quan Li, Yansong Zhu and Dujian Li
Background: Radioresistance remains a significant obstacle in the treatment of prostate cancer (PCa). The mechanisms underlying radioresistance in PCa remain to be further investigated. Methods: The GSE53902 dataset was used in this study to identify radioresistance-related mRNAs. A protein-protein interaction (PPI) network was constructed based on STRING analysis. The DAVID system was used to predict the potential roles of radioresistance-related mRNAs. Results: We screened and re-annotated the GSE53902 dataset to identify radioresistance-related mRNAs. A total of 445 up-regulated and 1036 down-regulated mRNAs were identified in radioresistant PCa cells. Three key PPI networks consisting of 81 proteins were further constructed in PCa. Bioinformatics analysis revealed that these genes were involved in regulating MAP kinase activity, response to hypoxia, regulation of the apoptotic process, mitotic nuclear division, and regulation of mRNA stability. Moreover, we observed that radioresistance-related mRNAs, such as PRC1, RAD54L, PIK3R3, ASB2, FBXO32, LPAR1, RNF14, and UBA7, were dysregulated and correlated with survival time in PCa. Conclusion: We believe this study will be useful for understanding the mechanisms underlying radioresistance of PCa and for identifying novel prognostic markers for PCa.
-
Identification of Key mRNAs, miRNAs, and mRNA-miRNA Network Involved in Papillary Thyroid Carcinoma
Authors: Wei Han, Dongchen Lu, Chonggao Wang, Mengdi Cui and Kai Lu
Background: In the past decades, the incidence of thyroid cancer (TC) has been gradually increasing, owing to the widespread use of ultrasound scanning devices. However, the key mRNAs, miRNAs, and mRNA-miRNA network in papillary thyroid carcinoma (PTC) have not been fully understood. Methods: In this study, multiple bioinformatics methods were employed, including differential expression analysis, gene set enrichment analysis (GSEA), and miRNA-mRNA interaction network construction. Results: First, we investigated the key miRNAs that regulated significantly more differentially expressed genes based on the GSEA method. Second, we searched for the key miRNAs based on the mRNA-miRNA interaction subnetwork involved in PTC. We identified hsa-mir-1275, hsa-mir-1291, hsa-mir-206 and hsa-mir-375 as the key miRNAs involved in PTC pathogenesis. Conclusion: The integrated analysis of the gene and miRNA expression data not only identified key mRNAs, miRNAs, and the mRNA-miRNA network involved in papillary thyroid carcinoma, but also improved our understanding of the pathogenesis of PTC.
-
Identification of Key mRNAs and lncRNAs Associated with the Effects of Anti-TWEAK on Osteosarcoma
Authors: Mingxuan Yang, Liangtao Zhao, Xuchang Hu, Haijun Feng and Xuewen Kang
Background: Osteosarcoma (OS) is one of the most common primary malignant bone tumors in teenagers. Emerging studies have demonstrated that TWEAK and Fn14 are involved in regulating cancer cell differentiation, proliferation, apoptosis, migration and invasion. Objective: The present study identified differentially expressed mRNAs and lncRNAs after anti-TWEAK treatment in OS cells using GSE41828. Methods: We identified 922 up-regulated mRNAs, 863 down-regulated mRNAs, 29 up-regulated lncRNAs, and 58 down-regulated lncRNAs after anti-TWEAK treatment in OS cells. By constructing PPI networks, we identified several key proteins involved in anti-TWEAK treatment in OS cells, including MYC, IL6, CD44, ITGAM, STAT1, CCL5, FN1, PTEN, SPP1, TOP2A, and NCAM1. By constructing lncRNA co-expression networks, we identified several key lncRNAs, including LINC00623, LINC00944, PSMB8-AS1, and LOC101929787. Results: Bioinformatics analysis revealed that DEGs after anti-TWEAK treatment in OS were involved in regulating the type I interferon signaling pathway, immune response-related pathways, telomere organization, chromatin silencing at rDNA, and DNA replication. Bioinformatics analysis also revealed that differentially expressed lncRNAs after anti-TWEAK treatment in OS were related to telomere organization, protein heterotetramerization, DNA replication, response to hypoxia, the TNF signaling pathway, the PI3K-Akt signaling pathway, focal adhesion, apoptosis, the NF-kappa B signaling pathway, the MAPK signaling pathway, and the FoxO signaling pathway. Conclusion: This study provides useful information for understanding the mechanisms by which TWEAK underlies OS progression and for identifying novel therapeutic markers for OS.
-
Formalization and Semantic Integration of Heterogeneous Omics Annotations for Exploratory Searches
Authors: Omer Irshad and Muhammad U. Ghani Khan
Aim: To help researchers and practitioners unveil the functional aspects of the human cellular system by performing exploratory searches on semantically integrated, heterogeneous and geographically dispersed omics annotations. Background: Improving health standards is one of the motives that continuously drives researchers and practitioners to uncover the unknown aspects of the human cellular system. Inferring new knowledge from known facts requires a reasonably large amount of data in a well-structured, integrated, and unified form. With the advent of high-throughput and sensor technologies in particular, biological data is growing heterogeneously and in geographically dispersed locations at an astronomical rate. Several data integration systems have been deployed to cope with the issues of data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and expandable than syntax-based ones but still lack aspect-based data integration, persistence and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of the semantic associations they naturally possess in the human cell. Objective: To develop an aspect-oriented formal data integration model that semantically integrates heterogeneous and geographically dispersed omics annotations and provides exploratory querying on the integrated data. Methods: We propose an aspect-oriented formal data integration model that uses web semantics standards to formally specify each of its constructs. The proposed model supports the aspect-oriented representation of biological entities while addressing the issues of data heterogeneity and global dispersion. It associates and warehouses biological entities in the way they relate to each other in a physical cell system. Results: To show the significance of the proposed model, we developed a data warehouse and information retrieval system based on a multi-layered and multi-modular software architecture compliant with the proposed model. Results show that our model supports gathering, associating, integrating, persisting and querying each entity with respect to all of its possible aspects within or across the various associated omics layers. Conclusion: Formal specifications help address data integration issues by providing formal means for understanding omics data based on meaning instead of syntax.
