Current Bioinformatics - Volume 16, Issue 9, 2021
An Appraisal of Skill Gaps in Bioinformatics Education
Authors: Smriti Sharma and Vinayak Bhatia
Bioinformatics has grown exponentially over the last decade. Specialists in this field need to be well versed in computing, statistics, and mathematics, in addition to the biological sciences. This review attempts to identify existing skill gaps in bioinformatics education globally and to give policy developers some indicators for designing bioinformatics curricula. The authors find that training and education efforts at the university level have not kept pace with the development of the field. On a positive note, both academia and industry seem to recognise this lag, and efforts are being made to address it. The review also summarizes the distinctive features needed to integrate bioinformatics into education in terms of curriculum design, teaching, and learning methods.
Identification of Disordered Regions of Intrinsically Disordered Proteins by Multi-features Fusion
Authors: Sun Canzhuang and Feng Yonge
Background: Intrinsically disordered proteins lack a well-defined three-dimensional structure under physiological conditions. They perform multiple functions in life activities and are closely related to many human diseases. Identifying the disordered regions of intrinsically disordered proteins is important for protein function annotation. Objective: To accurately identify the disordered regions in intrinsically disordered proteins. Methods: We constructed a multi-feature fusion model based on a support vector machine to predict disordered regions of intrinsically disordered proteins from the DisProt database. We extracted codon usage frequencies, GC content, protein secondary structure components, hydrophilic-hydrophobic amino acid components, and chemical shifts as features. Results: In single-feature prediction, the best accuracy was 82.098%, obtained with codon frequencies. Fusing features improved performance, with the best result of 83.173% obtained by combining codon frequencies with chemical shifts. Conclusion: Our model achieves good results in predicting disordered regions of intrinsically disordered proteins; moreover, it performs better than existing methods.
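Two of the features named in this abstract, codon usage frequencies and GC content, can be computed directly from a coding sequence. The following is a minimal sketch rather than the authors' pipeline: the function names, the fixed 64-codon ordering, and the choice to concatenate the two features into one vector are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 64 codons in a fixed order, so every sequence maps to the same feature layout.
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_SET = set(CODONS)


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_frequencies(seq):
    """Relative frequency of each codon, reading in frame from position 0."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)  # skip ambiguous bases
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


def fused_features(seq):
    """Hypothetical fusion step: 64 codon frequencies followed by GC content."""
    return codon_frequencies(seq) + [gc_content(seq)]
```

Calling `fused_features` on a coding sequence yields a 65-element vector that could be passed to any classifier, such as a support vector machine; the other features listed in the abstract would need their own extraction steps.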
Construction of Anatomical Structure-specific Developmental Dynamic Networks for Human Brain on Multiple Omics Levels
Authors: Yingying Wang, Yu Yang, Jianfeng Liu and Keshen Li
Background: Human brain development is a series of complex processes exhibiting profound changes from gestation to adulthood. Objective: We aimed to construct dynamic developmental networks for each anatomical structure of the human brain on multiple omics levels in order to obtain a new systematic map of the brain at the molecular level. Methods: We analyzed brain development by constructing dynamic networks between adjacent time points at different grouping levels of anatomical structures. Gene-time networks were first built to obtain dynamic maps of the developing brain at the transcriptome level. miRNA-mRNA networks and protein-protein networks were then constructed by integrating information from miRNomics and proteomics. Time- and structure-specific biomarkers were filtered based on analyses of topological characteristics. Results: The most dramatic developmental period and structure were fetal-infancy and the telencephalon, respectively. The cortex was the key developmental region in ‘late fetal and neonatal’ and ‘early infancy’. The development of the temporal lobe differed from that of other lobes, since significant molecular changes were found only in the comparison pairs ‘early fetal-early mid-fetal’ and ‘adolescence-young adulthood’. Interestingly, the changes among different brain structures within adolescence and adulthood were larger than at other time points. hsa-miR-548c-3p and H3C2 may be new indicators of brain development, considering their key roles in the networks. Conclusion: To our knowledge, this study is the first report of dynamic brain development maps for different anatomical structures on multiple omics levels. The results offer a new, systematic view of brain development, which may provide a more accurate understanding of the human brain.
PathDriver: Cancer Driver Genes Identification Based on the Metabolic Pathway
Authors: Xianghua Peng, Fang Liu, Ping Liu, Xing Li and Xinguo Lu
Aim: In studying cancer initiation and progression, a great challenge is to identify the driver genes. Background: With advances in Next-Generation Sequencing (NGS) technologies, specific oncogenic genes have been identified by integrating multi-omics data. Although existing computational models have identified many common driver genes, they rely on individual regulatory mechanisms or independent copy number variants and ignore the dynamic function of genes in pathways and networks. Objective: The molecular metabolic pathway is a critical biological process in tumor initiation, progression and maintenance. Establishing the role of genes in pathways and networks helps to describe their functional roles under physiological and pathological conditions at multiple levels. Methods: We present a metabolic pathway-based driver gene identification method (pathDriver) to distinguish different cancer types/subtypes. In pathDriver, the metabolic pathway is combined with the protein-protein interaction network to construct the pathway network. Then, the Interaction Frequency (IF) and Inverse Pathway Frequency (IPF) are used to evaluate the collaborative impact factor of genes in the pathway network. Finally, cancer-specific driver genes are identified by calculating the scores of edges connected to genes in the pathway network. Results: We applied pathDriver to 16 TCGA cancer types for pan-cancer analysis. Conclusion: The driving pathway identified biologically significant known cancer genes and potential new candidate genes.
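The abstract names Interaction Frequency (IF) and Inverse Pathway Frequency (IPF) but does not define them. The sketch below assumes they mirror the term frequency and inverse document frequency of text mining: IF is taken as a gene's share of interaction edges within one pathway, and IPF as the log of the number of pathways divided by the number of pathways containing the gene. These definitions, the function name `if_ipf`, and the toy data are illustrative assumptions, not the pathDriver formulas.

```python
import math


def if_ipf(pathway_edges):
    """Score each (gene, pathway) pair with an assumed TF-IDF-style weight.

    pathway_edges maps a pathway name to a list of (gene_a, gene_b) edges.
    """
    n_pathways = len(pathway_edges)

    # Record the set of pathways in which each gene takes part.
    gene_pathways = {}
    for name, edges in pathway_edges.items():
        for a, b in edges:
            gene_pathways.setdefault(a, set()).add(name)
            gene_pathways.setdefault(b, set()).add(name)

    scores = {}
    for name, edges in pathway_edges.items():
        if not edges:
            continue
        # Degree of each gene inside this pathway.
        degree = {}
        for a, b in edges:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        for gene, d in degree.items():
            interaction_freq = d / len(edges)  # assumed IF
            inverse_pathway_freq = math.log(n_pathways / len(gene_pathways[gene]))  # assumed IPF
            scores[(gene, name)] = interaction_freq * inverse_pathway_freq
    return scores


# Hypothetical toy input: gene A appears in both pathways, so its IPF is log(1) = 0.
toy = {"p1": [("A", "B"), ("A", "C")], "p2": [("A", "D")]}
toy_scores = if_ipf(toy)
```

Under these assumptions, a gene that interacts heavily within one pathway but appears in few pathways scores highest, while a gene present in every pathway scores zero regardless of its degree.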
Screening Differential Hub Genes Related with the Hypoglycemic Effect of Quercetin Through Data Mining
Authors: Ji-Ping Wei, Tao Luo, Yuchen Wang and Wenyu Lu
Background: The effect of quercetin on blood glucose levels has been widely studied. However, the mechanism of the hypoglycemic effect of quercetin remains unclear. Objective: To elucidate the hypoglycemic effect of quercetin, microarray data from the GSE38067 dataset were used to screen Differential Hub Genes (DHGs) by differential expression analysis, weighted gene co-expression network analysis and protein-protein interaction analysis. Methods: Systematic data mining in this study indicated that quercetin exerts its hypoglycemic effect by affecting the expression of seven candidate DHGs, especially the Cdkn1a and Cd36 genes, to relieve insulin resistance, prevent oxidative damage and protect pancreatic β-cells in streptozotocin (STZ)-induced diabetic mice. Result and Conclusion: This work suggests a possible way to fight diabetes by using quercetin as a functional food ingredient or medicine.
PEPRF: Identification of Essential Proteins by Integrating Topological Features of PPI Network and Sequence-based Features via Random Forest
Authors: Chuanyan Wu, Bentao Lin, Kai Shi, Qingju Zhang, Rui Gao, Zhiguo Yu, Yang De Marinis, Yusen Zhang and Zhi-Ping Liu
Background: Essential proteins play an important role in the process of life and can be identified by experimental methods and computational approaches. Experimental approaches are highly accurate but time- and resource-consuming. Objective: Herein, we present a computational model (PEPRF) to identify essential proteins based on machine learning. Methods: Different features of proteins were extracted. Topological features were extracted from the Protein-Protein Interaction (PPI) network. From the protein sequence, graph theory-based features, information-based features, composition and physicochemical features, etc., were extracted. In total, 282 features were constructed. To select the features that contributed most to the identification, a ReliefF-based feature selection method was adopted to measure the weights of these features. Results: As a result, 212 features were curated to train random forest classifiers. PEPRF achieved an AUC of 0.71 and an accuracy of 0.742. Conclusion: Our results show that PEPRF may be applied as an efficient tool to identify essential proteins.
Prediction of Off-Target Effects in CRISPR/Cas9 System by Ensemble Learning
Authors: Yongxian Fan and Haibo Xu
Background: CRISPR/Cas9, a new generation of targeted gene editing technology with low cost and simple operation, has been widely employed in the field of gene editing. The erroneous cutting of off-target sites in CRISPR/Cas9 is called the off-target effect, which is also the biggest complication that CRISPR/Cas9 faces in practical application. Specifically, off-target effects can lead to unexpected gene editing results. Therefore, accurately predicting CRISPR/Cas9 off-target effects is a very important task. Predicting off-target effects of CRISPR/Cas9 by machine learning is feasible, but most existing off-target tools have not paid close attention to the effect of gene encoding on prediction. Methods: We compared three encoding methods based on One-Hot and combined the gene sequence with four CRISPR/Cas9 off-target prediction tools to build an ensemble model with XGBoost, designated XGBCRISPR. Grid search was employed to find the optimal parameters for the best performance. Results: Performance was compared with existing tools based on the ROC value and PRC value. The experimental results show that the XGBCRISPR model is superior to the existing tools. Conclusion: The new model achieves better prediction results than existing tools, but its accuracy can be improved further as more off-target scores appear.
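The abstract mentions three One-Hot-based encodings without describing them. As a rough illustration of what such encodings look like, the sketch below shows plain one-hot encoding of a nucleotide sequence and one possible pair encoding that merges a guide sequence and a candidate off-target site with a position-wise OR. Both function names and the OR-based pairing are assumptions chosen for illustration and are not necessarily among the three schemes the authors compared.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a nucleotide sequence as a list of 4-element 0/1 rows (A, C, G, T)."""
    rows = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unsupported base: {base!r}")
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def encode_pair_or(guide, target):
    """Assumed pair encoding: position-wise OR of the two one-hot matrices.

    A matching position has a single 1; a mismatch has two 1s.
    """
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    return [
        [x | y for x, y in zip(row_g, row_t)]
        for row_g, row_t in zip(one_hot(guide), one_hot(target))
    ]
```

For example, `encode_pair_or("AC", "AG")` keeps a single 1 at the matching first position and sets both the C and G channels at the mismatched second position. Flattening the resulting matrix gives a fixed-length vector that a gradient-boosted tree model could consume alongside scores from other tools.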
Prediction of lncRNA-disease Associations Based on Robust Multi-label Learning
Authors: Jiaxin Zhang, Quanmeng Sun and Cheng Liang
Background: Long non-coding RNAs (lncRNAs) are non-protein-coding transcripts of more than 200 nucleotides in length. In recent years, studies have shown that lncRNAs play a vital role in various biological processes and in the diagnosis, prognosis, and treatment of complex diseases. Objective: Analysis of known lncRNA-disease associations and prediction of potential lncRNA-disease associations are necessary to provide the most probable candidates for subsequent experimental validation. Methods: In this paper, we present a novel robust computational framework for lncRNA-disease association prediction by combining the ℓ1-norm graph with multi-label learning. Specifically, we first construct a set of similarity matrices for lncRNAs and diseases using known associations. Then, both lncRNA and disease similarity matrices are adaptively re-weighted to enhance robustness via the ℓ1-norm graph. Lastly, the association matrix is updated with a graph-based multi-label learning framework to uncover the underlying consistency between the lncRNA space and the disease space. Results: We compared the proposed method with the four latest methods on five widely used data sets. The experimental results show that our method achieves comparable performance in both five-fold cross-validation and leave-one-disease-out cross-validation prediction tasks. A case study of prostate cancer further confirms the practicability of our approach in identifying lncRNAs as potential prognostic biomarkers. Conclusion: Our method can serve as a useful tool for the prediction of novel lncRNA-disease associations.
GEREA: Prediction of Gene Expression Regulators from Transcriptome Profiling Data to Transition Networks
Authors: Min Yao, Caiyun Jiang, Chenglong Li, Yongxia Li, Shan Jiang, Liang He, Hong Xiao, Jima Quan, Xiali Huang and Tinghua Huang
Background: Mammalian genes are regulated at the transcriptional and post-transcriptional levels. These mechanisms may involve the direct promotion or inhibition of transcription via a regulator or post-transcriptional regulation through factors such as micro (mi)RNAs. Objective: To construct gene regulation relationships modulated by causality inference-based miRNA-(transition factor)-(target gene) networks and to analyze gene expression data to identify gene expression regulators. Methods: Mouse gene expression regulation relationships were curated from the literature using a text mining method and were then employed to generate miRNA-(transition factor)-(target gene) networks. An algorithm was then introduced to identify gene expression regulators from transcriptome profiling data by applying enrichment analysis to these networks. Results: A total of 22,271 mouse gene expression regulation relationships were curated for 4,018 genes and 242 miRNAs. GEREA software was developed to perform the integrated analyses. We applied the algorithm to transcriptome data for synthetic miR-155 oligo-treated mouse CD4+ T cells and confirmed that miR-155 is an important network regulator. The software was also tested on publicly available transcriptional profiling data for Salmonella infection, resulting in the identification of miR-125b as an important regulator. Conclusion: The causality inference-based miRNA-(transition factor)-(target gene) networks serve as a novel resource for gene expression regulation research, and GEREA is an effective and useful adjunct to the currently available methods. The regulatory networks and the algorithm implemented in the GEREA software package are available under a free academic license at http://www.thua45.cn/gerea.
Identification of Potential Immune-related Biomarkers in Gastrointestinal Cancers
Authors: Tianyu Zhu, Qi Dai and Ping-An He
Objectives: Gastrointestinal (GI) cancer is the most common and lethal malignant tumor, yet limited research and few biomarkers are available to stratify patients who are likely to benefit from immunotherapy in GI cancers. For the early diagnosis and prognosis of GI cancers, searching for shared potential biomarkers and differences among stages is an urgent and challenging task. Stage-specific RNA expression data for immune genes were analyzed to characterize the immune system at each stage of GI cancers. Methods: Differential gene expression analysis was performed to compare the expression of 758 immune genes between normal samples and samples from each stage of GI cancers. Enrichment analysis, including GO and KEGG pathway analysis, was carried out to investigate the role of these differentially expressed genes (DEGs) and the underlying mechanisms in GI cancers. Furthermore, PPI network analysis identified the hub genes among these DEGs. Overall survival analysis was performed to clarify the diagnostic and prognostic role of these potential biomarkers in early and advanced stages. Results: Our work revealed the immunological commonalities and differences across stages of GI cancers and identified several potential immune-related biomarkers, including CCL20, C7, CD36, CXCL11, and CLEC5A. The biological functions in which the immune system participates across GI cancers were highly correlated with virus and membrane. Conclusion: Our results facilitate understanding of the involvement of the immune system in GI cancers and the design of better treatment strategies based on current cancer immunotherapy.
A Path-based Method for Identification of Protein Phenotypic Annotations
Background: Identification of protein phenotypic annotations is an essential and challenging problem in modern genetics. This problem is related to some serious diseases, including cancers and HIV. Genotypic and environmental factors increase the difficulty of determining the phenotype of proteins. Experimental methods for this purpose are time-consuming and expensive. Objective: The aim of this study was to design a quick and cheap method for determining the phenotypes of proteins. Methods: In this study, we proposed a network computational method to identify novel phenotypic annotations of proteins. To execute the method, a heterogeneous network was constructed, which contained three sub-networks: a protein network, a phenotypic type network, and a protein-phenotypic type network. The method searched for all paths of limited length connecting one protein and one phenotypic type. A scoring scheme was adopted to count the obtained paths and derive a score indicating the association between the protein and the phenotypic type. Results and Conclusion: ROC and PR curve analyses were performed to evaluate the performance of the method, indicating its utility. Our method was superior to other network methods that incorporated popular network algorithms.
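The path-finding step described in this abstract can be sketched as a depth-limited search over a heterogeneous network. The version below is an assumption-laden illustration: it treats the network as an undirected adjacency dictionary, counts simple paths (no repeated nodes) of at most `max_len` edges, and uses the raw path count as the score. The function name, the node labels, and the unweighted count are hypothetical; the abstract does not give the actual scoring scheme.

```python
def count_paths(graph, source, target, max_len):
    """Count simple paths from source to target that use at most max_len edges."""

    def dfs(node, visited, remaining):
        if node == target:
            return 1  # a path ends as soon as it reaches the target
        if remaining == 0:
            return 0
        total = 0
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                total += dfs(neighbor, visited | {neighbor}, remaining - 1)
        return total

    return dfs(source, {source}, max_len)


# Hypothetical toy network: P* are proteins, T* are phenotypic types.
# P1-P2 is a protein-protein edge, T1-T2 a phenotype-phenotype edge,
# and P2-T1, P1-T2 are protein-phenotype annotation edges.
toy_network = {
    "P1": ["P2", "T2"],
    "P2": ["P1", "T1"],
    "T1": ["P2", "T2"],
    "T2": ["P1", "T1"],
}
score = count_paths(toy_network, "P1", "T1", max_len=2)
```

In the toy network, P1 has no direct annotation to T1 but reaches it through a neighboring protein and through a related phenotypic type, so a higher count suggests a stronger candidate association. A practical scoring scheme would likely weight paths by length or edge confidence.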