Volume 16, Issue 9

Current Bioinformatics - Volume 16, Issue 9, 2021


- Meet the Editorial Board Member
  
  By Jeyaraman Jeyakanthan
  
  https://doi.org/10.2174/157489361609211025111245

- An Appraisal of Skill Gaps in Bioinformatics Education
  
  Authors: Smriti Sharma and Vinayak Bhatia
  
  https://doi.org/10.2174/1574893616666210609094743
  
  There has been exponential growth in the field of bioinformatics in the last decade. Specialists in this field need to be well versed in computing, statistics, and mathematics, along with expertise in the biological sciences. This review attempts to understand the existing skill gaps in bioinformatics education globally and to give policy developers some indicators for designing bioinformatics curricula. The authors found that the pace at which this field is developing is not matched by training and education efforts at the university level globally. On a positive note, academia and industry both seem to recognise this lag, and efforts are being made to address it. This review also summarizes the distinctive features needed to integrate bioinformatics into education for curriculum design, teaching, and learning methods.
  

- Identification of Disordered Regions of Intrinsically Disordered Proteins by Multi-features Fusion
  
  Authors: Sun Canzhuang and Feng Yonge
  
  https://doi.org/10.2174/1574893616666210308102552
  
  Background: Intrinsically disordered proteins lack a well-defined three-dimensional structure under physiological conditions. They perform multiple functions in life activities and are closely related to many human diseases. Identifying the disordered regions of intrinsically disordered proteins is important for protein function annotation. Objective: To accurately identify the disordered regions in intrinsically disordered proteins. Methods: In this study, we constructed a multi-feature fusion model based on a support vector machine to predict disordered regions of intrinsically disordered proteins from the DisProt database. We extracted codon usage frequencies, GC content, protein secondary structure components, hydrophilic-hydrophobic amino acid components, and chemical shifts as features. Results: In single-feature prediction, the best accuracy was 82.098%, obtained with codon frequencies. To improve performance, we fused these features and obtained the best result of 83.173% by combining codon frequencies with chemical shifts. Conclusion: The results show that our model achieves good performance in predicting disordered regions of intrinsically disordered proteins; moreover, it performs better than existing methods.
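
Two of the sequence-derived features named in this abstract, codon usage frequencies and GC content, can be illustrated with a minimal sketch. The function names and the way the two features are concatenated are assumptions made for illustration; the abstract does not say how the authors computed or fused their features, and the support vector machine step is not shown.

```python
from collections import Counter
from itertools import product

# All 64 DNA codons in a fixed order, so every sequence maps to the same feature layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_frequencies(seq):
    """Relative frequency of each codon, reading in frame from position 0."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Codons containing ambiguous bases such as N are skipped.
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


def fused_features(seq):
    """Hypothetical fusion: 64 codon frequencies followed by GC content."""
    return codon_frequencies(seq) + [gc_content(seq)]


features = fused_features("ATGGCCATG")
print(len(features))  # 65 values: 64 codon frequencies plus GC content
```

A vector like this could then be passed to any classifier; which kernel or parameters the authors used is not stated in the abstract.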
  

- Construction of Anatomical Structure-specific Developmental Dynamic Networks for Human Brain on Multiple Omics Levels
  
  Authors: Yingying Wang, Yu Yang, Jianfeng Liu and Keshen Li
  
  https://doi.org/10.2174/1574893616666210331115659
  
  Background: Human brain development is a series of complex processes exhibiting profound changes from gestation to adulthood. Objective: We aimed to construct dynamic developmental networks for each anatomical structure of the human brain on multiple omics levels in order to obtain a new systematic brain map at the molecular level. Methods: We analyzed brain development by constructing dynamic networks between adjacent time points at different grouping levels of anatomical structures. Gene-time networks were first built to obtain dynamic maps of the developing brain at the transcriptome level. Then miRNA-mRNA networks and protein-protein networks were constructed by integrating information from miRNomics and proteomics. Time- and structure-specific biomarkers were filtered based on analyses of topological characteristics. Results: The most dramatic developmental period and structure were fetal-infancy and the telencephalon, respectively. The cortex was the key developmental region in ‘late fetal and neonatal’ and ‘early infancy’. The development of the temporal lobe differed from that of other lobes, since significant molecular changes were found only in the comparison pairs ‘early fetal-early mid-fetal’ and ‘adolescence-young adulthood’. Interestingly, the changes among different brain structures within adolescence and adulthood were larger than at other time points. hsa-miR-548c-3p and H3C2 may be new brain development indicators considering their key roles in the networks. Conclusion: To our knowledge, this study is the first report of dynamic brain development maps for different anatomical structures on multiple omics levels. The results provide systematic new insight into brain development, which may lead to a more accurate understanding of the human brain.
  

- PathDriver: Cancer Driver Genes Identification Based on the Metabolic Pathway
  
  Authors: Xianghua Peng, Fang Liu, Ping Liu, Xing Li and Xinguo Lu
  
  https://doi.org/10.2174/1574893616666210727153526
  
  Aim: In exploring cancer initiation and progression, a great challenge is to identify the driver genes. Background: With advances in Next-Generation Sequencing (NGS) technologies, the identification of specific oncogenic genes has emerged through the integration of multi-omics data. Although existing computational models have identified many common driver genes, they rely on individual regulatory mechanisms or independent copy number variants, ignoring the dynamic function of genes in pathways and networks. Objective: The molecular metabolic pathway is a critical biological process in tumor initiation, progression, and maintenance. Establishing the role of genes in pathways and networks helps to describe their functional roles under physiological and pathological conditions at multiple levels. Methods: We present a metabolic pathway-based driver gene identification method (pathDriver) to distinguish different cancer types and subtypes. In pathDriver, the metabolic pathway is combined with the protein-protein interaction network to construct the pathway network. Then, the Interaction Frequency (IF) and Inverse Pathway Frequency (IPF) are used to evaluate the collaborative impact factor of genes in the pathway network. Finally, the cancer-specific driver genes are identified by calculating the scores of edges connected to genes in the pathway network. Results: We applied the method to 16 TCGA cancer types for pan-cancer analysis. Conclusion: The driving pathway identified biologically significant known cancer genes and potential new candidate genes.
  

- Screening Differential Hub Genes Related with the Hypoglycemic Effect of Quercetin Through Data Mining
  
  Authors: Ji-Ping Wei, Tao Luo, Yuchen Wang and Wenyu Lu
  
  https://doi.org/10.2174/1574893616666210617110314
  
  Background: The effect of quercetin on blood glucose levels has been widely studied. However, the mechanism of the hypoglycemic effect of quercetin remains unclear. Objective: To elucidate the hypoglycemic effect of quercetin, microarray data from the GSE38067 dataset were used to screen Differential Hub Genes (DHGs) by differential expression analysis, weighted gene co-expression network analysis, and protein-protein interaction analysis. Methods: Systematic data mining in this study indicated that quercetin exerts its hypoglycemic effect by affecting the expression of seven candidate DHGs, especially Cdkn1a and Cd36, thereby relieving insulin resistance, preventing oxidative damage, and protecting pancreatic β-cells in streptozotocin (STZ)-induced diabetic mice. Result and Conclusion: This work provides a possible way to fight diabetes by using quercetin as a functional food ingredient or medicine.
  

- PEPRF: Identification of Essential Proteins by Integrating Topological Features of PPI Network and Sequence-based Features via Random Forest
  
  Authors: Chuanyan Wu, Bentao Lin, Kai Shi, Qingju Zhang, Rui Gao, Zhiguo Yu, Yang De Marinis, Yusen Zhang and Zhi-Ping Liu
  
  https://doi.org/10.2174/1574893616666210617162258
  
  Background: Essential proteins play an important role in the process of life and can be identified by experimental methods and computational approaches. Experimental approaches to identifying essential proteins are highly accurate but time- and resource-consuming. Objective: Herein, we present a computational model (PEPRF) to identify essential proteins based on machine learning. Methods: Different features of proteins were extracted. Topological features were extracted from the Protein-Protein Interaction (PPI) network. From the protein sequence, graph theory-based features, information-based features, composition and physicochemical features, etc., were extracted. In total, 282 features were constructed. To select the features that contributed most to the identification, a ReliefF-based feature selection method was adopted to measure the weights of these features. Results: As a result, 212 features were curated to train random forest classifiers. PEPRF achieved an AUC of 0.71 and an accuracy of 0.742. Conclusion: Our results show that PEPRF may be applied as an efficient tool to identify essential proteins.
  

- Prediction of Off-Target Effects in CRISPR/Cas9 System by Ensemble Learning
  
  Authors: Yongxian Fan and Haibo Xu
  
  https://doi.org/10.2174/1574893616666210811100938
  
  Background: CRISPR/Cas9, a new generation of targeted gene editing technology with low cost and simple operation, has been widely employed in the field of gene editing. The erroneous cutting of off-target sites in CRISPR/Cas9 is called the off-target effect, which is also the biggest complication that CRISPR/Cas9 faces in practical application. Specifically, off-target effects can lead to unexpected gene editing results. Therefore, accurately predicting CRISPR/Cas9 off-target effects is a very important task. Predicting off-target effects of CRISPR/Cas9 by machine learning is feasible, but most existing off-target tools have not paid close attention to the effect of gene encoding on prediction. Methods: We compared three encoding methods based on One-Hot and combined the gene sequence with four CRISPR/Cas9 off-target prediction tools to build an ensemble model with XGBoost, designated XGBCRISPR. Grid search was employed to find the optimal parameters and achieve the best performance. Results: Performance was compared with existing tools based on the ROC value and PRC value. The experimental results show that the XGBCRISPR model is superior to the existing tools. Conclusion: The new model achieves better prediction results than existing tools, but its accuracy can be improved further as more off-target scores become available.
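
The abstract does not say which three One-Hot-based encodings were compared, so the following is only a sketch of two commonly described ways to encode a guide/off-target pair: channel-wise concatenation and a bitwise OR of the two one-hot vectors. The function names are hypothetical and the XGBoost ensemble itself is not shown.

```python
# One-hot code for each DNA base. Sequences with other symbols (for example N)
# raise a KeyError in this sketch.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot(seq):
    """Encode a DNA sequence as a list of 4-bit tuples, one per position."""
    return [ONE_HOT[base] for base in seq.upper()]


def _check_lengths(sgrna, target):
    if len(sgrna) != len(target):
        raise ValueError("guide and target site must have the same length")


def encode_pair_concat(sgrna, target):
    """Assumed encoding 1: 8 channels per position (guide bits, then target bits)."""
    _check_lengths(sgrna, target)
    return [g + t for g, t in zip(one_hot(sgrna), one_hot(target))]


def encode_pair_or(sgrna, target):
    """Assumed encoding 2: 4 channels per position, bitwise OR of both codes.

    A matched position has one bit set; a mismatched position has two.
    """
    _check_lengths(sgrna, target)
    return [
        tuple(a | b for a, b in zip(g, t))
        for g, t in zip(one_hot(sgrna), one_hot(target))
    ]


print(encode_pair_concat("AC", "AG"))
print(encode_pair_or("AC", "AG"))
```

The OR form is more compact but loses the direction of a mismatch (which base is on the guide and which is on the target), which is one reason the choice of encoding can affect prediction.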
  

- Prediction of lncRNA-disease Associations Based on Robust Multi-label Learning
  
  Authors: Jiaxin Zhang, Quanmeng Sun and Cheng Liang
  
  https://doi.org/10.2174/1574893616666210712091221
  
  Background: Long non-coding RNAs (lncRNAs) are non-protein-coding transcripts of more than 200 nucleotides in length. In recent years, studies have shown that lncRNAs play a vital role in various biological processes and in the diagnosis, prognosis, and treatment of complex diseases. Objective: Analysis of known lncRNA-disease associations and prediction of potential lncRNA-disease associations are necessary to provide the most probable candidates for subsequent experimental validation. Methods: In this paper, we present a novel robust computational framework for lncRNA-disease association prediction by combining the ℓ1-norm graph with multi-label learning. Specifically, we first construct a set of similarity matrices for lncRNAs and diseases using known associations. Then, both lncRNA and disease similarity matrices are adaptively re-weighted to enhance robustness via the ℓ1-norm graph. Lastly, the association matrix is updated with a graph-based multi-label learning framework to uncover the underlying consistency between the lncRNA space and the disease space. Results: We compared the proposed method with the four latest methods on five widely used data sets. The experimental results show that our method achieves comparable performance in both five-fold cross-validation and leave-one-disease-out cross-validation prediction tasks. A case study of prostate cancer further confirms the practicability of our approach in identifying lncRNAs as potential prognostic biomarkers. Conclusion: Our method can serve as a useful tool for the prediction of novel lncRNA-disease associations.
  

- GEREA: Prediction of Gene Expression Regulators from Transcriptome Profiling Data to Transition Networks
  
  Authors: Min Yao, Caiyun Jiang, Chenglong Li, Yongxia Li, Shan Jiang, Liang He, Hong Xiao, Jima Quan, Xiali Huang and Tinghua Huang
  
  https://doi.org/10.2174/1574893616666210621100335
  
  Background: Mammalian genes are regulated at the transcriptional and post-transcriptional levels. These mechanisms may involve the direct promotion or inhibition of transcription via a regulator or post-transcriptional regulation through factors such as micro (mi)RNAs. Objective: To construct gene regulation relationships modulated by causality inference-based miRNA-(transition factor)-(target gene) networks and to analyze gene expression data to identify gene expression regulators. Methods: Mouse gene expression regulation relationships were manually curated from the literature using a text mining method and were then employed to generate miRNA-(transition factor)-(target gene) networks. An algorithm was then introduced to identify gene expression regulators from transcriptome profiling data by applying enrichment analysis to these networks. Results: A total of 22,271 mouse gene expression regulation relationships were curated for 4,018 genes and 242 miRNAs. GEREA software was developed to perform the integrated analyses. We applied the algorithm to transcriptome data for synthetic miR-155 oligo-treated mouse CD4⁺ T cells and confirmed that miR-155 is an important network regulator. The software was also tested on publicly available transcriptional profiling data for Salmonella infection, resulting in the identification of miR-125b as an important regulator. Conclusion: The causality inference-based miRNA-(transition factor)-(target gene) networks serve as a novel resource for gene expression regulation research, and GEREA is an effective and useful adjunct to the currently available methods. The regulatory networks and the algorithm implemented in the GEREA software package are available under a free academic license at http://www.thua45.cn/gerea.
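
The abstract mentions enrichment analysis over regulator-target networks without naming the statistic. One common choice, assumed here purely for illustration and not confirmed as GEREA's method, is a one-sided hypergeometric test: given a regulator's known targets, how surprising is their overlap with the differentially expressed genes? The function name and the toy numbers below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, regulator_targets, de_genes, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population:        number of genes in the background set
    regulator_targets: number of background genes targeted by the regulator
    de_genes:          number of differentially expressed genes drawn
    overlap:           observed number of DE genes that are targets
    """
    upper = min(regulator_targets, de_genes)
    total = comb(population, de_genes)
    tail = sum(
        comb(regulator_targets, k) * comb(population - regulator_targets, de_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 of them targets, 3 DE genes, 2 of which are targets.
p = hypergeom_enrichment_p(population=10, regulator_targets=4, de_genes=3, overlap=2)
print(round(p, 4))  # 0.3333
```

In practice a p-value would be computed per regulator and then corrected for multiple testing; whether and how GEREA does this is not stated in the abstract.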
  

- Identification of Potential Immune-related Biomarkers in Gastrointestinal Cancers
  
  Authors: Tianyu Zhu, Qi Dai and Ping-An He
  
  https://doi.org/10.2174/1574893615666210106121335
  
  Objectives: Gastrointestinal (GI) cancer is the most common and lethal malignant tumor, yet limited research and few biomarkers are available to stratify patients who are likely to benefit from immunotherapy in GI cancers. For the early diagnosis and prognosis of GI cancers, searching for shared potential biomarkers and for differences among stages is an urgent and challenging task. Stage-specific RNA expression data for immune genes were analyzed to characterize the immune system in each stage of GI cancers. Methods: Differential gene expression analysis was performed to compare the expression of 758 immune genes between normal samples and samples from each stage of GI cancers. Enrichment analysis, including GO and KEGG pathway analysis, was carried out to investigate the role of these differential genes and the underlying mechanisms in GI cancers. Furthermore, PPI network analysis identified the hub genes among these DEGs. Overall survival analysis was performed to clarify the diagnostic and prognostic role of these potential biomarkers in early and advanced stages. Results: Our work revealed the immunological commonalities and differences across stages of GI cancers and disclosed several potential immune-related biomarkers, including CCL20, C7, CD36, CXCL11, and CLEC5A. The potential biological functions in which the immune system participates across GI cancers were highly correlated with virus and membrane. Conclusion: Our results help to clarify the involvement of the immune system in GI cancers and to better design treatment strategies based on current cancer immunotherapy.
  

- A Path-based Method for Identification of Protein Phenotypic Annotations
  
  Authors: Jian Gao, Bin Hu and Lei Chen
  
  https://doi.org/10.2174/1574893616666210531100035
  
  Background: Identification of protein phenotypic annotations is an essential and challenging problem in modern genetics. This problem is related to some serious diseases, including cancers and HIV. Genotypic and environmental factors increase the difficulty of determining the phenotype of proteins. Experimental methods for this purpose are time-consuming and expensive. Objective: The aim of this study was to design a quick and cheap method for determining the phenotypes of proteins. Methods: In this study, we proposed a network computational method to identify novel phenotypic annotations of proteins. To execute this method, a heterogeneous network was constructed, which contained three sub-networks: a protein network, a phenotypic type network, and a protein-phenotypic type network. The method tried to find all paths of limited length that connected one protein and one phenotypic type. A scoring scheme was adopted to count the obtained paths and derive a score indicating the association between the protein and the phenotypic type. Results and Conclusion: ROC and PR curve analyses were performed to evaluate the performance of the method and indicated its utility. Our method was superior to other network methods that incorporated popular network algorithms.
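
The path-based idea in this abstract, counting paths of limited length between a protein and a phenotypic type in a heterogeneous network, can be sketched as follows. The toy network, the node names, and the use of a raw path count as the score are assumptions; the abstract does not give the authors' actual scoring scheme.

```python
def count_paths(graph, source, target, max_len):
    """Count simple paths from source to target that use at most max_len edges.

    graph is an adjacency mapping {node: iterable of neighbours}; protein nodes
    and phenotype nodes live in the same mapping, as in a heterogeneous network.
    """
    def dfs(node, visited, depth):
        if node == target:
            return 1
        if depth == max_len:
            return 0
        total = 0
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                total += dfs(nxt, visited | {nxt}, depth + 1)
        return total

    return dfs(source, {source}, 0)


# Hypothetical toy network: P1, P2 are proteins; T1, T2 are phenotypic types.
toy_network = {
    "P1": ["P2", "T1"],
    "P2": ["P1", "T1", "T2"],
    "T1": ["P1", "P2", "T2"],
    "T2": ["P2", "T1"],
}

# P1 has no direct edge to T2, but two paths of length 2 connect them.
print(count_paths(toy_network, "P1", "T2", max_len=2))  # 2
```

A higher count suggests a stronger candidate association. A real scoring scheme might also weight shorter paths more heavily, which this sketch does not do.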
  

Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan