Volume 18, Issue 5

Current Bioinformatics - Volume 18, Issue 5, 2023

- Genomic Characterization of Emerging SARS-CoV-2: A Systematic Review
  
  Authors: Shikha Sharma, Rinkle Rani and Nidhi Kalra
  
  https://doi.org/10.2174/1574893618666230228115423
  
  Introduction: Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a virus well known for its fatal infectivity and widespread impact on the health of the worldwide population. Genome sequencing is critical in understanding the virus’s behavior, origin, and genetic variants. This article conducts an extensive literature review on the SARS-CoV-2 genome, including its Genome Structure, Genome Analysis, Evolution, Mutation, and Genome Computation. It summarizes clinical and evolutionary research along with the applicability of computational methods in these areas. It lucidly presents the structural detail and mutation analysis of SARS-CoV-2 without overwhelming the readers with difficult terms. In the pandemic, machine learning and deep learning emerged as a paradigm change that, when combined with genome analysis, enabled more precise identification and prognosis of the virus's impact. Molecular detailing is crucial in extracting features from the SARS-CoV-2 genome before computation models are applied. Methods: Further, in this systematic study we investigate the usage of Machine Learning and Deep Learning models mapped to SARS-CoV-2 genome samples to see their applicability in virus detection and disease severity prediction. We searched various reputed journals for research articles published until June 2022 that explain the structure, evolution, mutations, and computational methods. Results: The paper summarizes significant trends in the research of SARS-CoV-2 genomes. Furthermore, this research also identifies the limitations and research gaps that have yet to be explored and indicates future directions. Impact Statement: There are few review articles on the SARS-CoV-2 genome; these reviews target various aspects of the SARS-CoV-2 genome individually. This article considers all the aspects simultaneously and provides in-depth knowledge about the SARS-CoV-2 genome. Conclusion: This article provides a detailed description of the types of samples, volumes of selection, processes, and tools used by various researchers in their studies. Further, the computational techniques applied to the SARS-CoV-2 genome are also discussed and analysed thoroughly.
  
- Quality Control of Gene Expression Data Allows Accurate Quantification of Differentially Expressed Biological Pathways
  
  Authors: Ellen Reed, Enrico Ferrari and Mikhail Soloviev
  
  https://doi.org/10.2174/1574893618666230221141815
  
  Background: Gene expression signatures provide a promising diagnostic tool for many diseases, including cancer. However, there remain multiple issues related to the quality of gene expression data, which may impede the analysis and interpretation of differential gene expression in cancer. Objective: We aimed to address existing issues related to the quality of gene expression data and to devise improved quality control (QC) and expression data processing procedures. Methods: Linear regression analysis was applied to gene expression datasets generated from diluted and pre-mixed matched breast cancer and normal breast tissue samples. Datapoint outliers were identified and removed, and accurate expression values corresponding to cancer and normal tissues were recalculated. Results: We achieved a 27% increase in the number of identifiable differentially regulated genes and a similar reduction in the number of false positives identified from microarray DEG data. Our approach reduced technical errors and improved the accuracy and precision of determining the degree of DEG but did not remove biological outliers, such as naturally variably expressed genes. We also determined the linear dynamic range of the microarray assay directly from expression data, which allowed accurate quantification of differentially expressed entire pathways. Conclusion: The improved QC allowed accurate discrimination of genes by the degree of their upregulation, which helped to reveal an intricate and highly tuned network of biological pathways and their regulation in cancer. We were able, for the first time, to quantify the degree of transcriptional upregulation of entire individual biological pathways upregulated in breast cancer. It can be concluded that the vast majority of DEG data that are publicly available today may have been generated using sub-optimal experimental design, lacking preparations required for genuinely accurate and quantitative analysis.
  
- Trimming and Decontamination of Metagenomic Data can Significantly Impact Assembly and Binning Metrics, Phylogenomic and Functional Analysis
  
  Authors: Jason M. Whitham and Amy M. Grunden
  
  https://doi.org/10.2174/1574893618666230227145952
  
  Background: Investigators using metagenomic sequencing to study microbiomes often trim and decontaminate reads without knowing their effect on downstream analyses. Objective: This study was designed to evaluate the impacts JGI trimming and decontamination procedures have on assembly and binning metrics, placement of MAGs into species trees, and functional profiles of MAGs extracted from complex rhizosphere metagenomes, as well as how more aggressive trimming impacts these binning metrics. Methods: Twenty-three Miscanthus x giganteus rhizosphere metagenomes were subjected to different combinations and thresholds of force, kmer, and quality trimming and decontamination using BBDuk. Reads were assembled and binned in KBase. Phylogenomic and statistical analyses were applied to evaluate the effects of trimming and decontamination on downstream analyses. Results: We found that JGI trimmed and decontaminated reads had significant impacts on assembly and binning metrics compared to raw reads, including significantly higher total contig counts, more contigs greater than 10 kbp in length, and larger total lengths of raw assemblies compared to QC assemblies, and 2.0% lower average contamination of QC MAGs compared to raw MAGs. We also found that differences in the placement of MAGs in species trees increased with decreasing completeness and contamination thresholds. Furthermore, aggressive trimming (Q20) was found to significantly reduce MAG counts. Conclusion: Trimming and decontamination of metagenomic reads prior to assembly can change an investigator’s answer to the questions, “Who is there and what are they doing?” However, mild trimming and decontamination of metagenomic reads with high-quality scores are recommended for removing sample processing and sequencing artifacts.
  
- Advanced Multivariable Statistical Analysis Interactive Tool for Handling Missing Data and Confounding Covariates for Label-free LC-MS Proteomics Experiments
  
  Authors: Sudhir Srivastava, Michael L. Merchant, Craig J. McClain, Anil Rai, Krishna K. Chaturvedi, Ulavappa B. Angadi, Dwijesh C. Mishra and Shesh N. Rai
  
  https://doi.org/10.2174/1574893618666230223150253
  
  Background: Careful consideration is required for detecting significant features (proteins or peptides) in LC-MS proteomics studies using multivariable regression analyses. In proteomics data, missing values can arise due to random errors, bad samples, features below the detection limit in specific samples, etc. Further, expression data are always prone to heterogeneity due to technical/biological reasons. Missing values and heterogeneity in proteomics studies can confound important findings. Moreover, there is additional information in these studies, such as pre-clinical and clinical information (e.g., sex, exposure, etc.), which can be used to supplement the inference. Methods: We introduce a user-friendly web application SATP (Statistical Analysis interactive Tool for label-free LC-MS Proteomics experiments) for differential expression analysis of proteomics data that is scalable to large clinical proteomic studies. Appropriate normalization and imputation methods have been provided. Apart from these, several statistical tests such as t-test, moderated t-test, linear fixed effect model, and linear mixed model with adjustment of effect of extra covariates have also been provided for users' benefit. Results: Our intuitive tool has several advantages over the existing ones, including an extension to multiple factor comparisons after adjusting for covariates. Conclusion: This is a comprehensive tool for analysis of complex experiments with multiple covariates, whereas most of the existing tools were developed for comparing simple experiments mostly with two groups without covariates. Availability: The tool can be accessed freely by the users from https://ulbbf.shinyapps.io/satp/.
  
- Development and Study of a Knowledge Graph for Retrieving the Relationship Between BVDV and Related Genes
  
  Authors: Jia Lv, Yunli Bai, Lu Chang, Yingfei Li, Rulin Wang and Weiguang Zhou
  
  https://doi.org/10.2174/1574893618666230224142324
  
  Background: Bovine viral diarrhea virus (BVDV) can cause diarrhea, abortion, and immunosuppression in cattle, imposing huge economic losses on the global cattle industry. The pathogenic and immune mechanisms of BVDV remain elusive. The development of a BVDV-gene knowledge base can provide clues to reveal the interaction of BVDV with host cells. However, the traditional method of manually establishing a knowledge base is time-consuming and inefficient. Developing a knowledge base with deep learning has recently attracted noticeable attention from scholars. Objective: The study aimed to explore the substitution of deep learning for manual mining of BVDV-related genes and to develop a knowledge graph of the relationship between BVDV and related genes. Methods: A deep learning-based biomedical knowledge graph development method was proposed, which used deep learning to mine biomedical knowledge, model BVDV and various gene concepts, and store data in a graph database. First, the PubMed database was used as the data source, and crawler technology was used to obtain abstract data on the relationship between BVDV and various host genes. A pre-trained BioBERT model was used for biomedical named entity recognition to obtain all types of gene entities, and a pre-trained BERT model was utilized for relationship extraction to obtain the relationships between BVDV and various gene entities. Then, the output was combined with manual proofreading to obtain structured triple data with high accuracy. Finally, the Neo4j graph database was used to store data and to develop the knowledge graph of the relationship between BVDV and related genes. Results: A total of 71 gene entity types were obtained, including PRL4, MMP-7, TGIF1, etc. Nine relation types between BVDV and gene entities were obtained, including "can downregulate expression of", "can upregulate expression of", "can suppress expression of", etc. The knowledge graph was developed using deep learning to mine biomedical knowledge combined with manual proofreading, which was faster and more efficient than the traditional method of establishing a knowledge base manually, and the retrieval of semantic information by storing data in a graph database was also more efficient. Conclusion: A BVDV-gene knowledge graph was preliminarily developed, which provided a basis for studying the interaction between BVDV and host cells.
  
Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan