Current Genomics - Volume 22, Issue 5, 2021
-
-
An Overview of Algorithms and Associated Applications for Single Cell RNA-Seq Data Imputation
Authors: Zarrin Basharat, Sania Majeed, Humaira Saleem, Ishtiaq A. Khan and Azra Yasmin
Single cell RNA-Seq technology enables the assessment of RNA expression in individual cells. This makes it popular in experimental biology for gleaning specifications of novel cell types as well as inferring heterogeneity. Experimental data conventionally contains zero counts or dropout events for many single cell transcripts. Such missing data hampers accurate analysis using standard workflows, which were designed for massive RNA-Seq datasets. Imputation for single cell datasets is done to infer the missing values. This was traditionally done with ad-hoc code, but customized pipelines, workflows and specialized software later appeared for this purpose. This made it easier to benchmark and cluster data in an organized manner. In this review, we have assembled a catalog of available single cell RNA-Seq imputation algorithms/workflows and associated software for the scientific community performing single-cell RNA-Seq data analysis. Continued development of imputation methods, especially using deep learning approaches, will be necessary for eradicating associated pitfalls and addressing challenges associated with future large scale and heterogeneous datasets.
-
-
-
Chemogenomic Approaches for Revealing Drug Target Interactions in Drug Discovery
Authors: Harshita Bhargava, Amita Sharma and Prashanth Suravajhala
The drug discovery process has been a crucial and cost-intensive process. This cost is not only monetary but also involves the risks, time, and labour incurred while introducing a drug in the market. In order to reduce this cost and the risks associated with drugs that may result in severe side effects, in silico methods have gained popularity in recent years. These methods have had a significant impact not only on drug discovery but also on related areas such as drug repositioning, drug-target interaction prediction, drug side effect prediction, personalised medicine, etc. Amongst these research areas, predicting interactions between drugs and targets forms the basis for drug discovery. The availability of big data in the form of bioinformatics and genetic databases, along with computational methods, has further supported data-driven decision-making. The results obtained through these methods may be further validated using in vitro or in vivo experiments. This validation step can further justify the predictions resulting from in silico approaches, increasing the accuracy of the overall result in subsequent stages. A variety of approaches are used in predicting drug-target interactions, including ligand-based, molecular docking based and chemogenomic-based approaches. This paper discusses the chemogenomic methods, considering drug target interaction as a classification problem on whether or not an interaction between a particular drug and target would serve as a basis for understanding drug discovery/drug repositioning. We present the advantages and disadvantages associated with their application.
-
-
-
Accumulating Impact of Smoking and Co-morbidities on Severity and Mortality of COVID-19 Infection: A Systematic Review and Meta-analysis
Background: High prevalence, severity, and formidable morbidity have marked the recent emergence of the novel coronavirus disease (COVID-19) pandemic. The significant association with pre-existing co-morbid conditions has increased the disease burden of this global health emergency, pushing patients, healthcare workers and facilities to the verge of complete disruption. Methods: Meta-analysis of pooled data was undertaken to assess the cumulative risk of multiple co-morbid conditions associated with severe COVID-19. PubMed, Scopus, and Google Scholar were searched from January 1st to June 27th 2020 to generate a well-ordered, analytical, and critical review. The exercise began with keying in requisite keywords, followed by inclusion and exclusion criteria, data extraction, and quality evaluation. The final statistical meta-analysis of the risk factors of critical/severe and non-critical COVID-19 infection was carried out on Microsoft Excel (Ver. 2013), MedCalc (Ver. 19.3), and RevMan software (Ver. 5.3). Results: We investigated 19 eligible studies, comprising 12037 COVID-19 disease patients, representing the People’s Republic of China (PRC), USA, and Europe. 18.2% (n = 2200) of total patients had critical/severe COVID-19 disease. The pooled analysis showed a significant association of COVID-19 disease severity risk with cardiovascular disease (RR: 3.11, p < 0.001), followed by diabetes (RR: 2.06, p < 0.001), hypertension (RR: 1.54, p < 0.001), and smoking (RR: 1.52, p < 0.006). Conclusion: The review involved a sample size of 12037 COVID-19 patients across a wide geographical distribution. The reviewed reports have focussed on the association of individual co-morbid conditions with the heightened risk of COVID-19 disease. The present meta-analysis of cumulative risk assessment of co-morbidity from cardiovascular disease, diabetes, hypertension, and smoking signals a novel interpretation of inherent risk factors exacerbating COVID-19 disease severity. Consequently, there exists a definite window of opportunity for increasing survival of COVID-19 patients (with high risk and co-morbid conditions) by timely identification and implementation of suitable treatment modalities.
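The risk ratios (RR) quoted in this abstract compare the proportion of severe cases between patients with and without a given risk factor. As a minimal sketch of how a single-study RR and its approximate 95% confidence interval can be computed from a 2x2 table, the following uses the standard log-scale standard error. The counts are hypothetical and are not taken from the review, and the function name `relative_risk` is illustrative; the pooling across studies that the authors performed in RevMan and MedCalc is not reproduced here.

```python
import math


def relative_risk(exposed_events, exposed_total, control_events, control_total, z=1.96):
    """Return (RR, lower, upper) for one 2x2 table.

    The interval uses the usual normal approximation on log(RR).
    All four counts must be positive.
    """
    risk_exposed = exposed_events / exposed_total
    risk_control = control_events / control_total
    rr = risk_exposed / risk_control

    # Standard error of log(RR) for cumulative-incidence data.
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / control_events - 1 / control_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 30 of 100 patients with the risk factor had severe
# disease, versus 15 of 100 patients without it.
rr, lower, upper = relative_risk(30, 100, 15, 100)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

An RR above 1 with an interval that excludes 1 indicates a higher risk of severe disease in the group carrying the risk factor.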
-
-
-
In Silico Analysis of CCGAC and CATGTG Cis-regulatory Elements Across Genomes Reveals their Roles in Gene Regulation under Stress
Background: Plant yield closely depends on its environment and is negatively affected by abiotic stress conditions like drought, salinity, heat, and cold. Analysis of the stress-inducible genes in Arabidopsis has previously shown that CCGAC and CATGTG play a crucial role in controlling gene expression through the binding of DREB/CBF and NAC TFs under various stress conditions, mainly drought and salinity. Methods: The pattern of these motifs is conserved, which has been analyzed in this study to find the mechanism of gene expression through spacer specificity, inter-motif distance preference, functional analysis, and statistical analysis for four different plants, namely Oryza sativa, Triticum aestivum, Arabidopsis thaliana, and Glycine max. Results: The spacer frequency analysis has shown a preference for particular spacer lengths among the four genomes. The spacer specificity at all the spacer lengths, which predicts dominance of particular base pairs over others, was analyzed to find the preference of the sequences in the flanking region. Functional analysis on stress-regulated genes for saline, osmotic, and heat stress clearly shows that these motif frequencies with inter-motif distance (0-30) in the promoter region of Arabidopsis are highest in genes which are upregulated by saline and osmotic stress and downregulated by heat stress. Conclusion: Microarray data were analyzed to confirm the role of both motifs in stress response pathways. Transcription factors seem to prefer larger motif size with repeated CCGAC and CATGTG elements. The common preference for one spacer was further validated through Box and Whisker’s statistical analysis.
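The spacer analysis described in this abstract rests on locating each motif in a promoter sequence and measuring the gap between neighbouring occurrences. A minimal sketch of that idea follows. It assumes the spacer is the number of bases between the end of one motif and the start of the next, scans only the forward strand, and uses a made-up toy sequence; the function names are illustrative and the authors' actual pipeline may differ on all of these points.

```python
import re

MOTIFS = ("CCGAC", "CATGTG")


def find_motif_hits(seq, motifs=MOTIFS):
    """Return sorted (start, end, motif) tuples for every motif occurrence."""
    hits = []
    for motif in motifs:
        # A lookahead pattern reports overlapping occurrences as well.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((match.start(), match.start() + len(motif), motif))
    return sorted(hits)


def spacer_lengths(seq, max_spacer=30):
    """Return (left_motif, right_motif, spacer) for consecutive hits.

    Only pairs whose gap lies between 0 and max_spacer bases are kept.
    """
    hits = find_motif_hits(seq.upper())
    spacers = []
    for (_, end1, motif1), (start2, _, motif2) in zip(hits, hits[1:]):
        gap = start2 - end1
        if 0 <= gap <= max_spacer:
            spacers.append((motif1, motif2, gap))
    return spacers


# Toy promoter fragment (hypothetical, for illustration only).
toy_promoter = "AACCGACTTTTCATGTGGGCCGACCCGAC"
for left, right, gap in spacer_lengths(toy_promoter):
    print(f"{left} -> {right}: spacer of {gap} bp")
```

Counting how often each spacer length appears across all promoters of a genome would give a spacer frequency profile of the kind the abstract refers to.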
-
-
-
Quantitative Trait Loci Identification by Estimating the Genetic Model based on the Extremal Samples
Authors: Zining Yang, Yaning Yang, Xu S. Xu and Min Yuan
Background: In genetic association studies with quantitative trait loci (QTL), the association between a candidate genetic marker and the trait of interest is commonly examined by the omnibus F test or by the t-test corresponding to a given genetic model or mode of inheritance. It is known that the t-test with a correct model specification is more powerful than the F test. However, since the underlying genetic model is rarely known in practice, the use of a model-specific t-test may incur substantial power loss. Robust-efficient tests, such as the Maximin Efficiency Robust Test (MERT) and MAX3, have been proposed in the literature. Methods: In this paper, we propose a novel two-step robust-efficient approach, namely, the genetic model selection (GMS) method for quantitative trait analysis. GMS selects a genetic model by testing Hardy-Weinberg disequilibrium (HWD) with extremal samples of the population in the first step and then applies the corresponding genetic model-specific t-test in the second step. Results: Simulations show that GMS is not only more efficient than MERT and MAX3, but also has comparable power to the optimal t-test when the genetic model is known. Conclusion: Application to the data from Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort demonstrates that the proposed approach can identify meaningful biological SNPs on chromosome 19.
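To make the idea of a model-specific t-test concrete, the sketch below scores genotypes under the three common genetic models (recessive, additive, dominant) and computes the t statistic for the slope of the trait on the score. The codings are the conventional ones, the toy data are invented, and the function name `model_t_statistic` is illustrative. The first step of GMS, selecting the model from Hardy-Weinberg disequilibrium in extremal samples, is not reproduced here, since the abstract does not give its details.

```python
import math

# Score assigned to 0, 1 or 2 copies of the risk allele under each model.
CODINGS = {
    "recessive": {0: 0.0, 1: 0.0, 2: 1.0},
    "additive": {0: 0.0, 1: 0.5, 2: 1.0},
    "dominant": {0: 0.0, 1: 1.0, 2: 1.0},
}


def model_t_statistic(genotypes, trait, model):
    """t statistic (n - 2 degrees of freedom) for trait regressed on the score.

    Assumes the score and the trait both vary and are not perfectly correlated.
    """
    scores = [CODINGS[model][g] for g in genotypes]
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, trait))
    r = sxy / math.sqrt(sxx * syy)
    # The slope t statistic can be written in terms of the correlation r.
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy data in which the trait rises steadily with allele count (additive-like).
genotypes = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
for model in CODINGS:
    print(model, round(model_t_statistic(genotypes, trait, model), 2))
```

On data like these the additive coding yields the largest statistic, which illustrates why a correctly specified model gives more power than a misspecified one.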
-
-
-
Improving the Genome Annotation of Rhizoctonia solani Using Proteogenomics
Authors: Jiantao Shu, Mingkun Yang, Cheng Zhang, Pingfang Yang, Feng Ge and Ming Li
Background: Rhizoctonia solani is a pathogenic fungus that causes serious diseases in many crops, including rice, wheat, and soybeans. In crop production, it is very important to understand the pathogenicity of this fungus, which is still elusive. It might be helpful to comprehensively understand its genomic information using different genome annotation strategies. Methods: Aiming to improve the genome annotation of R. solani, we performed a proteogenomic study based on the existing data. Based on our study, a total of 1060 newly identified genes, 36 revised genes, 139 single amino acid variants (SAAVs), 8 alternative splicing genes, and diverse post-translational modifications (PTMs) events were identified in R. solani AG3. Further functional annotation on these 1060 newly identified genes was performed through homology analysis with its 5 closest relative fungi. Results: Based on this, 2 novel candidate pathogenic genes, which might be associated with pathogen-host interaction, were discovered. In addition, in order to increase the reliability and novelty of the newly identified genes in R. solani AG3, 1060 newly identified genes were compared with the newly published available R. solani genome sequences of AG1, AG2, AG4, AG5, AG6, and AG8. There are 490 homologous sequences. We combined the proteogenomic results with the genome alignment results and finally identified 570 novel genes in R. solani. Conclusion: These findings extended R. solani genome annotation and provided a wealth of resources for research on R. solani.
-
-
-
Splice Junction Identification using Long Short-Term Memory Neural Networks
Authors: Kevin Regan, Abolfazl Saghafi and Zhijun Li
Background: Splice junctions are the key to move from pre-messenger RNA to mature messenger RNA in many multi-exon genes due to alternative splicing. Since the percentage of multi-exon genes that undergo alternative splicing is very high, identifying splice junctions is an attractive research topic with important implications. Objective: The aim of this paper is to develop a deep learning model capable of identifying splice junctions in RNA sequences using 13,666 unique sequences of primate RNA. Methods: A Long Short-Term Memory (LSTM) Neural Network model is developed that classifies a given sequence as EI (Exon-Intron splice), IE (Intron-Exon splice), or N (No splice). The model is trained with groups of trinucleotides and its performance is tested using validation and test data to prevent bias. Results: Model performance was measured using accuracy and f-score in test data. The finalized model achieved an average accuracy of 91.34% with an average f-score of 91.36% over 50 runs. Conclusion: Comparisons show a highly competitive model to recent Convolutional Neural Network structures. The proposed LSTM model achieves the highest accuracy and f-score among published alternative LSTM structures.
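The abstract notes that the model is trained on groups of trinucleotides. One common way to prepare such input is to cut each sequence into 3-mers and map every 3-mer to an integer index that an embedding layer can consume. The sketch below illustrates this under several assumptions that the abstract does not confirm: sequences use the letters A, C, G, T, the default window is non-overlapping (`step=3`), and any 3-mer containing another character maps to a single extra "unknown" index. The names `VOCAB`, `UNKNOWN` and `encode_trinucleotides` are illustrative.

```python
from itertools import product

# 64 possible trinucleotides, indexed 0..63 in lexicographic order.
VOCAB = {"".join(codon): index for index, codon in enumerate(product("ACGT", repeat=3))}
UNKNOWN = len(VOCAB)  # index 64 for 3-mers with ambiguous characters


def encode_trinucleotides(seq, step=3):
    """Convert a nucleotide string into a list of trinucleotide indices.

    step=3 gives non-overlapping 3-mers; step=1 gives a sliding window.
    Trailing bases that do not fill a complete 3-mer are dropped.
    """
    seq = seq.upper()
    return [
        VOCAB.get(seq[i:i + 3], UNKNOWN)
        for i in range(0, len(seq) - 2, step)
    ]


print(encode_trinucleotides("ATGCGT"))          # non-overlapping 3-mers
print(encode_trinucleotides("ATGCGT", step=1))  # sliding window
```

The resulting integer sequences would then be padded to a common length and passed to the classifier, which predicts one of the three classes (EI, IE, or N).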
