Current Bioinformatics - Volume 15, Issue 8, 2020
-
-
Comprehensive Analysis of Features and Annotations of Pathway Databases
Authors: Ali Ghulam, Xiujuan Lei, Min Guo and Chen Bian
This study describes the essential information on pathway mechanisms, characteristics, and the feature annotations of pathway databases. Various difficulties related to data storage and retrieval in biological pathway databases are discussed, with a focus on the techniques for retrieving annotations, features, and methods from digital pathway databases for biological pathway analysis. Furthermore, the annotations, features, and search databases of many pathway databases that are suitable for integration into microarray analysis were examined. The investigation was performed on databases that contain human pathways, in order to understand the hidden components of cells involved in these processes. Three different domain-specific pathways were selected for this study, and the information on pathway databases was extracted from the existing literature. The research compared different pathways and examined their molecular-level relations. Moreover, the associations between pathway networks were evaluated. The study involved datasets for gene pathway matrices and pathway scoring techniques. Additionally, different pathway techniques, such as metabolomics and biochemical pathways, translation, control, and signaling pathways, and signal transduction, were considered. We also analyzed the list of gene sets and constructed a gene pathway network. This article will serve as a useful manual for storing specific biological data and disease pathways in a repository.
-
-
-
A Review of Protein Inter-residue Distance Prediction
Authors: He Huang and Xinqi Gong
Proteins are large molecules consisting of a linear sequence of amino acids. Proteins perform their biological functions through specific 3D structures. The main factors that drive proteins to form these structures are constraints between residues. These constraints usually lead to important inter-residue relationships, including short-range inter-residue contacts and long-range inter-residue distances. Thus, highly accurate prediction of inter-residue contact and distance information is of great significance for protein tertiary structure computations. Several methods have been proposed for inter-residue contact prediction, most of which focus on contact map prediction, and some reviews have summarized this progress. In recent years, however, inter-residue distance prediction has been found to provide better guidance for protein structure prediction than contact map prediction. The methods for inter-residue distance prediction can be roughly divided into two types according to how they treat the distance value: one is based on multi-class classification with discrete values, and the other is based on regression with continuous values. Here, we summarize these algorithms and show that they have obtained good results. Compared to contact map prediction, distance map prediction is in its infancy. Much remains to be done, including improving the precision of distance map prediction and incorporating distance maps into residue-residue distance-guided ab initio protein folding.
-
-
-
Predicting lncRNA-protein Interactions by Machine Learning Methods: A Review
By Zhi-Ping Liu
In this work, a review of predicting lncRNA-protein interactions by bioinformatics methods is provided, with a focus on machine learning. First, a computational framework for predicting lncRNA-protein interactions is presented. Then, the currently available data resources for the predictions are listed. The existing methods are reviewed by introducing their crucial steps in the prediction framework. The key functions of lncRNAs, e.g., mediating transcriptional regulation, often involve interactions with proteins. These interactions provide a channel for leveraging molecular cooperativity to fulfill crucial functions. Thus, important directions in bioinformatics are highlighted for identifying essential lncRNA-protein interactions and deciphering the role of lncRNA dysfunction, especially in carcinogenesis.
-
-
-
The Power of Matrix Factorization: Methods for Deconvoluting Genetic Heterogeneous Data at Expression Level
Authors: Yuan Liu, Zhining Wen and Menglong Li
Background: The use of genetic data to investigate biological problems has recently become a vital approach. However, the heterogeneity of the original samples at the biological level is usually ignored when genetic data are used. Differences in the cell composition of a sample can alter the expression profile and introduce considerable biases into downstream research. Matrix factorization (MF), which originated as a set of mathematical methods, has contributed massively to deconvoluting genetic profiles in silico, especially at the expression level. Objective: With the development of artificial intelligence algorithms and machine learning, the number of computational methods for solving heterogeneity problems is also growing rapidly. However, a structured view of how MF is used to deconvolute genetic data remains limited. This study was conducted to review the use of MF methods on heterogeneity problems in genetic data at the expression level. Methods: MF methods involved in deconvolution were reviewed according to their individual strengths. The review is presented in three sections: application scenarios, method categories, and a summary of tools. Specifically, the application scenarios section defines the deconvolution problems and the scenarios in which they arise. The method categories section summarizes the MF algorithms that have been applied to the different scenarios. The summary of tools lists the functions and web servers developed over the past decade. Additionally, challenges and opportunities in related fields are discussed. Results and Conclusion: Based on this investigation, the study aims to present a relatively global picture that gives researchers quicker access to deconvoluting genetic data in silico and helps them select suitable MF methods for different scenarios.
-
-
-
Supervised Learning in Spiking Neural Networks with Synaptic Delay Plasticity: An Overview
Throughout the central nervous system (CNS), information is communicated between neurons mainly by action potentials (or spikes). Although spike-timing-based neuronal codes have significant computational advantages over rate-encoding schemes, the exact spike-timing-based learning mechanism in the brain remains an open question. To close this gap, many weight-based supervised learning algorithms have been proposed for spiking neural networks. However, it is insufficient to consider only synaptic weight plasticity, and biological evidence suggests that synaptic delay plasticity also plays an important role in the learning process in biological neural networks. Recently, many learning algorithms have been proposed that consider both synaptic weight plasticity and synaptic delay plasticity. The goal of this paper is to give an overview of the existing synaptic delay-based learning algorithms in spiking neural networks. We describe the typical learning algorithms and report the experimental results. Finally, we discuss the properties and limitations of each algorithm and compare them.
-
-
-
An Overview of Abdominal Multi-organ Segmentation
The segmentation of multiple abdominal organs of the human body from images with different modalities is challenging because of the inter-subject variance among abdomens, as well as the complex intra-subject variance among organs. In this paper, the recent methods proposed for abdominal multi-organ segmentation (AMOS) on medical images in the literature are reviewed. The AMOS methods can be categorized into traditional and deep learning-based methods. First, various approaches, techniques, recent advances, and related problems under both segmentation categories are explained. Second, the advantages and disadvantages of these methods are discussed. A summary of some public datasets for AMOS is provided. Finally, AMOS remains an open issue, and the combination of different methods can achieve improved segmentation performance.
-
-
-
A Review on the Methods of Peptide-MHC Binding Prediction
Authors: Yang Liu, Xia-hui Ouyang, Zhi-Xiong Xiao, Le Zhang and Yang Cao
Background: T lymphocytes achieve an immune response by recognizing antigen peptides (also known as T cell epitopes) through major histocompatibility complex (MHC) molecules. The immunogenicity of T cell epitopes depends on their source and their stability in combination with MHC molecules. The binding of the peptide to MHC is the most selective step, so predicting the binding affinity of the peptide to MHC is the principal step in predicting T cell epitopes. The identification of epitopes is of great significance in research on vaccine design and T cell immune response. Objective: The traditional method for identifying epitopes is to synthesize peptides and test their binding activity experimentally, which is both time-consuming and expensive. In silico methods for predicting peptide-MHC binding have emerged to pre-select candidate peptides for experimental testing, which greatly saves time and cost. By summarizing and analyzing these methods, we hope to gain better insight and provide guidance for future directions. Methods: To date, a number of methods based on various principles have been developed to predict the binding ability of peptides to MHC. Some of them employ matrix models or machine learning models based on the sequence characteristics embedded in peptides or MHC. Others utilize the three-dimensional structural information of peptides or MHC, for example, by extracting three-dimensional structural information to construct a feature matrix or machine learning model, or by directly using protein structure prediction and molecular docking to predict the binding mode of peptides and MHC. Results: Although the methods for predicting peptide-MHC binding based on a feature matrix or machine learning model can achieve high-throughput prediction, their accuracy depends heavily on the sequence characteristics of confirmed binding peptides. In addition, they cannot provide insights into the mechanism of antigen specificity. Therefore, such methods have certain limitations in practical applications. Methods for predicting peptide-MHC binding based on structural prediction or molecular docking are computationally intensive compared to the methods based on a feature matrix or machine learning model, and the challenge is how to predict a reliable structural model. Conclusion: This paper reviews the principles, advantages and disadvantages of the methods of peptide-MHC binding prediction and discusses future directions for achieving more accurate predictions.
-
-
-
High-density Genetic Linkage Map Construction in Sunflower (Helianthus annuus L.) Using SNP and SSR Markers
Authors: Pin Lyu, Jianhua Hou, Haifeng Yu and Huimin Shi
Background: Sunflower (Helianthus annuus L.) is an important oil crop, ranking only after soybean, canola and peanut. A high-quality genetic map is the foundation of marker-assisted selection (MAS). However, few high-density maps have been reported for this species. Objective: In this study, we proposed the construction of a high-density genetic linkage map from an F7 population of sunflower using SNP and SSR markers. Methods: The SLAF-seq strategy was employed to develop SNP markers, which were combined with SSR markers to construct the high-density genetic map using the HighMap software. Results: A total of 1,138 million paired-end reads (226 Gb) were obtained and 518,900 SLAFs were detected. From the polymorphic SLAFs, 2,472,245 SNPs were developed and, after filtering, 5,700 SNPs were found to be suitable for constructing a genetic map. The final high-density genetic map included 4,912 SNP and 93 SSR markers distributed in 17 linkage groups (LGs) and covered 2,425.05 cM with an average marker interval of 0.49 cM. Conclusion: The final result demonstrated that the SLAF-seq strategy is suitable for SNP marker detection. The genetic map reported in this study can be considered one of the highest-density genetic linkage maps of sunflower and could lay a foundation for fine mapping of quantitative trait loci (QTLs) or map-based gene cloning.
-
-
-
Review of the Applications of Deep Learning in Bioinformatics
Authors: Yongqing Zhang, Jianrong Yan, Siyu Chen, Meiqin Gong, Dongrui Gao, Min Zhu and Wei Gan
Rapid advances in biological research over recent years have significantly enriched biological and medical data resources. Deep learning-based techniques have been successfully utilized to process data in this field, and they have exhibited state-of-the-art performances even on high-dimensional, nonstructural, and black-box biological data. The aim of the current study is to provide an overview of the deep learning-based techniques used in biology and medicine and their state-of-the-art applications. In particular, we introduce the fundamentals of deep learning and then review the success of applying such methods to bioinformatics, biomedical imaging, biomedicine, and drug discovery. We also discuss the challenges and limitations of this field, and outline possible directions for further research.
-
-
-
Feature Selection Algorithm for High-dimensional Biomedical Data Using Information Gain and Improved Chemical Reaction Optimization
Authors: Ge Zhang, Pan Yu, Jianlin Wang and Chaokun Yan
Background: There have been rapid developments in various bioinformatics technologies, which have led to the accumulation of a large amount of biomedical data. However, these datasets usually involve thousands of features and include much irrelevant or redundant information, which leads to confusion during diagnosis. Feature selection is a solution that consists of finding the optimal feature subset, which is known to be an NP-hard problem because of the large search space. Objective: To address this issue, this paper proposes a hybrid feature selection method, called IGICRO, based on an improved chemical reaction optimization algorithm (ICRO) and an information gain (IG) approach. Methods: IG is adopted to obtain some important features. The neighborhood search mechanism is combined with ICRO to increase the diversity of the population and improve the capacity of local search. Results: Experimental results on eight publicly available data sets demonstrate that the proposed approach outperforms the original CRO and other state-of-the-art approaches.
-
-
-
Bioinformatics Analysis Reveals Functions of MicroRNAs in Rice Under the Drought Stress
Authors: Yan Peng, Yuewu Liu and Xinbo Chen
Background: Drought is one of the most damaging and widespread abiotic stresses and can severely limit rice production. MicroRNAs (miRNAs) are a promising tool for improving the drought tolerance of rice and have become a research hot spot in recent years. Objective: To further extend the understanding of miRNAs, the functions of miRNAs in rice under drought stress are analyzed by bioinformatics. Methods: In this study, we integrated miRNA and gene transcriptome data of rice under drought stress. Several bioinformatics methods were used to reveal the functions of miRNAs in rice under drought stress. These methods included target gene identification, screening of differentially expressed miRNAs, enrichment analysis of DEGs, and network construction for miRNA-target and target-target protein interactions. Results: (1) A total of 229 miRNAs with differential expression in rice under drought stress, corresponding to 73 rice miRNA families, were identified. (2) 1035 differentially expressed genes (DEGs) were identified, which included 357 up-regulated genes, 542 down-regulated genes and 136 up/down-regulated genes. (3) The network of regulatory relationships between the 73 rice miRNA families and the 1035 DEGs was constructed. (4) 25 UP_KEYWORDS terms of DEGs, 125 GO terms and 7 pathways were obtained. (5) The protein-protein interaction network of the 1035 DEGs was constructed. Conclusion: (1) MiRNA-regulated targets in rice might be mainly involved in a series of basic biological processes and pathways under drought conditions. (2) MiRNAs in rice might play critical roles in lignin degradation and ABA biosynthesis. (3) MiRNAs in rice might play an important role in drought signal perception and transduction.
-
-
-
Sequence-based Identification of Arginine Amidation Sites in Proteins Using Deep Representations of Proteins and PseAAC
Authors: Sheraz Naseer, Waqar Hussain, Yaser D. Khan and Nouman Rasool
Background: Among the major post-translational modifications, amidation appears to be a small change, in which a peptide ends with an amide group (-NH2) rather than a carboxyl group (-COOH). To study the physicochemical properties of such peptides, identification of the amidation mechanism is very important. However, in vitro, ex vivo and in vivo identification can be laborious, time-consuming and costly. There is a pressing need for an efficient and accurate computational model to help researchers and biologists identify these sites easily. Objectives: Herein, we propose a novel predictor for the identification of arginine amide (R-Amide) sites in proteins, by integrating Chou’s Pseudo Amino Acid Composition (PseAAC) with deep features. Methods: We use well-known DNNs both for learning a feature representation of peptide sequences and for performing classification. Results: Among the different DNNs, the CNN showed the highest accuracy and, on all other computed measures, outperformed all previously reported predictors. Conclusion: Based on these results, it is concluded that the proposed model can help identify arginine amidation efficiently and accurately, which can help scientists understand the mechanism of this modification in proteins.
-
-
-
Deep Novo A+: Improving the Deep Learning Model for De Novo Peptide Sequencing with Additional Ion Types and Validation Set
Authors: Lei Di, Yongxing He and Yonggang Lu
Background: De novo peptide sequencing is one of the key technologies in proteomics; it extracts peptide sequences directly from tandem mass spectrometry (MS/MS) spectra without any protein database. Since the accuracy and efficiency of de novo peptide sequencing can be affected by the quality of the MS/MS data, the DeepNovo method, which uses deep learning for de novo peptide sequencing, was introduced and outperforms the other state-of-the-art de novo sequencing methods. Objective: For superior performance and better generalization ability, additional ion types in the spectra should be considered and the DeepNovo model should be adaptive. Methods: Two improvements are introduced in the DeepNovo A+ method: a-ions are added to the spectral analysis, and a validation set is used to automatically determine the number of training epochs. Results: Experiments show that, compared to the DeepNovo method, the DeepNovo A+ method consistently improves the accuracy of de novo sequencing under different conditions. Conclusion: By adding a-ions and using the validation set, the performance of de novo sequencing can be improved effectively.
