Current Bioinformatics - Volume 15, Issue 7, 2020
-
-
Recent Advances on the Machine Learning Methods in Identifying Phage Virion Proteins
Authors: Yingjuan Yang, Chunlong Fan and Qi Zhao
In the field of bioinformatics, the prediction of phage virion proteins helps us understand the interaction between phages and their host cells and promotes the development of new antibacterial drugs. However, because traditional experimental methods for identifying phage virion proteins are expensive and inefficient, more researchers are working to develop new computational methods. In this review, we summarize the machine learning methods developed in recent years for predicting phage virion proteins and briefly describe their advantages and limitations. Finally, some research directions related to phage virion proteins are listed.
-
-
-
Natural Scene Nutrition Information Acquisition and Analysis Based on Deep Learning
Authors: Tianyue Zhang, Xu Wei, Zhi Li, Fangzhe Shi, Zhiqiang Xia, Mengru Lian, Ling Chen and Hao Zhang
Background: In the field of personalized health, it is often difficult for individuals to obtain the professional knowledge they need to solve practical problems in a timely and accurate way. Although some applications can retrieve targeted information, they often fail to function properly in non-ideal environments and cannot give precise answers to individual users. Therefore, how to establish an information capture model based on big data and combine it with intelligent search is an important issue in the field of personalized health. Objective: This paper addresses information acquisition and intelligent recommendation in the field of personalized health and proposes a natural scene information acquisition and analysis model based on deep learning. The model focuses on improving the recognition rate of text in natural scenes and on targeted smart search, so that users can get more accurate personalized health advice. Methods: In this model, natural scene information is processed in four stages: targeted big data collection and search, text detection with the connectionist text proposal network algorithm and projection-based text segmentation, capsule network text recognition, and result analysis. By using deep learning algorithms, the model reduces recognition bias caused by problems such as special filming conditions and photographic techniques. At the same time, data mining improves the pertinence of the result analysis. Results: The proposed model is applied to analyze users' nutrient intake requirements. The results show that the method achieves 83% prediction accuracy on the nutrient composition table dataset and performs better than current convolutional neural network applications. The model can also obtain accurate personalized data to provide users with dietary advice.
Conclusion: This model combines deep learning and data mining methods to obtain intelligent, professional-level solutions from target information images uploaded in non-ideal environments, and it is suitable for accurate analysis of problems in the personalized health area.
-
-
-
High-dimensional Causal Mediation Analysis with a Large Number of Mediators Clumping at Zero to Assess the Contribution of the Microbiome to the Risk of Bacterial Pathogen Colonization in Older Adults
Authors: Wei Liu, John P. Haran, Arlene S. Ash, Jeroan J. Allison, Shangyuan Ye, Jenifer Tjia, Vanni Bucci and Bo Zhang
Background: Causal mediation analysis is conducted in biomedical research with the goal of investigating causal mechanisms that consist of both direct causal pathways between the treatment and outcome variables and intermediate causal pathways through mediators. Recently, this type of analysis has been applied in the context of bioinformatics; however, it encounters the obstacle of high-dimensional and semi-continuous mediators with clumping at zero. Methods: In this article, we develop a methodology to conduct high-dimensional causal mediation analysis with a modeling framework that involves (i) a nonlinear model for the outcome variable, (ii) two-part models for semi-continuous mediators with clumping at zero, and (iii) sophisticated variable-selection techniques using machine learning. We conducted simulations and investigated the performance of the proposed method. It is shown that the proposed method can provide reliable statistical information on the causal effects with high-dimensional mediators. The method is adopted to assess the contribution of the intestinal microbiome to the risk of bacterial pathogen colonization in older adults from US nursing homes. Conclusion: The proposed high-dimensional causal mediation analysis with nonlinear models is an innovative and reliable approach to conduct causal inference with high-dimensional mediators.
-
-
-
A Survey of Metrics Measuring Difference for Rooted Phylogenetic Trees
Background: The evolutionary history of organisms can be described by phylogenetic trees. When researching the evolution of a given set of species, we need to compare the topologies of rooted phylogenetic trees. Objective: Several metrics currently exist for measuring the dissimilarity between rooted phylogenetic trees, and they are defined in different ways. Methods: This paper analyzes these metrics in terms of their definitions and, through experiments, the distance values they compute. Results: The experimental results show that the distances calculated by the cluster metric, the partition metric, and the equivalent metric are well fitted by a Gaussian distribution, and that the equivalent metric describes the difference between trees better than the others. Conclusion: The paper also presents a tool called CDRPT (Computing Distance for Rooted Phylogenetic Trees). CDRPT is a web server for calculating distances between trees online. It can also be used offline by installing application packages for Windows, which makes it convenient for researchers. The home page of CDRPT is http://bioinformatics.imu.edu.cn/tree/.
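To make the idea of a cluster-based tree distance concrete, the following is a minimal sketch, not the definition used in the paper or in CDRPT. It assumes the common reading in which a cluster is the set of leaves below an internal node and the distance is the size of the symmetric difference of the two cluster sets (the rooted analogue of the Robinson-Foulds distance). The nested-tuple tree encoding and the function names `clusters` and `cluster_distance` are hypothetical choices made for illustration.

```python
def clusters(tree):
    """Collect the leaf set below every internal node of a rooted tree.

    Assumed encoding: a leaf is a string, an internal node is a tuple
    of child subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)  # record the cluster induced by this internal node
        return below

    leaves(tree)
    return found


def cluster_distance(tree_a, tree_b):
    """Number of clusters present in exactly one of the two trees."""
    return len(clusters(tree_a) ^ clusters(tree_b))


if __name__ == "__main__":
    t1 = ((("a", "b"), "c"), "d")
    t2 = ((("a", "c"), "b"), "d")
    print(cluster_distance(t1, t2))
```

Some definitions halve this count or normalize it by the number of internal nodes; the sketch leaves the raw count so that either convention can be applied afterwards.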
-
-
-
Lung Cancer Classification and Gene Selection by Combining Affinity Propagation Clustering and Sparse Group Lasso
Authors: Juntao Li, Mingming Chang, Qinghui Gao, Xuekun Song and Zhiyu Gao
Background: Cancer seriously threatens human health. Diagnosing cancer via gene expression analysis is a hot topic in cancer research. Objective: The study aimed to accurately diagnose the type of lung cancer and discover the pathogenic genes. Methods: In this study, Affinity Propagation (AP) clustering with a similarity score was applied to each type of lung cancer and to normal lung. After the genes were grouped, sparse group lasso was adopted to construct four binary classifiers, and a voting strategy was used to integrate them. Results: This study screened six gene groups, out of 73, that may be associated with different lung cancer subtypes, and identified three possible key pathogenic genes: KRAS, BRAF and VDR. Furthermore, this study achieved improved classification accuracy on the minority classes SQ and COID in comparison with four other methods. Conclusion: We propose the AP clustering based sparse group lasso (AP-SGL), which provides an alternative for simultaneous diagnosis and gene selection for lung cancer.
-
-
-
Gene Regulatory Network Construction Based on a Particle Swarm Optimization of a Long Short-term Memory Network
Authors: Zhenhao Tang, Xiangying Chai, Yu Wang and Shengxian Cao
Background: The Gene Regulatory Network (GRN) is a model for studying the function and behavior of genes by treating the genome as a whole, and it can reveal the gene expression mechanism. However, due to the dynamics, nonlinearity, and complexity of gene expression data, constructing a GRN precisely is a challenging task. In circulating cooling water systems, Slime-Forming Bacteria (SFB) are among the bacteria that help to form dirt. To explore the microbial fouling mechanism of SFB, constructing a GRN for the fouling-forming genes of SFB is significant. Objective: To propose an effective GRN construction method and construct a GRN for the fouling-forming genes of SFB. Methods: In this paper, a method combining a Long Short-Term Memory Network (LSTM) and the Mean Impact Value (MIV) was applied for GRN reconstruction. First, LSTM was employed to establish a gene expression prediction model. To improve the performance of LSTM, Particle Swarm Optimization (PSO) was introduced to optimize the weights and learning rate. Then, the MIV was used to infer the regulation among genes. To address the fouling-forming problem of SFB, we designed electromagnetic field experiments and transcriptome sequencing experiments to locate the fouling-forming genes and obtain gene expression data. Result: The proposed method was tested on three datasets: a simulated dataset and two real biological datasets. Comparison with other methods indicates that the proposed method has higher modeling accuracy and can be used to construct a GRN effectively. Finally, a GRN for the fouling-forming genes of SFB was constructed using the proposed approach.
Conclusion: The experiments indicated that the proposed approach can reconstruct a GRN precisely and that, compared with other approaches, it performs better in extracting the regulations among genes.
-
-
-
Prediction of Neddylation Sites Using the Composition of k-spaced Amino Acid Pairs and Fuzzy SVM
Authors: Zhe Ju and Shi-Yun Wang
Introduction: Neddylation is the process in which the ubiquitin-like protein NEDD8 attaches to substrate lysine via isopeptide bonds. As a highly dynamic and reversible post-translational modification, lysine neddylation has been found to be involved in various biological processes and closely associated with many diseases. Objective: The accurate identification of neddylation sites is necessary to elucidate the underlying molecular mechanisms of neddylation. As traditional experimental methods are often expensive and time-consuming, it is imperative to design computational methods to identify neddylation sites. Methods: In this study, a novel predictor named CKSAAP_NeddSite is developed to detect neddylation sites. An effective feature encoding technique, the composition of k-spaced amino acid pairs, is used to encode neddylation sites. The F-score feature selection method is adopted to remove redundant features. Moreover, a fuzzy support vector machine algorithm is employed to overcome the class imbalance and noise problems. Results: As illustrated by 10-fold cross-validation, CKSAAP_NeddSite achieves an AUC of 0.9848. Independent tests also show that CKSAAP_NeddSite significantly outperforms existing neddylation site predictors. Therefore, CKSAAP_NeddSite can be a useful bioinformatics tool for the prediction of neddylation sites. Feature analysis shows that some residues around neddylation sites may play an important role in the prediction. Conclusion: The results of analysis and prediction could offer useful information for elucidating the molecular mechanisms of neddylation. A user-friendly web server for CKSAAP_NeddSite is established at 123.206.31.171/CKSAAP_NeddSite.
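The composition of k-spaced amino acid pairs mentioned in this abstract can be illustrated with a short sketch. This is a generic version of the encoding under stated assumptions, not the CKSAAP_NeddSite implementation: the function name `cksaap`, the default `k_max=2`, the 20-letter alphabet, and the choice to skip pairs containing non-standard characters are all illustrative assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter standard alphabet


def cksaap(fragment, k_max=2):
    """Encode a sequence fragment as k-spaced amino acid pair frequencies.

    For each gap k = 0..k_max, count every residue pair separated by k
    positions and divide by the number of counted pairs. The result has
    400 * (k_max + 1) entries.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(fragment) - k - 1):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # assumption: skip pairs with non-standard symbols
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


if __name__ == "__main__":
    vector = cksaap("MKVLAAGKSTL", k_max=2)
    print(len(vector))
```

A feature selection step such as the F-score ranking named in the abstract would then operate on these vectors; that step is not shown here.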
-
-
-
Analysis of Oncogene Protein Structure Using Small World Network Concept
Authors: Neetu Kumari and Anshul Verma
Background: Proteins are the basic building blocks of the body; each is a complex system whose structure plays a key role in activation, catalysis, messaging and disease states. Therefore, careful investigation of protein structure is necessary for the diagnosis of diseases and for drug design. Protein structures are described at different levels of complexity: primary (chain), secondary (helical), tertiary (3D), and quaternary structure. Analyzing the complex 3D structure of a protein is a difficult task, but the structure can be analyzed as a network of interconnections between its components, where amino acids are considered as nodes and the interconnections between them as edges. Objective: Many published works have shown that the small-world network concept provides new opportunities to investigate networks of biological systems. The objective of this paper is to analyze protein structure using the small-world concept. Methods: Protein structure is analyzed using the small-world network concept, specifically the extreme case in which the degree distribution follows a power law. To verify the proposed approach, a dataset of oncogene protein structures is analyzed using Python. Results: The protein structure is plotted as a network of amino acids (a Residue Interaction Graph (RIG)) using the distance matrix of the nodes with a given threshold. Various centrality measures (i.e., degree distribution, Degree-Betweenness correlation, and Betweenness-Closeness correlation) are then calculated for 1323 nodes, and graphs are plotted. Conclusion: It is concluded that a small number of hubs with high degree centrality exist, and that they are expected to be robust to the harmful effects of mutations with new functions.
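The construction of a residue interaction graph from a distance threshold, as outlined in this abstract, can be sketched as follows. This is an illustrative sketch only: the use of one 3D coordinate per residue, the 7.0 distance cutoff, and the function names are assumptions, since the abstract does not state the threshold or the atoms used.

```python
import math


def residue_interaction_graph(coords, threshold=7.0):
    """Build an undirected graph linking residues closer than a cutoff.

    coords: one (x, y, z) tuple per residue (assumed representation).
    threshold: distance cutoff in the same units (assumed value).
    Returns a dict mapping each residue index to its set of neighbours.
    """
    n = len(coords)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def degree_distribution(adjacency):
    """Map each degree value to the number of residues having that degree."""
    histogram = {}
    for neighbours in adjacency.values():
        degree = len(neighbours)
        histogram[degree] = histogram.get(degree, 0) + 1
    return histogram


if __name__ == "__main__":
    toy = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (20, 0, 0)]
    graph = residue_interaction_graph(toy, threshold=4.0)
    print(degree_distribution(graph))
```

Betweenness and closeness centrality would normally be computed with a graph library on the same adjacency structure; they are omitted to keep the sketch dependency-free.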
-
-
-
Research on Psychological Scales Based on the Multitheory Fusion
Authors: Guangdi Liu, Yu C. Li, Yue Wang, Jing Xiang Liu, Yong Sheng Sang, Wei Zhang and Le Zhang
Objective: This study proposed an innovative approach to simplify the multiple psychological scales for children and adolescents by integrating statistical methods and item reflection theory into a structural equation model. Methods: First, a psychological scale for adolescents was developed with the Delphi method to replace the existing scales, which are optimized for adults. Second, the number of items in the current group of scales was reduced. Result and Conclusion: A psychological scale for adolescents has been built that comprehensively reflects their psychological characteristics in terms of mental state, behavioral status, emotion and feeling, relationships, and environmental adaptation. The scale has been simplified, and its reliability and validity have been improved.
-
-
-
A Drug Target Interaction Prediction Based on LINE-RF Learning
Authors: Jihong Wang, Yue Shi, Xiaodan Wang and Huiyou Chang
Background: At present, using computational methods to predict drug-target interactions (DTIs) is a very important step in the discovery of new drugs and in drug repositioning. The potential DTIs identified by machine learning methods can provide guidance for biochemical or clinical experiments. Objective: The goal of this article is to apply the latest network representation learning methods to drug-target prediction, improve model prediction capabilities, and promote new drug development. Methods: We use the large-scale information network embedding (LINE) method to extract network topology features of drugs, targets, diseases, etc., integrate the features obtained from heterogeneous networks, construct binary classification samples, and use the random forest (RF) method to predict DTIs. Results: The experiments in this paper compare the common classifiers RF, LR, and SVM, as well as the typical network representation learning methods LINE, Node2Vec, and DeepWalk. The combined method LINE-RF achieves the best results, reaching an AUC of 0.9349 and an AUPR of 0.9016. Conclusion: The learning method based on LINE can effectively learn hidden features of drugs, targets, diseases and other entities from the network topology. Combining features learned from multiple networks can enhance expressive power. RF is an effective supervised learning method. Therefore, the LINE-RF combination is a widely applicable method.
-
-
-
A 2D Non-degeneracy Graphical Representation of Protein Sequence and Its Applications
Authors: Xiaoli Xie and Yunxiu Zhao
Background: The comparison of protein sequences is an important research field in bioinformatics. Many alignment-free methods have been proposed. Objective: To mine more information from protein sequences, this study focuses on a new alignment-free method based on the physicochemical properties of amino acids. Methods: An average physicochemical value (Apv) is defined. For a given protein sequence, a 2D curve is drawn based on the Apv and the position of each amino acid, and the curve has no loops or intersections. Based on the curve, the similarity or dissimilarity of protein sequences can be analyzed. Results and Conclusion: Two groups of protein sequences are taken as examples to illustrate the new method. The protein sequences are classified correctly, and the results are highly correlated with those of ClustalW. The new method is simple and effective.
-
-
-
A Deep Convolutional Neural Network to Improve the Prediction of Protein Secondary Structure
Authors: Lin Guo, Qian Jiang, Xin Jin, Lin Liu, Wei Zhou, Shaowen Yao, Min Wu and Yun Wang
Background: Protein secondary structure prediction (PSSP) is a fundamental task in bioinformatics that is helpful for understanding the three-dimensional structure and biological function of proteins. Many neural network-based prediction methods have been developed for protein secondary structures. Deep learning and multiple features are two obvious means to improve prediction accuracy. Objective: To promote the development of PSSP, a deep convolutional neural network-based method is proposed to predict both the eight-state and three-state secondary structure of proteins. Methods: In this model, sequence and evolutionary information of proteins are combined as multiple input features after preprocessing. A deep convolutional neural network with no pooling layer or connection layer is then constructed to predict the secondary structure of proteins. L2 regularization, batch normalization, and dropout techniques are employed to avoid over-fitting and obtain better prediction performance, and an improved cross-entropy is used as the loss function. Results: Our proposed model obtains Q3 prediction results of 86.2%, 84.5%, 87.8%, and 84.7% on the CullPDB, CB513, CASP10 and CASP11 datasets, respectively, with corresponding Q8 prediction results of 74.1%, 70.5%, 74.9%, and 71.3%. Conclusion: We have proposed DCNN-SS, a deep convolutional network-based PSSP method, and experimental results show that DCNN-SS performs competitively with other methods.
-
-
-
Screening of SLE-susceptible SNPs in One Chinese Family with Systemic Lupus Erythematosus
Authors: Juan Luo, Yanming Meng, Jianzhao Zhai, Ying Zhu, Yizhou Li and Yongkang Wu
Background: Systemic lupus erythematosus (SLE) is a complex autoimmune disease that mainly affects women of childbearing age. Although its pathogenesis is not yet fully clear, studies have shown that genetic factors are vital in exploring SLE pathogenic mechanisms. Objective: The purpose of this study is to predict and screen potential pathogenic single nucleotide polymorphisms (SNPs). By comparing the genomes of members of a family with SLE and performing functional analysis on mutation loci, possible pathogenic polymorphisms are screened. These analyses lay the foundation for further research into mechanisms. Method: Genomic alignment, variant calling and functional annotation were performed, and ~92,778 original SNPs were obtained for each specimen. We found that the patient-specific and healthy-specific SNPs show different conservation score distributions. Many patient-specific SNPs were detected in SLE-related pathways. We therefore investigated the patient-specific SNPs from four perspectives: nonsynonymous variations in exon regions, expression quantitative trait loci (eQTLs), RNA binding sites and RNA-binding protein (RBP) binding sites. Results: Eighteen potential pathogenic SNPs associated with functional loci were identified in SLE risk genes. A systematic literature study was then performed to verify these potential pathogenic SNPs. Conclusion: This study could help to better explain possible genetic mechanisms of SLE from the perspective of variation. It could provide an effective strategy for the accurate diagnosis and personalized treatment of SLE patients.
-
-
-
Densely Dilated Spatial Pooling Convolutional Network Using Benign Loss Functions for Imbalanced Volumetric Prostate Segmentation
Authors: Qiuhua Liu, Min Fu, Hao Jiang and Xinqi Gong
Background: The high incidence rate of prostate disease calls for accurate early detection. Magnetic Resonance Imaging (MRI) is one of the main imaging methods used for prostate cancer detection so far, but it suffers from imbalance and variation in appearance, so automated prostate segmentation is still challenging.
Objective: To segment the prostate accurately from MRI, the study focused on designing a unique network with benign loss functions.
Methods: A novel Densely Dilated Spatial Pooling Convolutional Network (DDSP ConNet) with an encoder-decoder structure and a unique DDSP block was proposed. By densely combining dilated convolution and global pooling layers, the DDSP block supplies coarse segmentation results and preserves hierarchical contextual information. The DSC and Jaccard losses were adopted to train the DDSP ConNet, and it was proved theoretically that they have benign properties, including symmetry, continuity, and differentiability with respect to the parameters of the network.
Results: Extensive experiments were conducted on the MICCAI PROMISE12 challenge dataset to corroborate the effectiveness of the DDSP ConNet with DSC and Jaccard loss. On the test dataset, the DDSP ConNet achieved a score of 85.78.
Conclusion: In the conducted experiments, the DDSP network with DSC and Jaccard loss outperformed most of the other competitors on the PROMISE12 dataset. Therefore, it has a better ability to extract hierarchical features and to handle the imbalanced medical image problem.
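For readers unfamiliar with the DSC (Dice) and Jaccard losses named in this abstract, the sketch below shows their widely used "soft" forms on flattened probability maps. It is a generic formulation, not the authors' exact loss: the function names, the plain-list inputs, and the smoothing constant `eps` are assumptions made for illustration.

```python
def soft_dice_loss(pred, target, eps=1e-7):
    """1 - Dice coefficient for flattened probability and label lists.

    pred: predicted foreground probabilities in [0, 1].
    target: ground-truth labels (0 or 1).
    eps: assumed smoothing term that avoids division by zero.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def soft_jaccard_loss(pred, target, eps=1e-7):
    """1 - Jaccard index (intersection over union) for the same inputs."""
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


if __name__ == "__main__":
    prediction = [0.9, 0.8, 0.1, 0.0]
    mask = [1, 1, 0, 0]
    print(soft_dice_loss(prediction, mask))
    print(soft_jaccard_loss(prediction, mask))
```

Both losses depend only on overlap relative to the total foreground mass, which is why they are commonly chosen when the foreground occupies a small fraction of the volume. Swapping `pred` and `target` leaves either value unchanged, which corresponds to the symmetry property mentioned in the abstract.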
