Volume 15, Issue 3

Current Bioinformatics - Volume 15, Issue 3, 2020

- Meet Our Section Editor
  
  By Pu-Feng Du
  
  https://doi.org/10.2174/157489361503200303121210

- Current State of the Art for Survival Prediction in Cancer Using Data Mining Techniques
  
  Authors: M.N. Doja, Ishleen Kaur and Tanvir Ahmad
  
  https://doi.org/10.2174/1574893614666190902152142
  
  Background: Cancer treatment is expensive and causes many side effects, so survival prediction matters to both patients and clinicians. Data mining has been used in the medical domain to extract useful information, and cancer prognosis is one such application. Objective: This study identifies the techniques used in recent years to predict the survival of cancer patients. Supervised, semi-supervised and unsupervised techniques have all been applied successfully to survival prediction for different types of cancer. Methods: A systematic literature review was conducted to uncover gaps in recent studies and to identify future research directions. Results and Conclusion: Existing systems lack structured patient information. In addition, many cancer types remain unexplored in terms of survival prediction, mainly because sufficient data are unavailable. Much could be improved if researchers had access to the required data.
  
- Leukocyte Image Segmentation Based on Adaptive Histogram Thresholding and Contour Detection
  
  Authors: Xiaogen Zhou, Zuoyong Li, Huosheng Xie, Ting Feng, Yan Lu, Chuansheng Wang and Rongyan Chen
  
  https://doi.org/10.2174/1574893614666190723115832
  
  Aims: The proposed method falls into the category of medical image processing. Background: Computer-aided systems for the automatic analysis and cytometry of leukocytes (white blood cells, WBCs) in human blood smear images are a powerful diagnostic tool for many diseases, such as anemia, malaria, syphilis, heavy metal poisoning, and leukemia. Leukocyte segmentation is the basis of this automatic analysis, and segmentation accuracy directly influences the reliability of image-based automatic leukocyte analysis. Objective: This paper presents a leukocyte segmentation method that improves segmentation accuracy under rapid and standard staining conditions. Methods: The proposed method first localizes leukocytes by color component combination and Adaptive Histogram Thresholding (AHT), and crops the sub-image corresponding to each leukocyte. It then employs AHT to extract the leukocyte nucleus and uses image color features to remove background such as red blood cells and dyeing impurities. Finally, Canny edge detection is performed to extract the entire leukocyte, and the cytoplasm is obtained by subtracting the nucleus from the leukocyte. Results: Experimental results on two datasets containing 160 leukocyte images show that the proposed method obtains more accurate segmentation results than its counterparts. Conclusion: The proposed method obtains more accurate segmentation results than its counterparts under rapid and standard staining conditions.
  
- HSEAT: A Tool for Plant Heat Shock Element Analysis, Motif Identification and Analysis
  
  Authors: Sarah R. Qazi, Noor ul Haq, Shakeel Ahmad and Samina N. Shakeel
  
  https://doi.org/10.2174/1574893614666190102151956
  
  Background: Previous methods for discovering cis-regulatory motifs in the promoter regions of plant genes have very limited performance, especially for the analysis of novel and rare motifs. Plant genes are expressed differentially under different environmental or experimental conditions, with modular regulation by cis-regulatory sequences in the promoter regions of the same or different genes. It has previously been shown that the production of Heat Shock Proteins (HSPs) is correlated with plant tolerance under heat and other stress conditions. Regulation of these HSP genes is controlled by interactions between heat shock factors (HSFs) and cis-acting motifs present in the promoter regions of the genes. Differential expression of these HSP genes results from their unique promoter architecture, their cis-acting sequences and their interaction with HSFs. Objective: A versatile promoter analysis tool is proposed for the identification and analysis of HSP promoters. Methods: The Heat Shock Element Analysis Tool (HSEAT) has been implemented in the Java programming language using a pattern recognition approach. The tool has a built-in MS Access database for storing different motifs. Results: HSEAT has been designed to detect different types of Heat Shock Elements (HSEs) in the promoter regions of plant HSPs and integrates a complete analysis of plant promoters. HSEAT is a user-friendly, interactive application for discovering various types of HSEs, e.g. TTC Rich Types, Gap Types and Perfect HSE, as well as STRE in HSPs. We examined and evaluated some known HSP promoters from different plants using this tool alongside already available tools. Conclusion: HSEAT has extensive potential to explore conserved or semi-conserved motifs or potential binding sites of different transcription factors for other stress-regulating genes. The tool can be found at https://sourceforge.net/projects/heast/.
  
- MD-LBP: An Efficient Computational Model for Protein Subcellular Localization from HeLa Cell Lines Using SVM
  
  Authors: Muhammad Tahir and Adnan Idris
  
  https://doi.org/10.2174/1574893614666190723120716
  
  Background: Knowledge of the subcellular location of proteins is essential to understanding numerous protein functions. Objective: Accurate, computationally efficient and reliable automated analysis of protein localization imagery depends greatly on the features calculated from these images. Methods: In the current work, a novel method termed MD-LBP is proposed for feature extraction from fluorescence microscopy protein images. For a given neighborhood, the value of the central pixel is computed as the difference between the global and local means of the input image, and this value is then used as the threshold to generate a binary pattern for that neighborhood. Results: The performance of our method is assessed on the 2D HeLa dataset using a 5-fold cross-validation protocol. With RBF-SVM as the base classifier, MD-LBP outperforms the standard LBP algorithm, Threshold Adjacency Statistics, and Haralick texture features. Conclusion: Development of specialized systems for different kinds of medical imagery will pave the way for effective drug discovery in the pharmaceutical industry. Furthermore, biological and bioinformatics-based procedures can be simplified to support drug design in the pharmaceutical industry.
  
- Elastic Net Regularized Softmax Regression Methods for Multi-subtype Classification in Cancer
  
  Authors: Lin Zhang, Yanling He, Haiting Song, Xuesong Wang, Nannan Lu, Lei Sun and Hui Liu
  
  https://doi.org/10.2174/1574893613666181112141724
  
  Background: Various regularization methods have been proposed to improve prediction accuracy in cancer diagnosis. Elastic net regularized logistic regression has been widely adopted for cancer classification and gene selection in genetics and molecular biology, but it is commonly applied to binary classification and regression. However, cancers usually have more than two subtypes, which often cannot be determined precisely. Objective: Besides the multi-class issue, feature selection is also a critical problem for cancer subtype classification. Methods: An Elastic Net Regularized Softmax Regression (ENRSR) is put forward to tackle the multi-class classification issue. As an extension of elastic net regularized logistic regression, ENRSR enforces structure sparsity and a ‘grouping effect’ for gene selection based on gene expression data, which may exhibit high correlation. The sparsity structure and ‘grouping effect’ help to select more appropriate discriminative features for multi-class classification. Result: ENRSR achieves more accurate and robust performance than 6 competing algorithms (K-means, Hierarchical Clustering, Expectation Maximization, Nonnegative Matrix Factorization, Support Vector Machine and Random Forest) in predicting cancer subtypes on both simulation data and real cancer gene expression data in terms of F measure. Conclusion: Our proposed ENRSR method is a reliable regularized softmax regression for multi-subtype classification.
  
- Research on Gastric Cancer’s Drug-resistant Gene Regulatory Network Model
  
  Authors: Zhi Li, Tianyue Zhang, Haojie Lei, Liyan Wei, Yuanning Liu, Yadi Shi, Shuyi Li, Bowen Shen, Hao Guo, Zhangqian Chen, Xiaorong Yi and Hao Zhang
  
  https://doi.org/10.2174/1574893614666190722102557
  
  Objective: Using bioinformatics methods, differentially expressed gene data on drug resistance in gastric cancer were analyzed, screened and mined through modeling and network modeling to find valuable data associated with multi-drug resistance of gastric cancer. Methods: First, the data sets were preprocessed in three steps: data processing, data annotation and classification, and functional clustering. Second, based on the preprocessed data, a primary gene regulatory network was constructed for each class by mining interactions among the genes. A score was computed for each node in each primary gene regulatory network, and the nodes were ranked by score. On this basis, an appropriate core node was selected and the corresponding core network was developed. Results and Conclusion: Finally, the mined core network modules were analyzed. After correlation analysis, the constructed network module was found to contain 20 core genes. This module contained valuable data associated with multi-drug resistance in gastric cancer.
  
- Citrullination Site Prediction by Incorporating Sequence Coupled Effects into PseAAC and Resolving Data Imbalance Issue
  
  Authors: Md. A. M. Hasan, Md K. Ben Islam, Julia Rahman and Shamim Ahmad
  
  https://doi.org/10.2174/1574893614666191202152328
  
  Background: Post-translational modification is one of the biomolecular mechanisms in living organisms that introduces functional diversity in proteins and regulates cellular processes. Transformation of an arginine residue to citrulline in a protein is one such modification. Objective: Our objective is to identify citrullinated arginine residue sites quickly and accurately. Methods: In this study, a novel computational tool, abbreviated as predCitru-Site, has been developed to predict citrullination sites. This technique incorporates the sequence-coupling effect of the amino acids surrounding arginine residues and optimizes the skewed citrullination training dataset to improve prediction quality. The performance of predCitru-Site has been measured as the average of 5 complete runs of the 10-fold cross-validation test, to be comparable with existing tools. Results and Conclusion: predCitru-Site achieved 97.6% sensitivity, 98.9% specificity, and an overall accuracy of 98.5%. It also achieved a Matthews correlation coefficient of 0.967 and an area under the receiver operating characteristic curve (AUC) of 0.997. predCitru-Site significantly outperforms existing tools on the same benchmark dataset. It also shows significant improvement on independent tests in all performance metrics (around 50% higher in AUC). These results suggest that our method is promising and can be used as a complementary technique for fast exploration of citrullination of arginine residues. A user-friendly web server has also been deployed at http://research.ru.ac.bd/predCitru-Site/ for the convenience of experimental scientists.
  
- Sequence-based Structural B-cell Epitope Prediction by Using Two Layer SVM Model and Association Rule Features
  
  Authors: Jehn-Hwa Kuo, Chi-Chang Chang, Chi-Wei Chen, Heng-Hao Liang, Chih-Yen Chang and Yen-Wei Chu
  
  https://doi.org/10.2174/1574893614666181123155831
  
  Background: The immune reaction is the most important defense mechanism for destroying invading pathogens in our body, and the epitope is the position of the antigen–antibody interaction on pathogenic proteins. Objective: The majority of epitopes are structural; however, existing sequence-based prediction websites still leave room to improve prediction performance. Therefore, in this study, we used SVM as a machine learning tool to predict epitopes based on protein sequences. Methods: First, we built five SVM models in the first layer, one for each of five features (binary composition, position-specific scoring matrix, secondary structure, accessible surface area, and association rule), and then chose the patterns that exhibited the best performance in each model. Second, the confidence scores of the first-layer models were used as the input values for an SVM model in the second layer, which integrated the first-layer SVM models to improve prediction accuracy. Results: The final prediction model achieved up to 63% accuracy in predicting epitopes, and its prediction performance was better than that of the existing prediction websites. Conclusion: Finally, a case study on a two-subunit cytochrome c oxidase of Paracoccus denitrificans achieved an accuracy of up to 66%.
  
- Mutation Mechanisms of Breast Cancer among the Female Population in China
  
  Authors: Asmaa Amer, Ahmed Nagah, Tianhai Tian and Xinan Zhang
  
  https://doi.org/10.2174/1574893615666191220141548
  
  Background: Cancer is a genetic disease caused by the accumulation of gene mutations. It is important to derive the number of driver mutations needed for the development of human breast cancer, which may provide insights into tumor diagnosis and therapy. Objective: This work investigates whether the mutation mechanism of breast cancer differs between patients in the USA and those in China. We study the mechanisms of breast cancer development in China and then compare them with those in the USA. Methods: A multistage model including both gene mutation and clonal expansion of intermediate cells was designed to fit the dataset of breast cancer in China from 2004 to 2009. Results: Our simulation results show that the maximum number of driver mutations for breast epithelium stem cells of females in China is 13, fewer than the 14 driver mutations of females in the USA. In addition, the two-hit model is the optimal one for tumorigenesis in females in China, unlike the three-hit model predicted as optimal for tumorigenesis in females in the USA. Conclusion: The differences in mutation mechanisms between China and the USA reflect a variety of factors: lifestyle, genetic influences, environmental exposure, and the availability of mammography screening.
  
- Stability Analysis at Key Positions of EGFR Related to Non-small Cell Lung Cancer
  
  Authors: Avirup Ghosh and Hong Yan
  
  https://doi.org/10.2174/1574893614666191212112026
  
  Background: Mutations in a protein called the Epidermal Growth Factor Receptor (EGFR) can cause Non-Small Cell Lung Cancer (NSCLC), which is the most common form of lung cancer. Many NSCLC cases arise from the L858R mutation, where leucine (L) is replaced by arginine (R) at the 858th position in the EGFR; this is also recognized as an exon 21 substitution. Moreover, half of the EGFR-mutated lung cancer patients develop acquired resistance to the first-generation EGFR-TKIs due to another mutation, T790M. Objective: In this research work, a novel method is used to investigate the possible reason for the EGFR mutations to take place specifically at the 858th and 790th positions, and hydrogen bonds are evaluated to measure the overall stability of different structures. Methods: We performed molecular dynamics simulations using Amber to achieve our primary objectives, and later used CPPTRAJ to analyze changes in the hydrogen bonds for different mutational structures of EGFR. Results: First, we investigated the hydrogen bonds at different positions in the EGFR kinase domain and estimated why the first-stage mutation (L858R) and the resistance mutation (L858R/T790M) take place at the 858th and 790th positions respectively. We found that the hydrogen bond counts at the 858th and 790th positions are lower than at neighboring positions, which makes these positions the least stable. Conclusion: Our method represents an important contribution to molecular dynamics analysis for NSCLC studies. The results obtained from this study provide useful insight into NSCLC drug resistance.
  

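The MD-LBP entry above describes its thresholding rule in a single sentence. The sketch below is one possible reading of that sentence, not the authors' implementation: it assumes a 3×3 neighborhood, takes the threshold as the absolute difference between the global image mean and the local neighborhood mean, and sets a bit when a neighbor is at or above that threshold. The function names, the neighbor ordering and the 256-bin histogram are illustrative assumptions.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Neighbour offsets, clockwise from the top-left corner (ordering is an assumption).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def md_lbp_code(image, row, col, global_mean):
    """8-bit pattern for the 3x3 neighbourhood centred at (row, col)."""
    window = [image[r][c]
              for r in range(row - 1, row + 2)
              for c in range(col - 1, col + 2)]
    # Assumed reading of the abstract: threshold = |global mean - local mean|.
    threshold = abs(global_mean - mean(window))
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[row + dr][col + dc] >= threshold:
            code |= 1 << bit
    return code


def md_lbp_histogram(image):
    """Histogram of pattern codes over all interior pixels (hypothetical feature vector)."""
    g = mean(v for line in image for v in line)
    hist = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            hist[md_lbp_code(image, row, col, g)] += 1
    return hist
```

Calling `md_lbp_histogram` on a grayscale image stored as a list of rows returns a 256-element count vector that could be passed to a classifier such as an RBF-SVM; how the published method aggregates or normalizes its patterns is not stated in the abstract.
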
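The ENRSR entry above combines softmax regression with an elastic net penalty. As a rough illustration, the sketch below evaluates a standard elastic net regularized softmax objective: mean cross-entropy plus a weighted mix of an L1 term (which drives sparsity) and a squared L2 term (which is what lets correlated features share weight, the 'grouping effect'). The exact formulation in the paper may differ; the names `enrsr_objective`, `lam` and `alpha` and the penalty scaling are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def enrsr_objective(weights, samples, labels, lam, alpha):
    """Mean softmax cross-entropy plus an elastic net penalty.

    weights: one weight vector per class (list of lists)
    samples: feature vectors, e.g. gene expression profiles
    labels:  integer class index for each sample
    lam:     overall penalty strength (assumed name)
    alpha:   mix between L1 (alpha=1) and squared L2 (alpha=0) (assumed name)
    """
    nll = 0.0
    for x, label in zip(samples, labels):
        scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
        nll -= math.log(softmax(scores)[label])
    l1 = sum(abs(v) for w in weights for v in w)
    l2 = sum(v * v for w in weights for v in w)
    penalty = lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)
    return nll / len(samples) + penalty
```

Minimizing this quantity with any suitable optimizer yields weight vectors in which many entries are exactly or nearly zero; the genes with non-zero weights are the selected features.
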
Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan