Current Bioinformatics - Volume 15, Issue 3, 2020
-
-
Current State of the Art for Survival Prediction in Cancer Using Data Mining Techniques
Authors: M.N. Doja, Ishleen Kaur and Tanvir Ahmad
Background: Cancer treatment is expensive and causes many side effects, so survival prediction is valuable to both patients and clinicians. Data mining has been used in the medical domain to extract useful information, and cancer prognosis is one such application. Objective: This study identifies the techniques used in recent years to predict the survival of cancer patients. Supervised, semi-supervised and unsupervised techniques have been applied successfully to survival prediction for different types of cancer. Methods: A systematic literature review was conducted to uncover gaps in recent studies and to identify future research directions. Results and Conclusion: Current systems lack structured patient information. In addition, many cancer types remain unexplored in terms of survival prediction, mainly because sufficient data are unavailable. Much could therefore be improved if researchers had access to the data they require.
-
-
-
Leukocyte Image Segmentation Based on Adaptive Histogram Thresholding and Contour Detection
Authors: Xiaogen Zhou, Zuoyong Li, Huosheng Xie, Ting Feng, Yan Lu, Chuansheng Wang and Rongyan Chen
Aims: The proposed method falls into the category of medical image processing. Background: Computer-aided systems for the automatic analysis and cytometry of leukocytes (white blood cells, WBCs) in human blood smear images are powerful diagnostic tools for many diseases, such as anemia, malaria, syphilis, heavy metal poisoning, and leukemia. Leukocyte segmentation is the basis of such automatic analysis, and segmentation accuracy directly influences the reliability of image-based automatic leukocyte analysis. Objective: This paper presents a leukocyte segmentation method that improves segmentation accuracy under rapid and standard staining conditions. Methods: The proposed method first localizes leukocytes by color component combination and Adaptive Histogram Thresholding (AHT), and crops the sub-image corresponding to each leukocyte. It then employs AHT to extract the nucleus of the leukocyte and uses image color features to remove image backgrounds such as red blood cells and dyeing impurities. Finally, Canny edge detection is performed to extract the entire leukocyte, and the cytoplasm is obtained by subtracting the nucleus from the leukocyte. Results: Experimental results on two datasets containing 160 leukocyte images show that the proposed method obtains more accurate segmentation results than its counterparts. Conclusion: The proposed method obtains more accurate segmentation results than its counterparts under rapid and standard staining conditions.
-
-
-
HSEAT: A Tool for Plant Heat Shock Element Analysis, Motif Identification and Analysis
Authors: Sarah R. Qazi, Noor ul Haq, Shakeel Ahmad and Samina N. Shakeel
Background: Previous methods for discovering cis-regulatory motifs in the promoter regions of plant genes have very limited performance, especially for novel and rare motifs. Plant genes are expressed differentially under different environmental or experimental conditions, and cis-regulatory sequences in the promoter regions of the same or different genes are regulated in a modular fashion. It has previously been shown that the production of Heat Shock Proteins (HSPs) is correlated with plant tolerance to heat and other stress conditions. Regulation of these HSP genes is controlled by interactions between heat shock factors (HSFs) and cis-acting motifs present in the promoter regions of the genes. Differential expression of these HSP genes results from their unique promoter architecture, their cis-acting sequences and their interaction with HSFs. Objective: A versatile promoter analysis tool was proposed for the identification and analysis of HSP promoters. Methods: The Heat Shock Element Analysis Tool (HSEAT) has been implemented in the Java programming language using a pattern recognition approach. The tool has a built-in MS Access database for storing different motifs. Results: HSEAT has been designed to detect different types of Heat Shock Elements (HSEs) in the promoter regions of plant HSPs, and it integrates a complete analysis of plant promoters. HSEAT is a user-friendly, interactive application for discovering various types of HSEs, e.g. TTC Rich Types, Gap Types and Perfect HSE, as well as STRE in HSPs. We examined and evaluated some known HSP promoters from different plants using this tool alongside already available tools. Conclusion: HSEAT has extensive potential for exploring conserved or semi-conserved motifs and potential binding sites of different transcription factors for other stress-regulating genes. This tool can be found at https://sourceforge.net/projects/heast/.
-
-
-
MD-LBP: An Efficient Computational Model for Protein Subcellular Localization from HeLa Cell Lines Using SVM
Authors: Muhammad Tahir and Adnan Idris
Background: Knowledge of the subcellular location of proteins is essential for understanding numerous protein functions. Objective: Accurate, computationally efficient and reliable automated analysis of protein localization imagery depends greatly on the features calculated from these images. Methods: In the current work, a novel method termed MD-LBP is proposed for feature extraction from fluorescence microscopy protein images. For a given neighborhood, the value of the central pixel is computed as the difference of the global and local means of the input image, and this value is then used as the threshold to generate a binary pattern for that neighborhood. Results: The performance of the method is assessed on the 2D HeLa dataset using a 5-fold cross-validation protocol. The MD-LBP method, with RBF-SVM as the base classifier, outperforms the standard LBP algorithm, Threshold Adjacency Statistics, and Haralick texture features. Conclusion: The development of specialized systems for different kinds of medical imagery will pave the way for effective drug discovery in the pharmaceutical industry. Furthermore, biological and bioinformatics-based procedures can be simplified to help the pharmaceutical industry with drug design.
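The thresholding step described in the Methods can be sketched as follows. This is only one possible reading of the abstract, not the authors' implementation: the 3x3 neighborhood, the use of the absolute difference of the two means, the `>=` comparison, the clockwise bit order, and the function name `md_lbp_code` are all assumptions made for illustration.

```python
def md_lbp_code(image, row, col):
    """Return an 8-bit pattern for the 3x3 neighborhood centred at (row, col).

    Assumed reading of the abstract: the threshold is the absolute
    difference between the global image mean and the local 3x3 mean.
    """
    height, width = len(image), len(image[0])
    global_mean = sum(sum(r) for r in image) / (height * width)

    # Local mean over the 3x3 window, centre pixel included (assumption).
    window = [image[row + dr][col + dc]
              for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    local_mean = sum(window) / 9

    threshold = abs(global_mean - local_mean)

    # Neighbours visited clockwise from the top-left corner (assumption).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= threshold:
            code |= 1 << bit
    return code


# Toy 3x4 image: global mean 6, local mean 4 at (1, 1), so threshold 2.
example = [
    [8, 0, 8, 12],
    [0, 4, 0, 12],
    [8, 0, 8, 12],
]
print(md_lbp_code(example, 1, 1))  # prints 85
```

In a full feature extractor, the codes from all interior pixels would typically be collected into a histogram that is passed to the classifier; that step is omitted here.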
-
-
-
Elastic Net Regularized Softmax Regression Methods for Multi-subtype Classification in Cancer
Authors: Lin Zhang, Yanling He, Haiting Song, Xuesong Wang, Nannan Lu, Lei Sun and Hui Liu
Background: Various regularization methods have been proposed to improve prediction accuracy in cancer diagnosis. Elastic net regularized logistic regression has been widely adopted for cancer classification and gene selection in genetics and molecular biology, but it is commonly applied to binary classification and regression. However, cancers usually have more than two subtypes, and these often cannot be determined precisely. Objective: Besides the multi-class issue, feature selection is also a critical problem for cancer subtype classification. Methods: An Elastic Net Regularized Softmax Regression (ENRSR) is put forward to tackle the multi-class classification problem. As an extension of elastic net regularized logistic regression, ENRSR enforces structured sparsity and the ‘grouping effect’ for gene selection based on gene expression data, which may exhibit high correlation. The sparsity structure and ‘grouping effect’ help to select more appropriate discriminative features for multi-class classification. Results: ENRSR is shown to achieve more accurate and robust performance, in terms of F measure, than 6 competing algorithms (K-means, Hierarchical Clustering, Expectation Maximization, Nonnegative Matrix Factorization, Support Vector Machine and Random Forest) in predicting cancer subtypes on both simulated data and real cancer gene expression data. Conclusion: The proposed ENRSR method is a reliable regularized softmax regression for multi-subtype classification.
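To make the idea of an elastic net penalty on softmax regression concrete, the sketch below evaluates a generic objective: the mean multinomial negative log-likelihood plus an L1 term and a squared L2 term. It assumes the textbook form of the elastic net; the exact penalty used by ENRSR (for example, any structured or group-wise sparsity term) is not given in the abstract and may differ. The function names and the parameters `l1` and `l2` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def enet_softmax_loss(weights, samples, labels, l1, l2):
    """Mean softmax negative log-likelihood plus a generic elastic net penalty.

    weights: one list of feature weights per class (no bias term, for brevity).
    samples: list of feature vectors; labels: list of class indices.
    l1, l2:  hypothetical penalty strengths for the L1 and squared L2 terms.
    """
    nll = 0.0
    for x, y in zip(samples, labels):
        scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
        nll -= math.log(softmax(scores)[y])
    nll /= len(samples)

    l1_term = sum(abs(w_j) for w in weights for w_j in w)   # drives sparsity
    l2_term = sum(w_j * w_j for w in weights for w_j in w)  # grouping effect
    return nll + l1 * l1_term + l2 * l2_term
```

With all weights at zero the predicted distribution is uniform, so for three classes the loss equals log(3) regardless of the penalty strengths, which is a convenient sanity check before fitting.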
-
-
-
Research on Gastric Cancer’s Drug-resistant Gene Regulatory Network Model
Authors: Zhi Li, Tianyue Zhang, Haojie Lei, Liyan Wei, Yuanning Liu, Yadi Shi, Shuyi Li, Bowen Shen, Hao Guo, Zhangqian Chen, Xiaorong Yi and Hao Zhang
Objective: Using bioinformatics, differentially expressed gene data on drug resistance in gastric cancer were analyzed, screened and mined through modeling and network modeling to find valuable data associated with multi-drug resistance in gastric cancer. Methods: First, the data sets were preprocessed in three respects: data processing, data annotation and classification, and functional clustering. Second, based on the preprocessed data, a primary gene regulatory network was constructed for each class by mining interactions among the genes. A score was computed for each node in each of these networks, and the nodes were ranked by score. On this basis, an appropriate core node was selected and the corresponding core network was developed. Results and Conclusion: Finally, the mined core network modules were analyzed. After correlation analysis, the results showed that the constructed network module had 20 core genes. This module contained valuable data associated with multi-drug resistance in gastric cancer.
-
-
-
Citrullination Site Prediction by Incorporating Sequence Coupled Effects into PseAAC and Resolving Data Imbalance Issue
Authors: Md. A. M. Hasan, Md K. Ben Islam, Julia Rahman and Shamim Ahmad
Background: Post-translational modification is one of the biomolecular mechanisms in living organisms that introduce functional diversity in proteins and regulate cellular processes. The transformation of an arginine residue to citrulline in a protein is one such modification. Objective: Our objective is to identify citrullinated arginine residue sites quickly and accurately. Methods: In this study, a novel computational tool, abbreviated as predCitru-Site, has been developed to predict citrullination sites. The technique incorporates the sequence-coupling effect of the amino acids surrounding arginine residues and optimizes the skewed citrullination training dataset to improve prediction quality. The performance of predCitru-Site has been measured as the average of 5 complete runs of the 10-fold cross-validation test, to comply with existing tools. Results and Conclusion: predCitru-Site achieved 97.6% sensitivity, 98.9% specificity, and an overall accuracy of 98.5%. With a Matthews correlation coefficient of 0.967, it also showed an area under the receiver operating characteristic curve of 0.997. predCitru-Site significantly outperforms existing tools on the same benchmark dataset. It also shows significant improvement on independent tests in all performance metrics (around 50% higher in AUC). These results suggest that the method is promising and can be used as a complementary technique for fast exploration of citrullination at arginine residues. A user-friendly web server has also been deployed at http://research.ru.ac.bd/predCitru-Site/ for the convenience of experimental scientists.
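For readers unfamiliar with the metrics reported above, the following sketch shows how sensitivity, specificity, accuracy and the Matthews correlation coefficient are computed from the four cells of a binary confusion matrix. These are the standard definitions; the function name `binary_metrics` is hypothetical, and the counts in the usage line are invented for illustration and are not taken from the paper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts, for illustration only.
print(binary_metrics(tp=90, fn=10, tn=80, fp=20))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells symmetrically, which is why it is often reported alongside accuracy when the positive and negative classes are imbalanced.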
-
-
-
Sequence-based Structural B-cell Epitope Prediction by Using Two Layer SVM Model and Association Rule Features
Authors: Jehn-Hwa Kuo, Chi-Chang Chang, Chi-Wei Chen, Heng-Hao Liang, Chih-Yen Chang and Yen-Wei Chu
Background: The immune reaction is the body's most important defense mechanism for destroying invading pathogens, and the epitope is the site of the antigen–antibody interaction on pathogenic proteins. Objective: The majority of epitopes are structural; however, existing sequence-based prediction websites still leave room for improving prediction performance. Therefore, in this study, we used SVM as a machine learning tool to predict epitopes based on protein sequences. Methods: First, we built five SVM models in the first layer according to five features (binary composition, position-specific scoring matrix, secondary structure, accessible surface area, and association rule) and chose the patterns that exhibited the best performance in each model. Second, the confidence scores of the first-layer models were used as input to an SVM model in the second layer, which integrated the first-layer models to improve prediction accuracy. Results: The final prediction model achieved up to 63% accuracy in predicting epitopes, and its performance was better than that of existing prediction websites. Conclusion: Finally, a case study on a two-subunit cytochrome c oxidase of Paracoccus denitrificans achieved an accuracy of up to 66%.
-
-
-
Mutation Mechanisms of Breast Cancer among the Female Population in China
Authors: Asmaa Amer, Ahmed Nagah, Tianhai Tian and Xinan Zhang
Background: Cancer is a genetic disease caused by the accumulation of gene mutations. It is important to derive the number of driver mutations needed for the development of human breast cancer, which may provide insights into tumor diagnosis and therapy. Objective: This work investigates whether the mutation mechanism of breast cancer differs between patients in the USA and those in China. We study the mechanisms of breast cancer development in China and then compare them with those in the USA. Methods: A multistage model including both gene mutation and clonal expansion of intermediate cells was designed to fit breast cancer data from China for 2004 to 2009. Results: Our simulation results show that the maximum number of driver mutations for breast epithelium stem cells of females in China is 13, which is less than the 14 driver mutations of females in the USA. In addition, the two-hit model is the optimal one for tumorigenesis in females in China, in contrast to the three-hit model that was predicted as optimal for females in the USA. Conclusion: The differences in mutation mechanisms between China and the USA reflect a variety of factors, including lifestyle, genetic influences, environmental exposure, and the availability of mammography screening.
-
-
-
Stability Analysis at Key Positions of EGFR Related to Non-small Cell Lung Cancer
Authors: Avirup Ghosh and Hong Yan
Background: Mutations in a protein called the Epidermal Growth Factor Receptor (EGFR) can cause Non-Small Cell Lung Cancer (NSCLC), which is the most common form of lung cancer. Many NSCLC cases arise from the L858R mutation, in which leucine (L) is replaced by arginine (R) at position 858 of EGFR, and which is also recognized as an exon 21 substitution. Moreover, half of EGFR-mutated lung cancer patients develop acquired resistance to first-generation EGFR-TKIs due to another mutation, T790M. Objective: In this work, a novel method is used to investigate why EGFR mutations occur specifically at positions 858 and 790, and hydrogen bonds are evaluated to measure the overall stability of different structures. Methods: We performed molecular dynamics simulations with the Amber tools to achieve our primary objectives, and then used CPPTRAJ to analyze changes in the hydrogen bonds of different mutational structures of EGFR. Results: First, we investigated the hydrogen bonds at different positions in the EGFR kinase domain and estimated why the first-stage mutation (L858R) and the resistance mutation (L858R/T790M) occur at positions 858 and 790, respectively. We found that the hydrogen bond counts at positions 858 and 790 are lower than those at neighboring positions, which results in the lowest stability at those positions. Conclusion: Our method represents an important contribution to molecular dynamics analysis for NSCLC studies. The results of this study provide useful insight into NSCLC drug resistance.
