Volume 10, Issue 1
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Hepatocellular carcinoma (HCC) is the most common type of liver cancer worldwide and occurs mostly in areas where viral hepatitis is endemic, such as China. Knowledge of HCC-related genes may enable early detection of HCC and the development of molecularly targeted therapeutics, which could significantly reduce mortality and improve patient prognosis. It is therefore valuable to identify the common characteristics of HCC-related genes. In this study, we proposed a computational method that predicts HCC-related genes from Gene Ontology (GO) terms and KEGG pathways using Random Forest (RF), with features optimized by maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). A total of 224 HCC gene candidates were compiled from several databases, and 11,200 non-HCC gene candidates were randomly selected from the Ensembl database. Ten candidate datasets were constructed by dividing the non-HCC gene candidates into 10 groups. Each gene in the datasets was encoded by 13,126 features: 12,887 GO enrichment scores and 239 KEGG enrichment scores. An optimal feature set comprising 615 GO terms and 11 KEGG pathways was ultimately identified. Further analysis showed that these features are closely related to HCC, indicating that the method is effective for discovering HCC-related genes; it may also be applicable to predicting and analyzing genes associated with other types of cancer.
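The abstract states only that the 11,200 non-HCC candidates were divided into 10 groups to form 10 datasets. A minimal sketch of one plausible reading follows, assuming (this is not stated in the source) that every dataset pairs all 224 HCC genes with one group of non-HCC genes after a random shuffle. The helper name `build_candidate_datasets` and its parameters are hypothetical.

```python
import random

def build_candidate_datasets(hcc_genes, non_hcc_genes, n_groups=10, seed=0):
    """Hypothetical helper (assumed design): shuffle the non-HCC genes, split them
    into n_groups near-equal groups, and pair each group with ALL HCC genes.
    Labels: 1 = HCC gene, 0 = non-HCC gene."""
    negatives = list(non_hcc_genes)
    random.Random(seed).shuffle(negatives)          # reproducible random split
    size, extra = divmod(len(negatives), n_groups)  # spread any remainder over the first groups
    datasets, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        group = negatives[start:end]
        genes = list(hcc_genes) + group
        labels = [1] * len(hcc_genes) + [0] * len(group)
        datasets.append((genes, labels))
        start = end
    return datasets
```

Under this assumption each of the 10 datasets would hold 224 HCC genes and 11,200 / 10 = 1,120 non-HCC genes, and the non-HCC groups would be disjoint.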
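The abstract does not define how the GO and KEGG enrichment scores are computed. One common choice, assumed here and not confirmed by the source, is the negative log10 of a hypergeometric p-value: given a set of genes associated with the query gene (for example, its interaction partners), it measures how strongly that set is over-represented among the genes annotated with a term. The function names and the neighbor-set input are hypothetical.

```python
import math

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated genes, N draws)."""
    upper = min(n, N)
    total = math.comb(M, N)
    return sum(math.comb(n, i) * math.comb(M - n, N - i) for i in range(k, upper + 1)) / total

def enrichment_score(neighbors, term_genes, background_size):
    """Hypothetical enrichment score (assumed definition): -log10 of the probability
    of seeing at least the observed overlap between a gene's neighbor set and the
    genes annotated with one GO term or KEGG pathway, under random sampling."""
    overlap = len(neighbors & term_genes)
    p = hypergeom_sf(overlap, background_size, len(term_genes), len(neighbors))
    return -math.log10(p) if p > 0 else float("inf")  # guard against float underflow
```

Computing one such score per term would yield a vector with one entry per GO term and KEGG pathway for each gene; with 12,887 GO terms and 239 KEGG pathways that gives the 13,126 features reported above.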
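mRMR ranks features so that each newly chosen feature is informative about the class label but shares little information with the features already chosen. The sketch below uses the MID (mutual information difference) form of the criterion, relevance I(f; y) minus the mean I(f; s) over already-selected features s, on discrete features. Discretizing the enrichment scores and the function names are assumptions. The quadratic loop suits toy data, not 13,126 features.

```python
import numpy as np

def mutual_info(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1)        # joint counts
    joint /= joint.sum()                       # joint probabilities
    px = joint.sum(axis=1, keepdims=True)      # marginal of x (column vector)
    py = joint.sum(axis=0, keepdims=True)      # marginal of y (row vector)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def mrmr_rank(X, y):
    """Rank all columns of a discrete feature matrix X with the mRMR MID criterion.
    Hypothetical helper; returns column indices, best first."""
    n_features = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_features)])
    redundancy_sum = np.zeros(n_features)      # running sum of I(f; s) over selected s
    selected, remaining = [], list(range(n_features))
    while remaining:
        if selected:
            scores = relevance[remaining] - redundancy_sum[remaining] / len(selected)
        else:
            scores = relevance[remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
        for j in remaining:                    # update redundancy against the new pick
            redundancy_sum[j] += mutual_info(X[:, j], X[:, best])
    return selected
```

For example, with two identical copies of one informative bit and a second, independent informative bit, the duplicate is pushed to the end of the ranking because its relevance is cancelled by its redundancy.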
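IFS then walks down the mRMR ranking. For increasing k it trains a classifier on the top k features and keeps the k with the best cross-validated performance. The sketch below uses scikit-learn's `RandomForestClassifier`. Several settings are assumptions and are not taken from the paper: the Matthews correlation coefficient as the score (chosen because 224 positives against 1,120 negatives is imbalanced), 5-fold cross-validation, 100 trees, and the helper name.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_val_score

def incremental_feature_selection(X, y, ranked, step=1, cv=5, seed=0):
    """Hypothetical IFS loop: score an RF on the top-k ranked features for
    k = step, 2*step, ... and return (best_k, best_score, full curve)."""
    scorer = make_scorer(matthews_corrcoef)
    curve = []
    for k in range(step, len(ranked) + 1, step):
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        score = cross_val_score(clf, X[:, ranked[:k]], y, cv=cv, scoring=scorer).mean()
        curve.append((k, score))
    # max() keeps the first maximum, so ties favour the smaller feature set
    best_k, best_score = max(curve, key=lambda r: r[1])
    return best_k, best_score, curve
```

When this loop is applied to an mRMR ranking, `best_k` corresponds to the size of the optimal feature set. In the reported result that set held 615 + 11 = 626 features.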

/content/journals/cbio/10.2174/157489361001150309131453
2015-02-01
2025-09-04