Skip to content
2000
Volume 17, Issue 3
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background: Dimension disaster is often associated with feature extraction. The extracted features may contain more redundant feature information, which leads to the limitation of computing ability and overfitting problems. Objective: Feature selection is an important strategy to overcome the problems from dimension disaster. In most machine learning tasks, features determine the upper limit of the model performance. Therefore, more and more feature selection methods should be developed to optimize redundant features. Methods: In this paper, we introduce a new technique to optimize sequence features based on the Binomial Distribution (BD). Firstly, the principle of the binomial distribution algorithm is introduced in detail. Then, the proposed algorithm is compared with other commonly used feature selection methods on three different types of datasets by using a Random Forest classifier with the same parameters. Results: The results confirm that BD has a promising improvement in feature selection and classification accuracy. Conclusion: Finally, we provide the source code and executable program package (http: //lingroup. cn/server/BDselect/), by which users can easily perform our algorithm in their researches.

Loading

Article metrics loading...

/content/journals/cbio/10.2174/1574893616666211007102747
2022-03-01
2025-02-06
Loading full text...

Full text loading...

/content/journals/cbio/10.2174/1574893616666211007102747
Loading
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error
Please enter a valid_number test