Skip to content
2000
Volume 21, Issue 8
  • ISSN: 1570-1786
  • E-ISSN: 1875-6255

Abstract

N4-methylcytosine (4mC) is one of the most important epigenetic modifications, which plays a significant role in biological progress and helps explain biological functions. Although biological experiments can identify potential 4mC sites, they are limited due to the experimental environment and labor-intensive process. Therefore, it is crucial to construct a computational model to identify the 4mC sites. Some computational methods have been proposed to identify the 4mC sites, but some problems should not be ignored, such as those presented as follows: (1) a more accurate algorithm is required to improve the prediction, especially for Matthew’s correlation coefficient (MCC); (2) easier method is needed for clinical research to design medicine or treat disease. Considering these aspects, an effective algorithm using comprehensible encoding in multiple species was proposed in this study. Since nucleotide arrangement and its property information could reflect the sequence structure and function, several feature vectors have been developed based on nucleotide energy information, trinucleotide energy information, and nucleotide chemical property information. Besides, feature effect has been analyzed to select the optimal feature vectors for multiple species. Finally, the optimal feature vectors were inputted into the CatBoost algorithm to construct the identification model. The evaluation results showed that our study obtained the highest MCC, , 2.5%~11.1%, 1.4%~17.8%, 1.1%~7.6%, and 2.3%~18.0% higher than previous models for the , and datasets, respectively. These satisfactory results reflect that the proposed method is available to identify 4mC sites in multiple species, especially for MCC. It could provide a reasonable supplement for biological research.

Loading

Article metrics loading...

/content/journals/loc/10.2174/0115701786277281231228093405
2024-08-01
2025-10-29
Loading full text...

Full text loading...

/content/journals/loc/10.2174/0115701786277281231228093405
Loading

  • Article Type:
    Research Article
Keyword(s): 4mC; CatBoost; feature selection; feature vectors; nucleotide energy; site identification
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error
Please enter a valid_number test