Volume 20, Issue 10
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Introduction/Objective

Hepatocellular carcinoma (HCC) is a major disease that seriously threatens human health. Early screening can significantly improve the five-year survival rate of HCC patients. Cell-free DNA (cfDNA), a potential carrier of cancer signals in body fluids, can be used for early cancer detection. However, current methods for early HCC detection based on cfDNA sequencing require deep sequencing data, which limits their use in routine disease screening. We propose a foundational DNA language model, CLHCC, that analyzes DNA sequences and methylation patterns to detect HCC at low sequencing depths.

Methods

CLHCC randomly selects 1500 DNA fragments from HCC-specific differentially methylated regions identified by cd-score. The model then one-hot encodes these fragments and feeds the encoded data into a CNN combined with an LSTM neural network for classification.
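
The sampling and encoding steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the A/C/G/T channel order, the all-zero encoding for ambiguous bases, and the fixed fragment length of 100 are all assumptions. The abstract states only that 1500 fragments are randomly selected and one-hot encoded.

```python
import random

BASES = "ACGT"  # assumed channel order; the abstract does not specify one


def one_hot_encode(fragment, length):
    """Encode a DNA fragment as a `length` x 4 matrix (list of lists).

    Assumptions: bases outside A/C/G/T (e.g. N) become all-zero rows, and
    fragments are truncated or zero-padded to exactly `length` positions.
    """
    rows = [[1 if base == b else 0 for b in BASES]
            for base in fragment[:length].upper()]
    # Zero-pad short fragments so every matrix has the same shape.
    rows.extend([0, 0, 0, 0] for _ in range(length - len(rows)))
    return rows


def sample_and_encode(fragments, n_fragments=1500, length=100, seed=0):
    """Randomly pick `n_fragments` fragments (without replacement) and encode them.

    Raises ValueError if fewer than `n_fragments` fragments are available.
    """
    rng = random.Random(seed)
    chosen = rng.sample(fragments, n_fragments)
    return [one_hot_encode(f, length) for f in chosen]
```

The resulting 1500 x `length` x 4 structure is the kind of input a convolutional layer followed by a recurrent layer could consume; the network itself is not sketched here because the abstract gives no layer details.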

Results

We tested CLHCC on 2139 target-BS data samples, achieving an accuracy of 84.59% (precision: 83.44%, recall: 81%) under 10-fold cross-validation. This performance is better than that of DNA language models built using a CNN or an LSTM alone.
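
For readers unfamiliar with the evaluation protocol, the sketch below shows one common way to form 10 folds and to compute accuracy, precision, and recall for a binary classifier. It is an illustration under assumptions, not the study's evaluation code: the helper names, the plain (unstratified) shuffle, and the convention that label 1 means HCC are hypothetical choices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall), treating label 1 as positive (assumed HCC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

In each round, one fold serves as the held-out test set and the remaining nine are used for training; the reported figures would then typically be averaged over the ten rounds.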

Conclusion

Our study applies deep learning to analyze DNA sequences in specific methylation regions without the need for complex alignment processes. This provides new theoretical and practical guidance for clinical applications and holds promise for non-invasive early HCC screening using cfDNA.

/content/journals/cbio/10.2174/0115748936334029250213041916
2025-02-26
2025-12-24