2000
Volume 26, Issue 13
  • ISSN: 1389-2010
  • E-ISSN: 1873-4316

Abstract

Background

Hemophilia ‘A’ (HA) is a genetic blood disorder characterized by a deficiency of Factor VIII (FVIII), and its treatment often triggers the development of neutralizing antibodies (inhibitors) to FVIII. Predicting the development of these inhibitors is crucial for clinical applications, but it presents significant computational challenges because the available data are imbalanced, skewed, and inadequately sanitized.

Objectives

This study aimed to develop a machine-learning/AI approach to find biomarkers and predict the development of inhibitors to Factor VIII in patients with Hemophilia ‘A,’ addressing the challenges associated with data imbalance and enhancing prediction accuracy.

Methods

The data were sanitized and encoded for prediction, and the Random Over-sampling (ROS) technique was employed to resolve data imbalance in the CHAMP dataset. Several machine-learning classification models were trained: Random Forest, XGBoost, CatBoost, Logistic Regression, Gradient Boosting, and LightGBM. Hyperparameters were tuned using GridSearchCV with a stratified k-fold approach. Model performance was evaluated using accuracy, precision, recall, and F1 score. The Random Forest model was further analyzed using the explainable AI (XAI) tool SHAP (SHapley Additive exPlanations) to identify the variables that most influence its predictions.
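The resampling and splitting steps described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names random_oversample and stratified_folds, the toy records, and the 16-to-4 class ratio are all hypothetical, and the abstract does not say at which stage ROS was applied. The sketch oversamples only the training portion of a fold, which is one common way to keep duplicated rows out of the held-out data.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class is as large as the largest one (random over-sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so that each fold keeps roughly
    the same class proportions as the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical toy data: 16 records without inhibitors (0), 4 with (1).
rows = [[i] for i in range(20)]
labels = [0] * 16 + [1] * 4

folds = stratified_folds(labels, k=4)
held_out = set(folds[0])
train_rows = [rows[i] for i in range(len(rows)) if i not in held_out]
train_labels = [labels[i] for i in range(len(rows)) if i not in held_out]

# Balance the training portion only; the held-out fold stays untouched.
balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
print(Counter(train_labels), Counter(balanced_labels))
```

In a full workflow, the same split-then-resample pattern would be repeated for each fold and each hyperparameter combination, which is the bookkeeping that a grid search with stratified cross-validation automates.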

Results

The Random Forest model outperformed the other classifiers, achieving a mean accuracy of 97.37%, with closely aligned precision, recall, and F1 scores. SHAP ranked the input variables (Clinical Severity, Variant Type, Exon, HGVS cDNA, hg19 Coordinates, and others) according to their impact on the model's predictions. Additionally, the study identified biomarkers associated with FVIII inhibition.
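For readers less familiar with the four reported metrics, the following sketch shows how each is derived from the counts of true and false positives and negatives. The labels and predictions are invented for illustration and are unrelated to the study's results; the function name classification_metrics is likewise hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented example: 4 inhibitor-positive and 6 inhibitor-negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the accuracy is 0.8 while precision, recall, and F1 are each 0.75, which shows why accuracy alone can look more favorable than the per-class metrics when the classes are unequal in size.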

Conclusion

This study presents a breakthrough in the early prediction of inhibitor development in Hemophilia ‘A’ patients, paving the way for personalized and effective treatment programs. The integration of the preprocessing pipeline, Random Forest model, and SHAP analysis offers a novel solution for guiding treatment strategies for HA patients, which could significantly enhance the development of targeted and effective therapies.

/content/journals/cpb/10.2174/0113892010366485250415101928
2025-04-21
2025-12-16