Mahalanobis Distance-Based Supervised and Semi-Supervised Machine Learning Methods for Anomaly Detection in IoT Sensor Data

Abstract

Introduction

Data collected in Internet of Things (IoT) applications are often unreliable or erroneous because the sensors are deployed in harsh or unattended environments. Readings that deviate from the regular data are considered anomalies, and they must be identified correctly to support sound decision-making. Machine learning techniques have gained significant attention for this task because they can classify data as normal or abnormal (anomalous).

Methods

This work proposes novel adaptations of supervised and semi-supervised machine learning algorithms that integrate the Mahalanobis Distance (MD) metric. The adapted algorithms are named Mahalanobis Binary Classification (M-BC) and Mahalanobis One Class Classification (M-OCC). Their performance was evaluated on well-known IoT sensor datasets using performance metrics such as balanced accuracy, F1-Score, and AUC-ROC score.
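The abstract does not say how MD is wired into each base algorithm, so the following is only a minimal NumPy sketch of the two settings it names, under stated assumptions: the binary case is a plain minimum-Mahalanobis-distance classifier, and the one-class case thresholds the squared distance to the normal class at a training-set quantile. For a point x, a class mean μ and a covariance Σ, the distance is D_M(x) = sqrt((x - μ)^T Σ^-1 (x - μ)). The class names `OneClassMD` and `BinaryMD`, the 0.99 quantile, and the synthetic data are hypothetical choices for illustration, not the authors' M-BC or M-OCC.

```python
import numpy as np


def fit_gaussian(X):
    """Estimate the mean vector and (pseudo-)inverse covariance of the rows of X."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # pinv tolerates a singular covariance, e.g. a constant sensor channel
    return mu, np.linalg.pinv(cov)


def mahalanobis_sq(X, mu, cov_inv):
    """Squared Mahalanobis distance (x - mu)^T S^-1 (x - mu) for each row of X."""
    d = np.atleast_2d(X) - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)


class OneClassMD:
    """Hypothetical one-class detector: fit on normal readings only and
    flag points whose squared distance exceeds a training-set quantile."""

    def __init__(self, quantile=0.99):
        self.quantile = quantile

    def fit(self, X_normal):
        self.mu, self.cov_inv = fit_gaussian(X_normal)
        train_d2 = mahalanobis_sq(X_normal, self.mu, self.cov_inv)
        self.threshold = np.quantile(train_d2, self.quantile)
        return self

    def predict(self, X):
        # 1 = anomaly, 0 = normal
        return (mahalanobis_sq(X, self.mu, self.cov_inv) > self.threshold).astype(int)


class BinaryMD:
    """Hypothetical binary classifier: label each point with the class
    (0 = normal, 1 = anomaly) whose mean is nearer in Mahalanobis distance,
    using a separate covariance estimate per class."""

    def fit(self, X, y):
        self.params = [fit_gaussian(X[y == c]) for c in (0, 1)]
        return self

    def predict(self, X):
        d2 = np.column_stack([mahalanobis_sq(X, mu, ci) for mu, ci in self.params])
        return d2.argmin(axis=1)


# Synthetic, strongly correlated two-channel "sensor" data (illustration only)
rng = np.random.default_rng(0)
cov = [[1.0, 0.9], [0.9, 1.0]]
X_norm = rng.multivariate_normal([0.0, 0.0], cov, size=500)
X_anom = rng.multivariate_normal([3.0, -3.0], cov, size=50)

# [2, 2] follows the channel correlation; [2, -2] breaks it
occ = OneClassMD().fit(X_norm)
print(occ.predict(np.array([[2.0, 2.0], [2.0, -2.0]])))

X = np.vstack([X_norm, X_anom])
y = np.r_[np.zeros(500, dtype=int), np.ones(50, dtype=int)]
bc = BinaryMD().fit(X, y)
print((bc.predict(X) == y).mean())
```

The one-class variant needs only normal readings for training, which matches the semi-supervised setting; the binary variant needs labelled anomalies. Using `np.linalg.pinv` instead of `np.linalg.inv` is a defensive choice for nearly collinear sensor channels.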

Results

The results show that M-BC improves significantly over conventional machine learning methods on several of the datasets considered in this study, including SHM4, MHM1, Occupancy, and Timeseries, with an average improvement of 13.03% in balanced accuracy, 10.29% in F1-Score, and 13.16% in AUC score. Similarly, M-OCC achieved substantial gains over OCSVM in one-class classification, with an average improvement of 21.07% in balanced accuracy, 26.49% in F1-Score, and 26% in AUC score across the AnomIoT, IBRL, SHM4, MHM1, Occupancy, and Timeseries datasets.
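Balanced accuracy, F1-Score and AUC-ROC behave differently when anomalies are rare, which is the usual case in sensor data. Below is a minimal NumPy sketch of the three metrics, treating the anomaly class (label 1) as positive; the labels and scores are made up for illustration and do not come from the study. scikit-learn offers equivalents as `balanced_accuracy_score`, `f1_score` and `roc_auc_score` in `sklearn.metrics`.

```python
import numpy as np


def confusion(y_true, y_pred):
    """Confusion counts with the anomaly class (1) treated as positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the anomaly class and the recall on the normal class."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return float(0.5 * (tp / (tp + fn) + tn / (tn + fp)))


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for the anomaly class."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    return float(2 * tp / (2 * tp + fp + fn))


def auc_roc(y_true, scores):
    """AUC as the probability that a random anomaly scores higher than a
    random normal point (ties count one half). O(n_pos * n_neg) memory."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Toy imbalanced example: 8 normal readings, 2 anomalies (illustrative values)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.9, 0.8, 0.35]

print(balanced_accuracy(y_true, y_pred))  # 0.6875
print(f1(y_true, y_pred))                 # 0.5
print(auc_roc(y_true, scores))            # 0.8125
```

On this toy set plain accuracy would be 0.8 even though only one of the two anomalies is caught; balanced accuracy exposes the miss because it averages recall over both classes.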

Discussion

The results show that the proposed MD-based approaches are simple, effective, and more accurate than their base methods for detecting anomalies in IoT sensor data. Integrating the MD metric significantly enhanced the algorithms' ability to identify anomalous data points across various IoT domains.
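One common explanation for such gains is that MD scales each direction by the data's covariance, so a reading that breaks the usual correlation between channels is flagged even when its Euclidean distance is unremarkable. The small numeric illustration below assumes a covariance with correlation 0.9 between two channels; it is not taken from, and does not reproduce, the paper's experiments.

```python
import numpy as np

# Assumed mean and covariance of two strongly correlated sensor channels (illustrative)
mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
cov_inv = np.linalg.inv(cov)


def euclidean(x):
    """Plain Euclidean distance to the mean; ignores correlation."""
    return float(np.linalg.norm(x - mu))


def mahalanobis(x):
    """Mahalanobis distance to the mean under the assumed covariance."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))


along = np.array([1.0, 1.0])     # both channels rise together: typical
against = np.array([1.0, -1.0])  # channels move in opposite directions: atypical

for name, x in [("along", along), ("against", against)]:
    print(f"{name:8s} euclidean={euclidean(x):.3f} mahalanobis={mahalanobis(x):.3f}")
# along    euclidean=1.414 mahalanobis=1.026
# against  euclidean=1.414 mahalanobis=4.472
```

Both points are equally far from the mean in Euclidean terms, yet the correlation-breaking point is more than four times farther in Mahalanobis terms.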

Conclusion

This work demonstrates that incorporating the Mahalanobis distance into binary and one-class classification algorithms improves anomaly detection performance. The M-BC and M-OCC algorithms offer a robust and efficient way to ensure data reliability in IoT sensor networks.

/content/journals/swcc/10.2174/0122103279403481251014104800
2026-01-21
2026-02-22
