Comparative Analysis of ML-Based Outlier Detection Techniques for IoT-Based Smart Energy Management Systems

 

Parh Yong Wong1, Nayef A. M. Alduais1,*, Nurul Aswa Omar1, Salama A. Mostafa1, Abdul-Malik H. Y. Saad2, Antar Shaddad H. Abdul-Qawy3, Abdullah B. Nasser4, Waheed Ali H. M. Ghanem5

1 Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Batu Pahat, Johor 86400, Malaysia;

2 College of Engineering, University of Buraimi, 512, Al Buraimi, Oman

3 Department of Mathematics and Computer Science, Faculty of Science, Abdulrahman Al-Sumait University, Zanzibar, Tanzania;

4 School of Technology and Innovation, University of Vaasa, 65200 Vaasa, Finland;

5 Faculty of Computer Science and Mathematics, Universiti Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia;

 

Emails:  hi220037@student.uthm.edu.my; nayef@uthm.edu.my; nurulaswa@uthm.edu.my; salama@uthm.edu.my;  abdulmalik.h@uob.edu.om; antarabdulqawy@sumait.ac.tz; nabdulla@uwasa.f; waheedghanem@umt.edu.my

Abstract

With the advancement of ICST, data-driven technologies such as the Internet of Things (IoT) and smart technologies, including Smart Energy Management Systems (SEMS), have become a worldwide trend. Data quality is among the most vital issues to address for the successful application of IoT-based SEMS. Poor data in such large yet delicate systems can affect the quality of life (QoL) of millions and can even cause destruction and disruption at a national scale. This paper tackles this problem by searching for suitable outlier detection techniques among the many machine learning (ML)-based methods that have been developed. Three models are chosen and their performance is analyzed: K-Nearest Neighbor (KNN) + Mahalanobis Distance (MD), Minimum Covariance Determinant (MCD), and Local Outlier Factor (LOF). Three sensor-collected datasets related to SEMS, each with different data types, are used in this research. They are pre-processed and split into training and testing datasets with manually injected outliers. The models are trained on the training datasets to learn the patterns in the data, and the trained models are then applied to the testing datasets, using the learned patterns to identify and label the outliers. All three models identify the outliers accurately, with average accuracies above 95%. However, the average execution time varies across models: the KNN+MD model is the slowest at 12.99 seconds, MCD takes 3.98 seconds, and LOF is the fastest at 0.60 seconds.

 

Received: August 13, 2023; Revised: November 26, 2023; Accepted: April 09, 2024

 

Keywords: Internet of Things (IoT); Smart Energy Management System; Outlier Detection Techniques; Comparative Analysis; K-Nearest Neighbor (KNN); Minimum Covariance Determinant (MCD); Local Outlier Factor (LOF).
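
The evaluation loop summarized in the abstract (inject outliers, score the data with a detector, then measure accuracy and execution time) can be illustrated with a small sketch. This is not the authors' implementation: it covers LOF only, uses synthetic readings rather than the paper's datasets, and skips the training/testing split for brevity. The function names, the neighbourhood size k = 10, the score threshold of 2.0, and the injected values are all assumptions chosen for illustration.

```python
import math
import random
import time


def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of points[i]."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )
    return dists[:k]


def lof_scores(points, k=10):
    """Local Outlier Factor of every point (plain O(n^2) version, ties ignored)."""
    neigh = [k_nearest(points, i, k) for i in range(len(points))]
    k_dist = [n[-1][0] for n in neigh]  # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(len(points)):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(len(points))
    ]


def run_demo(seed=0, k=10, threshold=2.0):
    rng = random.Random(seed)
    # Hypothetical normalised readings (e.g. two sensor channels scaled to [0, 1]).
    normal = [(rng.random(), rng.random()) for _ in range(200)]
    # Manually injected outliers, placed well outside the normal range.
    injected = [(3.0, 3.0), (-2.0, 0.5), (0.5, -2.0), (3.0, -2.0), (-2.0, 3.0)]
    points = normal + injected
    labels = [False] * len(normal) + [True] * len(injected)

    start = time.perf_counter()
    scores = lof_scores(points, k)
    elapsed = time.perf_counter() - start

    # Assumed decision rule: a score well above 1 means "sparser than neighbours".
    predicted = [s > threshold for s in scores]
    accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
    return predicted, labels, accuracy, elapsed


predicted, labels, accuracy, elapsed = run_demo()
print(f"accuracy={accuracy:.3f}, execution time={elapsed:.3f}s")
```

A score near 1 means a point is about as densely surrounded as its neighbours, so the threshold controls how much sparser a reading must be before it is labelled an outlier. Timing only the scoring call mirrors the per-model execution-time comparison reported above, although absolute timings from a toy script like this are not comparable to the paper's figures.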