An Explainable Hybrid SVM Framework for Spam and Malicious Email Detection in Enterprise Information Systems

Mahmoud A.; Nabil M.

DOI: https://doi.org/10.54216/JCIM.180103

Full Length Article

Volume 18 • Issue 1 • pp. 21-26 • 2026


Mahmoud A. Zaher ¹*, Nabil M. Eldakhly ²

¹Associate Professor, Faculty of Artificial Intelligence and Information, Horus University (HUE), Egypt

²Associate Professor, Faculty of Computers and Information, Egypt

* Corresponding Author.



Received: January 18, 2026 Revised: February 17, 2026 Accepted: March 28, 2026


Abstract

Email is a key communication and information-management tool in contemporary organizations, yet it is also one of the most abused channels for spam, fraud, credential theft, and malicious code delivery. Because email security is operationally critical, lightweight and reproducible detection models are especially valuable for universities, public institutions, and small-to-medium enterprises that may lack access to costly proprietary filtering infrastructure. In this paper, we propose an Explainable Hybrid SVM Framework (EHSF) for detecting spam and malicious-risk email in enterprise information systems. The framework combines a TF–IDF representation of the message text with lightweight risk-based email indicators, such as structural and lexical cues that can be extracted at low computational cost. The publicly available Enron-Spam dataset was used so that the experiments can be reproduced and verified by reviewers and readers. The experiments were implemented in Python and evaluated using accuracy, precision, recall, F1-score, ROC-AUC, and the confusion matrix. The results show that the proposed Linear SVM-based framework achieves the highest overall performance among the evaluated models, with an accuracy of 0.9853, precision of 0.9818, recall of 0.9893, F1-score of 0.9855, and ROC-AUC of 0.9981 on the held-out test set. The confusion matrix contains only 34 false negatives and 58 false positives, indicating good discrimination between the ham and spam classes. Beyond predictive performance, the framework provides an interpretability layer based on an analysis of the most influential lexical indicators of risky and legitimate enterprise emails. The study contributes a replicable and operationally viable methodology that meets cybersecurity and information-management needs and is lightweight enough to deploy in real organizational settings.
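The abstract describes the pipeline only at a high level: TF–IDF text features concatenated with cheap risk indicators, a linear SVM, the listed evaluation metrics, and interpretation through influential features. The following is a minimal, dependency-free sketch of that idea under stated assumptions; it is not the authors' implementation. The four risk indicators, the keyword list, the toy emails, and every function name are hypothetical. The linear SVM is trained with the Pegasos stochastic sub-gradient method written in plain Python, swapped in so the example runs without libraries; the paper's actual library, hyperparameters, and indicator set are not stated here (a typical library setup would use scikit-learn's `TfidfVectorizer` and `LinearSVC`).

```python
import math
import random
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")
# Hypothetical cue list and feature names; illustrative only, not the paper's features.
RISK_TERMS = {"password", "verify", "urgent", "account", "click", "winner"}
INDICATOR_NAMES = ["n_links", "upper_ratio", "n_exclaim", "risk_term_ratio", "bias"]


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_idf(docs):
    """Smoothed IDF, log((1 + n) / (1 + df)) + 1, learned on training emails only."""
    df = Counter(t for doc in docs for t in set(tokenize(doc)))
    vocab = sorted(df)
    return vocab, {t: math.log((1 + len(docs)) / (1 + df[t])) + 1 for t in vocab}


def hybrid_features(doc, vocab, idf):
    """L2-normalised TF-IDF vector concatenated with cheap risk indicators."""
    tf = Counter(tokenize(doc))
    text = [tf[t] * idf[t] for t in vocab]  # terms unseen in training are dropped
    norm = math.sqrt(dot(text, text)) or 1.0
    words = tokenize(doc)
    cues = [
        min(doc.count("http"), 5) / 5,                     # links, capped at 5
        sum(c.isupper() for c in doc) / max(len(doc), 1),  # share of upper-case chars
        min(doc.count("!"), 5) / 5,                        # exclamation marks, capped
        sum(tok in RISK_TERMS for tok in words) / max(len(words), 1),
        1.0,                                               # constant acting as intercept
    ]
    return [v / norm for v in text] + cues


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.
    Labels must be -1 (ham) or +1 (spam)."""
    rng = random.Random(seed)
    w, t, order = [0.0] * len(X[0]), 0, list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])
            w = [(1 - eta * lam) * a for a in w]  # shrink: gradient of the L2 term
            if margin < 1:                         # hinge loss active for this email
                w = [a + eta * y[i] * b for a, b in zip(w, X[i])]
    return w


def metrics_from_counts(tp, fp, fn, tn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / (tp + fp + fn + tn),
            "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC = P(a random spam email outscores a random ham email); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    return sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))


def demo():
    # Tiny made-up corpus (1 = spam, 0 = ham); this is NOT the Enron-Spam data.
    train = [("URGENT! Verify your account password http://x.example", 1),
             ("You are a WINNER! Click http://y.example now!!", 1),
             ("Minutes from the budget meeting are attached", 0),
             ("Can we move the project review to Thursday?", 0)]
    test = [("Verify your password now! http://q.example", 1),
            ("Agenda for the project meeting", 0)]
    vocab, idf = fit_idf([d for d, _ in train])
    w = train_linear_svm([hybrid_features(d, vocab, idf) for d, _ in train],
                         [1 if label else -1 for _, label in train])
    y_true = [label for _, label in test]
    scores = [dot(w, hybrid_features(d, vocab, idf)) for d, _ in test]
    pairs = list(zip(y_true, [int(s >= 0) for s in scores]))
    result = metrics_from_counts(pairs.count((1, 1)), pairs.count((0, 1)),
                                 pairs.count((1, 0)), pairs.count((0, 0)))
    result["roc_auc"] = roc_auc(y_true, scores)
    # Interpretability: the largest positive weights push an email towards "spam".
    ranked = sorted(zip(w, vocab + INDICATOR_NAMES), reverse=True)
    result["top_spam_features"] = [name for _, name in ranked[:3]]
    return result


print(demo())
```

Because the hybrid vector simply concatenates the two feature groups, one linear model weighs text terms and structural cues together, and its signed weights can be ranked directly; this is one plausible reading of the interpretability layer the abstract mentions, not a description of the authors' exact procedure. Fitting the IDF statistics on training emails only keeps test-set information out of the model, in line with a held-out evaluation.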

Keywords

Email security; Spam detection; Support vector machine; Cybersecurity; Information management; Text mining; Explainable machine learning


Cite This Article


Zaher, Mahmoud A., and Nabil M. Eldakhly. "An Explainable Hybrid SVM Framework for Spam and Malicious Email Detection in Enterprise Information Systems." Journal of Cybersecurity and Information Management, vol. 18, no. 1, 2026, pp. 21-26. DOI: https://doi.org/10.54216/JCIM.180103

