Real-Time Violence Detection in Smart Cities Using Lightweight Spatiotemporal Deep Learning Models

Muhammad Ahsan

DOI: https://doi.org/10.54216/JAIM.090202

School of Mathematical Sciences, Jiangsu University, Jiangsu 212013, China. Corresponding author: ahsan1826@gmail.com


Received: January 1, 2025; Revised: February 5, 2025; Accepted: May 4, 2025

Abstract

The growth of smart-city infrastructure and the increasing complexity of urban environments have raised the need for automated systems that detect violence in surveillance footage as it happens. Conventional CCTV monitoring depends on human operators, an approach that becomes impractical for large-scale deployments where rapid response is essential. This research develops a deep learning architecture for the automated detection of violent and weapon-related activity in real-world CCTV footage, using the Smart-City CCTV Violence Detection (SCVD) dataset. The system uses MobileNetV2 as its convolutional backbone, applied to every frame of the input video sequence through TimeDistributed layers to extract per-frame spatial features. These features are passed to a stacked Long Short-Term Memory (LSTM) network, which captures the temporal dependencies within violent actions. Each input sequence consists of 15 frames at a resolution of 128 × 128 pixels, a setting chosen to balance computational efficiency against representational capacity. Batch Normalization and Dropout are applied throughout the network to improve generalization and limit overfitting. The pipeline ends with fully connected dense layers followed by a sigmoid activation that produces a binary output. Experiments on the SCVD dataset gave strong results: the model reached 99.58% accuracy with a cross-entropy loss of 0.0139. The normal (non-violent) class achieved a precision, recall, and F1-score of 0.99 each, and the violent class scored 1.00 on all three metrics. These results indicate that the model detects and classifies violent activity reliably across the diverse and complex surveillance settings represented in the dataset.
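The abstract fixes the clip length (15 frames), the frame size (128 × 128), a sigmoid output, a cross-entropy loss, and per-class precision, recall, and F1, but it does not say how frames are chosen from a video or how the metrics are computed. The following is a minimal, framework-free sketch under stated assumptions, not the author's implementation: frames are assumed to be sampled at evenly spaced indices (the paper may instead use consecutive frames), inputs are assumed to be 3-channel RGB, label 1 is assumed to denote the violent class, and predictions are assumed to be thresholded at 0.5. The helper names `sample_frame_indices`, `binary_cross_entropy`, and `per_class_metrics` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

SEQ_LEN = 15                 # frames per clip, as stated in the abstract
FRAME_SIZE = (128, 128)      # (height, width) in pixels, as stated in the abstract
# Per-sample input shape for a TimeDistributed CNN: (time, height, width, channels).
# The 3 RGB channels are an assumption, not taken from the paper.
INPUT_SHAPE = (SEQ_LEN, FRAME_SIZE[0], FRAME_SIZE[1], 3)


def sample_frame_indices(total_frames: int, seq_len: int = SEQ_LEN) -> List[int]:
    """Hypothetical helper: pick seq_len evenly spaced frame indices.

    Integer arithmetic avoids float rounding; videos shorter than seq_len
    repeat frames so every clip has exactly seq_len entries.
    """
    if total_frames <= 0:
        raise ValueError("video has no frames")
    return [(i * total_frames) // seq_len for i in range(seq_len)]


def sigmoid(z: float) -> float:
    """Map a logit from the final dense unit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true: Sequence[int], y_prob: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over a batch of sigmoid outputs."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def per_class_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> Tuple[float, Dict[int, Dict[str, float]]]:
    """Accuracy plus precision, recall, and F1 for class 0 (normal) and class 1 (violent)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    report: Dict[int, Dict[str, float]] = {}
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cls] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return accuracy, report
```

For example, for a 90-frame video `sample_frame_indices(90)` returns every sixth frame, `[0, 6, 12, ..., 84]`. With toy labels `[0, 0, 1, 1]` and sigmoid outputs `[0.1, 0.6, 0.8, 0.9]` (illustrative values, not data from the paper), `per_class_metrics` reports an accuracy of 0.75, precision 1.0 and recall 0.5 for the normal class, and recall 1.0 for the violent class.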
The study shows that compact, robust deep learning models can support real-time violence detection in smart-city surveillance. Because the design pairs a lightweight MobileNetV2 spatial backbone with LSTM-based temporal modeling, it is suitable for deployment on edge devices such as smart cameras and embedded systems. By bridging academic models and practical deployment, this work contributes to AI-driven public-safety technologies for safer urban environments.

Keywords:

Violence detection, Smart surveillance, MobileNetV2, LSTM, CCTV analytics

Cite This Article As:

Ahsan, Muhammad. Real-Time Violence Detection in Smart Cities Using Lightweight Spatiotemporal Deep Learning Models. Journal of Artificial Intelligence and Metaheuristics, 2025, pp. 19-36. DOI: https://doi.org/10.54216/JAIM.090202
