Real-Time Violence Detection in Smart Cities Using Lightweight Spatiotemporal Deep Learning Models

Muhammad Ahsan1,∗

1School of Mathematical Sciences, Jiangsu University, Jiangsu 212013, China

Email: ahsan1826@gmail.com

Abstract

The growth of smart city infrastructure and the increasing complexity of urban environments heighten the need for automated systems that detect violence in surveillance footage in real time. Current CCTV systems depend on human operators, which becomes impractical in large-scale deployments where rapid response is required. This research proposes a deep learning architecture for the automated detection of violence and weapon-related activity in real-world CCTV footage, trained and evaluated on the Smart-City CCTV Violence Detection (SCVD) dataset. The system uses MobileNetV2 as its convolutional backbone, applied to each frame of the input video sequence through TimeDistributed layers to extract spatial features. These features are passed to a stacked Long Short-Term Memory (LSTM) network that captures the temporal dependencies within violent actions. Each input sequence consists of 15 frames at a resolution of 128 × 128 pixels, a configuration chosen to balance computational efficiency against representational capacity. Regularization through Batch Normalization and Dropout is applied throughout the network to improve generalization and limit overfitting. The pipeline ends with fully connected dense layers followed by a sigmoid activation that produces a binary output. On the SCVD dataset, the model achieved an accuracy of 99.58% with a cross-entropy loss of 0.0139. The normal class reached a precision, recall, and F1-score of 0.99 each, and the violent class scored 1.00 on every metric. These results indicate that the model detects and classifies violent activity reliably across diverse and complex surveillance settings, and that compact, robust deep learning models can be deployed in real time for smart city surveillance. Because the design combines spatial and temporal feature extraction in a lightweight architecture, it is suitable for deployment on edge devices such as smart cameras and embedded systems. By bridging academic models and practical deployment, this study contributes to safer urban environments through AI-driven public safety technologies.

Keywords: Violence detection; Smart surveillance; MobileNetV2; LSTM; CCTV analytics
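
The architecture summarized in the abstract can be illustrated with a short Keras sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract specifies only the MobileNetV2 backbone, the TimeDistributed wrapping, the stacked LSTM, Batch Normalization and Dropout, the dense head with sigmoid output, and the 15 × 128 × 128 input. The LSTM widths, dense-layer size, dropout rate, optimizer, global average pooling, and the choice of `weights=None` (randomly initialized backbone) are all hypothetical values chosen for the example.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN = 15    # frames per clip (from the abstract)
IMG_SIZE = 128  # frame height and width in pixels (from the abstract)


def build_model(lstm_units=(64, 32), dense_units=32, dropout=0.3):
    """Sketch of a MobileNetV2 + stacked-LSTM binary video classifier.

    lstm_units, dense_units and dropout are assumed values; the abstract
    does not report them.
    """
    # Per-frame spatial feature extractor. weights=None is an assumption;
    # the abstract does not say whether pretrained weights were used.
    backbone = tf.keras.applications.MobileNetV2(
        include_top=False,
        weights=None,
        input_shape=(IMG_SIZE, IMG_SIZE, 3),
        pooling="avg",  # assumed: one 1280-d feature vector per frame
    )

    inputs = layers.Input(shape=(SEQ_LEN, IMG_SIZE, IMG_SIZE, 3))

    # Apply the same backbone to each of the 15 frames.
    x = layers.TimeDistributed(backbone)(inputs)

    # Stacked LSTM: the first layer returns the full sequence so the
    # second layer can consume it; the second returns only its final state.
    x = layers.LSTM(lstm_units[0], return_sequences=True)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(dropout)(x)
    x = layers.LSTM(lstm_units[1])(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(dropout)(x)

    # Fully connected head ending in a sigmoid for the binary decision.
    x = layers.Dense(dense_units, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = models.Model(inputs, outputs)
    model.compile(
        optimizer="adam",            # assumed optimizer
        loss="binary_crossentropy",  # matches the reported cross-entropy loss
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model()` returns a compiled model that accepts batches of shape `(batch, 15, 128, 128, 3)` and emits one probability per clip. Sharing a single backbone across all frames through `TimeDistributed` keeps the parameter count independent of the sequence length, which is consistent with the lightweight design the abstract describes.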