Real-Time Violence Detection in Smart Cities Using Lightweight
Spatiotemporal Deep Learning Models
Muhammad Ahsan¹,*
¹School of Mathematical Sciences, Jiangsu University, Jiangsu 212013, China
Email: ahsan1826@gmail.com
Abstract
The development of smart city infrastructure and the growing complexity of urban environments increase the need for
automated systems that detect violence in surveillance footage in real time. Conventional CCTV monitoring depends on
human operators, an approach that becomes impractical in large-scale deployments where rapid response is essential.
This research develops a deep learning architecture for the automated detection of violence and weapon-related
activity in real-world CCTV surveillance, evaluated on the Smart-City CCTV Violence Detection (SCVD) dataset. The
system uses MobileNetV2 as its convolutional backbone, applying it to every frame of the input sequence through
TimeDistributed layers to extract spatial features. These per-frame features are passed to a stacked Long Short-Term
Memory (LSTM) network, which models the temporal dependencies that characterize violent actions. Each input is a
15-frame sequence at a resolution of 128 × 128 pixels, a setting chosen to balance computational efficiency against
representational capacity. Batch Normalization and Dropout are applied throughout the network to improve
generalization and limit overfitting. The pipeline ends with fully connected (dense) layers followed by a sigmoid
activation that produces a binary output. Experiments on the SCVD dataset yielded strong results: the model achieved
99.58% accuracy with a cross-entropy loss of 0.0139. Performance was consistent across classes: the normal
(non-violent) class achieved 0.99 precision, 0.99 recall, and a 0.99 F1-score, while the violent class achieved a
perfect 1.00 on every metric. These results indicate that the model detects and classifies violent activity reliably
across the diverse and complex surveillance settings represented in SCVD. The study shows that real-time deep learning
for smart-city surveillance can be achieved with robust, compact models. Because it combines spatial and temporal
feature extraction in a lightweight architecture, the system is suitable for deployment on edge devices such as smart
cameras and embedded systems.
By bridging academic models and practical deployment, this study contributes to AI-driven public safety technologies
that help create safer urban environments.
Keywords: Violence detection; Smart surveillance; MobileNetV2; LSTM; CCTV analytics
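The pipeline summarized in the abstract (per-frame MobileNetV2 features via TimeDistributed, a stacked LSTM, Batch Normalization and Dropout, dense layers, and a sigmoid output) can be sketched in Keras as below. This is a minimal illustration under stated assumptions, not the author's implementation. Only the 15-frame, 128 × 128 input, the MobileNetV2 backbone, the stacked LSTM, the regularizers, and the sigmoid binary output come from the text. The LSTM widths (128 and 64), the 64-unit dense layer, the dropout rate of 0.5, global average pooling on the backbone, the Adam optimizer, and the function name `build_violence_detector` are hypothetical choices. The backbone is built with `weights=None` so the sketch runs offline; for transfer learning one would typically pass `weights="imagenet"`.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN = 15    # frames per clip (from the abstract)
IMG_SIZE = 128  # frame height and width in pixels (from the abstract)


def build_violence_detector(seq_len=SEQ_LEN, img_size=IMG_SIZE,
                            lstm_units=(128, 64), dense_units=64, dropout=0.5):
    """Hypothetical sketch: TimeDistributed MobileNetV2 -> stacked LSTM -> dense -> sigmoid.

    Layer widths, dropout rate, pooling, and optimizer are assumptions,
    not values reported in the paper.
    """
    # Per-frame CNN backbone; global average pooling yields a 1280-dim vector per frame.
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=(img_size, img_size, 3),
        include_top=False,
        weights=None,  # assumption: no pretrained weights, so the sketch runs offline
        pooling="avg",
    )

    inputs = layers.Input(shape=(seq_len, img_size, img_size, 3))
    # Apply the same backbone (shared weights) to each of the seq_len frames.
    x = layers.TimeDistributed(backbone)(inputs)  # (batch, seq_len, 1280)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(dropout)(x)

    # Stacked LSTMs: the first returns the full sequence so the second can consume it.
    x = layers.LSTM(lstm_units[0], return_sequences=True)(x)
    x = layers.Dropout(dropout)(x)
    x = layers.LSTM(lstm_units[1])(x)  # (batch, lstm_units[1])
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(dropout)(x)

    # Fully connected head with a single sigmoid unit: P(violent | clip).
    x = layers.Dense(dense_units, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


if __name__ == "__main__":
    model = build_violence_detector()
    model.summary()
```

Calling `model.predict` on an array of shape `(batch, 15, 128, 128, 3)` returns one violence probability per clip. Wrapping the backbone in `TimeDistributed` shares a single set of convolutional weights across all 15 frames, so the convolutional parameter count equals that of one MobileNetV2 regardless of sequence length.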
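The reported figures (accuracy, cross-entropy loss, and per-class precision, recall, and F1-score) follow standard definitions. The sketch below computes them from ground-truth labels and sigmoid probabilities. It assumes the label encoding 0 = normal and 1 = violent and a decision threshold of 0.5; both are assumptions, as is the helper name `evaluate_binary`. Probabilities are clipped before taking logarithms so that the loss stays finite.

```python
import math
from typing import Dict, Sequence


def _safe_div(num: float, den: float) -> float:
    # Convention: a metric with an empty denominator is reported as 0.0.
    return num / den if den else 0.0


def evaluate_binary(y_true: Sequence[int], y_prob: Sequence[float],
                    threshold: float = 0.5, eps: float = 1e-7) -> Dict:
    """Accuracy, mean binary cross-entropy, and per-class precision/recall/F1.

    Assumed encoding: 0 = normal, 1 = violent.
    """
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and of equal length")

    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Mean binary cross-entropy on clipped probabilities (avoids log(0)).
    clipped = [min(max(p, eps), 1.0 - eps) for p in y_prob]
    loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, clipped)) / n

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    per_class = {}
    for c in (0, 1):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)  # clips predicted as class c
        actual = sum(t == c for t in y_true)     # clips truly of class c
        precision = _safe_div(tp, predicted)
        recall = _safe_div(tp, actual)
        f1 = _safe_div(2 * precision * recall, precision + recall)  # harmonic mean
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    return {"accuracy": accuracy, "loss": loss, "per_class": per_class}


if __name__ == "__main__":
    report = evaluate_binary([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.6])
    print(report["accuracy"])  # 0.75
```

In this toy example one normal clip is misclassified as violent, which lowers precision for the violent class and recall for the normal class; a perfect 1.00 on every metric for a class, as reported for the violent class, requires no false positives and no false negatives for that class.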