Temporal fusion strategy for violence detection: utilising convolutional and LSTM neural networks for surveillance videos

Khaled Merit, Mohammed Beladgham, Abdelmalik Taleb-Ahmed

Issue: 3/2025
Journal: Acta Polytechnica
DOI: 10.14311/ap.2025.65.0306

Keywords: deep learning, efficient violence detection, temporal fusion, LSTM, automated video surveillance, intelligent cities, video recognition

Abstract: In today's intelligent cities, the aim is the highest possible degree of automation and service integration. One of the major challenges in the surveillance industry is automating real-time video analysis to identify critical situations. This paper introduces models based on Convolutional Neural Networks (CNNs), specifically MobileNetV3, VGG16, and InceptionV3, together with LSTM-based and feedforward networks. These models are designed to classify videos accurately into two mutually exclusive classes: “Non-Violence” and “Violence”. The RLVS database is used for this classification task, and temporal fusion approaches are applied to several data representations. The best result was an accuracy of 91.03 % and an F1-score of 90.90 %, which surpasses the results of similar studies on the same database for recognising violent actions in surveillance videos.
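The abstract does not describe the exact temporal fusion scheme, so what follows is only a minimal NumPy sketch of the general CNN + LSTM + feedforward pipeline it names, under loudly stated assumptions. Per-frame feature vectors are assumed to be precomputed by a CNN backbone (in the paper, MobileNetV3, VGG16 or InceptionV3). A single-layer LSTM, with an assumed gate order, fuses them over time, and a small feedforward head outputs a “Violence” probability. All function names, layer sizes and the random, untrained weights are hypothetical illustrations and not the authors' implementation. The helper `accuracy_f1` shows how the two reported metrics are computed for the binary task.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_last_hidden(features, W, U, b):
    """Hypothetical helper: run a single-layer LSTM over per-frame features.

    features: (T, D) array, one vector per frame (assumed CNN output).
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    Gate order assumed here: input, forget, candidate, output.
    Returns the final hidden state, shape (H,), as a clip-level summary.
    """
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x_t in features:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[0:H])            # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        g = np.tanh(z[2 * H:3 * H])    # candidate cell state
        o = sigmoid(z[3 * H:4 * H])    # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def violence_probability(features, params):
    """Hypothetical classifier: LSTM temporal fusion + feedforward head."""
    h = lstm_last_hidden(features, params["W"], params["U"], params["b"])
    # Feedforward head: one ReLU hidden layer, then a sigmoid output
    # interpreted as P("Violence"); sizes are illustrative assumptions.
    z1 = np.maximum(0.0, params["W1"] @ h + params["b1"])
    return float(sigmoid(params["w2"] @ z1 + params["b2"]))


def accuracy_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels (1 = Violence, 0 = Non-Violence)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    acc = float(np.mean(y_true == y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


# Usage with random (untrained) weights and a fake 16-frame clip of 32-dim features.
rng = np.random.default_rng(0)
T, D, H, F = 16, 32, 8, 4
params = {
    "W": rng.normal(0, 0.1, (4 * H, D)), "U": rng.normal(0, 0.1, (4 * H, H)),
    "b": np.zeros(4 * H), "W1": rng.normal(0, 0.1, (F, H)), "b1": np.zeros(F),
    "w2": rng.normal(0, 0.1, F), "b2": 0.0,
}
clip = rng.normal(size=(T, D))
print(violence_probability(clip, params))
```

Because the weights are random, the printed probability is meaningless until the model is trained. For the metric helper, `accuracy_f1([1, 0, 1, 1], [1, 0, 0, 1])` gives an accuracy of 0.75 and an F1-score of about 0.8. Using the final LSTM hidden state is only one way to fuse frames over time; averaging per-frame predictions (late fusion) is a simpler alternative.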