Context Aware Multimodal Fusion YOLOv5 Framework for Pedestrian Detection under IoT Environment

Y. Shu, Y. Wang, M. Zhang, J. Yang, Y. Wang, J. Wang, Y. Zhang


Issue: 1/2025
Journal: Radioengineering Journal
DOI: 10.13164/re.2025.0118

Keywords: Pedestrian detection, IoT, deep learning, multimodal fusion, YOLOv5


Abstract: Pedestrian detection based on deep networks has become a research hotspot in computer vision. With the rapid development of the Internet of Things (IoT) and autonomous driving technology, deploying pedestrian detection models on mobile devices places higher demands on detection accuracy and real-time performance. In addition, fully integrating multimodal information can further improve model robustness. To this end, this article proposes a novel multimodal fusion YOLOv5 network for pedestrian detection. Specifically, to improve multi-scale pedestrian detection, we enhance contextual awareness by embedding a multi-head self-attention (MSA) mechanism and graph convolution operations into the existing YOLOv5 framework, while preserving the real-time advantages of YOLOv5 in pedestrian detection tasks. To improve multimodal information fusion, we introduce a joint cross-attention fusion mechanism that strengthens knowledge interaction between modalities. To validate the effectiveness of the proposed model, we conduct extensive experiments on two multimodal pedestrian detection datasets. The results confirm that the proposed model achieves the best multi-scale pedestrian detection performance and outperforms other multimodal deep models.
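
As a rough illustration of the fusion idea described in the abstract, the following is a minimal PyTorch sketch of a bidirectional (joint) cross-attention block in which RGB and thermal feature maps attend to each other before the attended streams are merged. The module name, the averaging step, and the final 1x1 projection are assumptions for illustration only, not the authors' implementation.

```python
# Hypothetical sketch of a joint cross-attention fusion block for two
# modalities (e.g., RGB and thermal feature maps from a YOLOv5 backbone).
# Names and design details are illustrative, not taken from the paper.
import torch
import torch.nn as nn


class JointCrossAttentionFusion(nn.Module):
    """Fuses two modality feature maps by letting each modality attend
    to the other, then merging the attended results."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn_rgb_to_thermal = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_thermal_to_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb.shape
        # Flatten spatial dimensions into a token sequence: (B, H*W, C).
        rgb_seq = rgb.flatten(2).transpose(1, 2)
        thermal_seq = thermal.flatten(2).transpose(1, 2)
        # Each modality queries the other (cross-attention in both directions).
        rgb_ctx, _ = self.attn_rgb_to_thermal(rgb_seq, thermal_seq, thermal_seq)
        thermal_ctx, _ = self.attn_thermal_to_rgb(thermal_seq, rgb_seq, rgb_seq)
        # Joint fusion: average the two attended streams and restore (B, C, H, W).
        fused = 0.5 * (rgb_ctx + thermal_ctx)
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return self.proj(fused)


if __name__ == "__main__":
    fusion = JointCrossAttentionFusion(channels=256)
    rgb_feat = torch.randn(1, 256, 20, 20)       # RGB feature map
    thermal_feat = torch.randn(1, 256, 20, 20)   # thermal feature map
    print(fusion(rgb_feat, thermal_feat).shape)  # torch.Size([1, 256, 20, 20])
```

In this sketch, a block of this kind would sit between the modality-specific backbones and the YOLOv5 neck, so that the detection head receives features already enriched by cross-modal interaction.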