Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Continuous Token Partitioning for Real-Time Multi-modal 3D Object Detection. / Filatov, N.; Potekhin, R.
Advances in Neural Computation, Machine Learning, and Cognitive Research VIII: Selected Papers from the XXVI International Conference on Neuroinformatics. 2025. p. 426-437 (Studies in Computational Intelligence; Vol. 1179 SCI).
TY - GEN
T1 - Continuous Token Partitioning for Real-Time Multi-modal 3D Object Detection
AU - Filatov, N.
AU - Potekhin, R.
N1 - Export Date: 01 November 2025; Cited By: 0; Correspondence Address: N. Filatov; Peter the Great St. Petersburg Polytechnic University, Saint-Petersburg, Polytechnicheskaya, 29, 195251, Russian Federation; email: n.filatov@celsus.ai; Conference name: 26th International Conference on Neuroinformatics, NI 2024; Conference location: Moscow
PY - 2025
Y1 - 2025
AB - Advancements in 3D object detection are pivotal for the development of autonomous driving technologies, demanding high accuracy, robustness, and real-time processing capabilities. Current state-of-the-art multi-modal 3D object detection frameworks often struggle to balance these demands, particularly under the computational constraints of autonomous vehicles. This study introduces a novel 3D object detection framework that leverages a transformer-based fusion module, employing unique radial and zigzag partitioning techniques to efficiently integrate LiDAR and camera data. Our method, termed CTP-net, is designed to optimize inference speed while maintaining competitive detection accuracy. Tested on the NuScenes validation dataset, CTP-net achieves a NuScenes Detection Score (NDS) of 68.39. Notably, it demonstrates remarkable inference speeds of 8.50 FPS on an NVIDIA RTX 3060 and 20.72 FPS on a Tesla A100, indicating substantial improvements over existing methods and making it a viable solution for deployment on edge devices with limited computational resources. © 2025 Elsevier B.V. All rights reserved.
KW - Autonomous driving
KW - Multi-modal 3D object detection
KW - Sensor Fusion
KW - Transformers
UR - https://www.mendeley.com/catalogue/1f9a3c88-ac74-3b25-835b-b71565a7ae5a/
U2 - 10.1007/978-3-031-80463-2_40
DO - 10.1007/978-3-031-80463-2_40
M3 - Conference contribution
SN - 978-3-031-80462-5
T3 - Studies in Computational Intelligence
SP - 426
EP - 437
BT - Advances in Neural Computation, Machine Learning, and Cognitive Research VIII
Y2 - 21 October 2024 through 25 October 2024
ER -
ID: 143217188