The perception module is at the heart of modern Advanced Driver Assistance Systems (ADAS). To improve the quality and robustness of this module, especially in the presence of environmental noise such as varying lighting and weather conditions, sensor fusion (mainly of camera and LiDAR) has been the focus of recent studies. In this paper, we address a relatively unexplored area: the early fusion of camera and radar sensors. We feed a minimally processed radar signal to our deep learning architecture along with its corresponding camera frame to enhance the accuracy and robustness of our perception module. Our evaluation, performed on real-world data, suggests that the complementary nature of radar and camera signals can be leveraged to reduce the lateral error by 15% when applied to object detection.
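As a rough illustration of the early-fusion idea described above (not the paper's exact FusionNet architecture), the sketch below fuses a camera frame and a radar range-azimuth image by concatenating low-level convolutional features before a shared detection head. The layer sizes, channel counts, and the assumption that both inputs share a spatial grid are illustrative assumptions only.

```python
# Hypothetical early-fusion sketch: each sensor gets a small convolutional stem,
# the feature maps are concatenated early, and a shared head regresses per-cell
# box parameters. Not the authors' FusionNet; shapes are illustrative.
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    def __init__(self, num_anchors=1):
        super().__init__()
        # Camera stem: RGB image -> feature map.
        self.cam_stem = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Radar stem: single-channel range-azimuth image -> feature map.
        self.radar_stem = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Shared trunk operates on the concatenated (fused) features.
        self.trunk = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # Per-cell outputs: objectness + (x, y, size_x, size_y).
        self.head = nn.Conv2d(64, num_anchors * 5, 1)

    def forward(self, camera, radar):
        # Assumes camera and radar have been resampled to the same spatial size.
        fused = torch.cat([self.cam_stem(camera), self.radar_stem(radar)], dim=1)
        return self.head(self.trunk(fused))

# Example forward pass with dummy inputs.
net = EarlyFusionNet()
camera = torch.rand(2, 3, 128, 128)   # normalized camera frames
radar = torch.rand(2, 1, 128, 128)    # normalized range-azimuth images
out = net(camera, radar)              # shape: (2, 5, 32, 32)
```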
NeurIPS 2019 ML4AD Workshop Poster Paper
Figure: White boxes are network outputs, and black boxes are ground truths.
| Metric | Fusion (SGD) | Fusion (ADAM) | Camera Only | Radar Only |
|---|---|---|---|---|
| mAP | 73.5% | 71.7% | 64.65% | 73.45% |
| Position x error | 0.145 m | 0.152 m | 0.156 m | 0.170 m |
| Position y error | 0.331 m | 0.344 m | 0.386 m | 0.390 m |
| Size x error | 0.268 m | 0.261 m | 0.254 m | 0.280 m |
| Size y error | 0.597 m | 0.593 m | 0.627 m | 0.639 m |
| Matches | 8695 | 8597 | 7805 | 8549 |
We observed that our FusionNet outperforms the individual sensors when trained with our proposed training scheme. The improvement in mAP is marginal, but the gains in position and size estimation are much more significant.
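For concreteness, the following hypothetical sketch shows one way the per-axis position/size errors and the "Matches" count reported above could be computed; the greedy nearest-center matching rule and the distance threshold are assumptions, not the paper's exact evaluation protocol.

```python
# Hypothetical evaluation sketch: greedily match each detection to the nearest
# unmatched ground-truth box within a distance threshold, then average the
# absolute differences of centers and sizes over all matches.
import math

def evaluate_matches(detections, ground_truths, max_dist=1.0):
    """Each box is (x, y, size_x, size_y) in meters; max_dist is an assumed threshold."""
    unmatched = list(ground_truths)
    errors = {"pos_x": [], "pos_y": [], "size_x": [], "size_y": []}
    for det in detections:
        if not unmatched:
            break
        # Nearest remaining ground truth by center distance.
        gt = min(unmatched, key=lambda g: math.hypot(det[0] - g[0], det[1] - g[1]))
        if math.hypot(det[0] - gt[0], det[1] - gt[1]) > max_dist:
            continue
        unmatched.remove(gt)
        errors["pos_x"].append(abs(det[0] - gt[0]))
        errors["pos_y"].append(abs(det[1] - gt[1]))
        errors["size_x"].append(abs(det[2] - gt[2]))
        errors["size_y"].append(abs(det[3] - gt[3]))
    matches = len(errors["pos_x"])
    means = {k: sum(v) / len(v) if v else float("nan") for k, v in errors.items()}
    return matches, means

# Toy usage: one detection close to one of two ground-truth vehicles.
dets = [(10.1, 5.3, 1.8, 4.2)]
gts = [(10.0, 5.0, 1.9, 4.5), (30.0, -2.0, 1.8, 4.4)]
print(evaluate_matches(dets, gts))
```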
| Metric | Fusion | Camera 0 | Radar 0 | Camera + Noise | Radar + Noise |
|---|---|---|---|---|---|
| mAP | 73.5% | 55.0% | 19.4% | 61.2% | 71.9% |
| Position x error | 0.1458 m | 0.1883 m | 0.1816 m | 0.1667 m | 0.1524 m |
| Position y error | 0.3315 m | 0.4297 m | 0.3602 m | 0.3847 m | 0.3360 m |
| Size x error | 0.2688 m | 0.3042 m | 0.3230 m | 0.4126 m | 0.2686 m |
| Size y error | 0.5975 m | 0.7829 m | 0.5653 m | 0.7022 m | 0.5853 m |
| Matches | 8695 | 8259 | 3051 | 8004 | 8554 |
Without retraining, we evaluated the network with the camera image or the radar range-azimuth image set to zero (Camera 0 and Radar 0, respectively), and with Gaussian noise of mean 0 and \(\sigma=0.1\) added to the normalized input camera image or radar frame (Camera + Noise and Radar + Noise, respectively). We found that the performance of the network drops significantly when either sensor is corrupted or missing.
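Below is a minimal sketch of these evaluation-time corruptions, assuming the camera and radar inputs are already normalized tensors; the function and variable names are hypothetical.

```python
# Hypothetical input corruptions for the robustness evaluation: either sensor is
# zeroed out entirely ("Camera 0" / "Radar 0"), or zero-mean Gaussian noise with
# sigma = 0.1 is added to one normalized input. Shapes are illustrative.
import torch

def corrupt_inputs(camera, radar, mode):
    camera, radar = camera.clone(), radar.clone()
    if mode == "camera_zero":
        camera.zero_()                            # missing camera sensor
    elif mode == "radar_zero":
        radar.zero_()                             # missing radar sensor
    elif mode == "camera_noise":
        camera += 0.1 * torch.randn_like(camera)  # Gaussian noise, sigma = 0.1
    elif mode == "radar_noise":
        radar += 0.1 * torch.randn_like(radar)
    return camera, radar

# Example: evaluate an already-trained network under each corruption, no retraining.
camera = torch.rand(1, 3, 128, 128)
radar = torch.rand(1, 1, 128, 128)
for mode in ["camera_zero", "radar_zero", "camera_noise", "radar_noise"]:
    cam_c, rad_c = corrupt_inputs(camera, radar, mode)
    # outputs = net(cam_c, rad_c)  # forward pass with the corrupted inputs
```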
The radar range-azimuth input is stored as a float32 image. This is comparable to camera data, and we do not think that this is an overly large requirement.

@article{radarfusion2019,
title={Radar and Camera Early Fusion for Vehicle Detection in Advanced Driver Assistance Systems},
author={Lim, Teck-Yian and Ansari, Amin and Major, Bence and Fontijne, Daniel and Hamilton, Michael and Gowaikar, Radhika and Subramanian, Sundar},
journal={{NeurIPS} Machine Learning for Autonomous Driving Workshop},
year={2019}
}