## Abstract

The perception module is at the heart of modern Advanced Driver Assistance Systems (ADAS). To improve the quality and robustness of this module, especially in the presence of environmental noises such as varying lighting and weather conditions, fusion of sensors (mainly camera and LiDAR) has been the center of attention in the recent studies. In this paper, we focus on a relatively unexplored area which addresses the early fusion of camera and radar sensors. We feed a minimally processed radar signal to our deep learning architecture along with its corresponding camera frame to enhance the accuracy and robustness of our perception module. Our evaluation, performed on real world data, suggests that the complementary nature of radar and camera signals can be leveraged to reduce the lateral error by 15% when applied to object detection.

# Paper and Poster

NeurIPS 2019 ML4AD Workshop Poster Paper

# Results

Click for full resolution image
White boxes are network outputs, and black boxes are ground-truths.

mAP 73.5% 71.7% 64.65% 73.45%
Position x 0.145m 0.152m 0.156m 0.170m
Position y 0.331m 0.344m 0.386m 0.390m
Size x 0.268m 0.261m 0.254m 0.280m
Size y 0.597m 0.593m 0.627m 0.639m
Matches 8695 8597 7805 8549

We observed that our FusionNet outperforms individual sensors when trained with our proposed training scheme. Improvements in mAP is marginal, but positioning and size estimated are much more significant.

# Ablation and sensitivity studies

mAP 73.5% 55.0% 19.4% 61.2% 71.9%
Position x 0.1458m 0.1883m 0.1816m 0.1667m 0.1524m
Position y 0.3315m 0.4297m 0.3602m 0.3847m 0.3360m
Size x 0.2688m 0.3042m 0.3230m 0.4126m 0.2686m
Size y 0.5975m 0.7829m 0.5653m 0.7022m 0.5853m
Matches 8695 8259 3051 8004 8554

Without retraining, we evaluated the network with the camera image or the radar range-azimuth image set to zero (Camera 0 and Radar 0 respectively), and added Gaussian noise of mean 0 and $$\sigma=0.1$$ to the normalized input camera image and radar frame. We found the that performance of the network drops significantly when either sensor was corrupted or missing.

# FAQs during NeurIPS ML4AD Workshop

1. There are open datasets with radar, why aren’t they evaluated?
• Our radar data is different from other methods published. Consequently, it is impossible for us to apply our method on published datasets with radar data and vice versa.
2. How does the radar data in this work compare with those available in other datasets?
• In commercially packaged radar systems, data similar to our’s is processed with some form of constant false alarm rate (CFAR) detection algorithm to produce a small number of points. In contrast, we work on data before this detection algorithm, retaining far more information than the points alone provide.
3. Can such radar data (dense/minimally processed) be obtained from commercially available radar systems in cars today?
• Not that we know. Commercially available radars systems include post-processing steps to return only sparse points, together with velocity. We do not know of any commercially available vehicular radar packages that make the raw ADC readings available.
4. Can we apply the post-processing techniques on your data to obtain the sparse points that off-the-shelf radar packages provide?
• In theory, yes. However, the design of the radar front-end (the antenna pattern), the waveform parameters, and tuning parameters in the post-processing are not available in the product specifications. Thus while we can apply the same techniques in principle, we may not be able to reproduce the (sparse points) detection results thatcommercial systems produce without significant efforts in reverse engineering.
5. Will this dataset be made available?
• We hope to be able to share this dataset, but unfortunately, we are not in control of the dataset. Thus we are currently unable to promise or commit to a timeline where we can release the dataset.
6. The data bandwidth requirement in this radar seems far higher than the radars in other system implementation, is this impractical?
• The radar inputs to our proposed neural network is a $$256 \times 64$$ float32 image. This is comparable to camera data. We do not think that this is an overly large requirement.

### Cite

@article{radarfusion2019,
title={Radar and Camera Early Fusion for Vehicle Detection in Advanced Driver Assistance Systems},
author={Lim, Teck-Yian and Ansari, Amin and Major, Bence and Fontijne, Daniel and Hamilton, Michael and Gowaikar, Radhika and Subramanian, Sundar},
journal={ {NeurIPS} Machine Learning for Autonomous Driving Workshop},
year={2019}
}