{"id":880923,"date":"2022-09-26T16:36:43","date_gmt":"2022-09-26T23:36:43","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=880923"},"modified":"2022-09-27T11:22:52","modified_gmt":"2022-09-27T18:22:52","slug":"data-driven-sensor-simulation-for-realistic-lidars","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/data-driven-sensor-simulation-for-realistic-lidars\/","title":{"rendered":"Data-driven Sensor Simulation for Realistic LiDARs"},"content":{"rendered":"\n

Simulation is playing an increasingly major role in the development of safe and robust autonomous systems, especially given the advent of deep learning techniques. Given the challenges and effort involved in collecting data in real life, simulation provides an efficient alternative for gathering labeled training data for sensor observations, vehicle dynamics, and environmental interactions. Furthermore, simulation allows extended evaluation of corner cases, such as failures that would be impractical or unsafe to reproduce in a real-life setup.

Over the last decade, simulations have steadily improved in visual and physical fidelity. Game engines such as Unreal Engine and Unity provide several advanced graphical capabilities out of the box, such as real-time ray tracing, high-resolution texture streaming, and dynamic global illumination. Such game engines have also formed the base for several robotics and autonomous systems simulators such as AirSim and CARLA, which allow users to deploy robotic platforms such as drones and cars equipped with cameras and other sensors in large 3D worlds.

While present-day simulations can generate high-quality camera imagery, they often fall back on simplified models for non-visual classes of sensors. Complex sensors such as LiDAR, which lie at the heart of most present-day autonomous systems such as self-driving cars, are challenging to model given their dependence on aspects such as the material properties of every object in an environment. Designing accurate LiDAR sensors in simulation often requires significant effort in handcrafting several environmental factors and carefully encoding sensor characteristics for every new model. To alleviate this, we examine a new perspective on sensor modeling: one that involves learning sensor models from data. In our recent work "Learning to Simulate Realistic LiDARs", we investigate how simulated LiDAR sensor models can be made more realistic using machine learning techniques.

\"Visualization
Figure 1: On top, we see an RGB image and a corresponding LiDAR scan from real data, where the LiDAR scan exhibits characteristics like raydrop and changes in intensity. On the bottom, we see a synthetic image from a simulator, and a basic LiDAR scan which does not contain raydrop or intensity. Our method results in the point cloud shown on the bottom right, which is similar to a real LiDAR.<\/figcaption><\/figure>\n\n\n\n

LiDAR sensors are active sensors that emit laser beams in many directions around the sensor; as these rays bounce off surrounding objects and return, the sensor measures the time taken for each return to estimate the distance traveled. Along with distances, LiDAR sensors also record the intensity of the returned ray, which depends on the reflectance of the object the ray was incident upon: metallic objects, for instance, produce higher-intensity returns.
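As a concrete illustration of the time-of-flight principle described above, here is a minimal sketch (not from the paper) of how a measured return time maps to a range estimate:

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def range_from_return(time_of_flight_s: np.ndarray) -> np.ndarray:
    """Estimate range from the round-trip time of a returned laser pulse."""
    # The pulse travels to the object and back, so halve the total path length.
    return C * time_of_flight_s / 2.0

# A return arriving after ~200 nanoseconds corresponds to an object ~30 m away.
print(range_from_return(np.array([200e-9])))  # -> [29.9792458]
```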

Creating accurate sensor models for LiDARs is thus challenging due to the dependence of the output on complex properties such as material reflectance, ray incidence angle, and distance. For example, when laser rays encounter glass objects, they are refracted and rarely return, a phenomenon known as raydrop. Basic LiDAR models that ship with robotics simulators often yield simple point clouds obtained by naively casting rays at every object, and do not account for such properties. Similarly, encoding the material properties of each object in a simulator takes significant effort, which also makes it challenging to estimate the intensities of LiDAR returns; most LiDAR models in simulations do not return valid intensities.
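To make the hand-crafting burden concrete, the toy model below (purely illustrative, not the paper's approach) shows the per-surface reflectance and incidence-angle bookkeeping a simulator would need to hand-encode just to approximate raydrop and intensity; the point of this work is to learn that mapping from data instead:

```python
import numpy as np

rng = np.random.default_rng(0)

def naive_sim_lidar(ranges_m: np.ndarray) -> np.ndarray:
    """Basic simulator model: every cast ray returns, and there is no intensity channel."""
    return np.stack([ranges_m, np.zeros_like(ranges_m)], axis=-1)

def handcrafted_lidar(ranges_m, reflectance, incidence_cos):
    """Toy hand-crafted model: low-reflectance surfaces (e.g. glass) tend to drop
    rays, and intensity falls off with range and grazing incidence."""
    strength = np.clip(reflectance * incidence_cos, 0.0, 1.0)
    keep = rng.random(ranges_m.shape) < strength            # probabilistic raydrop
    intensity = strength / np.maximum(ranges_m, 1.0) ** 2   # inverse-square falloff
    ranges = np.where(keep, ranges_m, 0.0)                  # dropped rays: no return
    return np.stack([ranges, np.where(keep, intensity, 0.0)], axis=-1)

# Example: three rays hitting asphalt, a car body, and a glass window; the glass
# ray is likely to be dropped because of its low assumed reflectance.
print(handcrafted_lidar(np.array([10.0, 12.0, 8.0]),
                        reflectance=np.array([0.4, 0.9, 0.05]),
                        incidence_cos=np.array([0.9, 0.8, 0.95])))
```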

In this work, we introduce a pipeline for data-driven sensor simulation and apply it to LiDAR. The key idea is that, given data containing both RGB imagery and LiDAR scans, we can train a neural network to learn the relationship between appearance in RGB images and scan properties such as raydrop and intensity in LiDAR scans. A model trained this way can estimate how a LiDAR scan would look from images alone, removing the need for complex physics-based modeling.
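The sketch below shows, in simplified form, the kind of supervision this idea implies: a network maps an RGB image to raydrop logits and a per-pixel intensity map, trained against paired real data. The `TinyRINet` stand-in, the loss choice, and the synthetic batch are illustrative assumptions, not the actual model described in the following sections:

```python
import torch
import torch.nn as nn

class TinyRINet(nn.Module):
    """Minimal stand-in for an image-to-LiDAR-properties network:
    maps an RGB image to raydrop logits and a per-pixel intensity map."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),   # channel 0: raydrop logits, channel 1: intensity
        )

    def forward(self, rgb):
        out = self.backbone(rgb)
        return out[:, :1], torch.sigmoid(out[:, 1:])

model = TinyRINet()
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
opt = torch.optim.Adam(model.parameters(), lr=2e-4)

# One synthetic batch stands in for (real RGB, projected LiDAR) training pairs.
rgb = torch.rand(4, 3, 64, 128)
gt_mask = (torch.rand(4, 1, 64, 128) > 0.3).float()   # 1 where a real return exists
gt_intensity = torch.rand(4, 1, 64, 128) * gt_mask    # measured intensity, 0 where dropped

drop_logits, intensity = model(rgb)
# Supervise raydrop as a binary mask; supervise intensity only where a return exists.
loss = bce(drop_logits, gt_mask) + l1(intensity * gt_mask, gt_intensity)
opt.zero_grad()
loss.backward()
opt.step()
```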

\"\"
Figure 2: Training pipeline for RINet involves taking an RGB image and predicting a binary mask for raydrop, and per-pixel intensities for the LiDAR points matching the RGB location.<\/figcaption><\/figure>\n\n\n\n

We focus on these two key aspects of realistic LiDAR data, namely raydrop and intensity. Given that current simulators can already output distances, we assume there is an existing sensor that returns a point cloud, which we then modify with our model to make it more realistic. We name our model RINet (Raydrop and Intensity Network). RINet takes an RGB image as input and attempts to predict the realistic LiDAR characteristics corresponding to that scene through a data structure we refer to as an intensity mask. The intensity mask is a densified representation of the LiDAR scan: for each pixel in the RGB image, it reports the closest intensity value from the LiDAR scan corresponding to the real-world location observed by that pixel. If a corresponding ray does not exist due to raydrop, the mask contains a zero. Once trained, our model works in tandem with an existing simulator such as CARLA. RGB images from the simulator are passed through the trained RINet model to produce an intensity mask prediction, and this mask is then "applied" to the original LiDAR scan, resulting in an enhanced scan.
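As a rough sketch of that final "apply" step, assuming the simulator's points have already been projected into the RGB camera frame (the projection itself is omitted), the predicted mask can be used to drop rays and attach intensities roughly like this:

```python
import numpy as np

def apply_intensity_mask(points_xyz, pixel_uv, pred_mask):
    """Apply a predicted intensity mask to a basic simulator point cloud.

    points_xyz : (N, 3) basic LiDAR points from the simulator
    pixel_uv   : (N, 2) integer pixel each point projects to in the RGB image
    pred_mask  : (H, W) predicted per-pixel intensity; 0 means the ray is dropped
    """
    intensities = pred_mask[pixel_uv[:, 1], pixel_uv[:, 0]]   # look up each point's pixel
    keep = intensities > 0                                    # raydrop: discard zero-intensity rays
    enhanced = np.concatenate(
        [points_xyz[keep], intensities[keep, None]], axis=1
    )                                                         # (M, 4): x, y, z, intensity
    return enhanced
```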

\"\"
Figure 3: During inference, RINet takes a synthetic RGB image and predicts corresponding LiDAR properties. These properties are then applied to the basic point cloud output by the simulator, resulting in an enhanced version.<\/figcaption><\/figure>\n\n\n\n

We train the RINet model on two real datasets, the Waymo Perception dataset and the SemanticKITTI dataset, each resulting in a distinct LiDAR model: the Waymo dataset contains data from a proprietary LiDAR sensor, whereas SemanticKITTI uses the Velodyne VLP-32 sensor. RINet leverages the well-known pix2pix architecture to go from an RGB frame to the intensity mask. We find that RINet is effective at learning material-specific raydrop (e.g., dropping rays on materials like glass) as well as intensities (e.g., learning that car license plates are metallic objects that produce high-intensity returns).
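Training against these datasets requires dense per-pixel targets built from sparse, projected LiDAR returns. Below is a hedged sketch of one way such an intensity-mask target could be densified with a nearest-neighbour fill, assuming the LiDAR points have already been projected into the camera frame; the exact densification used in the paper may differ:

```python
import numpy as np
from scipy.interpolate import griddata  # nearest-neighbour densification

def densify_intensity_mask(pixel_uv, intensity, image_hw):
    """Build a dense per-pixel intensity mask from a projected real LiDAR scan.

    pixel_uv  : (N, 2) pixel coordinates (u, v) of projected LiDAR returns
    intensity : (N,)   measured intensity of each return (0 for dropped rays)
    image_hw  : (H, W) size of the paired RGB image
    """
    h, w = image_hw
    grid_v, grid_u = np.mgrid[0:h, 0:w]
    # For every pixel, take the intensity of the closest projected LiDAR return.
    mask = griddata(pixel_uv[:, ::-1], intensity, (grid_v, grid_u), method="nearest")
    return mask.astype(np.float32)
```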

Figure: RGB images and corresponding intensity masks from real data in the Waymo dataset. We can see noise in the LiDAR data, rays dropped at materials like glass, and intensities varying with the observed objects.

Figure: Predictions from RINet for the same images, demonstrating that the model learns to drop rays based on the material observed in the image and to record plausible intensities.

To validate our idea of enhancing existing simulators with our technique, we apply our model on top of LiDAR point clouds coming from the CARLA simulator. We can observe from the videos below that the performance is qualitatively better: as expected, rays are dropped on car windshields, while metallic objects such as vehicle surfaces and road signs are more prominent in the intensity map.
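For readers who want to experiment with a similar setup, the snippet below sketches how paired RGB and basic LiDAR frames could be captured from CARLA's Python API (blueprint names and raw data layouts follow recent 0.9.x releases and may differ in other versions); each pair would then be fed through the trained model and the predicted mask applied to the point cloud, as sketched earlier:

```python
import numpy as np
import carla  # CARLA Python API

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()
blueprints = world.get_blueprint_library()

vehicle = world.spawn_actor(
    blueprints.filter("vehicle.*")[0],
    world.get_map().get_spawn_points()[0],
)
# Co-located RGB camera and basic ray-cast LiDAR, mounted on the roof.
mount = carla.Transform(carla.Location(x=0.0, z=2.4))
camera = world.spawn_actor(blueprints.find("sensor.camera.rgb"), mount, attach_to=vehicle)
lidar = world.spawn_actor(blueprints.find("sensor.lidar.ray_cast"), mount, attach_to=vehicle)

frames = {}

def on_image(image):
    # BGRA byte buffer -> HxWx3 RGB array for the image-to-intensity-mask model.
    arr = np.frombuffer(image.raw_data, dtype=np.uint8).reshape(image.height, image.width, 4)
    frames["rgb"] = arr[:, :, 2::-1]

def on_lidar(scan):
    # Each point is (x, y, z, intensity); the basic intensity is discarded and
    # replaced by the learned prediction downstream.
    pts = np.frombuffer(scan.raw_data, dtype=np.float32).reshape(-1, 4)
    frames["points"] = pts[:, :3]

camera.listen(on_image)
lidar.listen(on_lidar)
# Each paired (rgb, points) frame is then passed through the trained model, and the
# predicted intensity mask is applied to the basic point cloud.
```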
