Autonomous vehicles rely heavily on input data to make driving decisions. Logically, the more detailed the data, the better (and, importantly, safer) the decisions the vehicle makes. Modern cameras can capture very detailed representations of the world, but their output is still 2D. This limits the information we can feed to the neural networks operating the vehicle, which in turn must learn to make assumptions about the 3D world. Cameras are also limited in the conditions they can handle: heavy rain may render an image almost useless, while LiDAR will still capture information. Since autonomous vehicles are a particularly risky and high-impact application of neural networks, we need to make the networks we build as robust as possible, and that starts with the data. Ideally, we want to feed our network 3D data, since it must make predictions about the 3D world. This is where LiDAR comes in.
This article aims to provide a gentle yet thorough introduction to LiDAR technology and the networks that use it. By the end, you will have a grasp of the following:
- What LiDAR is and how it works
- How neural networks work on LiDAR data and what challenges they face
- How LiDAR data differs from 2D images and how the annotation process changes
What exactly is LiDAR?
LiDAR stands for light detection and ranging. Put simply, it’s a remote sensing technology that uses light in the form of laser pulses to measure distances and dimensions between the sensor and target objects. In the context of autonomous vehicles, LiDAR sensors are used to detect and pinpoint the locations of objects like other cars, pedestrians, and buildings relative to the vehicle. The rising popularity of artificial neural networks has made LiDAR even more useful than before.
LiDAR technology has been in use since the 1960s, when it was first installed on airplanes to scan the terrain they flew over. It grew more popular in the 1980s with the advent of GPS, when it started being used to build 3D models of real-world locations.
How does LiDAR work?
Most LiDAR systems consist of four elements:
1) Laser: Sends pulses of light towards objects (usually ultraviolet or near-infrared).
2) Scanner: Regulates the speed at which the laser can scan target objects and the maximum distance reached by the laser.
3) Sensor: Measures the time it takes for the light from the laser to bounce off the target object and return to the system (thereby measuring distance).
4) GPS: Tracks the location of the LiDAR system to ensure accuracy of the distance measurements.
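The distance measurement in step 3 follows the standard time-of-flight relation: the pulse travels to the object and back, so the distance is half the round-trip time multiplied by the speed of light. A minimal sketch in Python (the pulse time is a made-up example value):

```python
# Converting a measured round-trip pulse time to a distance.
# The light travels to the object and back, so we halve the
# product of the round-trip time and the speed of light.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def time_of_flight_to_distance(round_trip_time_s: float) -> float:
    """Distance in metres from a round-trip pulse time in seconds."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0

# A pulse that returns after 200 nanoseconds hit an object roughly 30 m away.
print(time_of_flight_to_distance(200e-9))
```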
Modern LiDAR systems can often send up to 500,000 pulses per second. The measurements stemming from these pulses are aggregated into a point cloud, which is essentially a set of coordinates representing the objects the system has sensed. The point cloud is used to create a 3D model of the space around the LiDAR sensor.
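As a rough sketch of how a point cloud comes together: each return can be described by a measured distance plus the laser's azimuth and elevation at firing time, and converting these spherical coordinates to Cartesian ones and stacking them yields an (N, 3) array of points. The distances and angles below are made up purely for illustration:

```python
import numpy as np

# Each pulse return: (distance, azimuth, elevation). Converting to
# Cartesian coordinates and stacking the results gives the point cloud,
# an (N, 3) array of x, y, z positions around the sensor.

def spherical_to_cartesian(dist, azimuth, elevation):
    """Convert one return to x, y, z (angles in radians)."""
    x = dist * np.cos(elevation) * np.cos(azimuth)
    y = dist * np.cos(elevation) * np.sin(azimuth)
    z = dist * np.sin(elevation)
    return np.array([x, y, z])

# Three hypothetical returns -> a tiny point cloud of shape (3, 3).
returns = [(10.0, 0.0, 0.0), (12.5, np.pi / 2, 0.1), (8.0, np.pi, -0.05)]
cloud = np.stack([spherical_to_cartesian(*r) for r in returns])
print(cloud.shape)  # (3, 3)
```

A real sensor emits hundreds of thousands of such returns per second, so the array simply grows much taller; the conversion step stays the same.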
There are two general types of LiDAR systems: airborne and terrestrial. Since our use case of interest is autonomous vehicles, we'll mainly be focusing on terrestrial LiDAR. Terrestrial LiDAR systems operate on the Earth's surface and scan in all visible directions. They can be static (e.g., attached to a tripod or a building) or mobile (e.g., attached to a car or a train).
Uses of deep learning with LiDAR data
Given the type of output that LiDAR systems generate, combining them with neural networks seems like a natural fit, and indeed neural networks operating on point clouds have proven effective. The application of LiDAR point clouds for autonomous vehicles can be split into two categories:
- Real-time environment perception and processing with the purpose of object detection and scene understanding.
- Generation of HD maps and urban models for object localization and referencing.
This may sound complicated, but it really means that LiDAR data is used for the familiar tasks of semantic segmentation, object detection/localization, and object classification. The only difference is that we're now doing them in 3D, which allows for more nuance in our models.
An interesting challenge for neural networks operating on LiDAR data is the sheer amount of variation across scanning times, weather conditions, sensor types, distances, backgrounds, and a plethora of other factors. Due to the way LiDAR works, the density and intensity of the points representing an object vary considerably. On top of that, sensors are often noisy, and LiDAR data in particular is often incomplete (because of factors like the low surface reflectance of certain materials and cluttered urban backgrounds), so networks working with LiDAR data need to be robust to all of this variation. Another problem with 3D data is that, unlike the pixels of a 2D image, the points from a LiDAR sensor have no intuitive order. This introduces the need for permutation and orientation invariance in our models, which not all architectures satisfy.
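The permutation-invariance requirement is easy to demonstrate: a symmetric aggregation such as max-pooling over the point dimension yields the same result no matter how the points are ordered. In the sketch below, random features stand in for per-point network outputs:

```python
import numpy as np

# Max-pooling over the point dimension is a symmetric operation, so
# shuffling the points leaves the pooled summary unchanged. The random
# features are stand-ins for per-point MLP outputs.

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 64))       # 100 points, 64 features each
shuffled = features[rng.permutation(100)]   # same points, different order

pooled = features.max(axis=0)               # order-independent summary
pooled_shuffled = shuffled.max(axis=0)

print(np.allclose(pooled, pooled_shuffled))  # True
```

A per-point operation followed by any such symmetric aggregation (max, sum, mean) keeps the whole pipeline permutation invariant; an architecture that concatenates point features in order, by contrast, would not be.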
Four interesting families of architectures proposed to deal with LiDAR data are as follows:
1) Point cloud-based methods: These networks operate directly on the point clouds using different approaches. One such approach is learning the spatial features of each point directly via MLPs and accumulating them via max-pooling.
2) Voxel-based methods: The 3D data is divided into a 3D grid of voxels (essentially a grid of cubes), and 3D convolution and pooling are applied in a CNN-like architecture.
3) Graph-based methods: These methods use the inherent geometry present in point clouds to construct graphs out of them and apply common GNN architectures like graph CNNs and graph attention networks (which also happen to satisfy the earlier mentioned condition of permutation invariance).
4) View-based methods: These methods create 2D projections of the point cloud so that tried-and-tested architectures from 2D computer vision can be applied. A tactic that can help improve model performance in this case is to create multiple projections from different angles and vote on a final prediction.
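To make the preprocessing behind voxel-based methods (2) concrete, here is a minimal occupancy-grid voxelizer; the grid bounds and resolution are arbitrary choices for this sketch, and real pipelines typically store richer per-voxel features than a single occupancy bit:

```python
import numpy as np

# Voxel-based preprocessing sketch: bin points into a regular 3D grid,
# producing an occupancy volume a 3D CNN could consume.

def voxelize(points: np.ndarray, grid_min=-10.0, grid_max=10.0, resolution=32):
    """Return a (resolution, resolution, resolution) occupancy grid
    from an (N, 3) point cloud; points outside the bounds are dropped."""
    voxel_size = (grid_max - grid_min) / resolution
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < resolution), axis=1)
    idx = idx[inside]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

cloud = np.array([[0.0, 0.0, 0.0], [5.0, -5.0, 2.0], [100.0, 0.0, 0.0]])
occupancy = voxelize(cloud)
print(occupancy.sum())  # 2.0 -- the out-of-bounds point is dropped
```

The trade-off this illustrates: voxelization turns an unordered point set into a regular tensor (solving the ordering problem), but resolution is bounded by memory, since the volume grows cubically.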
Annotating LiDAR data
As we now know, the most common deep learning tasks on LiDAR data are variants of semantic segmentation, object detection, and classification. Therefore, LiDAR annotation is quite similar to annotating images for those tasks. Human annotation is very common, but because of the somewhat more complicated and potentially confusing nature of LiDAR data, companies are trying to automate the process as much as possible using pre-trained networks.
Since we're dealing with 3D data, it may seem that annotation becomes cumbersome. This isn't necessarily the case. For 3D semantic segmentation and 3D object classification, the drill is much the same as for their 2D counterparts, except that we're now dealing with sparse points in space rather than pixels in a 2D image. As for 3D object detection, the only added complexity annotation-wise is that, aside from the object's location, we also need to annotate its orientation, that is, the direction the object is facing.
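To make that difference concrete, a 3D detection label might look like the hypothetical record below; the only field with no 2D counterpart is the yaw (orientation) angle. The field names and conventions are illustrative, not taken from any particular annotation tool:

```python
from dataclasses import dataclass

# Hypothetical annotation record for 3D object detection. Compared with
# a 2D bounding box, the one extra label is the orientation: here a
# single yaw angle (heading around the vertical axis), a common
# convention for vehicles on roughly flat ground.

@dataclass
class Box3DAnnotation:
    label: str      # object class, e.g. "car"
    x: float        # box centre in metres, sensor coordinates
    y: float
    z: float
    length: float   # box extent along the heading direction
    width: float
    height: float
    yaw: float      # heading angle in radians -- the added 3D field

ann = Box3DAnnotation("car", 12.3, -4.1, 0.9, 4.5, 1.8, 1.5, yaw=1.57)
print(ann.label, ann.yaw)
```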
As one can see, the main hurdle of LiDAR data annotation doesn’t really stem from its 3D nature. It’s just that LiDAR data doesn’t look as simple and intuitive as regular images, so annotation can take somewhat longer, especially for someone unused to this type of data.
As discussed, LiDAR is a technology that uses laser pulses and sensors to construct a 3D view of the sensor’s surroundings. While it has been used since the 1960s, one of the most common use cases nowadays is combining LiDAR data with neural networks for autonomous vehicles. Common neural architectures have been used to operate on LiDAR data, albeit with some necessary tweaks. While the nature of point clouds generated by LiDAR makes for a drastically different data format from 2D images, the LiDAR annotation process doesn’t change quite as much.
Author: Davit Grigoryan, Independent Machine Learning Engineer