Keywords: stereo vision; accuracy; individual stereo frames; obstacle detection; cuboid; model; refining; fragmentation; driver assistance systems.

1. Domain and motivation

This thesis belongs to the field of autonomous robots in general and autonomous vehicles in particular. Specifically, it analyzes and proposes new methods for detecting obstacles using stereo vision.

Advanced driver assistance systems (ADAS), available on vehicles today, implement various functions of an autonomous driving system. Developing ADAS is also an iterative process toward achieving fully autonomous driving.

The motivation underlying the entire thesis is the development of a sensorial system for autonomous vehicles using artificial stereo vision. There are other types of sensors (radar, laser), but artificial vision is best suited to the task, because the entire road infrastructure is designed for visual perception.

The importance of the domain comes from the fact that the main requirement of autonomous driving systems is understanding the structure of the scene: the free (drivable) space and the obstacles. The obstacle trajectories and the possible collisions can then be determined.

In March 2014, Mercedes-Benz launched a pedestrian detection system using stereo vision. It can brake automatically in case of a possible collision with pedestrians (the most vulnerable participants in city traffic). The same system is used to assess the quality of the road ahead in order to automatically adjust the parameters of the adaptive suspension system, providing a high degree of comfort. Robert Bosch, the developer and manufacturer of automotive components, has announced for 2015 the release of an emergency braking system based on stereo vision. Therefore, the thesis is part of a state-of-the-art research direction.

Although there are many approaches for obstacle detection based on stereo vision, none of them deeply investigates detection from individual frames. This thesis fills this gap by proposing an elaborate approach that exploits as many of the clues available in individual stereo frames as possible.

2. The objectives of the thesis

To achieve autonomous vehicles, the significant structural elements of the scene need to be perceived. These elements can be divided into:

- the ground surface, often related to the notion of free space; its detection also provides the ground-obstacle separation, thereby reducing the space that the obstacle detection algorithm has to search;

- the obstacles, which can be divided into:

o foreground obstacles: they can collide with the ego car;

o background obstacles: after a while, they can become foreground obstacles (especially those in motion); their detection can be important for map building systems as well.

Many obstacles present in typical traffic scenes have distinctive features and can be detected more easily: poles, buildings, pedestrians, cyclists, cars etc. However, there are many other obstacle classes: vehicles of various shapes and appearances, different types of fences, atypical buildings, mounds, bushes, waste collection containers, various forms of tree trunks, poles with other items attached (signs, traffic lights) etc.

The main objective of this thesis is to investigate and implement new and efficient methods for the detection of generic obstacles. It is assumed that a separate module performs the ground-obstacle separation. In addition, the detection is performed on independent stereo frames; it is important to deeply understand the possibilities of detection at frame level before exploiting the motion information available in successive frames. Obstacles are represented by the cuboidal model, including a possible orientation. Attention is paid to modeling the obstacles confidently as cuboids.

The secondary objectives are the understanding of advanced driver assistance systems, of the analytical models proposed for stereo reconstruction and of obstacle modeling and, not least, the study of existing obstacle detection approaches.

3. The structure of the thesis

The content of the thesis consists of several chapters grouped into three parts: the study of the context, the detailed description of the contributions and the conclusions. A section of appendices is attached at the end in order to provide further information.

The first part begins with the presentation of advanced driver assistance systems (Chapter 2): their functions, the ways of perceiving and processing the elements of the scene, the need for stereo vision etc. Some concrete systems are detailed. An original structure of such a system based on stereo vision is proposed.

Achieving high performance in obstacle detection requires a deeper understanding of how 3D stereo reconstruction works, so numerous experiments were conducted. Thus, Chapter 3 presents the stereo reconstruction steps: camera parameters, calibration, rectification, feature selection, stereo matching and 3D reconstruction. Particular attention was paid to an original analysis of the quality, quantity and distribution of the 3D points.

To test different approaches for obstacle detection, various models for representing obstacles were implemented, both 2D and 3D, both schematic and detailed. All these and others found in the literature are presented and analyzed in Chapter 4: the unoriented rectangle model, the oriented rectangle model, the octagonal model, the polygonal model, the polygonal model with constant polar resolution and variable height resolution (aka stixels), the polygonal model with variable polar resolution and constant height, the Cartesian cuboidal model, the polar cuboidal model, the cuboidal model with multiple orientations, the multi-element model, the curved model, the polyhedral model.

Chapter 5 classifies and presents various existing approaches, grouped according to the data source used: mono intensity images, mono color images, image sequences and stereo images. It also provides a critical analysis of them, presenting the selection of relevant features, the advantages, the disadvantages, the models used for obstacle representation, as well as possible improvements. Special attention was given to approaches based on stereo vision, which are grouped by the main processing space; there is also a critical analysis of these processing spaces and of the research teams.

The second part, consisting of Chapter 6, presents an original approach for obstacle detection in individual stereo frames. The approach consists of a set of processing steps and substeps. Of these, some are applied to all obstacles, and others only in specific cases, as a result of decisions made based on automatic analysis.

Detection steps can be grouped into several directions. The first one aims at the localization of occupied areas. Assuming the existence of a separate module that performs ground-obstacle separation (not part of this thesis), only the above-ground 3D points are selected, as belonging to structures that may potentially collide with the ego vehicle. These Cartesian points are then transformed into a compressed horizontal space (top-view grid), taking into account the real possibilities of perceiving the scene through stereo vision. Along the lateral axis the perception is polar, so the columns of the compressed space correspond to the optical polar directions; this makes it easy to reason about reconstruction errors and about occlusions. The rows of the compressed space capture the depth discernment of stereo vision. In this space, the cells are grouped using proximity and density criteria. Specific preprocessing was needed to compensate for various factors that influence the proximity and the density. Then, the border of each obstacle is improved by processing done both at the level of individual columns and at the level of the whole obstacle.
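The transformation into the compressed space can be illustrated with a minimal sketch. It assumes a rectified pinhole stereo model and takes the image column as the polar direction and the integer disparity as the depth row (i.e. a U-disparity grid); the function name, the parameters and the rounding to integer cells are illustrative assumptions, not the implementation of the thesis.

```python
def build_top_view_grid(points, focal_px, baseline_m, u0, n_cols, n_rows):
    """Accumulate above-ground 3D points into a (disparity, column) grid.

    points: iterable of (X, Y, Z) in metres, camera frame, Z pointing forward.
    focal_px, baseline_m, u0: assumed stereo parameters (hypothetical names).
    Returns n_rows lists of n_cols point counts; the row index is the disparity.
    """
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, _y, z in points:
        if z <= 0:
            continue  # points behind the camera cannot be reconstructed
        u = int(round(u0 + focal_px * x / z))       # image column = polar direction
        d = int(round(focal_px * baseline_m / z))   # disparity = depth row
        if 0 <= u < n_cols and 0 <= d < n_rows:
            grid[d][u] += 1                          # cell density
    return grid


if __name__ == "__main__":
    # Two points at 20 m depth, one metre apart laterally (made-up values).
    g = build_top_view_grid([(0.0, 1.0, 20.0), (1.0, 0.5, 20.0)],
                            focal_px=800.0, baseline_m=0.25, u0=320.0,
                            n_cols=640, n_rows=64)
    print(g[10][320], g[10][360])
```

In such a grid each column is one viewing direction, so occlusion reasoning reduces to comparing cells within a column, and each row has a constant disparity error.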

The second direction concerns the processing done along the vertical direction. For accurate detection of obstacles higher than the ego vehicle (e.g. trucks), the obstacles are extended straight up, as high as there are 3D points, but not above 4.5 m. If, for various reasons, there are other 3D points above an obstacle placed on the ground, a vertical distribution analysis helps determine the correct height of the obstacle.
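One possible form of such a vertical analysis is sketched below. The 4.5 m cap comes from the text; the histogram bin size and the rule that a long empty band ends the obstacle are assumptions made only for illustration, since the summary does not specify the actual criterion.

```python
MAX_OBSTACLE_HEIGHT_M = 4.5  # upper limit stated in the text


def estimate_height(point_heights, bin_m=0.25, max_gap_bins=2):
    """Estimate the top of an obstacle from the heights (m above ground) of its points.

    bin_m and max_gap_bins are hypothetical parameters: the obstacle is extended
    upwards through occupied height bins and stops at the first empty band
    longer than max_gap_bins, or at the 4.5 m cap.
    """
    n_bins = int(MAX_OBSTACLE_HEIGHT_M / bin_m)
    occupied = [False] * n_bins
    for h in point_heights:
        if 0.0 <= h < MAX_OBSTACLE_HEIGHT_M:
            occupied[int(h / bin_m)] = True

    top_bin = -1
    gap = 0
    for i, is_occupied in enumerate(occupied):
        if is_occupied:
            top_bin = i
            gap = 0
        else:
            gap += 1
            if gap > max_gap_bins:
                break  # points above a long empty band belong to something else
    return (top_bin + 1) * bin_m if top_bin >= 0 else 0.0


if __name__ == "__main__":
    # A car-sized obstacle with a stray point (e.g. a branch) at 3.9 m.
    print(estimate_height([0.2, 0.6, 1.0, 1.4, 3.9]))
```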

Another important direction is the processing that exploits the obstacle shape in order to reach a confident modeling by cuboids. Often, several obstacles are grouped together, but the border of the group may provide additional clues. Thus, if the visible border (towards the camera) is concave, the deepest point of the concavity is used to determine the fragmentation column of the group. Then, the longest quasi-linear visible part of the convex hull may indicate the orientation of each obstacle. At this point each obstacle is modeled by a cuboid, oriented where possible. Next, the fidelity of the cuboid is assessed by the area of the free surface between the sides of the cuboid and the visible convex hull. If this area is significant, a fragmentation algorithm is applied. Cases where the free surface is caused by occlusion by foreground obstacles, or where a part of the cuboid is not visible in both cameras of the stereo system, are taken into account.
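The concavity criterion can be sketched on a simplified one-dimensional border profile. Here the visible border is assumed to be given as one depth value per column (larger means farther from the camera), and the concavity is measured against the lower convex hull of that profile; the representation, the function names and the `min_gap` threshold are illustrative assumptions rather than the method of the thesis.

```python
def _cross(a, b, p):
    """Z-component of (b - a) x (p - a); positive for a counter-clockwise turn."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def find_fragmentation_column(border_depth, min_gap=1.0):
    """Return the column of the deepest concavity of the visible border, or None.

    border_depth: depth of the nearest occupied cell for each column of a group.
    min_gap: hypothetical threshold on the concavity depth (same units as depth).
    """
    pts = list(enumerate(border_depth))
    if len(pts) < 3:
        return None

    # Lower convex hull of the profile (monotone chain over increasing columns).
    hull = []
    for p in pts:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)

    best_col, best_gap = None, 0.0
    seg = 0
    for col, depth in pts:
        while hull[seg + 1][0] < col:
            seg += 1
        (x0, z0), (x1, z1) = hull[seg], hull[seg + 1]
        hull_depth = z0 + (z1 - z0) * (col - x0) / (x1 - x0)
        gap = depth - hull_depth  # how far the border retreats behind the hull
        if gap > best_gap:
            best_col, best_gap = col, gap
    return best_col if best_gap >= min_gap else None


if __name__ == "__main__":
    # Two obstacles at depth 10 separated by a deeper column.
    print(find_fragmentation_column([10, 10, 10, 14, 10, 10]))
```

A border that is convex towards the camera yields no gap, so no fragmentation column is reported for it.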

When a small obstacle is situated inside a larger one, the small obstacle is absorbed by the larger one. Based on an analysis of the occlusions that occur among obstacles, the obstacles are classified as foreground or background obstacles. Small obstacles are rejected based on an analysis of their number of 3D points, their Cartesian size and their polar size.

In the third part, consisting of Chapter 7, the conclusions and the contributions of the thesis are presented. Some original proposals, analyses and views are also presented in the first part (“the study of the context”). With the many problems it raises and solves, the proposed original method is the most elaborate state-of-the-art approach in the field of obstacle detection based on single-frame stereo vision.

The appendices complete the picture of the thesis:

Appendix A: references;
Appendix B: the list of 20 articles published in conference proceedings, three articles published in national journals and two book chapters;
Appendix C: a list of the 5 most cited articles, which sum up 355 of a total of 430 citations;
Appendix D: a list of the 28 technical reports written during 10 research contracts;
Appendix E: the list of figures;
Appendix F: three selected articles;
Appendix G: Curriculum Vitae.

4. Conclusions

The topic of the thesis is artificial vision for road vehicles. More exactly, the main objective of the thesis is obstacle detection from individual stereo frames. It is assumed that the ground-obstacle separation is done by another module.

While developing the thesis, many other topics of the field were investigated, often in an iterative fashion. Thus, chapter 2 analyzes Advanced Driver Assistance Systems (ADAS) at the structural level, at the sensorial level and by detailing some concrete systems. It was concluded that stereo vision sensors are the most suitable for obstacle detection in urban traffic scenes. In such scenes the typical depth is relatively small (under 30 meters), and such complex scenes are often hard to handle with other sensors. Stereo vision provides a much larger amount of data than other sensors such as radar or laser. Artificial vision also has the advantage that the whole road infrastructure is designed and implemented to be perceived visually. For developing such stereo vision based systems, a generic diagram, organized on several levels, has been proposed.

The third chapter is dedicated to understanding stereo vision. All the steps of stereo vision were experimented with and analyzed: understanding the geometrical parameters of the cameras, camera calibration (done with [OpenCV]), image rectification (with OpenCV), feature selection, stereo matching (with OpenCV) and 3D reconstruction. It was demonstrated that, despite its computational burden, the stereo matching step can be done in real time on a CPU (VGA resolution, 40 ms on a single-core 2007 processor). Regarding the stereo reconstruction, an original analysis was done on several aspects: the native accuracy and its model, the errors and error magnitude classes, and the quantity and distribution of the 3D points. At the same time, the resolution with which stereo vision perceives the real scene structure was constantly addressed. It was shown that, in the horizontal and vertical directions, the perception is polar, so that in Cartesian space the resolution decreases linearly with depth. In the depth direction, the perception is done by means of disparities, which translates into a Cartesian depth resolution that decreases in a quasi-quadratic manner. Thus, one of the main contributions of the thesis is the constant emphasis on what stereo vision can actually perceive of the scene.
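The linear and quasi-quadratic behaviours follow from the standard rectified pinhole stereo model. As a sketch, with symbols that are assumptions introduced here (baseline b, focal length f in pixels, image column u, principal point column u_0, disparity d, depth Z, lateral coordinate X), a first-order error propagation gives:

```latex
\begin{align}
Z &= \frac{b\,f}{d}, &
\left|\frac{\partial Z}{\partial d}\right| &= \frac{b\,f}{d^{2}} = \frac{Z^{2}}{b\,f}
\quad\Rightarrow\quad \delta Z \approx \frac{Z^{2}}{b\,f}\,\delta d, \\
X &= \frac{(u-u_0)\,Z}{f}, &
\delta X &\approx \frac{Z}{f}\,\delta u .
\end{align}
```

For instance, under the hypothetical values b = 0.25 m and f = 800 px, one disparity step spans about 0.5 m at Z = 10 m and about 2 m at Z = 20 m, while one image column spans 1.25 cm and 2.5 cm laterally at the same depths.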

A reliable obstacle detection approach is able to handle different obstacles, regardless of their 3D shape. In order to describe their location and size, it is necessary to represent them by means of models that can describe generic obstacles. During the long development of the thesis, different obstacle models were needed and implemented. Chapter 4 details the implemented models. Other models found in the literature are also discussed. The same chapter also presents the coordinate systems used for modeling the problem’s domain (the real scene) and the solution’s domain (the stereo vision). For the real scene, the Cartesian 3D system is suited, while for the perception of the scene, the native 3D coordinates of the stereo vision sensor (U, V and disparity) are suited.

Chapter 5 presents a survey of the most important approaches existing in the literature. There are approaches based on grayscale mono vision, on color mono vision, on video images and on stereo vision. The stereo vision based approaches are of great interest for this thesis. In the case of stereo vision, the survey classifies the approaches in an original way: based on the main processing space. It was observed that, in order to compact the data and to simplify the processing, particular projections of the Cartesian space or of the native space (mentioned above) are often used. Original comparisons of the processing spaces and of the most important research teams were made. It was explained that the best main processing space is U-disparity, also used in this thesis. The Cartesian space should be used for secondary processing or for supplementary criteria during the main processing.

In chapter 6, an original approach for obstacle detection in independent stereo frames is presented. There are about 20 processing steps and sub-steps. The 3D points are represented in a horizontal compressed space (top-view grid), taking into account the real possibilities of perceiving the scene by means of stereo vision. In particular, this can be the U-disparity space. In this space, the cells are grouped into obstacles based on density and vicinity criteria. Different factors that affect the density and the vicinity were identified and compensated for. Then, the frontier of each obstacle is improved by both column-level and obstacle-level processing. In order to confidently detect tall traffic participants, such as trucks, the obstacles are extended upwards as long as there are 3D points, but not higher than 4.5 m. Two analyses are then involved in order to establish the vertical limits more accurately.
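The grouping of cells by density and vicinity can be illustrated with a generic connected-component labeling. This is a minimal sketch only: the fixed thresholds `min_density` and `max_gap` are hypothetical, and the compensation of the factors that affect density and vicinity, which the thesis describes, is not modeled.

```python
from collections import deque


def label_obstacles(grid, min_density=2, max_gap=1):
    """Group dense cells of a top-view grid into obstacles.

    grid: list of rows of point counts (e.g. a U-disparity accumulation).
    min_density: minimum count for a cell to be considered occupied (assumed).
    max_gap: two occupied cells are neighbours if they are at most this many
             cells apart in both row and column (assumed vicinity rule).
    Returns (labels, count): labels has 0 for free cells and 1..count otherwise.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    count = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] < min_density or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one obstacle
                cr, cc = queue.popleft()
                for nr in range(max(0, cr - max_gap), min(n_rows, cr + max_gap + 1)):
                    for nc in range(max(0, cc - max_gap), min(n_cols, cc + max_gap + 1)):
                        if grid[nr][nc] >= min_density and not labels[nr][nc]:
                            labels[nr][nc] = count
                            queue.append((nr, nc))
    return labels, count


if __name__ == "__main__":
    demo = [[3, 3, 0, 0, 0],
            [0, 0, 0, 0, 4],
            [0, 1, 0, 0, 4]]
    print(label_obstacles(demo))
```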

Often, several close obstacles are grouped as one detected obstacle, but the shape of the frontier can reveal supplementary clues. Thus, if the visible frontier (towards the camera) has concavities, they are used for determining the fragmentation into constituent obstacles. Then, the longest quasi-linear part of the visible frontier may indicate the obstacle orientation. After this step, each obstacle is modeled as a cuboid, possibly oriented. In the next step, the fidelity of the cuboid is assessed by the free space enclosed between the cuboid and the frontier. When this space is significant, a fragmentation is applied. It is taken into account that sometimes such free space is due to occlusions between obstacles or to the lateral limits of the field of view of the stereo cameras.

Most of these steps and sub-steps are original in conception and/or solution. It often happened that, while deepening one of the steps, new research directions opened up. For instance, improving the obstacle limits led to the analysis of the frontier and the convex hull of the obstacle. Later, this led to algorithms for obstacle group fragmentation and for obstacle orientation. In the end, a high degree of confidence was reached in modeling real obstacles as cuboids.

The proposed objectives were reached and were often extended. In the context understanding part, the following were studied: the autonomous vehicles domain (emphasizing the usage of stereo vision based sensors), stereo vision and its possibilities, different obstacle models, and the main existing obstacle detection approaches. The original approach for obstacle detection in independent stereo frames, by its many challenges and solutions, is the most elaborate state-of-the-art approach.

The thesis contributions were developed and used within 10 research contracts funded by Volkswagen AG from 2001 to 2009 (appendix D). The proposed solution continued to be used as the main method for obstacle detection (especially for static obstacles, which have no motion information) in the Image Processing and Pattern Recognition Group at the Technical University of Cluj-Napoca.

Alongside the thesis itself, 23 papers and two book chapters were published, and 28 technical reports were written for the client. Four delegations to Volkswagen AG summed up to six months. The results have received 430 citations (300 in the last 5 years), most of them independent.

5. The thesis contributions

It was explained that, in complex traffic scenes, obstacle detection is well suited to stereo vision.
A generic multi-level structure for ADAS, based on stereo vision, was proposed.
An original analysis of stereo vision itself was done, covering aspects such as: the native accuracy and its modeling, the errors and the error magnitude classes, the quantity and the distribution of the 3D points.
The resolution at which the real world is perceived by stereo vision was constantly addressed.
The obstacle models were studied, most of them being implemented and tested as part of the thesis.
In the survey of the existing approaches, there is an original classification of the approaches based on the processing space that is mainly used. A comparison of the processing spaces is made. It is explained that the U-disparity space, also used in this thesis, is one of the best.
An original approach was developed for obstacle detection in independent stereo frames. The approach consists of a series of steps and sub-steps, most of them not addressed before. They can be grouped into several directions:

Detection of occupied areas:

A top-view grid was built, taking into consideration many aspects related to the possibilities offered by stereo vision.
A specialized labeling algorithm was developed, exploiting the nature of the density and the vicinity of the obstacles’ cells.
The frontier of the obstacles was refined, both at the column level and the obstacle level.

Processing along the vertical direction:

A solution to correctly detect both lower and taller obstacles was found.
Fragmentations and refinements

Exploiting the obstacle shape:

For obstacles that do not have a cuboidal shape, two fragmentation algorithms based on the shape of the obstacle frontier were developed: one based on the frontier concavity criterion and one based on the frontier convexity criterion.
The obstacle orientation was also determined based on the shape of the obstacle frontier.
Special attention was paid to the cases when an obstacle is partially occluded by other obstacles or when it is partially outside the field of view of the stereo cameras.

A high confidence in modeling real obstacles by cuboids has been reached.
An important problem of obstacle detection is that neither the obstacle shape nor the reconstruction errors are known. This thesis uses different clues and techniques in order to mitigate this problem.
The design and implementation of methods for detecting obstacles from 3D points had a pioneering character. Previously, disparities were mostly used.
The original approach for obstacle detection in independent stereo frames, by its many challenges and solutions, is the most elaborate state-of-the-art approach.

6. Results

A typical urban scene example and the obstacle detection results are presented in the next figure: