Overview

Self-supervised Learning for Next-generation Industry-level Autonomous Driving refers to a line of research that seeks to rethink solutions to challenging real-world perception tasks by learning from unlabeled or semi-supervised large-scale collected data to incrementally self-train powerful recognition models. Thanks to the rise of large-scale annotated datasets and advances in computing hardware, supervised learning methods have significantly improved performance on many problems in self-driving (e.g., 2D detection, instance segmentation, and 3D LiDAR detection). However, these supervised approaches are notoriously "data hungry", especially in autonomous driving. The performance of self-driving perception systems depends heavily on the scale of annotated bounding boxes and instance IDs, which makes them impractical for many real-world industrial applications.

While a human driver can keep accumulating experience by exploring the roads without a tutor's guidance, current computer vision solutions still demand extensive annotation effort for every new scenario. To enable industry-level autonomous driving in the future, the desired visual recognition model should be able to self-explore, self-train, and self-adapt across newly encountered geographies, streets, cities, weather conditions, object labels, viewpoints, and abnormal scenarios. To address this problem, many recent efforts in self-supervised learning, large-scale pretraining, weakly supervised learning, and incremental/continual learning have sought to move perception systems away from the traditional path of fully supervised learning. These topics have attracted considerable research attention, leading to rapid growth of publications in top-tier conferences and journals such as CVPR, ICCV, ECCV, T-IP, and T-PAMI, albeit mostly evaluated on relatively small-scale and highly curated datasets.
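
To make the self-training idea above concrete, below is a minimal pseudo-labeling sketch in PyTorch: a model is trained on labeled data, its confident predictions on unlabeled data are taken as pseudo-labels, and training alternates between the two. This is an illustration only, not an organizer-provided baseline; the model, the labeled and unlabeled loaders, the confidence threshold, and the number of rounds are all assumed placeholders.

    # Minimal self-training (pseudo-labeling) sketch for a classifier.
    # Assumes: `model` maps image batches to logits; `labeled_loader` yields
    # (images, labels); `unlabeled_loader` yields image batches only.
    import torch
    import torch.nn.functional as F

    def pseudo_label_round(model, unlabeled_loader, threshold=0.9, device="cpu"):
        """Collect high-confidence predictions on unlabeled data as pseudo-labels."""
        model.eval()
        pseudo = []
        with torch.no_grad():
            for images in unlabeled_loader:
                probs = F.softmax(model(images.to(device)), dim=1)
                conf, labels = probs.max(dim=1)
                keep = conf > threshold  # keep only confident predictions
                if keep.any():
                    pseudo.append((images[keep.cpu()], labels[keep].cpu()))
        return pseudo

    def self_train(model, labeled_loader, unlabeled_loader, optimizer,
                   rounds=3, device="cpu"):
        """Alternate supervised training with retraining on pseudo-labeled data."""
        for _ in range(rounds):
            model.train()
            for images, labels in labeled_loader:
                optimizer.zero_grad()
                loss = F.cross_entropy(model(images.to(device)), labels.to(device))
                loss.backward()
                optimizer.step()
            # Retrain on the pseudo-labels gathered in this round.
            for images, labels in pseudo_label_round(model, unlabeled_loader,
                                                     device=device):
                model.train()
                optimizer.zero_grad()
                loss = F.cross_entropy(model(images.to(device)), labels.to(device))
                loss.backward()
                optimizer.step()
        return model

In practice, detection and tracking variants of this loop replace the per-image class confidence with per-box scores, but the alternation between supervised training and confident pseudo-labeling is the same.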

This workshop is the second edition of SSLAD, first organized at ICCV 2021, where it was a great success with over 500 online attendees and 400 teams competing in its challenges. We organize this 2nd SSLAD workshop to explore broader topics, including multi-task learning, foundation models for multiple tasks, and a new tracking challenge based on the BDD100K benchmark, in addition to investigating advanced ways of building next-generation industry-level autonomous driving systems through self-supervised/semi-supervised learning, covering (but not limited to):

  • Self-supervised learning techniques
  • Lifelong/incremental visual recognition methods
  • Weakly supervised learning algorithms
  • One-/few-/zero-shot learning for perception tasks in self-driving
  • Learning in the presence of noisy data
  • Domain adaptation
  • Weakly supervised learning for 3D LiDAR and 2D images
  • Real-world self-driving applications, e.g., lane detection, anomaly detection, semantic segmentation, object detection/localization, scene parsing, etc.
  • Vision-based localization and tracking
  • Safety/explainability/robustness for self-driving cars in the above settings

As part of the workshop, we will also hold a Self-training Self-Driving (SSD) challenge, as we did last year. This year we expand our dataset annotations and organize the 2nd SSD challenge, which comprises five semi-supervised competition tracks that aim to progressively improve visual recognition models from large-scale unlabeled raw data: a) 2D object detection, b) multi-modality 3D object detection, c) corner case detection, d) multiple object tracking and segmentation (MOT & MOTS), and e) a unified model for the multi-task benchmark.