Collision Avoidance Challenge

Oct. 16, 2019, 5:03 p.m. UTC

Dec. 16, 2019, 1:35 p.m. UTC


The competition is over.

Download the data

Training data: Download

Testing data: Download


To learn more about how to format your submission, please visit the submission rules section.

Differences Between Training and Testing Data

Each dataset is made of several unique events (close encounters between two objects), which are indexed by a unique number in the event_id column.

The training set has 162634 rows and 13154 unique events (giving on average about 12 rows/CDMs per close encounter).

The testing set has 24484 rows and 2167 unique events (giving on average about 11 rows/CDMs per close encounter).
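The averages quoted above follow directly from the row and event counts. The sketch below checks that arithmetic and shows one way to count CDMs per event. It assumes rows are available as dictionaries with an event_id key; the toy rows and the function name are made up for illustration and are not part of the competition data or tooling.

```python
from collections import Counter


def rows_per_event(rows):
    """Count how many rows (CDMs) belong to each event_id."""
    return Counter(row["event_id"] for row in rows)


# Toy rows, invented for illustration only.
toy_rows = [
    {"event_id": 0}, {"event_id": 0}, {"event_id": 0},
    {"event_id": 1}, {"event_id": 1},
]
counts = rows_per_event(toy_rows)
mean_rows = sum(counts.values()) / len(counts)

# Averages implied by the figures quoted in the text.
train_mean = 162634 / 13154  # rows / unique events, training set
test_mean = 24484 / 2167     # rows / unique events, testing set
```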

Important: Note that the testing set and the training set have not been randomly sampled from the database. In other words, while they come from the same database, with the same collection process and the same features, they have been hand-picked in order to over-represent high risk events and to make for an interesting prediction problem. This is a characteristic of this competition, where high risk events are scarce but represent the true final target of a useful predictive model.

In particular, the testing data differs in two major ways compared to the training set:

Columns Description

The dataset is represented as a table, where each row corresponds to a single CDM, and each CDM contains 103 recorded characteristics/features. There are thus 103 columns, which we describe below. The dataset is made of several unique collision/close approach events, which are identified in the event_id column. In turn, each collision event is made of several CDMs recorded over time. Therefore, a single collision event can be thought of as a time series of CDMs. From these CDMs, for every collision event, we are interested in predicting the final risk, which is computed in the last CDM of the time series (i.e. the risk value in the last row of each collision event).
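As a minimal sketch of what "the risk value in the last row of each collision event" means, the snippet below extracts that value per event. It makes two assumptions not stated above: the risk column is named risk, and the rows of each event appear in chronological order. The toy rows and the function name are hypothetical.

```python
def final_risk_per_event(rows):
    """Return {event_id: risk of the last row seen for that event}.

    Assumes rows for each event are ordered chronologically, so the
    last row encountered is the final CDM of the time series.
    """
    final = {}
    for row in rows:
        # Later rows overwrite earlier ones for the same event.
        final[row["event_id"]] = row["risk"]
    return final


# Toy CDMs, invented for illustration only.
toy_cdms = [
    {"event_id": 7, "risk": -10.0},
    {"event_id": 7, "risk": -8.5},
    {"event_id": 7, "risk": -6.2},
    {"event_id": 9, "risk": -30.0},
    {"event_id": 9, "risk": -29.1},
]
targets = final_risk_per_event(toy_cdms)
```

If the rows are not already in time order, they would need to be sorted within each event before taking the last one.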

For the column description, we first describe the columns which have unique names, and then the columns whose names differ only in whether they refer to the target object (if the column name starts with a t) or the chaser object (if the column name starts with a c). Here, target refers to the ESA satellites, while chaser refers to the space debris/object we want to avoid. To describe the column names shared by both the chaser and the target, we replace t and c with the placeholder x. For instance, c_sigma_r and t_sigma_r both correspond to the description of x_sigma_r.
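The placeholder convention can be illustrated with a small helper that expands an x_ name into its target and chaser column names. The function name is hypothetical; only the t/c/x prefix convention and the sigma_r example come from the text.

```python
def expand_placeholder(name):
    """Expand a placeholder name such as 'x_sigma_r' into the
    corresponding target ('t_') and chaser ('c_') column names."""
    if not name.startswith("x_"):
        raise ValueError(f"expected a name starting with 'x_', got {name!r}")
    suffix = name[len("x_"):]
    return {"target": "t_" + suffix, "chaser": "c_" + suffix}


columns = expand_placeholder("x_sigma_r")
```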

Note that all the columns are numerical except for c_object_type.

Uniquely Named Columns

Shared Column Names Between the Chaser and the Target Object