Training data: Download

Testing data: Download

- `train_data.csv`: for training and validating the model
- `test_data.csv`: for which you will need to make predictions

To learn more about how to format your submission, please visit the submission rules section.

Each dataset is made of several unique events (close encounters between two objects), which are indexed by a unique number in the `event_id` column.

The `training` set has 162634 rows and **13154 unique events** (giving on average about 12 rows/CDMs per close encounter).

The `testing` set has 24484 rows and **2167 unique events** (giving on average about 11 rows/CDMs per close encounter).
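The row and event counts quoted above can be checked with a few lines of code. The following is a minimal sketch using only the Python standard library; the helper name `summarize_events` and the toy CSV content are illustrative assumptions, and the commented-out path assumes the downloaded file sits in the working directory.

```python
import csv
import io
from collections import Counter


def summarize_events(csv_file):
    """Return (n_rows, n_events, mean_rows_per_event) for a CDM table."""
    reader = csv.DictReader(csv_file)
    # Count how many rows (CDMs) belong to each event_id.
    rows_per_event = Counter(row["event_id"] for row in reader)
    n_rows = sum(rows_per_event.values())
    n_events = len(rows_per_event)
    return n_rows, n_events, n_rows / n_events


# Toy stand-in for train_data.csv (made-up values, only two columns shown).
toy_csv = io.StringIO(
    "event_id,time_to_tca\n"
    "0,6.5\n0,4.2\n0,1.1\n"
    "1,5.0\n1,2.9\n"
)
n_rows, n_events, mean_rows = summarize_events(toy_csv)
print(n_rows, n_events, mean_rows)  # 5 2 2.5

# With the real file (path is an assumption):
# with open("train_data.csv", newline="") as f:
#     print(summarize_events(f))
```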

**Important: Note that the testing set and the training set have not been randomly sampled from the database. In other words, while they come from the same database, with the same collection process and the same features, they have been hand-picked in order to over-represent high-risk events and to create an interesting predictive model. This is a characteristic of this competition, where high-risk events are scarce but represent the true final target of a useful predictive model.**

In particular, the `testing` data differs from the `training` set in two major ways:

- It only contains events for which the latest CDM is within 1 day (`time_to_tca` < 1) of the time of closest approach (TCA). This is because, in some cases, the latest available CDM is days away from the (known) time of closest approach. It would be wrong to assume that the risk computed 7 days before the actual time of closest approach is a good approximation of the risk at TCA. Furthermore, predicting the risk many days prior to the time of closest approach is not of great interest to us. On the other hand, the `training` set is unfiltered and you will find many cases where the latest available CDM is days away from the TCA. We have chosen to keep these collision events in the training set because they may still be useful when it comes to predicting events from the test set.
- There are no CDMs to learn from which are within 2 days of the TCA. In other words, the data available closest to the TCA will be at least 2 days away. This is because, as mentioned in the challenge section, a potential avoidance manoeuvre is planned at least 2 days prior to closest approach. Similarly to the above, the `training` set will contain all cases, including events where no data is available at least 2 days prior to closest approach (i.e. events with all their CDMs being within 2 days of TCA are still present in the dataset).

The dataset is represented as a table, where each row corresponds to a single CDM, and each CDM contains 103 recorded characteristics/features. There are thus 103 columns, which we describe below. The dataset is made of several unique collision/close approach events, which are identified in the `event_id` column. In turn, each collision event is made of several CDMs recorded over time. Therefore, a single collision event can be thought of as a time series of CDMs. From these CDMs, for every collision event, we are interested in predicting the final risk which is computed in the last CDM of the time series (i.e. the risk value in the last row of each collision event).
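As an illustration of the target definition, the final risk of each event can be read from the CDM closest to TCA. This is a small sketch under the assumption that the "last" CDM of an event is the one with the smallest `time_to_tca`; the function name `final_risk_per_event` and the toy rows are made up for the example.

```python
def final_risk_per_event(rows):
    """Return {event_id: risk of the CDM with the smallest time_to_tca}."""
    latest = {}
    for row in rows:
        current = latest.get(row["event_id"])
        # Keep the row closest to TCA seen so far for this event.
        if current is None or row["time_to_tca"] < current["time_to_tca"]:
            latest[row["event_id"]] = row
    return {event_id: row["risk"] for event_id, row in latest.items()}


# Toy rows with made-up values; risk is a base 10 logarithm.
rows = [
    {"event_id": 0, "time_to_tca": 6.5, "risk": -10.0},
    {"event_id": 0, "time_to_tca": 0.4, "risk": -6.2},
    {"event_id": 0, "time_to_tca": 3.0, "risk": -8.0},
    {"event_id": 1, "time_to_tca": 5.0, "risk": -9.5},
    {"event_id": 1, "time_to_tca": 2.5, "risk": -9.0},
]
targets = final_risk_per_event(rows)
print(targets)  # {0: -6.2, 1: -9.0}
```

The function does not rely on the rows being sorted, so it works even if the CDMs of an event are not stored chronologically.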

For the column description, we first describe columns which have unique names, and then the columns whose names differ only in whether they refer to the target object (the column name starts with a **t**) or the chaser object (the column name starts with a **c**). Here, target refers to the ESA satellites while chaser refers to the space debris/object we want to avoid. To describe the column names shared by both the chaser and the target, we replace **t** and **c** with the placeholder **x**. For instance, `c_sigma_r` and `t_sigma_r` both correspond to the description of `x_sigma_r`.
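The placeholder convention can be expressed as a tiny helper that expands a documented `x_` name into the two actual column names. The function name `expand_placeholder` is hypothetical and only serves to illustrate the naming rule.

```python
def expand_placeholder(name):
    """Expand a documented 'x_...' name into (target, chaser) column names."""
    if not name.startswith("x_"):
        raise ValueError(f"not a placeholder column name: {name!r}")
    suffix = name[len("x_"):]
    return "t_" + suffix, "c_" + suffix


target_col, chaser_col = expand_placeholder("x_sigma_r")
print(target_col, chaser_col)  # t_sigma_r c_sigma_r
```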

Note that all the columns are **numerical** except for `c_object_type`.
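Since `c_object_type` is the only non-numerical column, it usually needs to be encoded before being fed to a numerical model. The sketch below shows one simple option, integer codes; the helper name `encode_categories` and the labels in the toy list are illustrative assumptions, so the actual column should be inspected for its real values.

```python
def encode_categories(values):
    """Map each distinct label to an integer code, in sorted label order."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


# Illustrative labels only, not taken from the dataset description.
object_types = ["DEBRIS", "PAYLOAD", "DEBRIS", "UNKNOWN"]
encoded, mapping = encode_categories(object_types)
print(encoded)  # [0, 1, 0, 2]
print(mapping)  # {'DEBRIS': 0, 'PAYLOAD': 1, 'UNKNOWN': 2}
```

Integer codes impose an arbitrary ordering on the categories, so a one-hot encoding may be preferable for models that are sensitive to it.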

- `risk`: self-computed value at the epoch of each CDM [base 10 log]. **In the `test` set, this value is to be predicted at the time of closest approach for each `event_id`.** Note that, as mentioned above, in the `test` set, we do not know the actual data contained in CDMs that are within 2 days of closest approach, since they happen in the "future".
- `event_id`: unique id per collision event
- `time_to_tca`: time interval between CDM creation and time of closest approach [days]
- `mission_id`: identifier of the mission that will be affected
- `max_risk_estimate`: maximum collision probability obtained by scaling combined covariance
- `max_risk_scaling`: scaling factor used to compute maximum collision probability
- `miss_distance`: relative position between chaser & target at TCA [m]
- `relative_speed`: relative speed between chaser & target at TCA [m/s]
- `relative_position_n`: relative position between chaser & target: normal (cross-track) [m]
- `relative_position_r`: relative position between chaser & target: radial [m]
- `relative_position_t`: relative position between chaser & target: transverse (along-track) [m]
- `relative_velocity_n`: relative velocity between chaser & target: normal (cross-track) [m/s]
- `relative_velocity_r`: relative velocity between chaser & target: radial [m/s]
- `relative_velocity_t`: relative velocity between chaser & target: transverse (along-track) [m/s]
- `c_object_type`: object type which is at collision risk with satellite
- `geocentric_latitude`: latitude of conjunction point [deg]
- `azimuth`: relative velocity vector: azimuth angle [deg]
- `elevation`: relative velocity vector: elevation angle [deg]
- `F10`: 10.7 cm radio flux index [\(10^{-22}\) W/(\(m^{2}\) Hz)]
- `AP`: daily planetary geomagnetic amplitude index
- `F3M`: 81-day running mean of F10.7 (over 3 solar rotations) [\(10^{-22}\) W/(\(m^{2}\) Hz)]
- `SSN`: Wolf sunspot number
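Because `risk` is given as a base 10 logarithm, it can be converted back to a linear scale when needed. The following sketch assumes that `risk` is the base 10 logarithm of a collision probability; the two function names are illustrative.

```python
import math


def risk_to_probability(risk):
    """Convert a base 10 log risk value to a linear probability."""
    return 10.0 ** risk


def probability_to_risk(probability):
    """Convert a linear probability (> 0) to a base 10 log risk value."""
    return math.log10(probability)


p = risk_to_probability(-6.0)   # roughly one in a million
r = probability_to_risk(1e-4)   # roughly -4
print(p, r)
```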

- `x_sigma_rdot`: covariance; radial velocity standard deviation (sigma) [m/s]
- `x_sigma_n`: covariance; normal (cross-track) position standard deviation (sigma) [m]
- `x_cn_r`: covariance; correlation of normal (cross-track) position vs radial position
- `x_cn_t`: covariance; correlation of normal (cross-track) position vs transverse (along-track) position
- `x_cndot_n`: covariance; correlation of normal (cross-track) velocity vs normal (cross-track) position
- `x_sigma_ndot`: covariance; normal (cross-track) velocity standard deviation (sigma) [m/s]
- `x_cndot_r`: covariance; correlation of normal (cross-track) velocity vs radial position
- `x_cndot_rdot`: covariance; correlation of normal (cross-track) velocity vs radial velocity
- `x_cndot_t`: covariance; correlation of normal (cross-track) velocity vs transverse (along-track) position
- `x_cndot_tdot`: covariance; correlation of normal (cross-track) velocity vs transverse (along-track) velocity
- `x_sigma_r`: covariance; radial position standard deviation (sigma) [m]
- `x_ct_r`: covariance; correlation of transverse (along-track) position vs radial position
- `x_sigma_t`: covariance; transverse (along-track) position standard deviation (sigma) [m]
- `x_ctdot_n`: covariance; correlation of transverse (along-track) velocity vs normal (cross-track) position
- `x_crdot_n`: covariance; correlation of radial velocity vs normal (cross-track) position
- `x_crdot_t`: covariance; correlation of radial velocity vs transverse (along-track) position
- `x_crdot_r`: covariance; correlation of radial velocity vs radial position
- `x_ctdot_r`: covariance; correlation of transverse (along-track) velocity vs radial position
- `x_ctdot_rdot`: covariance; correlation of transverse (along-track) velocity vs radial velocity
- `x_ctdot_t`: covariance; correlation of transverse (along-track) velocity vs transverse (along-track) position
- `x_sigma_tdot`: covariance; transverse (along-track) velocity standard deviation (sigma) [m/s]
- `x_position_covariance_det`: determinant of covariance (~volume)
- `x_cd_area_over_mass`: ballistic coefficient [\(m^2\)/kg]
- `x_cr_area_over_mass`: solar radiation coefficient · A/m (ballistic coefficient equivalent)
- `x_h_apo`: apogee (-\(R_{earth}\)) [km]
- `x_h_per`: perigee (-\(R_{earth}\)) [km]
- `x_ecc`: eccentricity
- `x_j2k_inc`: inclination [deg]
- `x_j2k_sma`: semi-major axis [km]
- `x_sedr`: energy dissipation rate [W/kg]
- `x_span`: size used by the collision risk computation algorithm (minimum 2 m diameter assumed for the chaser) [m]
- `x_rcs_estimate`: radar cross-sectional area [\(m^2\)]
- `x_actual_od_span`: actual length of update interval for orbit determination [days]
- `x_obs_available`: number of observations available for orbit determination (per CDM)
- `x_obs_used`: number of observations used for orbit determination (per CDM)
- `x_recommended_od_span`: recommended length of update interval for orbit determination [days]
- `x_residuals_accepted`: orbit determination residuals
- `x_time_lastob_end`: end of the time interval in days (with respect to the CDM creation epoch) of the last accepted observation used in the orbit determination
- `x_time_lastob_start`: start of the time interval in days (with respect to the CDM creation epoch) of the last accepted observation used in the orbit determination
- `x_weighted_rms`: root-mean-square in least-squares orbit determination
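The standard deviations and correlations listed above are enough to rebuild a covariance matrix, since each covariance entry is the correlation multiplied by the two corresponding standard deviations. The sketch below does this for the 3x3 position block in (radial, transverse, normal) order and computes its determinant. It assumes the `x_c*_*` columns are correlation coefficients as described; the function names and the numbers are made up, and whether `x_position_covariance_det` was computed exactly this way is not stated in this description.

```python
def position_covariance(sigma_r, sigma_t, sigma_n, ct_r, cn_r, cn_t):
    """Build the 3x3 position covariance [m^2] in (R, T, N) order."""
    sigmas = [sigma_r, sigma_t, sigma_n]
    # Symmetric correlation matrix with ones on the diagonal.
    corr = [
        [1.0, ct_r, cn_r],
        [ct_r, 1.0, cn_t],
        [cn_r, cn_t, 1.0],
    ]
    # cov[i][j] = corr[i][j] * sigma_i * sigma_j
    return [
        [corr[i][j] * sigmas[i] * sigmas[j] for j in range(3)]
        for i in range(3)
    ]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along row 0."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


# Made-up values standing in for x_sigma_r, x_sigma_t, x_sigma_n,
# x_ct_r, x_cn_r and x_cn_t of a single CDM.
cov = position_covariance(10.0, 200.0, 5.0, ct_r=-0.5, cn_r=0.0, cn_t=0.0)
print(cov[0][1])   # -1000.0
print(det3(cov))   # 75000000.0
```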