Scoring

The Collision Avoidance Challenge is to predict the final risk $ r $ of collision between a satellite and a given object (see the challenge section for a detailed description). Since this is a continuous value, the challenge can be seen as a regression task, where the target is the final risk $ r $ associated with each time series of CDMs. However, a correct prediction for high-risk events is much more important than a correct prediction for low-risk events. In other words, we strongly want to avoid false negatives and penalize their occurrence. This is done by constructing a loss that combines the Mean Squared Error (MSE) and the $F_{\beta}$ score with $\beta=2$.

The final score/loss that is used to evaluate all submissions is given by:

$$ L(r, \hat{r}) = \frac{1}{F_2}MSE(r,\hat{r}), $$

where $F_2$ is computed over the whole dataset using two classes (high final risk: $ r \geq 10^{-6}$; low final risk: $ r < 10^{-6}$), and $MSE(\cdot,\cdot)$ is computed only for the events that belong to the first (high-risk) class. We have that:

$$ MSE(r,\hat{r}) = \frac{1}{N}\sum\limits_{i \,:\, r_i \geq 10^{-6}}(r_i - \hat{r}_i)^2, $$

and

$$ F_{\beta} = (1+\beta^2)\frac{\mathrm{precision} \times \mathrm{recall}}{(\beta^2 \times \mathrm{precision})+\mathrm{recall}}, $$

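where, in our case, $\beta=2$, $N$ is the number of observations with high true risk, and $r_i$ and $\hat{r}_i$ are the true risk and predicted risk for observation $i$, respectively. Precision and recall are quantities closely related to Type 1 and Type 2 errors: taking high risk as the positive class, precision is the fraction of events predicted as high risk that truly are high risk (it drops with false positives, i.e. Type 1 errors), while recall is the fraction of truly high-risk events that are predicted as high risk (it drops with false negatives, i.e. Type 2 errors). With $\beta=2$, recall is weighted more heavily than precision.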
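The definitions above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the official evaluation code: the names `f_beta` and `challenge_loss` are hypothetical, and it assumes that a prediction is assigned to the high-risk class with the same threshold as the true risk ($\hat{r}_i \geq 10^{-6}$) and that the loss is treated as infinite when $F_2 = 0$. Neither choice is stated in the text.

```python
import math

HIGH_RISK_THRESHOLD = 1e-6  # class boundary from the text: r >= 1e-6 is high risk


def f_beta(true_high, pred_high, beta=2.0):
    """F-beta score, treating 'high risk' as the positive class."""
    pairs = list(zip(true_high, pred_high))
    tp = sum(1 for t, p in pairs if t and p)      # true positives
    fp = sum(1 for t, p in pairs if p and not t)  # false positives (Type 1)
    fn = sum(1 for t, p in pairs if t and not p)  # false negatives (Type 2)
    if tp == 0:
        return 0.0  # precision and/or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def challenge_loss(r_true, r_pred, threshold=HIGH_RISK_THRESHOLD, beta=2.0):
    """L(r, r_hat) = MSE over truly high-risk events, divided by F_beta."""
    true_high = [r >= threshold for r in r_true]
    # Assumption: predictions are classified with the same threshold.
    pred_high = [r >= threshold for r in r_pred]
    f2 = f_beta(true_high, pred_high, beta)
    if f2 == 0.0:
        return math.inf  # assumption: undefined ratio treated as worst case
    # MSE restricted to events whose TRUE risk is high (N such events).
    sq_errors = [(t - p) ** 2 for t, p, h in zip(r_true, r_pred, true_high) if h]
    mse = sum(sq_errors) / len(sq_errors)
    return mse / f2


# Toy data (made up for illustration): one false negative, one false positive.
r_true = [1e-4, 1e-5, 1e-8, 1e-9]
r_pred = [1e-4, 1e-7, 1e-5, 1e-9]
loss = challenge_loss(r_true, r_pred)
print(loss)
```

The asymmetry introduced by $\beta=2$ is visible directly in `f_beta`: a single false positive alongside one true positive (precision 0.5, recall 1) gives $F_2 = 5/6$, whereas a single false negative alongside one true positive (precision 1, recall 0.5) gives $F_2 = 5/9$, so a missed high-risk event inflates the loss more than a false alarm does.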

Remember that, as noted in the rules section, the score on the leaderboard is evaluated on only a subset of the actual test data, in order to avoid potential overfitting to the test data. Only at the end of the competition will all solutions be evaluated on the entire dataset.