A submission consists of a .zip file containing a single UTF-8 encoded .json file with your predicted object coordinates for every sequence and frame of the test folder. The structure of this .json file should be the same as that of training_anno.json: an array of objects, where each object contains 4 key-value pairs:
This object states that you predicted 3 objects for frame 2 of sequence 1224, together with their x-y coordinates. Note that your coordinates can be floating-point numbers, even though all images are 640x480 pixels. Pixel centers are offset by -0.5 pixels, so your coordinates need to lie within the rectangle spanning (-0.5, -0.5) to (639.5, 479.5).
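As an illustration of the kind of entry described above, the following sketch builds one record and checks its coordinates against the allowed rectangle. The key names (`sequence_id`, `frame`, `num_objects`, `object_coords`) are assumptions made for this example and should be taken from training_anno.json; the coordinate values are invented, and treating the rectangle borders as inclusive is also an assumption.

```python
import json

# Bounds from the text: 640x480 images, pixel centers offset by -0.5.
X_MIN, Y_MIN, X_MAX, Y_MAX = -0.5, -0.5, 639.5, 479.5


def in_bounds(x, y):
    """True if (x, y) lies inside the allowed rectangle (borders assumed inclusive)."""
    return X_MIN <= x <= X_MAX and Y_MIN <= y <= Y_MAX


# Hypothetical entry: key names are assumptions, coordinates are made up.
entry = {
    "sequence_id": 1224,
    "frame": 2,
    "num_objects": 3,
    "object_coords": [[12.25, 40.0], [320.5, 239.5], [639.5, -0.5]],
}

all_ok = all(in_bounds(x, y) for x, y in entry["object_coords"])
print(json.dumps(entry))
print("all coordinates in bounds:", all_ok)
```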
You must provide a prediction for every sequence and frame. If you do not detect any objects in a frame, you need to state this explicitly in your submission.
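One way to satisfy this requirement is to add an explicit "nothing detected" entry for every frame your detector left out. The sketch below assumes that such an entry is written with an object count of 0 and an empty coordinate list, and it reuses hypothetical key names (`sequence_id`, `frame`, `num_objects`, `object_coords`); check both assumptions against training_anno.json.

```python
def empty_prediction(sequence_id, frame):
    """Explicit 'no objects detected' entry (representation is an assumption)."""
    return {
        "sequence_id": sequence_id,
        "frame": frame,
        "num_objects": 0,
        "object_coords": [],
    }


def fill_missing(predictions, expected_keys):
    """Return a new list with an empty entry added for every
    (sequence_id, frame) pair in expected_keys that has no prediction yet."""
    present = {(p["sequence_id"], p["frame"]) for p in predictions}
    filled = list(predictions)
    for sequence_id, frame in expected_keys:
        if (sequence_id, frame) not in present:
            filled.append(empty_prediction(sequence_id, frame))
    return filled


# Usage with made-up data: frame 2 of sequence 1 had no detections.
preds = [{"sequence_id": 1, "frame": 1, "num_objects": 1,
          "object_coords": [[10.0, 20.0]]}]
filled = fill_missing(preds, [(1, 1), (1, 2)])
```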
For each sequence and frame you must make exactly one prediction: submissions with duplicate or missing frames/sequences are not valid. For each frame, you can predict at most 30 objects. The number of objects you predict must match the number of coordinates you provide.
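The rules in this paragraph can be sketched as a small self-check. This is only an illustration under assumed key names (`sequence_id`, `frame`, `num_objects`, `object_coords`), not the organizers' validation code, which remains authoritative.

```python
from collections import Counter

MAX_OBJECTS = 30  # per-frame limit stated in the rules


def validate(predictions, expected_keys):
    """Return a list of problems found; an empty list means none were found.

    expected_keys is an iterable of (sequence_id, frame) pairs that the
    test folder contains."""
    problems = []
    counts = Counter((p["sequence_id"], p["frame"]) for p in predictions)

    # Exactly one prediction per sequence and frame.
    for key, n in counts.items():
        if n > 1:
            problems.append(f"duplicate prediction for {key}")
    for key in expected_keys:
        if key not in counts:
            problems.append(f"missing prediction for {key}")

    for p in predictions:
        key = (p["sequence_id"], p["frame"])
        # At most 30 objects per frame.
        if p["num_objects"] > MAX_OBJECTS:
            problems.append(f"more than {MAX_OBJECTS} objects for {key}")
        # Declared count must equal the number of coordinate pairs.
        if p["num_objects"] != len(p["object_coords"]):
            problems.append(f"count/coordinate mismatch for {key}")
    return problems
```

Calling `validate(predictions, expected_keys)` before packaging a submission gives a quick list of rule violations to fix.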
To check whether your .json file constitutes a valid submission, you can run the code that we provide in our starter kit (this link will take you to "Zenodo").
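Once the .json content is valid, it still has to be packaged as the .zip file with a single UTF-8 encoded .json file that a submission consists of. A minimal sketch using only the Python standard library follows; the function name and the file names `submission.json` and `submission.zip` are hypothetical, since the text does not prescribe any.

```python
import json
import zipfile


def write_submission(predictions, zip_path, json_name="submission.json"):
    """Write predictions as one UTF-8 encoded .json file inside a .zip archive.

    File names are placeholders chosen for this example."""
    payload = json.dumps(predictions).encode("utf-8")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(json_name, payload)


def read_submission(zip_path):
    """Read the archive back, insisting that it holds exactly one file."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        if len(names) != 1:
            raise ValueError("archive must contain exactly one file")
        return json.loads(zf.read(names[0]).decode("utf-8"))
```

For example, `write_submission(predictions, "submission.zip")` produces the archive, and `read_submission("submission.zip")` can be used to confirm that the content survives the round trip.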