Object Detection


The SkyData challenge aims to improve the performance of detection algorithms for tiny object detection in densely populated scenes. As in the classical object detection problem, the goal is to detect a bounding box, with a real-valued confidence score, for each object instance of the nine predefined classes (i.e., people, bicycle, motor, pickup, car, van, truck, bus, and boat). The data is captured by a drone platform at different heights and is available on the download page. We manually annotate the bounding boxes of the different categories of objects in each image. Annotations for the training sets are publicly available on the dataset page. We require each evaluated algorithm to output a list of detected bounding boxes with confidence scores for each test image in the predefined format.




In SkyData, most of the objects are small. The distribution of object sizes is given in the table below. As in the COCO dataset, area is measured as the number of pixels in the segmentation mask (segmentation area). In COCO, an object is considered small if area < 32², medium if 32² ≤ area < 96², and large if area ≥ 96². Because SkyData contains a large number of small objects, we additionally split the small range into micro (area < 12²), tiny (12² ≤ area < 22²), and small (22² ≤ area < 32²).
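The size buckets above can be written out as a small helper. The bucket names and thresholds come from the text; the function name and the half-open boundary handling are our assumptions, shown here as an illustrative sketch:

```python
def size_category(area: float) -> str:
    """Map a segmentation area (in pixels) to a SkyData size bucket."""
    if area < 12 ** 2:
        return "micro"
    if area < 22 ** 2:
        return "tiny"
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"
```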




Our evaluation protocol follows that of MS COCO [1], although the metrics are not exactly the same. We use AP, AP_IoU=0.50, AP_IoU=0.75, AR_max=1, AR_max=10, AR_max=100, and AR_max=1000 to evaluate the results of detection algorithms. Unless otherwise specified, AP and AR are averaged over multiple intersection over union (IoU) thresholds; specifically, we use the ten IoU thresholds [0.50:0.05:0.95]. This is a break from tradition, where AP is computed at the single IoU of 0.50 (which corresponds to our metric AP_IoU=0.50); averaging over IoU thresholds rewards detectors with better localization. All metrics are computed allowing at most 1000 top-scoring detections per image (across all categories). These criteria penalize both missed objects and duplicate detections (two detection results for the same object instance). AP is additionally averaged over all categories; traditionally this is called "mean average precision" (mAP), and we make no distinction between AP and mAP (likewise AR and mAR), assuming the difference is clear from context. AP, averaged across all 10 IoU thresholds and all 9 categories, is the primary metric for ranking the algorithms and will determine the challenge winner; it should be considered the single most important measure of performance on SkyData. In addition to AP, we also compute the following metrics as additional information:
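As a concrete reference, the IoU between two boxes in the [x, y, width, height] convention used by the submission format can be sketched as follows (the function name is ours):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Corners of the intersection rectangle.
    ix1 = max(ax, bx)
    iy1 = max(ay, by)
    ix2 = min(ax + aw, bx + bw)
    iy2 = min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

A detection counts as a true positive at threshold t when its IoU with an unmatched ground-truth box is at least t.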


Measure Perfect Description
AP 100% The average precision over all 10 IoU thresholds (i.e., [0.50:0.05:0.95]) and all object categories.
AP_IoU=0.50 100% The average precision over all object categories when the IoU overlap with ground truth is larger than 0.50.
AP_IoU=0.75 100% The average precision over all object categories when the IoU overlap with ground truth is larger than 0.75.
AP_micro 100% The average precision for micro objects (area < 12²).
AP_tiny 100% The average precision for tiny objects (12² ≤ area < 22²).
AP_small 100% The average precision for small objects (22² ≤ area < 32²).
AP_medium 100% The average precision for medium objects (32² ≤ area < 96²).
AP_large 100% The average precision for large objects (area ≥ 96²).
AR_max=1 100% The average recall given 1 detection per image.
AR_max=10 100% The average recall given 10 detections per image.
AR_max=100 100% The average recall given 100 detections per image.
AR_max=1000 100% The average recall given 1000 detections per image.
AR_micro 100% The average recall for micro objects (area < 12²).
AR_tiny 100% The average recall for tiny objects (12² ≤ area < 22²).
AR_small 100% The average recall for small objects (22² ≤ area < 32²).
AR_medium 100% The average recall for medium objects (32² ≤ area < 96²).
AR_large 100% The average recall for large objects (area ≥ 96²).
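To make "average precision" concrete, here is a minimal sketch of AP at a single IoU threshold, computed as the area under the precision envelope of the score-ranked detections. All names are ours, and the matching of detections to ground truth is assumed to have already been done; COCO's official implementation additionally interpolates precision over 101 fixed recall points, so the numbers differ slightly.

```python
def average_precision(tp_flags, num_gt):
    """AP from detections sorted by descending confidence score.

    tp_flags: for each ranked detection, True if it matched a previously
    unmatched ground-truth box at the chosen IoU threshold (duplicate
    detections of the same object count as False).
    num_gt: number of ground-truth objects.
    """
    tp = fp = 0
    points = []  # (recall, precision) after each successive detection
    for is_tp in tp_flags:
        tp += bool(is_tp)
        fp += not is_tp
        points.append((tp / num_gt, tp / (tp + fp)))
    # Integrate precision over recall, using the precision envelope
    # (the best precision achievable at this recall level or beyond).
    ap = 0.0
    prev_recall = 0.0
    for i, (recall, _) in enumerate(points):
        best_prec = max(p for _, p in points[i:])
        ap += (recall - prev_recall) * best_prec
        prev_recall = recall
    return ap
```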





We accept submissions as a single JSON file, similar to the COCO results format, covering the entire detection task. Each detection is a JSON object with the following fields:


    [{
        "image_id"      : int,
        "category_id"   : int,
        "bbox"          : [x, y, width, height],
        "score"         : float,
    }, ...]


In this format, the box coordinates are 0-indexed floating-point numbers measured from the top-left corner of the image. We recommend rounding coordinates to the nearest tenth of a pixel to reduce the size of the resulting JSON file.
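A writer for this format can be sketched with the standard `json` module. The input layout (a list of `(image_id, category_id, box, score)` tuples) and the function name are our assumptions:

```python
import json

def write_submission(detections, path="results.json"):
    """Write detections to a single JSON file in the submission format."""
    records = [
        {
            "image_id": int(image_id),
            "category_id": int(category_id),
            # Round to the nearest tenth of a pixel to keep the file small.
            "bbox": [round(v, 1) for v in (x, y, w, h)],
            "score": float(score),
        }
        for image_id, category_id, (x, y, w, h), score in detections
    ]
    with open(path, "w") as f:
        json.dump(records, f)
```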


[1] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: common objects in context," in Proceedings of the European Conference on Computer Vision, 2014, pp. 740–755.