DICTA 2024 Data Challenge Competition
Subtle Differences Recognition (Among Visually Similar Objects)
Overview:
Detailed differences among similar objects, such as the arrangement and quantity of parts, the composition of components, shape, size, and texture, are crucial in applications such as pick-and-place robotic tasks in warehouses or anomaly detection on production lines, where similar-looking objects must be distinguished.
Existing recognition models have focused on drastic changes between image pairs: e.g. the addition or removal of objects or people, changes in position, and discrete changes in color or shape. However, these models address only simple, discrete changes, not more complex, continuous differences such as slight variations in shape, color, or texture. For this challenge, we therefore propose two tasks for subtle difference recognition (Figure 1), targeting highly similar image pairs that differ only in subtle object attributes, including color, shape, and texture.
Proposed tasks:
In the difference image selection task, the input consists of two images along with a text describing the difference between them. The task is to choose which of the two images contains the change described in the text. In contrast, the objective of the conditional difference captioning task is to describe the minor differences in shape, color, or texture between the two images.
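To make the two task interfaces concrete, the following is a minimal Python sketch. The class, function names, and prediction formats (an image index of 0 or 1 for selection, a free-form caption string for captioning) are illustrative assumptions, not the official submission format.

from dataclasses import dataclass

@dataclass
class SubtleDiffPair:
    # Illustrative schema, not the official one: two near-identical
    # images plus the annotated difference text.
    image_a_path: str
    image_b_path: str
    difference_text: str  # input for the selection task

def select_changed_image(pair: SubtleDiffPair) -> int:
    # Difference image selection: return 0 if the first image contains
    # the change described in pair.difference_text, 1 if the second does.
    raise NotImplementedError("plug in your model here")

def caption_difference(image_a_path: str, image_b_path: str) -> str:
    # Conditional difference captioning: return a sentence describing the
    # subtle shape, color, or texture difference between the two images.
    raise NotImplementedError("plug in your model here")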
Dataset:
To evaluate the performance of models proposed in the challenge, we use the new Subtle-Diff evaluation dataset. This dataset was constructed using LLMs and image generation models to automatically generate similar image pairs containing subtle differences. Human annotators then described the differences in the generated image pairs with text. Subtle-Diff consists of 2,802 image pairs with 12,828 annotations covering 570 different objects (Table 1). Examples of images and captions from the Subtle-Diff dataset can be found in Figure 2.
Table 1: Statistics of the Subtle-Diff dataset.
Image pairs | Annotations | Objects | Annotators | Vocabulary size | Avg. sentence length (words)
2,802 | 12,828 | 570 | 11 | 1,930 | 12.78
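A minimal loading sketch for the annotation files, assuming they are distributed as JSON; the file name and the field names (pair_id, image_a, image_b, caption) are hypothetical placeholders, so inspect the downloaded files for the actual schema.

import json

# Hypothetical file name and record schema: check the downloaded archive
# for the actual structure before relying on these names.
with open("subtle_diff_train.json", "r", encoding="utf-8") as f:
    annotations = json.load(f)

for record in annotations[:3]:
    # Each record is assumed to pair two image paths with one
    # human-written difference caption.
    print(record["pair_id"], record["image_a"], record["image_b"])
    print("  caption:", record["caption"])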
Evaluation metrics:
For the difference image selection task, which is a binary classification task, we propose to use accuracy as the evaluation metric.
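Accuracy here is simply the fraction of image pairs for which the predicted image index matches the ground truth, as in this short sketch (variable names are illustrative):

def selection_accuracy(predictions, ground_truth):
    # Fraction of pairs where the predicted image index (0 or 1) matches
    # the annotated index; both arguments are equal-length lists.
    assert len(predictions) == len(ground_truth)
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# Example: 3 of 4 pairs classified correctly -> 0.75
print(selection_accuracy([0, 1, 1, 0], [0, 1, 0, 0]))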
For the conditional difference captioning task, evaluation can be conducted using the BLEU-4 and CIDEr metrics, which are widely used in image captioning.
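Both metrics are available in the open-source pycocoevalcap package (pip install pycocoevalcap). The sketch below is one way to score candidate captions against references; the IDs and captions are made-up examples, and in practice captions are usually tokenized (e.g. with the package's PTBTokenizer) before scoring.

from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# Keys are example IDs; values are lists of caption strings.
# gts may hold several reference captions per example.
gts = {"0": ["the cup on the right has a slightly darker handle"]}
res = {"0": ["the right cup has a darker handle"]}

bleu_scores, _ = Bleu(4).compute_score(gts, res)  # BLEU-1 .. BLEU-4
# CIDEr is corpus-level (IDF-weighted), so score it over the full
# val set; a single example is shown here only for illustration.
cider_score, _ = Cider().compute_score(gts, res)

print("BLEU-4:", bleu_scores[3])
print("CIDEr:", cider_score)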
Challenge details:
The challenge is hosted on EvalAI (https://eval.ai/web/challenges/challenge-page/2340/overview), and submissions must also be made via the EvalAI platform.
The Subtle-Diff dataset (train and val splits) is available for download from Google Drive: https://drive.google.com/file/d/1iaHvLiKck4GS6CVmK6WId-LvqaUDzsUu/view?usp=drive_link
The annotations for the train and val splits are available at: https://drive.google.com/file/d/159N7-SLmNHomvC6QlB3s1rvJMkvPxylF/view?usp=sharing
Important dates:
- 1st May: dataset available
- 5th July: challenge launch
- 10th September: challenge ends; verification process begins
- 10th September: paper submission deadline
- 15th September: notification of paper acceptance
- 22nd September: camera-ready submission
- 30th September: early-bird registration
- 28th November: award ceremony
Prize:
The winner will be awarded an attractive prize and a certificate sponsored by the Centre for Artificial Intelligence and Machine Learning (CAIML), School of Science, Edith Cowan University.
Contacts:
Challenge Organisers:
- Yue Qiu, National Institute of Advanced Industrial Science and Technology, Japan (qiu.yue@aist.go.jp)
- Mariia Khan, Edith Cowan University, Perth, Australia (mariiak@our.ecu.edu.au)
- Yanjun Sun, National Institute of Advanced Industrial Science and Technology, Japan (sunyanjun@keio.jp)
Challenge Chairs:
- Dr Naeha Sharif, University of Western Australia (naeha.sharif@uwa.edu.au)
- Dr Longguang Wang, National University of Defense Technology (wanglongguang15@nudt.edu.cn)