TL;DR - CroQS: Cross-modal Query Suggestion Benchmark
We introduce CroQS, a benchmark for evaluating cross-modal query suggestion systems that generate textual queries guided by visual content, enabling more intuitive exploration of image collections.
Query suggestion, a technique widely adopted in information retrieval, enhances system interactivity and the browsing experience of document collections. In cross-modal retrieval, many works have focused on retrieving relevant items given natural language queries, while few have explored query suggestion.
In this work, we address query suggestion in cross-modal retrieval, introducing a novel task that focuses on suggesting minimal textual modifications needed to explore visually consistent subsets of the collection, following the premise of “Maybe you are looking for”. To facilitate the evaluation and development of methods, we present a comprehensive benchmark named CroQS. This dataset comprises initial queries, grouped result sets, and human-defined suggested queries for each group. We establish dedicated metrics to rigorously evaluate the performance of various methods on this task, measuring representativeness, cluster specificity, and similarity of the suggested queries to the original ones. Baseline methods from related fields, such as image captioning and content summarization, are adapted for this task to provide reference performance scores.
Our experiments reveal that both LLM-based and captioning-based methods achieve competitive results on CroQS, although they remain rather far from human performance: with respect to the initial query, they improve recall on cluster specificity by more than 122% and representativeness mAP by more than 23%.
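To make the task setup more concrete, here is a minimal Python sketch of how a CroQS-style entry and retrieval-based scores might be represented. It is not the benchmark's official code or its exact metric definitions: the record layout (`BenchmarkEntry`, `Cluster`), the helper names, and the assumption that image and query embeddings are precomputed by some joint text-image encoder are all hypothetical. A suggested query ranks the images of the initial result set; average precision and recall@k are then computed treating the images of the group the suggestion was written for as relevant, as rough proxies for representativeness and cluster specificity. The same `cosine` helper could compare text embeddings of the initial and suggested queries to approximate similarity to the original query.

```python
import math
from dataclasses import dataclass


# Hypothetical record layout; not the official CroQS data format.
@dataclass
class Cluster:
    image_ids: list[str]   # images of one visually consistent group
    suggested_query: str   # human-defined suggestion for this group


@dataclass
class BenchmarkEntry:
    initial_query: str
    clusters: list[Cluster]  # grouped result set of the initial query


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_images(query_emb: list[float], image_embs: dict[str, list[float]]) -> list[str]:
    """Image ids sorted by decreasing cosine similarity to the query embedding."""
    return sorted(image_embs, key=lambda i: cosine(query_emb, image_embs[i]), reverse=True)


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """AP of a ranking, treating the target group's images as relevant."""
    hits, precision_sum = 0, 0.0
    for rank, image_id in enumerate(ranking, start=1):
        if image_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def recall_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the target group's images that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for a real text-image encoder (hypothetical).
    image_embs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
    entry = BenchmarkEntry(
        initial_query="a dog",
        clusters=[
            Cluster(["a", "b"], "a dog on the beach"),
            Cluster(["c", "d"], "a dog in the snow"),
        ],
    )
    suggestion_emb = [1.0, 0.05]  # assumed embedding of "a dog on the beach"
    relevant = set(entry.clusters[0].image_ids)
    ranking = rank_images(suggestion_emb, image_embs)
    print(ranking, average_precision(ranking, relevant), recall_at_k(ranking, relevant, 1))
```

With these toy embeddings, the suggestion ranks both images of its own group first, so its AP is 1.0 and its recall@1 is 0.5 (only one of the two relevant images fits in the top-1).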
@InProceedings{10.1007/978-3-031-88711-6_9,
author="Pacini, Giacomo and Carrara, Fabio and Messina, Nicola and Tonellotto, Nicola and Amato, Giuseppe and Falchi, Fabrizio",
editor="Hauff, Claudia and Macdonald, Craig and Jannach, Dietmar and Kazai, Gabriella and Nardini, Franco Maria and Pinelli, Fabio and Silvestri, Fabrizio and Tonellotto, Nicola",
title="Maybe You Are Looking for CroQS: Cross-Modal Query Suggestion for Text-to-Image Retrieval",
booktitle="Advances in Information Retrieval",
year="2025",
publisher="Springer Nature Switzerland",
address="Cham",
pages="138--152",
isbn="978-3-031-88711-6"
}
This work has received financial support from the project FAIR - Future Artificial Intelligence Research - Spoke 1 (PNRR M4C2 Inv. 1.3 PE00000013), funded by the European Union - Next Generation EU.
This work has received financial support from the European Union - Next Generation EU, Mission 4 Component 1, CUP B53D23026090001 (a MUltimedia platform for Content Enrichment and Search in audiovisual archives - MUCES PRIN 2022 PNRR P2022BW7CW).
This work has received financial support from the Spoke "FutureHPC & BigData" of the ICSC - Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing, funded by the Italian Government.
This work has received financial support from the FoReLab and CrossLab projects (Departments of Excellence) and the NEREO PRIN project (Research Grant no. 2022AEFHAZ), funded by the Italian Ministry of Education and Research (MUR).