2026

2025

LLM-as-a-Judge: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation

Note: Most of the content in this blog post was previously published on the Zalando Engineering Blog. [Update, March 2025] This paper was published at ECIR 2025 (European Conference on Information Retrieval). We introduce a novel approach to large-scale product retrieval evaluation using Multimodal Large Language Models (MLLMs). Evaluated on 20,000 examples, our method shows how MLLMs can help automate the relevance assessment of retrieved products, achieving accuracy comparable to that of human annotators and enabling scalable evaluation for high-traffic e-commerce platforms. ...

April 7, 2025 · Kasra Hosseini