Retrieval that actually works
A casual, high-signal ACL 2026 session for researchers thinking about video retrieval, multimodal retrieval, and efficient video understanding, from zero-shot systems to scalable search over millions of videos without melting GPUs.
Video is everywhere, but retrieving the right video moment remains expensive, messy, and surprisingly unsolved. This session is for people who want systems that are not only accurate, but also scalable, interactive, and deployable.
Zero-shot, few-shot, and instruction-following retrieval systems that can handle diverse video collections and complex user intent.
Memory-efficient search, compressed representations, smarter candidate filtering, and practical ways to avoid brute-force everything.
Informal discussion, idea sharing, open problems, and probably too many opinions about embeddings, rerankers, and compute budgets.
Bring your papers, failed experiments, weird benchmark observations, and half-formed ideas. Especially the half-formed ideas.
The session is designed to be lightweight and discussion-heavy. Times will be confirmed once the ACL schedule is finalized.
A quick framing of current video retrieval problems: queries, videos, moments, events, captions, embeddings, and where current systems break.
Short informal contributions from attendees: recent work, promising directions, painful bottlenecks, and open questions.
How do we retrieve from huge video databases? How do we reduce annotation burden? When should we use rerankers? What should future benchmarks measure?
Wrap up with promising problems, shared resources, and people who should probably talk to each other after the session.
Come for the retrieval. Stay for the embeddings. Leave with at least one dangerously good research idea.
La Jolla. This session is in person only.
Researchers working on video retrieval, multimodal retrieval, efficient video understanding, representation learning, reranking, or scalable search.
Questions, opinions, open problems, negative results, benchmark frustrations, and ideas that are not fully baked yet.