Sharing my CVPR 2026 talk from the Vision for Intelligent Task Assistants workshop: "From Perception to Agency: The Cognitive Stack for Video Task Assistants."
It covers our SVI-Bench project (https://t.co/BAtXqeU5oY) plus a video+robotics project we'll release soon.
w/ @YuluPan_00 @mmiemon @Han_Yi_724 @mars_su0311 @baiqil0203
https://t.co/9hpF28ZpUB
We’re teaming up with @Palmeiras, the first football club to meaningfully build upon TacticAI: our AI system that can help simulate field scenarios and predict open play dynamics up to 8 seconds in advance. ⚽
In the second before a play develops, a basketball player can instantly recognize the defensive scheme (perception), anticipate how the defense will rotate (causal reasoning), simulate several possible outcomes (simulation), and choose the best move (decision).
Today's video AI is far from this. These models can describe what they see, but they cannot explain why something happened, predict what comes next, or decide how to respond. We introduce SVI-Bench to measure these capabilities, and to push toward models that can reason over real-world, multi-agent video.
🚀 Introducing ExAct: A Video-Language Benchmark for Expert Action Analysis
🎥 3,521 expert-curated video QA pairs in 6 domains (Sports, Bike Repair, Cooking, Health, Music, and Dance).
🧠 GPT‑4o scores 44.70% vs human experts at 82.02%—a huge gap!
📄 Paper: https://t.co/rqcILZ6SGY
Let’s build models that truly have expert-level understanding. Work done with amazing collaborators:
@YuluPan_00 @gberta227
Paper: https://t.co/qGNUjFCbjS
Project Page: https://t.co/HVQvAMQEA5
6️⃣ Real-World Impact
🤖 Goal: Build AI systems that support real coaching and feedback
🎯 From video understanding ➡️ actionable skill guidance
🌍 We hope ExAct inspires progress toward expert-level AI