🚀 Our community-led ML Agents group is kicking off a new collaborative project to build a Street Navigation Agent for more inclusive, region-aware local search. In many parts of the world, businesses exist physically — but not digitally.
We're exploring how AI can use tools like Google Street View to read storefront signs, apply distance & category constraints, and reason step-by-step to identify real-world services.
We’re also building a global benchmark across countries and languages to evaluate visual verification.
Interested in contributing?
✨ Beginners welcome — no hard requirements
✨ Familiarity with VLMs is helpful for evaluation
✨ Experience with agentic workflows and PyTorch is a plus
Learn more and get involved today: https://t.co/oOVlxQhTm7
Many thanks to our community leads @_1024_m, @ankanpy, @SovitRath5 and @jebish7 for leading this initiative!
Cohere Labs x ICLR 2026: Kaleidoscope
A multilingual multimodal benchmark with exam-style questions written directly in 18 languages (not translated from English).
@universeinanegg Thanks for bringing this to my feed. Went through a few threads. My biggest takeaway is that we are really lacking in how we evaluate models. Really fascinating idea; it also kind of shows how agents operate in social settings.
@universeinanegg Tried this with Claude and GPT, and it's the same. Gemini 3 Pro did give two 3s (after thinking a lot). Still, for all of them, the first two numbers are 3 followed by 1, irrespective of temperature. Models seem to love 3.
Kaleidoscope has been accepted at ICLR 🔥. This is the first massive multimodal multilingual benchmark of its kind. Congratulations to everyone @Cohere_Labs 🎊
Congrats to everyone involved in Kaleidoscope, a cross-institutional collaboration accepted to ICLR 2026 🔥
A special shoutout to @mziizm who championed this collaboration from day 1. It is the first accepted paper for many of the collaborators, who are first-time authors.
Many researchers join our community seeking mentorship, support, and a roadmap as they embark on their journeys.
@_1024_m and @jebish7 did just this. Now, just 2 years later, they are creating these pathways for others, opening doors, and leading the way.
🔮 Can a world model (simulator) give today’s AI agents foresight? We tested “world model as a tool”… and found it often doesn’t help—sometimes it hurts.
Check our newest paper here: https://t.co/nujSGeHKMx
#AIagents #WorldModel #ToolUse
@PingbangHu There was sudden score inflation after the leak, so this was the only realistic way to fix it. They could have rolled everything back to just before the leak, but that would’ve been unfair to people whose reviewers hadn’t responded yet.