Personal update: I am starting my PhD @mbzuai, where I look forward to working in the multimodal realm (interpretability, modality imbalance, eval & application) to address foundational gaps with @AlhamFikri and co.
The humbling lesson for humans from Alyosha: humans turned out to be much simpler than we thought. 90% of the time we’re just nearest-neighbor machines, pastiches from high-school reading lists 🙃 #cvpr2026
Introducing Cosmos 3: Our latest frontier model for Physical AI
Cosmos 3 is the world’s first fully open omnimodel with native vision reasoning, world and action generation.
Today we’re releasing Super (32B) and Nano (8B) variants.
"Learn from your own latents, not tokens: A Sample Complexity Theory"
This paper explains why data2vec and JEPA can learn with much less data.
They show that when data has a hidden hierarchy, token prediction becomes harder as the hierarchy gets deeper, while latent prediction keeps the learning problem simple at every level.
This suggests that models may learn faster when they stop predicting raw tokens and start predicting their own abstractions.
AI can give researchers the freedom to pursue “crazier” ideas.
For Terence Tao, AI creates more room to experiment, test unexpected paths, and discover what might otherwise stay out of reach.
'Agent Harness Engineering: A Survey' just cited my Agent Skills for Context Engineering project in its Context & Memory Management section.
It’s a new paper on OpenReview (authors from CMU, Yale, Johns Hopkins, Amazon + others). They reviewed 170+ open-source projects and pulled real production lessons from OpenAI, Anthropic, and LangChain.
Agent performance in the real world = Model capability + Harness quality
For long-horizon, multi-step, production tasks, the harness has become the main bottleneck. Simple harness tweaks (better tool formats, sandbox changes, automated verification loops) deliver significant gains on benchmarks.
This is the second time my open-source work has been cited in academic research (the first was Peking University’s State Key Lab paper on meta context engineering).
I’m genuinely proud of that, but more than anything it reminds me why I love open source. I’m not from academia. I learned this field by building, shipping, writing...
Open source lets your experiments enter the research papers. That is still one of the best parts of this field.
The paper is worth reading. We're moving from “build one agent” to “operate a fleet of long-running agents” and the paper repeatedly shows that the biggest improvements come from turning production traces into regression tests and automated harness fixes.
Paper & Repo: https://t.co/PAjqvOXedL
After submitting our culture mixing paper to CVPR (https://t.co/YWFLGl1BSp), we came across the ConfusedTourist paper, which shares the same motivation but offers a different and interesting analysis!
We’ve put together a joint website to share our findings. Check it out below!
Too much? Come try the samples in our hub!
You can copy our exact prompts and culture-mixed images to test where your VLM's understanding breaks down 🤖
[5/n]
If you are attending #CVPR2026 and looking for happy hour suggestions, check this out.
1) World Models & Drinks @reactorworld: https://t.co/1cZKxpJxYB
2) Researcher Reception @nvidia: https://t.co/OtXZxTz1td
3) Robotics & World Models: https://t.co/w9FufM5PYr
[Cont]
just realized that the days can get super busy during the conference and u can't just keep opting in to everything 😭
regardless, excited to bump into some of these!
#CVPR starts in one week 🚀
One thing that always frustrated me at CVPR was workshop/tutorial days.
Schedules are scattered across dozens of websites, and planning your day means opening 20 tabs.
So I built CVPR Workshop Radar 👇
@Kyriakos_Pelek balancing the training data is always best, but at post-training/test time, exercising a stronger perception module through improved prompting and/or an agentic, multi-phased approach to extract the information can also be the way!
I am going to #CVPR2026 to present 3 of my papers!
1. ConfusedTourists ✈️😵💫
Geographical object or background perturbations cause VLM accuracy to drop by up to 40% 🚨
2. M4-RAG 🌇⛏️
An 80k+ multimodal-multilingual-multicultural RAG hub. The bigger the agent... the accuracy does not always go up 🤔
3. Counting to 4 is still a Chore 👀🔢
VLMs still struggle with object counting. Can attention budgeting help them?
[1/4]