@maxbittker Big fan of this work, I eagerly await the benchmark update on every model release. I hear Fable has some strong vision capabilities to play games at the "vision" level as opposed to underlying env access, any plans to test that out? (Or, is that what this has always been!)
This week at #CVPR2026, NVIDIA Research is presenting three papers across physical AI that offer groundbreaking solutions for training at scale across diverse applications:
- GraspGen-X: the first foundation model for zero-shot grasping, trained on billions of simulated grasps
- LCDrive: a model that replaces expensive text-based reasoning with compact latent representations
- NitroGen: a generalized gameplay AI foundation model that harnesses NVIDIA Isaac GR00T to help train embodied agents
Learn more: https://t.co/H748YkAWS9
Can MLLMs actually track what's happening in a video?
Introducing VSTAT, our new benchmark for visual state tracking.
The tasks are simple: count cups, read typed words, count page flips. Humans solve them easily. MLLMs don't.
https://t.co/dgqhqeVuSv
🧵 [1/11]
Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude.
Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.
Humans can see in high-res, high-FPS in real-time. Why can't VLMs?
Introducing AutoGaze: ViTs/VLMs "gaze" only at key video regions! Up to 4-100x token savings, 19x speedup, and enables scaling to 4K-res 1K-frame videos.
https://t.co/GhbWZwMAg7
https://t.co/mEJ991MAIR
https://t.co/FOfc2QRThi
(1/n) 🧵
Expectation: the age of the IDE is over
Reality: we're going to need a bigger IDE
(imo).
It just looks very different because humans now move upwards and program at a higher level - the basic unit of interest is not one file but one agent. It's still programming.
sadly the agents do not want to loop forever. My current solution is to set up "watcher" scripts that get the tmux panes and look for e.g. "esc to interrupt", and send keys to whip if not present. Need an e.g.:
/fullauto you must continue your research!
(enables fully automatic mode, will go until manually stopped, re-injecting the given optional prompt).
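For anyone who wants to try the "watcher" idea, here is a minimal sketch in Python. This is not the author's script, just one way it could look under stated assumptions: agents run in tmux panes, a working agent shows the text "esc to interrupt", and an idle pane gets the prompt typed back in. The function names, the 30-second interval, and any pane target you pass in are hypothetical; only `tmux capture-pane -p` and `tmux send-keys` are real tmux commands.

```python
import subprocess
import time

# Assumption: a busy agent shows this text somewhere in its pane (from the post).
BUSY_MARKER = "esc to interrupt"
# Assumption: the prompt to re-inject, borrowed from the post's example.
NUDGE_PROMPT = "you must continue your research!"


def pane_text(target: str) -> str:
    """Return the visible contents of a tmux pane as plain text."""
    result = subprocess.run(
        ["tmux", "capture-pane", "-p", "-t", target],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def is_idle(text: str, marker: str = BUSY_MARKER) -> bool:
    """Treat a pane as idle when the busy marker is absent."""
    return marker not in text


def nudge(target: str, prompt: str = NUDGE_PROMPT) -> None:
    """Type the prompt into the pane literally, then press Enter."""
    subprocess.run(["tmux", "send-keys", "-t", target, "-l", prompt], check=True)
    subprocess.run(["tmux", "send-keys", "-t", target, "Enter"], check=True)


def watch(targets: list[str], interval: float = 30.0) -> None:
    """Poll each pane forever and nudge any that look idle."""
    while True:
        for target in targets:
            if is_idle(pane_text(target)):
                nudge(target)
        time.sleep(interval)
```

Calling `watch(["agents:0.0"])` (a made-up session:window.pane target) would loop until manually stopped. Matching on a UI string is brittle, which is presumably why a built-in mode would be nicer.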