Vision-language AI models have a gaze. And you can steer it! 👀
Redirect just 9% of a model’s attention heads to any region in an image, and the VLM will start describing that region mid-generation. We call them Gaze Heads!
Try the demo: https://t.co/y5jlb0iBI8 🧵👇
ETH Zurich just open-sourced their entire 2026 robot learning course.
Not a MOOC. The actual course. Slides, lecture recordings, coding assignments, GitHub repo.
The curriculum goes from imitation learning and RL all the way to Vision-Language-Action models and foundation models for robotics.
Guest lectures from the co-founder of Physical Intelligence. The creator of Diffusion Policy. Pieter Abbeel. Dieter Fox.
12 weeks. Free. No signup.
If you want to understand where robot intelligence is actually heading… this is the reading list the field is using right now.
📍 https://t.co/eKsIjILi60
Run Gemma 4 26b MTP on 8 GB VRAM GPUs at 25+ tokens/second. Flags included!
local llm space is moving at terminal velocity. only 3 days ago google released gemma 4 26b a4b qat quants. more efficient than before, ran on 8gb vram at 20 tok/sec.
and now just a few hours ago, mainline llama.cpp merged a massive update and we just shattered our own record. decode throughput went up 25-40% on the same 8 GB VRAM setup!
Before MTP: 20 tps -> After MTP: 28 tps!
llama.cpp just officially merged PR #23398 ("add Gemma4 MTP"), bringing native Multi-Token Prediction (MTP) support to Gemma 4 models.
By running speculative drafting on the same 8GB VRAM RTX 4060 setup, my decode throughput on a 64k context instantly leaped to a blistering 25–27 tokens/sec. That's a 25-35% increase over the 20 tokens/sec baseline on the same hardware.
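If you want to sanity-check the percentages, here is a tiny sketch of the arithmetic. The 20 tps baseline and the 25, 27 and 28 tps results are the numbers from this post; the helper name `speedup_pct` is made up for illustration.

```python
def speedup_pct(before_tps: float, after_tps: float) -> float:
    """Percent increase in decode throughput from before_tps to after_tps."""
    if before_tps <= 0:
        raise ValueError("before_tps must be positive")
    return (after_tps - before_tps) / before_tps * 100.0


if __name__ == "__main__":
    baseline = 20.0  # tokens/sec before MTP, as reported in this post
    for after in (25.0, 27.0, 28.0):  # tokens/sec with MTP, as reported in this post
        pct = speedup_pct(baseline, after)
        print(f"{baseline:.0f} -> {after:.0f} tps: +{pct:.0f}%")
```

Swap in your own before/after numbers to see what MTP buys you on your setup.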
Here is the architectural catch you need to know: Unlike the Qwen 3.5 and 3.6 series, which bake the MTP heads directly into the base GGUF, the Gemma 4 MTP head is not built in.
You must download a separate, specialized MTP drafter GGUF (the assistant model) to act as the speculator. (I've dropped the download link in the replies).
copy and try the exact flags:
-m gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf --spec-type draft-mtp --spec-draft-n-max 6 --spec-draft-p-min 0.7 --spec-draft-model gemma-4-26b-A4B-it-assistant-Q4_0.gguf -c 64000 -v
n-max 4 and p-min 0.7 is also worth checking out. benchmark on your setup and workflow.
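A rough sketch for lining up those variants before you benchmark. It only builds and prints the command lines, it does not run anything. The flags and model file names are copied from the command above; the binary name `llama-server` and the helpers `build_command` and `sweep` are assumptions for illustration, so swap in whichever llama.cpp binary you actually use.

```python
import shlex

# Model file names copied from the command in this post.
MAIN_MODEL = "gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf"
DRAFT_MODEL = "gemma-4-26b-A4B-it-assistant-Q4_0.gguf"


def build_command(n_max: int, p_min: float, binary: str = "llama-server") -> list[str]:
    """Build one argument list. `binary` is an assumption, not from the post."""
    return [
        binary,
        "-m", MAIN_MODEL,
        "--spec-type", "draft-mtp",
        "--spec-draft-n-max", str(n_max),
        "--spec-draft-p-min", str(p_min),
        "--spec-draft-model", DRAFT_MODEL,
        "-c", "64000",
        "-v",
    ]


def sweep(n_max_values, p_min_values):
    """Return one command per (n_max, p_min) combination."""
    return [build_command(n, p) for n in n_max_values for p in p_min_values]


if __name__ == "__main__":
    # The two settings mentioned in this post: n-max 6 and n-max 4, both with p-min 0.7.
    for cmd in sweep([6, 4], [0.7]):
        print(shlex.join(cmd))
```

Run each printed command with the same prompt and context, note the tokens/sec, and keep whichever setting wins on your hardware.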
if you have a single 8 gb vram nvidia rtx 4060, 3060, 3070, 2080, 2070, grab the MTP drafter GGUF link in the replies and try it yourself.
Check it out even if you have a smaller or larger gpu, such as a single rtx 3090, 4090, 3060, 2060.
MTP works for all gemma 4 sizes such as gemma 4 12b, gemma 4 31b etc. but remember to grab the matching MTP drafter (assistant) model for each size.
what are you benchmarking today?
NVIDIA's LocateAnything is a new vision model for grounding and detection. Very performant and accurate!
> 10x faster than Qwen3-VL
> 138M queries + 785M boxes
> GUI, OCR, docs, dense detection
> Free & open source
https://t.co/UvkH8l0QRb