Congrats to @GoogleDeepMind on the launch of DiffusionGemma.
The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark, and 1,000+ TPS on a single H100.
We're supporting it from day one with:
• BF16 and NVFP4 checkpoints on @huggingface 🤗
• Free GPU-accelerated endpoints on https://t.co/6T0R9P7EXS
• @vllm_project support with FP8 precision
Get started with DiffusionGemma on NVIDIA: https://t.co/vurk7GCQUs
Huge congrats to the team. D4RT is a team effort, and all the authors have worked very hard on it over the past year. Very well deserved. Thank you to the Award Committee members for the recognition.
3D scene reconstructions by NVIDIA.
ArtiFixer - repairs artifacts and extends sparse views via Wan 2.1.
- high-fidelity inpainting in occluded regions
- generates hundreds of consistent frames in a single pass
- 3D Gaussian Splatting for navigable scene reconstruction
Makes the 3D environment look photorealistic and fully navigable for VR/AR. It basically turns a broken 3D model into a polished, professional scene.
https://t.co/weOQcfXleO
Today we're introducing Gemma 4 12B, our latest open model that brings advanced agentic reasoning, vision and audio directly to your laptop.
It delivers performance nearing our larger Gemma models with a much smaller total memory footprint, while being small enough to run locally with just 16GB of VRAM. It's open and accessible for everyone to use under a permissive Apache 2.0 license.
This is all made possible by our new, unified architecture that removes separate multimodal encoders. Here's how we did it 🧵
OpenJarvis: a local-first personal AI is now available to run with Ollama
Built by Stanford's @HazyResearch and Scaling Intelligence labs, as part of their "Intelligence Per Watt" research into efficient local AI. @Stanford
Learn more in the blog post 👇👇👇
LLM Wiki v0.4.16 just made knowledge graphs feel insanely fast. 🤯
Huge rendering upgrades mean you can now explore massive AI knowledge maps without the lag, freezes, or clutter.
Search flows smoother. Navigation feels instant.
This is starting to look less like a wiki… and more like a second brain.
Repo 👇
Feed-forward 3D reconstruction methods typically predict pointmaps in camera-centric frames. But why should a camera's arbitrary orientation define the coordinate system?
We introduce G3T, a transformer that predicts pointmaps in gravity-aligned frames. Regardless of input image orientation, our method always produces upright pointmaps (see demo).
We leverage this uprightness to create G3T-Long, a submap-based reconstruction method that improves robustness on long-sequence 3D reconstruction (more on that below).
Interactive demos, code, and model weights are available on our project page.
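To make "gravity-aligned frame" concrete, here is a minimal sketch of the geometric idea only. It is not G3T's method (the post says the transformer predicts upright pointmaps directly). It assumes a gravity direction in the camera frame is already known, for example from an IMU, and that "down" is the -Z axis of the upright frame; both are illustrative assumptions, and all function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("zero-length vector has no direction")
    return tuple(x / n for x in v)

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def matvec(m, v):
    return tuple(dot(row, v) for row in m)

def rotation_aligning(src, dst):
    """Return a 3x3 rotation matrix that rotates direction src onto direction dst."""
    a, b = normalize(src), normalize(dst)
    c = dot(a, b)
    if c < -1.0 + 1e-9:
        # Antiparallel case: rotate 180 degrees about any axis perpendicular to a.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        k = normalize(cross(a, helper))
        return tuple(tuple(2.0 * k[i] * k[j] - (1.0 if i == j else 0.0)
                           for j in range(3)) for i in range(3))
    # Rodrigues form: R = I + K + K^2 / (1 + c), with K the skew matrix of a x b.
    v = cross(a, b)
    K = ((0.0, -v[2], v[1]), (v[2], 0.0, -v[0]), (-v[1], v[0], 0.0))
    K2 = matmul(K, K)
    s = 1.0 / (1.0 + c)
    return tuple(tuple((1.0 if i == j else 0.0) + K[i][j] + s * K2[i][j]
                       for j in range(3)) for i in range(3))

def to_gravity_aligned(points, gravity_in_camera, down=(0.0, 0.0, -1.0)):
    """Rotate camera-frame 3D points so the given gravity direction maps to `down`."""
    R = rotation_aligning(gravity_in_camera, down)
    return [matvec(R, p) for p in points]

# A camera rolled 90 degrees sees gravity along its +X axis; a point 2 m along
# gravity should end up 2 m straight down in the upright frame.
upright = to_gravity_aligned([(2.0, 0.0, 0.0)], gravity_in_camera=(1.0, 0.0, 0.0))
print(upright)
```

Aligning to gravity fixes only roll and pitch; the rotation about the vertical axis (yaw) stays arbitrary, so two upright pointmaps can still differ by a heading.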
This #CVPR2026 paper from our research team is trending #1 on @HuggingFace 🤗
Meet LocateAnything: a vision-language detection model that rethinks bounding box prediction. For AI agents and robots, "seeing" is only useful if a model can pinpoint where something is fast enough to act.
Trained on 138M high-quality samples, LocateAnything decodes bounding boxes in parallel instead of one coordinate at a time, improving localization accuracy while dramatically increasing throughput for visual grounding and detection.
Project page: https://t.co/O7JMe8tzFM
🚨 NotebookLM + Google Antigravity might be the most underrated AI combo right now.
Almost no one is using it…
But the people who are? They're getting a massive edge.
This setup can help you:
✅ Learn faster
✅ Research smarter
✅ Turn ideas into polished content in minutes
And it takes less than 2 minutes to set up.
Here's exactly how to use it + what it can do 👇🧵
🚨 NotebookLM + Google Antigravity is one of the most powerful combos available right now, and almost no one is using it.
If you're not taking advantage of this, you're missing out on serious leverage.
Here's how to set it up in 2 minutes + what it can do 👇
PanoWorld.
An interesting way to use Qwen-Edit. It converts 2D floor plans into photorealistic, consistent VR home tours.
Great for real estate and interior designers. It lets you walk through a home that hasn't been built or furnished yet.
Ensures seamless 360 views via CPRoPE
https://t.co/rWDnjaqE5k
Persistent memory was step one.
Now comes always-on, asynchronous, event-driven agents.
The hard part isn't intelligence anymore; it's reliability, permissions, coordination, and knowing when not to act.
We are excited to release the code for our paper OpenVO: Open-World Visual Odometry with Temporal Dynamics Awareness, accepted to #CVPR2026.
Source Code: https://t.co/na6fLrmzE8
From a dashcam video, OpenVO estimates the motion of the vehicle's camera at metric scale.
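As a rough illustration of what metric-scale odometry output is good for, the sketch below chains per-frame motion estimates into a 2D trajectory in meters. It is not OpenVO's algorithm or API: the input format (forward distance in meters plus yaw change in radians per frame) and the function name are assumptions made for this example, and real visual odometry works with full 6-DoF poses.

```python
import math

def integrate_trajectory(steps, start=(0.0, 0.0, 0.0)):
    """Chain per-frame planar motions into a list of (x_m, y_m, heading_rad) poses.

    steps: iterable of (forward_m, yaw_change_rad), one entry per frame pair.
    The yaw change is applied first, then the forward move along the new heading.
    """
    x, y, heading = start
    poses = [(x, y, heading)]
    for forward_m, yaw_change_rad in steps:
        heading += yaw_change_rad
        x += forward_m * math.cos(heading)
        y += forward_m * math.sin(heading)
        poses.append((x, y, heading))
    return poses

# Drive 1 m straight, then turn left 90 degrees and drive 1 m more.
path = integrate_trajectory([(1.0, 0.0), (1.0, math.pi / 2)])
print(path[-1])
```

Because each pose builds on the previous one, small per-frame errors accumulate as drift, which is why getting the scale right matters for long drives.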
🚨 Google just released its official skills for AI agents.
It has published 13 skills compatible with Claude Code, Cursor, Copilot, and other agents.
They let agents carry out advanced tasks and automate complex workflows.
It's free and open source 👇