What a crazy week in AI! 🚀
LocateAnything
CubePart
Opus 4.8
Step 3.7 Flash
MiniCPM5 1B
PhysX Omni
DeepSWE
TriSplat
Gamma World
PiD
GenRecon
AutoScientists
ControlLight
& more!
Watch the full recap:
https://t.co/lOFYyoBrXB
ByteDance drops an interesting paper, Representation Forcing.
Understanding and generation meet in a single representation space, learned end to end, with no VAE in between.
Looks like VAEs may not be necessary for future models.
https://t.co/U7yMBZTEGA
MDA is a new depth-estimation method from NVIDIA.
It removes flying points around object edges, glass, and sky with almost no extra overhead.
Code available
https://t.co/qkdbRDF44u
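What are "flying points"? Depth models often give pixels that straddle a foreground/background edge an in-between depth, so they float in mid-air once you lift the depth map into 3D. Here's a naive baseline filter that shows the problem. It is NOT NVIDIA's MDA method, and the function name and `rel_threshold` tolerance are hypothetical choices for this sketch:

```python
def flying_pixel_mask(depth, rel_threshold=0.1):
    """Flag pixels whose depth agrees with none of their 4-neighbors.

    Naive illustrative heuristic, not NVIDIA's MDA. `depth` is a 2D list of
    positive depths; `rel_threshold` is a hypothetical tolerance expressed
    as a fraction of the pixel's own depth.
    """
    h, w = len(depth), len(depth[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = depth[y][x]
            # Collect in-bounds up/down/left/right neighbors.
            neighbors = [
                depth[y + dy][x + dx]
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= y + dy < h and 0 <= x + dx < w
            ]
            # A pixel on a real surface has at least one neighbor at a similar depth.
            supported = any(abs(d - n) <= rel_threshold * d for n in neighbors)
            # Flying = has neighbors, but sits on none of their surfaces.
            mask[y][x] = bool(neighbors) and not supported
    return mask


# Foreground at depth 1, background at depth 5, one stray pixel at depth 3.
print(flying_pixel_mask([[1.0, 1.0, 3.0, 5.0, 5.0]]))
```

Flagging a pixel only when it disagrees with every neighbor keeps clean object boundaries intact, but this simple rule will also flag genuinely thin, one-pixel-wide structures.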
LaVR re-renders existing videos along new camera paths, keeping scenes geometrically consistent while avoiding distortions, hallucinated objects, and wrong camera motion.
https://t.co/V6nqy0Yd1n
VSTAT is a new video benchmark that tests whether AI models can continuously track changing visual states over time. The gap is striking: humans score about 90.5%, while the best current model reaches only 44.4%, exposing a huge weakness in today’s multimodal AI.
https://t.co/IBJrGGFNkt
Ideogram 4.0 is out and it's impressive! Top OPEN image model that rivals Nano Banana.
> Strong text rendering & layout control
> Multilingual text + 2K images
> Open weights, code available
https://t.co/KGaxnBPmah
Stability AI drops Stable-Layers, a new method for splitting images into editable layers, with cleaner separation and fewer artifacts.
https://t.co/c9rHbyyzO8
ByteDance drops an open-source Gemini Omni rival!!!
Bernini is a new AI video generation + editing framework.
> Edit videos with text prompts
> Image/video references
> Code available
https://t.co/rMKIBITUWW
NVIDIA's PiD is my new favorite upscaler. Accurate, fast, and incredibly detailed. Generates a 4K image in under 5s.
Full tutorial: https://t.co/mwwUs6tomh
Google & Meta release PaGeR, a new AI model for 360° scene geometry.
> Depth, metric depth, normals, & sky masks
> SOTA panoramic depth + normals
> Code available
https://t.co/PwoNneqCWa
Exciting news! NVIDIA RTX Spark brings RTX + AI to slim laptops and small desktops. Local agents can run on CUDA with large unified memory.
> Up to 1 petaflop of FP4 AI compute
> Up to 128GB unified memory
> 6,144 Blackwell RTX GPU cores
https://t.co/3pC2Y9ZEdl
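Quick back-of-envelope math on what "up to 128GB unified memory" could mean for local models. This sketch counts model weights only, treats 1GB as 10^9 bytes, and assumes FP4 means 4 bits per parameter. `max_params_billion` is a made-up helper, not an NVIDIA tool, and `overhead_fraction` is a hypothetical knob for reserving room for KV cache, activations, and the OS:

```python
def max_params_billion(memory_gb: float, bits_per_param: float,
                       overhead_fraction: float = 0.0) -> float:
    """Upper bound on model size (billions of params) whose weights fit.

    Hypothetical back-of-envelope helper. Assumes 1 GB = 1e9 bytes and
    counts weights only; overhead_fraction reserves part of memory for
    everything else (e.g. 0.25 keeps a quarter of memory free).
    """
    usable_bytes = memory_gb * 1e9 * (1.0 - overhead_fraction)
    bytes_per_param = bits_per_param / 8  # FP4 -> 0.5 bytes, FP16 -> 2 bytes
    return usable_bytes / bytes_per_param / 1e9


for name, bits in (("FP4", 4), ("FP8", 8), ("FP16", 16)):
    print(f"{name}: ~{max_params_billion(128, bits):.0f}B params (weights only)")
```

At FP4 that works out to roughly 256B parameters of weights at most, versus about 64B at FP16, before any overhead is reserved.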