Wan Streamer is a real-time interactive model that listens, sees, speaks, and replies with synchronized video.
It runs at 25 fps with ~200 ms latency, making audio-video agents feel closer to live conversation.
https://t.co/MDqfe2bwe2
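A quick back-of-the-envelope sketch of what those two numbers imply together. The only inputs are the quoted 25 fps and ~200 ms; the variable names are illustrative, and the latency figure is approximate in the post.

```python
# Inputs quoted in the post; the latency is stated as approximate (~200 ms).
FPS = 25
LATENCY_MS = 200

# Time between consecutive video frames at the stated frame rate.
frame_interval_ms = 1000 / FPS

# How many frame intervals fit inside the stated latency.
frames_of_latency = LATENCY_MS / frame_interval_ms

print(f"frame interval: {frame_interval_ms:.0f} ms")
print(f"latency spans about {frames_of_latency:.0f} frames")
```

In other words, the stated latency is on the order of five frames of video at the stated frame rate.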
Introducing InfiniteDiffusion, my independent paper accepted to #SIGGRAPH2026!
I have one RTX 3090 Ti. No funding, advisors, or team. By day I'm a new grad SWE at Walmart.
The paper has two main contributions:
- InfiniteDiffusion: a new approach to infinite generation with diffusion models.
- Terrain Diffusion: the world’s first learned procedural terrain generator.
Here’s why this matters, and how they are connected. 🧵
I'm blown away.
This AI filmmaking workflow for precise camera control, multiple characters, and dialogue is insane:
1. Generate a start frame in Midjourney
2. Match the poses in Blender, animate the camera
3. Feed both to Seedance
I didn't think this would work. Two consistent characters, solid performances, the move tracked perfectly through the entire beat.
Even the soup looks great.
🚀 Introducing HappyHorse 1.1 — now officially live on Alibaba Cloud Model Studio!
All HappyHorse 1.1 capabilities are available via API, providing enterprise customers and developers with a complete integration solution. This release delivers production-ready video synthesis systematically optimized across core content generation scenarios.
🔥 Launch Promotion: Enjoy a 40% OFF sitewide discount for the first 2 weeks! Optimize your integration costs today.
We've kept hearing how GLM-5.2 beats Opus 4.8, and are skeptical of benchmarks - so we tested them on a real bug from the Cline repo. While both models fixed the issue, GLM was the winner in terms of cost and code quality:
- GLM used about 1.7x as many tokens (GLM 1.1M vs Opus 660K) but cost about half as much (GLM $0.41 vs Opus $0.81)
- Opus finished quicker - 1.6 min and 12 tool calls vs GLM 4.7 min and 28 tool calls
- GLM cleaned up dead code and verified the build compiled before completing. Opus didn't - it left type errors that passed tests but broke the production build.
Both runs used the same Cline harness prompting and tools, so it seems GLM is RL trained to spend more tokens verifying its work before completing. Impressive work by the @Zai_org team!
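For anyone who wants to check the comparison, here is a minimal sketch that puts the quoted figures side by side. It uses only the numbers in this post; the dictionary layout and helper names are illustrative, and the per-million-token figure is a blended rate implied by the totals, not a published price.

```python
# Figures quoted in the post; nothing here is measured independently.
runs = {
    "GLM": {"tokens": 1_100_000, "cost_usd": 0.41, "minutes": 4.7, "tool_calls": 28},
    "Opus": {"tokens": 660_000, "cost_usd": 0.81, "minutes": 1.6, "tool_calls": 12},
}


def ratio(metric: str) -> float:
    """GLM's value divided by Opus's value for one metric."""
    return runs["GLM"][metric] / runs["Opus"][metric]


def cost_per_million_tokens(model: str) -> float:
    """Blended dollars per million tokens implied by the quoted totals."""
    run = runs[model]
    return run["cost_usd"] / run["tokens"] * 1_000_000


for metric in ("tokens", "cost_usd", "minutes", "tool_calls"):
    print(f"{metric}: GLM/Opus = {ratio(metric):.2f}")

for model in runs:
    print(f"{model}: ${cost_per_million_tokens(model):.2f} per 1M tokens (blended)")
```

On these numbers GLM spends roughly 1.7x the tokens for about half the total cost, which implies a blended per-token rate around a third of Opus's for this run.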
Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API.
Our ‘Fugu Ultra’ model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls.
Try it: https://t.co/hhO6qTawgb 🐡
🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced!
🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite.
🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6.
🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates.
⚡️ 6x High-Speed Mode coming soon!
🔌 Available today via Kimi API and Kimi Code.
🔗 Kimi Code: https://t.co/uvoSJKyGCY
🔗 API: https://t.co/EOZkbOwCN4
We’re teaming up with @Palmeiras, the first football club to meaningfully build upon TacticAI: our AI system that can help simulate field scenarios and predict open play dynamics up to 8 seconds in advance. ⚽
Here’s a system prompt you can use inside a ChatGPT or Claude project.
The main idea is simple:
You feed it a basic idea, a rough image prompt, a scene description, or even an uploaded image, and it enhances it and returns 10 cinematic prompts, each exploring a different composition and camera angle.
For example, I used:
“a gladiator riding a horse on a mountainside”
And these are the results!
It's a great way to explore different visual languages and discover interesting compositions.
System prompt below. 👇
Today, we’re launching Reve 2.0, the best 4K image model in the world.
We invented a new way to generate and edit any image using precise layouts. For the first time, it’s possible to create images you can touch.
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇
Building apps has never been easier.
With Sites, Codex can turn your work, ideas, and plans into an interactive website or app your team can explore, use, and share with a URL.
Rolling out to Business and Enterprise plans, before expanding more broadly.
Seven new models launching at Build: let’s go!
Reasoning. Code. Image. Transcribe. Voice.
Built from scratch on a clean data lineage, designed for efficiency, working seamlessly as a family of models
Thread 🧵
#MSBuild
1/ Introducing GPIC: a Giant Permissive Image Corpus and benchmark for visual generation!
🚀100M VLM-captioned image-text pairs for training
📊1M image-text pairs for benchmarking
🖼️~28 trillion pixels
🤗Centrally Hosted
✅Fully permissive for research + commercial use
Dataset, benchmark and models🧵👇
Co-led with @KyleSargentAI
It’s never been easier to design your dream house.
Draw a shape. Define your rooms. Set your constraints.
@DraftedAI generates complete floor plans, elevations, and 3D home designs in seconds.
Over the last month, 120,000 people generated 325,000+ home designs with https://t.co/XqC0LP5n3y.