So much discussion around "agents" is build on top of so many abstracted systems that it's easy to get lost and forget how these systems build on top of LLMs. I made a small project you can reference to make your own agents, claude-code type platform, etc. https://t.co/n9Bx9ZwN9d
IT'S TIME TO MAKE SOME CRAZY, CRAZYYYYYYY MONEYYYYYYY!!!!!!!
Crazy Taxi returns with a brand new entry due for 2027 on XBOX, PlayStation 5, Nintendo Switch 2 and PC
🚨 Gemini 3.5 Pro
Google is trying their best for these checkpoints
> Laziness is not solved for pro models
> Continue iteration will come , so we can expect better things
We will continue to test keep you guys updated: DevMode™
A quick first pass at a new iteration of the realtime diffusion pipeline: physical interface → camera → realtime img2img → upscale → 3DGS through the new TripoSplat.
Rough and held together with tape for now, but the whole loop runs end to end. Next:
- better input and camera's pov,
- splitting it across two machines to automate the hand-off,
- and adding in the Tripo3D API for higher-res models... the 3D printer is just waiting for that.
#diffusion #3DGS #TripoSplat
GPT-5.6 is exceptionally good at replicating designs from an image in code.
This is a 0-shot SVG output (!) from 5.6, with a 1-sentence prompt and no tools, alongside an image of an Xbox One controller.
🔥
Our first commercial TTS model was optimized for WER and SSIM because that’s what research had taught us over years to be the standard metrics. The first customer feedbacks we had unveiled the huge blind spots of these metrics, in particular on naturalness, rhythm, emphasis, question intonation, etc. Now our internal eval has dozens of criteria monitored on each model.
I’ve been capturing 3D human motion for 30 years and today is maybe the biggest day in that history. We are presenting MAMMA at CVPR (oral session 2A). MAMMA is a markerless multi-camera system that has accuracy similar to marker-based systems.
Wow. That's cool.
Researchers just released World, an open-source Unreal Engine 5 simulator for training and testing LLM and VLM agents in realistic 3D environments.
The platform supports RGB, depth, and segmentation sensors, along with navigation, vehicles, pedestrians, robots and procedural city generation.
It's built with a Gym-like Python interface, it allows AI agents to learn physical and social reasoning in complex virtual worlds before real world deployment.
We're moving from AI that only understands information to AI that can perceive, reason and act inside realistic simulated environments.
We’re going all in on World Models.
Today we’re launching the 1X World Model Lab.
The bet is simple:
You can’t fine-tune your way to AGI.
And you definitely can’t fine-tune your way to robots that can operate in the physical world.
General-purpose humanoids need models that understand space, motion, objects, causality, affordances, physics, and action before they ever see a specific task.
The frontier is not better VLA wrappers.
The frontier is embodied world models.
The 1X World Model Lab will focus on large-scale embodied world model pretraining: building the most generalizable foundation model for humanoid robots from the ground up.
The next frontier in AI requires scaling:
web-scale media + egocentric human videos + sim + dexterous remote operated robot data + on-policy NEO data → real-world deployment for robot data collection and RL → abundance of data → physical AI
The robot collects data.
The model gets better.
The robot gets better.
Repeat.
To lead this, we brought in one of the best for the mission: @_sam_sinha_ , as Head of World Models.
Sam was a founding research scientist at Luma AI and has been at the frontier of scaling multimodal generative video models his whole career.
If you’re the best in the world at large-scale pretraining, video models, robotics, RL, infra, or data — and you want your models to move atoms, not just pixels — join us.
Send background + evidence of exceptional ability to:
[email protected]
We’re building the model that makes autonomous labor real.
Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B — the #1 trending model on HuggingFace — as its eyes, and the two small models get it done together.
(The test: place each element at the right pixel position on a blank form image, not type into a field.)
Setup:
> Qwen is the brain (main model), LocateAnything is the eyes (helper model acting as a tool).
> I gave Qwen a new tool: ask "where's the email field?" and LocateAnything returns the exact x, y, width, height.
> The blue boxes on the screen are its detections. Look how tight they are — it nails every field.
Result:
> Qwen3.6 35B A3B + LocateAnything-3B: form completed, all info correct.
> Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code: all landed in the right field areas.
> Character-box alignment still a touch loose, but every value is where it belongs.
> 9m10s, 224.5k input, 24.3k output, 21 turns.
Why it matters:
> Qwen alone can't finish this test. Bolt on a 3B model that does exactly one thing > locate > and suddenly it can.
> A combination of small models can do the work of a single large one.
Building AI that Builds AI: Introducing the Sakana AI RSI Lab 🚀
https://t.co/AskX3J5oEJ
Today, we are announcing the Sakana AI Recursive Self-Improvement (RSI) Lab: a dedicated research group in Tokyo tasked with redesigning the AI development process itself using AI.
While the industry increasingly speculates about the theoretical potential of self-improving AI, we’ve spent the last two years actively laying the foundations to make it a reality:
▪ LLM²: AI models automating research to invent better preference optimization algorithms.
▪ Darwin Gödel Machine: Agents autonomously rewriting their own codebase to double software-engineering performance.
▪ ShinkaEvolve: Hyper-sample-efficient program evolution that builds novel loss functions for MoE models.
▪ ALE-Agent: Reinforcement agents outperforming hundreds of human experts via self-learning.
▪ Digital Red Queen: Open-ended adversarial coevolution laying the groundwork for RSI in cybersecurity.
▪ The AI Scientist: Towards end-to-end automation of AI research, recently published in Nature.
Now, we are unifying these breakthroughs. The Sakana AI RSI Lab is officially tasked with building open-ended, adaptive architectures that collectively self-improve.
Human intelligence did not emerge from limitless resources; it was forged through the open-ended, compounding process of evolution operating under strict constraints. We are applying this exact principle to AI.
We believe recursive self-improvement is achievable on modest, sample-efficient compute. It shouldn’t be a winner-take-all asset locked inside hyperscale clusters, but a democratized public good.
We’re scaling our team to execute this mission. We are looking for frontier scientists and engineers who are entirely unsatisfied with the brute-force status quo. If you are ready to break away from standard benchmarking and build the self-improving future in Japan, come build with us.
Today, I'm incredibly excited to share my longest piece to date. 50 Crowns.
It's no secret that I have a deep love for the gaming industry. I believe non-linear storytelling is where entertainment is heading and with this piece, I wanted to see just how far we can push today's technology in the realm of cutscenes.
Enjoy!