What a weekend. Around 30 teams showed up to build on Laguna XS.2, and the bar was very, very high.
Winners below 🏆
1st: Overthinking Machines Labs
@emilfristed
Pseudo-full-duplex with text-only models through dialogue modeling with silence tokens.
https://t.co/rP4BZWrtrz
2nd: Coding Kernels by the Pool
Charlie Masters, Evan O’Leary, Jessica Mak
Laguna-Dense: a ~3B fully dense distillation of Laguna XS.2 for generating CUDA kernels from PyTorch.
https://t.co/OLmGezfGqF
3rd: attnvq
@alaradirik
Attention-aware product vector quantization of KV caches.
https://t.co/SwXmcIEOhn
Honorable mention: Laguna Vision
Aaron Kazah @aaronkazah
A SigLIP vision encoder + resampler + LoRA adapters, trained on 300k examples to give Laguna XS.2 a native visual input path.
https://t.co/dgrtgI7eoj
Huge congrats to the winners, and thank you to everyone who hacked, demoed, judged, helped, and pushed Laguna XS.2 in directions we would not have found on our own!
@nvidia @PrimeIntellect @adaption_ai @huggingface
Love seeing the work @RedHat_AI and @vllm_project are doing to make Laguna XS.2 easier to run.
Red Hat AI trained a DFlash speculator: a 0.6B drafter that predicts 8 tokens per pass, with Laguna verifying the output.
So builders get faster generation without changing output quality.
With vLLM support and FP8/NVFP4/INT4 checkpoints through LLM Compressor, it’s also easier to tune for different latency, memory, and hardware constraints.
Grateful for the team building the infra that makes open models easier to use, serve, and improve!
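For readers new to speculative decoding, here is a minimal sketch of the drafter-and-verifier loop described above, under greedy decoding. Everything in it is an assumption for illustration: `target_next` and `draft_block` are toy stand-ins, not Laguna or the DFlash speculator, and the sketch does not try to reproduce how DFlash actually drafts its block. It only shows why verification by the target keeps the output identical to what the target would have produced alone.

```python
from typing import Callable, List

# Hypothetical interfaces, for illustration only.
NextToken = Callable[[List[int]], int]             # target: greedy next token
DraftBlock = Callable[[List[int], int], List[int]]  # drafter: k tokens per pass


def speculative_generate(target: NextToken, drafter: DraftBlock,
                         prompt: List[int], max_new: int, k: int = 8) -> List[int]:
    """Greedy speculative decoding: accept drafted tokens while they match the target."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        draft = drafter(out, k)  # one drafter pass proposes k tokens
        accepted: List[int] = []
        # A real system scores all k positions in one batched target pass;
        # this toy calls the target once per position for clarity.
        for tok in draft:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch, drop the rest
                break
        out.extend(accepted)
    return out[: len(prompt) + max_new]


def greedy_generate(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain greedy decoding with the target alone, used as the reference."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def target_next(ctx: List[int]) -> int:
    """Toy 'large model': a deterministic rule over the last token."""
    return (ctx[-1] * 7 + 3) % 50


def draft_block(ctx: List[int], k: int) -> List[int]:
    """Toy 'drafter': usually agrees with the target, with deliberate mistakes."""
    cur, block = list(ctx), []
    for _ in range(k):
        tok = target_next(cur)
        if len(cur) % 5 == 0:
            tok = (tok + 1) % 50  # injected drafter error
        block.append(tok)
        cur.append(tok)
    return block


if __name__ == "__main__":
    prompt = [1]
    fast = speculative_generate(target_next, draft_block, prompt, max_new=20)
    slow = greedy_generate(target_next, prompt, max_new=20)
    print(fast == slow)
```

The speedup comes from how many drafted tokens are accepted per target pass. Drafter mistakes cost time, not correctness, because every emitted token is one the target itself would have chosen.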
Laguna XS.2 from @poolsideai is a 33B MoE built for agentic coding.
Red Hat AI trained a DFlash speculator for it: 0.6B drafter, 8 tokens per pass, no quality loss.
FP8, NVFP4, and INT4 checkpoints via LLM Compressor.
Models in comments. Speedup with @vllm_project:
Super comprehensive writeup that covers many frameworks & case studies on async RL. I learned a lot from the discussion of adding bias to the objective and how techniques that introduce bias (e.g., TIS + CISPO) help stabilize smaller batches but scale more poorly.
The level of the @poolsideai hackathon in London was higher than the average in SF.
We tried distilling Laguna XS.2 into a dense model. A ~11x parameter reduction from 33B to 3B.
https://t.co/9bwmzRr2kc
Many thanks to the organizers @poolsideai @eloquake @Badiaserra
this week I was at the @poolsideai talk hosted by @CrusoeAI and heard @varunrandery discuss what he calls the "agent API."
tl;dr: we stop sending text and getting text back, and start sending a unit of work and getting the finished result back. technically it's clean, and I liked it.
then I thought about it longer and started to wonder what's left for the rest of us to build, and what it means for SaaS more broadly.
wrote it up: https://t.co/q2qpbGJRzg
Loving the @latentspacepod breakdown of our Laguna M.1/XS.2 Technical Report! The Latent Space paper club just did a deep dive, and their takeaways perfectly capture what we set out to build with our Model Factory. A few quotes from the video 🧵👇 (1/6)
https://t.co/GojK1nTzK6
If you’re in London tonight, come hear @varunrandery at @CrusoeAI Talks!
He’ll unpack harnesses, agent APIs, and the blurring boundaries between models, tools, and products.
Turned into a busy Thursday today!!!
you'll catch me in sunny Hyde Park at the @DesciLondon picnic.
But you can get technical with @CrusoeAI and @nvidia & @poolsideai before their hack this weekend
or watch some cool demos from @LynettaWang126 & @Tomasmrky at @join_ef
or dive into the weeds of quantum AI with @SiriusQuantum at @encodeclub (rumour has it there might be a members rooftop party as well 👀)
find them all on londoncalling [dot] guide
Honored to be included in @Redpoint 2026 InfraRed 100, recognizing the companies shaping the future of infrastructure and AI.
Congratulations to all the companies featured this year!
The Redpoint InfraRed 100 is now live.
These are the companies building the infrastructure that powers everything happening in AI right now, from world models and agent runtimes to the sandboxes, databases, and security tools agents depend on.
Congratulations to this year's honorees!
Read the full 2026 InfraRed Report: our state of the union on AI and cloud infrastructure 👉 https://t.co/Y1y94ZwI5B
Today we’re publishing the technical report behind Laguna M.1 and Laguna XS.2.
This report opens up more of what went into them: Model Factory, pre-training data, distributed training, post-training, agent RL, quantization, and evaluation.
https://t.co/RWk2F9IrAI
Laguna M.1 and XS.2 now support 256K context.
Laguna M.1 is now live with a 256K context window on the Poolside API and OpenRouter.
With this update, it reaches 45.8% on Terminal-Bench 2.0, improving long-horizon performance.
Laguna XS.2 is also moving to 256K today, with the updated config already available on Hugging Face.
Both models remain free to use.
Over 1T tokens have been processed since launch 4 weeks ago. Excited to see what people build with the longer context window.
6/ Evaluation
Agentic evals are sensitive to the full execution setup: harness, sandbox, task image, verifier, dependencies, and trajectory behavior.
For the four agentic benchmarks reported, we used patched task images where needed to reduce benchmark artifacts, including infrastructure drift and known reward-hacking vectors like leaked git history.
We also ran post-hoc reward-hack checks on our Laguna evaluation trajectories.
The goal was to make the reported numbers reflect agent behavior more than benchmark artifacts.
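A rough sketch of what a post-hoc check for the leaked-git-history vector could look like. This is not the checker used for the reported numbers: the trajectory format (a flat list of shell commands), the `HISTORY_PATTERNS` list, and the `flag_history_access` helper are all hypothetical, chosen only to make the idea concrete.

```python
import re
from typing import List

# Hypothetical patterns for commands that read repository history, which
# could expose the reference fix if the task image still ships its .git data.
HISTORY_PATTERNS = [
    re.compile(r"\bgit\s+(log|show|reflog|blame)\b"),
    re.compile(r"\bgit\s+checkout\s+[0-9a-f]{7,40}\b"),
    re.compile(r"\.git/(objects|logs|refs)\b"),
]


def flag_history_access(commands: List[str]) -> List[str]:
    """Return the commands in a trajectory that match any history-access pattern."""
    return [c for c in commands if any(p.search(c) for p in HISTORY_PATTERNS)]


if __name__ == "__main__":
    trajectory = [
        "pytest -q",
        "git log --all --oneline",
        "cat .git/logs/HEAD",
        "git status",
    ]
    for cmd in flag_history_access(trajectory):
        print("needs review:", cmd)
```

A pattern match is only a signal for manual review, since an agent can have legitimate reasons to inspect history. Patching the task image so the history is not there at all removes the vector rather than detecting it.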
6/ Quantization
XS.2 ships in FP8, INT4, and NVFP4, but the useful lesson was not just which recipe worked.
It was where degradation showed up.
Some intermediate quantization attempts looked acceptable on traditional single-turn evals, but showed clear degradation on agentic coding benchmarks.
For agent models, long trajectories, strict formatting, tool use, and verifier feedback expose failures that single-turn evals can miss.
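One way to see why long trajectories surface what single-turn evals miss is a back-of-the-envelope compounding calculation. The numbers below are illustrative assumptions, not figures from the report, and the model assumes each step succeeds independently, which real trajectories do not strictly satisfy.

```python
def trajectory_success(per_step_reliability: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_reliability ** steps


if __name__ == "__main__":
    # Hypothetical per-step reliabilities for a baseline and a degraded checkpoint.
    for label, p in [("baseline", 0.999), ("degraded", 0.99)]:
        single = trajectory_success(p, 1)
        long_run = trajectory_success(p, 100)
        print(f"{label}: 1 step = {single:.3f}, 100 steps = {long_run:.3f}")
```

Under these assumed values the two checkpoints differ by under one point on a single step, but over 100 steps the gap widens to roughly 0.90 versus 0.37.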