Today, I’m excited to formally announce @mirendil with my amazing co-founders Harsh Mehta, Shayan Salehian, and Tara Rezaei!
We’re fortunate to work with @a16z and @kleinerperkins, who led our seed round of $200M, followed by a major investment from NVIDIA, among others.
Mirendil exists to accelerate science and technology, and through them, to help solve humanity's most pressing problems.
Self-accelerating AI R&D is the most direct path to delivering on AI's broader promise, which is why we believe the most important application of AI is AI itself. Get this loop right, and it compounds. It fundamentally changes the rate of progress itself across all domains.
We believe this capability should be democratized. It should be used to power all scientific efforts trying to innovate at the frontier. There are far more important problems—and broader ones—than any single lab can take on, so more groups should be able to pursue them.
This pulls concentration of power away from a few labs: businesses and science labs can own their AI and infrastructure, keep their margins, and control their own destiny instead of ceding it all to a single AI lab.
We’re a small team with a singular focus. Our founding team consists of 20 researchers and engineers from frontier institutions including Anthropic, xAI, Google DeepMind, and OpenAI, united by a passion for science and a drive to build the technologies that move it faster. If you want to build the system that builds systems, join us!
@HarshMeh1a, @shayan_, @tararezaeikh
Big news for AI builders: NVIDIA dropped a quantized version of Qwen3.6 that fits in 35B parameters but runs like a 3B model. This FP4 MoE beast is a game changer for efficient inference.
Release day: Clapet is live!
https://t.co/YzGBpQGd8j
I built the System Design tool I wish I'd had when preparing for my interviews.
Clapet guides you step by step through building an architecture instead of leaving you alone in front of an empty canvas, Excalidraw-style.
Mistral OCR 4 turned a handwritten calculus exam into clean LaTeX!
We gave it a photo of a handwritten exam page. The model read the handwriting and rebuilt every formula into structured digital text.
Output: Time: 5.1s · Cost: $0.09
The formulas came through exactly right - the hard part was nailed. The graph, unfortunately, was not redrawn. But that’s the telling part: most OCR tools just dump the text and quietly drop the figure. OCR 4 caught the plot, boxed it, and tagged it as a chart. It doesn’t get redrawn, but it gets read and accounted for.
Happy to announce my new book, with Alex Townsend. It's about the math behind enormous systems (medical scans, AI, etc.) and what happens when they outgrow our ability to understand them. More info (and pre-order for 25% off) at:
https://t.co/lOXl2Mivkh @littlebrown @BNBuzz
1-bit GLM-5.2 GGUF vs. Claude 4.8 Opus vs. GPT-5.5
We gave 3 models the same prompt and compared one-shot outputs.
The 1-bit GLM-5.2 GGUF ran locally on a Mac Studio M3 Ultra with 256GB RAM at ~21.6 tok/s.
Which output do you like best?
GGUF: https://t.co/BMkxswdj5N
context engineering docs for agentic engineering - plans, research, etc SHOULD NOT be stored in version control:
A good docs management system keeps them:
> outside your repo
> accessible to agent via FS tools
> discoverable by agent (even if just via a sysprompt append)
> persisted / recoverable / archivable
> collaborative (shareable, commentable)
why keep them outside the core VCS repo?
1) they don't need merge semantics, a linear history is plenty in 99.9% of cases
2) if they are committed that means they can live on branches, get lost when you change branches, you have to remember where they were, etc etc
wdyt?
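A minimal sketch of what the properties listed above could look like on disk, assuming plain markdown files. The `~/.agent-docs` location, the `.history` layout, and the helper names `save_doc` and `sysprompt_append` are all hypothetical, picked for illustration only. It covers "outside your repo", "accessible via FS tools", "discoverable", and "linear history"; the collaborative part is not addressed.

```python
import shutil
import time
from pathlib import Path

# Hypothetical location outside any repo; the path and layout are assumptions.
DOCS_ROOT = Path.home() / ".agent-docs"


def save_doc(project: str, name: str, text: str, root: Path = DOCS_ROOT) -> Path:
    """Write a doc, archiving the previous version first (linear history, no merges)."""
    doc_dir = root / project
    history_dir = doc_dir / ".history" / name
    history_dir.mkdir(parents=True, exist_ok=True)
    doc_path = doc_dir / name
    if doc_path.exists():
        # Archive the old version under an increasing index plus a timestamp.
        version = len(list(history_dir.iterdir())) + 1
        shutil.copy2(doc_path, history_dir / f"{version:04d}-{int(time.time())}.md")
    doc_path.write_text(text, encoding="utf-8")
    return doc_path


def sysprompt_append(project: str, root: Path = DOCS_ROOT) -> str:
    """Build a short index of the project's docs to append to the system prompt."""
    doc_dir = root / project
    docs = sorted(p for p in doc_dir.glob("*.md") if p.is_file())
    lines = [f"Project docs live outside the repo in {doc_dir}:"]
    lines += [f"- {p.name}" for p in docs]
    return "\n".join(lines)
```

Because the files sit in an ordinary directory, switching branches in the repo never touches them, and the agent can read them with whatever file tools it already has.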
fantastic talk from @Mappletons from @github on the need for alignment and the fact that "one engineer running 12 claude terminals on their workstation" isn't the future.
https://t.co/vl1bm54HFQ
Introducing GLM 5.2 for autoresearch
GLM 5.2 is the first open weights model we've tried on our autoresearch pipeline that's proven capable for real research tasks.
With Fable 5's restrictions on research, having an open weights alternative is a huge win for open source
Watch it carry out fully async vs colocated sync RL training on Harbor code contests across two 8xH100 nodes on top of SkyRL. Resolves setup issues, tracks runs to completion, and produces a full comparison of throughput and reward stability
Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API.
Our ‘Fugu Ultra’ model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls.
Try it: https://t.co/hhO6qTawgb 🐡
An interesting new paper by my recent PhD graduate on how AI agents' greed for visible incentives can lead them to abandon their safety alignment.
You can read it here: https://t.co/y64uOBvSiC
recommended reading. i love this type of forensics. an internet friend taught me 8087 fpu programming in the 90s. first time i had contact with a stack. good times.
https://t.co/IuNiFXQ0vr
Unlimited-OCR 🔥New OCR from @PaddlePaddle
It can parse hundreds of pages in a single pass while maintaining stable speed.
The key idea is R-SWA (Reference Sliding Window Attention), which keeps KV cache constant during decoding.
🏆 93% on OmniDocBench
📈 +6% over DeepSeek-OCR
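To make the "constant KV cache" idea concrete, here is a toy sketch of a bounded cache: a fixed block of reference entries plus a sliding window over the most recent entries. This is a generic illustration, not the R-SWA implementation; the split into "reference" and "recent" entries, the class name `BoundedKVCache`, and the window size are assumptions made for the example.

```python
from collections import deque


class BoundedKVCache:
    """Fixed reference entries plus a sliding window of recent entries."""

    def __init__(self, reference, window: int):
        self.reference = list(reference)    # assumed: kept for the whole decode
        self.recent = deque(maxlen=window)  # oldest entry is evicted automatically

    def append(self, kv) -> None:
        """Add the entry for a newly decoded token."""
        self.recent.append(kv)

    def visible(self) -> list:
        """Entries the next decoding step would attend over."""
        return self.reference + list(self.recent)

    def __len__(self) -> int:
        return len(self.reference) + len(self.recent)


cache = BoundedKVCache(reference=["r0", "r1"], window=3)
for t in range(10):
    cache.append(f"t{t}")
# The cache size is capped at len(reference) + window regardless of output length.
```

The point of the sketch is only that memory stops growing once the window is full, which is what lets decoding speed stay flat over very long outputs.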
At HumanLayer, we’re on a mission to solve the AI slop code problem.
In 2025 we open-sourced our Research, Plan, Implement framework, now deployed inside Fortune 500s like Block and Uber - places where shipping slop is just not an option.
And that was just the beginning.
Today, we’re opening access to HumanLayer - an Agentic IDE, collaboration platform, and building blocks for your software factory.
HumanLayer enables engineers solving hard problems in complex codebases to:
> move 2-3x faster across the entire SDLC (not just coding)
> maintain rigorous standards for system architecture and program design
Hundreds of engineers at companies of all sizes are already using HumanLayer to ship fast without sacrificing quality.
I'm excited to invite you to try HumanLayer today at https://t.co/cQ648EkrnG, and I'm even more excited to see what you build.
@0xblacklight and I are deeply grateful to our team, our customers who give us so much incredible energy and feedback, our investors who have always been in our corner, and our friends and family who have supported us along this crazy journey
if you're a staff or principal engineer trying to make AI coding work at scale for your team, we'd love to hear from you
as @swyx likes to say - let's make this the year of no more slop
there's a new word i'm hearing a lot among the most frontier-pushingest coding-agent builders:
_program design_
for even the best agentic coders trying to maintain code quality, we've all seen it
- you come up with something to build
- you research the codebase, riff with the agent, align on what the end state looks like
- you (or the agent) break it down into tasks for individual agents / context windows
- you rip the implementation
- the code works or is close to working - and it follows your spec to the letter
but the code itself is still trash
- poorly factored methods
- leaky abstractions
- tramp data
- overloaded interfaces
- try catch, useEffect, global variables everywhere
I thought models would catch up, or that this wouldn't matter - that if we stayed in spec-land, understood the high-level architecture, and tokenmaxxed hard enough, we would be able to skip code review and just stay shipping
doesn't seem to be working out that way
I have seen agent-owned codebases spin up out of nothing...
...and I have seen them collapse into rubble within 6 months
now there's something to be said about "skate where the puck is going"...
...and I can't tell you what tomorrow's models will be capable of
but I *can* tell you that *today*, models are mid-to-bad at program design
you can solve some of this with memory / agents.md, but the scope of program design is massive.
- entire companies have been built to help you implement it
- books, classes, and professions have spun up around it
are you building something to last? Or are you slinging more slop on the pile?
anyways, that's the post, stay tuned for a fun announce tomorrow y'all 🙂
🎉 Congrats to @poolsideai on Laguna M.1, a new open-weights agentic coding model. Day-0 support landed in vLLM v0.21.0.
🧠 70-layer sparse MoE: 225B total params, 23B active per token, 256K context
🔀 256 experts with top-k=16 routing, built for long-horizon agentic coding
🛠️ Native interleaved reasoning between tool calls, toggleable per request, Apache 2.0
Recipe 🔗 https://t.co/lDG8poco5g
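For readers unfamiliar with "256 experts with top-k=16 routing", here is a minimal sketch of generic top-k expert routing for a single token. It is not Laguna M.1's actual router: the function name `route_top_k`, the random logits, and the choice to softmax over only the selected experts are assumptions (other MoE designs normalize differently).

```python
import math
import random


def route_top_k(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Softmax over the selected logits only (subtract the max for stability).
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(256)]  # one token, 256 experts
chosen = route_top_k(logits, k=16)  # 16 (expert_index, weight) pairs
```

Only the 16 chosen experts run for this token, which is why the active parameter count per token (23B) is far below the total (225B).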