A new database product based on @ApacheDataFusio was announced today by @LangChain, focused on agent observability. It is really neat to see how people are building (very) customized data + query systems faster than ever, now that they don't have to build the whole stack.
Excited to see the SmithDB announcement at Interrupt, our purpose-built distributed database for agent observability! SmithDB is built on top of object storage, written in Rust, and leverages Apache DataFusion and the Vortex file format.
Just announced at Interrupt! SmithDB.
Agent traces have outgrown the databases built to hold them.
That’s why we built SmithDB, a purpose-built distributed database for agent observability.
Read the announcement from Co-Founder @ankush_gola11 → https://t.co/mu1zvuujwt
we're on an Open Model mission to help builders create world class agents >20x cheaper than what they have today
a couple things have become evident recently:
1. The age of the token subsidy is ending
2. Open Models have crossed an intelligence threshold making them viable for real world agents at a fraction of the cost
As teams get exponentially larger monthly bills from the labs, it's worth exploring how many agents today perform just as well using Open Models
Check out the numbers on external evals + try it yourself by dogfooding and running on internal evals - @OpenRouter and @ArtificialAnlys have great leaderboards and breakdowns of what people are using. The time investment is definitely worth the massive cost savings
- Instead of Sonnet 4.6 (or even 5.5/Opus), try Kimi-2.6, GLM5.1, DeepSeek v4 Pro, etc
- Instead of Haiku, try DeepSeek v4 Flash, Nemotron, etc
Open models require some tuning to make sure they work well in your harness for your task (another reason why open harnesses are important)
The closed models are excellent; there's no need to rip them out wholesale. Often the first use of Open Models is as subagents, or with a closed frontier model acting as an Advisor to an open driver model
At LangChain we want to make it as easy as possible to build the best agents in the world as cheaply and quickly as possible. We're leaning into open models heavily across our products and libraries
try out an open model in deepagents in just a couple lines and come ride the open model, open harness future
🚀 Interrupt sold out last year, so don't wait: May 13-14 in San Francisco. https://t.co/XFDhS88YFP
Last year, hundreds of builders packed a room to share what's actually working in agent production. This year, we're going bigger.
@hwchase17, @AndrewYNg, and @cj_mongodb are headlining. Alongside them: real-world sessions from teams shipping agents today, time with LangChain engineers, and a pioneering AI builder community.
If you're working on agents or thinking about it, this is two days with the people furthest along.
🔗 Announcing LangChain OSS Skills
LangChain has the most popular frameworks for building AI agents, and now your coding agent can be an expert in them.
We're excited to release the first iteration of LangChain OSS Skills, giving your agent expertise in our open source frameworks. The skills include guidance on how to use langchain, langgraph, and deepagents to effectively build agents.
➡️ Install our OSS skills for your coding agent here: https://t.co/PoOgOeLpMQ
➡️ Read more: https://t.co/9spxPkyd7J
Building Better Coding Agent Harnesses
at @LangChain we're thinking hard about the science of harness engineering + open research on what works & doesn't
A quick peek at our deepagents X Terminal Bench 2.0 work, shoutout to @alexgshaw & Harbor (they're great). Broad research goals:
1. Find general purpose agent improvement recipes
2. Measure which design changes most affect model performance and how
3. Measure if/how models are non-fungible in their harness
Some previews on what worked well:
- Self-Verification & Iteration as first class citizens. Models are very good at self correction if they get a feedback signal, but they often won't participate in this loop. So designing prompts & deterministic hooks to force them into this helps a lot.
- Automated Context Engineering: Pre-fetching some environment context up front avoids discovery errors for tools/files.
- Large scale reflection over Traces is a powerful general recipe for stratifying errors + validating proposed improvements
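The self-verification idea above can be sketched as a small loop. This is a minimal illustration, not the deepagents implementation: `run_with_verification`, `propose`, and `verify` are hypothetical names, with `propose` standing in for a model call and `verify` for a deterministic hook such as a test runner.

```python
from typing import Callable, Optional, Tuple


def run_with_verification(
    propose: Callable[[Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Force a propose -> verify -> retry loop.

    `propose` receives the previous failure message (None on the first round)
    and returns a candidate. `verify` is a deterministic check that returns
    (passed, message). Returns (last candidate, passed, rounds used).
    """
    feedback: Optional[str] = None
    candidate = ""
    for round_no in range(1, max_rounds + 1):
        candidate = propose(feedback)
        ok, message = verify(candidate)
        if ok:
            return candidate, True, round_no
        # The model does not get to skip this step: the failure message
        # is always fed into the next attempt.
        feedback = message
    return candidate, False, max_rounds


# Toy stand-ins (assumptions for illustration only).
def toy_propose(feedback: Optional[str]) -> str:
    # Gets it wrong first, then corrects once it sees feedback.
    return "4" if feedback else "5"


def toy_verify(candidate: str) -> Tuple[bool, str]:
    if candidate == "4":
        return True, "ok"
    return False, f"expected 2 + 2, got {candidate}"


print(run_with_verification(toy_propose, toy_verify))  # ('4', True, 2)
```

The design choice worth noting is that the loop, not the model, decides whether verification happens; the model only sees the resulting signal.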
We'll be releasing a blog and research artifacts soon on all of this! Will return to measure more vectors of harness design + use codex-5.3
If you're interested in effective harness engineering and building great coding agents, would love to hear from you
clawdbot really made the rounds on Twitter over the weekend with people posting about how it's booked flights for them, made dinner reservations, and more. But what are the failure modes?
Let's dig into execution traces to find out 🧵
Now in 𝚘𝚙𝚎𝚗𝚠𝚘𝚛𝚔 0.2! Organize your agents in a kanban view.
See all your threads at a glance: what's running, what's waiting for you, and what's done. Subagents show up as cards too so you can track parallel work.
Try it with 𝚗𝚙𝚡 𝚘𝚙𝚎𝚗𝚠𝚘𝚛𝚔@𝚕𝚊𝚝𝚎𝚜𝚝!
I vibecoded an open-source poker river solver over the holiday break. The code is 100% written by Codex, and I also made a version with Claude Code to compare.
Overall these tools allowed me to iterate much faster in a domain I know well. But I also felt I couldn't fully trust them. They'd make mistakes and encounter bugs, but rather than acknowledging it they'd often think it wasn't a big deal or, on occasion, just straight up try to gaslight me into thinking nothing is wrong.
In one memorable debugging session with Claude Code I asked it, as a sanity check, what the expected value would be of an "always fold" strategy when the player has $100 in the pot. It told me that according to its algorithm, the EV was -$93. When I pointed out how strange that was, hoping it would realize on its own that there's a bug, it reassured me that $93 was close to $100 so it was probably fine. (Once I prompted it to specifically consider blockers as a potential issue, it acknowledged that the algorithm indeed wasn't accounting for them properly.) Codex was not much better on this, and ran into its own set of (interestingly) distinct bugs and algorithmic mistakes that I had to carefully work through. Fortunately, I was able to work through these because I'm an expert on poker solvers, but I don't think there are many other people that could have succeeded at making this solver by using AI coding tools.
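The "always fold" sanity check can be written down in a few lines. This is a toy sketch, not the solver's code: the function name, the four-hand opponent range, and the missing-renormalization bug are all assumptions for illustration, since the post only says blockers were not handled properly. It also assumes EV is measured so that folding forfeits the player's own $100.

```python
import math
from typing import Dict, FrozenSet

Hand = FrozenSet[str]


def always_fold_ev(
    hero: Hand,
    opp_range: Dict[Hand, float],
    contribution: float,
    renormalize: bool = True,
) -> float:
    """EV of folding every time, averaged over the opponent's possible hands.

    Opponent hands that share a card with hero are impossible (blocked) and
    are dropped. With renormalize=False the remaining weights no longer sum
    to 1, which is one hypothetical way to get a wrong answer.
    """
    live = {h: w for h, w in opp_range.items() if not (h & hero)}
    total = sum(live.values()) if renormalize else sum(opp_range.values())
    return sum(w / total * -contribution for w in live.values())


hero = frozenset({"As", "Kd"})
opp_range = {
    frozenset({"Ah", "Ad"}): 0.25,
    frozenset({"Ks", "Kh"}): 0.25,
    frozenset({"As", "Qs"}): 0.25,  # blocked: hero holds the As
    frozenset({"Jh", "Th"}): 0.25,
}

correct = always_fold_ev(hero, opp_range, 100.0)
buggy = always_fold_ev(hero, opp_range, 100.0, renormalize=False)

# Folding always loses exactly the contribution, whatever the opponent holds.
assert math.isclose(correct, -100.0)
print(buggy)  # -75.0 in this toy range: "close to -100" is still a bug
```

Because the fold payoff does not depend on the opponent's cards, any answer other than the full contribution signals a weighting error rather than a rounding issue.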
The most frustrating experience was making a GUI. After a dozen back-and-forths, neither Codex nor Claude Code was able to make the frontend I requested, though Claude Code's was at least prettier. I'm inexperienced at frontend, so perhaps what I was asking for simply wasn't possible, but if that was the case then I wish they had *told* me it was difficult or impossible instead of repeatedly making broken implementations or things I didn't request. It highlighted to me how there's still a big difference between working with a human teammate and working with an AI.
After the initial implementations were complete and debugged, I asked Codex and Claude Code to create optimized C++ versions. On this, Codex did surprisingly well. Its C++ version was 6x faster than Claude Code's (even after multiple iterations of prompting for further optimizations). Codex's optimizations still weren't as good as what I could make, but then again I spent 6 years of PhD making poker bots. Overall, I thought Codex did an impressive job on this.
My final request was asking the AIs if they could come up with novel algorithms that could solve NLTH rivers even faster. Neither succeeded at this, which was not surprising. LLMs are getting better quickly, but developing novel algorithms for this sort of thing is a months-long research project for a human expert. LLMs aren't at that level yet.
Here's my enormous round-up of everything we learned about LLMs in 2025 - the third in my annual series of reviews of the past twelve months
https://t.co/HD9Zf85SG2
This year it's divided into 26 sections! This is the table of contents:
Here we go TigerStyle 🚀
Episode with @jorandirkgreef, Founder & CEO of @TigerBeetleDB, on TigerStyle is now available to watch. https://t.co/J2n3gRVDVe
Please like, share and subscribe!
LangChain has raised a $125M Series B, valuing the company at $1.25B 🦜🔗
It's been a great ride so far, seeing LangChain grow in the last year from 1 to 3 products, with traction and customers like Replit, Rippling, Clay, and Cloudflare using us to ship AI agents in production!
Tickets just dropped for Interrupt: The AI Agent Conference by LangChain 🚀 🌒
Join us this May 13-14 in San Francisco for Interrupt, LangChain’s first-ever conference — a space for anyone building or shaping the future of AI agents.
Learn more: https://t.co/vl9uNssvlK
hi infra ppl - some updates on https://t.co/dV7REdvfEg that @mabb0tt and i are teaching this quarter:
1. lots of people asked if there'll be a live stream. as of now, we don't have plans to do a live stream, but we'll record and post the lectures online (as long as each speaker is okay with that), ideally within a couple days of each lecture. right now, we only have 1 course assistant and 180+ students, so pls bear with us as we figure out how to handle ops
2. @mntruell is now our latest addition to the speaker lineup! and, his lecture will be up online - i'm sure people are going to love it. the @cursor_ai team is on the frontlines of ai application scaling, with tons of painfully earned valuable lessons for the infra community