🎪 I'm extremely excited to announce the release of Barnum 0.3! Barnum is now a programming language for asynchronous and parallel computation, designed to make orchestrating your agents as easy as possible!
So why use a programming language for this? Why not just use plan mode or a markdown file for the complicated cases?
Well, LLMs are incredibly powerful tools, but they certainly aren't reliable. If an LLM is in charge, you risk it changing its mind and implementing something else, or disabling unit tests and generally cutting corners. (Very relatable, to be honest.) Furthermore, complicated workflows with loops and conditionals are hard to express accurately in prose.
The answer is to use a workflow engine. Barnum is a workflow engine masquerading as a programming language. When a workflow engine is in charge, your LLMs can't wriggle out of requirements, and it's easier to accurately describe the actual, complicated workflow. It's this increase in reliability that lets you build bigger, more impactful agentic workflows.
I've already used Barnum to ship hundreds of PRs. Others have used it to drive automated migrations, remove dead code, implement a RAG search pipeline, and validate every statement in public-facing documentation.
I hope you give it a try!
pnpm install @barnum/barnum
But read on for more cool details...
Introducing Barnum, or... how I ship hundreds of PRs per week, burn through backlogs, and automatically fact-check documentation.
LLMs are incredibly powerful tools. But when we try to use them to drive more complicated refactors or more intricate workflows, their shortcomings are quickly revealed. As their context fills up, they become forgetful, and they can't be relied on to actually perform the steps you ask for. They often cut corners.
Put simply, having an inherently probabilistic process perform what should be deterministic work necessarily comes at the cost of reliability. And you can't build a complicated workflow on unreliable foundations.
That's where Barnum comes in. Barnum is the missing workflow engine for agents. Rather than making agents responsible for upholding guarantees (e.g., always lint your changes and commit them atomically), Barnum lets agents do only what they're good at: reading text and reasoning. Everything else is done deterministically, on the outside, by Barnum.
This means that you can build bigger, more involved workflows without sacrificing reliability. Because you can intersperse bash scripts for the deterministic steps, you save on tokens. Each agent performing a micro-task receives only the instructions for that specific task, so its context doesn't get overwhelmed and it doesn't become forgetful. And because all inputs, outputs, and transitions are validated, the agents can't wriggle out of doing the work.
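To make the "validated outputs" idea concrete, here is a minimal TypeScript sketch of the general pattern. It assumes nothing about Barnum's actual API: `runValidatedStep`, `AgentFn`, and the `RefactorResult` shape are hypothetical names invented for illustration. The point is that the engine, not the agent, decides whether a step's output is acceptable; malformed output is rejected and the step is retried a bounded number of times.

```typescript
// Hypothetical sketch: none of these names come from Barnum's real API.

// The shape we expect the agent's (JSON) output to have.
interface RefactorResult {
  summary: string;
  filesChanged: string[];
}

// An "agent" is modeled here as a plain function from a prompt to raw text.
type AgentFn = (prompt: string) => string;

// Runtime check: parse the raw text and confirm it matches RefactorResult.
function parseRefactorResult(raw: string): RefactorResult | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not even valid JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const { summary, filesChanged } = value as Record<string, unknown>;
  if (typeof summary !== "string") return null;
  if (!Array.isArray(filesChanged)) return null;
  if (!filesChanged.every((f: unknown) => typeof f === "string")) return null;
  return { summary, filesChanged: filesChanged as string[] };
}

// The engine owns the retry policy: the agent cannot "declare victory"
// with output that fails validation.
function runValidatedStep(
  agent: AgentFn,
  prompt: string,
  maxAttempts = 3,
): RefactorResult {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = parseRefactorResult(agent(prompt));
    if (result !== null) return result;
  }
  throw new Error(`step failed validation after ${maxAttempts} attempts`);
}

// Usage with a stubbed agent that misbehaves on its first try.
let calls = 0;
const flakyAgent: AgentFn = () =>
  ++calls === 1
    ? "Sure! I refactored everything." // prose, not JSON: rejected
    : '{"summary":"inline helper","filesChanged":["src/a.ts"]}';

const result = runValidatedStep(flakyAgent, "Refactor src/a.ts");
console.log(result.filesChanged, calls);
```

Here the first reply is rejected because it isn't valid JSON, so the engine calls the agent a second time and accepts the well-formed result. A real engine would likely validate against a schema rather than a hand-written check; the hand-written check just keeps the sketch dependency-free.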
A Barnum workflow is essentially a state machine described in a config file. And the best part? The configuration has a JSON schema, so agents are actually really good at writing workflows themselves!
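The "state machine in a config file" idea can be illustrated with a small TypeScript sketch. This is not Barnum's configuration format or schema: the step names (`Implement`, `Lint`, `Commit`, `Done`), the `Workflow` type, and `runWorkflow` are all hypothetical. The config is plain data listing, for each step, which steps it may hand off to, and the engine rejects any transition the config doesn't allow. That way an agent can't skip straight from implementing to committing.

```typescript
// Hypothetical sketch: not Barnum's actual config format or API.

// A workflow is data: each step lists the steps it may transition to.
type StepName = "Implement" | "Lint" | "Commit" | "Done";

interface StepConfig {
  next: StepName[];
}

type Workflow = Record<StepName, StepConfig>;

const workflow: Workflow = {
  Implement: { next: ["Lint"] },
  Lint: { next: ["Implement", "Commit"] }, // lint failures loop back
  Commit: { next: ["Done"] },
  Done: { next: [] }, // terminal step: no outgoing transitions
};

// A handler does the work for a step (agent or script) and *requests* a next step.
type Handler = (step: StepName) => StepName;

// The engine walks the machine until it reaches a terminal step,
// validating every requested transition against the config.
function runWorkflow(
  wf: Workflow,
  start: StepName,
  handle: Handler,
  maxSteps = 100,
): StepName[] {
  const trace: StepName[] = [start];
  let current = start;
  while (wf[current].next.length > 0) {
    if (trace.length > maxSteps) throw new Error(`exceeded ${maxSteps} steps`);
    const requested = handle(current);
    if (!wf[current].next.includes(requested)) {
      throw new Error(`illegal transition ${current} -> ${requested}`);
    }
    current = requested;
    trace.push(current);
  }
  return trace;
}

// Usage: lint fails once, sending work back to Implement, then passes.
let lintRuns = 0;
const trace = runWorkflow(workflow, "Implement", (step) => {
  switch (step) {
    case "Implement":
      return "Lint";
    case "Lint":
      return ++lintRuns === 1 ? "Implement" : "Commit";
    default:
      return "Done";
  }
});
console.log(trace.join(" -> "));
// Implement -> Lint -> Implement -> Lint -> Commit -> Done

// A handler that tries to skip linting is stopped by the engine.
try {
  runWorkflow(workflow, "Implement", () => "Commit");
} catch (e) {
  console.log((e as Error).message); // illegal transition Implement -> Commit
}
```

Keeping the transitions in data rather than in the handlers is what makes the guarantees enforceable: the handlers only propose, and the engine decides.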
It's already been used to ship hundreds of PRs, run automated refactors, burn through various backlogs, fact-check every statement in documentation, and build a deep-research clone!
The attached image shows the workflow that I use to identify and implement automated refactors. I follow it up with a separate workflow that splits each commit into its own PR, judges the refactor, and, where needed, completes it (for example, by updating call sites if the refactor changed a public API).
So go on, give it a try. Check out https://t.co/hjtprWm4NS, star the repository, and join the Discord! I can't wait to see what you build with it! And I'd love for you to get involved!
🏥 🇫🇷 Frenchman Benoît Richaud has become a meme hero of the Olympics after it emerged that he is coaching 16 figure skaters from 13 countries at the Games.
Because his job requires Benoît to appear on camera alongside his athletes, he often has to change into the uniform of whichever country his current skater represents 🤔