The next era of AI engineering is self-improving agentic systems! Really excited to share what we are building at NeoSigma!
Self-maintaining agent systems represent a shift in how we build and operate software. We at NeoSigma are building the infrastructure to support this feedback loop in real-world systems, helping teams capture failures, convert them into structured evaluation signals, and use those signals to drive continuous improvements in agent behavior.
We @neosigmaai @RitvikKapila are building the future of self-improving AI systems! By closing the feedback loop between production data and system improvements, we help teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior.
We show how our system works on Tau3 bench across the retail, telecom, and airline domains. With the underlying model held fixed (GPT5.4), agent performance on the validation set improves from 0.56 → 0.78 (+0.22 absolute, a ~40% relative jump in accuracy).
"I really don't think there is gonna be a single product that does all of code and software engineering. There are just so many experiences to serve." - @ScottWu46
there is so much to do together with all the experiences these great companies are providing
People often say: NeoSigma is such a great name (and logo).
so here’s my attempt at explaining it:
Neo means new.
Sigma (Σ) is the Greek mathematical symbol for summation, a core abstraction behind any optimization (or loss) function.
on a philosophical level, NeoSigma represents a new paradigm of self-optimizing software systems. @neosigmaai
ps: yes, @ritvikkapila and I designed the logo ourselves :)
We @NeoSigmaAI recently presented our mission at @agihouse_org's Agent Harness Build Day!
We are building the future of self-improving agentic systems! Come build with us: https://t.co/MrMpwUfQWS
Self-improving agentic systems are the future!
We presented our work at @agihouse_org while folks built on top of our auto-harness repo: https://t.co/RHUZsk5i4l.
The builder energy in the room was captivating. Come join us: https://t.co/0AUoGi0Kg6
Let your agents self-improve in production!
Recently presented our mission at @agihouse_org's Agent Harness Build Day!
It was amazing to feel the energy in the room and to watch developers build on top of our auto-harness repo: https://t.co/koPJH0US3K.
Come build the future of agentic software with us: https://t.co/FPDTXzUUw6
If you're building agents right now, you already know the hard part isn't capability; it's reliability. The agent harness is being called the defining infrastructure problem of 2026, and right now fewer than 15% of organizations have agents actually running in production. On April 18, we're bringing together the people closing that gap.
Speakers include Zayd Enam (founder of Cresta AI & Enam Co) and Ivan Burazin (CEO of Daytona AI), plus researchers, founders, and engineers deploying agents at scale.
Happy to have @PingCAP and @daytonaio on board, as well as @gauri__gupta, @maxvwolff, @jelares, @rockyrmit, @zaydenam, @rolandgvc, @RitvikKapila, @JIACHENLIU8, @ninametamind, @AlexaOrent
Register now → https://t.co/s9ZxbqDpE6
Come work with us @NeoSigmaAI. We are a product-driven research lab pushing the frontier of agentic systems and redefining the interface between humans and AI.
Apply here https://t.co/AExbQksORc
Hiring a cracked backend software engineer to come work with us at @NeoSigmaAI. If you’ve built and shipped real systems and like building things from the ground up, we would love to talk.
Please apply with things you’ve built, your contributions, and any details about your previous work and experience.
Currently, we're only hiring for in-person roles based in SF.
Everything is going bananas right now, and it's all the same pattern:
Everything is a Ralph tool
Anything that you need to do, ask yourself: can an agent draft it, can an agent review it, and can an agent improve the process?
If the answer is yes, then loop
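The draft → review → improve check above can be sketched as a small loop. This is only an illustration under stated assumptions: `draft`, `review`, and `improve` are hypothetical stand-ins for agent calls, "the process" is modeled as a plain instruction string, and the round cap is an assumption added to keep the loop bounded.

```python
from typing import Callable, Tuple

# Hypothetical agent-call signatures (assumptions, not a real API).
Draft = Callable[[str, str], str]                # (task, process) -> output
Review = Callable[[str, str], Tuple[bool, str]]  # (task, output) -> (approved, feedback)
Improve = Callable[[str, str], str]              # (process, feedback) -> revised process


def draft_review_improve_loop(task: str, process: str, draft: Draft,
                              review: Review, improve: Improve,
                              max_rounds: int = 5) -> Tuple[str, str]:
    """Draft with the current process, review the result, and refine the
    *process* (not just the output) from the feedback until approved."""
    output = draft(task, process)
    for _ in range(max_rounds):
        approved, feedback = review(task, output)
        if approved:
            break
        process = improve(process, feedback)  # improve how the work gets done
        output = draft(task, process)         # redraft with the revised process
    # If the cap is hit, the final draft may not have been reviewed yet.
    return output, process
```

Improving the process rather than patching a single output is what makes the loop compound: the revised process is reused for the next task.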
@FStrongpaw @gauri__gupta Point your coding agent at the repo (or a fork, to define your own benchmark) and run: "Read PROGRAM.md and start the optimization loop." That's it: the agent reads failures, improves your harness, gates every change, and repeats. https://t.co/tO9RUAwPxF
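A minimal sketch of that read failures → improve → gate → repeat cycle, assuming the gate keeps a change only if it strictly raises the validation score. `evaluate` and `propose` are hypothetical stand-ins (in practice the proposer would be an LLM agent); this is not the auto-harness code or the contents of PROGRAM.md.

```python
from typing import Callable, List, Tuple

Harness = str  # assumption: the harness is a prompt/config string
Evaluate = Callable[[Harness], Tuple[float, List[str]]]  # -> (score, failing case ids)
Propose = Callable[[Harness, List[str]], Harness]       # -> candidate harness


def optimize(harness: Harness, evaluate: Evaluate, propose: Propose,
             iterations: int = 10) -> Tuple[Harness, float]:
    """Hill-climb on the harness, gating every candidate on the validation score."""
    best_score, failures = evaluate(harness)
    for _ in range(iterations):
        if not failures:
            break  # nothing left to learn from
        candidate = propose(harness, failures)  # read failures, propose a fix
        score, cand_failures = evaluate(candidate)
        if score > best_score:  # gate: keep strict improvements only
            harness, best_score, failures = candidate, score, cand_failures
        # Rejected candidates are dropped; the next proposal starts from the best harness.
    return harness, best_score
```

The strict `>` gate is one design choice; a `>=` gate would also accept neutral changes, at the cost of letting the harness drift without measurable gain.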
Meta-Harness is a great example of how powerful automated harness optimization can be when you have the eval set. s/o to @yoonholeee!
We purpose-built NeoSigma for production systems, where engineers have to maintain these evals, and we help them do it. We start with no eval set and build one automatically by mining and clustering live failures, then permanently encode every fix as a regression constraint. This is what ensures that fixing bug 2 doesn't silently reintroduce bug 1. The loop never terminates; it compounds.
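A minimal sketch of the two ideas in that post: grouping live failures into clusters, and locking every accepted fix in as a permanent regression constraint. All names here (`Failure`, `cluster_failures`, `RegressionSuite`, `passes`) are hypothetical, grouping by a string signature is a stand-in for real clustering (e.g. over embeddings), and this is not NeoSigma's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Failure:
    case_id: str
    signature: str  # hypothetical error label used for grouping


def cluster_failures(failures: List[Failure]) -> Dict[str, List[Failure]]:
    """Group failures by signature (a naive stand-in for real clustering)."""
    clusters: Dict[str, List[Failure]] = defaultdict(list)
    for f in failures:
        clusters[f.signature].append(f)
    return dict(clusters)


Passes = Callable[[str, str], bool]  # (harness, case_id) -> did the case pass?


@dataclass
class RegressionSuite:
    locked: List[str] = field(default_factory=list)  # cases that must keep passing

    def accept(self, candidate: str, target_cases: List[str], passes: Passes) -> bool:
        """Accept a fix only if it resolves its target cases and breaks no locked case."""
        fixes_target = all(passes(candidate, c) for c in target_cases)
        no_regressions = all(passes(candidate, c) for c in self.locked)
        if not (fixes_target and no_regressions):
            return False
        # Encode the fix permanently: these cases now guard every future change.
        for c in target_cases:
            if c not in self.locked:
                self.locked.append(c)
        return True
```

With this gate, a candidate that fixes bug 2 but drops the fix for bug 1 is rejected, because bug 1's case is already locked.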
@owengretzinger @gauri__gupta The open-source version is intentionally simplified to get people started with the loop. We're building the failure mining, clustering, and multi-candidate search parts for production systems. Drop us a note at https://t.co/hwL3g6fjO2
Releasing auto-harness: an open-source library for our self-improving agentic systems with auto-evals. We got a lot of responses from people wanting to try the self-improving loop on their own agents, so we open-sourced our setup.
Connect your agent and let it cook over the weekend! brrrrrrr!