Most production incidents trace back to something that changed recently. A deploy. A config update. A merged PR.
Steadwing connects to your GitHub or GitLab repos and automatically correlates incidents with recent commits, deployments, and releases. When something breaks, the RCA shows you which specific change caused it, down to the exact commit and code diff.
No more asking "did anyone deploy something?" in Slack. No more manually scanning CI/CD pipelines. The correlation happens automatically as part of the investigation.
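To make the idea of change correlation concrete, here is a minimal sketch, not Steadwing's actual implementation. It assumes you already have a list of change events (the `Change` record and the `suspect_changes` helper are hypothetical names) and simply shortlists changes to the affected service that landed shortly before the incident started. A real system would likely weigh far more evidence than timing alone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """Hypothetical record for a commit, deploy, or release."""
    sha: str
    kind: str       # "commit", "deploy", or "release"
    service: str    # service the change touched
    at: datetime    # when the change landed


def suspect_changes(changes, incident_start, service,
                    lookback=timedelta(hours=2)):
    """Return changes to `service` that landed within `lookback`
    before the incident started, newest first."""
    window_start = incident_start - lookback
    hits = [
        c for c in changes
        if c.service == service and window_start <= c.at <= incident_start
    ]
    # The most recent change is usually the first one worth checking.
    return sorted(hits, key=lambda c: c.at, reverse=True)
```

The two-hour lookback is an arbitrary default for illustration; the right window depends on how quickly regressions tend to surface in your stack.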
AI for incident response has gone through two very different eras.
AIOps was about anomaly detection. Structured metrics. The output was a correlation score and an alert. No diagnosis. No code understanding. Remediation was mostly "roll back".
LLM-powered RCA is a completely different approach. It reads logs, metrics, traces, code diffs, configs, and deploy history. It generates a natural language root cause analysis with an evidence chain you can actually verify. It references past incidents. It suggests fixes at the code, infra, and CI/CD levels.
The shift isn't just better models. It's a fundamentally different architecture: a move from pattern-matching on metrics to actually understanding what your system is doing and why it broke.
That's the approach we took with Steadwing: evidence-based RCA with fixes you can verify and ship immediately.
“On-call engineers shouldn’t exist.”
@Abejith (ex-Freshworks) said it. @khant_dev (ex-Mem0) had lived it.
Weeks later: live with real teams.
Demo: a bug a team couldn’t triangulate for years → root cause in seconds.
Most RCA tools fail.
Look out for @steadwing at @join_ef Demo Day 👀
30 alerts in 2 minutes. Your on-call engineer opens Slack and gets overwhelmed.
It's rarely 30 problems. It's usually one issue causing a chain reaction.
They end up triaging each alert, and 20 minutes are gone.
We group those alerts into one incident and give you its root cause.
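As a rough illustration of alert grouping, here is a minimal sketch that clusters alerts purely by arrival time. This is an assumed simplification, not how Steadwing groups alerts: the `group_alerts` function and the two-minute `max_gap` are hypothetical, and a production system would likely also use service dependencies and alert content.

```python
from datetime import datetime, timedelta


def group_alerts(alerts, max_gap=timedelta(minutes=2)):
    """Group (timestamp, name) alerts into bursts.

    An alert joins the current group if it arrives within `max_gap`
    of the previous alert; otherwise it starts a new group.
    """
    groups = []
    for alert in sorted(alerts, key=lambda a: a[0]):
        if groups and alert[0] - groups[-1][-1][0] <= max_gap:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups
```

With this rule, a burst of 30 alerts fired a few seconds apart collapses into a single group, while an unrelated alert an hour later lands in its own.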
We’re excited to share @steadwing at Entrepreneurs First @join_ef Demo Day on April 29, 2026.
It’s a special milestone for the team, and another step toward the future we believe in.
Let’s make production software self-healing!
As engineers, we know that the hardest part of an incident isn't figuring out how to fix it. It's piecing together the context from all your tools and working out why it happened.
Those first critical minutes are spent jumping across logs, metrics, traces, and recent changes, just trying to understand what’s going on.
And that's why we are building @steadwing!
It takes info from your logs, metrics, traces, and codebase, connects the dots across your whole stack, and gives you a full root cause analysis with automatable fixes for code, deployment, and infra.
The investigation is over by the time your on-call person opens the laptop.
we rewrote our entire memory system 3 times before it stopped embarrassing us in benchmarks
92.4% on LongMemEval. #1 across every LoCoMo category.
here's the mass deletion of working code arc nobody asked for 🧵