Announcing Disaster Recovery Testing at Gremlin 🚀
Do you know how your system will respond when major outages strike?
✅ Verify resilience
✅ Validate DR/Business Continuity plans
✅ Prove regulatory compliance
Learn more: https://t.co/kSfm8BBrpn
Companies across industries are building agentic pipelines to ship features faster than ever. 🚀
But not without risk.
Reliability guardrails let your org take advantage of this new velocity while keeping systems resilient and reliable:
https://t.co/7pb0swRbne
Join Gremlin on 6/30 as we explore:
➡️ Where current disaster recovery verifications fail
➡️ How to break down the most common failures into individual tests
➡️ And how your team can simulate disaster scenarios organization-wide:
https://t.co/B2pQuJpOkh
“Does your chaos engineering tool integrate with your AI tooling?” wasn't a question anyone asked in 2020.
It's now one of 15 you should be asking before onboarding any new tooling.
Get the 2026 guide here:
https://t.co/5VHSMEp7qh
Announcing no-code application fault injection 🎉
Now you can prove the reliability of your serverless applications without modifying a single line of code.
Read more ⬇️
https://t.co/QRhHzZlghk
“There's this pie chart of everything that can go wrong. And only like half or two thirds of it lives in staging. You're never gonna find a set of failures until you test in production.”
— @KoltonAndrus
Gremlin is used by some of the leading retailers in the world across industries, including beauty, apparel, and more. These testing best practices have helped them build reliable, resilient POS systems that customers can count on.
Learn more: https://t.co/plnIMqfc19
As Chaos Engineering adoption increased, we found organizations running into the same hurdles when they tried to scale.
The only way an org can improve reliability at scale is to build on standards, validation testing, and reporting.
Read more: https://t.co/c7nAutkStj
A surprising number of organizations have never tested a full regional failover.
Coordinating a safe, controlled test across teams & services is hard.
Make testing reliability actionable with Gremlin's Disaster Recovery Testing: https://t.co/FVpRyW85n3
AI-driven development = faster releases = more reliability risk.
Your chaos engineering tool needs to keep up — automated testing in production, on a schedule, tracking reliability over time.
Here’s what to look for in 2026: https://t.co/r3IwhvGVWs
💡 “When I get to hear stories like, ‘Hey, we just had our holiday sales event kick off and everything went smoothly and I didn't have to wake up in the middle of the night.’
That is really the true definition of reliability.”
What’s the difference between programs that succeed and the ones that fade?
If you want to build an effective, long-lasting reliability program in your company, then make sure you start by asking these key questions: https://t.co/3VlAo4sd4E
5 minutes. That’s how much downtime some of the world’s largest enterprises will tolerate.
Discover what it takes to make a SaaS platform like Gremlin highly available, and how your organization can benefit from what we learned.
https://t.co/qZa7t5dKop
AI has massively accelerated code deployment...but not without risk.
That’s where reliability guardrails come in.
Reliability guardrails let your org take advantage of this new velocity while keeping systems resilient & reliable.
https://t.co/OExoCOIyUi
💡 In this clip from an AI roundtable with Gremlin, Nobl9, and PagerDuty, Mandi Walls discusses how companies will want to audit AI to keep it reliable... and what that means for your team.
After every major outage, the same questions show up in the postmortem:
Why didn’t we see this earlier?
Why didn’t failover work?
Why did recovery take so long?
In many cases, the answer is simple:
The scenario had never been tested before.
“It's the conversations around figuring out where that score has changed. It's very much a team activity because that's where all of those great questions come from.”
At Gremlin, we run reliability tests (and discuss them!) every week, so we can always keep improving. 🚀
Consumption-based pricing for chaos engineering sounds reasonable until you realize it actively discourages testing. 🛠️
Outages are inevitable, so make sure your team is prepared before they happen, with help from our 2026 Chaos Engineering buyer's guide: https://t.co/LufYFHPfjI
Released as part of Reliability Intelligence, the Gremlin MCP Server allows you to bring your LLM of choice to explore your Gremlin data and find opportunities to get more out of Gremlin.
Learn more: https://t.co/6LbPvxDxYN
💡 “Reliability must be a crucial outcome for all of the architectures, and that will make the systems stable, that will make the business effective, and the customers would actually see their services as reliable.”
What resilient engineering teams do differently:
🛠️ Regularly test failure scenarios, not just discuss them
🛠️ Validate recovery processes before incidents occur
🛠️ Treat reliability as an ongoing engineering practice
Incidents become learning opportunities, not surprises.