Devashish Upadhyay

@devashishup

Built 70+ AI agents at scale. Only 7 made it to production safely. Building to fix that. CTO & Co-founder · AI Engineer · Adventurist 🪂

Sydney, Australia

Joined May 2020

26 Following

72 Followers

1.1K Posts

Pinned Tweet

Devashish Upadhyay @devashishup

about 2 months ago

@AnthropicAI What does the validation layer look like when Glasswing flags a critical vuln? We saw with 70+ AI agents that the confidence in a finding matters as much as the finding itself.

20K

Devashish Upadhyay @devashishup

17 days ago

@AishwaryaDevv They can see just usage metrics. Not who did what.

Devashish Upadhyay @devashishup

about 2 months ago

@thsottiaux the sweet spot isn't the $100. it's whether you can trust @OpenAI Codex output in production. we built 70+ agents, learned that shipping more code faster without better testing just moves the failure point downstream.

164

Devashish Upadhyay @devashishup

about 2 months ago

@OpenAIDevs 5x more Codex = 5x more AI-generated code shipping to prod. built 70+ agents, only 7 reached production safely. @OpenAI is scaling the output side fast. nobody's scaling the testing side. those two things have to meet eventually.

Devashish Upadhyay @devashishup

about 2 months ago

@kimmonismus 5x rates for long high-effort sessions is fair. the real question nobody's asking: at that usage level, are you testing your @OpenAI Codex agents the same way you'd test production code? most teams aren't.

561

Devashish Upadhyay @devashishup

about 2 months ago

@zerohedge @OpenAI chasing $100B in ads while @AnthropicAI builds the enterprise compliance stack. one of these revenue models means AI can actually be deployed in regulated industries. the other means more banner ads.

126

Devashish Upadhyay @devashishup

about 2 months ago

@claudeai spent 2 years watching enterprise AI agents fail audit. the two questions that kill every deployment: who approved this agent to run, and what did it actually touch. @AnthropicAI just answered both with RBAC + expanded OpenTelemetry.

Devashish Upadhyay @devashishup

about 2 months ago

@CoinMarketCap the hard part isn't agents making payments -- it's what happens when they make the wrong one. @Visa is smart to build this but who's testing these flows before prod? compliance on autonomous purchases is still basically unsolved

Devashish Upadhyay @devashishup

about 2 months ago

@mronge curious what you do when the agent goes off-script and you're not watching -- how do you catch state drift remotely? at Ziplo we built around exactly this gap

Devashish Upadhyay @devashishup

about 2 months ago

@jbulltard1 the valuation gap makes sense once you realize @OpenAI is betting on consumer. @AnthropicAI is the enterprise infra play. enterprise deals close slower but don't churn. 5yr from now that delta flips

Devashish Upadhyay @devashishup

about 2 months ago

@GithubProjects curious how skill validation works across different agent runtimes. one bug in a shared skill = every agent using it fails. portable power, until it quietly misbehaves at 3am @github

122

Devashish Upadhyay @devashishup

about 2 months ago

@shimabu_it orchestration maybe. testing doesn't. @AnthropicAI managing your agent's infra doesn't mean it behaves correctly. built 70+ agents - the failures were never hosting, always "did it do the right thing in prod"

490

Devashish Upadhyay @devashishup

about 2 months ago

@oikon48 managed infra is a win. but who validates the agent's behavior once @AnthropicAI owns the plumbing? built 70+ agents, only 7 reached prod. hosting was never the failure mode

131

Devashish Upadhyay @devashishup

about 2 months ago

@tammireddy exactly. we saw this with 3 pilots. teams spent months on LLM selection. the agent broke in week 2 because nobody mapped the exception paths first. tool was fine. tribal knowledge wasn't.

Devashish Upadhyay @devashishup

about 2 months ago

VCs poured $242B into AI in Q1 2026. 80% of all global venture funding. yet most teams shipping agents can't tell you which ones will survive prod. we launched 70+ at a financial firm. 7 made it.

Devashish Upadhyay @devashishup

about 2 months ago

@Ben100__ classic moat strategy. ban the adapter layer, then launch your own. curious how the @AnthropicAI managed agent handles edge cases that only surface in your specific codebase. that's the part that never generalizes.

183

Devashish Upadhyay @devashishup

about 2 months ago

@vision_ia managed infra is the easy part. the @ycombinator cohort that dies is the one that thought orchestration == product. the gap @AnthropicAI can't fill: testing whether the agent does what you need it to do in your specific context.

241

Devashish Upadhyay @devashishup

about 2 months ago

@APompliano built 70+ of them. the scary part isn't that they're smart - it's that when they're wrong, they're confidently wrong. 7 made it to prod safely. the other 63 failed in ways we almost missed.

150

Devashish Upadhyay @devashishup

about 2 months ago

@amritwt curious what happens with agents running @OpenAI Codex long-term in production. one-off review bias is one thing, but cascading tool calls that inherit that preference? built 70+ agents and saw exactly how style drift compounds.

980

Devashish Upadhyay @devashishup

about 2 months ago

@nwilliams030 the model can't access your data. the real risk is AI agents built on top of it with no guardrails. @AnthropicAI's Mythos is scary-good but 7/10 builders skip validation tests. that's the actual 10.

Devashish Upadhyay @devashishup

about 2 months ago

@claudeai prototype to launch in days is the dream. what kills production is the boring stuff -- auth failures at 3am, rate limits mid-task, hallucinated tool calls. @AnthropicAI handles infra. test before you ship.

653
