AI agents don't succeed on model quality alone. They succeed when they're grounded in the context of the business: data, workflows, policies, and shared knowledge.
Join me and @amrcn_werewolf at #MSBuild tomorrow to see how Microsoft IQ turns enterprise context into action, helping agents reason, decide, and deliver real outcomes.
https://t.co/lNspDcU1ML
#AI #Agents
My paper with @ReshabhSharma01 and @shraddha_96, “Willful Disobedience: Automatically Detecting Failures in Agentic Traces”, will appear this month in the First ACM Conference on AI and Agentic Systems (CAIS 2026). https://t.co/xQMO55b7q6
Continuing the topic of reviewing -- I am on the program committee (PC) for Onward! at SPLASH (@splashcon) this year.
I always enjoy this track -- if you have some big ideas that just haven't come together, or interesting thoughts to share, it is a great place for them!
Well, I had a lot of fun reviewing for the ACM CAIS Conference @CAISconf (https://t.co/7U2CVh6BpC). I read some really nice papers and think the program looks great.
If you are interested in agents (who isn't?) and want to hear and discuss the leading edge of the field, this is a great event to attend!
The Static Analysis Symposium (SAS) is taking place as a part of SPLASH/ISSTA this year! Consider submitting a paper, deadline *May 1st*, with special topics including Static Analysis and AI, and Static Analysis and Education:
https://t.co/R8CQTHHr3F
Just posted a new blog entry on how Bosque does this:
https://t.co/ybhqpCrtID
And you can also check out an arXiv paper with more detail:
https://t.co/3n00BGlc27
Low GC pause times, starvation freedom, or low overhead -- up until now, existing collectors allowed you to pick 1 (or maybe 2). This is not just an engineering issue: for languages like Java/C#/JavaScript, it is theoretically impossible to satisfy all three simultaneously!
Intriguingly, many of the same features that make Bosque amenable to formal analysis can also be used to simplify and optimize the language runtime.
It turns out that they allow us to get around this theoretical limitation and build a GC that optimizes for all 3 (and more) objectives!
Ironically, hypermedia (HATEOAS) has accidentally become a plausible API design scheme again: LLMs will robustly follow API links, just as its designers hoped.
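For readers who haven't met HATEOAS, here is a minimal sketch of the idea in Python. Everything in it is hypothetical: the in-memory `API` dict stands in for a server, and the `links` field name is an assumption (formats such as HAL use a `_links` object instead). The point is that the client navigates by link relation names offered in each response, never by hard-coded URLs, which is exactly the kind of choice an LLM can make from the response itself.

```python
# Hypothetical in-memory API: every resource advertises its own navigation links.
API = {
    "/": {"links": {"orders": "/orders"}},
    "/orders": {"links": {"self": "/orders", "first": "/orders/42"}},
    "/orders/42": {"status": "shipped",
                   "links": {"cancel": "/orders/42/cancel", "customer": "/customers/7"}},
    "/customers/7": {"name": "Ada", "links": {"orders": "/orders"}},
}


def get(url: str) -> dict:
    """Stand-in for an HTTP GET against the hypothetical server."""
    return API[url]


def follow(rels: list[str], start: str = "/") -> tuple[str, dict]:
    """Walk the API by link relation names, starting from the entry point."""
    url = start
    for rel in rels:
        links = get(url)["links"]
        if rel not in links:
            # The response itself tells the client (or an LLM) what it may do next.
            raise KeyError(f"{url} offers no '{rel}' link; available: {sorted(links)}")
        url = links[rel]
    return url, get(url)


url, customer = follow(["orders", "first", "customer"])
print(url, customer["name"])
```

Only the entry point `/` is known in advance; if the server later moves `/customers/7`, a client that follows the `customer` relation keeps working.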
Feedback loops are shaped as Counterexample-Guided Inductive Synthesis (CEGIS).
Two dueling players: coding agent proposes, oracle finds a counterexample, repeat until specification and implementation converge.
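A toy sketch of that two-player CEGIS loop in Python. It is deliberately hypothetical: candidates are linear functions `a*x + b` with small integer coefficients, the "coding agent" is a simple enumerative synthesizer, and the "oracle" exhaustively checks a finite input domain. In an agentic setting, an LLM would play the proposer and a verifier or test harness would play the oracle.

```python
from typing import Callable, Iterable, Optional

# Hypothetical candidate program: f(x) = a*x + b, represented as (a, b).
Candidate = tuple[int, int]


def run(c: Candidate, x: int) -> int:
    a, b = c
    return a * x + b


def propose(space: Iterable[Candidate], examples: list[tuple[int, int]]) -> Optional[Candidate]:
    """Synthesizer ("coding agent"): first candidate consistent with all counterexamples so far."""
    for c in space:
        if all(run(c, x) == y for x, y in examples):
            return c
    return None


def find_counterexample(c: Candidate, spec: Callable[[int], int],
                        domain: Iterable[int]) -> Optional[int]:
    """Verifier ("oracle"): an input where the candidate disagrees with the spec, if any."""
    for x in domain:
        if run(c, x) != spec(x):
            return x
    return None


def cegis(spec: Callable[[int], int], space: list[Candidate], domain: list[int],
          max_rounds: int = 100) -> Optional[Candidate]:
    examples: list[tuple[int, int]] = []
    for _ in range(max_rounds):
        c = propose(space, examples)
        if c is None:
            return None  # no candidate in the space can satisfy the spec
        x = find_counterexample(c, spec, domain)
        if x is None:
            return c  # converged: candidate matches the spec on the whole domain
        examples.append((x, spec(x)))  # feed the counterexample back to the proposer
    return None


space = [(a, b) for a in range(-5, 6) for b in range(-5, 6)]
domain = list(range(-10, 11))
print(cegis(lambda x: 3 * x - 2, space, domain))
```

Each failed check adds one counterexample that every later proposal must also satisfy; that accumulation is what drives the two players to converge, or to show that nothing in the candidate space fits (try `lambda x: x * x`).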
@DominikTornow IIRC, the keynote at SPLASH 2024 covered just such an example: Unicode processing implemented in Dafny and compiled to Java.
https://t.co/71iG3hWCk1
AI is writing a growing share of the world's software. No one is formally verifying any of it.
New essay: "When AI Writes the World's Software, Who Verifies It?"
https://t.co/8zjS9FkdA8
An interesting side question is how this model impacts the value of frontier models and data-paywalls.
As we get more powerful platforms and harnesses that are robust to models making mistakes, does this allow us to optimize more for lower token costs? Similarly, does a stable ecosystem for exposing and integrating APIs and data sources disintermediate platforms and providers? Definitely potential for disruption!
I just put up an arXiv paper (https://t.co/KSXxVFg8FY) with our roadmap for building an "Agentic Infused Software Ecosystem". The key insight is that agentic workflows -- with code generation or freeform -- depend on the agent itself, the software platform it is built on, and the environment/runtime that the agent interacts with.
Building a high-reliability agent, with a success rate of multiple 9s and strong safety guarantees, requires us to build all three of these parts together as a cohesive system. Our current push is to take the smaller-scale experiments from the past few years and integrate them into this cohesive system!