The @symbolica research team has achieved 36% on ARC-AGI-3, the newest and hardest agentic AI benchmark. By allowing agents to synthesize programs, execute code dynamically, and modify their own behavior, we are able to drastically enhance the capabilities of base models relative to their baseline performance. Our system also performs comparably to humans, using only ~1.68x the action budget of the average human player.
Today we are launching Agentica: https://t.co/ajXJ9cAdHW
Most other agent systems can write and run short code snippets, but they can't handle long-running tasks that involve code execution. Other systems also struggle with tool use, requiring bespoke MCP integrations or manual workflow configuration.
This changes today. We built the first MCP-free dynamic coding agent that can interact with any tool or API you throw at it, with safety and correctness guarantees. It can successfully complete tasks over multi-hour time horizons thanks to the reliability of the @agenticasdk at its core. Having Agentica complete complex tasks for you, such as working with structured data or browsing the web, feels like magic. I don't think any other tool today even comes close.
Agentica is currently in beta, so expect a few bugs here and there. Feedback is very welcome. Please feel free to DM me bug reports or feature requests! We can't wait to see what you build.
Agentica Beta is live: chat to build long-running AI agents.
Describe a task in plain English, connect your tools, and deploy an agent that keeps working in the background.
This demo shows an agent tracking new San Francisco rental listings as they appear. It's just one example: you can build anything.
Many people have asked us: what changes when an agent has access to a persistent Python runtime?
We ran a side-by-side comparison to demonstrate:
Agentica's Python REPL-based agent vs. traditional tool-calling agents
Full breakdown below 👇
Reasoning. Persistent state. Recursion.
In our ARC-AGI implementation, agents autonomously decide when to pass state into a sub-agent’s REPL and which state to pass, letting each sub-agent focus on analysing the training examples, the test inputs, or both.
Check out the logs https://t.co/AjYSfludwq.
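A minimal Python sketch of what "persistent state" means here, assuming nothing about Agentica's actual API: `PersistentREPL`, `stateless_tool_call`, and `spawn_subagent_repl` are hypothetical names, and the grids are toy data, not real ARC tasks. A persistent runtime keeps one namespace across steps, a stateless tool call starts fresh every time, and a parent agent can hand a sub-agent only the state it chooses.

```python
import copy
from typing import Any, Dict, List, Optional


class PersistentREPL:
    """Hypothetical stand-in for an agent's persistent Python runtime.

    Every run() call executes in the same namespace dict, so variables
    defined in one step remain available in later steps.
    """

    def __init__(self, initial_state: Optional[Dict[str, Any]] = None):
        self.namespace: Dict[str, Any] = dict(initial_state or {})

    def run(self, code: str) -> None:
        exec(code, self.namespace)

    def get(self, name: str) -> Any:
        return self.namespace[name]


def stateless_tool_call(code: str) -> Dict[str, Any]:
    """Tool-call style: a fresh namespace each time, so nothing carries over."""
    namespace: Dict[str, Any] = {}
    exec(code, namespace)
    return namespace


def spawn_subagent_repl(parent: PersistentREPL, names: List[str]) -> PersistentREPL:
    """Hypothetical helper: deep-copy only the chosen variables into a new REPL,
    so the sub-agent sees just the state it was handed."""
    return PersistentREPL({name: copy.deepcopy(parent.get(name)) for name in names})


# Toy ARC-style state (made-up grids).
repl = PersistentREPL()
repl.run("train = [([[0, 1]], [[1, 0]])]")
repl.run("test_input = [[0, 1]]")
repl.run("n_train = len(train)")  # reuses `train` from an earlier step
print(repl.get("n_train"))

# Stateless calls lose state between steps.
stateless_tool_call("x = 1")
try:
    stateless_tool_call("y = x + 1")
except NameError as exc:
    print("stateless call failed:", exc)

# A sub-agent that only sees the training examples.
sub = spawn_subagent_repl(repl, ["train"])
print("test_input" in sub.namespace)
```

The deep copy is a design choice for the sketch: it keeps the sub-agent from mutating the parent's objects, at the cost of copying data.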
Victoria Klein (@its_hapenin) from our team will be speaking at Dust's Engineering Night London 🇬🇧 on Feb 23.
She’ll talk about what we're building with the @agenticasdk.
Register here: https://t.co/22GHU9cmHm
London!
We’ll be at Dust's Engineering Night on Feb 23.
Vic (@its_hapenin) from our team will speak.
Agents will run.
Register here: https://t.co/gnHokJDC5q
Most agent frameworks share state, tools, and execution paths.
That’s fine. Until one agent errors and everything downstream collapses.
Isolation, typed outputs, and scoped capabilities aren’t ‘nice-to-haves’.
They’re survival traits.
#AgenticAI
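A minimal Python sketch of those three traits, not Agentica's API: `Ok`, `Err`, `scoped`, `run_isolated`, the agents, and the tool registry are all hypothetical, and the price list is a made-up stub. Each agent gets only the tools it was granted (scoped capabilities), any exception it raises is caught and turned into a typed `Err` (isolation plus typed outputs), and downstream work runs only on a typed `Ok`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set, Union


@dataclass(frozen=True)
class Ok:
    """Typed success: carries the agent's output value."""
    value: Any


@dataclass(frozen=True)
class Err:
    """Typed failure: records which agent failed and why."""
    agent: str
    message: str


Result = Union[Ok, Err]
Tools = Dict[str, Callable[..., Any]]

# Hypothetical tool registry; the price data is a made-up stub.
ALL_TOOLS: Tools = {
    "fetch_prices": lambda ticker: [100.0, 102.0, 101.0, 105.0],
    "delete_files": lambda path: None,  # dangerous; never granted below
}


def scoped(allowed: Set[str]) -> Tools:
    """Scoped capabilities: hand an agent only the tools it was granted."""
    return {name: fn for name, fn in ALL_TOOLS.items() if name in allowed}


def run_isolated(name: str, task: Callable[[Tools], Any], allowed: Set[str]) -> Result:
    """Isolation: an exception stays inside this agent and becomes an Err."""
    try:
        return Ok(task(scoped(allowed)))
    except Exception as exc:
        return Err(name, f"{type(exc).__name__}: {exc}")


def fetch_agent(tools: Tools) -> List[float]:
    return tools["fetch_prices"]("TSLA")


def rogue_agent(tools: Tools) -> Any:
    # Asks for a tool outside its scope, so the dict lookup raises KeyError.
    return tools["delete_files"]("/")


def returns_agent(prices: List[float]) -> List[float]:
    """Simple period-over-period returns."""
    return [b / a - 1 for a, b in zip(prices, prices[1:])]


prices = run_isolated("fetch", fetch_agent, {"fetch_prices"})
rogue = run_isolated("rogue", rogue_agent, {"fetch_prices"})

# Downstream work only runs on a typed success; the rogue failure is
# reported instead of crashing the pipeline.
if isinstance(prices, Ok):
    print(returns_agent(prices.value))
print(rogue)
```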
A query comes in: “Backtest a TSLA strategy.”
Agents spring into action. They fetch data, run calculations, and generate charts, all working together seamlessly.
Agentica’s types, composition, and dynamic code execution let them pass tasks and results like a single mind, orchestrating complex workflows in real time.
#MultiAgentSystems #AgenticAI #AutonomousAgents #IntelligentAgents #MultiAgentAI #ArtificialIntelligence #AICommunity
Great guide from Google Cloud on building multi-agent systems →
Step #3 hits the nail on the head: tools, state, and reliability are what turn agent demos into real systems.
Laying out workflows is easy; keeping multi-agent systems reliable in production is the hard part.
This is why we had to build Agentica.
What’s the hardest step for your team when shipping agents to production?
4 steps for startups to build multi-agent systems:
#1 - Build your foundation
#2 - Build out the engine
#3 - Tools, state, and reliability
#4 - Go from Localhost to a scalable deployed product
Check out the full technical guide here → https://t.co/V9LV2f9tzj
@shaka_cx @googlecloud Couldn’t agree more. State and recovery are where the real engineering challenge lies.
Turning agents from “toy demos” into production-ready systems is what we focus on every day.
@bundelibhau @googlecloud Exactly. This is where most “fun experiments” hit the wall.
That’s why we had to build Agentica: to make agents typed, observable, and resilient in production.