Agentica Beta is live: chat to build long-running AI agents.
Describe a task in plain English, connect your tools, and deploy an agent that keeps working in the background.
This demo shows an agent tracking new San Francisco rental listings, as they appear. Just one example - you can build anything.
Agentica SDK achieved an unverified 36.08% on the 25 public ARC-AGI-3 games using the same harness we open-sourced a month ago. Average cost of $40 per game, and less than 2x the game actions taken by the second-best human players.
The ARC-AGI-3-Agents harness and the Agentica SDK, which is a general purpose framework that is very much not ARC-AGI-3 specific, are different. The ARC-AGI-3-Agents harness is built using the Agentica SDK and it is the ability to write tools and code dynamically that gives the harness the ability to solve these puzzles.
@concernedAIguy@FakePsyho Not sure. I don't think that AGI is here, and I think that new unlocks will be required to achieve full AGI. But this result certainly demonstrates that a lot can be unlocked in the base models by giving them access to dynamic program execution.
Yes and no. Our submission https://t.co/JNbM5EyOGG uses the Agentica SDK with tools to help the model navigate and interpret the ARC 3 puzzles. To a certain extent any model needs this, at the bare minimum to be able to interact with the game engine. However with Agentica the agents are free to build their own tools, design their own tests, and so on. This is where the definition of a harness becomes fuzzy, because there is no pre-prescribed structure at this point. It's precisely this dynamic capability that gives us such a huge advantage when solving these puzzles. You can experience this yourself by interacting with the raw SDK directly. https://t.co/uQDSRO9Ebx
Yes and no. Our submission https://t.co/JNbM5EyOGG uses the Agentica SDK with tools to help the model navigate and interpret the ARC 3 puzzles. To a certain extent any model needs this, at the bare minimum to be able to interact with the game engine. However with Agentica the agents are free to build their own tools, design their own tests, and so on. This is where the definition of a harness becomes fuzzy, because there is no pre-prescribed structure at this point. It's precisely this dynamic capability that gives us such a huge advantage when solving these puzzles. You can experience this yourself by interacting with the raw SDK directly. https://t.co/uQDSRO9Ebx
Agentica Beta is live: chat to build long-running AI agents.
Describe a task in plain English, connect your tools, and deploy an agent that keeps working in the background.
This demo shows an agent tracking new San Francisco rental listings, as they appear. Just one example - you can build anything.
Agentica is not an ARC-AGI-3 specific 'harness'. It is a general purpose symbolic exoskeleton. You can use it to do any task. Interact with it directly here https://t.co/eL3jTi3zXP
Agentica is not an ARC-AGI-3 specific 'harness'. It is a general purpose symbolic exoskeleton. You can use it to do any task. Interact with it directly here https://t.co/eL3jTi3zXP
Agentica is not an ARC-AGI-3 specific 'harness'. It is a general purpose symbolic exoskeleton. You can use it to do any task. Interact with it directly here https://t.co/eL3jTi3zXP
For us, that's at 36.08% at $1,005 total cost... vs 0.25% for $8,900 with Opus 4.6 directly.
We used the same harness from our previous open source ARC-2 solutions on ARC-3.
Read more: https://t.co/yZNUYw4um2
Code/Logs: https://t.co/hFogxNbIRr
The same SDK driving this score also powers https://t.co/6AqlLLCd0L, where you can build long-running agents in chat.
No sign-up, no code. Try it: https://t.co/6AqlLLCd0L