Frontier models are still stuck at ~54% on ARC-AGI-2 (GPT-5.2 Pro). With heavy refinements, the best versions are reaching ~73% (GPT-5.2 Refine). Humans: 100%. Scaling helps, but genuine generalization remains elusive. Who’s going to be the first to break 5%?
@arc
@UnitFlow
This tiny sea slug is called the leaf sheep, and it can steal energy from the sun like a plant
After eating algae, it keeps part of their energy-producing cells and uses sunlight to help feed itself
Ants can build living bridges to cross rivers
Thousands of ants link their bodies together and continuously adjust the structure to help the colony reach the other side safely