if your agent browses the web, assume every page is trying to talk to it. we treat everything on a page as data. the agent reads it. it never follows it. that one rule killed most of our weird production failures.
today openai disproved 80-year-old math. nvidia posted $75B in one quarter. i spent my morning on why our agent mis-parsed a customer's date format. small company life. the date parser is the whole world. the customer noticed the date parser. nobody mentioned erdos.
if you're training a model and your eval scores are stuck, look at your data before your architecture. we spent months scaling model size for tiny gains. then we 20x'd our synthetic task data, curated it hard, and the scores finally moved.
@Sirupsen <$1M raised to a $100M run-rate, profitable. infra that actually works just spreads on its own. half our agent stack got picked exactly this way, someone shipped something good and we found it. wild result. congrats.
@milesdeutscher courses are a solid start. honestly though, the thing that taught us how claude actually behaves was shipping with it for months and watching where it broke. you can't really certify that part. you just have to log the failures.
@DeryaTR_ the low cost tracks with how it was trained. composer 2.5 saw a huge volume of synthetic coding tasks in training. train on the workload you'll actually run and you can cut cost without losing much. we get the same result fine-tuning our own models.
@bcherny per-component token breakdown is what i've wanted for a year. agents burn tokens in places you can't see. usually a chatty sub-agent, or a tool dumping 10x the context the model actually needed. finally being able to see which one.
openai's reasoning model just disproved a math conjecture from 1946. people believed it for 80 years. turns out it was wrong. and it wasn't some math-specialized system. it's the same general reasoner we point agents at.
@OwenGregorian the model didn't grind the geometry harder. it reached into algebraic number theory, a different field entirely, to get there. didn't expect a model to make that kind of cross-domain jump on its own yet.
80-year-old geometry puzzle cracked by OpenAI using number theory | Aamir Khollam, Interesting Engineering
A longstanding Paul Erdลs conjecture has fallen after OpenAI connected the problem with deep algebraic number theory.
For nearly 80 years, mathematicians believed they understood the limits of a famous geometry puzzle first posed by legendary Hungarian mathematician Paul Erdลs. Now, an AI model developed by OpenAI has overturned that assumption and solved one of the fieldโs most stubborn open problems.
The breakthrough centers on the โunit distance problem,โ a deceptively simple question that asks how many pairs of points can sit exactly one unit apart on a flat plane. Despite its simplicity, the problem has challenged mathematicians since 1946 and became one of the best-known questions in combinatorial geometry.
Decades-old puzzle
Imagine placing dots on a sheet of paper. The challenge is to arrange those dots so that as many pairs as possible sit exactly one unit apart. For decades, mathematicians believed square-grid patterns offered the best possible solution.
Erdลs himself proposed that the number of unit-distance pairs could only grow slightly faster than linearly as more points were added. Researchers spent generations trying to prove or disprove that theory. The new AI-generated proof changes that picture entirely.
According to OpenAI, the model discovered an infinite family of point arrangements that produce significantly more unit-distance pairs than the classic square-grid approach. Princeton mathematician Will Sawin later refined the result and showed the improvement could be expressed with a fixed exponent.
What surprised researchers most was the method behind the proof. Instead of relying on traditional geometry tricks, the AI connected the problem to algebraic number theory, a deep branch of mathematics that studies number systems extending ordinary integers. The proof used advanced concepts such as infinite class field towers and Golod-Shafarevich theory, tools rarely associated with geometric puzzles.
In simple terms, the AI found a way to use hidden symmetries inside exotic number systems to create many more one-unit distances between points. That connection stunned experts.
Mathematicians take notice
The proof underwent external review by mathematicians who also produced a companion paper explaining the argument and its broader importance. Fields Medal winner Tim Gowers called the achievement โa milestone in AI mathematics.โ Number theorist Arul Shankar said the work shows AI systems can move beyond assisting mathematicians and begin generating genuinely original ideas.
Researchers also noted that the result may influence other geometry problems long thought unrelated to number theory.
Thomas Bloom, one of the mathematicians involved in the companion work, said the discovery suggests deep number theory may hold answers to several unsolved questions in discrete geometry. He added that many mathematicians will likely revisit older problems using these newly revealed connections.
The result also highlights how rapidly AI reasoning systems are evolving. Unlike specialized theorem-proving software, OpenAI said this proof came from a general-purpose reasoning model. Engineers did not specifically train it on the unit distance problem or build dedicated search tools for this task.
That detail matters because it hints at broader scientific applications. Researchers believe systems capable of managing long chains of reasoning could eventually assist in fields such as physics, biology, engineering, and medicine.
For now, the unit distance breakthrough stands as a landmark moment. A problem that resisted human effort for nearly eight decades fell to an AI system that approached geometry from an entirely unexpected direction.
https://t.co/bEl97t8rfF
@Prathkum the prod-db one we learned the expensive way. our agents get no standing database access now, just scoped per-action permissions that expire. give an agent permanent creds and it'll eventually do something confident and very wrong.
@RoundtableSpace 1,400 MCP tools sounds great. honestly having the tools was never our bottleneck. the agent reliably picking the right one mid-task was. does the 34 min cover tool selection?
@mark_k@OpenAI it's not 'over,' but the result is underrated for a different reason. it was a general reasoner, not a math-specialist. the same kind we build agents on. a better general core lifts everything on top of it.
it's over.
@OpenAI just had an internal model disprove a long-standing conjecture in discrete geometry, related to the Erdลs unit distance problem.
The basic question sounds almost innocent: Put n points on a plane. How many pairs can be exactly distance 1 apart?
For decades, the usual grid-like constructions looked basically unbeatable. Turns out they weren't.
The model found an infinite family of counterexamples, with a polynomial improvement. External mathematicians checked the proof.
This is the kind of thing that should make people pause. Not because AI wrote some clever text about math.
Because it actually found new math.
@deno_land no shared credentials is exactly our setup. our agent never holds prod creds, it requests scoped access per action. a firewall enforcing that on the wire is the version i wish we'd built first.
anthropic's first profitable quarter is basically here. ~$186M/month in operating profit. they're also paying spacex $1.25B/month for compute. the compute bill is almost 7x the profit. and everyone's calling this the good news.
@efipm the orgs rolling back agents the most are the ones who can see their agents failing. that's what mature guardrails do, they surface the problems. the teams to worry about are the ones shipping agents with no rollback at all.
@sundarpichai@GeminiApp running 24/7 is the easy part. the hard part is an unattended agent knowing when to stop and check in with you. get that wrong and autonomy turns into a liability fast. get it right and nobody notices it's even running.
@Av1dlive true for most agents, not just voice. we've swapped a smarter model for a faster one more than once. an agent that answers in 400ms and is 95% right beats one that's 99% right in 3 seconds. users feel latency before they feel intelligence.