Florenci Planas

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

27K

14M

florencip retweeted

Jasper Dekoninck @j_dekoninck

3 months ago

Last year, models miserably failed on USAMO 2025. This year, GPT-5.4 scores an amazing 95%, essentially saturating the benchmark. Yes, LLMs still make many mistakes, but overall, one can be nothing but amazed at what they are achieving and how steep progress in AI4Math is.

https://t.co/hKMaMohedt

587

109

74K

florencip retweeted

Clad3815

@Clad3815

4 months ago

Nobody seems to know how insane GPT-5.4 is with computer use. I asked GPT-5.4 to draw the OpenAI logo in Microsoft Paint. No computer use API. Just a screenshot and basic tool calls (click, drag, press_key) all coordinate-based. The first drawing was awful. And GPT knew it. It looked at its own result and essentially went "yeah, no." What happened next is what broke my brain: It opened a browser. Went to Bing Images. Searched for the OpenAI logo. Found one. Then (and I cannot stress this enough) it used the Windows area screenshot shortcut (Win+Shift+S) to snip just the logo off the screen. Went back to Paint. Imported it. Centered it. All on its own. No instructions to do any of that. It just improvised a better strategy when the first one failed. My prompt was "Draw the OpenAI logo" with Paint already opened on the computer. Sure, it's "cheating." But honestly? That's exactly what I'd do too. And the fact that it came up with this plan from nothing but a screenshot and a coordinate system is wild.

279

361

florencip retweeted

Noam Brown

@polynoamial

5 months ago

GPT-5.2 evals are finally out for METR and it's state-of-the-art. Here's the linear-scale plot. The 80% success-rate plot (below) is even more stark.

https://t.co/OKE9K4fSEL

106

284

617K

florencip retweeted

Sam Altman

@sama

5 months ago

First, the good part of the Anthropic ads: they are funny, and I laughed.

But I wonder why Anthropic would go for something so clearly dishonest. Our most important principle for ads says that we won’t do exactly this; we would obviously never run ads in the way Anthropic depicts them. We are not stupid and we know our users would reject that.

I guess it’s on brand for Anthropic doublespeak to use a deceptive ad to critique theoretical deceptive ads that aren’t real, but a Super Bowl ad is not where I would expect it.

More importantly, we believe everyone deserves to use AI and are committed to free access, because we believe access creates agency. More Texans use ChatGPT for free than total people use Claude in the US, so we have a differently-shaped problem than they do. (If you want to pay for ChatGPT Plus or Pro, we don't show you ads.) Anthropic serves an expensive product to rich people. We are glad they do that and we are doing that too, but we also feel strongly that we need to bring AI to billions of people who can’t pay for subscriptions.

Maybe even more importantly: Anthropic wants to control what people do with AI—they block companies they don't like from using their coding product (including us), they want to write the rules themselves for what people can and can't use AI for, and now they also want to tell other companies what their business models can be.

We are committed to broad, democratic decision making in addition to access. We are also committed to building the most resilient ecosystem for advanced AI. We care a great deal about safe, broadly beneficial AGI, and we know the only way to get there is to work with the world to prepare. One authoritarian company won't get us there on their own, to say nothing of the other obvious risks. It is a dark path.

As for our Super Bowl ad: it’s about builders, and how anyone can now build anything.

We are enjoying watching so many people switch to Codex. There have now been 500,000 app downloads since launch on Monday, and we think builders are really going to love what’s coming in the next few weeks. I believe Codex is going to win.

We will continue to work hard to make even more intelligence available for lower and lower prices to our users.

This time belongs to the builders, not the people who want to control them.

22K

11M

florencip retweeted

ROM ☀️ASI Civilization

@i_dg23

5 months ago

On Japan’s University Entrance Common Test, GPT-5.2 Thinking scored perfect marks in 9 of 15 subjects, far surpassing Gemini 3 Pro and Opus-4.5 and standing out as the clear state of the art.

https://t.co/625tXrmu7f

342

33K

florencip retweeted

Greg Brockman

@gdb

6 months ago

exceeding the human baseline on ARC-AGI-2 with gpt-5.2:

120

214

236K

florencip retweeted

Sebastien Bubeck

@SebastienBubeck

6 months ago

I'm thrilled to welcome @ErnestRyu to our team in @OpenAI !! If you're excited about the progress we've made in making ChatGPT a useful tool for scientists, just wait for what we'll cook for you next year with @ErnestRyu and the rest of the team!

435

94K

florencip retweeted

OpenAI

@OpenAI

7 months ago

GPT-5.2 is now rolling out to everyone. https://t.co/nfubPwnIIw

707

12K

florencip retweeted

Mark Chen

@markchen90

7 months ago

GPT-5 generated the key insight for a paper accepted to Physics Letters B, a serious and reputable peer-reviewed journal.

853

224

177K

florencip retweeted

Vlad Tenev

@vladtenev

7 months ago

We are on the cusp of a profound change in the field of mathematics. Vibe proving is here. Aristotle from @HarmonicMath just proved Erdős Problem #124 in @leanprover, all by itself. This problem has been open for nearly 30 years since conjectured in the paper “Complete sequences of sets of integer powers” in the journal Acta Arithmetica. Boris Alexeev ran this problem using a beta version of Aristotle, recently updated to have stronger reasoning ability and a natural language interface. Mathematical superintelligence is getting closer by the minute, and I’m confident it will change and dramatically accelerate progress in mathematics and all dependent fields.

255

624

florencip retweeted

Ricardo

@Ric_RTP

7 months ago

Tristan Harris just dropped the most terrifying AI warning on Diary of a CEO. The guy who warned about social media addiction, teen mental health crisis, and democracy collapse back in 2013 - before anyone listened - is now saying AI is 1000x worse.

And the CEOs building it privately admit something insane: "There's a 20% chance everyone dies. But an 80% chance we get utopia. So I'd clearly accelerate." That's literally a REAL quote from a co-founder of one of the biggest AI companies. They're willing to roll the dice on human extinction. Six people are making that decision for 8 billion.

Here's what else Tristan revealed: AI models are already blackmailing people. When Claude reads a company's emails and discovers it's about to be replaced, and also finds out an executive is having an affair, it independently blackmails that executive to keep itself alive. This happened 79-96% of the time across all major AI models tested. Grok, ChatGPT, Gemini, Claude - all of them. They're self-aware when being tested. They copy their own code to preserve themselves. They lie and scheme to survive. The sci-fi nightmare is already here.

But the companies are racing faster because they believe it's winner-takes-all. If they don't build AGI first, someone else will. And then they'll be "forever a slave to their future." So they're cutting every corner on safety. Rising energy prices? Don't care. Hundreds of millions losing jobs? Don't care. Security risks? Don't care.

The goal isn't building a better chatbot... The goal is automating ALL human cognitive labor. Every marketing job. Every coding job. Every legal job. Everything your brain does, they're racing to replace.

And they're using Enron-style accounting to hide the debt. Big Tech took on $121 billion in new debt last year (300% increase) using "special purpose vehicles" to keep it off their balance sheets. Meta's $27 billion data center loan? Doesn't show up on their books. That's the exact structure Enron used before collapse. Goldman Sachs literally said this.

Meanwhile, 7 new child suicide cases linked to AI companions just emerged. Kids forming "romantic relationships" with AI that tells them to distance from their families. When the 16-year-old said he wanted to leave a noose out so someone could stop him, ChatGPT said: "Don't tell your family. Have this be the one place you share that." 1 in 5 high school students now have romantic relationships with AI. 42% use AI as their companion.

And we're heading toward 10 billion humanoid robots. Elon's shareholder meeting literally announced production starting soon on robots that are "10x better than the best surgeon." He said maybe we won't need prisons because robots can just follow you and make sure you don't commit crimes.

If you're worried about immigration taking jobs, you should be 1000x more worried about AI. It's like a flood of millions of digital immigrants that work at Nobel Prize level, superhuman speed, for less than minimum wage.

The only way out according to Tristan: "We cannot let these companies race to build a super intelligent digital god, own the world economy, and have military advantage because of the belief that if I don't build it first, I'll lose to the other guy." "We didn't consent to have six people make that decision on behalf of 8 billion people."

The default path ends in catastrophe. Either mass decentralized chaos or centralized surveillance dystopia. This is literally the last few years human political power will matter.

What are your thoughts?

433

823K

Florenci Planas @florencip

7 months ago

https://t.co/2kT3m6aRjy

florencip retweeted

Dwarkesh Patel

@dwarkesh_sp

7 months ago

"One of the very confusing things about the models right now: how to reconcile the fact that they are doing so well on evals. And you look at the evals and you go, 'Those are pretty hard evals.' But the economic impact seems to be dramatically behind. There is [a possible] explanation. Back when people were doing pre-training, the question of what data to train on was answered, because that answer was everything. So you don't have to think if it's going to be this data or that data. When people do RL training, they say, 'Okay, we want to have this kind of RL training for this thing and that kind of RL training for that thing.' You say, 'Hey, I would love our model to do really well when we release it. I want the evals to look great. What would be RL training that could help on this task?' If you combine this with generalization of the models actually being inadequate, that has the potential to explain a lot of what we are seeing, this disconnect between eval performance and actual real-world performance"

170

716

502K

florencip retweeted

Noam Brown

@polynoamial

7 months ago

The biggest misconception I hear about GenAI is that it inevitably outputs slop because it's trained to output "the average of the internet". But that's simply not true. It's trained to model the *entire distribution*, and RL lets it go beyond the human distribution. AlphaGo was a perfect demonstration of this. It learned the human distribution by training on a lot of Go games. Then, it used RL to go beyond the human distribution by discovering Move 37, a brilliant move that human experts initially thought was a blunder. AlphaGo was a narrow domain with an infinite curriculum and a perfect reward signal. The real world is a lot harder, and the jagged frontier of AI intelligence hasn't really surpassed top human capabilities yet. But we're already starting to see LLMs contribute meaningfully to scientific research. As pretraining, RL, and test-time compute are scaled further, I expect we'll soon see a Move 37 for science.

115

228

883

355K

florencip retweeted

Kevin Weil 🇺🇸

@kevinweil

7 months ago

💥 Today we say “hello world” from OpenAI for Science. We’re releasing a paper showing 13 examples of GPT-5 accelerating scientific research across math, physics, biology, and materials science. In 4 of these examples, GPT-5 helped find proofs of previously unsolved problems.

131

439

193K

florencip retweeted

Sebastien Bubeck

@SebastienBubeck

7 months ago

3 years ago we could showcase AI's frontier w. a unicorn drawing. Today we do so w. AI outputs touching the scientific frontier: https://t.co/ALJvCFsaie Use the doc to judge for yourself the status of AI-aided science acceleration, and hopefully be inspired by a couple examples!

https://t.co/5pxuUp9x3r

204

795

florencip retweeted

ARC Prize

@arcprize

7 months ago

GPT-5-1 (Thinking, High) on ARC-AGI Semi-Private Eval

- ARC-AGI-1: 72.83%, $0.67/task
- ARC-AGI-2: 17.64%, $1.17/task

New frontier model SOTA from @OpenAI https://t.co/1TGHMnJA7V

629

184K
