@dwarkesh_sp Since your podcasts with David Reich were well received:
Gregory Clark, of “A Farewell to Alms” fame, would be an appropriate guest (his proposal is that natural selection drove productivity in Britain and culminated in the Industrial Revolution).
I'm excited to share a new paper: "Mastering Board Games by External and Internal Planning with Language Models"
https://t.co/jWoSojZtbQ
(also soon to be up on arXiv, once it's been processed there)
@erikphoel @JrKibs for example, moving SOTA from 2% to 25% on FrontierMath. That's very impressive, I think you'd agree. Personally, I was betting heavily on prediction markets that SOTA performance on FrontierMath would not cross 10%, and definitely not 30%, by the end of 2025. o3 reminded me of this:
@erikphoel @JrKibs o3 has not been tested as comprehensively, but it was just a preview and we don't have its system card yet. Also consider this: most benchmarks have already been saturated, and on the benchmarks it *was* tested on, it showed highly impressive performance (continued in next reply)
@Justin_Halford_ @Jason You're lending support to my IBM analogy by citing all these releases.
Nothing about Veo2 or Whisk is groundbreaking research-wise. It would have been more impressive if there was a Veo before there was a Sora.
@Justin_Halford_ @Jason When you put it like that, it sounds like Google suffers from a kind of "IBM disease".
FWIW, I'm not too optimistic about OpenAI in the near future, but they don't seem to have lost their momentum so far.
@Justin_Halford_ @Jason I understand how important compute is for research. RLHF was just an illustrative example of a component that cannot be replaced with more compute.
My point is, you're taking the Bitter Lesson to be simpler than it is, and extrapolating it too far, to "more compute = best AI".
@Justin_Halford_ @Jason The Bitter Lesson does not enable companies to magically capture market share or have good research insights. I also disagree with the idea that a compute advantage can offset a lack of good research. No amount of compute can substitute for RLHF, for example.