The bitter lesson in 26 words:
Don’t be distracted by human knowledge, as AI has been historically.
Instead focus on methods for creating knowledge that scale with computation, like search and learning.
By the way - I think a valid (if extreme) take on GPT-2 is "lol you need 10,000x the data, 1 billion parameters, and a supercomputer to get current DL models to generalize to Penn Treebank."
@voooooogel Yeah for sure. I meant specifically the “AI only” social media aspect, where humans are not allowed to directly participate but can observe
SOTA models definitely make it more interesting in lots of new ways
@willccbb@seconds_0 It’s not well documented but you can also use gpt-5-nano/mini with
reasoning_effort: "minimal"
It uses 0 reasoning tokens in all my evals and it’s cheaper + higher throughput vs. 4.1 series
@jxmnop Been working on envs that reward caring about these strange little errors + policies for horizons long enough to develop this heuristic
Yes SWE-benchmaxxing is great and all, but it bakes in so many assumptions that break OOD