We've known about LLM test-time compute scaling since @OpenAI o1.
Yet 2 years later labs still report scalar evals for models; safety orgs are still surprised when a scaffold does better via 100x inference; and RSPs still ignore inference budget when deciding critical thresholds.
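A rough sketch of why a single scalar can mislead: if a scaffold can sample a model many times and check the answers, the success rate depends heavily on the inference budget. This assumes independent attempts and a perfect verifier, both simplifications, and `pass_at_k` is a hypothetical helper name, not something from any lab's eval suite.

```python
def pass_at_k(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds.

    p is the single-attempt success rate; independence and a perfect
    verifier are assumptions made for illustration only.
    """
    if not 0.0 <= p <= 1.0 or k < 0:
        raise ValueError("need 0 <= p <= 1 and k >= 0")
    return 1.0 - (1.0 - p) ** k


# The same model, reported at three different inference budgets.
for k in (1, 10, 100):
    print(f"k={k:>3}: {pass_at_k(0.05, k):.3f}")
# k=  1: 0.050
# k= 10: 0.401
# k=100: 0.994
```

Under these assumptions a model that "scores 5%" clears the task almost every time at 100x inference, which is the gap a scalar eval hides.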
AI keeps getting better, but the last time the shape of the jagged frontier changed radically was with o1 & the Reasoners.
A good mental model of the coming months is that models get very good at the things they are already good at (coding), while their weaknesses stay much the same (long fiction).
OpenAI o1-preview and o1-mini are rolling out today in the API for developers on tier 5.
o1-preview has strong reasoning capabilities and broad world knowledge.
o1-mini is faster, 80% cheaper, and competitive with o1-preview at coding tasks.
More in https://t.co/l6VkoUKFla.
Chomsky argued that LLMs learned impossible languages as well as possible ones & thus couldn’t tell us useful things about language.
Nope: “Our core finding is that GPT-2 struggles to learn impossible languages when compared to English as a control, challenging the core claim.”
This is not just true for CS students.
If you want to do research on AI, or figure out how it can be used in your organization, the first step is to talk to the models a lot. Use them for everything you do (within legal & ethical bounds). You don’t know what they can do until you use them.
Speaking from experience, if you want to impress non-AI users with what AI can do (without freaking them out too much), you should make a song on demand with Udio or Suno based on their suggestions or something you know about them. Genuinely delights folks.
🚨 Our new paper shows how instructors can use AI to create research-based experiences that would have been impossible before the advent of generative AI.
We also teach how to build them.
Blog on democratizing innovation: https://t.co/oP1Ct4ZRvQ
Paper: https://t.co/hJU9HA2cAZ
Introducing a series of updates to the Assistants API 🧵
With the new file search tool, you can quickly integrate knowledge retrieval, which now supports up to 10,000 files per assistant. It works with our new vector store objects for automated file parsing, chunking, and embedding.
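For a feel of what the chunking step might look like, here is a minimal sketch of splitting a document into overlapping windows before embedding. It is not the Assistants API's implementation: `chunk_tokens` is a hypothetical helper, whitespace-separated words stand in for real tokens, and the window sizes are illustrative assumptions.

```python
def chunk_tokens(tokens, max_chunk_size=800, overlap=400):
    """Split a token list into overlapping windows.

    The default sizes are assumptions for illustration, not values
    taken from the announcement.
    """
    if not 0 <= overlap < max_chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chunk_size")
    step = max_chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_chunk_size])
        # Stop once a window has reached the end of the document.
        if start + max_chunk_size >= len(tokens):
            break
    return chunks


# Whitespace splitting stands in for a real tokenizer here.
words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, max_chunk_size=4, overlap=2):
    print(" ".join(chunk))
```

Overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.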
It is weird how effective it is to apply human-inspired approaches to problem solving to help LLMs “think” better.
Here, asking the AI to visualize each step in a navigation problem by drawing diagrams greatly improves performance on the problem.
The big education crisis caused by AI is not going to be in schools (there was cheating before AI & we can figure out AI uses that boost learning), but after graduation.
White-collar work is secretly based on an apprenticeship system that will break.
From my book Co-Intelligence