📣 Excited to share my first work @Princeton: Towards a Science of AI Agent Reliability
AI agents keep getting more capable. But are they actually reliable?
Paper: https://t.co/1CvygFLdct
Dashboard: https://t.co/C1EfoMyaS8
🧵
Slides for my lecture "LLM Reasoning" at Stanford CS 25: https://t.co/eApGUHyIDo
Key points:
1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial insight is that transformer models can become nearly arbitrarily powerful by generating many intermediate tokens, without needing to scale the model size (https://t.co/HO2seV6vVl).
2. Pretrained models, even without any fine-tuning, are capable of reasoning. The challenge is that reasoning-based outputs often don't appear at the top of the output distribution, so standard greedy decoding fails to surface them (https://t.co/75h2QQzT9M).
3. Prompting techniques (e.g., chain-of-thought prompting or "let's think step by step") and supervised finetuning were commonly used to elicit reasoning. Now, RL finetuning has emerged as the most powerful method. This trick was independently discovered by several labs. At Google, credit goes to Jonathan Lai on my team. Based on our theory (see point 1), scaling RL should focus on generating long responses rather than on anything else.
4. LLM reasoning can be hugely improved by generating multiple responses and then aggregating them, rather than relying on a single response (https://t.co/BA5MUzg3PR).
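To make point 4 concrete, here is a minimal sketch of the aggregation step, assuming the simplest possible scheme: a majority vote over the final answers of several sampled responses. The names `aggregate_by_majority`, `self_consistency`, and `sample_answer` are hypothetical and not from the slides; `sample_answer` stands in for whatever call samples one response from a model and extracts its final answer.

```python
from collections import Counter
from typing import Callable, List, Tuple


def aggregate_by_majority(answers: List[str]) -> Tuple[str, float]:
    """Return the most frequent final answer and its share of the votes."""
    if not answers:
        raise ValueError("need at least one answer")
    # Normalize whitespace so "18" and "18 " count as the same answer.
    counts = Counter(a.strip() for a in answers)
    best, votes = counts.most_common(1)[0]
    return best, votes / len(answers)


def self_consistency(sample_answer: Callable[[], str], n: int = 8) -> Tuple[str, float]:
    """Draw n independent answers and aggregate them by majority vote.

    `sample_answer` is a placeholder (an assumption of this sketch) for a
    function that samples one response and returns only its final answer.
    """
    return aggregate_by_majority([sample_answer() for _ in range(n)])


if __name__ == "__main__":
    # Four hypothetical final answers from four sampled responses.
    print(aggregate_by_majority(["18", "18 ", "26", "18"]))
```

The vote share doubles as a rough confidence signal: a low share suggests the sampled responses disagree and the answer deserves less trust.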
Will Barta WINS the last stage of @VueltaCV, his first professional victory after a heroic 50 km solo attack from the breakaway! One of my favourite Movistar victories EVER. Huge congrats, @willbarta! #75VCV
Open challenges in LLM research
The first two challenges, hallucinations and context learning, are probably the most talked about today.
I'm the most excited about 3 (multimodality), 5 (new architecture), and 6 (GPU alternatives).
Number 5 and number 6, new architectures and new hardware, are very challenging, but are inevitable with time. Because of the symbiosis between architecture and hardware (new architecture will need to be optimized for common hardware, and hardware will need to support common architecture), they might be solved by the same company.
I referenced a lot of papers here, but I have no doubt that I still missed a ton. If there's something you think I missed, please let me know!
https://t.co/Al2b2Zjqb7
Here's how the first #WEC grid of the season shapes up for tomorrow's race 🤩
#50 - P1
#51 - P4
#FerrariHypercar #Ferrari499P 🇺🇸 #1000MSebring