⛷️ Lucas Pinheiro Braathen's last six giant slalom races:
🥈 World Cup Alta Badia 🇮🇹
🥈 World Cup Adelboden 🇨🇭
🥈 World Cup Schladming 🇦🇹
🥇 Olympic Games - Bormio 🇮🇹
🥇 World Cup Kranjska 🇸🇮
🥇 World Cup Lillehammer 🇳🇴
-
🔮 Crystal Globe
I am unreasonably excited about self-driving. It will be the first technology in many decades to visibly terraform outdoor physical spaces and ways of life. Fewer parked cars. Fewer parking lots. Much greater safety for people in and out of cars. Less noise pollution. More space reclaimed for humans. Human brain cycles and attention capital freed up from “lane following” for other pursuits. Cheaper, faster, programmable delivery of physical items and goods. It won’t happen overnight, but there will be the era before and the era after.
Are you ready for web-scale pre-training with RL? 🚀
🔥 New paper: RLP: Reinforcement Learning Pre‑training
We flip the usual recipe for reasoning LLMs: instead of saving RL for post‑training, we bring exploration into pretraining.
Core idea: treat chain‑of‑thought as an action.
Reward it by the information gain it provides for the very next token.
This gives a verifier‑free, dense reward on ordinary text with no task checkers, no labels, no filtering.
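
The reward formula attached to the post did not survive here, so the following is only a minimal sketch of the idea as described. It assumes "information gain" means the increase in log-probability of the observed next token when the model conditions on a sampled chain-of-thought, compared with a baseline prediction made without it. The function names and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math


def information_gain_reward(logp_with_thought: float, logp_baseline: float) -> float:
    """Assumed reward: log-likelihood gain on the observed next token.

    logp_with_thought: log p(next token | context, chain-of-thought)
    logp_baseline:     log p(next token | context) from a no-thought baseline
    """
    return logp_with_thought - logp_baseline


def dense_rewards(logps_with_thought, logps_baseline):
    """One reward per position, so every token in the stream gets credit."""
    if len(logps_with_thought) != len(logps_baseline):
        raise ValueError("both sequences must cover the same positions")
    return [
        information_gain_reward(with_thought, baseline)
        for with_thought, baseline in zip(logps_with_thought, logps_baseline)
    ]


# Toy probabilities (made up) for the true next token at three positions.
p_with_thought = [0.60, 0.20, 0.90]
p_baseline = [0.30, 0.40, 0.90]

rewards = dense_rewards(
    [math.log(p) for p in p_with_thought],
    [math.log(p) for p in p_baseline],
)
for position, reward in enumerate(rewards):
    print(f"position {position}: reward = {reward:+.4f}")
```

In this toy run the thought helps at the first position (positive reward), hurts at the second (negative reward), and makes no difference at the third (zero reward). No verifier or label is involved: the only supervision is the next token already present in the text.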
Why does this matter?
* 🧠 Models think before predicting during pretraining, not just after alignment.
* 📈 Position‑wise credit at every token = stable signal at full web‑scale.
* 🔁 No proxy filters or “easy‑token” heuristics. Trains on the entire stream.
Results:
On the 8‑benchmark math+science suite (AIME’25, MATH‑500, GSM8K, AMC’23, Minerva Math, MMLU, MMLU‑Pro, GPQA):
• Qwen3-1.7B-Base: RLP improves the overall average by 24%!
• Nemotron-Nano-12B-v2-Base: RLP improves the overall average by 43%!
📄Paper: https://t.co/9AmMKvO2xd
✍️Blog: https://t.co/6PNJYfiAoJ
#AI #LLM #ReinforcementLearning #ChainOfThought #Pretraining #RLP
The jump from "agents are nowhere close to working" to "okay, narrow agents for research and coding work pretty well" to (very recently) "general purpose agents are actually useful for a range of tasks" has been quick enough (less than a year) that most people have missed it.
So, effective vibe coding is an hour writing specs, and then you let your agent work while you write another spec for another agent... so it is all specs (or context, as people say). Is this the 90s all over again? Not so vibey, I feel...
You pay for a CC Max plan, and what do you get? "5-hour limit reached ∙ resets 12pm". Do you think I work at Google? Or at some government agency? I work 8 hours minimum, bro. You should too. I don't go 100% Codex because CC is better at fixing parser and finance code.
We've trained an unsupervised language model that can generate coherent paragraphs and perform rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training: https://t.co/sY30aQM7hU