Subliminal learning is when LLMs transmit traits (e.g. loving cats) through seemingly meaningless data. What’s going on?
We find a simple explanation: it's just steering vector distillation.
We explain which traits transfer and why subliminal learning fails across models.
Recently met @srush_nlp and he started giving me an impromptu lecture on how targeted on-policy self-distillation works.
I asked him if I could record it on my iPhone.
The basic idea is this: if the model made a mistake at some point in the rollout (for example, calling a tool that doesn't exist), we want to discourage this specific error, but we don't want to just learn from the final reward, because it's a very noisy signal spread out over the whole trajectory.
So we have another model read this trajectory and figure where the error was made. It simply inserts some hint tokens to the part of the trajectory right above where the mistake was made.
Now with these injected hint tokens, have the model run a forward pass. You're not having to regenerate a new rollout - aka no new decode required.
The hint causes the model to assign lower probabilities to the error tokens. You then trains the original model to match these new probabilities, teaching it to downweight that specific mistake.
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇
been asking others at Anthropic how they stay in the loop with Claude and fully understand the work being done
this is one of my favorites from Suzanne:
G7 Vision on AI openness opportunities and shared language
学習データが法的・技術的に公開不可能な場合に限り、詳細な「データ情報(データのメタデータや概要)」を提供することでオープンを名乗る事ができる
ってのがオープンソースイニシアティブ定義との大きな違い
https://t.co/TtnTp9RHfH
xiaomi follows deepseek's playbook: mimo-v2.5-pro api now matches deepseek-v4-pro pricing to the cent
benchmarks (mimo vs deepseek):
• gdpval-aa (general agent elo): 1581 vs 1554 ✅
• τ³-bench (tool-use): 72.9 vs 71.8 ✅
• claweval (function calling): 63.8 vs 59.8 ✅
• humanity's last exam (frontier reasoning): 48.0 vs 48.2 🟰
• swe-bench pro (real-world coding): 57.2 vs 55.4 ✅
• swe-bench verified (coding fixes): 78.9 vs 80.6 ❌
• terminal-bench 2.0 (shell tasks): 68.4 vs 67.9 ✅
artificial analysis (mimo vs deepseek):
• intelligence index: 54 vs 50 ✅
• speed (median tok/s): 53 vs 54 🟰
• latency: 3.81s vs 1.86s ❌
same price, near-identical capability, trade-offs only at the margins. the chinese frontier is commodifying – and the price war is just getting started
follow @thehypedotnews for 24/7 ai news, analysis and breakdowns
I wrote a new post on what we need to keep human and what to hand over to AI, with forays into experiments in education, consulting, and the the latest controversy over literary prizes. https://t.co/NqWO8wyVG8
In the era of #ArtificialIntelligence, when human dignity is threatened by new forms of dehumanization, ours is the pressing duty to remain profoundly human. We must lovingly safeguard the grandeur of humanity bestowed upon us and revealed in its fullness in Christ, the splendor of which no machine can ever replace. #MagnificaHumanitas
https://t.co/6i9MWs6LJl
New post: "Generalization Dynamics of LM Pre-training"
Most people (including me) assume that LMs smoothly mature from pattern-matching to generalizing.
This mental model is wrong. The true dynamics are stranger, and far more fascinating!
We call it Mode-Hopping.
Tubesaku(YouTuber/Vtuber向けのツール提供サイト)の過去1ヶ月の流入元
AI AgentやLLMがアクセスしやすい構成にしているんですがWebサイト運営側の立場だとchatGPTやgemini、notebooklmがサーチエンジンに置き換わる未来が想像できない
アプリ経由だとDirectに分類される説もありますがclaudeどこ?
🧵 1/11 Everyone's doing on-policy distillation now (Qwen3, Deepseek V4, GLM-5).
But here's what nobody's asking: at any given token or for a question and a teacher, when does the teacher's guidance actually help, and when does it quietly make things worse?
We found a way to answer this. No training needed!