This 2 hour Stanford lecture will teach you more about how LLMs like ChatGPT & Claude are built than most people working at top AI companies learn in their entire careers.
Bookmark this & give 2 hours today, no matter what. It'll be the most productive thing you do this week.
🚨BREAKING: Claude is insane for market research.
I reverse-engineered how top consultants at McKinsey, Goldman Sachs, & JP Morgan use it.
The difference is night and day.
Here are 12 insane Claude Opus 4.6 prompts they don't want you to know (Save for later)
Sparrows discussing how prickly it is. One on top is Trump and the other one is... wait for it.... the rest of the world 😂 #trump#davos#thorns#prickly#geopolitics
Stanford researchers built a new prompting technique!
By adding ~20 words to a prompt, it:
- boosts LLM's creativity by 1.6-2x
- raises human-rated diversity by 25.7%
- beats fine-tuned model without any retraining
- restores 66.8% of LLM's lost creativity after alignment
Let's understand why and how it works:
Post-training alignment methods like RLHF make LLMs helpful and safe, but they unintentionally cause mode collapse. This is where the model favors a narrow set of predictable responses.
This happens because of typicality bias in human preference data:
When annotators rate LLM responses, they naturally prefer answers that are familiar, easy to read, and predictable. The reward model then learns to boost these "safe" responses, aggressively sharpening the probability distribution and killing creative output.
But here's the interesting part:
The diverse, creative model isn't gone. After alignment, the LLM still has two personalities. The original pre-trained model with rich possibilities, and the safety-focused aligned model.
Verbalized Sampling (VS) is a training-free prompting strategy that recovers the diverse distribution learned during pre-training.
The idea is simple:
Instead of prompting "Tell me a joke" (which triggers the aligned personality), you prompt: "Generate 5 responses with their corresponding probabilities. Tell me a joke."
By asking for a distribution instead of a single instance, you force the model to tap into its full pre-trained knowledge rather than defaulting to the most reinforced answer.
Results show verbalized sampling enhances diversity by 1.6-2.1x over direct prompting while maintaining or improving quality.
Variants like VS-based Chain-of-Thought and VS-based Multi push diversity even further.
You can find the paper link in the next tweet.
👉 Over to you: What other methods can be used to improve LLM diversity?
Banger paper for agent builders.
Multi-agent systems often underdeliver. The problem isn't how the agents themselves are built. It's how they're organized.
They are mostly built with fixed chains, trees, and graphs that can't adapt as tasks evolve.
But what if the system could learn its own coordination patterns?
This new research introduces Puppeteer, a framework that learns to orchestrate agents dynamically rather than relying on handcrafted topologies.
Instead of pre-defining collaboration structures, an orchestrator selects which agent speaks next based on the evolving conversation state. The policy is trained with REINFORCE, optimizing directly for task success.
Rather than searching over complex graph topologies, they serialize everything into sequential agent selections. This reframing sidesteps combinatorial complexity.
What emerges is surprising: compact cyclic patterns develop naturally. Not sprawling graphs, but tight loops where 2-3 agents handle most of the work.
The remarkable part is that the system discovers efficiency on its own.
Results:
- On GSM-Hard math problems: 70% accuracy (up from 13.5% for the base model alone).
- On MMLU-Pro: 83% (vs 76% baseline).
- On SRDD software development: 76.4% (vs 60.6% baseline).
These gains come with reduced token consumption. The paper shows that token costs consistently decrease throughout training while performance improves.
They also prove the agent selection process satisfies Markov properties, meaning the current state alone determines the optimal next agent. No need to track full history.
Why it matters for AI devs: learned simplicity beats engineered complexity. A trained router with a handful of specialized agents can outperform elaborate handcrafted workflows while cutting computational overhead.