Very thoughtful analysis of the distinction between human intelligence and LLM intelligence. Evokes @karpathy's comparison of “building animals” vs “summoning ghosts”.
Major preprint just out!
We compare how humans and LLMs form judgments across seven epistemological stages.
We highlight seven fault lines, points at which humans and LLMs fundamentally diverge:
The Grounding fault: Humans anchor judgment in perceptual, embodied, and social experience, whereas LLMs begin from text alone, reconstructing meaning indirectly from symbols.
The Parsing fault: Humans parse situations through integrated perceptual and conceptual processes; LLMs perform mechanical tokenization that yields a structurally convenient but semantically thin representation.
The Experience fault: Humans rely on episodic memory, intuitive physics and psychology, and learned concepts; LLMs rely solely on statistical associations encoded in embeddings.
The Motivation fault: Human judgment is guided by emotions, goals, values, and evolutionarily shaped motivations; LLMs have no intrinsic preferences, aims, or affective significance.
The Causality fault: Humans reason using causal models, counterfactuals, and principled evaluation; LLMs integrate textual context without constructing causal explanations, depending instead on surface correlations.
The Metacognitive fault: Humans monitor uncertainty, detect errors, and can suspend judgment; LLMs lack metacognition and must always produce an output, making hallucinations structurally unavoidable.
The Value fault: Human judgments reflect identity, morality, and real-world stakes; LLM "judgments" are probabilistic next-token predictions without intrinsic valuation or accountability.
Despite these fault lines, humans systematically over-believe LLM outputs, because fluent and confident language produces a credibility bias.
We argue that this creates a structural condition, Epistemia:
linguistic plausibility substitutes for epistemic evaluation, producing the feeling of knowing without actually knowing.
To address Epistemia, we propose three complementary strategies: epistemic evaluation, epistemic governance, and epistemic literacy.
Full paper in the first reply.
Joint with @Walter4C & @matjazperc
Huge congratulations to the Neptune team on their acquisition by OpenAI. It’s an incredible milestone and well-deserved.
For any customers affected by the shutdown of their services, wandb is ready to help ensure your experiments and workflows continue without interruption.
If you’re an AI team deploying agents to production, you need RL to post-train smaller, cheaper, more accurate models for your domain.
RL now available without the infra hassle. Christmas came early today.
RL X-mas came early. 🎄
For too long, building powerful AI agents with Reinforcement Learning has been blocked by GPU scarcity and complex infrastructure. That ends today.
Introducing Serverless RL from wandb, powered by @CoreWeave! We're making RL accessible to all.
We are teaming up with @OpenAI to show you, in just two free hours, how to ship production-ready AI agents!
Learn from @ilanbigio and @ash0ts as they break down tool chaining, memory, multi-agent patterns and evals.
Course drops on June 2nd! 👇
Incisive WSJ article on how regulations have crippled Europe’s competitiveness in the global technology industry. As a Brit who left the UK 29 years ago to come to Silicon Valley, it saddens me to see my “mother” continent fall behind.
https://t.co/InRZ6IWstP
If you’re trying to get AI agents working for your enterprise, don’t miss this event, whether you’re an AI developer looking to iterate faster or an exec looking to get inspired.
🚀 Fully Connected 2025 • SF • June 17‑18
Day 1: hands‑on labs with safe LLMs, multi‑agent orchestration & deploy‑today fine‑tuning.
Day 2: AI Pioneer Series w/ @metaai, @GoogleAI, @Adobe, @CoreWeave, @windsurf_ai, @Pinterest, @Snowflake + more.
Get your ticket below!
Today we announced that we are being acquired by @CoreWeave, the AI Hyperscaler. 🪄🐝
We could not be prouder or more excited to join forces with this team.
Our CEO, @l2k, wrote a blog post with more details:
https://t.co/mTLuSlgAyQ
Curious how @cursor_ai became the go-to AI coding assistant, handling 100M+ requests daily and indexing billions of files in real time? 👀
Dive into the latest Gradient Dissent as @l2k sits down with Cursor co-founder @sualehasif996.
Vibe coders won’t want to miss this!
👇
Incredibly proud that @wandb enabled researchers at Berkeley to reproduce DeepSeek R1-Zero
They published the W&B workspace too
https://t.co/oJDnzpR2qi
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works
Through RL, the 3B base LM develops self-verification and search abilities all on its own
You can experience the Aha moment yourself for < $30
Code: https://t.co/UcGKN2SVGj
Here's what we learned 🧵
My o1-based AI programming agent is now state of the art on SWE-Bench Verified! It resolves 64.6% of issues.
This is the first fully o1-driven agent we know of. And we learned a ton building it.
@jeremyphoward you should check out @shawnup's experience in building an o1 AI programming agent to form a more rounded POV on agents https://t.co/PVlAoaXmDB
🪄 Think you’re an AI wizard? Prove it.
We’ve partnered w/ @AIExplainedYT to launch the Simple Bench Evals Competition—a challenge so tough, he said:
“If anyone gets 20/20 with a general-purpose prompt, I would be truly shocked.” 😳
Details below 👇
Excited to be in Jensen Huang’s CES keynote and expand our @nvidia collaboration!
Integrating @weave_wb into the AI Virtual Assistant Blueprint unlocks instant tracing and visibility.
More to come, as we dive into the future of agentic AI tools! 🚀
Our Co-founder and CEO, @l2k, has been named an AWS Gamechanger for pioneering generative AI progress!
Captured by legendary @langestudio, Lukas' portrait and the story of how we partnered with @AWScloud to empower @leonardoai to generate millions of images daily are featured at #AWSreInvent on a 16-foot video wall in Las Vegas this week.
📣 We're excited to announce that @weave_wb is now in GA! Join @l2k in this hands-on broadcast to see how Weave is being used to build, evaluate and improve LLM applications in production with confidence! 👇 https://t.co/w09iy3ptz5
🔥 We've been cooking!
Tune in LIVE tomorrow at 9am PT to hear @l2k unveil exciting news for AI developers straight from the cracked team at @weave_wb!
YT link 👇
We just dropped the latest episode of #GradientDissent with @juliangreensf and @l2k discussing AI’s role in weather forecasting.
They dive into:
📊 The shift from physics-based to AI-driven weather models
🌍 Democratizing forecasting for better climate solutions
🌪️ How AI predicts extreme weather events
🚀 Exciting News!
@wandb and @MicrosoftAzure join forces to streamline fine-tuning models like GPT-4 & GPT-4o on Azure OpenAI Service!
Got a WANDB_API_KEY & using Azure? Now you can keep an eye on the loss curves of your GPT-4o finetune dropping 👀
More details 👇