Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
Performance Hints
Over the years, my colleague Sanjay Ghemawat and I have done a fair bit of diving into performance tuning of various pieces of code. We wrote an internal Performance Hints document a couple of years ago as a way of identifying some general principles and we've recently published a version of it externally.
We'd love any feedback you might have!
Read the full doc at: https://t.co/jej95g236P
Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably
https://t.co/iN2JtWhn23
Today, Zipline launched in Dallas, our first major metro in the US. We’re seeing the dawn of a new era of robotic instant logistics that is going to become an indispensable part of our lives. After 4 years and hundreds of thousands of test flights, teleportation is finally here.
TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: https://t.co/nZ5JQ19CN6
we are excited to make this a very, very good model!
__
we are planning to release our first open-weight language model since GPT-2.
we’ve been thinking about this for a long time but other priorities took precedence. now it feels important to do.
before release, we will evaluate this model according to our preparedness framework, like we would for any other model. and we will do extra work given that we know this model will be modified post-release.
we still have some decisions to make, so we are hosting developer events to gather feedback and later play with early prototypes. we’ll start in SF in a couple of weeks followed by sessions in europe and APAC. if you are interested in joining, please sign up at the link above.
we’re excited to see what developers build and how large companies and governments use it where they prefer to run a model themselves.
🚨 New paper & dataset! 🚨
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
- Synthesizes 2.8M challenging and diverse questions which require multi-step reasoning, along with reference answers
- Shows steeper data scaling curve for knowledge distillation than existing datasets
- Demonstrates potential for self-training to improve general reasoning capabilities beyond math, code, etc.
Paper link: https://t.co/y17ItAT129
Dataset Link: https://t.co/rAfArxtGgC
🧵(1/5)
Introducing the Model Context Protocol (MCP)
An open standard we've been working on at Anthropic that solves a core challenge with LLM apps - connecting them to your data.
No more building custom integrations for every data source. MCP provides one protocol to connect them all:
Today(), @stripe is launching an SDK built for AI agents:
- LLMs can call payment, billing, issuing, etc APIs
- Integrates with @vercel, @langchain, @crewAIInc
- Use any model via functions (w/ per token billing)
Excited to see what you, and your bots, build: https://t.co/3GiXmNm6en
Today we’re excited to introduce Integuru — the first AI agent to build low-latency integrations with platforms lacking official APIs 🤝
This open-source agent can build an integration in minutes through reverse-engineering internal APIs.
Check out the details in this thread.
On the topic of overregulation of space launches: US Customs actually required the astronauts returning from the moon to file a customs declaration form for their moon rocks