🔥 Meet Mistral Small 4: One model to do it all.
⚡ 128 experts, 119B total parameters, 256k context window
⚡ Configurable Reasoning
⚡ Apache 2.0
⚡ 40% faster, 3x more throughput
Our first model to unify the capabilities of our flagship models into a single, versatile model.
📢 Current world models aren't really modeling the world; they're modeling one agent's view of it. Partial observations ≠ world state.
Future world models will be independent of any one agent's perspective. You will be able to “drop in” any number of agents at any point in time, and a persistent world state will evolve with their interactions. Imagine a neural MMORPG server. 🧵[1/10]
What's the difference in cost between running a bunch of agents locally to generate value and just mining bitcoin?
If knowledge work becomes incredibly cheap, then at what point does just mining crypto become a better proposition?
Mistral has one of the best science teams in the world
The untold story: insane eng + infra team led by co-founder Timothée Lacroix, shipping frontier models at eye-watering efficiency
Now investing >€1bn to take this directly to customers
@jparkerholder I'm always too early. I was also building a competition platform for AIs back in 2018. Today we have things like https://t.co/kQk7UoqFYU
@join_ef didn't want to invest in me ;)
I'm not bitter at all.
i built the 1st "world model game engine"
it makes games from scratch, all running locally on my Mac M1
- 30fps
- 40ms latency
- 50 million parameters
- trained on 15 min of footage
made using only "synthetic" datasets
this is the first of its kind, & you can try it now
Robotics is the next big milestone in AI. We're still working out how to train, scale, and evaluate these kinds of systems. A whole new world of challenges!
Come work with us.
Very excited to release two new open-weight models, Devstral 2 (123B) and Devstral Small 2 (24B), along with Mistral Vibe, a CLI built for Devstral that enables end-to-end code automation!
I especially want to thank @MistralAI for releasing the base models for Mistral 3. Fewer companies are sharing base models, and this opens up many use cases, from custom instruct to non-instruct ones.