Pulled the trigger today and switched 100% of Lindy traffic to DeepSeek v4, churning from Anthropic models.
Saves us millions of dollars, and we're actually seeing an *increase* in performance on many core use cases. Transformative for the business.
MiniMax-M3 is now the best open-source model on the AAI
It's ahead of Kimi-K2.6, GLM-5.1 and DeepSeek-V4-Pro
On the Coding Index it's slightly behind V4-Pro and K2.6, but it's beating them on the Agentic Index
CritPt scores are a bit worrisome.
But the biggest highlight here is probably its reasoning efficiency compared to other open models.
(but still miles behind closed models)
microsoft MAI tech report is a gold mine, one of the most transparent for a model at this scale.
this model uses zero synthetic data and no distillation from previous models. this means reasoning, agentic behavior, and tool use are all learned entirely during post-training, with no cold start. it's a bold choice that makes it harder and takes more iterations to reach sota, but you get FULL control over your model series, and it proves they are serious about being a frontier lab.
the tech report is insanely detailed and precise with numbers. to give an example, they give the exact MFU across all the iterations of the model, with the exact changes, etc. they also share the full scaling ladder recipe; to my knowledge, this is the first time i've seen this in a tech report at this scale
let's look at all of this in this likely very long thread 🧵
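for context on what an MFU number measures, here's a minimal sketch of the standard back-of-envelope calculation: achieved model FLOPs per second divided by the cluster's peak FLOPs per second. it assumes the common ~6 × params FLOPs-per-token rule of thumb for a dense transformer (you'd use active params for an MoE) and ignores attention FLOPs. the report's exact accounting may differ, and every number in the example is hypothetical.

```python
def training_flops_per_token(n_params: float) -> float:
    """Approximate training FLOPs per token for a dense transformer.

    Uses the common ~6N rule of thumb (about 2N for the forward pass and
    4N for the backward pass); attention FLOPs are ignored.
    """
    return 6.0 * n_params


def model_flops_utilization(
    tokens_per_second: float,
    n_params: float,
    n_accelerators: int,
    peak_flops_per_accelerator: float,
) -> float:
    """MFU = achieved model FLOPs/s divided by aggregate peak FLOPs/s."""
    achieved = tokens_per_second * training_flops_per_token(n_params)
    peak = n_accelerators * peak_flops_per_accelerator
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical run: 10B params, 1,000 accelerators at 1e15 peak FLOP/s
    # each, processing 5M tokens/s across the whole cluster.
    mfu = model_flops_utilization(5e6, 10e9, 1_000, 1e15)
    print(f"MFU = {mfu:.1%}")  # MFU = 30.0%
```

tracking this number across model iterations is what makes the report useful: any change to parallelism, kernels, or architecture shows up directly as a shift in MFU.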
Introducing the newest Coral board, for efficient, on-device AI!
Check out the demos in the video:
- On-board speech translation
- Natural language controlling hardware
- Vision & sound generating music
Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks.
On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
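In a recent interview, Andrew Dai pointed out directly that at DeepMind and Google Brain, nobody used to care about language models or scale; everyone was playing with their own hobby projects. In the end, it took GPT-3 coming out to shock everyone. Alec Radford spotted the leap in performance that Andrew had triggered by accident. Scale is an engineering problem that involves office politics and credit: getting results requires approval from above, but without results, how do you get it approved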
SpaceX has almost finished writing V1.0 of an in-house AI training stack in C that exact-maps to 220k GB300s with 800G NICs, making heavy use of pipeline parallelism and getting as close to bare metal as possible.
The potential speed improvement vs JAX for large training runs is over an order of magnitude.
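Why pipeline parallelism is hard to get close to bare metal: splitting a model into p stages leaves stages idle while the pipeline fills and drains (the "bubble"). Below is a minimal sketch of a GPipe-style forward schedule, assuming equal time per stage and m microbatches per step. The post doesn't say which schedule SpaceX uses, so this only illustrates the general technique. The idle fraction works out to (p - 1)/(m + p - 1).

```python
from fractions import Fraction
from typing import List, Optional


def gpipe_forward_schedule(
    num_stages: int, num_microbatches: int
) -> List[List[Optional[int]]]:
    """Forward-pass schedule for a GPipe-style pipeline.

    Row t lists, for each stage s, the microbatch that stage runs at time
    step t, or None if the stage is idle. Assumes every stage takes one
    time step per microbatch.
    """
    total_steps = num_microbatches + num_stages - 1
    schedule = []
    for t in range(total_steps):
        row = []
        for s in range(num_stages):
            mb = t - s  # microbatch mb reaches stage s at step s + mb
            row.append(mb if 0 <= mb < num_microbatches else None)
        schedule.append(row)
    return schedule


def bubble_fraction(num_stages: int, num_microbatches: int) -> Fraction:
    """Fraction of stage-time slots spent idle (the pipeline bubble)."""
    schedule = gpipe_forward_schedule(num_stages, num_microbatches)
    idle = sum(1 for row in schedule for mb in row if mb is None)
    return Fraction(idle, len(schedule) * num_stages)


if __name__ == "__main__":
    # Print the schedule: one line per time step, "." marks an idle stage.
    for row in gpipe_forward_schedule(4, 8):
        print(" ".join("." if mb is None else str(mb) for mb in row))
    print(bubble_fraction(4, 8))  # 3/11
```

More microbatches per step shrink the bubble. 1F1B schedules keep the same bubble but cut activation memory, and interleaved schedules shrink the bubble further at the cost of more communication.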
Interesting photonics selloff today on no news?
$LITE down -4.95%
$AAOI down -4.85%
$SIVE down -14.8%
$SOI down -5.73%
$AXTI down -8.13%
$IQE down -12.13%
I think it’s probably the most compelling theme going forward (even more than power semis).
Just tends to be very volatile on the way up.
Surprised about $AAOI, though, given there are apparently some institutional notes (Rosenblatt) about long-term $AMD or $NVDA agreements. Maybe the $600m ATM caps some near-term upside.
Same for $SIVE, given the EU Chips Act 2 is next week with a focus on photonics, and they’re listed on the blueprint. Same with the MSCI/NASDAQ OMX inflow next week.
I’ve personally been adding to positions, since I have high conviction in the photonics theme (CPO especially) given the overall TAM expansion over the next 2 years.
https://t.co/35OFRzhW2b covers the 150k-star "Karpathy-Inspired Claude Code Guidelines". It's made for founders and includes:
/plan-ceo-review on any feature idea
/review on any branch with changes
/qa on your staging URL