DSPy is (really) All You Need
https://t.co/0y4nMyr1cE
@kmad returns to AIE for a special workshop sure to please the @DSPyOSS fans - a comprehensive overview of DSPy! (our second after @lateinteraction's talk with us at AIEWF)
1/ @NoPriorsPod x @LatentSpacePod chat with @SatyaNadella at @Microsoft Build. He has the sharpest mental models of any public company CEO I've interviewed.
$MSFT is at its heart still a tools company! Big focus on agentic coding, harness & AI evals. Takeaways:
*If* this is confirmed, it's fascinating: a team converted Google's quantum algo ZKP into a benchmark and had agents hill-climb against it, eventually exceeding Google's results!
@lateinteraction it was my idea :)
Using GEPA is a very natural workflow for creating LLM programs. The iteration speed is very quick, and it easily allows researchers to bias the optimization with some priors (usually derived from just looking at the data).
Thanks a lot for the great tool!
god i'm so excited to have noah on the team. been trying to get him here for almost a year. his record of innovation at the frontier of algorithms + infra for self-improving ai is honestly insane, and i think his recent work is my favorite yet. idk how he's so chill about it.
@dbreunig @lennypruss @trq212 @CAISconf Whoa, that's an elegant way of putting it.
Have been thinking about exactly this topic (how many tasks today are too under-specified to be useful).
@ThePeshwa @lateinteraction @PrimeIntellect Both, actually. I would switch to the other when I ran out of credits. Opus was nice for the big picture, gpt-5.5 for the execution and diagnostics.
So /goal is awesome
Over the past few weeks I used @PrimeIntellect to train a 149M-parameter late-interaction model based on GTE-ModernColBERT-v1 using PyLate, focused on clause extraction from legal contracts.
On the MLEB benchmark it does well for its size: it's the best accuracy-per-parameter open model on the task, 3rd of 17 open-source models, ahead of Google's EmbeddingGemma (308M, 0.829) and the same-size legal peer Free Law ModernBERT (0.764), behind only Qwen3-Embedding-4B/8B (which are 27–53× larger).
The agents love the prime cli. I only used the UI for paying my bill.
@antoine_chaffin None taken
It was many /goals over time, but… yes, this was codex/cc doing the experimentation. I was just the human asking dumb questions.
RLMs are so resilient.
Multiple times I've run into bugs in our setups. What's interesting is that those bugs only became apparent after careful trace reviews, because the RLM actually found a way forward despite some broken state.
Truly mind-boggling.