@lateinteraction it was my idea :)
Using GEPA is a very natural workflow for creating LLM programs. The iteration speed is very quick, and it easily allows researchers to bias the optimization with some priors (usually derived from just looking at the data).
Thanks a lot for the great tool!
RLMs are so resilient.
Multiple times I've run into bugs in our setups. What's interesting is that those bugs only became apparent after careful trace reviews, because the RLM actually found a way forward despite some broken state.
Truly mind-boggling.
TIL: You can point GEPA at any agent (CLI) to automatically optimize your prompts.
GEPA accepts any `(str) -> str` callable, so it works with your own custom CLI, local models, or API agents. Wrap your agent in a Python function and let it self-optimize.
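A minimal sketch of the "wrap your agent in a Python function" step, assuming the agent reads its prompt on stdin and writes its answer to stdout. `make_cli_agent` and the stand-in command are hypothetical names for illustration; the GEPA call itself is left out, since its exact signature isn't given here.

```python
import subprocess
import sys
from typing import Callable, List


def make_cli_agent(command: List[str], timeout: float = 60.0) -> Callable[[str], str]:
    """Wrap a command-line agent as a (str) -> str callable.

    Assumption: the CLI reads the prompt from stdin and prints its answer to stdout.
    """
    def agent(prompt: str) -> str:
        result = subprocess.run(
            command,
            input=prompt,          # send the prompt on stdin
            capture_output=True,   # collect stdout and stderr
            text=True,             # work with str instead of bytes
            timeout=timeout,
            check=True,            # raise if the CLI exits non-zero
        )
        return result.stdout.strip()

    return agent


# Stand-in "agent" for illustration only: a Python one-liner that uppercases stdin.
# Swap in your own CLI command here.
echo_agent = make_cli_agent(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
)

if __name__ == "__main__":
    print(echo_agent("hello"))
```

The resulting `echo_agent` is a plain `(str) -> str` function, which is the shape described above; anything that expects that shape can call it without knowing there's a subprocess underneath.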
The late-interaction multivector retrieval ecosystem is exploding right now.
To help separate the signal from the noise, we put together an "Awesome Multivector Retrieval" list organizing the top models, engines, libraries, and datasets all in one place 📚 🧵👇
Getting back around to this. OBLIQ is a really interesting benchmark, and feels like the right one for this space.
It's almost gratuitously hard, but seems pretty well-aligned with interesting agent observability problems. Saturation on this set would probably solve a lot of more common real-world use cases along the way.
if you're testing a new retrieval model or long-context LLM, it's a waste of your time (and ours...) to report 0.2% gains on the many saturated and expired benchmarks
if you're in that position and looking for a way to rescue your great new idea, put it to the test on OBLIQ-Bench
This is the initial prompt:
"Write a classical haiku given the provided inputs."
The screenshot shows the new version.
This is how @DSPyOSS adds clarity:
- express intent in logical building blocks
- add your eval criteria + dataset
- GEPA optimization algo
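To make the "eval criteria + dataset" step concrete, here's a rough plain-Python sketch of what a scoring function for the haiku task could look like. Everything in it is an assumption for illustration: the function name, the three-line check, the `max_words` limit, and the sample outputs. It isn't the metric from the screenshot, and it doesn't use the DSPy or GEPA APIs.

```python
from typing import List


def haiku_structure_score(text: str, max_words: int = 8) -> float:
    """Hypothetical eval criterion: score in [0, 1] for haiku-like structure.

    Returns 0.0 unless the text has exactly three non-empty lines; otherwise
    returns the fraction of lines with at most `max_words` words.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != 3:
        return 0.0
    short_lines = sum(1 for ln in lines if len(ln.split()) <= max_words)
    return short_lines / 3


def average_score(outputs: List[str]) -> float:
    """Average the criterion over a small dataset of candidate outputs."""
    if not outputs:
        return 0.0
    return sum(haiku_structure_score(o) for o in outputs) / len(outputs)


# Made-up sample outputs, standing in for a real dataset.
sample_outputs = [
    "old pond\na frog jumps in\nsound of water",
    "this is just one long line and not a haiku at all",
]

if __name__ == "__main__":
    print(average_score(sample_outputs))
```

An optimizer needs a number to push up, so the criterion is written as a function from output text to a score rather than as prose in the prompt.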
what I also love is how it has really become a community effort.
So many different people from the DSPy community are shipping stuff for the official website or package. I love it.
That's how I always envisioned open source.
reading the new getting started tutorial really gives me the same goosebumps I had when I discovered DSPy last year.
This all just makes so much sense. It is the right balance between expressiveness/flexibility and control.
DSPy requires more up front learning than just writing natural language instructions. But once you get it, it makes building, maintaining, and improving AI programs so much easier.
We want to soften this learning curve to make these benefits more accessible, starting with more accessible docs and a focused front page. Check it out!
New DSPy docs dropped finally!!
@dbreunig did an amazing job on redesigning the front page and getting started tutorial.
If you haven't tried DSPy yet, this is an amazing opportunity to learn about it!
Thanks to @dbreunig, we have a new front page and new docs, built for easier onboarding.
We're slowly approaching a major DSPy 4.0 release, based on radical ideas that have been brewing for the past 1.5 years and have now mostly taken shape. Stay tuned!
https://t.co/eTtzzZL5i8
to their credit, teams at Anthropic have been unusually forthcoming, repeatedly acknowledging the influence of RLMs and other open research on their thinking and design choices, e.g.
In case you're curious about why dynamic workflows are so powerful, and why they're the future, read the RLM paper! Opus 4.8 + dynamic workflows in Claude Code is perhaps the first instance of a frontier model seriously trained to be an RLM.
I suspect within a year they'll just become the standard for nearly all coding agent interactions.