@levie This is why we built @redis LangCache
We took everything we know about caching and built the best prompt cache for agents
https://t.co/lu67iCW9Gy
@antirez - does the idea of a model selector in ds4, so that it can be my *one* coding agent, fit your philosophy for ds4-agent?
Alternatively, I could write a new harness to do this and use ds4 under the covers for all local inference.
If so I'll look at submitting a PR
Prediction:
This time next year, many devs will run local models for *many* tasks.
Dwarfstar by @antirez is the way
There are more powerful desktops coming from NVIDIA, MS, and AMD
DS4 should add a router to allow sending some hard tasks to SOTA models
https://t.co/G0LdAPLh2e
(DS4 should let me use SOTA cloud models like OpenAI 5.6 or Opus 4.8.) That would let ds4-agent be my primary agent with no tradeoff. Either an auto router or a model selector would do.
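A minimal sketch of the "auto router or model selector" idea, in Python. Everything here is hypothetical: the `Route` type, the `estimate_difficulty` heuristic, the keyword list, and the threshold are illustrative assumptions, and none of it is DS4's or ds4-agent's actual API. The handlers are stubs standing in for a local model call and a cloud model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Route:
    """A named backend; `handler` is a stand-in for a real model call."""
    name: str
    handler: Callable[[str], str]


def estimate_difficulty(prompt: str) -> float:
    """Toy heuristic (an assumption, not a real classifier):
    long prompts and certain keywords push the score toward 1.0."""
    hard_words = ("refactor", "architecture", "concurrency", "prove")
    score = min(len(prompt) / 2000, 1.0)
    score += 0.5 * sum(word in prompt.lower() for word in hard_words)
    return min(score, 1.0)


def make_router(local: Route, cloud: Route, threshold: float = 0.5):
    """Return a function that picks a backend for each prompt.

    `override` plays the role of the manual model selector;
    with no override, the auto router decides by difficulty score.
    """
    def route(prompt: str, override: Optional[str] = None) -> Route:
        if override == "local":
            return local
        if override == "cloud":
            return cloud
        return cloud if estimate_difficulty(prompt) >= threshold else local

    return route


# Stub backends for illustration only.
local_model = Route("local", lambda p: "local:" + p)
cloud_model = Route("cloud", lambda p: "cloud:" + p)
route = make_router(local_model, cloud_model)
```

With this shape, the selector and the auto router are the same code path: an explicit `override` always wins, and the heuristic only applies when nothing is selected. A real router would replace the keyword heuristic with something measured.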
@nrmehta @jakesaper Personally, I want a quarter inch drill! Then I'm sure I'll find all kinds of holes that I need ;)
More tools in my garage makes me happy
@antirez @derekcollison @ziglang Agreed...
When was the last time you read a book that was transcribed by hand?
When was the last time you rode a horse to work?
Most AI agents are stateless by design. That means they start every session from zero. No memory. No context. No continuity. Big problem. Redis Iris, our context engine for agents, fixes that.
Watch as @SonnySangha breaks down Redis Iris and builds an agent from scratch using the platform. Watch the entire 50-minute masterclass here: https://t.co/lkbYj2SFl3
Learn more about Redis Iris: https://t.co/gAJ9QLbyjK
We are in the final stages of a new product launch by the engineers who brought you Redis.
If all goes well, we will change the industry for extreme-scale (PB), ultra-low-latency data platforms
Designed for the most demanding AI workloads
Buckle up
We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark - the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top
DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task.
The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others.
More below.
@antirez I was using Fable to build and then run an auto-researcher loop for Dwarfstar's ds4-agent, designed to improve the teaching tokens
Is building a better coding agent harness off limits?
Will it be blocked and sent to opus? How do you know?
I find myself telling coding agents on different machines to keep a log of the work they are doing, and to use different skill files for certain processes. It's time to start using Redis arrays with ARGREP I guess, to have it all centralized. I'll share a skill file and a video.
Running Dwarfstar on my MacBook Pro M4 Max with 128GB RAM feels not dissimilar to running Claude (in terms of TPS). Not quite there yet, but the path is clear:
* Local inference
* Control your model
* Open source all the way.
Good news. The DwarfStar story of local inference has now shifted from "you need 128GB" to "you need a decent MacBook" if you can afford going slower. Indeed the usability level at this speed is different, but it is *very* good that this is now a spectrum and not a hard can/can't.