I've been circling a similar problem: what durable learning looks like with LLMs in the loop, and whether capability actually transfers to the learner. Your /teach skill and the Fable release prompted me to revisit this: I kept your mission/workspace skeleton, replaced learning records with a probe-graded capability ledger scheduled by FSRS, and added a log of every time the learner reaches for the answer before attempting. Here's what I ended up building: https://t.co/OVllYKuvAM - would appreciate your feedback!
It's 7 markdown files and one Python script, portable to any agent that reads skills. I'm now using it to learn about post-trade settlement - would love your feedback if you give it a go. https://t.co/pG1Mwq7QlM
I wrote about durable learning with a model in the loop: once a model sits inside the loop, output stops being a signal of understanding, and what actually matters is whether the capability transfers to the learner. I wanted to test @claudeai Fable, so I tried turning that into a skill.
The really important part is agency telemetry. The skill always answers when you ask, but it logs each time you reach for the answer before attempting, so it's framed like a training log rather than a conscience. Over weeks, your independence compounds alongside your capability.
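To make the agency telemetry idea concrete, here's a minimal sketch of what such a log could look like. This is not the skill's actual script: the `AgencyLog` class, its field names, and the `independence_rate` metric are all hypothetical, chosen only to illustrate "always answer, but record whether an attempt came first".

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgencyLog:
    """Hypothetical training-log style record of answer requests."""
    events: list = field(default_factory=list)

    def record(self, topic: str, attempted_first: bool) -> None:
        # The answer is never withheld; we only note whether the
        # learner tried before asking.
        self.events.append({
            "topic": topic,
            "attempted_first": attempted_first,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def independence_rate(self):
        # Share of requests where an attempt came first (None if empty).
        if not self.events:
            return None
        attempted = sum(1 for e in self.events if e["attempted_first"])
        return attempted / len(self.events)


log = AgencyLog()
log.record("delivery-versus-payment", attempted_first=True)
log.record("netting", attempted_first=False)
print(log.independence_rate())
```

Tracking a rate rather than blocking the answer keeps the log descriptive: it shows the trend over weeks without gating anything.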
Hey @claudeai: feature suggestion - user profiles with on/off memory per profile. Currently memory is account-wide, so either everything gets pooled together, or there's no memory across your chats. Projects and Incognito get close, but do not actually close the gap.
I'm 10 months post-ACL surgery - so the final stage of recovery (thankfully!), working on pivoting for tennis and padel. This needs un-cued training, because the hard part of those sports is reacting to things you didn't plan for. Cued, predictable drills do not help with this.
Most of the work was getting the spec right - what counts as truly un-cued, where the random draws need to happen, what the active screen should look like from 3 meters away. Built it in 10 mins with @claudeai
Published the spec for Prism's HTTP API and JSON log format in a SPEC.md at the project root, so anyone can build an SDK in any language against the same UI server. Python is today's official SDK; the TypeScript port is next on the list. Interested to hear feedback on the spec, especially from anyone wanting to take a swing at a community SDK before I get to TS. https://t.co/9gJTeuAwZ0
I audited the 9 running versions of my @polymarket weather trading bot. A 285:1 longshot that resolved yes showed up in every version, skewing the PnL by 75%. A trade that every version catches cannot rank them - the comparison was more informative where the versions disagreed.
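A rough sketch of that idea, assuming each version's trades are available as a mapping from trade ID to realised PnL: drop the trades that every version took, then compare what's left. The function name, the data layout, and the numbers below are illustrative assumptions, not the bot's real audit code or results.

```python
def disagreement_pnl(trades_by_version):
    """Sum PnL per version over trades NOT shared by every version.

    trades_by_version: dict of version name -> dict of trade_id -> pnl
    (hypothetical layout for illustration).
    """
    if not trades_by_version:
        return {}
    # Trades present in every version carry no ranking information.
    shared = set.intersection(
        *(set(trades) for trades in trades_by_version.values())
    )
    return {
        version: sum(pnl for tid, pnl in trades.items() if tid not in shared)
        for version, trades in trades_by_version.items()
    }


# Made-up example: both versions caught the same longshot.
example = {
    "v1": {"longshot": 285.0, "trade_a": 2.0},
    "v2": {"longshot": 285.0, "trade_b": -1.0},
}
print(disagreement_pnl(example))
```

With the shared longshot removed, the ranking rests only on the trades where the versions made different calls.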
The harder problem is measuring transfer itself, not just retrieval. Agents could change this - a model that can probe what a learner actually understands, not just what they produce, might get closer to measuring transfer than anything we have now.
I've been thinking about what durable learning looks like when a model can answer almost anything. Not what it means to learn faster - what it means to actually know something when the answer is always a query away.
With @BrainbankSpace I've tried to design for this: AI generates & explains, the spaced repetition loop forces internalisation. The capability should live in the user, not in the tool. This handles the retrieval side, but it does not solve or measure the transfer of capability.