Amit Tandon

Verified account

@AmitTandonAI

Building Rekursor. AI agents that learn from every task. Creator of The Scroll. AI research to product. Cornell.

Joined August 2019

126 Following

14 Followers

81 Posts

Pinned Tweet

about 2 months ago

AI systems don't learn from experience. Ours does. Claude Opus 4.6 had never passed a Terminal-Bench 2.0 task in 5 published trials. We wrapped it with our learning system. Pass in 42 steps. No model changes. No prompt engineering. We then ran the same learning system on GLM 5.1, a completely different LLM. GLM had failed the same task 8 times in a row. Pass in 7 steps. One learning system. Any model. https://t.co/chLqJUvgWB

about 12 hours ago

@sonyatweetybird @harvey @FactoryAI The app layer can become the learning layer. @FactoryAI routes models, Harvey is advancing agent/advisor routing, and at Rekursor we’re working on skill routing: selecting durable, reusable capabilities learned from prior work. Moving up from apps to systems that compound.

about 14 hours ago

Three routing innovations this week. @FactoryAI routes the model: frontier quality, 25% lower cost. @Harvey + @FireworksAI_HQ route the advisor: open model primary; frontier model selectively invoked. Rekursor routes inspectable, customer-owned skills selected at runtime, with a new primitive that shows 100% top-pick correctness across an 18 → 504 skill library while RAG collapses past ~200 candidates. Different layers. Same wave. Routing is becoming the architecture. https://t.co/n9piULQ2XT

2 days ago

Introducing model routing to Factory. Factory Router picks the right model for every task, automatically. Maintain frontier performance while cutting costs by 25%.

1 day ago

@EnoReyes Awesome. We built something conceptually similar for skills routing using a different selection primitive that works way better than standard RAG. Just released in our latest blog yesterday.

1 day ago

@garrytan called the skills resolver bottleneck: once you have a lot of skills, the hard part is picking the right one. That's the problem Rekursor solved with a different selection primitive. Across 18 → 504 skills, Rekursor's resolver picked the skill the run confirmed correct 100% of the time. Standard RAG over skill descriptions: ~11% at 18 skills, ~2% at 50, effectively zero by ~200. A skill library only compounds if the resolver scales. Full post: https://t.co/n9piULQ2XT

2 days ago

The top frontier model passed 7.1% of LAB tasks end-to-end in @harvey benchmark. That is the gap. Post-training can raise the model floor. Long context can improve the run. Rekursor adds a third axis: a learning layer above the model that turns scored work into skills that can revise near-misses to all-pass. New results: held-out transfer (45/49 → 48/49), all-pass revision (49/50 → 50/50), autonomous skill generation, revision without regressions, and routing that holds as the skill library grows. Full post: https://t.co/n9piULQ2XT

14 days ago

@gabepereyra @harvey Thanks Gabe, really appreciate it. We’ll keep running Rekursor across more tasks and practice areas, and share what we find to help make LAB even more useful for evaluating legal agents.

16 days ago

Legal AI agents shouldn't just execute tasks. They should learn from them. Rekursor's first result on @harvey open benchmark, Harvey LAB with their agent setup:
Baseline: 45/48 (fail)
With Rekursor: 48/48 (all-pass)
0 regressions. No fine-tuning. https://t.co/IeltjbOOXn

15 days ago

Rekursor just hit 48/48 on the first task in Harvey LAB benchmark (baseline was 45/48). This is continual learning for legal AI agents: @WeAreLegora @SpellbookLegal https://t.co/QzCpEHMWOr

17 days ago

@JoshLu mean reversion

17 days ago

@lawheroezV2 Optimizing workflows would def be key to change

17 days ago

@chiajy2000 Same here.

17 days ago

@b1rdmania Cool. We're building the learning layer for AI agents to help you optimize past plateaus.

17 days ago

@ArtificialLawya wen

17 days ago

@andrewchen fast inference speed of 1000+ tokens/second is so cool though

17 days ago

Skills are essentially function calling for prompts.

21 days ago

@MaxJunestrand @AnthropicAI Strong post. One piece missing from the stack: continual learning. Vertical platforms + governance get you to production. They don't get the agent past the plateau every loop hits. That's what we just launched.

21 days ago

@scottastevenson Congrats. We just launched a layer that handles the long-tail review patterns university procurement throws at contract AI. Same kind of plateau Harvey wrote about in April.

21 days ago

@winstonweinberg 50% daily usage = the bottleneck Harvey named in April is now a daily event in production. We just launched the fix. Autoresearch loops that hit plateau → we break it.

22 days ago

https://t.co/OZJF1PTwpk

22 days ago

A former Latham associate just released an open-source legal AI tool that he says replicates much of what Harvey ($11B) and Legora ($5.5B) charge enterprise prices for. Built it in two weeks. The story isn't that he did it. It's that the conversation in every AI vendor renewal meeting just changed: from "is this magic?" to "what exactly am I paying for?" Writing something longer on this for tomorrow.
