amitunix @amitunix - Twitter Profile

amitunix retweeted

2 days ago

An absolute must-read if you're building an agent! That's how Shortcut is built, and I would certainly apply this design to many other kinds of agents. ✨

0

537

19

2K

190K

amitunix retweeted

Codez

@0xCodez

2 days ago

Anthropic Managed Agents team: "Fable 5 is our best model for running self-improving agent systems. Add /loops, dynamic workflows, dreaming and you are unstoppable." In 13 minutes, the Anthropic team shows how to build self-improving agent systems with Fable 5 from scratch. Worth more than a $500 agent-building course. Live from the last Anthropic stage in Japan. Unpublished.

40

2K

255

5K

453K

amitunix retweeted

Lian Lim | Dashboard & AI Automation Expert

@dashboardlim

3 days ago

https://t.co/jdHH8d2vDr

1

125

29

184

11K

amitunix retweeted

Andrej Karpathy

@karpathy

4 days ago

This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on a tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

1K

25K

2K

6K

3M

amitunix retweeted

3 days ago

Lots of people asked how I used Fable to edit its own launch video so I made a video about that! TLDR it wrote a lot of code & tool calls to use transcription services, ffmpeg, do colorgrading, use the figma mcp, make remotion UI and render it. I didn't touch a video editor.

291

9K

615

13K

951K

amitunix retweeted

Nuno Campos

@nfcampos

3 days ago

https://t.co/GrSOl3uMb2

4

145

17

240

17K

amitunix retweeted

The Startup Ideas Podcast (SIP) 🧃

@startupideaspod

4 days ago

https://t.co/NzFHA9Svcl

4

126

15

200

8K

amitunix retweeted

J.B.

@VibeMarketer_

5 days ago

WTF is a loop visualized

22

1K

124

2K

199K

amitunix retweeted

rody

@0x_rody

6 days ago

Anthropic's main manager: "Nobody types prompts from scratch. The commands should be live in the project." In 26 minutes, she walks through how Anthropic runs Claude Code, including the command library every new dev inherits on day one. Watch the full talk, then save the config below👇

25

1K

120

3K

261K

amitunix retweeted

Shann³

@shannholmberg

5 days ago

what is agent looping

for the last two years we prompted agents one task at a time. that is starting to change

instead of asking an agent to build the landing page and then driving every step yourself, you set up a loop that handles discovery, planning, the work, checking, and iterating until the goal is met

looping is a setup you build. almost any agent harness can run it, it just depends on how you wire it up

at its simplest, looping is one agent working on itself:

> researches
> drafts
> checks the draft against a goal
> fixes what is weak
> runs that cycle again until the work clears the requirements

you are not prompting each step anymore. the agent repeats the cycle for you
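A minimal sketch of that single-agent cycle, assuming the four stages are plain functions you supply. The names `run_loop`, `research`, `draft`, `check`, `fix`, and `max_rounds` are hypothetical, and the toy stand-ins below only shuffle strings where a real setup would call a model.

```python
from typing import Callable, List, Tuple

REQUIRED = ["headline", "pricing", "signup"]  # toy requirements for the goal

def run_loop(goal: str,
             research: Callable[[str], List[str]],
             draft: Callable[[str, List[str]], str],
             check: Callable[[str, str], List[str]],
             fix: Callable[[str, List[str]], str],
             max_rounds: int = 5) -> Tuple[str, bool]:
    """Research, draft, then check/fix until the work clears or rounds run out."""
    work = draft(goal, research(goal))
    for _ in range(max_rounds):
        issues = check(work, goal)       # weaknesses measured against the goal
        if not issues:
            return work, True            # work clears the requirements
        work = fix(work, issues)         # repair, then go around again
    return work, not check(work, goal)   # final verdict after the last fix

# Toy stand-ins (assumptions, not a real agent)
def toy_research(goal: str) -> List[str]:
    return ["headline"]

def toy_draft(goal: str, notes: List[str]) -> str:
    return " ".join(notes)

def toy_check(work: str, goal: str) -> List[str]:
    return [item for item in REQUIRED if item not in work]

def toy_fix(work: str, issues: List[str]) -> str:
    return work + " " + issues[0]        # fix one weakness per pass

result = run_loop("landing page", toy_research, toy_draft, toy_check, toy_fix)
```

The `max_rounds` cap is a design choice for the sketch: without it, a check that can never pass would loop forever.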

the bigger version is a fleet looping. you give an orchestrator agent a goal, it breaks the goal into pieces, hands each piece to a specialist agent, and those specialists hand smaller jobs to their own subagents

the whole tree keeps looping through discovery, planning, execution, and verification until the goal is met

one agent looping is like a person redoing their own draft. a fleet looping is a whole team running a project end-to-end
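The delegation tree can be sketched as a small recursive structure. This only shows how a goal fans out from orchestrator to specialists to subagents. The `Agent` class, the numbered task split, and the agent names are all hypothetical, and each leaf just reports what it was handed where a real subagent would do the work.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Agent:
    name: str
    children: List["Agent"] = field(default_factory=list)

    def run(self, task: str) -> List[str]:
        """Delegate to children if there are any, otherwise do the work."""
        if not self.children:
            return [f"{self.name} did '{task}'"]   # leaf does the narrow work
        results: List[str] = []
        for i, child in enumerate(self.children, start=1):
            # Hypothetical split: one numbered piece of the task per child
            results += child.run(f"{task}.{i}")
        return results

fleet = Agent("orchestrator", [
    Agent("copy specialist", [Agent("headline subagent"), Agent("body subagent")]),
    Agent("design specialist"),
])
results = fleet.run("landing page")
```

In a looping fleet, each node would also verify what its children return and send weak pieces back down, which this sketch leaves out.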

you create a goal, and the system runs the loop until it finishes within the reqs you set

open and closed looping:

OPEN LOOPING is exploratory. it still has conditions and a goal, but you give the agent or the fleet a wide space to move in. it can try different paths, discover things, build something you did not fully spec out

this is the exciting end, it is what Peter and others are doing, and tbh it is where I want to spend more time

the catch is cost, an open loop with real room to explore burns an insane amount of tokens. for the 90 percent of people without an unlimited budget it is not runnable yet, and pointed at projects with a loose standard it turns into a slop machine

CLOSED LOOPING is bounded. a human designs the end-to-end path first:

> clear goal
> defined steps
> an eval at each step
> a point where it stops or hands back to you (and feeds back performance data)

the agents still loop, but inside the framework you built. it gets better every run because each pass feeds the next, and it runs on a normal budget because the path is tight.

for most marketing work, closed is the one that pays off today.

> the orchestrator owns the goal
> the specialists own the steps
> the subagents do the narrow work
> an eval gate makes sure it's not slop
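A rough sketch of a closed loop under those constraints: fixed steps, an eval gate after each one, and a hand-back with performance data when a gate never passes. `Step`, `closed_loop`, and `max_attempts` are hypothetical names, and the lambdas stand in for real agent calls and real evals.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Step:
    name: str
    run: Callable[[str, int], str]   # (input, attempt number) -> output
    passes: Callable[[str], bool]    # eval gate for this step

Report = List[Tuple[str, Optional[int]]]

def closed_loop(goal: str, steps: List[Step],
                max_attempts: int = 3) -> Tuple[str, str, Report]:
    """Run steps in order; retry each until its gate passes or attempts run out."""
    data = goal
    report: Report = []              # performance data handed back to the human
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            out = step.run(data, attempt)
            if step.passes(out):
                report.append((step.name, attempt))
                data = out
                break
        else:
            # Gate never passed: stop and hand back the last good output
            report.append((step.name, None))
            return "handed back", data, report
    return "done", data, report

steps = [
    Step("outline", lambda x, n: x + " > outline", lambda out: True),
    # Toy gate: the draft only clears on its second attempt
    Step("draft", lambda x, n: x + f" > draft v{n}", lambda out: out.endswith("v2")),
]
outcome = closed_loop("launch email", steps)
```

The per-step attempt cap is what keeps the token budget bounded, and the report gives the human something concrete to tune between runs.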

200

6K

695

10K

737K

amitunix retweeted

Viv

@Vtrivedy10

5 days ago

/goal as a service?? i'm picturing /goal orchestrating other /goal with good verification - roughly like this

25

213

20

221

27K

amitunix retweeted

Anderson Cooper 360°

@AC360

7 days ago

Social psychologist Jonathan Haidt, author of "The Anxious Generation," shares why he's encouraging more techno-skepticism and warns of the detrimental impact technology can have on kids.

29

362

91

114

93K

amitunix retweeted

Khairallah AL-Awady

@eng_khairallah1

7 days ago

Anthropic engineer: "You're not supposed to prompt Claude. You're supposed to build a system that prompts itself."

this is one of the best workflows I've seen in a long time

in this video he breaks down exactly how most people are using Claude:

- the 14% you lose to CLAUDE.md before typing a word
- the plugins that 95% of users have never installed
- the caching setup that keeps it at 95% hit rate and almost free
- why starting every chat from zero is the slowest way to use Claude

if you've been using Claude for more than a month and never left the chat window, you've been using one project when you could be running a team of them

instead of another show tonight, watch this

make sure to bookmark it before it gets lost in your feed

full guide in the article below

110

7K

834

20K

1M

amitunix retweeted

rody

@0x_rody

7 days ago

original link: https://t.co/yS1wsisVOa

0

14

1

17

3K

amitunix retweeted

Viv

@Vtrivedy10

7 days ago

imo there’s a pretty solid default recipe that everyone should use to optimize a system of Agent = Model + Harness

you should “train” both

1. Build v1 agent using a sensible base harness and some task specific prompting + tools
2. Harness Engineering using eval tasks that roughly match prod

this is often enough - most companies can get acceptable perf doing this. then they collect traces, mine them for patterns, and make slight tweaks from there

3. SFT using data collected from traces or synthetic data. Often a good candidate for “distillation tasks” to train a cheaper model while maintaining existing performance
4. RL if you have the bandwidth and ability and desire to create environments and design rewards that represent the tasks you want your agent to be good at. Move beyond the SFT behavior of “copying” data from an existing model to pushing past it in some dimension
5. Light harness engineering again to squeeze any more juice (ex: slight prompting) using the trained model that’s better at your task distribution

this loop will largely be productized as a general purpose recipe for building and improving agents

we’re still in the earliest innings of the world’s companies getting comfortable with steps 1-2 of this loop. Harness engineering will probably be the dominant way ppl will optimize agents but i expect a large number of companies to onboard through this entire loop on some trial project of interest in the next year
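Step 2 of the recipe (harness engineering against eval tasks) can be sketched as scoring harness variants on a fixed task set and keeping the best one. Everything here is an assumption for illustration: `score`, `pick_best`, the exact-match grading, and the two toy harnesses, which are plain string functions standing in for model + harness.

```python
from typing import Callable, Dict, List, Tuple

EvalTask = Tuple[str, str]            # (input, expected output)
AgentFn = Callable[[str], str]        # model + harness, seen as one function

def score(agent: AgentFn, tasks: List[EvalTask]) -> float:
    """Fraction of eval tasks the agent answers exactly right."""
    passed = sum(1 for prompt, expected in tasks if agent(prompt) == expected)
    return passed / len(tasks)

def pick_best(variants: Dict[str, AgentFn],
              tasks: List[EvalTask]) -> Tuple[str, Dict[str, float]]:
    """Score every harness variant and return the top name with all scores."""
    scores = {name: score(agent, tasks) for name, agent in variants.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy eval set: the "task" is to normalize a string to trimmed upper case
tasks = [("hi", "HI"), (" ok ", "OK"), ("Yes", "YES")]

variants = {
    "v1": lambda p: p.upper(),          # forgets to trim whitespace
    "v2": lambda p: p.strip().upper(),  # tweaked harness handles it
}
best, scores = pick_best(variants, tasks)
```

In practice the eval tasks would come from traces that roughly match prod, and grading would rarely be exact match, but the compare-and-keep shape stays the same.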

25

394

45

677

57K

amitunix retweeted

Shubham Saboo

@Saboo_Shubham_

9 days ago

All UI will be AI. AI Agents need generative UI to EXPRESS not another paragraph of text.

19

286

36

464

78K

amitunix retweeted

cat

@_catwu

10 days ago

Excited to share how Anthropic's data team has automated 95% of business analytics queries with Claude. Blog post covers how we approach evals, ablations, and online validation!

59

3K

120

3K

878K

amitunix retweeted

ClaudeDevs

@ClaudeDevs

11 days ago

How do you get Claude Code to check its own work before handing it back? Watch how you can encode your manual checks so Claude closes its own feedback loop: