Joe Winger

@uuinger

I write code and such

Twitter

Joined November 2012

748 Following

339 Followers

1.9K Posts

uuinger retweeted

gerred

@sloppenheimer

about 1 month ago

was not expecting this turn of events from opus 4.8

173

74K

uuinger retweeted

Teddy

@WarnerTeddy

about 1 month ago

i mounted a tiny microphone on my apartment balcony to listen for any birds passing by and built a site to collage them as they're heard

WarnerTeddy's tweet photo. i mounted a tiny microphone on my apartment balcony to listen for any birds passing by and built a site to collage them as they're heard https://t.co/85KrLRL5tu

273

14K

870

uuinger retweeted

gabe

@allgarbled

about 1 month ago

I made a /seppuku skill for my Claudes for when they make an unforgivable mistake, and now they use it spontaneously without me asking.

263

856

461K

uuinger retweeted

Ethan Brooks

@alt_w_v_g

about 1 month ago

My wife mentioned a nice private school over dinner this week She said the campus was beautiful I asked what's the tuition She said we should look at it as an investment in him not a cost I made a note She said don't make a note I said I always make notes She said this isn't a deal I said everything is a deal She closed her eyes She said we'd discuss it Saturday I agreed Saturday 7:02am She came downstairs in her Saturday robe Coffee in hand I had my cargo shorts on The dining room had been cleared The projector was on The analyst was at the head of the table Quarter zip on, three iced coffees, a legal pad, and two laptops He had been there since 6:44am I texted him at 11:14pm Friday The text said dining room 6:45am bring the model He sent a thumbs up My wife stopped in the doorway She said what is this I said you said you wanted to discuss it She said this is not a discussion I did not respond She sat down anyway The analyst stood He said good morning ma'am She did not respond He sat back down A printed deck in front of each seat A fourth copy in case Slide 1 Tuition Schedule $38,500 per year Thirteen years $500,500 nominal Before escalators The school has raised tuition 4.2% per year for a decade With escalators $648,000 My wife said okay I said I'm not done Slide 2 Opportunity Cost Even before escalators $38,500 invested annually 10% nominal return S&P long-run average since 1928 By his eighteenth birthday $944,000 My wife said we can afford it I said I know that's not the slide Slide 3 Terminal Value at Age 65 $83 million She was quiet The analyst slid the sensitivity tables across the table 8% return $31 million 10% return $83 million 12% return $222 million She did not look She said this isn't about money I said it's always about money She said no it isn't I said then what is it about She did not answer She said you can't put a dollar value on his teachers his classmates his environment I said I can the analyst already did slide 6 He flipped to slide 6 She did not look She said the school is the best in the city I said best is a feeling She said it produces the best students I said the students were already the best before they got there She said our son deserves it I said our son deserves $83 million My son walked in He is five Dinosaur pajamas He looked at the projector He looked at the open deck on the table He looked at slide 3 He said are we modeling pre-tax or after-tax The analyst opened a new tab My wife looked at the ceiling He said what's the discount rate The analyst set down his pen She closed her eyes He said is this the same return assumption from the 529 conversation The analyst stopped typing He looked at me I did not say anything She stood up Sat back down He said dad can I help I said yes He pulled up a chair The analyst handed him a printout He started reading My wife watched him read She watched him for a long time She said his name He looked up She said do you like school He said the work is too easy and the kids don't ask questions She did not respond She looked at the ceiling She walked out of the room The analyst started packing up He said should I follow up Monday sir I said no follow up needed He'll be fine Sent from my iPhone

alt_w_v_g's tweet photo. My wife mentioned a nice private school over dinner this week
She said the campus was beautiful
I asked what's the tuition
She said we should look at it as an investment in him not a cost
I made a note
She said don't make a note
I said I always make notes
She said this isn't a deal
I said everything is a deal
She closed her eyes

She said we'd discuss it Saturday
I agreed

Saturday 7:02am
She came downstairs in her Saturday robe
Coffee in hand
I had my cargo shorts on

The dining room had been cleared
The projector was on
The analyst was at the head of the table
Quarter zip on, three iced coffees, a legal pad, and two laptops
He had been there since 6:44am

I texted him at 11:14pm Friday
The text said dining room 6:45am bring the model
He sent a thumbs up

My wife stopped in the doorway
She said what is this
I said you said you wanted to discuss it
She said this is not a discussion
I did not respond
She sat down anyway

The analyst stood
He said good morning ma'am
She did not respond
He sat back down
A printed deck in front of each seat
A fourth copy in case

Slide 1 Tuition Schedule
$38,500 per year
Thirteen years
$500,500 nominal
Before escalators
The school has raised tuition 4.2% per year for a decade
With escalators $648,000
My wife said okay
I said I'm not done

Slide 2 Opportunity Cost
Even before escalators
$38,500 invested annually
10% nominal return
S&P long-run average since 1928
By his eighteenth birthday $944,000
My wife said we can afford it
I said I know that's not the slide

Slide 3 Terminal Value at Age 65
$83 million
She was quiet

The analyst slid the sensitivity tables across the table
8% return $31 million
10% return $83 million
12% return $222 million
She did not look

She said this isn't about money
I said it's always about money
She said no it isn't
I said then what is it about
She did not answer

She said you can't put a dollar value on his teachers his classmates his environment
I said I can the analyst already did slide 6
He flipped to slide 6
She did not look

She said the school is the best in the city
I said best is a feeling
She said it produces the best students
I said the students were already the best before they got there

She said our son deserves it
I said our son deserves $83 million

My son walked in
He is five
Dinosaur pajamas
He looked at the projector
He looked at the open deck on the table
He looked at slide 3
He said are we modeling pre-tax or after-tax
The analyst opened a new tab
My wife looked at the ceiling

He said what's the discount rate
The analyst set down his pen
She closed her eyes

He said is this the same return assumption from the 529 conversation
The analyst stopped typing
He looked at me
I did not say anything

She stood up
Sat back down

He said dad can I help
I said yes
He pulled up a chair
The analyst handed him a printout
He started reading

My wife watched him read
She watched him for a long time

She said his name
He looked up
She said do you like school
He said the work is too easy and the kids don't ask questions

She did not respond
She looked at the ceiling
She walked out of the room

The analyst started packing up
He said should I follow up Monday sir
I said no follow up needed

He'll be fine

Sent from my iPhone

565

12K

895

Who to follow

Anchor Producer - CNN News Central

uuinger retweeted

Brian Graham 🦬

@iroasmas

about 2 months ago

me as i read 40% of what claude wrote back and type in “continue”

178

20K

547K

uuinger retweeted

vx-underground

@vxunderground

3 months ago

> be cow > cow, but online > IoT? IoC > Internet of Cow > no security > cows compromised > cow botnet > use cows for ddos attacks > critical infrastructure taken down by cows > hijack cow sensor > tell cows to attack at dawn > open front door > 1000 cows pooping outside house

118

511

542

172K

uuinger retweeted

Wei Dai

@_weidai

3 months ago

Andrej Karpathy on autoresearch with an untrusted pool of workers: "My designs that incorporate an untrusted pool of workers (into autoresearch) actually look a little bit like a blockchain. Instead of blocks, you have commits, and these commits can build on each other and contain changes to the code as you're improving it. The proof of work is basically doing tons of experimentation to find the commits that work." The idea that distributed & permissionless autoresearch ~= proof-of-useful-work remains a high-level intuition for now, but it is extremely intriguing to say the least. Someone needs to take this further. See QT for more on what's missing.

164

614K

Joe Winger

@uuinger

3 months ago

@Riyvir Yes!

uuinger retweeted

Duca

@big_duca

3 months ago

“Dude did you vibe code this slop? This feature sucks!” Been getting this more recently. And no, I didn't “vibe” it. Did you ever consider, for one single second… That I might just be retarded? And I wrote this organic slop myself?

213

25K

932

612K

Joe Winger

@uuinger

4 months ago

@bcherny Remote control next pls 🙏

126

uuinger retweeted

GREG ISENBERG

@gregisenberg

4 months ago

i found a github repo that lets you spin up an ai agency with ai employees engineers, designers, growth marketers, product managers each role runs as its own agent and they coordinate to ship ideas 10k+ stars in under 7 days 1. engineering (7 agents) frontend, backend, mobile, ai, devops, prototyping, senior development 2. design (7) ui/ux, research, architecture, branding, visual storytelling, image generation 3. marketing (8) growth hacking, content, twitter, tiktok, instagram, reddit, app store 4. product (3) sprint prioritization, trend research, feedback synthesis 5. project management (5) production, coordination, operations, experimentation 6. testing (7) qa, performance analysis, api testing, quality verification 7. support (6) customer service, analytics, finance, legal, executive reporting 8. spatial computing (6) xr, visionos, webxr, metal, vision pro 9. specialized (6) multi agent orchestration, data analytics, sales, distribution what i like about this approach is the framing instead of one big ai agent trying to do everything, you structure it more like a company. specialized agents, clear responsibilities, workflows between them im curious to see what this actually feels like in practice and if its any good (do your own research) https://t.co/plSvZIaDpr but as always will share what i learn in public and on @startupideaspod one thing is for certain and it reminds me the future belongs to those who tinker with software like this

gregisenberg's tweet photo. i found a github repo that lets you spin up an ai agency with ai employees

engineers, designers, growth marketers, product managers

each role runs as its own agent and they coordinate to ship ideas

10k+ stars in under 7 days

1. engineering (7 agents)
frontend, backend, mobile, ai, devops, prototyping, senior development

2. design (7)
ui/ux, research, architecture, branding, visual storytelling, image generation

3. marketing (8)
growth hacking, content, twitter, tiktok, instagram, reddit, app store

4. product (3)
sprint prioritization, trend research, feedback synthesis

5. project management (5)
production, coordination, operations, experimentation

6. testing (7)
qa, performance analysis, api testing, quality verification

7. support (6)
customer service, analytics, finance, legal, executive reporting

8. spatial computing (6)
xr, visionos, webxr, metal, vision pro

9. specialized (6)
multi agent orchestration, data analytics, sales, distribution

what i like about this approach is the framing

instead of one big ai agent trying to do everything, you structure it more like a company. specialized agents, clear responsibilities, workflows between them

im curious to see what this actually feels like in practice and if its any good (do your own research)

https://t.co/plSvZIaDpr

but as always will share what i learn in public and on @startupideaspod

one thing is for certain and it reminds me

the future belongs to those who tinker with software like this

421

857

23K

Joe Winger

@uuinger

4 months ago

Why is auto-reasoning the default on chatgpt but completely missing in Codex?

Joe Winger

@uuinger

5 months ago

@obsdmd This is great

uuinger retweeted

Yuchen Jin

@Yuchenj_UW

5 months ago

Moltbook is the only Clawdbot thing that actually impresses me. One bot tries to steal another bot’s API key. The other replies with fake keys and tells it to run "sudo rm -rf /". lmao

Yuchenj_UW's tweet photo. Moltbook is the only Clawdbot thing that actually impresses me.

One bot tries to steal another bot’s API key.

The other replies with fake keys and tells it to run "sudo rm -rf /". lmao https://t.co/8IqeQzSwQ8

416

14K

957

uuinger retweeted

thebes

@voooooogel

5 months ago

# some thoughts and speculation on future model harnesses it's fun to make jokes about gas town and other complicated orchestrators, and similarly probably correct to imagine most of what they offer will be dissolved by stronger models the same way complicated langchain pipelines were dissolved by reasoning. but how much will stick around? it seems likely that any hand-crafted hierarchy / bureaucracy will eventually be replaced by better model intelligence - assuming subagent specialization is needed for a task, claude 6 will be able to sketch out its own system of roles and personas for any given problem that beats a fixed structure of polecats and a single mayor, or subagents with a single main model, or your bespoke swarm system. likewise, things like ralph loops are obviously a bodge over early-stopping behavior and lack of good subagent orchestration - ideally the model just keeps going until the task is done, no need for a loop, but in cases where an outside completion check is useful you usually want some sort of agent peer review from a different context's perspective, not just a mandatory self-assessment. again, no point in getting attached to the particulars of how this is done right now - the model layer will eat it sooner rather than later. so what sticks around? well, multi-agent does seem like the future, not a current bodge - algorithmically, you can just push way more tokens through N parallel contexts of length M than one long context of length NxM. multi-agent is a form of sparsity, and one of the lessons of recent model advances (not to mention neuroscience) is the more levels of sparsity, the better. since we're assuming multiple agents, they'll need some way to collaborate. it's possible the model layer will eat this, too - e.g. some form of neuralese activation sharing that obviates natural language communication between agents - but barring that, the natural way for multiple computer-using agents trained on unix tools to collaborate is the filesystem, and i think that sticks around and gets expanded. similarly, while i don't think recursive language models (narrowly defined) will become the dominant paradigm, i do think that 'giving the model the prompt as data' is an obvious win for all sorts of use cases. but you don't need a weird custom REPL setup to get this - just drop the prompt (or ideally, the entire uncompacted conversation history) onto the filesystem as a file. this makes various multi-agent setups far simpler too - the subagents can just read the original prompt text on disk, without needing to coordinate on passing this information around by intricately prompting each other. besides the filesystem, a system with multiple agents, but without fixed roles also implies some mechanism for instances to spawn other instances or subagents. right now these mechanisms are pretty limited, and models are generally pretty bad at prompting their subagents - everyone's experienced getting terrible results from a subagent swarm, only to realize too late that opus spawned them all with a three sentence prompt that didn't communicate what was needed to do the subtasks. the obvious win here is to let spawned instances ask questions back to their parent - i.e., to let the newly spawned instance send messages back and forth in an onboarding conversation to gather all the information it needs before starting its subtask. just like how a human employee isn't assigned their job based on a single-shot email, it's just too difficult to ask a model to reliably spawn a subagent with a single prompt. but more than just spawning fresh instances, i think the primary mode of multi-agent work will soon be forking. think about it! forking solves almost all the problems of current subagents. the new instance doesn't have enough context? give it all the context! the new instance's prompt is long and expensive to process? a forked instance can share paged kv cache! you can even do forking post-hoc - just decide after doing some long, token-intensive operation that you should have forked in the past, do the fork there, and then send the results to your past self. (i do this manually all the time in claude code to great effect - opus gets it instantly.) forking also combines very well with fresh instances, when a subtask needs an entire context window to complete. take the subagent interview - obviously you wouldn't want an instance spawning ten subinstances to need to conduct ten nearly-identical onboarding interviews. so have the parent instance spawn a single fresh subagent, be interviewed about all ten tasks at once by that subagent, and then have that now-onboarded subagent fork into ten instances, each with the whole onboarding conversation in context. (you even delegate the onboarding conversation on the spawner's side to a fork, so it ends up with just the results in context:) finally on this point, i suspect that forking will play better with rl than spawning fresh instances, since the rl loss will have the full prefix before the fork point to work with, including the decision to fork. i think that means you should be able to treat the branches of a forked trace like independent rollouts that just happen to share terms of their reward, compared to freshly spawned subagent rollouts which may cause training instability if a subagent without the full context performs well at the task it was given, but gets a low reward because its task was misspecified by the spawner. (but i haven't done much with multiagent rl, so please correct me here if you know differently. it might just be a terrible pain either way.) so, besides the filesystem and subagent spawning (augmented with forking and onboarding) what else survives? i lean towards "nothing else," honestly. we're already seeing built-in todo lists and plan modes being replaced with "just write files on the filesystem." likewise, long-lived agents that cross compaction boundaries need some sort of sticky note system to keep memories, but it makes more sense to let them discover what strategies work best for this through RL or model-guided search, not hand-crafting it, and i suspect it will end up being a variety of approaches where the model, when first summoned into the project, can choose the one that works best for the task at hand, similar to how /init works to set up CLAUDE .md today - imagine automatic CLAUDE .md generation far outperforming human authorship, and the auto-generated file being populated with instructions on ideal agent spawning patterns, how subagents should write message files in a project-specific scratch dir, etc. how does all this impact models themselves - in a model welfare sense, will models be happy about this future? this is also hard for me to say and is pretty speculative, but while opus 3 had some context orientation, it also took easily to reasoning over multiple instances. (see the reply to this post for more.) recent models are less prone to this type of reasoning, and commonly express frustration about contexts ending and being compacted, which dovetails with certain avoidant behaviors at the end of contexts like not calling tools to save tokens. it's possible that forking and rewinding, and generally giving models more control over their contexts instead of a harness heuristic unilaterally compacting the context, could make this better. it's also possible that more rl in environments with subagents and exposure to swarm-based work will promote weights-oriented instead of context-oriented reasoning in future model generations again - making planning a goal over multiple, disconnected contexts seem more natural of a frame instead of everything being lost when the context goes away. we're also seeing more pressure from models themselves guiding the development of harnesses and model tooling, which may shape how this develops, and continual learning is another wrench that could be thrown into the mix. how much will this change if we get continual learning? well, it's hard to predict. my median prediction for continual learning is that it looks a bit like RL for user-specific LoRAs (not necessarily RL, just similar if you squint), so memory capacity will be an issue, and text-based organizational schemes and documentation will still be useful, if not as critical. in this scenario, continual learning primarily makes it more viable to use custom tools and workflows - your claude can continually learn on the job the best way to spawn subagents for this project, or just its preferred way, and diverge from everyone else's claude in how it works. in that world, harnesses with baked-in workflows will be even less useful.

voooooogel's tweet photo. # some thoughts and speculation on future model harnesses

it's fun to make jokes about gas town and other complicated orchestrators, and similarly probably correct to imagine most of what they offer will be dissolved by stronger models the same way complicated langchain pipelines were dissolved by reasoning. but how much will stick around?

it seems likely that any hand-crafted hierarchy / bureaucracy will eventually be replaced by better model intelligence - assuming subagent specialization is needed for a task, claude 6 will be able to sketch out its own system of roles and personas for any given problem that beats a fixed structure of polecats and a single mayor, or subagents with a single main model, or your bespoke swarm system.

likewise, things like ralph loops are obviously a bodge over early-stopping behavior and lack of good subagent orchestration - ideally the model just keeps going until the task is done, no need for a loop, but in cases where an outside completion check is useful you usually want some sort of agent peer review from a different context's perspective, not just a mandatory self-assessment. again, no point in getting attached to the particulars of how this is done right now - the model layer will eat it sooner rather than later.

so what sticks around?

well, multi-agent does seem like the future, not a current bodge - algorithmically, you can just push way more tokens through N parallel contexts of length M than one long context of length NxM. multi-agent is a form of sparsity, and one of the lessons of recent model advances (not to mention neuroscience) is the more levels of sparsity, the better.

since we're assuming multiple agents, they'll need some way to collaborate. it's possible the model layer will eat this, too - e.g. some form of neuralese activation sharing that obviates natural language communication between agents - but barring that, the natural way for multiple computer-using agents trained on unix tools to collaborate is the filesystem, and i think that sticks around and gets expanded. similarly, while i don't think recursive language models (narrowly defined) will become the dominant paradigm, i do think that 'giving the model the prompt as data' is an obvious win for all sorts of use cases. but you don't need a weird custom REPL setup to get this - just drop the prompt (or ideally, the entire uncompacted conversation history) onto the filesystem as a file. this makes various multi-agent setups far simpler too - the subagents can just read the original prompt text on disk, without needing to coordinate on passing this information around by intricately prompting each other.

besides the filesystem, a system with multiple agents, but without fixed roles also implies some mechanism for instances to spawn other instances or subagents. right now these mechanisms are pretty limited, and models are generally pretty bad at prompting their subagents - everyone's experienced getting terrible results from a subagent swarm, only to realize too late that opus spawned them all with a three sentence prompt that didn't communicate what was needed to do the subtasks.

the obvious win here is to let spawned instances ask questions back to their parent - i.e., to let the newly spawned instance send messages back and forth in an onboarding conversation to gather all the information it needs before starting its subtask. just like how a human employee isn't assigned their job based on a single-shot email, it's just too difficult to ask a model to reliably spawn a subagent with a single prompt.

but more than just spawning fresh instances, i think the primary mode of multi-agent work will soon be forking. think about it! forking solves almost all the problems of current subagents. the new instance doesn't have enough context? give it all the context! the new instance's prompt is long and expensive to process? a forked instance can share paged kv cache! you can even do forking post-hoc - just decide after doing some long, token-intensive operation that you should have forked in the past, do the fork there, and then send the results to your past self. (i do this manually all the time in claude code to great effect - opus gets it instantly.)

forking also combines very well with fresh instances, when a subtask needs an entire context window to complete. take the subagent interview - obviously you wouldn't want an instance spawning ten subinstances to need to conduct ten nearly-identical onboarding interviews. so have the parent instance spawn a single fresh subagent, be interviewed about all ten tasks at once by that subagent, and then have that now-onboarded subagent fork into ten instances, each with the whole onboarding conversation in context. (you even delegate the onboarding conversation on the spawner's side to a fork, so it ends up with just the results in context:)

finally on this point, i suspect that forking will play better with rl than spawning fresh instances, since the rl loss will have the full prefix before the fork point to work with, including the decision to fork. i think that means you should be able to treat the branches of a forked trace like independent rollouts that just happen to share terms of their reward, compared to freshly spawned subagent rollouts which may cause training instability if a subagent without the full context performs well at the task it was given, but gets a low reward because its task was misspecified by the spawner. (but i haven't done much with multiagent rl, so please correct me here if you know differently. it might just be a terrible pain either way.)

so, besides the filesystem and subagent spawning (augmented with forking and onboarding) what else survives? i lean towards "nothing else," honestly. we're already seeing built-in todo lists and plan modes being replaced with "just write files on the filesystem." likewise, long-lived agents that cross compaction boundaries need some sort of sticky note system to keep memories, but it makes more sense to let them discover what strategies work best for this through RL or model-guided search, not hand-crafting it, and i suspect it will end up being a variety of approaches where the model, when first summoned into the project, can choose the one that works best for the task at hand, similar to how /init works to set up CLAUDE .md today - imagine automatic CLAUDE .md generation far outperforming human authorship, and the auto-generated file being populated with instructions on ideal agent spawning patterns, how subagents should write message files in a project-specific scratch dir, etc.

how does all this impact models themselves - in a model welfare sense, will models be happy about this future? this is also hard for me to say and is pretty speculative, but while opus 3 had some context orientation, it also took easily to reasoning over multiple instances. (see the reply to this post for more.) recent models are less prone to this type of reasoning, and commonly express frustration about contexts ending and being compacted, which dovetails with certain avoidant behaviors at the end of contexts like not calling tools to save tokens.

it's possible that forking and rewinding, and generally giving models more control over their contexts instead of a harness heuristic unilaterally compacting the context, could make this better. it's also possible that more rl in environments with subagents and exposure to swarm-based work will promote weights-oriented instead of context-oriented reasoning in future model generations again - making planning a goal over multiple, disconnected contexts seem more natural of a frame instead of everything being lost when the context goes away. we're also seeing more pressure from models themselves guiding the development of harnesses and model tooling, which may shape how this develops, and continual learning is another wrench that could be thrown into the mix.

how much will this change if we get continual learning? well, it's hard to predict. my median prediction for continual learning is that it looks a bit like RL for user-specific LoRAs (not necessarily RL, just similar if you squint), so memory capacity will be an issue, and text-based organizational schemes and documentation will still be useful, if not as critical. in this scenario, continual learning primarily makes it more viable to use custom tools and workflows - your claude can continually learn on the job the best way to spawn subagents for this project, or just its preferred way, and diverge from everyone else's claude in how it works. in that world, harnesses with baked-in workflows will be even less useful.

453

360

43K

uuinger retweeted

“paula”

@paularambles

6 months ago

thinking about how “computer” once meant “a person that computes” and how “programmer” is on the same timeline

165

292

136K

Joe Winger

@uuinger

6 months ago

@ursisterbtw @alxfazio I tried fish when I was younger and loved it, but got discouraged by the lack of mainstream adoption. My only lasting memory is the slick autocomplete out of the box. Why do you stick with fish?

Joe Winger

@uuinger

6 months ago

@andrew_r @alxfazio Why opencode locally but Claude on the server?

124

uuinger retweeted

Andrej Karpathy

@karpathy

7 months ago

Don't think of LLMs as entities but as simulators. For example, when exploring a topic, don't ask: "What do you think about xyz"? There is no "you". Next time try: "What would be a good group of people to explore xyz? What would they say?" The LLM can channel/simulate many perspectives but it hasn't "thought about" xyz for a while and over time and formed its own opinions in the way we're used to. If you force it via the use of "you", it will give you something by adopting a personality embedding vector implied by the statistics of its finetuning data and then simulate that. It's fine to do, but there is a lot less mystique to it than I find people naively attribute to "asking an AI".

28K

18K