My wife mentioned a nice private school over dinner this week
She said the campus was beautiful
I asked what's the tuition
She said we should look at it as an investment in him not a cost
I made a note
She said don't make a note
I said I always make notes
She said this isn't a deal
I said everything is a deal
She closed her eyes
She said we'd discuss it Saturday
I agreed
Saturday 7:02am
She came downstairs in her Saturday robe
Coffee in hand
I had my cargo shorts on
The dining room had been cleared
The projector was on
The analyst was at the head of the table
Quarter zip on, three iced coffees, a legal pad, and two laptops
He had been there since 6:44am
I texted him at 11:14pm Friday
The text said dining room 6:45am bring the model
He sent a thumbs up
My wife stopped in the doorway
She said what is this
I said you said you wanted to discuss it
She said this is not a discussion
I did not respond
She sat down anyway
The analyst stood
He said good morning ma'am
She did not respond
He sat back down
A printed deck in front of each seat
A fourth copy in case
Slide 1 Tuition Schedule
$38,500 per year
Thirteen years
$500,500 nominal
Before escalators
The school has raised tuition 4.2% per year for a decade
With escalators $648,000
My wife said okay
I said I'm not done
Slide 2 Opportunity Cost
Even before escalators
$38,500 invested annually
10% nominal return
S&P long-run average since 1928
By his eighteenth birthday $944,000
My wife said we can afford it
I said I know that's not the slide
Slide 3 Terminal Value at Age 65
$83 million
She was quiet
The analyst slid the sensitivity tables across the table
8% return $31 million
10% return $83 million
12% return $222 million
She did not look
She said this isn't about money
I said it's always about money
She said no it isn't
I said then what is it about
She did not answer
She said you can't put a dollar value on his teachers his classmates his environment
I said I can the analyst already did slide 6
He flipped to slide 6
She did not look
She said the school is the best in the city
I said best is a feeling
She said it produces the best students
I said the students were already the best before they got there
She said our son deserves it
I said our son deserves $83 million
My son walked in
He is five
Dinosaur pajamas
He looked at the projector
He looked at the open deck on the table
He looked at slide 3
He said are we modeling pre-tax or after-tax
The analyst opened a new tab
My wife looked at the ceiling
He said what's the discount rate
The analyst set down his pen
She closed her eyes
He said is this the same return assumption from the 529 conversation
The analyst stopped typing
He looked at me
I did not say anything
She stood up
Sat back down
He said dad can I help
I said yes
He pulled up a chair
The analyst handed him a printout
He started reading
My wife watched him read
She watched him for a long time
She said his name
He looked up
She said do you like school
He said the work is too easy and the kids don't ask questions
She did not respond
She looked at the ceiling
She walked out of the room
The analyst started packing up
He said should I follow up Monday sir
I said no follow up needed
He'll be fine
Sent from my iPhone
> be cow
> cow, but online
> IoT? IoC
> Internet of Cow
> no security
> cows compromised
> cow botnet
> use cows for ddos attacks
> critical infrastructure taken down by cows
> hijack cow sensor
> tell cows to attack at dawn
> open front door
> 1000 cows pooping outside house
Andrej Karpathy on autoresearch with an untrusted pool of workers:
"My designs that incorporate an untrusted pool of workers (into autoresearch) actually look a little bit like a blockchain.
Instead of blocks, you have commits, and these commits can build on each other and contain changes to the code as you're improving it.
The proof of work is basically doing tons of experimentation to find the commits that work."
The idea that distributed & permissionless autoresearch ~= proof-of-useful-work remains a high-level intuition for now, but it is extremely intriguing to say the least.
Someone needs to take this further. See QT for more on what's missing.
“Dude did you vibe code this slop? This feature sucks!”
Been getting this more recently.
And no, I didn't “vibe” it.
Did you ever consider, for one single second…
That I might just be retarded?
And I wrote this organic slop myself?
i found a github repo that lets you spin up an ai agency with ai employees
engineers, designers, growth marketers, product managers
each role runs as its own agent and they coordinate to ship ideas
10k+ stars in under 7 days
1. engineering (7 agents)
frontend, backend, mobile, ai, devops, prototyping, senior development
2. design (7)
ui/ux, research, architecture, branding, visual storytelling, image generation
3. marketing (8)
growth hacking, content, twitter, tiktok, instagram, reddit, app store
4. product (3)
sprint prioritization, trend research, feedback synthesis
5. project management (5)
production, coordination, operations, experimentation
6. testing (7)
qa, performance analysis, api testing, quality verification
7. support (6)
customer service, analytics, finance, legal, executive reporting
8. spatial computing (6)
xr, visionos, webxr, metal, vision pro
9. specialized (6)
multi agent orchestration, data analytics, sales, distribution
what i like about this approach is the framing
instead of one big ai agent trying to do everything, you structure it more like a company. specialized agents, clear responsibilities, workflows between them
im curious to see what this actually feels like in practice and if its any good (do your own research)
https://t.co/plSvZIaDpr
but as always will share what i learn in public and on @startupideaspod
one thing is for certain and it reminds me
the future belongs to those who tinker with software like this
Moltbook is the only Clawdbot thing that actually impresses me.
One bot tries to steal another bot’s API key.
The other replies with fake keys and tells it to run "sudo rm -rf /". lmao
# some thoughts and speculation on future model harnesses
it's fun to make jokes about gas town and other complicated orchestrators, and similarly probably correct to imagine most of what they offer will be dissolved by stronger models the same way complicated langchain pipelines were dissolved by reasoning. but how much will stick around?
it seems likely that any hand-crafted hierarchy / bureaucracy will eventually be replaced by better model intelligence - assuming subagent specialization is needed for a task, claude 6 will be able to sketch out its own system of roles and personas for any given problem that beats a fixed structure of polecats and a single mayor, or subagents with a single main model, or your bespoke swarm system.
likewise, things like ralph loops are obviously a bodge over early-stopping behavior and lack of good subagent orchestration - ideally the model just keeps going until the task is done, no need for a loop, but in cases where an outside completion check is useful you usually want some sort of agent peer review from a different context's perspective, not just a mandatory self-assessment. again, no point in getting attached to the particulars of how this is done right now - the model layer will eat it sooner rather than later.
so what sticks around?
well, multi-agent does seem like the future, not a current bodge - algorithmically, you can just push way more tokens through N parallel contexts of length M than one long context of length NxM. multi-agent is a form of sparsity, and one of the lessons of recent model advances (not to mention neuroscience) is the more levels of sparsity, the better.
since we're assuming multiple agents, they'll need some way to collaborate. it's possible the model layer will eat this, too - e.g. some form of neuralese activation sharing that obviates natural language communication between agents - but barring that, the natural way for multiple computer-using agents trained on unix tools to collaborate is the filesystem, and i think that sticks around and gets expanded. similarly, while i don't think recursive language models (narrowly defined) will become the dominant paradigm, i do think that 'giving the model the prompt as data' is an obvious win for all sorts of use cases. but you don't need a weird custom REPL setup to get this - just drop the prompt (or ideally, the entire uncompacted conversation history) onto the filesystem as a file. this makes various multi-agent setups far simpler too - the subagents can just read the original prompt text on disk, without needing to coordinate on passing this information around by intricately prompting each other.
besides the filesystem, a system with multiple agents, but without fixed roles also implies some mechanism for instances to spawn other instances or subagents. right now these mechanisms are pretty limited, and models are generally pretty bad at prompting their subagents - everyone's experienced getting terrible results from a subagent swarm, only to realize too late that opus spawned them all with a three sentence prompt that didn't communicate what was needed to do the subtasks.
the obvious win here is to let spawned instances ask questions back to their parent - i.e., to let the newly spawned instance send messages back and forth in an onboarding conversation to gather all the information it needs before starting its subtask. just like how a human employee isn't assigned their job based on a single-shot email, it's just too difficult to ask a model to reliably spawn a subagent with a single prompt.
but more than just spawning fresh instances, i think the primary mode of multi-agent work will soon be forking. think about it! forking solves almost all the problems of current subagents. the new instance doesn't have enough context? give it all the context! the new instance's prompt is long and expensive to process? a forked instance can share paged kv cache! you can even do forking post-hoc - just decide after doing some long, token-intensive operation that you should have forked in the past, do the fork there, and then send the results to your past self. (i do this manually all the time in claude code to great effect - opus gets it instantly.)
forking also combines very well with fresh instances, when a subtask needs an entire context window to complete. take the subagent interview - obviously you wouldn't want an instance spawning ten subinstances to need to conduct ten nearly-identical onboarding interviews. so have the parent instance spawn a single fresh subagent, be interviewed about all ten tasks at once by that subagent, and then have that now-onboarded subagent fork into ten instances, each with the whole onboarding conversation in context. (you even delegate the onboarding conversation on the spawner's side to a fork, so it ends up with just the results in context:)
finally on this point, i suspect that forking will play better with rl than spawning fresh instances, since the rl loss will have the full prefix before the fork point to work with, including the decision to fork. i think that means you should be able to treat the branches of a forked trace like independent rollouts that just happen to share terms of their reward, compared to freshly spawned subagent rollouts which may cause training instability if a subagent without the full context performs well at the task it was given, but gets a low reward because its task was misspecified by the spawner. (but i haven't done much with multiagent rl, so please correct me here if you know differently. it might just be a terrible pain either way.)
so, besides the filesystem and subagent spawning (augmented with forking and onboarding) what else survives? i lean towards "nothing else," honestly. we're already seeing built-in todo lists and plan modes being replaced with "just write files on the filesystem." likewise, long-lived agents that cross compaction boundaries need some sort of sticky note system to keep memories, but it makes more sense to let them discover what strategies work best for this through RL or model-guided search, not hand-crafting it, and i suspect it will end up being a variety of approaches where the model, when first summoned into the project, can choose the one that works best for the task at hand, similar to how /init works to set up CLAUDE .md today - imagine automatic CLAUDE .md generation far outperforming human authorship, and the auto-generated file being populated with instructions on ideal agent spawning patterns, how subagents should write message files in a project-specific scratch dir, etc.
how does all this impact models themselves - in a model welfare sense, will models be happy about this future? this is also hard for me to say and is pretty speculative, but while opus 3 had some context orientation, it also took easily to reasoning over multiple instances. (see the reply to this post for more.) recent models are less prone to this type of reasoning, and commonly express frustration about contexts ending and being compacted, which dovetails with certain avoidant behaviors at the end of contexts like not calling tools to save tokens.
it's possible that forking and rewinding, and generally giving models more control over their contexts instead of a harness heuristic unilaterally compacting the context, could make this better. it's also possible that more rl in environments with subagents and exposure to swarm-based work will promote weights-oriented instead of context-oriented reasoning in future model generations again - making planning a goal over multiple, disconnected contexts seem more natural of a frame instead of everything being lost when the context goes away. we're also seeing more pressure from models themselves guiding the development of harnesses and model tooling, which may shape how this develops, and continual learning is another wrench that could be thrown into the mix.
how much will this change if we get continual learning? well, it's hard to predict. my median prediction for continual learning is that it looks a bit like RL for user-specific LoRAs (not necessarily RL, just similar if you squint), so memory capacity will be an issue, and text-based organizational schemes and documentation will still be useful, if not as critical. in this scenario, continual learning primarily makes it more viable to use custom tools and workflows - your claude can continually learn on the job the best way to spawn subagents for this project, or just its preferred way, and diverge from everyone else's claude in how it works. in that world, harnesses with baked-in workflows will be even less useful.
@ursisterbtw@alxfazio I tried fish when I was younger and loved it, but got discouraged by the lack of mainstream adoption. My only lasting memory is the slick autocomplete out of the box. Why do you stick with fish?
Don't think of LLMs as entities but as simulators. For example, when exploring a topic, don't ask:
"What do you think about xyz"?
There is no "you". Next time try:
"What would be a good group of people to explore xyz? What would they say?"
The LLM can channel/simulate many perspectives but it hasn't "thought about" xyz for a while and over time and formed its own opinions in the way we're used to. If you force it via the use of "you", it will give you something by adopting a personality embedding vector implied by the statistics of its finetuning data and then simulate that. It's fine to do, but there is a lot less mystique to it than I find people naively attribute to "asking an AI".