> be the SemiAnalysis note
> jan 2026
> supply chain degens check HBM4 samples
> $MU pin speeds are actually cooked
> reduce their Rubin share to zero
> “Nvidia not ordering from them”
> market opens the note
> “MU is getting cut from Rubin lmao”
> headlines go nuclear
> everyone reposts the zero share line
> 0 people read the DDR margins footnote
> march GTC
> Micron: “we’re already in volume production btw”
>11 Gbps achieved
> shipments started Q1
> Jensen: “all three suppliers certified for Rubin HBM4”
> SemiAnalysis: “you didn’t read the full note”
> market: “so the 0 share thing was temporary??”
> SOCAMM note drops
> “most systems won’t use max config”
> 28 TB instead of 55 TB
> market: “memory demand getting halved!!”
> SemiAnalysis: “bro it’s just realistic config”
> early sample risk was real
> HBM4 is actually harder
> gaps close faster than X updates
> wafer reallocation thesis still holds
> SK moved first and got paid
> Micron just showed it can still ship
> public excerpts move stocks in hours
> actual execution shows up in earnings months later
> be the market
> reads the spicy line
> sells the dip
> company drops volume numbers
> “anyway”
> the lines are already converted
> now it’s just who doesn’t fuck up yields and packaging when new supply hits in 2027
>you_didnt_read_the_full_note_but_neither_did_the_timeline.txt
The cheapest way to cut your AI bill isn't a faster chip, a smaller model, or a clever quantization scheme. It's giving the model a cleaner desk to work from.
Karpathy described his own setup: an Obsidian knowledge base, clean markdown folders, a wiki the model can pull from instead of being handed the same sprawling context over and over. The reported result was a 70 to 90% cut in token burn, replicable in an afternoon, with no new infrastructure at all.
The reason it works is the same memory truth that runs under everything in inference. You pay for every token of context you send, every time you send it. If your context is bloated, redundant, and rebuilt from scratch on every call, you're buying the same thought again and again.
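A back-of-envelope sketch of that arithmetic, using made-up numbers: a 50,000-token context, 200 calls, and a hypothetical $3 per million input tokens. None of these figures come from Karpathy's setup or any real bill.

```python
# Hypothetical figures for illustration only; not from any real bill or setup.
PRICE_PER_MILLION_INPUT = 3.00  # assumed $ per 1M input tokens
CALLS = 200                     # assumed number of calls

def input_cost(context_tokens: int, calls: int, price_per_million: float) -> float:
    """Dollars spent resending `context_tokens` of context on every call."""
    return context_tokens * calls * price_per_million / 1_000_000

bloated = input_cost(50_000, CALLS, PRICE_PER_MILLION_INPUT)  # whole pile, every call
lean = input_cost(5_000, CALLS, PRICE_PER_MILLION_INPUT)      # only the relevant notes

savings = 1 - lean / bloated
print(f"bloated ${bloated:.2f}, lean ${lean:.2f}, saved {savings:.0%}")
```

Because the price shows up in both totals, it cancels out of the savings ratio: the cut depends only on how much context you stop resending.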
Caching and reusing that state, or just structuring it so the model only sees what it needs, is pure margin.
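One way to read "structure it so the model only sees what it needs", sketched under loud assumptions: the notes live in a plain dict standing in for a folder of markdown files, relevance is naive keyword overlap (real setups often use search or embeddings), and tokens are approximated as words. `select_notes` and the note names are hypothetical, not Karpathy's tooling.

```python
import re

# Hypothetical stand-in for a folder of markdown notes (filename -> contents).
NOTES = {
    "pricing.md": "input token pricing and prompt caching discounts",
    "routing.md": "send simple tasks to a small cheap model",
    "travel.md": "flights hotels and packing lists",
}

def words(text: str) -> set[str]:
    """Lowercased word set used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_notes(query: str, notes: dict[str, str], budget_tokens: int) -> list[str]:
    """Return names of the most keyword-relevant notes that fit the budget.

    Token counts are approximated as word counts (an assumption; real
    tokenizers differ). Notes with zero keyword overlap are never sent.
    """
    q = words(query)
    scored = sorted(
        ((len(q & words(body)), name) for name, body in notes.items()),
        reverse=True,  # highest overlap first
    )
    picked, used = [], 0
    for score, name in scored:
        cost = len(notes[name].split())
        if score == 0 or used + cost > budget_tokens:
            continue  # irrelevant, or would blow the budget
        picked.append(name)
        used += cost
    return picked

print(select_notes("how does prompt caching change token pricing?", NOTES, budget_tokens=50))
```

Swapping the keyword score for a real search index changes the ranking, not the shape: score, sort, fill a budget, send only that.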
This is the lever most teams miss. Routing simple work to cheap models, caching aggressively, and controlling context will save more, durably, than waiting for the per-token price to drop another notch.
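A toy version of the first two levers, with every name hypothetical: `call_model` is a placeholder for whatever client you actually use, the model names are made up, and "simple" is a crude word-count heuristic. The cache is an exact-match memo of whole responses, which is not the same thing as provider-side prompt caching; it only pays off when identical requests repeat.

```python
from functools import lru_cache

CHEAP_MODEL = "small-model"   # hypothetical model names
STRONG_MODEL = "large-model"
calls_made: list[str] = []    # records which model each uncached call hit

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real API client; just echoes for this sketch."""
    calls_made.append(model)
    return f"[{model}] answer to: {prompt}"

def pick_model(prompt: str, max_simple_words: int = 20) -> str:
    """Crude router: short prompts go to the cheap model (an assumption, not a rule)."""
    return CHEAP_MODEL if len(prompt.split()) <= max_simple_words else STRONG_MODEL

@lru_cache(maxsize=1024)
def answer(prompt: str) -> str:
    """Route, then memoize: an identical prompt is only paid for once."""
    return call_model(pick_model(prompt), prompt)

answer("summarize this note")
answer("summarize this note")  # served from the cache, no second call
print(calls_made)
```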
Price drops are out of your hands, but context discipline is entirely in them.
Everyone wants to optimize the model, but the money is in optimizing what you feed it.
one of the quotes i find most inspiring on a hard day:
"Whatever your hand finds to do, do it with all your might, for in the realm of the dead, where you are going, there is neither working nor planning nor knowledge nor wisdom"
Ecclesiastes 9:10
Discussion we had this evening. In NYC we are bringing back the IRISH HELLO. AKA THE 90s. Here’s how it works. You and your friends all share location. Sporadically throughout the day (lunch break, coffee walk, after work, before work) you check to see where they are. If you’re close to someone, you just show up at their location. Aka bring them a coffee to work, stop by their apt unannounced, kick shoes off and spill the tea. Spontaneously grab a drink or dinner because you’re in the same vicinity. And even if you’re not, meet in the middle. It’s easy to get around here. We’re nostalgic for a time that does not exist and yet we have the means to create it and still we refuse. Everyone is too cool or too nonchalant or too scared to appear desperate. Who gives a fuck. I think we should all be a little more desperate. You’re alive. You need people and people need you. Be the one who calls. Be the one who makes the plan, who sets the tone. I guarantee you’ll be surprised at who shows up. It’s better to be the person who tries than the person who doesn’t.
HELLO I AM HERE.
This is basically what philosophers and anthropologists mean when they say that the 'individual' is a relatively modern construct, and that many other cultures conceive of people in a more porous, contingent, interwoven fashion. Weird as it sounds, it's just observably true.
how to life maxx more:
> get off your phone
> say yes to spontaneous plans even when you're tired - some of the best nights are unplanned
> talk to strangers - at coffee shops, events, literally anywhere. serendipity maxx
> make a bucket list and work your way through said bucket list!!
> stop opting for boring hangs. switch things up with your friends. try something new!!
> start a random hobby just for fun - pottery, dance, improv, cooking. not everything needs to "be productive" ok??
> be 5% more silly in your life. dance in your room, sing badly in the car, crack a bad joke. it's not that serious. grow the silly muscle
> surround yourself with people who make you feel lighter - your time and energy are precious
> don't forget the basics: move your body, get sunlight, take your vitamins, eat well, sleep
> your time to live life is happening NOW so stop saving it for later!!
get off those phones & out into the real world, people!!
let's go PLAY!!!