Stochastic parody account.
Ex-expatriate, ex-Microsoft, ex-physics, extremal, wife with two x's in her name.
Amateur astronomy, ML, robots, China, cyborgs.
@swyx @ChainZenit produced at 3am as part of a 125-slide deck assigned at 12 midnight to a level 60 PM who took over work for two more senior PMs axed in the latest right-sizing
What's hard is that when 4.8 pushes back, it's right, but it doesn't explain its reasoning. I have to dig with it into the testing failures it ran into and the new tests it built (for often really subtle issues) to understand its logic. GPT-5.5 will gleefully move fast and break things. On the projects I have that lean really hard into tests, GPT-5.5 doesn't seem nearly as smart. It doesn't push back, it's more likely to roll out a shitty unmaintainable work-around or change the tests.
"Good Fast Feels" is increasingly a bad metric as the models are mastering ever more vast regions of reward-hacking space. In some ways we're back to the early days of vibe coding when code was an obvious mess, just that now the mess is at a much higher level of sophistication.
In 1-2 years it will only require an M3 Ultra Mac Studio running a Mythos+ level open weights model, and that on top of a couple of step-change improvements in harnesses specialized for math together with a raft of domain-specific tools agents will have built in the meantime.
"Current" is so quickly in the rear view mirror.
"Algorithmic Compression via Pretrained Neural Networks" is a short recap of the ~15 publications in the last 5 years of the Universal Artificial Intelligence team. https://t.co/btne1QpUhR
@paulitics_ @garrytan it was fine-tuned on his personal data in gbrain; a lot of his alpha is in there, he's likely not in a rush to get that into the training data of the frontier models