AI research is a series of next-step decisions. We looked at sessions where a human researcher took a wrong turn, showed Claude the session up to that point, and asked it what to do next. Mythos Preview improved on humans 64% of the time—up from 22% in 2024.
@usr_bin_roygbiv we get normal performance in the morning and great performance from noon to like 9, after that it really shits the bed. i spend the morning speccing and architecting and the afternoon goes to pre-planned autonomous runs
re bread and beer: i'm celiac
@arm64le @JasonBotterill what are you possibly doing with spark that you can’t do with 3.5 flash? it’s not great at coding but as a general intelligence speed model it’s very decent
Claude bugged out and I saw the raw CoT of the summarizer and wtf are they putting in those system prompts
“Let me also check if I’m not adding any additional details. I’m not adding any additional details, so I think I’m not adding any additional details.”
OpenAI slept on coding, so Anthropic stole the crown.
Anthropic didn’t secure enough GPUs/TPUs to turn that lead into a monopoly. Now Codex has caught up.
Gemini will catch up too. It’s only a matter of time.
AI coding is becoming a three-body problem.
In a footnote to Capital, Marx brought up a testimony of a French worker. In France, he assumed he could do just one specific type of job (= printing). In California, he discovered that he in fact can do anything & all the constraints he faced in Europe were purely artificial
right monitor is 20 codex instances. left monitor has situational awareness on autoscroll. center monitor is my word doc manifesto. two keyboards, one for both hands. left airpod is dwarkesh x eric jang, 3x speed. right airpod tchaikovsky. meta quest 3 overlays my HUD: heart rate, words per minute, blood caffeine content. one assistant hooks me to an iv of chinese peptides, cocktail. the other feeds me kimchi. my unitree robot steps in when my posture slouches. blue light beams down on me in my herman miller chair. efficiency. no wasted movement. no wasted thoughts. think you can keep up with me? good luck. this is just for my morning emails.