AI for science at @PeriodicLabs. Formerly, building AI climate models at Google. I also contribute to the scientific Python ecosystem (Xarray, NumPy, JAX).
Things I didn’t expect from becoming a dad:
- I’m now in a secret club with most of the world’s adult population
- I’ve been magically transformed into a morning person!
- Babies are genuinely lots of fun 😊
What's the best way for teams to share coding agent skills, especially across Claude/Codex?
Checking everything into a shared repos sounds great, until you accidentally trigger your colleague's bespoke workflow.
@_sholtodouglas@_arohan_ Claude is notably bad at PDF forms. When I had it look at my tax returns, it confidently misstated which lines I had filled out. I can fix this by asking the model to read PDFs as images but that’s really awkward.
Conductor will be OK -- GPT 5.5 is a better model than Opus 4.7, anyways, and OpenAI is much more accommodating of third-party usage of Codex.
Still, what a huge self-own from Anthropic!
Wow, this is incredibly disappointing to hear from Anthropic -- basically a death sentence for Claude in @conductor_build, which simply wraps Claude Code in a nicer GUI. Not sure how it got lumped in with autonomous agents like OpenClaw.
This means that third-party tools built on the Agent SDK like Conductor and OpenClaw work with your Claude plan, but will draw from your credit the same way your own scripts do.
I'm happy to report that this issue seems to resolved with recent versions of Codex and GPT 5.5 -- it kept on running experiments for me for 4 hours last night!
Dear Codex, when I tell you to run a massive parameter sweep & model exploration overnight, you should not report “done for the night” after 30 minutes when you self-report that it’s only 2/3 done!
Asked Claude:
'There's a meme called the "fix everything easily switch". What policies do you think are the best candidates for being a real fix everything switch in the US? Give me your top ten, your confidence, your reasoning, and why a given policy has not been implemented.'
I'm excited to finally open-source the model from my 2022 paper, “Forecasting Global Weather with Graph Neural Networks”.
Highlights:
• 10-day forecast in <1 min
• Initialize forecasts from ERA5 or IFS analysis
• Scripts for eval, sensitivities, & Hurricane Sandy
@RyanKeisler This paper was such a breakthrough! Reading it was the first time I believed that SOTA pure-AI weather prediction was possible.
Thanks for sharing, Ryan.
Things I didn’t expect from becoming a dad:
- I’m now in a secret club with most of the world’s adult population
- I’ve been magically transformed into a morning person!
- Babies are genuinely lots of fun 😊
My one minor ask for @thsottiaux -- could GPT be a little more proactive about explaining what it's doing?
Claude gives nice little updates that let me track it's progress. Codex will go through dozens of tool calls over tens of minutes and I have to trust it's still on track (which yes, it usually is!)
GPT 5.5 in recent versions of Codex feels like a real breakthough -- incredibly smart and gets things done.
I have a new default coding model. Sorry, Claude!
@bravo_abad@AnimaAnandkumar Really cool work, but as the authors note in the discussion, this GPU based approach still about 2x slower than sparse direct solvers on the CPU.
@charlieholtz Steering is such a win, thank you!
I wonder if you could do this next-level version or if it would require fixes at the Claude/Codex level: https://t.co/qa0NDoHaDB
Feature request for coding agents: Address my latest request *now* & then decide whether to interrupt.
My common pattern:
1. Agent does something iffy
2. Agent starts something slow (e.g., running tests)
3. I have a question about (1) that may or may not require interrupting (2)