building gpu cli. any command on a cloud gpu, instantly, across providers. here for agents, inference, and getting a gpu when they're all sold out. boulder
i haven't started a gpu job by hand in weeks. claude and codex drive them through gpu cli now: grab a gpu across providers, run it, tear it down. great for stateful gpu work
https://t.co/KzfW3QNNHL
Searching for a faster way to find the best deal to rent that GPU you need?
https://t.co/8jdIIeP2Rl just shipped 🚢
You can now compare prices from @runpod, @vast_ai, @ThunderCompute & more in one place.
You can now train 120B+ parameter models locally on a laptop! 🔥
We collabed with NVIDIA and Microsoft to bring LLM training on the 128GB unified memory RTX Spark laptop!
@Docker @JustinMitchel @joincfe locking the agent down is easy. giving it the right permissions for the task, easily, is where this falls apart right now.
@witcheer the most disappointing part of local inference is realizing how slow it is at context. Optimal use of the kv cache certainly helps though so it's not quite as bad as I thought if you work in increments.
@_catwu When I first read about dynamic workflows I thought it would be a lengthy run, but the best part is it's pretty fast: <10min to deep research a complex topic spanning web/papers/codebase(s). Still, 100+ agents is incredible to watch.
New in Claude Code (research preview): dynamic workflows.
Claude writes an orchestration script on the fly, then spins up a large fleet of coordinated subagents in parallel to take on your most complex tasks.
Use the word "workflow" in a prompt to get started.
envious of the m5 for one reason: it fixes the exact thing my m4 can't.
the m4 gpu has no real matmul units, so prefill is compute-bound and long context crawls. the m5 put neural accelerators in every gpu core, and apple clocks ~4x faster time-to-first-token vs m4 (qwen3-14b).
token generation is only ~1.2x faster, because that part is bandwidth-bound, not compute. so the m5's real win isn't faster tokens, it's that prefill and long context stop being the wall.
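a rough way to see why the two speedups differ, as a back-of-the-envelope sketch and not a benchmark. it assumes prefill costs about 2 FLOPs per parameter per prompt token and that decode has to stream all the weights once per generated token. the `Chip` class, both function names, and every hardware number here are hypothetical placeholders picked only so the ratios match the ~4x and ~1.2x above; they are not apple's specs.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    matmul_tflops: float       # hypothetical usable matmul throughput, TFLOP/s
    mem_bandwidth_gbs: float   # hypothetical memory bandwidth, GB/s


def prefill_seconds(chip: Chip, params_billions: float, prompt_tokens: int) -> float:
    """Compute-bound estimate: ~2 FLOPs per parameter per prompt token."""
    flops = 2.0 * params_billions * 1e9 * prompt_tokens
    return flops / (chip.matmul_tflops * 1e12)


def decode_tokens_per_second(chip: Chip, params_billions: float,
                             bytes_per_param: float) -> float:
    """Bandwidth-bound estimate: every weight is read once per generated token."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    return (chip.mem_bandwidth_gbs * 1e9) / model_bytes


if __name__ == "__main__":
    # Placeholder chips: 4x the matmul throughput, 1.2x the bandwidth.
    older = Chip("older", matmul_tflops=4.0, mem_bandwidth_gbs=100.0)
    newer = Chip("newer", matmul_tflops=16.0, mem_bandwidth_gbs=120.0)

    # 14B parameters, 0.5 bytes per parameter (4-bit weights), 8k-token prompt.
    for chip in (older, newer):
        ttft = prefill_seconds(chip, 14.0, 8192)
        tps = decode_tokens_per_second(chip, 14.0, 0.5)
        print(f"{chip.name}: prefill ~{ttft:.1f}s, decode ~{tps:.1f} tok/s")
```

in this model prefill time scales with compute and decode speed scales with bandwidth, so a big matmul jump mostly shows up as time-to-first-token, not tokens per second.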
@bridgemindai m5 represents a major improvement on the metal side (apple10 on m5 vs apple9 on m4). seeing great performance on my m4, but I am envious of the m5 for LLM use cases in particular.
300,000 AI builders filled their hardware profile on @huggingface and we're sharing the results: https://t.co/3rLqeJGUCO.
Excited to see how it evolves in the coming months especially with the explosion of local AI!
@SemiAnalysis_ the clever part is what they avoid. the moment a job needs more than one gpu, the wiring between them eats real power and speed. cerebras skips that by keeping it all on one giant chip. the rest of us glue separate gpus together and pay that tax to make them act like one
@ApacheSpark dry-run fails fast is the part that matters for agents. a human eyeballs a job, an agent needs the tool to fail loud and early or it confidently waits on something broken. designing for no human watching is the real shift
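what "fail loud and early" can look like for an agent-facing tool, as a minimal sketch. this is not spark's dry-run api or any real cli: `JobSpec`, `DryRunError`, `dry_run`, and `submit` are hypothetical names, and the checks are placeholders for whatever a real tool would validate.

```python
from dataclasses import dataclass
from typing import Callable, List


class DryRunError(Exception):
    """Raised before launch when a job spec cannot possibly succeed."""


@dataclass
class JobSpec:
    command: str
    gpu_count: int
    timeout_seconds: int


def dry_run(spec: JobSpec) -> None:
    """Check the spec without launching anything; report every problem at once."""
    problems: List[str] = []
    if not spec.command.strip():
        problems.append("command is empty")
    if spec.gpu_count < 1:
        problems.append(f"gpu_count must be >= 1, got {spec.gpu_count}")
    if spec.timeout_seconds <= 0:
        problems.append(f"timeout_seconds must be > 0, got {spec.timeout_seconds}")
    if problems:
        # One loud, explicit error instead of a job that quietly hangs.
        raise DryRunError("; ".join(problems))


def submit(spec: JobSpec, launch: Callable[[JobSpec], str]) -> str:
    """Validate first, then hand off to the (hypothetical) launcher."""
    dry_run(spec)
    return launch(spec)
```

the point of the ordering in `submit` is that nothing is launched until the spec passes, so an agent gets an error it can act on instead of waiting on a job that was never going to run.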