James Lal

Verified account

@lightsofapollo2

building gpu cli. any command on a cloud gpu, instantly, across providers. here for agents, inference, and getting a gpu when they're all sold out. boulder

Boulder, CO

Joined February 2026

73 Following

6 Followers

56 Posts

Pinned Tweet

@lightsofapollo2

17 days ago

i haven't started a gpu job by hand in weeks. claude and codex drive them through gpu cli now: grab a gpu across providers, run it, tear it down. great for stateful gpu work https://t.co/KzfW3QNNHL

0

1

0

0

61

lightsofapollo2 retweeted

GPU CLI @gpucli

6 days ago

Searching for a faster way to find the best deal to rent that GPU you need? https://t.co/8jdIIeP2Rl just shipped 🚢 You can now compare prices from @runpod, @vast_ai, @ThunderCompute & more in one place.

0

2

1

0

31

@lightsofapollo2

7 days ago

@HuggingPapers This model is incredibly good and fast on local hardware! Using it on mac now

0

0

0

0

2

lightsofapollo2 retweeted

2 months ago

Achieves 1.63% WER on LibriSpeech (offline) and 1.78% WER (streaming) with built-in punctuation and capitalization. Model: https://t.co/OmLYdAMsBz

1

9

2

6

857

lightsofapollo2 retweeted

9 days ago

You can now train 120B+ parameter models locally on a laptop! 🔥 We collabed with NVIDIA and Microsoft to bring LLM training on the 128GB unified memory RTX Spark laptop! https://t.co/mKbbIRWh9c

UnslothAI's tweet photo.

55

1K

107

181

89K

@lightsofapollo2

10 days ago

@Hikari_07_jp This is actually pretty great but also why I am nervous about having my own 😅

1

2

0

0

44

@lightsofapollo2

10 days ago

@Docker @JustinMitchel @joincfe locking the agent down is easy. giving it the right permissions for the task, easily, is where this falls apart right now.

1

1

0

0

22

@lightsofapollo2

11 days ago

@vast_ai 😅 I have been seriously considering buying some GPUs... Vast has been great though in terms of availability the last few weeks.

0

1

0

0

15

@lightsofapollo2

13 days ago

@The_Only_Signal That someone was me about 1 yr ago 😓 A serious factor in decode speeds for llm

0

2

0

0

811

@lightsofapollo2

13 days ago

@witcheer the most disappointing part of local inference is realizing how slow it is at context. Optimal use of the kv cache certainly helps though so it's not quite as bad as I thought if you work in increments.

0

1

0

0

20

@lightsofapollo2

14 days ago

@_catwu When I first read about dynamic workflows I thought it would be a lengthy run, but the best part is it's pretty fast: <10min to deep research a complex topic spanning web/papers/codebase(s). Still, 100+ agents is incredible to watch.

0

1

0

0

2K

@lightsofapollo2

14 days ago

Over 100 agents on this one /deep-research run so far 🤯

0

1

0

0

6

@lightsofapollo2

14 days ago

What was used to rewrite Bun into Rust apparently

14 days ago

New in Claude Code (research preview): dynamic workflows. Claude writes an orchestration script on the fly, then spins up a large fleet of coordinated subagents in parallel to take on your most complex tasks. Use the word "workflow" in a prompt to get started. https://t.co/re4SG3AyDm

ClaudeDevs's tweet photo.

369

11K

952

6K

4M

1

3

1

0

46

@lightsofapollo2

15 days ago

envious of the m5 for one reason: it fixes the exact thing my m4 can't. the m4 gpu has no real matmul units, so prefill is compute-bound and long context crawls. the m5 put neural accelerators in every gpu core, and apple clocks ~4x faster time-to-first-token vs m4 (qwen3-14b). token generation is only ~1.2x faster, because that part is bandwidth-bound, not compute. so the m5's real win isn't faster tokens, it's that prefill and long context stop being the wall.

0

1

1

0

18
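
The compute-bound versus bandwidth-bound split in the post above can be sketched with a rough roofline estimate. This is a minimal sketch, not a benchmark: the `Chip` numbers are hypothetical values chosen only so that the newer chip has 4x the matmul throughput and 1.2x the memory bandwidth, mirroring the ratios quoted in the post. The 4-bit weight size and the 8192-token prompt are likewise assumptions, and attention cost and KV-cache traffic are ignored.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    flops_per_s: float  # usable matmul throughput (hypothetical, not a spec)
    bytes_per_s: float  # memory bandwidth (hypothetical, not a spec)


def step_time(flops: float, bytes_moved: float, chip: Chip) -> float:
    """Roofline estimate: a step is as slow as its slower resource."""
    return max(flops / chip.flops_per_s, bytes_moved / chip.bytes_per_s)


def prefill_time(params: float, bytes_per_param: float,
                 prompt_tokens: int, chip: Chip) -> float:
    # Roughly 2 FLOPs per parameter per token; the weights are read once
    # and reused across the whole prompt, so compute dominates.
    return step_time(2 * params * prompt_tokens, params * bytes_per_param, chip)


def decode_time_per_token(params: float, bytes_per_param: float,
                          chip: Chip) -> float:
    # One token at a time: every weight is still read, but only
    # 2 * params FLOPs are done, so memory bandwidth dominates.
    return step_time(2 * params, params * bytes_per_param, chip)


PARAMS = 14e9          # a 14B-parameter model, as in the post
BYTES_PER_PARAM = 0.5  # assumes 4-bit quantized weights
PROMPT_TOKENS = 8192   # assumed long prompt

older = Chip("older", flops_per_s=4e12, bytes_per_s=120e9)
newer = Chip("newer", flops_per_s=16e12, bytes_per_s=144e9)

prefill_speedup = (prefill_time(PARAMS, BYTES_PER_PARAM, PROMPT_TOKENS, older)
                   / prefill_time(PARAMS, BYTES_PER_PARAM, PROMPT_TOKENS, newer))
decode_speedup = (decode_time_per_token(PARAMS, BYTES_PER_PARAM, older)
                  / decode_time_per_token(PARAMS, BYTES_PER_PARAM, newer))

print(f"prefill speedup: {prefill_speedup:.2f}x")
print(f"decode speedup:  {decode_speedup:.2f}x")
```

Under these assumptions the prefill speedup tracks the compute ratio and the decode speedup tracks the bandwidth ratio, which is the shape of the argument in the post.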

@lightsofapollo2

15 days ago

@bridgemindai /w the built in MTP ? I tried this on an m4 and it didn't benefit from it much but the m5 should...

0

0

0

0

21

@lightsofapollo2

17 days ago

@bridgemindai m5 represents a major improvement on the metal side (apple10 on m5 vs apple9 on m4). seeing great performance on my m4, but I am envious of the m5 for LLM use cases in particular.

0

0

0

0

128

lightsofapollo2 retweeted

@ClementDelangue

18 days ago

300,000 AI builders filled their hardware profile on @huggingface and we're sharing the results: https://t.co/3rLqeJGUCO. Excited to see how it evolves in the coming months especially with the explosion of local AI!

39

242

39

89

40K

@lightsofapollo2

17 days ago

@SemiAnalysis_ the clever part is what they avoid. the moment a job needs more than one gpu, the wiring between them eats real power and speed. cerebras skips that by keeping it all on one giant chip. the rest of us glue separate gpus together and pay that tax to make them act like one

0

0

0

0

406

@lightsofapollo2

17 days ago

@ApacheSpark dry-run fails fast is the part that matters for agents. a human eyeballs a job, an agent needs the tool to fail loud and early or it confidently waits on something broken. designing for no human watching is the real shift

0

0

0

0

36
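
The "fail loud and early" idea in the post above can be sketched as a validation pass that runs before anything is launched. This is only an illustration: `JobSpec`, `DryRunError`, `dry_run`, and `submit` are hypothetical names invented here, not part of Spark or any real CLI, and the checks are placeholders for whatever a real tool would verify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class DryRunError(Exception):
    """Raised before launch so a caller never waits on a broken job."""


@dataclass
class JobSpec:  # hypothetical job description
    command: str
    gpu_count: int
    timeout_s: int
    env: Dict[str, str] = field(default_factory=dict)


def dry_run(spec: JobSpec) -> None:
    """Collect every problem up front and raise once, with all of them."""
    problems: List[str] = []
    if not spec.command.strip():
        problems.append("command is empty")
    if spec.gpu_count < 1:
        problems.append("gpu_count must be at least 1")
    if spec.timeout_s <= 0:
        problems.append("timeout_s must be positive")
    if problems:
        raise DryRunError("; ".join(problems))


def submit(spec: JobSpec, launch: Callable[[JobSpec], str]) -> str:
    # Validate first: an agent gets an immediate, explicit error
    # instead of a job that silently hangs.
    dry_run(spec)
    return launch(spec)
```

Reporting all problems in one error, instead of stopping at the first, lets an unattended caller fix the spec in a single retry.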

@lightsofapollo2

17 days ago

@ClementDelangue MTP is amazing but I found prefill and long context still to be a problem on my mac hardware anyway ... Great win on decode

0

1

0

0

779
