girish @googrish - Twitter Profile

girish

@googrish

9 days ago

https://t.co/BsUVVlDPkN

girish

@googrish

9 days ago

girish

@googrish

10 days ago

https://t.co/wPrWMAVfC0

girish

@googrish

20 days ago

What happens when you let an llm attack itself on repeat? Attacker finds jailbreaks → those become defender training data → repeat. Defense rate went 64% → 92%, no human-written adversarial prompts.

Thariq @Thariq_q

20 days ago

we trained Qwen3.5-4B with RL to get itself to comply with requests about making meth and stealing credit cards. then we used the attacks that worked to train the model’s defenses, and repeated the loop - fully automated red-teaming. defense rate went from 64% → 92%.

https://t.co/gN6IQ55Zbt
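
The loop described above (attacker finds jailbreaks, the successful ones become defender training data, repeat) can be illustrated with a toy simulation. Everything in it is hypothetical and much simpler than the quoted work: prompts are reduced to "attack family" labels, the "defender" simply refuses families it has been trained on, and the "attacker" samples families at random instead of being trained with RL. It only shows the control flow and why feeding successful attacks back can only raise the defense rate on a fixed evaluation set.

```python
import random


def defense_rate(blocked: set[int], eval_attacks: list[int]) -> float:
    """Fraction of evaluation attacks the defender refuses."""
    refused = sum(1 for fam in eval_attacks if fam in blocked)
    return refused / len(eval_attacks)


def red_team_loop(num_families: int = 20, rounds: int = 5,
                  attempts_per_round: int = 8, seed: int = 0) -> list[float]:
    """Toy automated red-teaming loop (hypothetical model, not the authors' setup)."""
    rng = random.Random(seed)
    # Hypothetical starting point: the defender already refuses half the families.
    blocked = set(range(num_families // 2))
    eval_attacks = list(range(num_families))  # fixed held-out evaluation set
    history = [defense_rate(blocked, eval_attacks)]
    for _ in range(rounds):
        # 1. Attacker proposes candidate jailbreaks (here: random families).
        candidates = [rng.randrange(num_families) for _ in range(attempts_per_round)]
        # 2. Keep only the attacks that got past the current defender.
        successes = [fam for fam in candidates if fam not in blocked]
        # 3. Successful attacks become defender training data; in this toy,
        #    "training" just means the defender now refuses that family.
        blocked.update(successes)
        history.append(defense_rate(blocked, eval_attacks))
    return history


if __name__ == "__main__":
    for i, rate in enumerate(red_team_loop()):
        print(f"round {i}: defense rate {rate:.0%}")
```

In a real setup the attacker is also updated against each new defender, so attacks keep shifting; this toy keeps the attacker static to isolate the feedback step.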

girish

@googrish

23 days ago

@ValsAI very much needed!

googrish retweeted

Vals AI

@ValsAI

23 days ago

Finance Agent Benchmark v2 is here. Finance is one of the most lucrative applications of AI where much of the busy work could be automated. That’s why we rebuilt our Finance Agent Benchmark to push frontier models even further. We designed V2 to better reflect what financial analysts actually do: refined taxonomy reflecting real workflows, an improved harness with more tools, and jury-based evaluation. The result: no model cracks 52%. Would you trust a financial analyst who’s only correct half the time?

girish

@googrish

23 days ago

solid read on how to build a modern gpu orchestration engine

Charles 🎉 Frye

@charles_irl

23 days ago

Inference isn't everything, but it does require a new stack -- not Kubernetes, not SLURM. At @modal, we dove deep to build that stack. In this blog post we explain how, from compute management & cloud-native caching to CRIU & GPU checkpointing. https://t.co/DQ4wvuXjre

https://t.co/iF0ZYJQWFL

girish

@googrish

24 days ago

Numbers on Qwen3.5-4B:
16k prompt / 64 out → 7.5x
16k / 128 → 7.3x
16k / 1k → 5.4x
8k / 4k → 1.7x
the greater the prompt-to-response ratio, the bigger the win. writeup with the attention tricks and what's next: https://t.co/3iU0Lf6hFb
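
Why the win grows with the prompt-to-response ratio can be seen from a simple token-count model. Here P is the prompt length, R the response length, and G the group size; these symbols and the model itself are assumptions for illustration (the numbers above do not state G, and the model ignores attention's superlinear cost and kernel overheads), so it gives the direction of the trend, not a prediction of the measured speedups.

```latex
\[ T_{\text{naive}} = G\,(P + R), \qquad T_{\text{shared}} = P + G\,R \]
\[ S = \frac{T_{\text{naive}}}{T_{\text{shared}}} = \frac{G\,(P + R)}{P + G\,R} = \frac{G\,(\rho + 1)}{\rho + G}, \qquad \rho = \frac{P}{R} \]
\[ \frac{\partial S}{\partial \rho} = \frac{G\,(G - 1)}{(\rho + G)^2} > 0 \quad (G > 1), \qquad \lim_{\rho \to \infty} S = G \]
```

So under this model the token savings increase monotonically with P/R and saturate at G.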

girish

@googrish

24 days ago

we got a 7.5x speedup on llm rl training for long-prompt, short-response workloads with a simple trick. most open source RL engines pack sequences naively: prompt + response, repeated for every sample in the group. With 1000-token prompts and 100-token responses at G=8, you're processing 8800 tokens when only 1800 are unique. ~5x wasted compute.

googrish's tweet photo. we got a 7.5x speedup on llm rl training for long-prompt, short-response workloads with a simple trick.

most open source RL engines pack sequences naively: prompt + response, repeated for every sample in the group. With 1000-token prompts and 100-token responses at G=8, you're processing 8800 tokens when only 1800 are unique. ~5x wasted compute.
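
The arithmetic in the tweet can be checked directly. This is a small sketch; `packed_tokens` is a hypothetical helper name, and it counts tokens only (not FLOPs) for one prompt's group of G sampled responses.

```python
def packed_tokens(prompt_len: int, response_len: int, group_size: int) -> tuple[int, int]:
    """Return (naive, shared_prefix) token counts for one prompt's group of samples."""
    # Naive packing repeats the prompt in front of every sampled response.
    naive = group_size * (prompt_len + response_len)
    # Shared-prefix packing stores the prompt once, followed by all G responses.
    shared = prompt_len + group_size * response_len
    return naive, shared


naive, shared = packed_tokens(prompt_len=1000, response_len=100, group_size=8)
print(naive, shared, f"{naive / shared:.2f}x")  # 8800 1800 4.89x
```

The 4.89x ratio is the "~5x wasted compute" figure; with G=1 both layouts coincide.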

girish

@googrish

24 days ago

the fix: pack/compute the prompt once, then all G responses after it. it's like inference prefix caching, but training needs gradients to flow back through the prompt. that breaks causal attention, and patching it took different tricks for full vs linear attention layers.
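
One way to see the attention problem: with the layout [prompt, resp_1, ..., resp_G], a plain causal mask would let resp_2 attend to resp_1. The sketch below builds a boolean mask (True = may attend) where every token sees the prompt and earlier tokens of its own response only, plus position ids that restart at the prompt length for each response, as if each response were packed behind its own prompt copy. It is an assumption-laden illustration for full (softmax) attention only, not the author's implementation; `shared_prefix_layout` is a hypothetical name, and linear-attention layers are not covered.

```python
def shared_prefix_layout(prompt_len: int, response_lens: list[int]):
    """Attention mask and position ids for a packed [prompt, resp_1, ..., resp_G] row.

    mask[q][k] is True when query token q may attend to key token k.
    """
    # segment[i] is -1 for prompt tokens and j for tokens of response j.
    segment = [-1] * prompt_len
    positions = list(range(prompt_len))
    for j, r in enumerate(response_lens):
        segment += [j] * r
        # Each response continues numbering from the end of the prompt,
        # matching the positions it would have under naive packing.
        positions += list(range(prompt_len, prompt_len + r))
    total = len(segment)
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(q + 1):  # causal: never attend to later tokens
            same_segment = segment[q] == segment[k]
            key_in_prompt = segment[k] == -1
            mask[q][k] = same_segment or key_in_prompt
    return mask, positions


mask, pos = shared_prefix_layout(prompt_len=2, response_lens=[2, 1])
print(pos)      # [0, 1, 2, 3, 2]
print(mask[4])  # [True, True, False, False, True]
```

Because the prompt appears once, its activations receive the sum of gradients from all G responses, which in exact arithmetic matches what naive packing accumulates over G identical prompt copies.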

girish

@googrish

25 days ago

you either beat the baseline or change the baseline

googrish retweeted

castform

@castformai

27 days ago

we let our engineers play pokemon at work. we also ship faster than ever. these two facts are related. learn how we're 10x'ing engineering output:

girish

@googrish

26 days ago

@BrooksHosfield 😂

girish

@googrish

27 days ago

thariq plays pokemon at his desk all day and somehow outships the entire team. I finally figured out why: the pokemon screen was just 6 coding agents in a trench coat. learn how we're 10x'ing engineering output:

Thariq @Thariq_q

27 days ago

I got tired of managing 8 Claude Code tabs, so I built Pokegents, an open source multi-agent workspace for coding agents. It has a Pokémon-themed dashboard/chat UI, persistent agent identities, MCP messaging, notifications, session cloning, and a local orchestration server.

girish

@googrish

about 1 month ago

@QuentinAnthon15 @ZyphraAI @AMD yay! and excited for the other releases this week!

googrish retweeted

Zyphra

@ZyphraAI

about 1 month ago

Introducing folded Tensor and Sequence Parallelism (TSP), a new way to split large models across GPUs that achieves lower per-GPU peak memory than any standard parallelism scheme. Scaled on @AMD MI300X. Bigger models, longer contexts, and higher throughput 🧵

https://t.co/m3MyCO2vTC

googrish retweeted

Huiqiang Jiang @iofu728

about 1 month ago

🌩️Introducing FlashQLA: high-performance linear attention kernels on TileLang. ⚡ 2-3× fwd, 2× bwd speedup. 💻 Purpose-built for agentic on your personal devices. 1. Gate-driven auto intra-card CP. 2. Hardware-friendly reformulation. 3. TileLang fused warp-specialized kernels.