Vin Howe @vinhowe - Twitter Profile

Vin Howe

@vinhowe

15 days ago

@archiemckenzie_ @bgub_ To be a fly on the wall!


Vin Howe

@vinhowe

15 days ago

Preprint 🧵! How compartmentalized are LLMs? For data in different formats (English/Chinese, Wiki/Q&A), how much transfer occurs? We provide evidence that LLMs can struggle with this sort of transfer, with consequences like sample inefficiency and capacity competition.


Vin Howe

@vinhowe

15 days ago

@LChoshen I hadn't - happy to cross post on their Slack if that's what you mean?


Vin Howe

@vinhowe

15 days ago

@LChoshen Will add this - thanks for the heads up!



Chris Rytting

@ChrisRytting

Shipping research. Co-founder at @LaudeInstitute. Formerly @UW, @nvidia, OSPC @AEI, @NewYorkFed Macroeconomic Research. PhD in CS/NLP from @BYU.

15 days ago

@AdamJaber248 @TheGrizztronic @Louis9687221579 afaict different setup and measurements, but we saw something similarly disappointing/weird https://t.co/Efc52FqQdw

Vin Howe

@vinhowe

15 days ago

The setup: train on text split 50/50 across disjoint token vocabs, and compare against an "A-only" baseline:
1️⃣ The sample efficiency gap shows up and persists across scale up to 1B
2️⃣ Representations are near-totally orthogonal; each split uses capacity independently, with a higher overall val loss plateau
3️⃣ We show that unified, lower loss solutions exist but SGD doesn't find them from a generic init
4️⃣ Massive parallel data fails to bridge representations

This looks to us like a potential problem in LLMs, and it also gives us the "no-sharing" baseline we're looking for.

We use this to find a preliminary result for natural multilingual transfer. ➡️
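For readers who want a concrete picture of the "disjoint token vocabs" setup, here is a minimal sketch. It is an assumption about how such a split could be built, not the preprint's actual pipeline: the function name `split_disjoint_vocab`, the id-offset trick, and the toy corpus are all hypothetical. Half of the sequences keep their token ids (split A) and the other half have every id shifted by the vocab size (split B), so the two splits never share a token.

```python
import random

def split_disjoint_vocab(sequences, vocab_size, seed=0):
    """Tag half of the sequences as split B and shift their ids by vocab_size.

    Hypothetical illustration only: split A uses ids in [0, vocab_size),
    split B uses ids in [vocab_size, 2 * vocab_size), so no id is shared.
    """
    rng = random.Random(seed)
    order = list(range(len(sequences)))
    rng.shuffle(order)
    in_b = set(order[: len(order) // 2])  # exactly half (rounded down) go to B

    tagged = []
    for i, seq in enumerate(sequences):
        if i in in_b:
            tagged.append(("B", [tok + vocab_size for tok in seq]))
        else:
            tagged.append(("A", list(seq)))
    return tagged

# Toy corpus of token-id sequences over a 6-token vocabulary (made up).
corpus = [[1, 5, 2], [0, 3], [4, 4, 1], [2, 0]]
mixed = split_disjoint_vocab(corpus, vocab_size=6)

# An "A-only" comparison set would simply drop the B sequences.
a_only = [seq for tag, seq in mixed if tag == "A"]
```

Under this construction, any transfer between A and B has to come from the model discovering shared structure, since the embeddings for the two id ranges start out unrelated.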



Vin Howe

@vinhowe

15 days ago

Preprint link: https://t.co/gtBGYLMQmr Super fun project. I'll be a fellow at @MATSProgram in Berkeley next month. Reach out!


Vin Howe

@vinhowe

15 days ago

We build on existing work showing that frontier performance on all sorts of transfer is more inconsistent than we might hope, especially after learning from trillions of tokens: https://t.co/mYBiTyVoWk @NitCal https://t.co/Au95cAwhWX @omerNLP https://t.co/AC6IahZYI4 @LChoshen

omer goldman @omerNLP

about 1 year ago

Wanna check how well a model can share knowledge between languages? Of course you do! 🤩 But can you do it without access to the model’s weights? Now you can with ECLeKTic 🤯 https://t.co/h9ea5ek0Rv


Vin Howe

@vinhowe

26 days ago

@bgub_ https://t.co/9mkNC7Ef7K or just Rust


vinhowe retweeted

Josh Greaves

@joshgreaves_ml

4 months ago

The big labs are betting RL will unlock superhuman coding. But their infrastructure is closed, and OSS tooling doesn't support true online RL—just iterative batch optimization. We're releasing ARES to close that gap 🧵


Vin Howe

@vinhowe

7 months ago

Thanks to:
- @grantpitt0, who helped create the original idea, provided invaluable feedback, and helped me debug a few cursed numerical bugs.
- @fleetwood___ for help with Ratchet (and pushing me to write a blog post).
- @bgub_ for helpful feedback. 💜


Vin Howe

@vinhowe

7 months ago

Train a language model in your browser with WebGPU! I built a playground for training sequence models (Transformers, LSTMs, GRUs, vanilla RNNs) completely in your browser on synthetic tasks like sorting and simple natural language datasets like TinyStories. You can fiddle with 50+ experiment knobs to build your own model, which can be as big as you have the VRAM to accommodate. You don't have to install anything—all you need is a browser with WebGPU support. Check it out! Link to repo + blog post + features and technical details in the reply. 🧵


Vin Howe

@vinhowe

7 months ago

This project was inspired directly by:
- @fleetwood___ Ratchet
- @willdepue WebGPT
- @dsmilkov, @shancarter TensorFlow Neural Network Playground
- @kellerjordan0 Modded-NanoGPT and Muon
- @xenovacom Transformers.js
- @polodataclub Transformer Explainer
- @brendanbycroft LLM Visualization
- @karpathy ConvNetJS, micrograd, minGPT, llm.c

