michael @_michaelginn - Twitter Profile

michael @_michaelginn

1 day ago

@adamkbaranowski @gabriberton https://t.co/puaWwEA6bP

michael @_michaelginn

1 day ago

In low-resource languages, speculative decoding may actually be hurting performance! 1/n

1

0

91

0

1

0

1

18

michael @_michaelginn

1 day ago

So what should we do instead? It turns out the simple n-gram models might be a better choice, thanks to incredibly fast inference speeds. Sometimes, simpler is better! 4/n https://t.co/kZMSmxBOdF

0

19

michael @_michaelginn

1 day ago

Distillation is a common way to improve acceptance rates, but we find that distillation on one task (translation) tends to generalize poorly to another task (story generation) in the language 3/n https://t.co/GTrWm9RbpI

1

0

42

_michaelginn retweeted

will brown

@willccbb

2 days ago

the God Model is a useful theoretical construct akin to a Worst-Case Adversary or a Busy Beaver Program or an NP Oracle, less compelling as a target to seek than as a foil for designing minimax programs which can be tangibly realized

2

59

3

8

8K

michael @_michaelginn

4 days ago

@valmianski @rplevy @MasterTimBlais Among other issues, unitary certainly does not equal Turing computable, they’re unrelated

0

31

michael @_michaelginn

4 days ago

@valmianski @rplevy @MasterTimBlais “You can pretty obviously show that all human processes can be represented computationally” Okay do it

1

2

0

47

michael @_michaelginn

4 days ago

@joseph_h_garvin @buildwithparas In my experience they’re actually pretty bad at it—but I think that’s an effect of RLHF and style training; base models are way better

0

1

0

14

michael @_michaelginn

5 days ago

@edmundheaphy @renegadesilicon It’s still the mechanism though. Understanding *how* we predict the next token doesn’t change that.

0

1

0

19

michael @_michaelginn

6 days ago

@p_maverick_b That’s not what the post is saying

0

3

0

178

michael @_michaelginn

7 days ago

@xboxbodywash My favorite was a student who couldn’t remember how to break out of a loop to end the program, so instead they entered an infinite inner loop

2

4K

11

23

67K

michael @_michaelginn

8 days ago

@TMoldwin Yup https://t.co/P6Hdi5AOed

0

1

0

1

159

michael @_michaelginn

16 days ago

@gabriberton (Results may vary with hardware though)

0

23

michael @_michaelginn

16 days ago

@gabriberton Yup, even if you have task-specific data for distillation. It turns out that since the forward pass is orders of magnitude faster, you get a favorable speedup even though the acceptance rate is a lot lower

2

1

0

217
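A rough way to see the trade-off described in the reply above (a much cheaper draft forward pass can outweigh a lower acceptance rate) is the standard expected-speedup estimate for speculative decoding. The sketch below is not taken from the thread or the paper: it assumes each drafted token is accepted independently with probability `alpha`, and the function name, parameter names, and all numeric values are hypothetical, chosen only for illustration.

```python
def expected_speedup(alpha: float, cost_ratio: float, gamma: int) -> float:
    """Estimated wall-clock speedup of speculative decoding over plain decoding.

    alpha:      probability that a drafted token is accepted (assumed i.i.d.)
    cost_ratio: draft forward-pass time divided by target forward-pass time
    gamma:      number of tokens drafted per verification step
    """
    if alpha >= 1.0:
        # Every drafted token is accepted, plus one token from the target model.
        tokens_per_step = gamma + 1
    else:
        # Expected tokens produced per verification step (geometric series).
        tokens_per_step = (1 - alpha ** (gamma + 1)) / (1 - alpha)
    # One step costs gamma draft passes plus one target pass,
    # measured in units of a single target forward pass.
    step_cost = gamma * cost_ratio + 1
    return tokens_per_step / step_cost


# Hypothetical numbers, NOT measurements from the thread:
# a neural draft model with higher acceptance but a costly forward pass,
# versus an n-gram draft with lower acceptance and a near-free forward pass.
neural_draft = expected_speedup(alpha=0.6, cost_ratio=0.2, gamma=4)
ngram_draft = expected_speedup(alpha=0.4, cost_ratio=0.002, gamma=4)
print(f"neural draft: {neural_draft:.2f}x, n-gram draft: {ngram_draft:.2f}x")
```

Under these made-up inputs the n-gram draft comes out ahead despite the lower acceptance rate, because the `gamma * cost_ratio` term in the denominator nearly vanishes. With different inputs the ordering can flip, which fits the caveat elsewhere in the thread that results may vary with hardware.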

michael @_michaelginn

16 days ago

@gabriberton Apple does for some of the on-device tiny models. I also actually just finished a paper (arxiv soon) showing that ngram models consistently work better for rare languages

2

1

2

230

michael @_michaelginn

17 days ago

@Sun_of_AZ @ml_pacheco_ - guy who can’t spell “lose” https://t.co/DRYnfwJ96g

Z @Sun_of_AZ

18 days ago

Most people against this are foreigners who are already here mad there about to loose their jobs to Americans. This will make it more appealing to hire American citizens and American college grads. The Americans against it are cultural suicidal drags

0

1

0

80

0

10

michael @_michaelginn

21 days ago

@ironcrakka Wrong, they also watch anime

1

0

186

michael @_michaelginn

22 days ago

@Ewaakaa @curiouswavefn So you admitted it was written in a rush, which would completely support the claim that it’s poorly written. Good job!

0

1

0

70
