Gurvan @gurvanson - Twitter Profile

Gurvan @gurvanson

8 months ago

@jsuarez what do you think about MinGRU for RL ?

1

0

52

Gurvan @gurvanson

9 months ago

@yacineMTB thank you

0

106

Gurvan @gurvanson

10 months ago

@jeremyphoward @ggerganov I understand that the amount of memory is a bottleneck on consumer GPUs, but wouldn't the inference speed still be better with fewer active parameters during generation?

1

0

1

233

Gurvan @gurvanson

11 months ago

@kalomaze if the issue is stable ssh try mosh maybe?

0

34

Gurvan @gurvanson

11 months ago

@dearmadisonblue is there a normal range of behavior for an egregore?

1

0

25

Gurvan @gurvanson

about 1 year ago

@SSBM_Arte I think it happened to me once when I had pasted an image that wouldn't properly upload, but refreshing the page fixed it iirc. It could also be that a previous message is open for editing i guess

0

1

0

88

Gurvan @gurvanson

over 1 year ago

@kalomaze you should check phillip if you haven't https://t.co/n2CvMAOMry

0

1

0

366

Gurvan @gurvanson

over 1 year ago

@qtnx_ talking about it tricks you into thinking you've already done something. don't fall for it

0

50

Gurvan @gurvanson

over 1 year ago

@jaxmorphy i mean, 3 denoising steps is not a lot. you can still see the stage outline really well. do you plan on using rolling diffusion/diffusion forcing?

1

0

32

Gurvan @gurvanson

over 1 year ago

@jaxmorphy that's the world model i want to see

0

1

0

41

Gurvan @gurvanson

over 1 year ago

@y0b1byte i think the DreamerV3 paper mentions that it uses the same set of hyperparameters for every experiment, so the comparison might not be entirely fair

0

36

Gurvan @gurvanson

over 1 year ago

@iScienceLuvr already done by Schmidhuber in Recurrent Highway Networks https://t.co/3d27pWYCXD better illustrated here https://t.co/FAQNIKBG5W

0

18

1

12

1K

Gurvan @gurvanson

over 1 year ago

@spikedoanz @filipviz normalize by average won't work with negative logits. you could offset everything by the smallest logit maybe, and it would also get you the translation invariance property of softmax

0

2

0

101
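
A minimal sketch of the point made in the reply above about negative logits. The exact scheme proposed in the thread is not shown here, so this assumes "normalize by average" means dividing each logit by their total (dividing by the mean differs only by a constant factor). The function names `sum_normalize` and `min_offset_normalize` are hypothetical and used only for illustration.

```python
def sum_normalize(logits):
    """Divide each logit by the total (assumed reading of 'normalize by average')."""
    total = sum(logits)
    return [x / total for x in logits]


def min_offset_normalize(logits):
    """Shift every logit by the smallest one, then divide by the new total."""
    lo = min(logits)
    shifted = [x - lo for x in logits]
    total = sum(shifted)
    if total == 0:
        # All logits are equal: fall back to a uniform distribution.
        return [1.0 / len(logits)] * len(logits)
    return [x / total for x in shifted]


logits = [2.0, -1.0, 0.5]

# Without the offset, a negative logit yields a negative "probability".
naive = sum_normalize(logits)

# With the offset, every entry is non-negative and the entries sum to 1.
shifted = min_offset_normalize(logits)

# Adding a constant to every logit leaves the result unchanged,
# which is the translation invariance that softmax also has.
translated = min_offset_normalize([x + 10.0 for x in logits])
```

One difference from softmax under this sketch: the smallest logit always ends up with exactly zero probability.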

Gurvan @gurvanson

over 1 year ago

@spikedoanz still not idempotent tho

0

5

0

474

Gurvan @gurvanson

over 1 year ago

@rami_mmo that's good to know, thank you for answering!

0

1

0

48

Gurvan @gurvanson

over 1 year ago

@rami_mmo this seems contrary to what's commonly thought about quantized latents, and about why VQVAE was made in the first place. do you think KL works better here because the minecraft scenery is not that diverse? (i think you allude to this in the article)

1

0

76

Gurvan @gurvanson

over 1 year ago

@torchcompiled @giffmana I think the issue here is that for Transformer they plot the cumulative training time (124M+354M+757M+1.4B), instead of comparing to just the 1.4B trained from scratch, which seems to take about the same amount of TPU hours as the Tokenformer 1.4B, so the graph seems disingenuous

1

2

0

202
