EIFY @EIFY - Twitter Profile

Pinned Tweet

EIFY @EIFY

about 12 years ago

How did increased regulation of childhood affect social and geographical mobility?

EIFY @EIFY

2 days ago

@ArtsyMarx1st Article is paywalled but efficacy against all infection (symptomatic or not) is reported to be ~35% (21.5% -> 14.0%) https://t.co/zF7JAaL6ho

Hiroshi Yasuda (保田浩志)

@Yash25571056

20 days ago

"An antiviral pill has, for the first time, been shown to prevent COVID-19 in people exposed to the SARS-CoV-2 virus at home.. The drug, called ensitrelvir, is made by the Japanese pharmaceutical company Shionogi.. In an international study of more than 2,000 household contacts conducted from June 2023 to September 2024, about 9% of people who got a placebo within 72 hours of a housemate developing symptoms became symptomatic themselves, compared with only about 3% of those who got a five-day course of ensitrelvir. Rates of viral transmission were lower in the ensitrelvir group, too: confirmed infections, symptomatic or not, turned up in only 14.0% of those who received the drug, compared with 21.5% of those who got a placebo.." Not bad. 'At last, a pill that can prevent COVID after exposure to infected people' https://t.co/H6GfHPAvLy
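
The ~35% figure is the relative reduction in the confirmed-infection rate: 1 - 14.0/21.5 ≈ 0.349. Below is a minimal Python sketch of that arithmetic. It also applies the same formula to the rounded symptomatic rates in the quote. `relative_reduction` is a hypothetical helper name and does not come from the study.

```python
def relative_reduction(placebo_rate: float, treated_rate: float) -> float:
    """Fraction of the placebo-arm rate that is avoided in the treated arm (1 - risk ratio)."""
    return 1.0 - treated_rate / placebo_rate

# Confirmed infections, symptomatic or not, in percent of household contacts.
all_infection = relative_reduction(21.5, 14.0)
# Symptomatic infections; the quoted "about 9%" and "about 3%" are rounded, so this is rough.
symptomatic = relative_reduction(9.0, 3.0)

print(f"all infection: {all_infection:.1%}")  # all infection: 34.9%
print(f"symptomatic:   {symptomatic:.1%}")    # symptomatic:   66.7%
```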

EIFY @EIFY

2 days ago

@Nepsuka Choice of this photo generated quite a bit of buzz back then https://t.co/wpcrgNgWDe

EIFY @EIFY

3 days ago

@eliebakouch Hmm, isn't DeepSeek v4 Flash the cost-efficiency frontier model? It's curiously missing from the graph... https://t.co/j7EZUjliU0

kalomaze

@kalomaze

5 days ago

cc @teortaxesTex it's a fairly simple probe reverse engineered from my personal agent grievances... but MAN this chart is so funny [relevant task axis: "model's ability to realize when a local change requires exacting multihop global changes"] https://t.co/8BG5VgcKfk
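
Being "the cost-efficiency frontier model" means that no other model is both cheaper and better. Below is a minimal sketch that extracts such a frontier from (cost, score) pairs. `Point`, `pareto_frontier`, and the model names and numbers in the usage example are all hypothetical placeholders. None of them are values from the chart.

```python
from typing import NamedTuple

class Point(NamedTuple):
    name: str
    cost: float   # lower is better, e.g. price per task
    score: float  # higher is better, e.g. benchmark accuracy

def pareto_frontier(points: list[Point]) -> list[Point]:
    """Points that no other point beats on both cost and score (the cost-efficiency frontier)."""
    frontier: list[Point] = []
    best_score = float("-inf")
    # Sweep from cheapest to most expensive; among equal costs, try the best score first.
    for p in sorted(points, key=lambda p: (p.cost, -p.score)):
        if p.score > best_score:  # strictly better than everything cheaper, so not dominated
            frontier.append(p)
            best_score = p.score
    return frontier

# Hypothetical placeholder models, not data from the chart.
models = [
    Point("model_a", cost=1.0, score=0.60),
    Point("model_b", cost=2.0, score=0.55),
    Point("model_c", cost=3.0, score=0.70),
    Point("model_d", cost=0.5, score=0.40),
]
print([p.name for p in pareto_frontier(models)])  # ['model_d', 'model_a', 'model_c']
```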

EIFY @EIFY

4 days ago

@kalomaze Parallax still works with AdamW though and in fact beats attention with the right LR schedule, just not significantly. I wonder why something similar hasn't been reported for Shampoo and whether it's due to less adoption or people who know can't speak. https://t.co/obSnH0HG53

Tilde

@tilderesearch

5 days ago

~6/7~ Crucially, we find Muon counterfactually amplifies the advantage of Parallax. The strength of Parallax depends heavily on the norm and alignment of the probe and the KV covariance, which is very sensitive to choice of optimizer. To our knowledge, this is the first clear case of explicit architecture–optimizer codesign for attention mechanisms.

EIFY @EIFY

4 days ago

@Creative_Math_ I don't think there is a definite answer yet. One camp believes that 1 - β should match the frequency of the next feature the model should learn, and therefore needs to decrease over time for LLMs (log-time momentum): https://t.co/kTtjgdNyRf

Damien Ferbach @damien_ferbach

3 months ago

3/10 Why log-time schedules? AdamW's fixed β₁, β₂, λ create a fixed memory horizon. But language has a power-law structure (Hilberg, Zipf): informative events can be Θ(T) steps apart. The longer you train, the worse the mismatch. Log-time schedules let memory grow with time. https://t.co/kYd5e6u5QA
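
You can make the "fixed memory horizon" point concrete by measuring how far back an EMA's weight reaches. Below is a minimal NumPy sketch. It assumes the EMA form m_t = β_t·m_{t-1} + (1 - β_t)·g_t. The schedule 1 - β_t = min(1, c/t) is a hypothetical stand-in for "memory that grows with time". It is not necessarily the exact log-time schedule in the linked thread.

```python
import numpy as np

def ema_weights(betas: np.ndarray) -> np.ndarray:
    """Weight each gradient g_s carries in m_T for m_t = b_t * m_{t-1} + (1 - b_t) * g_t, m_0 = 0."""
    w = np.empty(len(betas))
    carry = 1.0  # product of the betas applied after step s
    for s in range(len(betas) - 1, -1, -1):  # walk back from the newest step
        w[s] = carry * (1.0 - betas[s])
        carry *= betas[s]
    return w

def mean_age(betas: np.ndarray) -> float:
    """Weighted average age (in steps) of the gradients the EMA remembers; newest step has age 0."""
    w = ema_weights(betas)
    ages = np.arange(len(betas))[::-1]
    return float(np.sum(w * ages) / np.sum(w))

def inverse_time_betas(T: int, c: float = 1.0) -> np.ndarray:
    """Hypothetical schedule with 1 - beta_t = min(1, c / t), so memory grows with t."""
    t = np.arange(1, T + 1)
    return 1.0 - np.minimum(1.0, c / t)

for T in (100, 1000):
    fixed = mean_age(np.full(T, 0.9))
    growing = mean_age(inverse_time_betas(T))
    print(f"T={T}: fixed beta=0.9 -> {fixed:.1f} steps, 1 - beta_t = 1/t -> {growing:.1f} steps")
```

With a fixed β = 0.9, the average age of remembered gradients stays near β/(1 - β) = 9 steps no matter how long you train. The 1/t schedule with c = 1 is a plain running mean, so its average age is (T - 1)/2 and grows with T.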

EIFY @EIFY

4 days ago

@Sauers_ The bizarre twist is that floating point numbers can also represent ±inf…
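
For reference, IEEE 754 floats have dedicated encodings for ±inf, and ordinary arithmetic can produce them. Below is a small illustration that uses only the Python standard library.

```python
import math
import struct

pos_inf = float("inf")
neg_inf = -pos_inf

print(1e308 * 10)         # overflow rounds to +inf instead of raising
print(-1e308 * 10)        # ... and to -inf on the negative side
print(pos_inf - pos_inf)  # inf - inf has no sensible value, so the result is nan
print(math.isinf(neg_inf))

# IEEE 754 double: an all-ones exponent with a zero mantissa encodes infinity; the sign bit picks + or -.
print(struct.pack(">d", pos_inf).hex())  # 7ff0000000000000
print(struct.pack(">d", neg_inf).hex())  # fff0000000000000
```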

EIFY @EIFY

5 days ago

@boeslab Is this the brain region where GLP-1 exerts its effects?

EIFY @EIFY

5 days ago

@TimDarcet The Deep Learning Framework of Theseus?

EIFY @EIFY

5 days ago

@keshigeyan Yes I have seen that, but if I were to use GPIC for a project right now I would find the original title and description of the image on Flickr (say) useful.

EIFY @EIFY

6 days ago

@CarlZha Recapture Fort Zeelandia! (Not really)

EIFY @EIFY

7 days ago

@xiangjinrhfg @americanmcgee Don't forget the Taiwanese regime

EIFY @EIFY

7 days ago

@rosinality I will take a closer look later but shouldn't they compare to the existing Nesterov option of Muon...?
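
Some context on the Nesterov option, based on our understanding of the public Muon reference code. Muon keeps an SGD momentum buffer and can optionally apply a Nesterov look-ahead. It then orthogonalizes the update with a quintic Newton-Schulz iteration. Below is a NumPy sketch built on that assumption. `newton_schulz5` and `muon_step` are illustrative names. The default `lr`, `beta`, and shape scaling are one common choice, not a definitive configuration.

```python
import numpy as np

def newton_schulz5(G: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Push G toward U @ V.T (all singular values roughly 1) with a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315   # quintic coefficients as used in the reference Muon code
    X = G / (np.linalg.norm(G) + eps)   # Frobenius normalization puts singular values in [0, 1]
    transposed = X.shape[0] > X.shape[1]
    if transposed:                      # iterate on the wide orientation so X @ X.T is small
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X               # maps each singular value s to a*s + b*s**3 + c*s**5
    return X.T if transposed else X

def muon_step(W, grad, buf, lr=0.02, beta=0.95, nesterov=True):
    """One Muon-style update of a 2-D weight matrix; returns (new_W, new_buf)."""
    buf = beta * buf + grad                             # heavy-ball momentum buffer
    direction = grad + beta * buf if nesterov else buf  # Nesterov: look one momentum step ahead
    update = newton_schulz5(direction)
    scale = max(1.0, W.shape[0] / W.shape[1]) ** 0.5    # shape-dependent scaling (one common choice)
    return W - lr * scale * update, buf
```

On the very first step, with a zero buffer, the two options give essentially the same update. This is because `newton_schulz5` normalizes away the extra (1 + β) factor. The options only diverge from the second step onward.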

EIFY @EIFY

8 days ago

@willccbb To me at least, by the time Jeremy Bernstein posted on the Thinky blog there was already a body of work on it by himself, Cesista, and Su (e.g. https://t.co/qWS8vLRQ0w), so I didn't pay much additional attention.

EIFY @EIFY

10 days ago

@guilhermeotina @cloneofsimo Evergreen xkcd: https://t.co/88dVUVb7fB

EIFY @EIFY

12 days ago

@konstmish Hold on, ScheduleFree+ still needs warm-up: "(...) a decreasing step size is not necessary with Schedule-Free Learning, however a learning rate warmup is still needed for best performance". With warmup, C-warmup, and annealing β it's ironically scheduling many variables 😅
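
Here is the "still needs warm-up" point made concrete. In Schedule-Free SGD (Defazio et al.), the step size is held constant instead of decayed. The returned weights are a running average of the base SGD iterates, and gradients are evaluated at an interpolation between the two. Below is a minimal NumPy sketch of that original method with a linear warm-up added. It is not ScheduleFree+. The lr² averaging weight follows our reading of the reference code, not anything in the quoted thread.

```python
import numpy as np

def schedule_free_sgd(grad_fn, w0, lr=0.1, beta=0.9, warmup_steps=10, steps=200):
    """Sketch of Schedule-Free SGD with linear warm-up; returns the averaged iterate x."""
    z = w0.copy()  # base SGD iterate
    x = w0.copy()  # weighted average of the z sequence (the weights you would evaluate)
    weight_sum = 0.0
    for t in range(steps):
        lr_t = lr * min(1.0, (t + 1) / warmup_steps)  # linear warm-up, then constant (no decay)
        y = (1.0 - beta) * z + beta * x               # gradient is taken at an interpolation
        z = z - lr_t * grad_fn(y)
        weight = lr_t ** 2                            # warm-up steps count less in the average
        weight_sum += weight
        c = weight / weight_sum
        x = (1.0 - c) * x + c * z                     # running weighted average of z
    return x

# Usage on a toy quadratic f(w) = 0.5 * w @ A @ w (illustrative only).
A = np.diag([1.0, 10.0])
x_final = schedule_free_sgd(lambda w: A @ w, np.array([1.0, 1.0]))
```

Only the decay schedule is gone. `warmup_steps` and `beta` are still hyperparameters you have to choose.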

EIFY @EIFY

13 days ago

@norpadon You mean eps is the difference between 1.0 and the next representable number, right?
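
Here is a quick check of that definition. For float64, machine epsilon is the gap between 1.0 and the next representable double, which is 2^-52. The illustration below uses the standard library plus NumPy.

```python
import math
import sys

import numpy as np

eps = math.nextafter(1.0, 2.0) - 1.0          # gap between 1.0 and the next double above it
print(eps == sys.float_info.epsilon)          # True
print(eps == np.finfo(np.float64).eps)        # True
print(eps == 2.0 ** -52)                      # True

# The gap just below 1.0 is half as large, because 1.0 sits on a power-of-two boundary.
print(1.0 - math.nextafter(1.0, 0.0) == 2.0 ** -53)  # True

# Same idea for float32: 23 mantissa bits give eps = 2**-23.
print(np.finfo(np.float32).eps == 2.0 ** -23)        # True
```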

EIFY @EIFY

13 days ago

@cowtung @prerat Frightening indeed

EIFY @EIFY

13 days ago

@pleometric “GPT's Frightening Construction” This should be the official name

EIFY @EIFY

15 days ago

@kankei_arahen @gbrl_dick That’s revisionist. The Taiwanese view of Japan’s colonization only turned positive after (1) the living memory died with the elder generations, and (2) the DPP changed the textbooks to whitewash the history. In fact, their textbooks don’t even call it colonization. It’s now just 日治 ("Japanese rule").
