K. Akimoto @kosuke1701 - Twitter Profile

Pinned Tweet

11 days ago

We just posted our paper: “M^3 Scaling Law: Optimizing Multi-Epoch, Multi-Lingual, and Multi-Stage Training for Low-Resource Language Models.” Joint work with @kanaheinousagi and @stillpedant. In this thread, I’ll explain the main idea and key findings. (1/N) https://t.co/5ysvWV6zS7


K. Akimoto @kosuke1701

2 days ago

No matter how smart a model you're handed, there's nothing you can do without the money and GPUs to run experiments.


K. Akimoto @kosuke1701

2 days ago

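For the time being, the bottleneck will be lab resources for running experiments rather than intelligence (in fact, that has recently become true for me), so I get the feeling that uncontrollable autonomous evolution of models could only come from hyperscalers, not from the kind of community people worry about. Misuse of existing models themselves is of course a separate matter.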


K. Akimoto @kosuke1701

11 days ago

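We propose a new "M^3 Scaling Law" that models the effect of multi-stage training in addition to the effects of multi-epoch and multilingual training! 📈 We also made an interesting observation: the effect of multi-stage training can be modeled well by a dual power law that depends "only" on (1) the average language proportion and (2) the final language proportion!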


K. Akimoto @kosuke1701

11 days ago

Overall, M^3 adds multi-stage training to scaling-law recipe design. Beyond “how often should we repeat target data?” or “how much high-resource data should we mix?”, it additionally asks whether a staged recipe should be used or not! (11/N)


K. Akimoto @kosuke1701

11 days ago

M^3 gives a scaling-law explanation for why late target-heavy stages can be effective in continued pretraining and mid-training. It predicts when this should be preferred under fixed compute and target-data budgets. (10/N)


K. Akimoto @kosuke1701

14 days ago

New juniors keep joining, but structurally the number of seniors can only go down.


K. Akimoto @kosuke1701

14 days ago

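When someone who was my senior leaves the company, it has a bigger psychological impact than when peers or younger colleagues leave, in the sense that it changes the "workplace atmosphere" for me. I suppose the people who were around me when I joined have, one way or another, become part of my roots.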


kosuke1701 retweeted

mooz @stillpedant

19 days ago

We’ll demo cotomi Act on May 27—come say hi! https://t.co/5ZO5LReS2k cotomi Act is a web browsing copilot built from two ingredients: (1) a carefully designed, context-efficient browser harness (2) a brand-new “big sibling” that watches your daily work and learns from it


K. Akimoto @kosuke1701

about 1 month ago

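Sometimes an LLM gets smarter, overtakes me in some area, and I lose confidence. But if I think of it as a team of the LLM plus me, the parts of my skill stack that were dragging me down get raised one after another. So I've started to think that, until I'm overtaken in the ability where I shine most, the quality of what I can do myself actually goes up and my confidence may even deepen.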


K. Akimoto @kosuke1701

about 2 months ago

Just as I had used up Codex's $100 weekly limit and was planning to take a day off, the limit got reset, and I suspected a bug.
