We just posted our paper: “M^3 Scaling Law: Optimizing Multi-Epoch, Multi-Lingual, and Multi-Stage Training for Low-Resource Language Models.”
Joint work with @kanaheinousagi and @stillpedant.
In this thread, I’ll explain the main idea and key findings. (1/N)
Overall, M^3 brings multi-stage training into scaling-law-based recipe design.
Beyond “how often should we repeat target data?” and “how much high-resource data should we mix in?”, it also asks whether to use a staged recipe at all! (11/N)
M^3 gives a scaling-law explanation for why late target-heavy stages can be effective in continued pretraining and mid-training.
It also predicts when such a staged recipe should be preferred under fixed compute and target-data budgets. (10/N)
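The thread does not give the M^3 functional form, so the sketch below is plain bookkeeping, not the paper's scaling law. It assumes a recipe can be described as a list of stages, each with a token count and a share of tokens drawn from the low-resource target language, and computes how many epochs over the unique target data that recipe implies under fixed budgets. `Stage`, `target_epochs`, and all budget numbers are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Stage:
    tokens: float           # training tokens spent in this stage (hypothetical unit: raw tokens)
    target_fraction: float  # share of this stage's tokens drawn from the target language


def target_epochs(stages, unique_target_tokens):
    """Passes over the unique target-language data implied by a (possibly staged) recipe."""
    for s in stages:
        if not 0.0 <= s.target_fraction <= 1.0:
            raise ValueError("target_fraction must be in [0, 1]")
    # Total target-language tokens consumed across all stages.
    target_tokens_seen = math.fsum(s.tokens * s.target_fraction for s in stages)
    return target_tokens_seen / unique_target_tokens


# Hypothetical budgets: 100B training tokens in total, 2B unique target-language tokens.
UNIQUE_TARGET = 2e9
single = [Stage(tokens=100e9, target_fraction=0.1)]
staged = [
    Stage(tokens=80e9, target_fraction=0.025),  # early stage: mostly high-resource data
    Stage(tokens=20e9, target_fraction=0.4),    # late stage: target-heavy
]

print(target_epochs(single, UNIQUE_TARGET))
print(target_epochs(staged, UNIQUE_TARGET))
```

With these made-up budgets, the single-stage mix and the late target-heavy two-stage mix both repeat the target data about 5 times. The staging choice therefore changes when the repeats happen rather than how many there are. Per the thread, predicting which ordering is preferable is what the M^3 law is meant to address.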
We’ll demo cotomi Act on May 27. Come say hi!
https://t.co/5ZO5LReS2k
cotomi Act is a web browsing copilot built from two ingredients:
(1) a carefully designed, context-efficient browser harness
(2) a brand-new “big sibling” that watches your daily work and learns from it