microsoft MAI tech report is a gold mine, one of the most transparent for a model at this scale.
this model uses zero synthetic data or distillation from previous models. this means reasoning, agentic behavior, and tool use are all learned entirely during post-training with no cold start. bold choice that makes it harder and requires more iterations to reach sota, but you get FULL control over your model series and it proves they are serious about being a frontier lab.
the tech report is insanely detailed and precise about numbers. for example, they give the exact MFU across all iterations of the model, along with the exact changes made at each step. they also share the full scaling ladder recipe; to my knowledge, this is the first time i've seen this in a tech report at this scale.
let's look at all of this in this likely very long thread 🧵
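quick aside on MFU (model FLOPs utilization), since the report tracks it across iterations: it's the model FLOPs you actually achieve per second divided by the hardware's peak FLOPs per second. here's a minimal sketch, assuming a dense transformer and the common ~6·N FLOPs-per-training-token rule of thumb (N = parameter count, attention FLOPs ignored). the function name and all numbers are hypothetical, not taken from the MAI report:

```python
def training_mfu(n_params: float, tokens_per_sec: float,
                 peak_flops_per_sec: float) -> float:
    """Approximate model FLOPs utilization for dense transformer training.

    Assumes ~6 * N FLOPs per training token (about 2N forward + 4N backward)
    and ignores attention FLOPs, so it slightly underestimates model FLOPs.
    """
    # FLOPs the model actually performs per second at the observed throughput
    achieved_flops_per_sec = 6.0 * n_params * tokens_per_sec
    # fraction of the accelerator's theoretical peak that this represents
    return achieved_flops_per_sec / peak_flops_per_sec


# hypothetical numbers, NOT from the MAI report: a 70B-parameter model
# processing 1,000 tokens/s per accelerator with a 989 TFLOP/s nominal peak
mfu = training_mfu(n_params=70e9, tokens_per_sec=1_000, peak_flops_per_sec=989e12)
print(f"MFU = {mfu:.1%}")
```

the 6N shortcut undercounts at long context, since attention FLOPs grow with sequence length, so treat this as a rough estimate rather than how any particular lab computes it.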
@butchseiya i'm glad you pointed this out, because the vague descriptions i'd heard about it were making me not want to read it lol. but now i probably will
@Cayden_Cline i'm not korean, but i would assume it's usually perceived as closer to ㄷ than ㄸ. korean stops aren’t really distinguished by voicing the way english ones are, so english /d/ tends to map to the lenis rather than the tense segment
@_neilarmstrong @SanSip i was surprised to see mansfield park at 56, and then wouldn't you know it, three more jane austens taking up spots in the top 20. and 5 virginia woolfs?
Eulogikon is a new Ancient Greek web library currently under construction that has more obscure works in its catalog than most of the competition. You can find various scholia there and things like the fragments of the Phoenician History by Philon of Byblos.
https://t.co/c7P6joFNoS
Tying economic opportunity to dominant languages is part of the mechanism for assimilating minority languages. When people can't find work using their mother tongue, they often pass the dominant language on to their kids so the kids will have better job prospects.