I think this is mostly just a Chinese labs + Nvidia skill issue, and the more inference demand you expect for your model, the more it justifies overtraining in pretraining, so frontier US models are probably much more ‘overtrained’. They're also likely much more mature in their synthetic pretraining data pipelines. But yeah, assuming they keep models around for post-training for half a year to a year, it just seems kind of crazy to me that they would pretrain them for only a few days!
This makes sense, thanks for the answer. I guess I’m forgetting they have a crapton of compute. I think it’s plausible they have 150T unique tokens, and 2-3 epochs gets you to ~400T training tokens, which would get to a month+ training run. But maybe I’m making up numbers that match my intuition, not sure. It just seems surprising that they couldn’t make use of significantly more compute time than that for pretraining, given its apparent importance.
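For anyone who wants to sanity-check the napkin math above, here is a rough sketch. Only the 150T unique tokens and the 2-3 epochs come from the post. The active parameter count, GPU count, per-GPU peak FLOP/s, and utilization are made-up placeholders, and the 6 x params x tokens rule is just the usual approximation for training FLOPs, so treat the resulting day count as illustrative, not as a claim about any real training run.

```python
SECONDS_PER_DAY = 86_400


def training_tokens(unique_tokens: int, epochs: int) -> int:
    """Total tokens seen in training when the unique data is repeated `epochs` times."""
    return unique_tokens * epochs


def training_days(tokens: float, active_params: float,
                  cluster_peak_flops: float, utilization: float) -> float:
    """Rough wall-clock estimate using the common ~6 * params * tokens FLOPs approximation."""
    total_flops = 6 * active_params * tokens
    effective_flops_per_second = cluster_peak_flops * utilization
    return total_flops / effective_flops_per_second / SECONDS_PER_DAY


# From the post: 150T unique tokens, 2-3 epochs.
UNIQUE_TOKENS = 150 * 10**12
low = training_tokens(UNIQUE_TOKENS, 2)   # 300T
high = training_tokens(UNIQUE_TOKENS, 3)  # 450T

# Everything below is a hypothetical assumption, NOT from the post.
ACTIVE_PARAMS = 100e9          # assumed active parameters per token
CLUSTER_PEAK = 50_000 * 1e15   # assumed 50k GPUs at 1e15 peak FLOP/s each
UTILIZATION = 0.4              # assumed fraction of peak actually achieved

if __name__ == "__main__":
    print(f"training tokens: {low / 1e12:.0f}T to {high / 1e12:.0f}T")
    days = training_days(400e12, ACTIVE_PARAMS, CLUSTER_PEAK, UTILIZATION)
    print(f"~{days:.0f} days for 400T tokens under these assumptions")
```

The token range (300T to 450T) brackets the ~400T figure. The day count swings a lot with the assumed model size and cluster size, which is kind of the point: the "month+" intuition only holds for some combinations of those.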
I think it's underappreciated how economically valuable AI safety is. A model that frequently goes off the rails, takes dangerous actions, is misleading or deceptive, etc. is simply much less valuable than a model that does not do that.
This benchmark is great! A model that I like scores highly and a model that I dislike scores poorly.
This benchmark is slop! A model that I dislike is at the top of the rankings. How can that be possible? I have taste!
@fleetingbits @ar0cket1 Probably they realized it led to a higher price / lower usage than they actually needed to charge to maintain margins per unit of compute, so changed it (beyond possible inference efficiency gains between then and now)
@ValsTutor following ECI trends (my own implementation, slightly different results from Epoch's), you get to mythos preview level open weights in 10ish months, bc mythos preview is a pretty big outlier in its capability level
@ar0cket1 they force fast mode to be API usage! So they decrease the API margin but maintain gross margin by decreasing the fraction of tokens that are heavily subsidized
Interesting how different frontier labs' notions of progress towards AGI are.
OpenAI: "we've disproven an old conjecture in math"
Anthropic: "we've discovered ALL the vulnerabilities"
DeepSeek: "we've made context free"
Google DeepMind: "we've reduced the batch size for Flash"