@IterIntellectus Some aspects are quite uncontroversial & universal although often fuzzy, e.g. "lying is usually bad", "being helpful tends to be good", "don't eradicate humanity", etc. - so I think that at least some progress is possible
@che_shr_cat Interesting method! Just wondering - are there also cases where it's strictly worse than standard AdamW? Or is it always at least similarly effective?
@ebarenholtz If the changes the layers introduce in the residual stream meaningfully correspond to geometric operations in the manifolds visualized in these fancy animations, then "they think in shapes" doesn't really sound like the worst description, does it?
@mariusmosbach @johnhewtt My first idea was to add this as an extra regularization loss, so I was pleasantly surprised to see that it apparently already learns this implicitly
@bojie_li Apparently only a ~1% improvement could be achieved by increasing the question catalogue size, so it already seems good.
Nice that the data is public!
@bojie_li Awesome work! Did you check how size prediction accuracy scales with benchmark size? Would it make sense to double the number of questions to get better predictions, or is it saturated already?
@sytelus > Con: we will never know what models are thinking.
I guess it would be relatively easy to build a translator. It won't be perfect, but even today's CoTs aren't necessarily 100% faithful.
@VictorTaelin Nice! It would probably be helpful if it included more hard problems, because if current models already score ~90%, distinguishing future models seems noisy