Why what you attend to can't be static:
https://t.co/4R5A5ljyah
Transformers can't change what they attend to after training. Backprop is too global and too destructive for continual learning. The brain doesn't work this way. 1/2
@GaoShanghua the two fixes that i think would be great: a falsification layer before promotion (anti-collapse metric + mandatory ablation) and a quality-diversity archive instead of one global champion
@GaoShanghua after having used it, this still feels like hill-climbing in a multi-team costume: there's one global champion, a one-comment critique gate, serial gpu, etc. also, there's no falsification layer, so it tends to game metrics
@beffjezos you should check out @Primer - i really respect the mission & founders of this company. I had a fantastic convo with Robert (their head of engineering) a bit ago
@beffjezos i first heard of atomic labs and Sam all the way in Munich from @moritzthuening, the work they're doing is amazing and Sam is incredibly cracked - unfortunate but makes sense that they're moving to Texas
@punit_arani i think this is really focused on directed execution toward pre-formed goals - but the real distinguishing variable is something earlier: your capacity to attend to things outside of distribution in the first place
@lukas_bongartz @misovalko reality -> we observe messy data π -> we try decompositions μ -> we score them by preferring the simplest explanatory decompositions, and the best-scoring one is our understanding of the world
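
a rough sketch of that scoring loop, under loud assumptions: the "decompositions" here are just piecewise-constant splits of a 1-D series into k equal chunks, and the score is a BIC-style trade-off between fit and number of parts. the names `segment_means`, `score` and `best_decomposition` are made up for illustration, and this is only one possible reading of "simplest explanatory decomposition", not a claim about how it should be done.

```python
import math

def segment_means(data, k):
    """Hypothetical decomposition mu: k contiguous, near-equal chunks,
    each replaced by its mean (a piecewise-constant reconstruction)."""
    n = len(data)
    recon = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        chunk = data[lo:hi]
        mean = sum(chunk) / len(chunk)
        recon.extend([mean] * len(chunk))
    return recon

def score(data, k):
    """BIC-style score (assumed here, lower is better):
    a fit term plus a penalty that grows with the number of parts."""
    n = len(data)
    recon = segment_means(data, k)
    rss = sum((x - r) ** 2 for x, r in zip(data, recon))
    # small constant guards against log(0) on a perfect fit
    return n * math.log(rss / n + 1e-12) + k * math.log(n)

def best_decomposition(data, candidates):
    """Pick the candidate k with the lowest score."""
    return min(candidates, key=lambda k: score(data, k))

# messy observations pi: two plateaus with a small deterministic wobble
data = (
    [1.0 + 0.1 * ((i * 7) % 5 - 2) for i in range(20)]
    + [5.0 + 0.1 * ((i * 3) % 5 - 2) for i in range(20)]
)

best = best_decomposition(data, [1, 2, 4, 8])
print(best)
```

on this toy series the splits into 2, 4 and 8 parts all fit equally well, so the complexity penalty is what separates them.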