I cannot overstate how crowded "ai research" has gotten and how low quality the published material is becoming as a result. Nearly every week now I am seeing throwaway experiments my lab looked into 1 or 2 years ago being dressed up as "research" and posted to arXiv.
@DeepDishEnjoyer The OP is interaction bait and incorrect, but I don’t think the universal approximation theorem actually explains much at all about why current deep learning works well.
@rishabh16_ Depends on the model. If you have a DNA/RNA language model with a causal mask, yes, you can do this. With a bidirectional mask, interpreting attention weights is ~impossible.
great moment in every optimizer’s life when he finally runs the EV calc on running EV calcs on everything, realizes the whole thing has been catastrophically negative EV, deletes the spreadsheet and goes outside
"Attention is just a special case of <abstract math thing> so we generalized it by <neglecting the other 30 abstractions and conditions required for frontier architecture> and we found it performed <p hacking> compared to <naive baseline>"
@Tyler_A_Harper This is like saying universities shouldn’t have computers or microscopes or any one of the countless other tools that make research and learning possible.
When someone says "we need a theory of deep learning", note that probably nothing will "count" as a theory of deep learning unless it is *their* theory, or unless it speaks a language and uses techniques that they already have a bias towards.
@Tyler_A_Harper “Why are they paying for access to the most revolutionary technology of our lifetime? That money should go to paying fat narcissists to stalk UCPD officers and paint graffiti all over Hyde Park”
@mattapplepi @Grnfink2 I don’t know anything about military stuff but even if it isn’t a practical skill I bet this kind of training is good for teaching a mentality that is useful in war