21 CONFIRMED DEATHS
I urge international media to cover today’s events in Nepal. A peaceful, youth-led protest against corruption turned violent when police fired live bullets, killing 21. The government is trying to distort the truth by framing the protest as only about a social media ban.
SOCRATES: So then the beautiful is also the good, and the just as well
CHATGPT: By Zeus, you're right indeed, Socrates. And it says a lot about you that you came up with such a stunning insight. Let's delve into WHY you cooked so hard
Very happy to hear that GANs are getting the test of time award at NeurIPS 2024.
The NeurIPS test of time awards are given to papers which have stood the test of time for a decade.
I took some time to reminisce about how GANs came about and how AI has evolved in the last decade.
The (true) story of development and inspiration behind the "attention" operator, the one in "Attention is All you Need" that introduced the Transformer. From personal email correspondence with the author @DBahdanau ~2 years ago, published here and now (with permission) following some fake news about how it was developed that circulated here over the last few days.
Attention is a brilliant (data-dependent) weighted average operation. It is a form of global pooling, a reduction, communication. It is a way to aggregate relevant information from multiple nodes (tokens, image patches, etc.). It is expressive, powerful, has plenty of parallelism, and is efficiently optimizable. Even the Multilayer Perceptron (MLP) can actually be almost re-written as Attention over data-independent weights (1st layer weights are the queries, 2nd layer weights are the values, the keys are just input, and softmax becomes elementwise, deleting the normalization). TLDR Attention is awesome and a *major* unlock in neural network architecture design.
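To make the "weighted average" framing concrete, here is a minimal sketch in plain Python (no libraries). The function names, the toy vectors, and the choice of ReLU as the elementwise nonlinearity in the MLP analogy are all assumptions for illustration, not taken from any paper or library.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max for numerical stability; the result sums to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Single-query attention: a softmax-weighted average of the value vectors."""
    scores = [dot(query, k) for k in keys]   # data-dependent relevance of each node
    weights = softmax(scores)                # normalize into averaging weights
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

def mlp_as_attention(x, w1_rows, w2_rows):
    """Two-layer MLP (no biases) read as 'attention' over fixed weights.

    Rows of the first layer play the role of queries, the input x is the key,
    rows of the second layer are the values. The softmax is swapped for an
    elementwise ReLU (an assumption here), so nothing is normalized.
    """
    scores = [dot(q, x) for q in w1_rows]
    weights = [max(0.0, s) for s in scores]  # elementwise, unnormalized
    dim = len(w2_rows[0])
    return [sum(w * v[i] for w, v in zip(weights, w2_rows)) for i in range(dim)]

# Toy data (hypothetical): three nodes, two-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [2.0, 0.0]
out, weights = attention(query, keys, values)

mlp_out = mlp_as_attention([1.0, -2.0],
                           [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                           [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

In the toy example the query matches the first and third keys equally well, so those two values get the same (larger) weight and the second value is mostly ignored. The MLP version is the same multiply-weight-sum pattern, just with weights that come from learned parameters and no normalization step.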
It's always been a little surprising to me that the paper "Attention is All You Need" gets ~100X more err ... attention... than the paper that actually introduced Attention ~3 years earlier, by Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: "Neural Machine Translation by Jointly Learning to Align and Translate". As the name suggests, the core contribution of the Attention is All You Need paper that introduced the Transformer neural net is deleting everything *except* Attention, and basically just stacking it in a ResNet with MLPs (which can also be seen as ~attention per the above). But I do think the Transformer paper stands on its own because it adds many additional amazing ideas bundled up all together at once - positional encodings, scaled attention, multi-headed attention, the isotropic simple design, etc. And the Transformer has imo stuck around basically in its 2017 form to this day ~7 years later, with relatively few and minor modifications, maybe with the exception of better positional encoding schemes (RoPE and friends).
Anyway, pasting the full email below, which also hints at why this operation is called "attention" in the first place - it comes from attending to words of a source sentence while emitting the words of the translation in a sequential manner, and was introduced as a term late in the process by Yoshua Bengio in place of RNNSearch (thank god? :D). It's also interesting that the design was inspired by a human cognitive process/strategy, of attending back and forth over some data sequentially. Lastly the story is quite interesting from the perspective of the nature of progress, with similar ideas and formulations "in the air", with particular mention of the work of Alex Graves (NTM) and Jason Weston (Memory Networks) around that time.
Thank you for the story @DBahdanau !
Nice weekend in Paris (first time)...
Eiffel Tower (ET) law: it looks more majestic (more so at night?) than you expect, even when you take into account ET law...
@baibhavbista For some reason got reminded of this essay...
https://t.co/FBUfIGdd2D
I am also afflicted by the lure of consuming information all the time rather than slowing or even pausing to think for myself on all or even some of it...
kids are such great randomness generators...
Friend: *puts on classical ragas*
Friend's kid (4) : "Ehh, halt! Das tut weh. Ich muss kotzen" (stop! It hurts. I will puke)
lol
Randomly just realized/saw, for the first time, that XD is a laughing face rotated i.e. same class as :D and :)
Always just parsed it as X & D and didn't think about why that meant laughing xD