Career update: I’m joining @blocks for the Summer as a DL Research Intern!
I’ll be working on some very cool and novel approaches to generative audio modelling. Would be happy to get to know more people in this space!
Fun theoretical question: You used an optimizer with momentum for a training run
You computed the cosine similarity of momentum with gradient and saw that it's mostly ~0, yet the optimizer with momentum on (β = 0.9 let's say) outperformed the one with momentum off.
Why?
@YipingDeng5 that is true when you have independently sampled vectors. The gradient and the momentum are definitely dependent, and in constant LR phases (this was in warmup) the correlation is actually negative.
@norxornor I didn't realize I logged cossim only in warmup (which u figured out, and yes it's negative in const LR phase). I originally understood it as low bsz -> noisier g_t -> momentum dampens noise, but your perspective is more complete and taught me sth I didn't know.
Thank you!
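For anyone who wants to poke at this themselves, here is a minimal sketch of logging the cosine similarity between the momentum buffer and the incoming gradient. Everything in it is an assumption for illustration: a toy noisy quadratic instead of a real training run, and hypothetical names (`cosine_similarity`, `run_sgd_momentum`). It makes no claim about what sign or size the similarity takes in an actual run.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two flat vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def run_sgd_momentum(steps=200, dim=50, lr=0.05, beta=0.9, noise=1.0, seed=0):
    """SGD with heavy-ball momentum on the toy loss 0.5 * ||w||^2 with noisy gradients.

    Returns the final weights and cos(m_{t-1}, g_t) logged at every step
    where the momentum buffer is non-zero.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    m = np.zeros(dim)
    sims = []
    for _ in range(steps):
        # Gradient of 0.5 * ||w||^2 is w; the added noise stands in for small-batch noise.
        g = w + noise * rng.normal(size=dim)
        if np.any(m):
            # Compare the buffer from previous steps with the fresh gradient.
            sims.append(cosine_similarity(m, g))
        m = beta * m + g      # momentum update
        w = w - lr * m        # parameter update
    return w, sims

_, sims = run_sgd_momentum()
print(f"mean cos(m, g) over {len(sims)} steps: {np.mean(sims):.3f}")
```

Swapping in a warmup or constant `lr` schedule and a different `noise` level is the obvious next experiment.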
A remarkable paper appeared on arXiv tonight by Thomas Bloom, Will Sawin, Carl Schildkraut and Dmitrii Zhelezov. In this paper, they prove that there exists c>0 and arbitrarily large finite sets A of real numbers such that max(|A+A|,|AA|)≤|A|^{2-c}. This disproves the well-known sum-product conjecture over the real numbers. The sum-product conjecture considers the two most basic operations: addition and multiplication. A+A is the set of all pairwise sums of two elements in A while AA is the set of all pairwise products of two elements in A. (1/5)
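To make the definitions of A+A and AA concrete, here is a small sketch that computes both sets for two textbook examples: an arithmetic progression (small sumset) and a geometric progression (small product set). The helper names `sumset` and `productset` are made up for illustration, and the examples are standard ones, not taken from the paper.

```python
def sumset(A):
    """A + A: all pairwise sums a + b with a, b in A."""
    return {a + b for a in A for b in A}

def productset(A):
    """AA: all pairwise products a * b with a, b in A."""
    return {a * b for a in A for b in A}

n = 10
arithmetic = list(range(1, n + 1))         # 1, 2, ..., n
geometric = [2 ** k for k in range(n)]     # 1, 2, 4, ..., 2^(n-1)

for name, A in [("arithmetic", arithmetic), ("geometric", geometric)]:
    print(name, "|A| =", len(A),
          "|A+A| =", len(sumset(A)),
          "|AA| =", len(productset(A)))
```

The arithmetic progression has a sumset of size 2n-1 and the geometric progression has a product set of size 2n-1, but in each case the other set is much larger. The conjecture asks whether both can be small at once.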
All those days reading about Newton Schulz iterations and how to make Muon even faster might actually matter, thank you @Ji_Ha_Kim for the tweets/blogposts
@kaepora reminded me of a cute proof of the 1st fact:
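P_F(D), polynomials over F of degree <= D, and F^{D+1} are vector spaces over F, both of dimension D+1.
The map which takes p -> (p(x_1), …, p(x_{D+1})) for distinct x_i is linear and has a trivial kernel (a nonzero polynomial of degree <= D has at most D roots), so by rank-nullity it is an isomorphism between the two vspaces, which is the 1st fact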
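A quick numerical sanity check of the evaluation-map argument, as a sketch only. It assumes F is the reals and that the map uses D+1 distinct evaluation points; the helper name `evaluation_matrix` is hypothetical. In the monomial basis the map is a Vandermonde matrix, so a trivial kernel is the same as full rank.

```python
import numpy as np

def evaluation_matrix(points, degree):
    """Matrix of p -> (p(x_1), ..., p(x_k)) in the basis 1, x, ..., x^degree."""
    pts = np.asarray(points, dtype=float)
    return np.vander(pts, N=degree + 1, increasing=True)

D = 3
points = [0.0, 1.0, 2.0, 3.0]            # D + 1 distinct points (assumed)

V = evaluation_matrix(points, D)
rank = np.linalg.matrix_rank(V)           # full rank means trivial kernel

# With only D points the matrix is D x (D+1), so the kernel cannot be trivial.
V_short = evaluation_matrix(points[:D], D)
rank_short = np.linalg.matrix_rank(V_short)

# Injectivity in action: recover the coefficients from the values.
coeffs = np.array([1.0, -2.0, 0.5, 3.0])  # arbitrary example polynomial
values = V @ coeffs
recovered = np.linalg.solve(V, values)

print("rank with D+1 points:", rank)
print("rank with D points:", rank_short)
print("coefficients recovered:", np.allclose(recovered, coeffs))
```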
Fascinated by the idea that 50,000 years from now will be more like 49,000 years from now than 2026 is like 2006
Today’s virtues, which we think of as the Virtues of the Future - e.g. adaptability - will turn out to have been merely transitional virtues
https://t.co/Qvx6jL2h74
something like crunchbase for Ontario startups, and you're able to group by industry (fintech, edtech, core-AI, AI infra etc) and you also have employee count metrics/other proxies for how fast the startup is growing
I should be able to find out quickly:
What are the 3 biggest AI infra startups rn, what phase of their funding round are they in, what their (disclosed) ARR is, etc
What are the 5 fastest growing startups overall, etc etc
I’m pretty sure I just beat the internal SoTA on an internal benchmark for an internal tool problem that only us and a few other labs care about
Will release this in an internal technical report (not kidding btw)