@geoffreylitt "I produced this with Claude", which I produced with Claude. Like a film or music producer who doesn't operate the camera or play the instruments but vouches for the outcome.
I am recruiting Ph.D. students for my new lab at @nyuniversity! Please apply if you want to work with me on reasoning, reinforcement learning, understanding generalization, and AI for science.
Details on my website: https://t.co/d8uId2LC47. Please spread the word!
Recently I've been playing around with a quarter-order-of-magnitude system for simple calculations. Using only four very intuitive symbols, it gives better precision than single sig-fig calculations. https://t.co/BO9mLi8pLF
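A rough sketch of how quarter-order-of-magnitude arithmetic could work. The post doesn't spell out the four symbols, so this assumes they stand for the multipliers 10^0, 10^(1/4), 10^(1/2) and 10^(3/4), rounded to 1, 1.8, 3.2 and 5.6. The function names are made up for illustration. Under that assumption, multiplying numbers reduces to adding small integers.

```python
import math

# Assumed meaning of the four symbols: 10**(0/4), 10**(1/4), 10**(2/4), 10**(3/4),
# rounded to two significant figures. This is an illustrative guess, not the
# notation from the linked post.
QUARTER_MANTISSAS = (1.0, 1.8, 3.2, 5.6)

def to_quarters(x: float) -> int:
    """Round a positive number to the nearest quarter order of magnitude."""
    return round(4 * math.log10(x))

def from_quarters(q: int) -> float:
    """Convert a count of quarter orders of magnitude back to a number."""
    exponent, frac = divmod(q, 4)
    return QUARTER_MANTISSAS[frac] * 10.0 ** exponent

def multiply(*values: float) -> float:
    """Multiply by adding quarter-order counts (hypothetical helper)."""
    return from_quarters(sum(to_quarters(v) for v in values))
```

For example, 2 rounds to one quarter and 5 rounds to three quarters, so `multiply(2, 5)` adds them to four quarters, which is one full order of magnitude.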
If you miss the NYTimes needle, especially one that is statistically uniform (https://t.co/uqLw9f69Sw), you can use this page I whipped together: https://t.co/xQ5cFrtRSD. It lets you reason about the correlations between the swing states tonight as results come in.
@statymath Yeah, since the arcsine transformation makes the Fisher information flat, the variance is also isotropic. Maybe that would be a good thing to add: you can easily estimate the standard deviation as roughly 30/sqrt(n) if you measure things in degrees.
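A quick check of where the 30/sqrt(n) rule of thumb comes from, assuming the standard delta-method result for a binomial proportion: the variance of arcsin(sqrt(p_hat)) is approximately 1/(4n) in radians squared, so the standard deviation is about 1/(2 sqrt(n)) radians, or roughly 28.6/sqrt(n) degrees. The function names and the simulation settings below are illustrative choices.

```python
import math
import random

def arcsine_sd_degrees(n: int) -> float:
    """Delta-method SD of arcsin(sqrt(p_hat)) in degrees: 1/(2*sqrt(n)) radians."""
    return math.degrees(1.0 / (2.0 * math.sqrt(n)))

def simulated_sd_degrees(p: float, n: int, trials: int = 2000, seed: int = 0) -> float:
    """Empirical SD of the transformed estimate over repeated binomial samples."""
    rng = random.Random(seed)
    angles = []
    for _ in range(trials):
        successes = sum(rng.random() < p for _ in range(n))
        angles.append(math.degrees(math.asin(math.sqrt(successes / n))))
    mean = sum(angles) / trials
    return math.sqrt(sum((a - mean) ** 2 for a in angles) / (trials - 1))
```

With n = 100 the formula gives about 2.86 degrees, against 3 from the rounded rule, and it does not depend on p, which is the point of the transformation.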
@dythui Basically, anytime we are dealing with continuous random variables I don't feel as though entropy is the most appropriate quantity. It's rare for the appropriate prior to be uniform over the space; usually you want something with some concentration.
@gil2rok I really like that the KL divergence is linearly decomposable. While the other f-divergences are reparameterization invariant, they don't decompose naturally. I find Hobson's list of desiderata hard to argue with: https://t.co/2Qzq1vWudU
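One way to read "linearly decomposable" is the chain rule for KL: the divergence between two joint distributions equals the divergence between the marginals plus the expected divergence between the conditionals. That reading is an assumption about what is meant here, and the small distributions below are made up for illustration.

```python
import math

def kl(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def marginal(joint):
    """Marginal over x for a joint table indexed as joint[x][y]."""
    return [sum(row) for row in joint]

def conditional(joint):
    """Conditional tables p(y | x), one row per value of x."""
    return [[v / sum(row) for v in row] for row in joint]

def flatten(joint):
    return [v for row in joint for v in row]

# Illustrative joint distributions over (x, y); not taken from the thread.
p_joint = [[0.3, 0.2], [0.1, 0.4]]
q_joint = [[0.25, 0.25], [0.25, 0.25]]

joint_kl = kl(flatten(p_joint), flatten(q_joint))
marginal_kl = kl(marginal(p_joint), marginal(q_joint))
conditional_kl = sum(
    px * kl(pc, qc)
    for px, pc, qc in zip(marginal(p_joint), conditional(p_joint), conditional(q_joint))
)
# Chain rule: joint_kl equals marginal_kl + conditional_kl up to rounding.
```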
@dythui Relative entropy is KL, so yeah, I think that fixes things 😀. My take is that everywhere you see entropy, it should really be thought of as KL to a uniform distribution. That is sometimes appropriate, but not always, including in many of the places it is used.
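For a discrete distribution over N outcomes the link is exact: H(p) = log N - KL(p || uniform), so entropy is a KL to the uniform distribution up to sign and a constant. A minimal numerical check, with an example distribution chosen purely for illustration:

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]        # illustrative distribution
uniform = [1.0 / len(p)] * len(p)    # the implicit reference distribution

lhs = entropy(p)
rhs = math.log(len(p)) - kl(p, uniform)
# lhs and rhs agree up to floating-point rounding.
```

Swapping `uniform` for a reference distribution with some concentration changes the KL term, which is the sense in which plain entropy quietly assumes a uniform prior.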
Is Kevin onto something? We found that LLMs can struggle to understand compressed text, unless you do some specific tricks. Check out https://t.co/DRO2IbTFCg and help @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd, @noahconst and me make Kevin’s dream a reality.
Ever wonder why we don’t train LLMs over highly compressed text? Turns out it’s hard to make it work. Check out our paper for some progress that we’re hoping others can build on. https://t.co/mceqpUfZQo With @blester125, @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd