The appendices in "Build an LLM (from Scratch)" have a lot of really useful stuff. Should I have read them before heading off on my own training runs? On balance, I think not -- it would have saved time, but by trying, failing, and trying again, I learned more. YMMV!
https://t.co/EdtLC4AQ62
@teortaxesTex It's a good thing that education science has already shown us how teaching smart students should differ from teaching the less-smart ones. Otherwise we'd be completely lost.
@hunvreus Convincing. I've read stuff in corporate environments where I felt my brain cells dying from sentence to sentence. If an AI had rewritten it, I would have been saved that pain. But maybe it was a useful pain, like the one that makes you pull your hand away from a flame.
@valsaven @mitsuhiko I'm thinking maybe file timestamping differences between OSes could trip things up, but yeah, maybe it's Syncthing vs Dropbox. I am planning to switch to Syncthing, will have to keep an eye out for problems like that!
@heynavtoor Worth noting that LLMs are trained to minimise cross entropy loss, which is (loosely) a measure of how surprising the real next token is, given the model's own predicted probabilities for what that token should be.
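A minimal sketch of that "surprise" measure for a single next-token prediction, assuming a toy three-token vocabulary and made-up logits. The names `softmax` and `next_token_loss` are illustrative only, not taken from any particular library:

```python
import math

def softmax(logits):
    # Turn raw scores into probabilities; subtracting the max keeps exp() stable.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_index):
    # Cross entropy for one prediction: the negative log of the probability
    # the model assigned to the token that actually came next.
    probs = softmax(logits)
    return -math.log(probs[target_index])

# Hypothetical logits: the model strongly favours token 0.
logits = [5.0, 0.0, 0.0]
confident = next_token_loss(logits, 0)  # real next token was the favoured one
surprised = next_token_loss(logits, 1)  # real next token was one it thought unlikely
print(confident, surprised)
```

The loss is small when the model put most of its probability on the token that actually came next, and large when it did not. Training averages this over every position in the data and nudges the weights to reduce it.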
@teortaxesTex There's algorithmic overhang as well. After all, we have an existence proof that you can train an AGI for maybe $100k on a constant 100W. Downside is that it takes ~20 years and sometimes (existence proof again) produces Literally Hitler.
@levelsio Anecdotally, it sounds like Portugal spent at least some of its EU funds well -- I've heard that the fibre infrastructure that means that you can get 10G Internet to your home was paid for that way. Impressive if true.