@thkostolansky Yep. As was the small reasoning model. The point is that the skill of *how* to elicit knowledge from a base model, and how to *use* that knowledge to reason, can fit in many fewer params than the knowledge itself.
@thkostolansky i.e. the knowledge of the world required to score well on benchmarks can be pretty cleanly separated from the patterns of output which can reliably elicit that knowledge.
@thkostolansky Yeah, see https://t.co/mvqw6Z1Lmg
If you take a large base model, and a small reasoning model, and then replace *only the tokens the large base model is uncertain about* with ones from the small reasoning model, it benchmarks as well as *large* reasoning models.
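A rough sketch of what that token swap could look like. Everything here is an assumption for illustration: the `Model` interface (prefix in, next-token distribution out), the entropy gate, and the threshold value are all hypothetical, and the linked work may measure uncertainty differently.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a model maps a token prefix to a
# next-token probability distribution.
NextTokenDist = Dict[str, float]
Model = Callable[[List[str]], NextTokenDist]


def entropy(dist: NextTokenDist) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def most_likely(dist: NextTokenDist) -> str:
    """Greedy pick: the token with the highest probability."""
    return max(dist, key=lambda tok: dist[tok])


def mixed_decode(
    base: Model,
    reasoner: Model,
    prompt: List[str],
    max_tokens: int = 64,
    threshold: float = 1.0,  # assumed cutoff; would need tuning in practice
    eos: str = "<eos>",
) -> Tuple[List[str], List[str]]:
    """Greedy decoding where the small reasoner only takes over on
    steps where the large base model is uncertain.

    Returns the generated tokens and, for each one, which model chose it.
    """
    context = list(prompt)
    generated: List[str] = []
    sources: List[str] = []
    for _ in range(max_tokens):
        base_dist = base(context)
        if entropy(base_dist) > threshold:
            # Base model is unsure: defer to the small reasoning model.
            token, source = most_likely(reasoner(context)), "reasoner"
        else:
            # Base model is confident: keep its own token.
            token, source = most_likely(base_dist), "base"
        if token == eos:
            break
        context.append(token)
        generated.append(token)
        sources.append(source)
    return generated, sources
```

The `sources` list makes it easy to check what fraction of tokens actually came from the small model.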
cars have windows and can move. houses have windows and can’t move. so it’s not the windows that make the car go, it’s something else entirely. back to the drawing board
@MartinBJensen Tail has shifted quite a bit, as you can see if you plot log survival instead of just survival. Issue is that a very small number multiplied by 10 is still a very small number.
@ChrisPainterYup - mRNA treatment for <insert charismatic disease>
- self-driving cars
- VR / AR
- CRISPR
Basically everything *except* AI chat where the "hard" part of producing a working demo was done, and only the "easy" part of scaling it up and making it robust remained.
@AaronBergman18 @kepe__ It's an opportunity to understand what the space of experience is like without the risks that normally accompany that. AND you get a million bucks too.
@garybasin All of these are pretty stupid at current token costs, but if/when tokens become 1000x cheaper some of them start to pencil (well, maybe not the "analyze all file triplets" even then)
@garybasin Put every pair of files in the codebase through the model as a seed, see if it can discover a bug and then write a failing functional test which proves the bug. If pairwise isn't enough, go with sets of 3, then 4 - each less likely to find something, but value is still positive.
@moultano Though you could have the decryption key _also_ depend on a secret that doesn't live in your DB, which would make a DB dump useless to an attacker.
@moultano 1. Upon PW+pin creation, choose 6 indices.
2. PW+pin chars at indices become your key.
3. Symmetrically encrypt PW+pin with that key
4. On entry of correct PW+pin, decrypt, choose new indices, goto 2
At 3 digits+3 chars keyspace is still too small, but I think idea is viable.
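Steps 1-4 could be sketched roughly like this. Loud assumptions: the cipher is a toy (SHA-256 keystream XOR plus an HMAC tag, stdlib only, not something to deploy), the `Record` layout and function names are made up for illustration, the login check assumes the user types only the characters at the stored indices, and `pepper` stands in for the secret that doesn't live in the DB.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """What the DB would store (hypothetical layout)."""
    indices: List[int]  # which positions of PW+pin form the key
    salt: bytes
    ciphertext: bytes   # PW+pin encrypted under the derived key
    tag: bytes          # HMAC over the ciphertext, used to check the key


def _derive_key(chars: str, salt: bytes, pepper: bytes) -> bytes:
    # Slow KDF over the selected characters plus the off-DB secret.
    return hashlib.pbkdf2_hmac("sha256", chars.encode() + pepper, salt, 100_000)


def _xor_stream(data: bytes, key: bytes) -> bytes:
    # Toy stream cipher for illustration only: XOR with a SHA-256 keystream.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def enroll(secret: str, pepper: bytes = b"", n_indices: int = 6) -> Record:
    """Steps 1-3: choose indices, build the key from those chars, encrypt."""
    rng = secrets.SystemRandom()
    indices = sorted(rng.sample(range(len(secret)), n_indices))
    salt = secrets.token_bytes(16)
    key = _derive_key("".join(secret[i] for i in indices), salt, pepper)
    ciphertext = _xor_stream(secret.encode(), key)
    tag = hmac.new(key, ciphertext, hashlib.sha256).digest()
    return Record(indices, salt, ciphertext, tag)


def login(record: Record, chars: str, pepper: bytes = b"") -> Optional[Record]:
    """Step 4: given the chars at record.indices, decrypt and rotate.

    Returns a fresh Record with new indices on success, None on failure."""
    key = _derive_key(chars, record.salt, pepper)
    expected = hmac.new(key, record.ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, record.tag):
        return None
    secret = _xor_stream(record.ciphertext, key).decode()
    return enroll(secret, pepper, len(record.indices))
```

On the keyspace point: if the 3 chars are lowercase letters, 3 digits + 3 chars is only 10^3 × 26^3 = 17,576,000 candidate keys, so a DB dump without the pepper would be brute-forceable quickly even with a slow KDF.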