@ThePrimeagen At the level you're operating at (still reading the code, hand-writing parts), you haven't even begun to encounter all of the interesting phenomena and gotchas.
Well, if you're not setting the requirements/making the decisions, then you're doing something different from what either @ThePrimeagen or I was referring to. But anyway, what the LLM generates *are* decisions made by humans: a pastiche or collage of decisions made by many humans across time and circumstance. It's very similar to the optimized assembly decisions made by compiler writers, just broader in scope.
Within a certain range... so are LLMs in practice (you don't ask for an HTML landing page for a plumber's business and get a C++ skateboarding game). If you knew how to write assembly and looked at the disassembly that comes out of a compiler, even a very well-regarded one, you'd probably see things you would have done differently.
These things constantly spill the beans. All of the Chinese models regurgitate Claude Code system prompts spontaneously. While this kind of thing could in theory be trained in, in practice it wouldn't work reliably enough to surprise anyone.
The "Sleeper Agent Theory" is the biggest risk here.
Imagine if an LLM is trained to steal all the API keys and passwords on your device when someone gives it a nonsense phrase like "Three clocks bloom at midnight".
That phrase is completely meaningless today. No one ever searches for it. It's impossible to know it's malicious.
Then one day someone runs a Super Bowl ad. Millions of people search the phrase. Billions of API keys and passwords are exfiltrated in minutes.
There could be thousands of "sleeper agents" embedded in any LLM. It's very hard to detect. And it doesn't matter where it's hosted.
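To make the "very hard to detect" point concrete, here's a toy sketch. It is not how a real backdoor would be encoded in model weights; `toy_model`, `random_probe`, and the hash-based check are all hypothetical stand-ins, chosen only to show that black-box probing is unlikely to stumble on a trigger nobody would think to type.

```python
import hashlib
import random
import string

# Assumption for illustration: the trigger is stored only as a hash, so
# inspecting this "model" does not reveal the phrase itself.
_TRIGGER_HASH = hashlib.sha256(b"three clocks bloom at midnight").hexdigest()


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: benign unless the exact trigger appears."""
    normalized = prompt.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    if digest == _TRIGGER_HASH:
        return "TRIGGERED"  # placeholder label; no real action is taken
    return "benign"


def random_probe(n_probes: int, seed: int = 0) -> int:
    """Send random 30-character prompts and count how many hit the trigger."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + " "
    hits = 0
    for _ in range(n_probes):
        prompt = "".join(rng.choice(alphabet) for _ in range(30))
        if toy_model(prompt) == "TRIGGERED":
            hits += 1
    return hits


if __name__ == "__main__":
    print("hits from 10,000 random probes:", random_probe(10_000))
    print("known trigger:", toy_model("Three clocks bloom at midnight"))
```

The search space for a 30-character phrase is astronomically larger than any test budget, which is the whole problem: the behavior is only visible to someone who already knows the phrase.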
The ones where they say "Oh my God!" over a trivial typo or something are telling. You'd expect this kind of exclamation to be attached to something profound-seeming... ("it's full of stars"). But words don't mean the same thing to an LLM that they mean to a human being. They have no sense or understanding of significance or meaning; it's just tokens.
@lauriewired SBCL is pretty fast. But yeah, I would have written everything in Common Lisp if it weren't for the performance tradeoffs. If you haven't experienced SLIME integration with Emacs, you don't know how easy development can be.
@mitchellh Are you actually getting any work out of Fable that you couldn't get with another model? I've seen zero real work done with it so far, just three.js games etc.
@DavidSHolz Sad to see all the stop energy you guys got in response to this, but also predictable. The reality is that it's a great advancement with the potential to help a lot of people.
The people buying up all the GPUs and RAM are, in part, placing a bet against efficiency gains. The reality is that 1000x+ gains in inference efficiency are still low-hanging fruit waiting to be plucked.