@catchvartan @carl_feynman The real answer is that for ML problems you can't estimate the true loss function that you care about with enough fidelity for it to matter. And even if you could, it's not worth finding the actual minimum; you train until the cost of continuing is not worth the incremental gain.
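A rough sketch of that stopping rule, in case it helps. Everything here is an assumption for illustration: `stop_step`, `step_cost`, `value_per_loss_unit`, and `patience` are made-up names, and the validation loss is only a proxy for the loss you actually care about.

```python
import math

def stop_step(val_losses, step_cost, value_per_loss_unit, patience=2):
    """Return the index of the step at which training would stop.

    val_losses: proxy validation loss measured after each step (hypothetical).
    step_cost: cost of running one more step, in arbitrary units.
    value_per_loss_unit: how much one unit of loss reduction is worth,
        in the same units as step_cost.
    patience: number of consecutive unprofitable steps tolerated.
    """
    best = math.inf
    unprofitable = 0
    for step, loss in enumerate(val_losses):
        # Value gained relative to the best loss seen so far
        # (infinite on the very first step, so it always counts as worth it).
        gain = (best - loss) * value_per_loss_unit
        if gain > step_cost:
            unprofitable = 0
        else:
            unprofitable += 1
        best = min(best, loss)
        if unprofitable >= patience:
            return step
    # Ran out of measurements without hitting the stopping condition.
    return len(val_losses) - 1

losses = [1.0, 0.5, 0.3, 0.29, 0.289, 0.2885]
print(stop_step(losses, step_cost=0.05, value_per_loss_unit=1.0))
```

Note that it never looks for the minimum at all, only at whether the next step pays for itself.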
@wordgrammer Yes: when people believe in a made-up, unobservable-by-definition thing, there's nothing that will ever convince them it's made up. The people who don't believe in it can't be convinced to start, because the thing is made up. It's a stupidity stalemate.
@reset_by_peer You have no idea how an LLM works; you know how a virtual machine called a transformer is built. The LLM is the much, much, much more complicated software that's running on that simple thing.
@lucasmeijer Sure, and at the end of it you've implemented a simple virtual machine with a flexible architecture that can run many different programs, and you understand absolutely nothing about the particular program that the billions of dollars in pre- and post-training went into creating.
@Yampeleg It's not real, because that's not all it does: next-token prediction is only the first phase of LLM training, and models that stop there are unusable. There's a reason labs spend billions of dollars on post-training. It's a lot more than some minor polish and tuning.
@ZPostFacto @OrientEngland People *don't* vote their conscience in great numbers in places where they might be at risk of death for pulling the wrong lever.
@jmoiron @glcst In a reframing where everyone who picks yellow is guaranteed to die and everyone who picks blue lives, the count was at 15% for yellow last I checked. The lizardman constant is super high on Twitter; that's why you can't read much of anything into these polls.
I'm sorry you don't understand that your brain simulates whatever process you're imagining has independent reality exactly as much as a computer does. They're either both simulations or neither, it's not like you have some independent experience of the world that's not translated through many layers of reflection and interpretation first, including several that are almost exact analogues to the matrix multiplications that people love to deride (which are actually some of the most flexible, powerful, and universal operations in all of math).
@_fernando_rosas The view that witchcraft is a load of baloney has been progressively rejected by most people that actually study witchcraft.
Do you see the problem with your argument?
@fchollet If it would outperform by that much, people would be excited to use it. If you have an idea, you should build it. "Neat vs scruffy" mudslinging just doesn't hit the same now that one side has demolished the Turing test...
@fdksfjdfd @AlexKleeman A transformer is an arbitrarily complicated pipeline that can apply almost any operation on vectors, given scale. The fact that the math is simple doesn't mean that hundreds of gigabytes of matrix weights lead to a simple result.
@turchin A million copies of you running around, but only one of each of the people that you care about? Saw that movie, it isn't super fun...team "heads" for me, no doubt.
@ShriKaranHanda After the paper was done, we tried modern agentic tools like Claude Code, gave them tools, and instructed them to explore/learn.
We found it actually wrote something like this by itself (without being instructed to).
Stay tuned for this update.
@Jonathan_Blow @bitbased "Language model" is an extremely general term; the language in question can be any sequence of vectors: image patches, sound, text, numerical data, etc. It just has to have some sort of statistical pattern to it, otherwise there's no point in modeling it that way.
@OwlZphi @MrMosis @ESYudkowsky No, there's nothing at all clear about subjectivity, including whether it even exists at all as anything other than something our brains report to themselves. The whole qualia thing could just not exist and nothing at all about the world would be different.