@ToKTeacher I think the claim is less "there's no thinker" and more "the model of yourself as an enduring and stable uncaused causer of thoughts is inaccurate"
@gavinrbrown1 Right, I guess I meant without training, but this also serves to make the point. My intuition is that you could put bounds on capacity (or other properties) if you have a simple IT-based model of an NN, but the empirical success of NNs requires a physics-style explanation.
@gavinrbrown1 I don't know, was genuinely asking whether there's a way to describe the input distribution, information bottleneck in the architecture, and get the memorization capacity using some IT concept..!
@gavinrbrown1 I think they are less meaningful because the IT part is incidental, like calculus is for EM. People expect IT to somehow say stuff on its own, but it's just math. We need the "physics" of AI :)
@gavinrbrown1 E.g. maybe it turns out hierarchical architectures induce manifold structure s.t. large program subspaces are ordered wrt time-bounded Kolmogorov complexity, when doing gradient descent
@gavinrbrown1 They do - set up boundary conditions, run a solver and out pop very good predictions, all from an extremely simple description of two coupled vector fields.
IT seems more comparable to calculus - IT would provide concepts, but the explanation itself is some hidden entity.
@EmilevanKrieken @yoavgo @andrewgwils Similar to randomized time-bounded Kolmogorov complexity: the length of the shortest program that can output a given string with high probability using a randomized, time-bounded algorithm (so information theory people do already have a concept for this).
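One way to write the quantity described above, as a sketch rather than a canonical definition: the notation ($U$ for a fixed universal randomized machine, $p$ for a program, $r$ for its random bits, $t$ for the time bound) and the conventional success threshold of $2/3$ are assumptions made here, not something stated in the thread.

$$
\mathrm{rK}^{t}(x) \;=\; \min_{p} \Big\{\, |p| \;:\; \Pr_{r}\big[\, U(p, r) \text{ outputs } x \text{ within } t \text{ steps} \,\big] \ge \tfrac{2}{3} \,\Big\}
$$

Variants in the literature fold the time bound into the cost (a Levin-style $|p| + \log t$ term) instead of fixing $t$ in advance; which version is meant here is left open.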
@gleech @entirelyuseles Vision is an incredibly difficult inverse problem, and that's not even getting to the hard part of learning intuitive physics and concepts and categories.. language kinda gets these as input, the rest is grammar and style. Very handwavy argument, but I think the point stands.
@gleech @entirelyuseles No doubt the higher info density of language requires LLMs to be more 'nonlinear'; vision models get a lot of bits right just by learning spatiotemporal smoothness, which is linear-ish.
But if you control for info density and current capability, I'd guess vision is harder..
@entirelyuseles @gleech Compare the kind of conversational ability you need to be a farmer, trader or lawyer, vs the kind of scene understanding you'd need for the same jobs. Even though a lawyer needs excellent language skills and little vision skill, LLMs could easily handle the language part, vision
@JesusFerna7026 @AVMiceliBarone @MahdiKahou But NNs are also different from standard piecewise affine functions; they can also adjust many slopes in many regions, by changing early layers.
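A toy illustration of the point about early layers, as a minimal sketch: the tiny hand-built ReLU network below (its weights, the helper names `relu_net` and `slope_at`, and the grid are all made up for this example) has four affine pieces on the real line. Nudging a single first-layer weight changes the slope on every piece to the right of zero at once, rather than on one local piece.

```python
import numpy as np

def relu_net(x, params):
    """Scalar-in, scalar-out ReLU MLP. x has shape (n,)."""
    h = x[:, None]
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return (h @ W + b)[:, 0]

def slope_at(params, x, eps=1e-4):
    """Central finite-difference slope of the network at a single point x."""
    y = relu_net(np.array([x - eps, x + eps]), params)
    return (y[1] - y[0]) / (2 * eps)

# Hypothetical weights, chosen so the pieces are easy to work out by hand:
# layer 1: h = (relu(x), relu(-x))
# layer 2: g = (relu(h1 + h2 - 1), relu(h1 - h2 + 2)) = (relu(|x| - 1), relu(x + 2))
# output:  y = g1 + g2, with slopes -1, 0, 1, 2 on the four pieces.
params = [
    (np.array([[1.0, -1.0]]), np.array([0.0, 0.0])),
    (np.array([[1.0, 1.0], [1.0, -1.0]]), np.array([-1.0, 2.0])),
    (np.array([[1.0], [1.0]]), np.array([0.0])),
]

# Perturb one weight in the first layer only.
perturbed = [(W.copy(), b.copy()) for W, b in params]
perturbed[0][0][0, 0] = 1.1

# Fraction of small grid cells on [-3, 3] whose local slope changed.
xs = np.linspace(-3.0, 3.0, 2001)
before = np.diff(relu_net(xs, params)) / np.diff(xs)
after = np.diff(relu_net(xs, perturbed)) / np.diff(xs)
changed_fraction = np.mean(~np.isclose(before, after, atol=1e-9))

for x0 in (-1.5, 0.5, 2.0):
    print(x0, slope_at(params, x0), slope_at(perturbed, x0))
print("fraction of cells with changed slope:", changed_fraction)
```

In this toy case the slopes at `x = 0.5` and `x = 2.0` both move while the slope at `x = -1.5` does not, since the perturbed unit is inactive for negative inputs. A linear spline parameterized by knot values would, by contrast, change only the two pieces adjacent to the edited knot.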
@gleech @tenobrus IMO discrete latent variables are symbols. Hidden Markov models learn a "symbolic" state representation. It gets interesting when you replace the Markov chain with a tree, and get a PCFG; then continue up the Chomsky hierarchy. PPL people have been doing this stuff for a while.
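To make the "discrete latent state as symbol" reading concrete, here is a minimal sketch of inference in a two-state HMM using the standard forward algorithm. It covers inference only, not learning: the parameters `pi`, `A`, `B` are hand-picked for illustration, and the function names are hypothetical. The brute-force sum over all state sequences is included only as a sanity check on the recursion.

```python
import itertools
import numpy as np

def forward(pi, A, B, obs):
    """Scaled forward algorithm.

    pi: (S,) initial state probabilities
    A:  (S, S) transitions, A[i, j] = P(next state j | current state i)
    B:  (S, V) emissions,   B[i, o] = P(observation o | state i)
    Returns the filtered posterior over the final hidden state and log P(obs).
    """
    alpha = pi * B[:, obs[0]]
    norm = alpha.sum()
    alpha = alpha / norm
    log_lik = np.log(norm)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        norm = alpha.sum()
        alpha = alpha / norm
        log_lik += np.log(norm)
    return alpha, log_lik

def brute_force_likelihood(pi, A, B, obs):
    """P(obs) by summing over every hidden state sequence (exponential cost)."""
    n_states = len(pi)
    total = 0.0
    for states in itertools.product(range(n_states), repeat=len(obs)):
        p = pi[states[0]] * B[states[0], obs[0]]
        for prev, cur, o in zip(states[:-1], states[1:], obs[1:]):
            p *= A[prev, cur] * B[cur, o]
        total += p
    return total

# Illustrative parameters: two "sticky" hidden symbols, each preferring one output.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
B = np.array([[0.8, 0.2],
              [0.2, 0.8]])
obs = [0, 0, 1, 0, 0]

posterior, log_lik = forward(pi, A, B, obs)
print("posterior over final hidden symbol:", posterior)
print("log-likelihood:", log_lik)
print("brute-force check:", np.log(brute_force_likelihood(pi, A, B, obs)))
```

Replacing the chain-structured recursion with one over tree-structured spans gives the inside algorithm for a PCFG, which is the next step up that the tweet alludes to.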