@varunneal Well you don't get any gradient signal on experts that aren't chosen so ... you won't optimize the right thing unless you force exploration.
But I agree this isn't satisfying. Personally, I think the solution to sparsity will end up being a lot more dynamic.
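To make the gradient point concrete, here's a minimal sketch, assuming a toy top-k router with scalar "experts" (expert i just computes `w[i] * x`). The names `moe_forward` and `expert_grads` and all the numbers are made up for illustration, not taken from any real MoE implementation. It uses finite differences to show that nudging an unselected expert's weight doesn't move the output at all.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_logits, expert_weights, k=1):
    """Toy top-k MoE: expert i computes expert_weights[i] * x (hypothetical setup)."""
    probs = softmax(router_logits)
    # Keep only the k experts with the highest router probability.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    # Output mixes the chosen experts only; the rest never touch it.
    y = sum(probs[i] / norm * expert_weights[i] * x for i in chosen)
    return y, chosen

def expert_grads(x, router_logits, expert_weights, k=1, eps=1e-6):
    """Finite-difference d(output)/d(expert weight), one entry per expert."""
    base, _ = moe_forward(x, router_logits, expert_weights, k)
    grads = []
    for i in range(len(expert_weights)):
        bumped = list(expert_weights)
        bumped[i] += eps
        y, _ = moe_forward(x, router_logits, bumped, k)
        grads.append((y - base) / eps)
    return grads

grads = expert_grads(x=2.0, router_logits=[0.1, 1.5, -0.3],
                     expert_weights=[0.5, -1.0, 2.0])
print(grads)  # only the routed expert (index 1) gets a nonzero entry
```

With k=1 only expert 1 is routed to, so experts 0 and 2 get exactly zero signal no matter how good they might have been for this input. That's the part that needs forced exploration (noise, load balancing, etc.) to fix.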
@alpaysh Reinforcing ONLY negative pathways is bad, but like anything in life, a mix is generally best. Without introspection, how can you possibly improve at anything - including yourself?
@willdepue Well, I’m interested when you find some time. But I guess it sort of tracks. If true, then the most emphasis should really be on scaling pretraining as much as possible.
What comes to mind is that games can have a much clearer "correct" answer than many of the things LLMs are generating. Human preference is varied and context-dependent so it’s harder to define a clean expert/improvement operator. But it does seem natural to apply in areas like math, coding etc.
What do you think? RL is definitely not my strong suit, so I'd love your informed opinion.
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
Markets are made at the margin. The margins have shifted quickly. If you can’t see that then you have your head in the sand.
AI beating Pokémon Red with vision and AI having a material impact on labor are completely unrelated measures. Anyone using agents can see how effective they are with some guidance. AGI is not a requirement to disrupt markets.
The AI panic is really unbelievable today. The level of delusion and hype has grown to mythic proportions.
Has AI beaten Pokemon Red yet? Like a normal 6 year old does, by looking at the screen? Oh it hasn't. But all jobs are over in 18 months? This website is full of idiots.
@arb8020 Interesting because I find the residual stream so unsatisfying the way each layer just gets added in. To me, it just seems like it is missing something important and it’s surprising that it works as well as it does.
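For anyone who hasn't looked at it closely, here's roughly what "each layer just gets added in" means. This is a minimal sketch with made-up toy layers (plain Python lists, no attention or MLPs, `run_residual_stream` is a hypothetical name), not how any real model is written.

```python
def run_residual_stream(x, layers):
    """Apply each layer and add its output back into the running stream."""
    stream = list(x)
    updates = []
    for layer in layers:
        delta = layer(stream)  # the layer reads the current stream
        updates.append(delta)
        # The write-back is a plain elementwise sum, nothing smarter.
        stream = [s + d for s, d in zip(stream, delta)]
    return stream, updates

# Hypothetical toy "layers" standing in for attention/MLP blocks.
layers = [
    lambda v: [0.5 * a for a in v],
    lambda v: [a - 1.0 for a in v],
]

x = [1.0, 2.0]
final, updates = run_residual_stream(x, layers)
print(final)  # the input plus the sum of every layer's update
```

The final stream is literally the input plus the sum of every layer's contribution. No gating, no overwrite, no way for a later layer to remove something except by adding its negation.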
@badlogicgames The only real abstraction layer that matters is the one I stop at obviously.
Meanwhile we can all just read the equations and get at the essence without implementing anything lol