We have released a new agentic frontier eval for Roblox game development in Lua.
https://t.co/iB32IWW6Cy
Gemini Flash 3 is currently the best model for Roblox game development.
Congrats @JeffDean@OriolVinyalsML@demishassabis and team - you really cooked !
@yacineMTB When I worked at Google we never even had GPUs on our local workstations, so we would have a set of hyper parameters that would let us train a smaller version on TF CPU + had unit tests. And then just launch borg jobs for the real deal.
It was a pretty decent setup overall.
@dharmesh You cannot compensate for a fundamentally lower quality model beyond a rather low ceiling by doing things by prompting / context engineering .
True intelligence is in the weights. Otherwise we’d by using cheap models / not burning large amounts of GPU compute for better models
@dharmesh I think you are mixing up harness vs context. Great harnesses respect the bitter lesson and mostly get out of the way and let the model drive by calling the LLM in a loop.
And the more “smart” your harness is, the more it interferes with the skill expression of the raw model.
@lukaszkaiser That said a lot of innovation - especially UX innovation - accepts remixing as a practice.
If the end users problem gets solved and more people are productive, IMO that’s the important part.