New preprint alert 🚨
Can LLM agents develop video games?
We release GameDevBench, the first benchmark evaluating agentic game development in a game engine, Godot.
We also present two simple multimodal feedback mechanisms that lead to immediate performance gains.
/🧵
Hearing this, I think the best thing you can do for a PhD application is to work with a well-established professor. NYU does have many. Given your understandable worry about finances, it may be beneficial to apply for MS programs that offer funding!
I will add that there's a lot of noise to PhD applications so keep trying! Assuming your profs had good things to say, your app sounds like one that could make it past a first round.
@_sholtodouglas I've been evaluating agents for game development (see our GameDevBench work) and Claude is noticeably worse than both Codex and Gemini at game development.
Going to update the benchmark soon and will keep you posted if you're interested.
Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. 1/
@_Suresh2 @OpenAI @AnthropicAI @Google This isn't really a benchmark issue, but more of a methods problem. Also, in our benchmark agents are able to observe video of the scene so there is reasoning over multiple frames.
We observed this a month or two ago on GameDevBench!
Ever since GPT 5.4, @OpenAI took over as the best agent for game development. However @AnthropicAI was never in the lead; the best was actually @Google with Gemini (good at multimodal understanding).
Good to see further confirmation on what's SOTA for game development.
Fun fact, GPT 5.5 is very good at Game Dev
Game Dev is the notable category where @OpenAI consistently beats out @AnthropicAI's Claude models
Upon code inspection, our @Designarena team found that GPT 5.5's frontend verbosity plays in its favor for game dev - it consistently created games with the most functional features
Congrats to @OpenAI for establishing the new Game Dev frontier!
A big downside with the new focus on arXiv is you have to read (and eventually cite) some absolutely awful papers that would clearly never pass peer review...
Exciting work and really cool to see Moonlake reference GameDevBench as a precursor to their work!
The future of agentic game development is bright ☀️
Introducing Moonlake's 3D Agent.
Our agent acts like a technical artist that can build and reconstruct articulated assets and large-scale editable scenes with hundreds of objects from a single image and can improve its generations continuously.
Learn more in the thread below.