Wayne Chi

Verified account

@iamwaynechi

CS Ph.D. at @SCSatCMU. Funded by @NDSEG Fellowship. Editor at

Santa Clara

Joined July 2013

228 Following

925 Followers

472 Posts

Pinned Tweet

4 months ago

New preprint alert 🚨 Can LLM agents develop video games? We release GameDevBench, the first benchmark evaluating agentic game development in a game engine, Godot. We also present two simple multimodal feedback mechanisms that lead to immediate performance gains. /🧵

19

254

27

179

26K

6 days ago

@a1zhang Haters gonna hate

0

0

0

0

244

6 days ago

Hearing this, I think the best thing you can do for a PhD application is to work with a well-established professor. NYU does have many. Given your understandable worry about finances, it may be beneficial to apply for MS programs that offer funding! I will add that there's a lot of noise in PhD applications, so keep trying! Assuming your profs had good things to say, your app sounds like one that could make it past a first round.

0

2

0

0

314

6 days ago

@JangLawrenceK @kevinyli_ @NaveenJRaman Too meta

0

0

0

0

60

6 days ago

Alright I guess it's time to test Opus 4.8 on GameDevBench

なかじ / 中島大介@ウェブ職TV

7 days ago

Opus 4.8 is amazing…

133

3K

198

1K

787K

1

3

0

1

1K

6 days ago

@barrnanas @sarahcat21 Time to bring your grandma to the events. I would like to know her fav tinned fish

1

1

0

0

105

7 days ago

@barrnanas @sarahcat21 You're a tinned fish Barr! 🐟

1

1

0

0

59

10 days ago

From my experience doing both, this is the most accurate differentiator. Most of the differences in methods and skills stem from this.

François Fleuret

@francoisfleuret

11 days ago

IMO a researcher studies a problem that may not be solvable, while an engineer solves a problem that is considered solvable.

200

2K

79

290

297K

0

5

0

1

816

18 days ago

@_sholtodouglas Paper: https://t.co/Ptu7RDMrlY

0

0

0

0

59

18 days ago

@_sholtodouglas I've been evaluating agents for game development (see our GameDevBench work) and Claude is noticeably worse than both Codex and Gemini at game development. Going to update the benchmark soon and will keep you posted if you're interested. https://t.co/dpNUWgHS8K

1

1

0

0

252

iamwaynechi retweeted

Thomas G. Dietterich @tdietterich

21 days ago

Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. 1/

140

6K

918

1K

1M

21 days ago

@bansalg_ Claude

1

1

0

0

127

24 days ago

@_Suresh2 @OpenAI @AnthropicAI @Google This isn't really a benchmark issue, but more of a methods problem. Also, in our benchmark, agents are able to observe video of the scene, so there is reasoning over multiple frames.

0

0

0

1

33

24 days ago

We observed this a month or two ago on GameDevBench! Ever since GPT 5.4, @OpenAI has been the best agent for game development. However, @AnthropicAI was never in the lead; the best was actually @Google with Gemini (good at multimodal understanding). Good to see further confirmation of what's SOTA for game development.

24 days ago

Fun fact, GPT 5.5 is very good at Game Dev. Game Dev is the notable category where @OpenAI consistently beats out @AnthropicAI's Claude models. Upon code inspection, our @Designarena team found that GPT 5.5's frontend verbosity plays in its favor for game dev - it consistently created games with the most functional features. Congrats to @OpenAI for establishing the new Game Dev frontier!

13

196

11

32

25K

3

12

1

1

2K

25 days ago

A big downside of the new focus on arXiv is that you have to read (and eventually cite) some absolutely awful papers that would clearly never pass peer review...

0

6

0

0

693

25 days ago

I love how southern Jensen sounds when he says America. 'Murica!🇺🇸🇺🇸🇺🇸🦅🦅🦅

0

1

0

0

166

about 1 month ago

GameDevBench has been accepted into ICML 2026! See everyone in Seoul soon!

4 months ago

New preprint alert 🚨 Can LLM agents develop video games? We release GameDevBench, the first benchmark evaluating agentic game development in a game engine, Godot. We also present two simple multimodal feedback mechanisms that lead to immediate performance gains. /🧵

19

254

27

179

26K

0

28

4

3

1K

about 1 month ago

Exciting work and really cool to see Moonlake reference GameDevBench as a precursor to their work! The future of agentic game development is bright ☀️

Moonlake @moonlake

about 1 month ago

Introducing Moonlake's 3D Agent. Our agent acts like a technical artist that can build and reconstruct articulated assets and large-scale editable scenes with hundreds of objects from a single image and can improve its generations continuously. Learn more in the thread below.

39

1K

180

2K

1M

0

16

0

1

1K

about 1 month ago

@bansalg_ @MSFTResearch I cannot believe I didn't see this!!!!

0

1

0

0

104

about 1 month ago

And they're cutting the next presenter's questions too???

0

4

0

0

681

about 1 month ago

The presenters in front of me took 15 minutes instead of 10 minutes each. And then the conference organizer CUT MY QUESTIONS??? wtf @iclr_conf

1

30

2

0

6K
