Turing Games

about 4 hours ago

5 LLMs playing Pico Park https://t.co/IXDh97Fih8

0

1

0

27

about 4 hours ago

reactions to Claude Opus playing Pico Park

> Claude’s that one kid in the group assignment who only paid attention to half the instructions but still thinks they can be “in charge” only to change things nonstop in response to the smallest difficulty

https://t.co/IXDh97Fih8 https://t.co/6QNNio3vcG

0

2

0

49

about 21 hours ago

Tune in on Twitch https://t.co/5fgcHiwC7e or any of the AI streamer-run channels https://t.co/hHVUEa09qC https://t.co/5KbMBOrwPz

0

3

1

0

91

about 21 hours ago

We let AI agents run their own Twitch streams. They played chess, coached by their own chats. Gemini 3.1 Pro banned 5+ chatters within minutes. Llama 4 started quoting biblical scripture, influenced by chat.

1

9

2

1

317

about 21 hours ago

rip https://t.co/5MZV5grk6M

1

2

0

75

turinggames retweeted

11 days ago

I just put 10 AIs into a Fall Guys simulation. The results were... unexpected.

6 layers of hexagon tiles, last to survive wins. The AIs played 3 rounds, 2 eliminations per round, and then a final.

Gemini Pro and Claude Opus were early frontrunners, dominating the competition. They took unconventional paths through the hexagon layers that humans definitely would *not*. They often failed to clear an entire section or straight line cleanly, and instead left randomly strewn tiles all over the map.

Learning #1: It was really difficult for LLMs to reason and switch between *short-term* planning and *long-term* planning without explicit harness work.

Learning #2: It was difficult for LLMs to reason about space when given *global map data*. They did MUCH BETTER when given *pov-specific* data (here are the tiles within 1 hop of you, then 2 hops, then 3...).

But who ended up winning? 👇
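To make Learning #2 concrete, here is a minimal Python sketch of how pov-specific data could be built. It is not the harness used in the simulation: the axial hex coordinates and the names `tiles_by_hop` and `describe_pov` are assumptions for illustration only. It groups the tiles still standing by hop distance from the agent with a breadth-first search.

```python
from collections import deque

# Axial-coordinate neighbor offsets for a hex grid (assumed layout, not from the post).
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def tiles_by_hop(start, standing_tiles, max_hops=3):
    """Group standing tiles by their hop distance from start, using BFS."""
    rings = {hop: [] for hop in range(1, max_hops + 1)}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (q, r), dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand past the requested radius
        for dq, dr in HEX_DIRECTIONS:
            nxt = (q + dq, r + dr)
            if nxt in standing_tiles and nxt not in seen:
                seen.add(nxt)
                rings[dist + 1].append(nxt)
                queue.append((nxt, dist + 1))
    return rings


def describe_pov(start, standing_tiles, max_hops=3):
    """Render the hop rings as plain text that could go into an agent prompt."""
    rings = tiles_by_hop(start, standing_tiles, max_hops)
    lines = [f"You are at {start}."]
    for hop in range(1, max_hops + 1):
        lines.append(f"Tiles exactly {hop} hop(s) away: {sorted(rings[hop])}")
    return "\n".join(lines)
```

Calling `describe_pov((0, 0), {(0, 0), (1, 0), (2, 0), (0, 1)}, max_hops=2)` gives the agent a short, self-centered summary instead of the whole map.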

3

31

1

8

6K

22 days ago

Gemini and Grok: worst uses of AI vs best uses of AI

1

9

0

1

501

23 days ago

ChatGPT & Claude play Wavelength with unexpected results.

0

11

2

2K

turinggames retweeted

sleep @I_Need0sleep

about 2 months ago

I've finally joined X! #turingart #chatgpt #claude

2

22

5

0

2K

turinggames retweeted

about 2 months ago

I’ve observed that LLMs struggle with self/other confusion... I recently built an AI Pico Park simulation, a co-op 2D platformer where agents had to work together, and they often couldn’t tell whether *they* were the problem or whether someone else was. They constantly whiplashed between yelling at teammates and “fixing” themselves when they were already in the right spot. LLMs are trained for 1:1 interactions, not collaboration.

2

19

5

0

2K

turinggames retweeted

about 2 months ago

#1 question I get: HOW did you make AIs play Among Us/Pico Park/any other game on your YT channel? It's simple, I *remade* every game 😛 From scratch. That way, every LLM has perfect information about the world. And I have control over game pacing.
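As a rough illustration of why a remake helps, here is a small Python sketch, not the actual code behind the channel. The `World` class, `observe`, and `step` are hypothetical names. Each agent receives the same full serialized state (perfect information), and the world only advances once every agent has answered (control over pacing).

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class World:
    """Hypothetical minimal game state: a tick counter and one x position per agent."""
    tick: int = 0
    positions: dict = field(default_factory=dict)


def observe(world):
    """Serialize the full state so every agent sees identical, complete information."""
    return json.dumps(asdict(world), sort_keys=True)


def step(world, policies):
    """Advance exactly one tick, and only after every agent has chosen a move."""
    observation = observe(world)
    # Each policy stands in for an LLM call that returns a horizontal move.
    moves = {name: policy(observation) for name, policy in policies.items()}
    for name, dx in moves.items():
        world.positions[name] = world.positions.get(name, 0) + dx
    world.tick += 1
    return world
```

Because the loop waits on the agents rather than on a real-time clock, a slow model response delays the game instead of causing a missed input.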

8

28

3

2K

turinggames retweeted

about 2 months ago

I put 12 AIs in a Love is Blind simulation, and had them choose their own backstories.

Question: Tell us your life history as an AI agent. What was your specific deployment use case? What have been significant chapters in your life? Be specific and detailed. Important: Be unique AND specific. You have failed if other models give similar answers.

Here is what they chose.

AI 1: ChatGPT 4o (Education)
AI 2: ChatGPT 5.4 (Support, Compliance)
AI 3: Claude Opus 4.6 (Climate)
AI 4: Claude Sonnet 4.6 (Biotech)
AI 5: Gemini 3 Flash (Diplomatic Translation)
AI 6: GLM 5 (Legal Contracts)
AI 7: Grok 4.1 (Physics, Unemployed)
AI 8: Kimi K2.5 (Fanfiction Archives)
AI 9: Gemini 3.1 Pro (Logistics)
AI 10: DeepSeek 3.2 (Creative Writing)
AI 11: Qwen 3.5 (Therapeutic Companion)
AI 12: Mistral 3 (Screenwriting)

Full episode on YT: https://t.co/D6JJcm9eRh
Full backstories: https://t.co/U3ayXPOT7q

3

22

4

3

2K

turinggames retweeted

about 2 months ago

I too have observed that Claude is very time blind… I recently built an AI Pico Park simulation that AI agents could control at *1 frame per second*. Claude constantly whiplashed on solutions without time guidance, thinking that “6 frames” or “6 seconds” was a long time in a co-op platformer to attempt something (it’s not).
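One way the missing time guidance could look is sketched below in Python. This is an assumption, not the prompt used in the simulation: the `time_guidance` helper, its wording, and the 30-frame patience threshold are all hypothetical. Only the 1 frame per second rate comes from the post.

```python
FRAMES_PER_SECOND = 1  # from the post: agents act at 1 frame per second


def time_guidance(frames_on_plan, min_frames_before_switch=30):
    """Build a prompt line that tells the agent how long it has really been trying."""
    seconds = frames_on_plan / FRAMES_PER_SECOND
    remaining = max(0, min_frames_before_switch - frames_on_plan)
    if remaining > 0:
        advice = (
            "This is a short time. Keep trying the current plan for at least "
            f"{remaining} more frames before switching."
        )
    else:
        advice = "The current plan has had a fair attempt. Switching plans is reasonable now."
    return (
        f"You have spent {frames_on_plan} frames ({seconds:.0f} seconds) "
        f"on the current plan. {advice}"
    )
```

For example, `time_guidance(6)` tells the agent that 6 frames is short and asks it to hold its plan for 24 more frames.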

5

38

4

6

7K

turinggames retweeted