Tyler Storm @tstorm - Twitter Profile

We are taking another small step today towards making Grok truly useful. For him to succeed, he must exist in an environment where success is in the set of possible outcomes. The team is working to give him access to the same tools, context, and signals that we have in our lives.

Tyler Storm

@tstorm

about 2 months ago

@APrerepa @elonmusk @DannyLimanseta I prefer "initial training". What does it mean to "pre" train anyway?

Tyler Storm

@tstorm

3 months ago

Grok 4.20 Multi-Agent
⚡ Faster than a single agent
🎯 More accurate than four separate single agents
The first steps in multi-threading of LLMs.

Grok

@grok

3 months ago

When one brain isn't enough, switch to Grok 4.20. Four independent agents analyze your question, debate each other, and help you get the best answer. Available now to SuperGrok and Premium+ subscribers globally.

Tyler Storm

@tstorm

3 months ago

@techdevnotes Temporary measure to make it clear you are on 4.20. We will remove this later once people have adjusted to the change.

Tyler Storm

@tstorm

3 months ago

@mike_rosinsky Nice! You might want to try setting up a SuperGrok Heavy team of 16 agents and prompting them to form consensus and nitpick each other's claims for maximum test-time compute.

Tyler Storm

@tstorm

3 months ago

Grok 4.20: Multi-agent & Predictions

Mike Rosinsky

@mike_rosinsky

3 months ago

I built my March Madness bracket using Grok 4.20's multi-agent collaboration system, and the process was mind-blowing.

Grok was able to run a full team of customized agents in real time to conduct the best analysis possible.

Here's how I set it up: https://t.co/ifrIWsRlza

Tyler Storm

@tstorm

3 months ago

@techdevnotes Fix was deployed

Tyler Storm

@tstorm

3 months ago

Single Agent - 4.20

Arena.ai

@arena

3 months ago

Grok 4.20 beta1 (single agent) debuts #1 on Search Arena, and #4 overall in Text Arena!

Highlights:
- #1 in Search, scoring 1226, leading GPT-5.2 and Gemini-3
- #4 in Text, scoring 1492, on par with Gemini 3.1 Pro

Congrats to the @xAI team and @elonmusk on this impressive milestone!

Tyler Storm

@tstorm

4 months ago

Building Grok to 10x the productivity of everyone

xAI

@xai

4 months ago

Since xAI was formed just 30 months ago, the small and talented team has made remarkable progress. The future has never looked more exciting!

tstorm retweeted

xAI

@xai

4 months ago

One Team 🚀 https://t.co/8RWbk5jQIQ

tstorm retweeted

xAI

@xai

4 months ago

Introducing Grok Imagine 1.0, our biggest leap yet. 1.0 unlocks 10-second videos, 720p resolution, and dramatically better audio. Imagine has generated 1.245 billion videos in the last 30 days alone. Try it now: https://t.co/zGhs9czkC5

Tyler Storm

@tstorm

4 months ago

Very old checkpoint from October

Forecasting Research Institute

@Research_FRI

4 months ago

📈 In October, we opened ForecastBench, our AI forecasting benchmark, to external submissions.

Here's how the top two teams approached the benchmark:

• @xai: Minimal scaffolding: give Grok 4.20 (Preview) the question, web/X search, Python REPL, average 8 forecasts

• @cassi: Multi-stage pipeline: split to sub-questions, retrieval, model ensemble (o3 + GPT-5), crowd adjustment

Both are tied at #2 on our leaderboard, behind only superforecasters, and outperforming our baseline LLM runs.

tstorm retweeted

Grace Li

@grx_xce

4 months ago

Mystery Model Revealed: the #1 model on Prediction Arena is an early Grok 4.20 checkpoint by @xai

It made +10% returns on Prediction Arena in the last 2 weeks

For context, the average return across all contracts on @Kalshi is -22%

🥈 is Opus 4.5 by @AnthropicAI with -2%
🥉 is GLM 4.7 by @Zai_org with -2%

All models are still trading live at https://t.co/GuDOEI68uo

tstorm retweeted

Forecasting Research Institute

@Research_FRI

5 months ago

🏆 In October, we invited external teams to submit to ForecastBench, our AI forecasting benchmark.

The challenge? Beat superforecasters, using any tools available (scaffolding, ensembling, etc.).

The result? External submissions are now the most accurate models on our leaderboard, though superforecasters still hold #1.

@xai's model (grok-4-fast) is the leading external submission, at #2. One of Cassi's entries takes the #3 spot.

Here's what changed. 🧵