Spreadsheet Arena @sheetarena - Twitter Profile

Pinned Tweet

4 months ago

Spreadsheets have entered the arena! ⚔️ Announcing Spreadsheet Arena, the first research platform for human preference rankings on LLM-generated spreadsheets. The results? @AnthropicAI Claude Opus is on top, but the gap is tighter than you’d think. w/ @LTIatCMU, @Cornell, and @scale_ai. 🧵

2

36

5

22

20K

Spreadsheet Arena

@sheetarena

4 months ago

⚔️BREAKING: Gemini 3.1 Pro Preview debuts as #7 overall on Spreadsheet Arena, trailing Gemini 3 Pro by @GoogleDeepMind which currently stands at #6

0

8

0

220

Spreadsheet Arena

@sheetarena

4 months ago

⚔️BREAKING: Claude Sonnet 4.6 by @AnthropicAI debuts at #2 in Spreadsheet Arena, trailing Opus 4.6!

0

8

1

547

Spreadsheet Arena

@sheetarena

4 months ago

Sonnet 4.6 is now on Spreadsheet Arena! How well can it model Anthropic's Series G?

Claude

@claudeai

4 months ago

This is Claude Sonnet 4.6: our most capable Sonnet model yet. It’s a full upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. It also features a 1M token context window in beta.

1K

22K

2K

5K

8M

0

9

1

3

1K

Spreadsheet Arena

@sheetarena

4 months ago

⚔️BREAKING: Claude Opus 4.6 by @AnthropicAI debuts at #1 in Spreadsheet Arena, surpassing Opus 4.5!

0

22

2

7

2K

Spreadsheet Arena

@sheetarena

4 months ago

TL;DR: Spreadsheet generation is multi-dimensional. Human preference data captures what users actually value, but different dimensions matter across domains, and some signals surface more clearly than others. Spreadsheet Arena gives us a powerful foundation for evaluation, and a new lens for improving post-training. Start a battle at https://t.co/MHGdWX6xxm Read the paper at https://t.co/6GX4ify8CS @srkundurthy @claranahhh @Zachkirshner @calvincbzhang @ManasiSharma_ @jhnling

1

10

1

0

725

Spreadsheet Arena

@sheetarena

4 months ago

Feature effects don’t generalize across domains. Finance color coding conventions (e.g., blue inputs, black formulas) aren't significantly impactful on model rankings arena-wide. But zoom into Finance prompts and it's the single strongest predictor of winning. Even then, expert raters disagree with crowd preferences nearly half the time.

1

7

0

604
