Jonathan Roberts @JRobertsAI - Twitter Profile

Pinned Tweet

Jonathan Roberts @JRobertsAI

over 1 year ago

Is computer vision “solved”?

Not yet

Current models score 0% on ZeroBench 🧵1/6

Jonathan Roberts @JRobertsAI

about 1 month ago

@J3m5Dev @sama The cost is roughly $50-60 for 100 questions. The mean time per question is approx. 5 mins

Jonathan Roberts @JRobertsAI

about 1 month ago

GPT-5.5 (xhigh) sets a new pass^5 high score on ZeroBench

pass@5: 22% (SOTA 23%)
pass^5: 10% (prev. SOTA 8%)

Best 5/5 reliability so far

Strong result from @sama and the OpenAI team

Jonathan Roberts @JRobertsAI

about 1 month ago

The matrix shows how often the models answered each question correctly across 5 samples

Leaderboard: https://t.co/NouEsFxJEM
Data: https://t.co/mD8Eptr9M5
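The sketch below shows how the two ZeroBench scores can be read off the per-question matrix. It assumes that each entry for a model is the number of correct answers out of 5 samples. Under that assumption, pass@5 counts a question as solved if at least one sample is correct, and pass^5 counts it only if all 5 are correct (the "5/5 reliability" mentioned earlier). The function name `pass_metrics` and the example counts are hypothetical. This is not the official ZeroBench scoring code, and the numbers are not real data.

```python
from typing import Sequence


def pass_metrics(correct_counts: Sequence[int], k: int = 5) -> tuple[float, float]:
    """Return (pass@k, pass^k) as percentages over all questions.

    correct_counts[i] is the number of correct answers for question i
    out of k samples (assumed layout: one model's entries in the matrix).
    """
    if not correct_counts:
        raise ValueError("need at least one question")
    if any(c < 0 or c > k for c in correct_counts):
        raise ValueError(f"each count must be between 0 and {k}")

    n = len(correct_counts)
    # pass@k: the question counts as solved if at least one sample is correct
    solved_any = sum(1 for c in correct_counts if c >= 1)
    # pass^k: the question counts only if every one of the k samples is correct
    solved_all = sum(1 for c in correct_counts if c == k)
    return 100 * solved_any / n, 100 * solved_all / n


# Hypothetical counts for 10 questions (illustrative, not real ZeroBench data)
counts = [0, 0, 5, 1, 0, 3, 0, 5, 0, 0]
print(pass_metrics(counts))  # (40.0, 20.0)
```

pass^5 requires 5/5 correct answers, so it can never exceed pass@5 on the same questions. The gap between the two reflects questions that a model solves only some of the time.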

Jonathan Roberts @JRobertsAI

about 2 months ago

Leaderboard: https://t.co/E4noN7yDDM
Data: https://t.co/mD8Eptr9M5

Jonathan Roberts @JRobertsAI

about 2 months ago

The Claude models are great for coding

But on visual reasoning they still trail the frontier

On ZeroBench (pass@5 / pass^5, in %):
Opus 4.7 (xhigh) - 14 / 4
Opus 4.6 - 11 / 2
GPT-5.4 (xhigh) - 23 / 8

JRobertsAI retweeted

Kai Han @kaihan_x

about 2 months ago

Muse Spark scores 33% pass@5 on our ZeroBench 🚀

Glad to see models getting further away from "zero". https://t.co/9BF6krZkPY

Jonathan Roberts @JRobertsAI

about 2 months ago

👀 Muse Spark scores 33% pass@5 w/ Python on ZeroBench

Alexandr Wang

@alexandr_wang

about 2 months ago

1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

Jonathan Roberts @JRobertsAI

2 months ago

Project page w/ leaderboard: https://t.co/2EOoXu1qnH

Jonathan Roberts @JRobertsAI

2 months ago

📣📣 New SOTA for GRAB-lite, our graph analysis benchmark

GPT-5.4 crushes Opus 4.6: 71.0% vs 45.6%

Impressive work from @gdb and team

Updated leaderboard now live

Jonathan Roberts @JRobertsAI

3 months ago

Following yesterday's update, the @warpsurfai Node SDK now also supports @usekernel

Fast remote browser sessions for JavaScript and TypeScript

Jonathan Roberts @JRobertsAI

3 months ago

https://t.co/YwuvCyFtbm

Jonathan Roberts @JRobertsAI

3 months ago

The @warpsurfai Python SDK now has @usekernel support

You can now run warpsurf from Python on crazy fast browser sessions

Jonathan Roberts @JRobertsAI

3 months ago

https://t.co/xkStgfBStM

Jonathan Roberts @JRobertsAI

3 months ago

A small update for @warpsurfai

warpsurf now fully supports the Brave browser

Jonathan Roberts @JRobertsAI

3 months ago

https://t.co/UV11MKyRx7

Jonathan Roberts @JRobertsAI

3 months ago

Following the Python SDK, Browserbase support is now live in the warpsurf Node SDK

Run warpsurf on remote browsers from Node.js

@warpsurfai x @browserbase

Jonathan Roberts @JRobertsAI

3 months ago

https://t.co/YwuvCyFtbm

Jonathan Roberts @JRobertsAI

3 months ago

warpsurf can now run on remote browsers

The warpsurf Python SDK now supports Browserbase

Run warpsurf without managing the browser infrastructure yourself

@warpsurfai x @browserbase

Jonathan Roberts @JRobertsAI

3 months ago

Node SDK: https://t.co/UV11MKyRx7
warpsurf: https://t.co/Ka73nFEqIR
Project page: https://t.co/xkStgfBStM

Jonathan Roberts @JRobertsAI

3 months ago

There’s now a Node SDK for @warpsurfai

Run browser workflows from TypeScript on Node.js

Use it in your own scripts, tools, and pipelines

Details and repos below 👇
