Eric Pang

Verified account

@_eric_pang_

math/cs @uwaterloo | prev: ml @quora, @amazon

San Francisco, CA

Joined June 2020

753 Following

1.5K Followers

113 Posts

Pinned Tweet

9 months ago

Here's how I (almost) got the high scores in ARC-AGI-1 and 2 (the honor goes to @jeremyberman) while keeping the cost low. To put things into perspective: o3-preview scored 75.7% on ARC-AGI-1 last year while spending $200/task on the low setting. My approach scores 77.1% while spending $2.56!

@arcprize

9 months ago

New SOTA on ARC-AGI

- V1: 79.6%, $8.42/task
- V2: 29.4%, $30.40/task

Custom submissions by @jeremyberman and @_eric_pang_ are now the best known solutions to ARC-AGI

Both:
* Are open source
* Use Grok 4
* Implement program-synthesis outer loops with test-time adaptation https://t.co/C2JOx32Yeb

145

2K

255

605

8M

27

881

91

535

136K

7 months ago

@karpathy @goakhmad Agree with most points except that the golden age of movies started in the 80s. imo 70s Hollywood was the most experimental, with the death of the counterculture movement and the end of the Hays Code. Obviously, worldwide cinema had a different peak period as well.

0

11

0

1

7K

9 months ago

@pitdesi @julianweisser Coppola loves to give all-timer quotes at Cannes. https://t.co/u6EttlnyD8

John Frankensteiner @JFrankensteiner

about 3 years ago

Coppola showing up at Cannes 1979 with Apocalypse Now, still mostly insane from being in the jungle too long, just spitting bars is what it's all about

40

13K

2K

2K

2M

0

1

0

0

47

9 months ago

@ADarmouni 20 tasks out of which dataset?

1

0

0

1

153

9 months ago

Thanks for the coverage! My architecture graph does not have a typo: when it's evaluating on the public eval set, the actual test outputs are given, so the system does check if the best program gets 100% on test examples. You are right that we don't know the answers for the submission run.

1

2

0

1

178

9 months ago

@FraserGreenlee Yes, I think this point is underdiscussed. My solution has higher accuracy and lower cost per task on ARC-1 compared to the average human.

0

14

2

0

340

9 months ago

@Simeon_Cps Replied to you there. DMs are also open now.

0

1

0

0

34

9 months ago

The same reason is why ARC-AGI is the most important benchmark in AI. It is the only benchmark that's not saturated after repeated attempts from players big and small. https://t.co/yESXdRyP1J

1

4

0

0

558

_eric_pang_ retweeted

9 months ago

Grok 5 starts training in a few weeks

3K

31K

3K

1K

7M

9 months ago

@yechan_ai @jeremyberman Thank you!

0

2

0

0

639

9 months ago

@martinbowling Thanks!

0

3

0

0

113

9 months ago

That's right: when the system attempts the first task, it skips the program-fetching step since the library is initially empty. If you want to see how the library evolves, check out https://t.co/0mz7O2mClG. This is the resulting library after the system attempts the ARC-2 public training set to build Knowledge Priors.

0

2

0

0

133
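The library behaviour described in the reply above (the fetch step is skipped while the library is still empty, and the library grows as tasks are solved) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the open-sourced implementation: `ProgramLibrary`, `fetch`, and `attempt_task` are hypothetical names, and candidate programs are passed in as plain Python functions rather than generated by a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Program = Callable[[Grid], Grid]


@dataclass
class ProgramLibrary:
    """Hypothetical store of programs that solved earlier tasks."""
    programs: List[Program] = field(default_factory=list)

    def fetch(self, limit: int = 3) -> List[Program]:
        # Return up to `limit` programs, most recently added first.
        return list(reversed(self.programs))[:limit]


def attempt_task(
    library: ProgramLibrary,
    train_pairs: List[Tuple[Grid, Grid]],
    candidates: List[Program],
) -> Optional[Program]:
    """Return the first program that reproduces every training pair."""
    # On the very first task the library is empty, so fetching is skipped.
    fetched = library.fetch() if library.programs else []
    for program in fetched + candidates:
        if all(program(x) == y for x, y in train_pairs):
            # Keep successful programs so later tasks can reuse them.
            if program not in library.programs:
                library.programs.append(program)
            return program
    return None
```

In this sketch, a program that solves one task is stored and tried first on later tasks, which is one simple way a library could accumulate reusable priors over a training set.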

9 months ago

@rnadomaccount11 @jeremyberman Thank you for the kind words!

0

1

0

0

457

9 months ago

@facundo_fagalde @jeremyberman Thank you!

0

1

0

0

524

9 months ago

@K3ithAI @jeremyberman On the semi-private eval set:
- ARC-AGI-1: 77.1%, $2.56/task
- ARC-AGI-2: 26.0%, $3.97/task

1

9

0

0

968

9 months ago

@MLStreetTalk @jeremyberman Thank you! I am a huge fan.

1

1

0

0

1K

9 months ago

@vr4300 @jeremyberman Thank you!

0

2

0

0

959

9 months ago

@thomasLe_e Thanks!

0

1

0

0

103

9 months ago

@joshlee361 @arcprize @jeremyberman Check out https://t.co/wuAGzJlbuh

Hyperbolic @hyperbolic_labs

10 months ago

Excited to announce Hyperbolic's partnership with the ARC Prize (@arcprize), a groundbreaking competition pushing the frontiers of AGI! Receive up to $1000 in compute credits. 🧵 https://t.co/MlmvNouDHh

4

34

4

11

15K

0

1

0

1

104

9 months ago

@joshlee361 @arcprize @jeremyberman My solution is cost-efficient. It costs <$500 to fully test on the public eval set with Grok-4. You can decrease the cost further with a more lightweight model.

2

1

1

0

189

_eric_pang_ retweeted

@jeremyberman

9 months ago

I'm back at the top of ARC-AGI with my new program. I use @grok 4 and multi-agent collaboration with evolutionary test-time compute https://t.co/aPKm3OnF0D

72

1K

92

343

517K

9 months ago

My code is open-sourced! https://t.co/QhL70rvcMI

2

118

7

55

5K

9 months ago

You can read the full write-up here: https://t.co/xdjp4swOQy

1

59

2

19

5K
