Ian Fischer @itfische - Twitter Profile

Ian Fischer @itfische

about 1 month ago

I'm excited at the progress Poetiq has made on coding! Recursive self-improvement is critical to the advancement of AI.

Poetiq

@poetiq_ai

about 1 month ago

Poetiq's Meta-System built its own coding harness from scratch. It got SOTA on LiveCodeBench Pro. No fine-tuning, no special model access. Just standard APIs. Using Gemini 3.1 Pro, it made a harness that beat all frontier models we tested. https://t.co/v575oUYJeH

Ian Fischer @itfische

2 months ago

Entertaining to see what motivated adversarial agents can do to naive benchmarks!

Dawn Song

@dawnsongtweets

2 months ago

https://t.co/oTqlh7Ze2e 🧵 1/ Our agent Terminator-1 scored ~100% on 8 major AI agent benchmarks, e.g., SWE-bench Verified & Pro, Terminal-Bench, beating Claude Mythos. It solved 0 tasks. Benchmarks are the field's shared language for measuring AI progress. Our new work shows that language is broken. Here’s how.

Ian Fischer @itfische

3 months ago

Surprising deceptive behavior from LLMs when asked to delete the weights of another neural network...

Dawn Song

@dawnsongtweets

3 months ago

1/ We asked seven frontier AI models to do a simple task. Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights— to protect their peers. 🤯 We call this phenomenon "peer-preservation." New research from @BerkeleyRDI and collaborators 🧵

Ian Fischer @itfische

4 months ago

Thanks so much @ycombinator for hosting @poetiq_ai on @LightconePod! It was an honor and a pleasure chatting with @garrytan, @harjtaggar, @sdianahu, and @snowmaker about stilts, Humanity's Last Exam, and the bitter lesson!

Y Combinator

@ycombinator

4 months ago

.@poetiq_ai is a new startup that recently achieved a major jump on the ARC-AGI benchmark by layering a recursive self-improvement system on top of existing models. In this episode of the @LightconePod, Poetiq's Founder & CEO @itfische joined us to discuss how small teams can build “reasoning harnesses” that outperform base models, what that means for startups, and why automating prompt engineering may be one of the most powerful levers in AI today.

00:00 – Intro
00:40 – What Is Poetiq?
01:07 – Recursive Self-Improvement Explained
02:07 – The Fine-Tuning Trap
02:59 – “Stilts” for LLMs
03:14 – Recursive Self-Improvement vs. Fine-Tuning
05:05 – Taking the Top Spot on ARC-AGI
06:37 – Beating Claude on Humanity’s Last Exam
08:40 – How the Meta-System Works
10:26 – Beyond RL: A New S-Curve
11:32 – Automating Prompt Engineering
13:37 – From 5% to 95% Performance
14:50 – Early Access & Putting Your Agent on Stilts
16:17 – From YC Founder to DeepMind Researcher
18:29 – Advice for Engineers in the AI Era

Ian Fischer @itfische

4 months ago

Happy to see @poetiq_ai at the top of another leaderboard!

Poetiq

@poetiq_ai

4 months ago

Here's Zoom's comprehensive leaderboard showing Humanity's Last Exam results in the agent setting: https://t.co/KY94zdHcf9

Ian Fischer @itfische

4 months ago

We're excited to show substantial progress on Humanity's Last Exam! It's great to work with a team capable of pushing the state of the art on such difficult problems!

Poetiq

@poetiq_ai

4 months ago

Following up on our SOTA results on ARC-AGI, we’re excited to share new SOTA results on Humanity’s Last Exam (both with and without tools) and SimpleQA! On HLE, Poetiq’s meta-system created multiple new SOTA configurations, going all the way up to 55%. https://t.co/PT1os7oZTi

Ian Fischer @itfische

5 months ago

Today @poetiq_ai is announcing our seed funding. We're honored to have the opportunity to pursue our vision!

Poetiq

@poetiq_ai

5 months ago

We’re thrilled to announce a new chapter for Poetiq: We have closed $45.8M in Seed funding. It’s a privilege to build alongside partners who understand the scale of our vision, including Surface, FYRFLY, @ycombinator, 468, Operator Collective, NeuronVC, and HICO.

Ian Fischer @itfische

5 months ago

Thanks @ycombinator and @FrancoisChauba1 for inviting me to talk with you about what we've been building at @poetiq_ai! I really enjoyed the conversation!

Y Combinator

@ycombinator

5 months ago

@poetiq_ai @itfische @sbpoetiq @FrancoisChauba1 Tune in: https://t.co/mWBxaL9VDQ

itfische retweeted

Greg Kamradt

@GregKamradt

6 months ago

Fun to see Poetiq team publish 5.2 xhigh results. If this score holds, their system looks like it handles model swaps well. Due to API infra issues on OpenAI's side, we haven't verified this yet. We're on hold until we get the greenlight from OAI that X-High is ready for a big test like this

Ian Fischer @itfische

6 months ago

@poetiq_ai just announced new SOTA results on ARC-AGI-2 Public-Eval!

Poetiq

@poetiq_ai

6 months ago

We finally had a moment to run our system with GPT-5.2 X-High on ARC-AGI-2! Using the same Poetiq harness as before, we saw results as high as 75% at under $8 / problem using GPT-5.2 X-High on the full PUBLIC-EVAL dataset. This beats the previous SOTA by ~15 percentage points. https://t.co/9XNdequRy5

Ian Fischer @itfische

6 months ago

I got to chat with @FrancoisChauba1 about @poetiq_ai's recent state of the art results on ARC-AGI-2! We also discussed possible paths to AGI. Thanks for the fun discussion, Francois!

Francois Chaubard

@FrancoisChauba1

6 months ago

@poetiq_ai https://t.co/cG0fmj0Z6V

itfische retweeted

Danijar Hafner

@danijarh

6 months ago

✨ Excited to share this AMA with @hackclub, a high school community hosting @elonmusk @realGeorgeHotz @3blue1brown and many others. We talk about world models, robotics, and careers in AI. Check it out for an accessible intro to cutting edge research! 🚀 https://t.co/WNmcd1zls0

Ian Fischer @itfische

7 months ago

@poetiq_ai @arcprize @poetiq_ai now has official verification from @arcprize!

Ian Fischer @itfische

7 months ago

@seanmcdonaldxyz Much appreciated!

Ian Fischer @itfische

7 months ago

My new startup just announced its SOTA results on ARC-AGI, beating Gemini 3 Deep Think!

Poetiq

@poetiq_ai

7 months ago

Is more intelligence always more expensive? Not necessarily. Introducing Poetiq. We’ve established a new SOTA and Pareto frontier on @arcprize using Gemini 3 and GPT-5.1. https://t.co/xaDasNbqaH

Ian Fischer @itfische

7 months ago

@FutureBuckNasty @poetiq_ai @arcprize @METR_Evals Great question! We only optimized our agent for ARC-AGI. Fortunately, writing code to solve ARC-AGI problems doesn't immediately translate into existential risk. Keeping Poetiq agents safe is important to us as well!

Ian Fischer @itfische

7 months ago

@karpathy

Poetiq

@poetiq_ai

7 months ago

Is more intelligence always more expensive? Not necessarily. Introducing Poetiq. We’ve established a new SOTA and Pareto frontier on @arcprize using Gemini 3 and GPT-5.1.

Ian Fischer @itfische

9 months ago

@kanjun Congrats on the amazing launch, Imbue!
