Alex Boruch-Gruszecki @abgruszecki - Twitter Profile

4 days ago

@runarorama Verified against which specifications, exactly? Formal verification is far from a silver bullet which prevents all bugs.

0

15

Alex Boruch-Gruszecki @abgruszecki

about 1 month ago

Congrats to @sertealex for the first near-perfect transpiled submission to https://t.co/gB3O7BFauR, I can't think of a better trophy than the one he received! https://t.co/dMxSQb9tix

0

1

0

30

Alex Boruch-Gruszecki @abgruszecki

about 1 month ago

One takeaway from ProgramBench is that some current LLMs tend to give up too early. I'm curious what happens if we prompt models to work until timeout, since long-running autonomous agent have interesting issues! See link https://t.co/A5jBueP8ef /cc @jyangballin @KLieret

0

29

Alex Boruch-Gruszecki @abgruszecki

about 1 month ago

ProgramBench is awesome! It shows there's a lot of room to improve LLMs and agents at autonomously implementing entire programs from scratch, and it's a great hill climb target https://t.co/g5nH24WwTK

1

0

42

Who to follow

Peisen Yao

@YaoPeisen

Assistant Professor@Zhejiang University https://t.co/Elyzic0Z6P

Ben Delaware

@GhostofBendy

Professor at @PurdueCS; dad; occasional cancer fighter🎗️. Tweets about programming languages and formal methods, but mostly bird pictures.

Oliver Bračevac

@etaconversion

Researcher in CS, Scala 3 Compiler Engineer & Team Lead @epfl, Serves on the Scala Core Team and SIP committee. Agonizes over Effects, Capabilities & Ownership.

Alex Boruch-Gruszecki @abgruszecki

about 2 months ago

Can one person really use agents to manage nearly half a million lines of JavaScript, and produce a maintainable codebase? One way to find out! https://t.co/6jVkW4kspZ

0

1

0

21

Alex Boruch-Gruszecki @abgruszecki

about 2 months ago

I'll be helping David Bau run a contest which will let its participants show us how far they can push agentic coding! The task is "simple": port NetHack, over 440k lines of C and Lua, to JavaScript. Easy to verify, but still far from easy for agents. https://t.co/Myd68sKnX1

1

2

0

1

69

Alex Boruch-Gruszecki @abgruszecki

about 2 months ago

Porting NetHack seems like a great challenge for agents. The translation can be verified using pre-recorded input-output behaviors. But the feedback for the agent is sparse, the debugging chains get long, and early architectural mistakes can look good for a long time.

1

0

48

abgruszecki retweeted

François Chollet

@fchollet

3 months ago

The G in AGI stands for "general". General intelligence does not mean that you have been specifically trained for a large range of tasks. It means you can approach any NEW task and figure it out, just like humans do. If regular people can do it on their own (no guidance, no tools), why should AGI require special handholding and handcrafted instructions? If it's AGI, why would there still be a human in the loop, using their own human intelligence to guide the model on every new task?

134

1K

89

219

188K

abgruszecki retweeted

ARC Prize

@arcprize

3 months ago

Announcing ARC-AGI-3 The only unsaturated agentic intelligence benchmark in the world Humans score 100%, AI <1% This human-AI gap demonstrates we do not yet have AGI Most benchmarks test what models already know, ARC-AGI-3 tests how they learn

248

4K

576

932

743K

Alex Boruch-Gruszecki @abgruszecki

3 months ago

I'm working on scaling Agnostics to larger problems. Esolangs are an exciting angle! I'd be glad to talk more, maybe at ICLR. Some people I know also may be interested. See more about Agnostics here: https://t.co/zLYaRYfkWa

abgruszecki's tweet photo. I'm working on scaling Agnostics to larger problems. Esolangs are an exciting angle! I'd be glad to talk more, maybe at ICLR. Some people I know also may be interested.

See more about Agnostics here: https://t.co/zLYaRYfkWa https://t.co/cr7p14I9KF

0

39

Alex Boruch-Gruszecki @abgruszecki

3 months ago

Great study! LLMs fail at rare programming languages in surprising ways. It'd be interesting to study these failure modes on larger examples. Our Agnostics may help: we show how to make problems which can be solved in any PL. Happy to chat more @lossfunk! https://t.co/VaelPySxCY

Lossfunk

@lossfunk

3 months ago

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

152

2K

282

1K

1M

1

0

89

Alex Boruch-Gruszecki @abgruszecki

5 months ago

@ShriramKMurthi In the end, a human needs to verify if the codebase satisfies some real-world requirements. It’s hard to see how to escape that in the foreseeable future.

0

402

Alex Boruch-Gruszecki @abgruszecki

8 months ago

@__protected @odersky Congratulations @odersky! Since the invitation to your lab I’ve been on a non-stop adventure of my life. Seeing your efforts was formative for me. As Jonathan said: well deserved!

0

1

0

115

Alex Boruch-Gruszecki @abgruszecki

10 months ago

@eliebakouch We trained SmolLM3 during the Agnostics project, exactly to have a capable smol non-Qwen model! https://t.co/CCp7YT0qZf

0

24

Alex Boruch-Gruszecki @abgruszecki

10 months ago

The leaderboard also shows the results of training SmolLM3 using the Agnostics framework, it's a small (3B) but very capable model. The Lua variant shows the highest relative gains from all the Lua models we trained!

0

55

Alex Boruch-Gruszecki @abgruszecki

10 months ago

We're publishing the Ag-LiveCodeBench-X leaderboard! It shows the peformance of models on coding in low-resource programming languages, using a benchmark prepared during the Agnostics project. https://t.co/DIgChqKMQz

1

0

80

Alex Boruch-Gruszecki @abgruszecki

10 months ago

The leaderboard shows more results than what we included in the report. We can see that the models we trained rival Sonnet 4 on coding in R, and beat it and Qwen 3 Coder on Fortran!

abgruszecki's tweet photo. The leaderboard shows more results than what we included in the report. We can see that the models we trained rival Sonnet 4 on coding in R, and beat it and Qwen 3 Coder on Fortran! https://t.co/I3saKSyBit

1

0

90

Alex Boruch-Gruszecki @abgruszecki

10 months ago

@brendanh0gan Congrats on your impressive results! We published a similar report recently, although we focused on developing a universal pipeline which works on any programming language. I'm curious to see how we can learn from each other! https://t.co/EtxXybykkB

Alex Boruch-Gruszecki @abgruszecki

11 months ago

We show a way to reinforce an LLM’s ability to code in *any* programming language! We turn Qwen 3 4B and 8B into SOTA ≤16B models for low-resource programming languages, rivaling their 32B sibling. Find out more about our Agnostics project at https://t.co/AFhqJyaelR , or here👇

abgruszecki's tweet photo. We show a way to reinforce an LLM’s ability to code in *any* programming language! We turn Qwen 3 4B and 8B into SOTA ≤16B models for low-resource programming languages, rivaling their 32B sibling.

Find out more about our Agnostics project at https://t.co/AFhqJyaelR , or here👇 https://t.co/HdVbtTXMnM

1

13

2

12

5K

1

2

0

1

232

Alex Boruch-Gruszecki @abgruszecki

10 months ago

@Laz4rz This is Switzerland? It’s part of the experience I’m afraid. During my stay at EPFL something like this happened a few times each year

0

1

0

45

Alex Boruch-Gruszecki @abgruszecki

10 months ago

@ezyang I might have something which will help :) https://t.co/EtxXybykkB

Alex Boruch-Gruszecki @abgruszecki

11 months ago

We show a way to reinforce an LLM’s ability to code in *any* programming language! We turn Qwen 3 4B and 8B into SOTA ≤16B models for low-resource programming languages, rivaling their 32B sibling. Find out more about our Agnostics project at https://t.co/AFhqJyaelR , or here👇

1

13

2

12

5K

0

3

0

39

Alex Boruch-Gruszecki

@abgruszecki

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users