Jian Mo @pythagodzilla - Twitter Profile

Jian Mo

@pythagodzilla

2 days ago

@Mkstv17 @a1zhang https://t.co/kFumBiWKsM this was two years earlier

0

13

Jian Mo

@pythagodzilla

2 days ago

@Vtrivedy10 I feel this idea of using a code agent for search has been around for about two years https://t.co/kFumBiWKsM

0

42

pythagodzilla retweeted

Intology

@intology

16 days ago

Can coding agents do research?

We release NanoGPT-Bench, an internal eval we’ve used to test agents on an AI R&D problem with months of human progress

Codex, Claude Code, Autoresearch recover only 9.3% of human progress, mostly tuning hyperparams & ignoring algorithmic research

NanoGPT-Bench is built on the NanoGPT Speedrun, a popular LLM pretraining competition to minimize the training time of a GPT-2 style model. Existing human submissions constitute nearly 2 years of work. To control for dependencies and contamination in frontier models, we standardize evaluation to a 5-month window of world records. Evaluation is fully autonomous and end-to-end, with no human intervention or internet access. 🧵

22

276

61

172

144K

pythagodzilla retweeted

Jiayi Weng

@Trinkle23897

27 days ago

Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo. Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm. https://t.co/1ZaIneleuW

63

1K

234

1K

3M

Jian Mo

@pythagodzilla

about 1 month ago

@nicbstme @Vtrivedy10 I think harness work still matters even with AGI. If the harness controls all the context flowing into the model, then it controls what the model can perceive. It doesn’t matter how good the driver (AGI or human) is — put them behind the wheel blindfolded and they’ll still crash.

0

23

pythagodzilla retweeted

Viv

@Vtrivedy10

about 1 month ago

ok so single steering instruction with gpt-5.5 produces ~12% change in Terminal Bench Score 🙃 pls read this thread to see why Prompt and Harness Engineering still matter A TON today with our current level of model intelligence :) looking at your Evals & Traces is how teams can measure the effects of these small changes 👀

5

47

5

27

4K

Jian Mo

@pythagodzilla

about 2 months ago

@Vtrivedy10 sharp as always

0

42

pythagodzilla retweeted

Aimar Haddadi

@AdvicebyAimar

about 2 months ago

i can spot a grifter from miles away. so i dug into the code to figure out if this is legit or not. guess i was right. ben is a crypto founder who runs some weird bitcoin lending platform, i was pretty sure he knows absolutely nothing about ai and memory so i tracked down the repo myself since i was curious. his website says he likes to build ai powered products and train local ai models? sure man, 80% of your github repos are bitcoin related stuff. only one ai related project came up you forked in 2024. mempalace has 10k github stars, more than 1k forks but only.. 7 commits? apparently the best memory layer to date? no git author history, no account connected to whoever wrote the code of this codebase. it doesn't add up.. the account who pushed the original repo, named: aya-thekeeper, under aya-thekeeper/mempal got deleted right after the repo got published. you paid a random guy named lu to build this shit out for you. ( "Written by Lu (DTL) — March 24, 2026. For: Ben." ) - benchmark md file. lu wrote the code. lu wrote the benchmarks. lu is nowhere in the readme. or mentioned in the github history? the git history then got squashed to one commit and published under milla jovovich? seriously? an actress? you say she is a great friend of yours, she has been building this project with you. she does this at night. yet she has.. 7 commits and only 2 active days in her entire github history? you paid an actress and a random guy to promote a product you know absolutely nothing about.

344

6K

700

2K

760K

Jian Mo

@pythagodzilla

2 months ago

@SakanaAILabs Congrats! Excited to see DAGs taking off as an agent orchestration layer. We ran similar experiments in https://t.co/FnEKVqFrQx

0

1

3

303

Jian Mo

@pythagodzilla

3 months ago

@russ98593 @Vtrivedy10 I'm building a framework that does exactly this. Let's see if it works

0

1

0

16

Jian Mo

@pythagodzilla

3 months ago

@kscottz Bro is taking drugs to write "lines of" markdown files. Crazy time to be alive

0

6

0

241

Jian Mo

@pythagodzilla

3 months ago

@michaelandregg did you train the neural net in the RL env or not?

0

690

Jian Mo

@pythagodzilla

3 months ago

@trillhause_ @maxvonhippel It's a glorified codeact

0

1

0

30

Jian Mo

@pythagodzilla

3 months ago

@theo Honestly, Elixir is perfect for a multi-agent/actor runtime

0

334

pythagodzilla retweeted

Jian Mo

@pythagodzilla

3 months ago

I've been digging into the RLM paper for the past few days, trying different ways to implement it across coding agents. We ended up building a skill that makes coding agents RLM'ish but even better: a persistent REPL scratchpad for any complex task. Variables survive across turns, only print() enters context. The skill you can use: https://t.co/ZQ9S3jS55N Details in: https://t.co/Wg7DgJYAIs
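A minimal sketch of the scratchpad idea described in the tweet above, assuming nothing about the linked skill's actual implementation: code from each turn runs in one shared namespace, so variables persist, and only captured print() output is returned to the caller (standing in for the model's context). The `Scratchpad` class and its `run` method are hypothetical names chosen for illustration.

```python
import contextlib
import io


class Scratchpad:
    """Hypothetical persistent REPL: one namespace shared across turns."""

    def __init__(self):
        # Variables defined in one turn stay here for later turns.
        self.namespace = {}

    def run(self, code: str) -> str:
        """Execute code and return only what it printed to stdout."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


pad = Scratchpad()

# Turn 1: build a large intermediate value; nothing is printed,
# so nothing would enter the context.
out1 = pad.run("rows = list(range(1000))\ntotal = sum(rows)")

# Turn 2: the earlier variables are still available; only the
# printed summary comes back.
out2 = pad.run("print(total)")
```

Here `out1` is an empty string while `rows` and `total` remain in `pad.namespace`, so a later turn can print just the summary it needs. Running untrusted code through `exec` is unsafe outside a sandbox; this sketch ignores isolation entirely.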

0

3

1

0

92

Jian Mo

@pythagodzilla

3 months ago

@Donogzs Only what's needed for the answer is printed out; everything else stays in the REPL

0

41

Jian Mo

@pythagodzilla

3 months ago

https://t.co/5Lvdh0EI2M

2

3

0

2

374

Jian Mo

@pythagodzilla

3 months ago

@Timcast retarded?

0

5

Jian Mo

@pythagodzilla

3 months ago

@MarcJBrooker map-reduce it into pieces and send them to the GPU

0

16
