Ilya A. @ain92ru - Twitter Profile

Pinned Tweet

Ilya A. @ain92ru

over 1 year ago

The flag under which Syrian Arab Army (probably) goes down in history #Syria

8

1K

136

62

79K

Ilya A. @ain92ru

about 5 hours ago

@tikgiau September 2025 is quite recent and it's difficult to get an accurate measurement from such a short period, could you please benchmark much older models, preferably into 2024? Maybe you could also do a separate open-weights frontier

0

12

ain92ru retweeted

Deyao Zhu

@tikgiau

1 day ago

Introducing EdgeBench, a benchmark designed to study how agents learn from environments over at least 12~72-hour runs. We find that performance follows a log-sigmoid function of environment interaction time with high precision. EdgeBench is built with three ingredients: - 🌍 Real & Diverse: 134 real-world tasks across 6 task categories, spanning scientific problems, professional knowledge work, software engineering, optimization, formal math, and games. - ⏳ Ultra-Long-Horizon: Each task supports 12–72 hours of agent work. Recorded human effort averages 57.2 hours. - 🔁 Informative Feedback: Agents receive real-world feedback for continuous improvement. After 38,000 hours of agent runs on EdgeBench, a scaling law for learning from environments emerges: - 📈 As agents interact with task environments over time, their aggregate performance is precisely fit by a log-sigmoid function. - 🧠 This phenomenon can be explained by an elegant theory of graph exploration. We are releasing an initial 51 of the 134 tasks, together with the full evaluation framework, to help advance long-horizon agent research. Check our blog & paper for more findings! Blog https://t.co/nMOzFsOhbT Paper https://t.co/rZb3eWuvik GitHub https://t.co/oemXd4UrFw Dataset https://t.co/P4SQMrM47o Details below 👇🧵

35

758

142

539

225K

ain92ru retweeted

alex nikolic(icml 2026)

@justALEXWORTEGA

1 day ago

Six offline RL distillation losses. One base model. The exact same math rollouts. Do they actually learn the same thing? Turns out most "new" losses — RFT, DFT, offline GRPO — write nearly the same direction in weight space as plain SFT. Only DPO learns something genuinely different: near-orthogonal, its own loss basin, rewires what the network computes. Reward-weighting changes the step size. DPO changes the direction. Accepted at @icmlconf (MechInterp workshop) paper + interactive companion 👇 https://t.co/lWy2GRaVPs

justALEXWORTEGA's tweet photo. Six offline RL distillation losses. One base model. The exact same math rollouts.

Do they actually learn the same thing?

Turns out most "new" losses — RFT, DFT, offline GRPO — write nearly the same direction in weight space as plain SFT. Only DPO learns something genuinely different: near-orthogonal, its own loss basin, rewires what the network computes.

Reward-weighting changes the step size. DPO changes the direction.

Accepted at @icmlconf (MechInterp workshop)

paper + interactive companion 👇

https://t.co/lWy2GRaVPs

4

187

29

161

16K

Who to follow

@Guardian Investigations // Former @WashingtonPost foreign correspondent in Baghdad and Beirut // [email protected]. Signal: leloveluck.62

Fabian Hinz

@fab_hinz

Personal account. Missiles. Drones.

Ilya A. @ain92ru

2 days ago

@OrdoSeclorum @Aviation_Intel Engineering is bottlenecked by the need to actually manufacture the designed thing IRL, control the quality of manufacturing (how closely it conforms to the drawings and calculations) and measure its actual performance. That's a slow process, and no amount of AI can change that

0

9

Ilya A. @ain92ru

2 days ago

@OrdoSeclorum @Aviation_Intel Sample efficiency is a very old research direction, and if it was easy, the problem would have been solved already. The things you allude to might not come for 10+ years (there have not been fundamental architecture breakthroughs since transformer)

0

5

Ilya A. @ain92ru

2 days ago

@OrdoSeclorum I have read the letter and his point is entirely different: don't invest in productivity of your noncompetitive US factory if Chinese sweatshops with their low labor costs will win anyway

0

4

Ilya A. @ain92ru

3 days ago

@ekat2468 @amazonmilkfrog @north0fnorth Which kid of meat substitute have you tried in burgers? I needed to grab a quick bite in a mall food court yesterday, and I specifically went to Wendy's because they offer Beyond Meat which is tasty and tender

0

2

0

27

ain92ru retweeted

🐸🖤 𒁹pallas 🖤🐸 @amazonmilkfrog

3 days ago

friendly reminder that of the corn grown in the US, roughly 38% of it goes to animal feed and 36% to ethanol fuel. only like 10% goes to human consumption, half of which is sweeteners like high fructose corn syrup

26

2K

188

83

104K

Ilya A. @ain92ru

3 days ago

@Rebel44CZ @Aviation_Intel But also, economists have studied the so-called superstar effect, when imperfect substitutability & economy of scale lead to a winner-takes-all equilibrium: https://t.co/JVymVG0MH3 https://t.co/46wXq3S7f5 This is arguably the case with AI tech, which implies a natural oligopoly

0

9

Ilya A. @ain92ru

3 days ago

@Rebel44CZ @Aviation_Intel For one, this's wrong because the frontier labs have controlled the Pareto frontier (GLM-5.2 & Kimi K2.7 aren't worth their money in the most economically important uses), & will probably continue doing so. The largest models are the most cost-effective when set to low reasoning

ain92ru's tweet photo. @Rebel44CZ @Aviation_Intel For one, this's wrong because the frontier labs have controlled the Pareto frontier (GLM-5.2 & Kimi K2.7 aren't worth their money in the most economically important uses), & will probably continue doing so. The largest models are the most cost-effective when set to low reasoning https://t.co/8RPSdOjaHq

1

0

28

Ilya A. @ain92ru

3 days ago

@jacobrgoering @Aviation_Intel Most modern applications of smartphones didn't exist when they were PDAs, and even in the first years of the iPhone. Jevons' paradox describes how those applications only really appear when the tech becomes cheap and good enough, and thus widely adopted

1

0

18

Ilya A. @ain92ru

3 days ago

@OrdoSeclorum @Aviation_Intel This is already the current state of affairs, but it will diverge further indeed!

0

11

Ilya A. @ain92ru

3 days ago

@OrdoSeclorum @Aviation_Intel Absolutely, but the competitive areas you list have large margins allowing the users to pay the frontier labs back. Some other competitive areas don't enjoy that privilege

1

0

13

Ilya A. @ain92ru

3 days ago

@OrdoSeclorum @Aviation_Intel Yes, but specifically software engineers (there's not really much practical data for automating engineering at large)

1

0

12

Ilya A. @ain92ru

3 days ago

@_ueaj @paxaral @voidshapes @teortaxesTex This section is last present in Haiku 4.5 model card (October 2025) and disappears from Opus 4.5 in November. In the same month this doc page states the current state of affairs: https://t.co/XgjAn9Wt6o

0

14

Ilya A. @ain92ru

4 days ago

@Simon__Grimm @pietergaricano Where does that chart citing "NIST Figure 1" come from? It looks very dubious, as the open-weights gap is mostly stable across all benchmarks. The stability of this gap becomes especially notable if one includes non-top-3 proprietary models such as Grok, Qwen Max and Muse Spark

0

32

ain92ru retweeted

grace REMADE AND IMPROVED @agneswickfields

5 days ago

good (and historically accurate) tiktok

42

13K

794

4K

469K

ain92ru retweeted

Paul McNiel

@pbdmcniel

6 days ago

My dad gave me a piece of advice that has served me well: never make a comment about, or a play on someone's name. You will never impress them. You will only sink in their esteem. I met an old guy named John Lennon once, shook his hand and said "good to meet you Mr. Lennon." You could see the respect and joy at not hearing yet another damned comment about his name. Thanks dad.

421

39K

1K

4K

3M

Ilya A. @ain92ru

5 days ago

@RalphieRaccoon @ObcnSuperMatt @TylerM @sollidnuclear Long-range HVDC lines financed by other countries could have allowed to (re)export excess electricity further north and east, if France allowed them. No matter the cause for AC aversion, the future increase in summer electricity demand is inevitable

0

1

0

17

Ilya A. @ain92ru

5 days ago

@RalphieRaccoon @ObcnSuperMatt @TylerM @sollidnuclear If France doesn't plan to solve their pension problem by just letting thousands of elder people die in heat waves, they will have to install a lot of ACs and summer electricity demand will skyrocket

1

0

21

Ilya A.

@ain92ru

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users