Zihan "Zenus" Wang @wzenus - Twitter Profile

Pinned Tweet

5 days ago

🧵 Claude-Opus-4.8 costs you too many tokens - but is this issue general across agents? Do agents know how much they'll spend? Introducing Budget-Aware Agents (BAGEN): We study budget awareness across 4 envs & 5 frontier agents, and find structured failures in most of them. 👇

22

335

43

153

565K

Zihan "Zenus" Wang

@wzenus

2 days ago

@awsaf49 This makes sense, but it is not the budget awareness defined in our paper. We want an agent that knows the cost of each action, which is a direction orthogonal to "spend less" :)

1

0

41

Zihan "Zenus" Wang

@wzenus

2 days ago

The future belongs to Budget-Aware Agents.

Axios @axios

6 days ago

Companies are starting to question whether soaring AI spending is delivering meaningful returns. An AI consultant tells us a client recently spent half a billion dollars in a month after failing to put usage limits on Claude licenses for employees. https://t.co/JHJ9Ojt9Hs

28

588

88

220

322K

3

10

0

2

3K

Zihan "Zenus" Wang

@wzenus

2 days ago

Feel free to check out our work, Budget-Aware Agents (BAGEN): https://t.co/BKig5lB1ch

Zihan "Zenus" Wang

@wzenus

5 days ago

🧵 Claude-Opus-4.8 costs you too many tokens - but is this issue general across agents? Do agents know how much they'll spend? Introducing Budget-Aware Agents (BAGEN): We study budget awareness across 4 envs & 5 frontier agents, and find structured failures in most of them. 👇

22

335

43

153

565K

0

3

0

2

1K

Zihan "Zenus" Wang

@wzenus

3 days ago

Cr. @code_star 🤣

0

1

0

843

Zihan "Zenus" Wang

@wzenus

3 days ago

One sentence for today's AI developers: "I used to burn tokens; now I'm BAGEN the agents to stop."

Zihan "Zenus" Wang

@wzenus

5 days ago

🧵 Claude-Opus-4.8 costs you too many tokens - but is this issue general across agents? Do agents know how much they'll spend? Introducing Budget-Aware Agents (BAGEN): We study budget awareness across 4 envs & 5 frontier agents, and find structured failures in most of them. 👇

22

335

43

153

565K

2

24

4

9

6K

Zihan "Zenus" Wang

@wzenus

4 days ago

@MasterJeongK Thank you, Jeonghwan!!

0

3

0

82

Zihan "Zenus" Wang

@wzenus

5 days ago

🧵 Claude-Opus-4.8 costs you too many tokens - but is this issue general across agents? Do agents know how much they'll spend? Introducing Budget-Aware Agents (BAGEN): We study budget awareness across 4 envs & 5 frontier agents, and find structured failures in most of them. 👇

22

335

43

153

565K

Zihan "Zenus" Wang

@wzenus

4 days ago

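@crblandet @dviolettchan This is so interesting that I have to join the big discussion 🤣 I feel the phrase "天之骄子" (heaven's favored child) is a bit self-objectifying: it still defines your worth as being "heaven's child", so the impressive one is heaven, not you. The most self-confident words are probably ones like "pioneer", "agenda setter", or "trailblazer", hahaha, where everything is up to you.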

2

0

1

85

Zihan "Zenus" Wang

@wzenus

4 days ago

🤣I only realized the point after reading this aloud

Cody Blakeney

@code_star

4 days ago

I’m BAGEN you to stop

1

10

0

3

3K

0

5

0

1K

Zihan "Zenus" Wang

@wzenus

4 days ago

@dviolettchan There is a tricky, fun point here: make part of your curiosity about how to get other people to pay for your curiosity (

1

3

0

73

wzenus retweeted

iGeekbb

@igeekbb

5 days ago

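I just came across a good point: if at 25 you spend 10,000 yuan on a month-long trip to a place you love, the memory dividend from that 10,000 yuan will last a full 60 years. When you think back on that experience at 30, 40, or 70, it will still bring you pleasure. This is called the memory dividend, and it can be understood as compound interest on experience. The money is spent, but the experience enters your life narrative and keeps shaping how you see the world and how you see yourself; the joy, courage, perspective, and sense of being alive that it brings keep flowing back for many years afterward. If you wait until you are 70 and have saved 1 million before going to that place, by then walking may already be a struggle, and your digestive system may not even let you enjoy the local food. At that moment the utility of that 1 million is close to zero. "I'll do what I want once I have money" is a time-mismatched way of thinking: it overestimates the value of money and underestimates the value of time and the body. So one of the things most worth doing when you are young is to spend part of your money on experiences that generate long-term memory dividends. Your stamina, curiosity, and sensory sensitivity at 20 are non-renewable resources. Before your body has fully depreciated, complete the explorations that demand the most of your senses; they will become the gold reserve of your inner world for the second half of your life. When life feels dull and stressful, you can withdraw that gold from memory, and it can save your life.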

176

897

172

392

84K

Zihan "Zenus" Wang

@wzenus

5 days ago

@ABLEKUMA_ 🤣

0

3

0

1K

wzenus retweeted

Jinyan Su

@SuJinyan6

5 days ago

Humans have the natural instinct to do constrained optimization based on the resources available, how about agents?

0

5

2

1

1K

wzenus retweeted

Jiaxin Pei

@jiaxin_pei

5 days ago

Most real-world tasks run under a budget. Human agents know when to stop, ask for more, or change plans. But what about AI agents? Check out our new study on the budget awareness of AI agents👇

0

26

5

11

3K

wzenus retweeted

Manling Li

@ManlingLi_

5 days ago

Budget-aware Agents (BAGEN) study the failure modes in budget estimation: 1. Strong agents are not strong budget estimators. 2. Frontier models are often overoptimistic. 3. Budget awareness is actionable and trainable. SFT plus RL strengthens early stop and alert behavior, saving 28-64 percent of tokens on failed trajectories. 4. Upper and lower bound calibration remains hard. https://t.co/RIDpR6g8oP

2

86

14

34

15K

Zihan "Zenus" Wang

@wzenus

5 days ago

@AndreyFradkin Great work. Thanks for your recommendation!

0

5

0

227

wzenus retweeted

Leigh Drogen @LDrogen

5 days ago

Been struggling with this, my OpenClaw is supposed to be choosing which model to use for which task to limit unnecessary spend but I get the feeling, as is evidenced by this paper, that it does a poor job

1

3

2

3

2K

wzenus retweeted

Yohei

@yoheinakajima

5 days ago

models underestimate how much work it takes (token usage) to accomplish a task, just like us

8

20

6

8

6K
