Daniel Fried @dan_fried - Twitter Profile

1 day ago

We are building AI technologies to empower humans, and this requires awareness of human reliance. Our latest work measures human cognitive offload using our workflow induction toolkit. Beyond showing the accuracy of our measure, we find that high reliance isn't inherently harmful. When users bring intentional engagement and genuine task understanding, AIs can facilitate human learning ✨

dan_fried retweeted

Russ Salakhutdinov

@rsalakhu

1 day ago

New work on Multi-Agent Computer Use (MACU). The future of computer-use agents lies in multi-agent systems that combine planning, coordination, and parallel execution. Paper: https://t.co/6rkHjTR9J0 Website + Code: https://t.co/Ng7Bwz3feh MACU introduces a manager agent that decomposes tasks into a dynamic directed acyclic graph (DAG) of subtasks, dispatches parallel subagents, and continuously updates the plan as new information arrives. Across OSWorld, Online-Mind2Web, WebTrailBench, and Odysseys, we see performance improvements of 4.7–25.5%, better test-time scaling, and solutions to long-horizon tasks that single-agent systems often fail to complete. On Odysseys, MACU reduces task completion time by 1.5×, showing that multi-agent coordination is a powerful path toward more capable and efficient computer-use agents. See a more detailed thread by @kohjingyu.

dan_fried retweeted

Jing Yu Koh

@kohjingyu

1 day ago

Computer use agents are slow and brittle. The fix isn’t just stronger models, but also deploying them as multi-agent systems. MACU is a general Multi-Agent Computer Use framework that consistently lifts success rates by 3.4-25.5% and is up to 1.5x faster on long-horizon tasks.🧵

Daniel Fried

@dan_fried

1 day ago

Work by @kohjingyu , with @rsalakhu Paper: https://t.co/rwAgp2wPJA Website: https://t.co/OSnBpky5fz

Daniel Fried

@dan_fried

1 day ago

New work: a simple and general multi-agent computer use framework. It uses a manager to plan and re-plan by creating a task DAG, with subagents for parallel execution. It improves success rate across benchmarks, and substantially improves efficiency on long-horizon tasks.

dan_fried retweeted

Mingqian Zheng @elisazmq_zheng

22 days ago

LLMs refuse ambiguous queries that look harmful but aren't. Can they recover once users clarify, while staying safe? Our new interactive multi-turn benchmark measures both. 🚨 Turns out: not both at once. https://t.co/XyE48IXQSf

dan_fried retweeted

(((ل()(ل() 'yoav))))👾

@yoavgo

27 days ago

coding agents are not compilers from english to programs. and it is not because they are not deterministic. https://t.co/MeCKZFAwhf

dan_fried retweeted

Apurva Gandhi

@apurvasgandhi

28 days ago

Sub-agents are a promising inference-time scaling primitive:
• Expand an agent's working memory
• Divide-and-conquer hard problems
• Solve problems faster with parallel execution

But how do we train a model to best take advantage of sub-agents and make sure we get these benefits?

Very excited to release RAO: Recursive Agent Optimization. RAO is an end-to-end reinforcement learning approach for training LLM agents to spawn, delegate to, and coordinate with recursive copies of themselves (that can themselves spawn other agents) - turning recursive inference into a learned capability. 1/10

dan_fried retweeted

Sean Welleck

@wellecks

about 1 month ago

Propose, Solve, Verify (PSV) accepted at ICML! https://t.co/cn1yqPnZvV

dan_fried retweeted

Joachim Baumann @ ICLR'26

@joabaum

about 1 month ago

We present SWE-chat: the first large-scale dataset of coding agent interactions from real users in the wild. In 40% of real coding sessions, the agent writes ~all the code. Users push back 39% of the time – agents almost never stop to check. Data, paper, & findings in the 🧵👇

Daniel Fried

@dan_fried

about 1 month ago

How successfully -- and efficiently! -- can agents carry out long-horizon tasks on the web? We built a benchmark of ~200 multi-site tasks, based on people's real browsing history. Many of them take hours to solve. Paper: https://t.co/NtnoHEqDui Led by @JangLawrenceK and @kohjingyu, with @rsalakhu

Jing Yu Koh

@kohjingyu

about 1 month ago

One of the things I’m most excited about this year is building agents that can work productively for hours, days, or weeks. Coding agents are starting to become very competent at this, but what about computer use agents? Our new benchmark, Odysseys (co-led with @JangLawrenceK) is a set of 200 new tasks derived from real world browsing behavior that measure long horizon web navigation capabilities (potentially up to hours of web browsing work). Interestingly, we find that frontier CUAs are already surprisingly good at working productively for up to an hour on these tasks, but there’s a lot of work to be done in making them even more efficient. Like every other AI researcher, my real dream is to open a cafe once we solve ASI. So, here’s Opus 4.6 doing some market research for me ("I want to do market research on the most popular cafes in Singapore. Analyse the menus of the top 10 cafes in Singapore (by Google reviews/ratings), and make sure we include at least 1 from the North/South/East/West/Central regions of Singapore. Keep the relevant pages of each cafe open, and summarise their pricing, menu offerings, unique selling points, making sure to reference which tab is opened for each cafe. For each cafe, also help me figure out how long it would take to get to it from Tampines MRT, and include this in your final summary."). I was very impressed to see Opus 4.6 complete this task after working for 52 mins, satisfying all 7 rubrics that corresponded to this task. It provided a very nice markdown summary at the end that gave me all the information I asked for!

dan_fried retweeted

Anirudh Goyal @anirudhg9119

about 1 month ago

How do coding agents get better from experience? Past Attempts as Interface: Turn rollouts into reusable summaries that future attempts can build on. https://t.co/VjglgPLzQQ https://t.co/8ZeG8LF3Qw

Daniel Fried

@dan_fried

about 1 month ago

Also at #ICLR2026: a new benchmark for coding agents that implement and run experiments from papers. Masking regions of code gives us a knob to control difficulty of the task (still verifiable!) Paper: https://t.co/i3I1mVkhvA Work with @j1mk1m1016, Alex Wilf, and @lpmorency

James Kim @j1mk1m1016

about 1 month ago

🚀 Excited to share our ICLR 2026 paper: "From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking"! Work with Alex Wilf, LP Morency, @dan_fried Check out the project here! https://t.co/otvTNoRDbI https://t.co/hldlGsTzs8

dan_fried retweeted

Sanidhya Vijayvargiya @sanidhya903

about 1 month ago

1/ Humans often can’t state exactly what they want, making things hard for AI agents. Obvious fix: ask clarifying questions. But which ones? We studied this empirically with coding agents. Effective clarification comes down to two properties: answerability and task relevance. https://t.co/VfuofFoS0Y

dan_fried retweeted

Vijay V. @vijaytarian

about 1 month ago

We trained an 8B model to help coding agents ask users clarifying questions, matching GPT-5 while asking far fewer Q's! We show a concrete playbook for RL in human-AI interaction: use data analysis to find what drives good interactions, then encode it as a structured reward ⬇️🧵

Daniel Fried

@dan_fried

about 1 month ago

Paper: https://t.co/pXHGcl922h Work with @uilydna , @GhateKshitish , @MonaDiab77, @dan_fried , @Dr_Atoosa , @maxhkw

Daniel Fried

@dan_fried

about 1 month ago

This morning (Fri) at #ICLR2026, check out Andy's work on ConflictScope: determining how an LLM prioritizes between a set of user-provided values, by generating scenarios where the values are in conflict. P4-#4105

Andy Liu (➡️ sf) @uilydna

about 2 months ago

I'll be in Rio this week for #ICLR2026 to present "Generative Value Conflicts Reveal LLM Priorities" (Friday morning, P4-#4105). Happy to chat anything related to LLM alignment, human-AI interaction, or multi-agent systems - feel free to DM if interested! https://t.co/u29CdKVyeM
