Yujia Qin

@TsingYoga

ByteDance Seed, Agent, Previously Tsinghua Univ.

Beijing

Joined February 2019

343 Following

5.6K Followers

366 Posts

Pinned Tweet

Yujia Qin @TsingYoga

4 months ago

Happy CNY! We are glad to introduce our latest language model, Seed-2.0. We have made great progress (agent, reasoning, vision understanding, etc.) since Seed-1.8 without any distillation. Right now it's only available in CN, and it will soon be ready globally. https://t.co/ghA2d8uvLy

https://t.co/ghA2d8uvLy https://t.co/Tuq9cWZ3w1

8

185

24

29

14K

Yujia Qin @TsingYoga

4 months ago

@turingbook It was already finished before the New Year, haha

0

1

0

0

253

Yujia Qin @TsingYoga

4 months ago

@Oli82817545 We did not add GUI data in post-training to avoid potential abuse of this capability. We are still exploring better ways to bring GUI capabilities to users.

0

3

0

1

547

Yujia Qin @TsingYoga

6 months ago

This is the only meaningful benchmark

@tphuang

6 months ago

Doubao becomes 1st Chinese AI app to reach 100m DAU (only counting China). Volcano Engine recently reported that Doubao LLM token consumption has grown to 50T/day (3x May figures), so its text, image & video models are all very popular. https://t.co/zlFhTw2wlH

3

67

4

12

29K

0

40

5

6

8K

Yujia Qin @TsingYoga

6 months ago

Also, unlike most US/CN language models, Seed1.8 is trained without incorporating any distillation data from external sources

Yujia Qin @TsingYoga

6 months ago

Proud to introduce Seed1.8, our latest generalized agent model. The model achieves competitive agentic capabilities while maintaining high LLM/VLM scores. Enjoy! https://t.co/VZAdh5n5fP

https://t.co/VZAdh5n5fP https://t.co/32igSYRwXU

9

245

33

79

45K

6

110

6

31

17K

Yujia Qin @TsingYoga

6 months ago

@nikivdev The model is closed-source

0

0

0

0

682

Yujia Qin @TsingYoga

6 months ago

@_TobiasLee 🥰

0

0

0

0

1K

Yujia Qin @TsingYoga

6 months ago

@TaylorOgan ⚡️⚡️⚡️

0

0

0

0

32

Yujia Qin @TsingYoga

6 months ago

🫡 Soon it will be super fast⏩

6 months ago

I just told my phone: "Play wordle and win." It opened the app for the first time, read the rules, guessed a word, waited for an ad to play, and got the word in three attempts. Insane!

6

150

19

39

49K

1

17

0

3

3K

Yujia Qin @TsingYoga

6 months ago

@IronRedSandHive buy the phone, and try it~ The utility is awesome

0

1

0

0

103

Yujia Qin @TsingYoga

6 months ago

See what Doubao Agent (backed by UI-TARS) can do! No need to report benchmarks; usage is our best evaluation!

6 months ago

Another DeepSeek moment. This is the world’s first actual smart phone. It’s an engineering prototype of ZTE’s Nubia M153 running ByteDance’s Doubao AI agent fused into Android at the OS level. It has complete control over the phone. It can see the UI, choose/download apps, tap/type, call, and run multi-step task chains. Here I just say (in English) “find someone to wait in line for me” (something you can do in China), and it picks which app to open, configures the job, and hands me one confirm screen. I wouldn’t otherwise know how to do this, and here the phone just did it in a matter of seconds.

146

5K

653

2K

935K

4

38

7

8

6K

Yujia Qin @TsingYoga

7 months ago

Let's play Genshin step by step

Weihao Tan @WeihaoTan64

7 months ago

🚀Introducing Lumine, a generalist AI agent trained within Genshin Impact that can perceive, reason, and act in real time, completing hours-long missions and following diverse instructions within complex 3D open-world environments.🎮 Website: https://t.co/UxSwNKGZml 1/6

31

909

147

501

196K

1

11

0

3

3K

Yujia Qin @TsingYoga

7 months ago

OSWorld remains one of the best open-source evals. Real-world envs allow far more reward hacking, e.g., Claude 4.5 Sonnet often uses the terminal to solve GUI tasks instead of sticking to pure GUI interactions. But in 2025, if you trust benchmarks, you will have a tough time.

@EpochAIResearch

7 months ago

We looked at OSWorld, a popular evaluation of AI computer use capabilities. Our findings: tasks are simple, many don't require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time. See thread for details!

https://t.co/P9264WqX60

9

161

12

45

44K

0

22

1

4

3K

Yujia Qin @TsingYoga

7 months ago

A blog from @ycjcl detailing the AIO sandbox https://t.co/GOTq3yRlQq

Yujia Qin @TsingYoga

9 months ago

The tool/env infra behind UI-TARS-2 is open-sourced. Enjoy the All-in-One Agent Sandbox!🥳 https://t.co/Lzl2dPqpQ7 https://t.co/p4LIFMp9Hr

https://t.co/Lzl2dPqpQ7
https://t.co/p4LIFMp9Hr https://t.co/Nh7kk3MWTj

10

229

39

182

21K

0

16

1

5

2K

Yujia Qin @TsingYoga

7 months ago

Check out Game-TARS, a generalized multimodal game agent. It's literally the best general game AI in the world, and it's very small~ Paper: https://t.co/WR8858XtfA Blog: https://t.co/AbJ6Q2Qw1q

Zihao Wang @RealZihaoWang

7 months ago

🚀 Thrilled to introduce Game-TARS: our next-gen generalist multimodal game agent! Tired of AI that needs custom code for every new game? Game-TARS is a single VLM that learns to master any game just like a human: by watching the screen and using a keyboard & mouse. Read more.

6

61

11

22

17K

3

109

13

43

15K

Yujia Qin @TsingYoga

8 months ago

RL for pretraining 🚫 RL as continual pretraining 🫡

Rishabh Agarwal

8 months ago

*checks chatgpt* This paper costs ~4.2 million USD (400K GB200 hours) -- science! Our most expensive run was 100K GPU hours (same amount as Deepseek-R1-zero but on GB200s). One finding here was that once we have a scalable RL algorithm, RL compute scaling becomes predictable (e.g., we extrapolated to 3x compute for a 17Bx16 MoE from 16k GPU hours to 50k hours). The other is that when comparing algorithms, embrace the bitter lesson (try to predict how well it would scale with compute using a given performance curve, instead of just performance at a fixed compute). Most algorithmic tricks in a scalable RL method don't change the asymptotic performance, but things like model size, context length, batch size, and data do. There are of course many design choices in RL, so we don’t think that the ScaleRL recipe is the end of the story.

19

813

70

668

235K

0

24

0

7

4K

TsingYoga retweeted

@TianyuPang1

8 months ago

🚨Variational Reasoning for Language Models🚨

We show how treating thinking traces as latent variables unlocks a principled, stable, and unified framework for training reasoning LLMs.

8

364

77

278

25K

TsingYoga retweeted

@TianyuPang1

8 months ago

🚀LLMs can learn directly from verbal feedback, with no scalar rewards needed!

😥Scalar rewards compress rich feedback: “redundant but correct” vs “concise but typo-ridden” might both be 0.8

💡We propose to learn Feedback-Conditional Policy (FCP), an extremely scalable paradigm!

15

468

90

438

80K

Yujia Qin @TsingYoga

9 months ago

@jwyu10 Will release later

0

0

0

0

26
