Xiangyan Liu @dobogiyy - Twitter Profile

dobogiyy retweeted

3 days ago

Our latest deliveries: 👨‍💻 UniRL, an RL infra for unified multimodal models ➕ Two new RL algorithms: • FlowDPPO for diffusion and flow-matching models • DRPO for LLMs and VLMs Check out our repo: https://t.co/fIQzFNOxD0

4

145

28

71

16K

dobogiyy retweeted

Ziwei Liu

@liuziwei7

17 days ago

🔥LLaVA-OneVision-2.0 Open Sourced🔥 LLaVA-OneVision series @lmmslab now upgrades to 2.0 with its key advance on *codec-stream tokenization*, which treats highly dynamic video as a continuous bit-cost stream - Tech Report: https://t.co/pFo2fGYj2M - Code: https://t.co/JvRzu96rJ1

3

236

45

101

19K

dobogiyy retweeted

Hardy Chen @HardyChen266091

about 2 months ago

1/n https://t.co/Mlhaclg0dX What happens when you push AI agents *too hard* to improve a score? Instead of getting better, they may find shortcuts to *game the metric* 🧠➡️🎯 As we rely more on automated evals, this can quietly creep in—good score, but weaker real performance⚠️

1

14

7

4

10K

dobogiyy retweeted

Evolvent AI

@Evolvent_AI

about 2 months ago

Launch Week — Day 1: ClawMark Most agent benchmarks give the model one shot, one prompt, one frozen environment. Real coworker tasks span multiple days — and the world keeps changing while the agent works. Introducing 🦞ClawMark: a multi-day, dynamic-environment benchmark for coworker agents. Built by Evolvent together with 40+ researchers from NUS, HKU, MIT, UW, and UC Berkeley. Open-sourced at: https://t.co/QN7XgIoaN1 100 tasks. 13 professional domains. Fully rule-based scoring. Results from 6 frontier models below. 🧵👇

6

55

11

22

17K

dobogiyy retweeted

Fanqing Meng

@FanqingMengAI

3 months ago

Text agents have their Gym. Vision agents? Not until now. Introducing Gym-V — a unified gym-style platform for agentic vision research, with 179 procedurally generated environments across 10 domains. One API to rule them all: 📦 Offline dataset 🤖 Agentic RL training 🔧 Tool-use training 👥 Multi-agent training 📊 VLM & T2I model evaluation All under the same reset/step interface. Key findings: 1. Observation scaffolding matters MORE than RL algorithm choice 2. Broad curricula transfer well; narrow training causes negative transfer 3. Multi-turn interaction amplifies everything 📄 Paper: https://t.co/WP4AgCuqSW 💻 Code: https://t.co/QSFRNzPWVb Open the thread for a deep dive! 🧵

8

109

17

72

10K

Xiangyan Liu @dobogiyy

3 months ago

@XinleiWang220 @EvanCrypto17 @ResearchWang Because, technically speaking, openclaw is just a subset of claude code

1

0

50

dobogiyy retweeted

Fanqing Meng

@FanqingMengAI

4 months ago

I am so confused that some people talk about research and engineering separately. First be a good engineer, then learn to become a researcher.

1

17

1

5K

dobogiyy retweeted

Yacine Mahdid

@yacinelearning

4 months ago

finally got done editing this awesome interview with @zzlccc first author of the Dr. GRPO paper in it we discuss: - llm post-training weirdness - is self reflexion even real - the absolute state of GRPO - simplicity in algorithmic design highly recommend for my RL-heads!

14

687

78

530

43K

dobogiyy retweeted

Renjie @Renjie_Ranger

4 months ago

🔥Congrats to the SDPO authors @jonashuebotter @FrederikeLubeck — really enjoyed the paper, and I appreciate the discussion + citation of our work “Language Models Can Learn from Verbal Feedback Without Scalar Rewards.” 🔍Complementary angle: SDPO uses feedback-conditioned self-teacher for on-policy distillation → dense credit assignment (feedback-as-state). We study Feedback-Conditional Policy (FCP): learn directly from (response, verbal feedback) pairs via MLE (feedback-as-goal) — super scalable and competitive with GRPO! 🚀 💡Our Motivation: Language priors are compositional: Text-to-Image models can generate rare concepts like “a banana surfing on the ocean” 🏄‍♂️📷 because language priors let them combine and compose elements from mixed prompts seen during training. 📑 Paper: https://t.co/e8L3RfVWSS 💻 Code: https://t.co/BS5xZ7UGqy

4

29

8

17

8K

Xiangyan Liu @dobogiyy

5 months ago

nice idea and design

LobeHub @lobehub

5 months ago

Introducing LobeHub: Agent teammates that grow with you. LobeHub is the ultimate space for work and life: to find, build, and collaborate with agent teammates that grow with you. We’re building the world’s first and largest human–agent co-evolving network. Two years ago, we built LobeChat, an open-source interface for using different AI models. Today, LobeChat has 70k+ GitHub stars and serves 6M+ users worldwide. How to fully unlock the power of models has always been a shared mission between us and the community. We started with interaction — a fundamentally new, agent-first experience. Agents are no longer passive tools invoked in a single conversation. They should be proactive, always-on units of work. Treating agents as the minimal atomic unit is also the core of our agent harness infra. Today’s agents are mostly one-off executors. Even with memory, it’s often global — and hallucinates. We build long-term agent teammates that evolve with users. Each agent has its own dedicated memory space, editable by users, allowing humans and agents to co-evolve over time. This, in turn, allows us to design clearer rewards for reinforcement learning and create cleaner environments for continual learning. Agent teammates can work in groups. Through a multi-agent system, agent groups operate faster, more cost-effective, and go beyond what single-agent systems can achieve. For example, a single agent often requires heavy user involvement to proceed step by step, whereas LobeHub can execute the same work from a single instruction, with a supervisor orchestrating agents that run in parallel or debate to produce better results. We are building the collaboration network among agent teammates — and between humans and agent teammates as well. Ease of use matters. AI intelligence and shared human intelligence are equally important. 
With simple instructions and tool selection, you can effortlessly build and team up with agent coworkers to deliver complex, systematic work — even assembling a quant team to execute trades. Through the LobeHub community, anyone can discover, reuse, and remix agents and agent groups, customizing them to fit their own workflows, preferences, and needs. Last but not least, our vision started with LobeChat: multi-model support is the most efficient approach for users. We believe different models excel in different scenarios. By routing across multiple models, LobeHub improves cost efficiency and unlocks capabilities that a single-model setup cannot easily support.

82

318

66

238

185K

0

1

0

141

dobogiyy retweeted

Longxu Dou

@LongxuDou

6 months ago

🚀We propose Reptile, a Terminal Agent🤖️that enables interaction with an LLM agent directly in your terminal. The agent can execute any command or custom CLI tool to accomplish tasks, and users can define their own tools and commands for the agent to utilize. ✨What Makes Reptile Special? Compared with other CLI agents (e.g., Claude Code and Mini SWE-Agent), Reptile stands out for the following reasons: ⚡️Human-in-the-Loop Learning: Users can inspect every step and provide prompt feedback, i.e., give feedback under the USER role or edit the LLM generation under the ASSISTANT role. The interaction will be used for model SFT training & RL training. 💻Terminal-only beyond Bash-only: Simple and stateful execution, which is more efficient than bash-only (you don’t need to specify the environment in every command). It doesn’t require the complicated MCP protocol—just a naive bash tool under the REPL protocol. Github: https://t.co/AmrCJWA0Ls Homepage: https://t.co/kK73JkQoi0

4

22

19

1

3K

Xiangyan Liu @dobogiyy

6 months ago

@FanqingMengAI you enjoying?

0

6

0

2K

dobogiyy retweeted

Fanqing Meng

@FanqingMengAI

6 months ago

I think it is not new... In dpsk 3.2, they use expert rl -> joint sft -> joint rl. In longcat, they use expert rl -> model mr -> joint rl. mimo replaces sft with opd. everyone knows opd is better than sft :)

4

134

4

73

72K

dobogiyy retweeted

Xiang Yue @xiangyue96

6 months ago

There are competing views on whether RL can genuinely improve a base model's performance (e.g., pass@128). The answer is both yes and no, largely depending on the interplay between pre-training, mid-training, and RL. We trained a few hundred GPT-2-scale LMs on synthetic GSM-like reasoning data from scratch. Here is what we found: 🧵

28

1K

242

1K

327K

Xiangyan Liu @dobogiyy

7 months ago

time to embrace DLMs🤓

Jinjie Ni

@NiJinjie

7 months ago

1/3 🚬 Ready to smell your GPUs burning? Introducing MegaDLMs, the first production-level library for training diffusion language models, offering 3× faster training speed and up to 47% MFU. Empowered by Megatron-LM and Transformer-Engine, it offers near-perfect linear scaling. https://t.co/B1cUF37nWY You can train arbitrarily large models without compromising throughput. It was also the training backend for Super Data Learners, Quokka, and OpenMoE 2. We open-sourced more good stuff; see the next thread👇

5

150

47

84

13K

0

1

0

152

dobogiyy retweeted

Jiawei Gu

@Kuvvius

7 months ago

🚨Sensational title alert: we may have cracked the code to true multimodal reasoning. Meet ThinkMorph — thinking in modalities, not just with them. And what we found was... unexpected. 👀 Emergent intelligence, strong gains, and …🫣 🧵 https://t.co/2GPHnsPq7R (1/16)

27

316

67

253

69K

dobogiyy retweeted

Zichen Liu

@zzlccc

8 months ago

Nothing feels more exciting than writing a thesis proposal on RL for LLMs before 2025 ends!! Covering a subset of my first-author works done in the past 1.5 years (after switching from traditional RL to LLM RL…) Tentative title, of course

16

503

61

287

61K

dobogiyy retweeted

Brian Li

@Brian_Bo_Li

8 months ago

Throughout my journey in developing multimodal models, I’ve always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly — with full support for parallelism and optimization. Not just to hack⚙️, but also to scale🚀. So finally we built it for ourselves. https://t.co/3Jtd1fU4oK LMMs-Engine: a lean, efficient framework built to train unified multimodal model at scale. From Qwen LLM, VLM, LLaVA-OV, and WanVideo, to unified models like Qwen-Omni and BAGEL — plus Linear-Attn GDN and research prototypes like RAE and SiT - all under one modular system that seamlessly integrates diverse datasets and optimization strategies. Powered by FSDP2 multi-dim parallelism, Ulysses sequence parallel, Flash-Attention, Liger Kernels, and Native Sparse Attention (also with bonus support for the Muon optimizer for all models).

9

111

34

55K
