Xiachong Feng @xc_feng - Twitter Profile

Xiachong Feng @xc_feng

7 days ago

@XiangruTang @MarkGerstein 🚀🚀🚀

0

28

xc_feng retweeted

HKU Centre for AI, Management and Organization @camo_hku

3 months ago

🦞 Calling all #OpenClaw builders, tinkerers, & #AI makers! Meet others pushing their lobsters beyond prototypes — sharing the workflows, wins, and hard lessons of building real AI-driven value 🦾 🗓 Mar 28, 2–5PM | HKU iCube 💡 Free reg: https://t.co/71mxIB4gyE #HKUCAMO

0

5

1

0

397

Xiachong Feng @xc_feng

4 months ago

Train on tokens, infer on raw bytes — no tokenizer, no architecture changes. Proxy Compression shows byte-level LMs can match or beat tokenizer baselines at 7B/14B scale. One step closer to a tokenizer-free future.

Lin Zheng @linzhengisme

4 months ago

Introducing proxy compression for end-to-end language modeling: train on compressed (e.g., tokenized) data for efficiency, but run inference entirely on raw bytes without a tokenizer. No architectural changes required. At scale, proxy-trained byte models match or surpass tokenizer baselines at 7B and 14B. 📄 Paper: https://t.co/4NGVagTocP 💻 Code: https://t.co/tPcbReJ915 [1/9] 🧵👇

2

99

16

61

21K

5

121

11

60

13K

xc_feng retweeted

Discrete Diffusion Reading Group

@diffusion_llms

6 months ago

📢Dec 22 (Mon): Diffusion Beats AR: Code Generation Discrete diffusion models now rival autoregressive (AR) models on challenging coding benchmarks, making them a compelling alternative to AR models. This Monday, Shansan Gong (@sansa19739319) will present recipes for training masked diffusion models to reach such coding performance, and will reveal several surprising inference-time behaviors of these models. Paper: https://t.co/r9SBOKdgAh

2

104

14

47

21K

Xiachong Feng @xc_feng

6 months ago

Reptile introduces a promising approach to CLI agents by integrating human-in-the-loop feedback directly into the terminal workflow for future model training.🚀🚀🚀

Longxu Dou

@LongxuDou

6 months ago

🚀We propose Reptile, a Terminal Agent🤖️that enables interaction with an LLM agent directly in your terminal. The agent can execute any command or custom CLI tool to accomplish tasks, and users can define their own tools and commands for the agent to utilize. ✨What Makes Reptile Special? Compared with other CLI agents (e.g., Claude Code and Mini SWE-Agent), Reptile stands out for the following reasons: ⚡️Human-in-the-Loop Learning: Users can inspect every step and provide prompt feedback, i.e., give feedback under the USER role or edit the LLM generation under the ASSISTANT role. The interaction will be used for model SFT training & RL training. 💻Terminal-only beyond Bash-only: Simple and stateful execution, which is more efficient than bash-only (you don’t need to specify the environment in every command). It doesn’t require the complicated MCP protocol—just a naive bash tool under the REPL protocol. Github: https://t.co/AmrCJWA0Ls Homepage: https://t.co/kK73JkQoi0

4

22

19

1

3K

0

2

0

213

xc_feng retweeted

HKUNLP @hkunlp2020

8 months ago

We will have a guest talk from Cai Zhou, a second-year PhD student in MIT EECS: "Continuous modeling in diffusion language models: HDLM and CCDD". All are welcome to join via the following link. https://t.co/ZlLDO5pKRH

0

16

6

4

4K

xc_feng retweeted

Lingpeng Kong @ikekong

11 months ago

What happened after Dream 7B? First, Dream-Coder 7B: A fully open diffusion LLM for code delivering strong performance, trained exclusively on public data. Plus, DreamOn cracks the variable-length generation problem! It enables code infilling that goes beyond a fixed canvas.

1

72

35

20

8K

xc_feng retweeted

Jiacheng Ye @JiachengYe15

11 months ago

Check out our blogs for more details. Dream-Coder 7B: https://t.co/kUggrHEyow DreamOn: https://t.co/1VoVwTRDHR

0

4

1

3

647

Xiachong Feng @xc_feng

12 months ago

@persdre Bingo, that's why being self-motivated is the top preference for admitted students on many professors' websites.

0

1

0

69

xc_feng retweeted

Min-Yen Kan @knmnyn

12 months ago

Jane Austen meets AI and Physiology? Don't know what I mean?🤔 You got 20m, I'll tell you🤫😀 Keynote:🌟Pride and Prejudice: AI meets Physiology Education🌟 Video https://t.co/SqCDq7wIKe Slides https://t.co/4lIGOrixZ3 #NLProc @wing_nus @NUSComputing @nusaiinstitute

0

6

3

1

857

Xiachong Feng @xc_feng

12 months ago

Wow, a 4B model with a fully open recipe just beat Claude-4 Opus in reasoning—check out Polaris.

Chenxin An @AnChancy46881

12 months ago

# 🚨 4B open-recipe model beats Claude-4-Opus 🔓 100% open data, recipe, model weights and code. Introducing Polaris✨--a post-training recipe for scaling RL on advanced reasoning models. 🥳 Check out how we boost open-recipe reasoning models to incredible performance levels (65 → 79 on AIME25) through RL training on open-source data and academic-level resources. 📑Notion: https://t.co/k5ITJFzCe1 📗Blog post: https://t.co/Leth9PWSod 🤗Model & data: https://t.co/SVdfIwYTrU 💻Code: https://t.co/txg0qcywWi

24

444

82

388

100K

0

2

0

459

Xiachong Feng @xc_feng

about 1 year ago

🚀 Incredible work! PromptCoT-Mamba sets a new bar: the first constant-memory reasoning model outperforming Transformers on tough math & code benchmarks. No attention, no KV cache — just pure efficient decoding.

Xueliang Zhao @xlzhao_hku

about 1 year ago

🔥 Meet PromptCoT-Mamba The first reasoning model with constant-memory inference to beat Transformers on competition-level math & code ⚡ Efficient decoding: no attention, no KV cache ⚡ +16.0% / +7.1% / +16.6% vs. s1.1-7B on AIME 24 / 25 / LiveCodeBench 🚀 Up to 3.66× faster

2

29

15

6

3K

0

5

0

2

379

Xiachong Feng @xc_feng

about 1 year ago

🚀🚀🚀

Qiushi Sun

@qiushi_sun

about 1 year ago

🎉Introducing our latest work: "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows" 🤗 Huggingface: https://t.co/56CJS6Qzg9 🏠Homepage: https://t.co/jU6mHlFIoU TLDR: We introduce ScienceBoard, featuring (1) a dynamic OS env with real scientific software (CLI + GUI), and (2) a human-validated benchmark spanning domains like biochem, astronomy, GIS, ATP, and more. 🧵[1/5]

3

63

19

22

11K

0

5

1

522

xc_feng retweeted

chang ma

@ma_chang_nlp

about 1 year ago

We are kicking off a series of seminars at @hkunlp2020. @siyan_zhao will be giving a talk titled "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" at ⏰Friday 5.9 11am HKT (Thursday 5.8 8pm PDT). Link to talk: https://t.co/i9FsWYRNbZ

0

38

16

3

3K

xc_feng retweeted

Songlin Yang

@SonglinYang4

about 1 year ago

Evabyte presented at today’s ASAP seminar! Recordings: https://t.co/JVtBdBMOtm Slides: https://t.co/7fNBFTGIoB

1

50

11

17

11K

Xiachong Feng @xc_feng

about 1 year ago

🚀🚀🚀

Xueliang Zhao @xlzhao_hku

about 1 year ago

🚀 Meet PromptCoT-QwQ-32B, a breakthrough in mathematical reasoning! Outperforming all open-source models on AIME2024 and AIME2025, including Nemotron-Ultra-253B, DeepSeek-R1-671B, and QwQ-32B! 🔥

1

28

11

8

3K

0

5

0

342

xc_feng retweeted

Jiacheng Ye @JiachengYe15

about 1 year ago

🚀Excited to announce Dream 7B (Diffusion reasoning model): the most powerful open diffusion large language model to date.

49

1K

203

751

266K

xc_feng retweeted

Jiacheng Ye @JiachengYe15

over 1 year ago

🤔 Always wondering if a next-token prediction model is the end of planning and reasoning. 🎯 Now excited to announce our team's latest research on exploring a new paradigm to enhance the planning ability of LLMs with DiffuSearch. 🧵1/7

1

20

4

5

2K
