Hui Chen✈️ICML 2026🇰🇷

@chchenhui

🏝️Researcher @NUSComputing @NUSingapore; AI for scientific research.

Singapore

Joined June 2017

631 Following

176 Followers

131 Posts

Pinned Tweet

Hui Chen✈️ICML 2026🇰🇷

@chchenhui

about 2 months ago

To what extent do AI-generated papers contain fabrications?

🚀Excited to introduce FabScore for fine-grained evaluation of fabrications in automated AI research. 🧵

We evaluate 144 AI-written papers from multiple sources, including @SakanaAILabs's AI Scientist, MLR-Bench, @AnalemmaAI's FARS and the 2025 #Agents4Science Open Conference.

Among 54 real conference submissions, we find that approximately 70% contain at least one fabrication; even among accepted papers, the rate remains as high as 59.3%.

📰 Paper: https://t.co/egFQ33fglo
💻 Code: https://t.co/BPpZm6YY24

1/


Hui Chen✈️ICML 2026🇰🇷

@chchenhui

4 days ago

@MangQiuyang Congrats!!

chchenhui retweeted

Engram

@EngramLab

6 days ago

https://t.co/CGIef5lIBI


chchenhui retweeted

James Zhao @xu_Zhao0

7 days ago

🔍How should we scale test-time compute for agentic search?

We propose FineVerify: decompose questions into checkable sub-questions, verify each candidate with evidence, and select the best-supported answer.

Check our work at: https://t.co/eIt5UW6V7I
(1/N) https://t.co/4y7etU39kq


chchenhui retweeted

Alisa Liu @alisawuffles

8 days ago

I'm joining OpenAI next week!🥹 The job search turned out to be really challenging but also super rewarding, so I wrote a small blog to share what I learned along the way and hopefully make the process a little less mysterious for the next person. https://t.co/6FigSBdenD


chchenhui retweeted

Xiuyu Li

@sheriyuo

29 days ago

https://t.co/YThLJJ2rjL


chchenhui retweeted

Xuandong Zhao

@xuandongzhao

10 days ago

Happy to introduce our latest work, VIMPO: Value-Implicit Policy Optimization for LLMs

Most RL methods for LLM training face a trade-off:
· PPO-style methods use a value model (critic) for token-level credit, but critics are hard to train.
· GRPO-style methods drop the critic, but give every token the same trajectory-level signal.

Can we get the best of both worlds? 🧵


Hui Chen✈️ICML 2026🇰🇷

@chchenhui

11 days ago

😲😲

Tony Chen @tonychenxyz

11 days ago

Today, agents execute isolated tasks. Tomorrow, agents will steer complex decisions across long horizons.

Introducing CEO-Bench, a first step to measure "Steering Intelligence." In CEO-Bench, agents are asked to run a simulated startup for 500 days.

https://t.co/9gNWFALTKK https://t.co/xMn3jxEvVt


chchenhui retweeted

Yuzhen Mao

@Mao_Yuzhen

20 days ago

What happens when multi-agent systems stop relying on a central “controller” agent? Can agents coordinate by sharing results directly with each other? Introducing Decentralized Language Models (DeLM): we let agents coordinate asynchronously through a shared context. Agents claim tasks from a queue and write back compact, verified results as they finish, making progress visible to all workers without requiring a main agent to merge, filter, and rebroadcast it. New paper with @azaliamirh!


chchenhui retweeted

chang ma

@ma_chang_nlp

13 days ago

Excited to introduce 🌠Orion: Towards Lab Automation with Computer-Using Agents.

Give it control of your lab computer💻, and it can use software, analyze any experiment images, browse databases on Chrome exactly like you, and work for hours to analyze your experiments.

🌎:https://t.co/5EAe8vEetl
📎:https://t.co/D08hYrkJuG


chchenhui retweeted

Noam Brown

@polynoamial

21 days ago

https://t.co/oWqzT12RtZ


chchenhui retweeted

Shuo Ji @shuo_ji87616

22 days ago

[ICML'26 · arXiv:2606.06036] Your agent isn't forgetting; it just doesn't know how to recall.

Most memory-augmented agents retrieve.
Humans recall.
We propose MRAgent: a Cue–Tag–Content graph memory system where agents follow cues, update state, and decide to explore or answer. https://t.co/iH56NtmJ5h

Hui Chen✈️ICML 2026🇰🇷

@chchenhui

18 days ago

Interesting🤣🤣… The subtitles are moving too fast!

Peter Henderson

@PeterHndrsn

19 days ago

My partner grew up playing the game Sword and Fairy 1 (仙劍一). Every model release cycle, my vibe check is to have the model create a revamped version with a Star Trek mashup (my fav). Claude Fable is the first model to do a reasonable job! Playthrough at 4x speed👇

chchenhui retweeted

Qiuyang Mang

@MangQiuyang

18 days ago

https://t.co/rl818g2dSV


Hui Chen✈️ICML 2026🇰🇷

@chchenhui

21 days ago

“Build an automated AI researcher—an AI system that can accelerate and increasingly automate the research process itself, while remaining steerable, accountable, and connected to people.”

Sam Altman

@sama

21 days ago

Here is our current plan for OpenAI: https://t.co/r29FUUee3A


chchenhui retweeted

Sakana AI

@SakanaAILabs

24 days ago

Building AI that Builds AI: Introducing the Sakana AI RSI Lab 🚀

https://t.co/AskX3J5oEJ

Today, we are announcing the Sakana AI Recursive Self-Improvement (RSI) Lab: a dedicated research group in Tokyo tasked with redesigning the AI development process itself using AI.

While the industry increasingly speculates about the theoretical potential of self-improving AI, we’ve spent the last two years actively laying the foundations to make it a reality:

▪ LLM²: AI models automating research to invent better preference optimization algorithms.
▪ Darwin Gödel Machine: Agents autonomously rewriting their own codebase to double software-engineering performance.
▪ ShinkaEvolve: Hyper-sample-efficient program evolution that builds novel loss functions for MoE models.
▪ ALE-Agent: Reinforcement agents outperforming hundreds of human experts via self-learning.
▪ Digital Red Queen: Open-ended adversarial coevolution laying the groundwork for RSI in cybersecurity.
▪ The AI Scientist: Towards end-to-end automation of AI research, recently published in Nature.

Now, we are unifying these breakthroughs. The Sakana AI RSI Lab is officially tasked with building open-ended, adaptive architectures that collectively self-improve.

Human intelligence did not emerge from limitless resources; it was forged through the open-ended, compounding process of evolution operating under strict constraints. We are applying this exact principle to AI.

We believe recursive self-improvement is achievable on modest, sample-efficient compute. It shouldn’t be a winner-take-all asset locked inside hyperscale clusters, but a democratized public good.

We’re scaling our team to execute this mission. We are looking for frontier scientists and engineers who are entirely unsatisfied with the brute-force status quo. If you are ready to break away from standard benchmarking and build the self-improving future in Japan, come build with us.


Hui Chen✈️ICML 2026🇰🇷

@chchenhui

24 days ago

@liyzhen2 @NTUsg @NTU_ccds @ICComputing Congratulations!! Welcome to Singapore 🇸🇬


chchenhui retweeted

CLS

@ChengleiSi

24 days ago

so in the past few months, we've seen at least the following labs claiming to work on RSI:
- @AnthropicAI (https://t.co/lZVXWiUHSM)
- @OpenAI (https://t.co/oIPwcHeVKg)
- @Recursive_SI (https://t.co/kT0eQK3G7Z)
- Mirendil (https://t.co/yLDTaWgd3H)
- @inherent_labs (https://t.co/EMdGFkzYHQ)
- @SakanaAILabs (https://t.co/DToke5Udyj)

No matter who's gonna make it happen first, this is gonna be an important year for humanity. Looking forward!


chchenhui retweeted

Diyi Yang

@Diyi_Yang

25 days ago

We propose a new way to quantify AI overreliance: the Offloading Score 🧐 @vishakh_pk It measures the fraction of cognitive work you hand off to AI 🤖 via simulating how you'd have done each step without AI, then counting the steps the AI saved. It works directly from interaction traces (keystrokes, screenshots), so it's reusable across many tools!!


Hui Chen✈️ICML 2026🇰🇷

@chchenhui

26 days ago

Interesting work that quantifies how much cognitive effort we offload to AI. A great step toward measuring AI reliance!

Vishakh Padmakumar

@vishakh_pk

27 days ago

People are increasingly worried that AI tools make us overreliant. But how do we actually measure this? We introduce Offloading Score, a measure of reliance based on the fraction of cognitive effort offloaded to AI while completing a task. In a controlled user study, Offloading Score detects increased reliance under time pressure, while several common alternatives do not. (1/9)



chchenhui retweeted

Hanna Hajishirzi

@HannaHajishirzi

27 days ago

MAI-Thinking-1 is out!

Excited to share what we are building and how climbing from scratch (no distillation) actually works: simple recipes, rigorous science, self-distillation, patience, and great infra.

Check out our tech report; it has the full story of our RL climbs.
https://t.co/aLW40sWz4d

