Haochen TAN @HC_Tan_ - Twitter Profile

HC_Tan_ retweeted

about 1 month ago

Excited to give the NLIP seminar and especially happy to be returning to Cambridge (virtually) four years later. Looking forward to the discussion with everyone at NLIP! For more info about the paper, kindly visit: https://t.co/bSMo5ZBWzs

0

15

1

975

HC_Tan_ retweeted

Qiushi Sun

@qiushi_sun

about 1 month ago

Just made a contributed talk on https://t.co/VxP25T0ZoY @ ICLR'26. 🇧🇷 If you are interested in mobile GUI agents' safety, please kindly check our work https://t.co/nrCCUen0dl #ICLR26

qiushi_sun's tweet photo. Just made a contributed talk on https://t.co/VxP25T0ZoY @ ICLR'26. 🇧🇷

If you are interested in mobile GUI agents' safety, please kindly check our work https://t.co/nrCCUen0dl

#ICLR26 https://t.co/5gcfYQApRN

0

21

3

1

827

Haochen TAN @HC_Tan_

3 months ago

The other way: create a forum post in Discord and a Claude Code session spawns on the device automatically. Discord becomes ticket system for remote Claude Code sessions.

0

33

Haochen TAN @HC_Tan_

3 months ago

Anthropic just released a Discord channel plugin for Claude Code. Loved the idea, so I took it further — built a toy plugin that maps each session to a separate Discord forum thread. Now I can code with Claude from my phone while grabbing coffee. https://t.co/7Sh1YdrHOG

1

0

90

Haochen TAN @HC_Tan_

3 months ago

Start claude code on your device → forum post appears in Discord. Chat in the thread, approve bash commands with emoji reactions (✅ yes / 🔓 allow all / ❌ no), even answer Claude's questions — all from Discord. Session ends → thread archives automatically.

1

0

68

HC_Tan_ retweeted

renjie pi @RenjiePi

3 months ago

Introducing Nemotron-Terminal: a systematic data engineering pipeline for scaling LLM Terminal Agents. We bridge the gap between open models and proprietary models with a fully open synthetic-to-real trajectory pipeline. 🤯The payoff: SFT on our Nemotron-Terminal-Corpus boosts Qwen3-32B from 3.4% → 27.4% on Terminal-Bench 2.0 (+24.0), rivaling models multiple its size. What makes it work? 🌟Terminal-Task-Gen: A lightweight data curation pipeline that seamlessly combines the adaptation of existing datasets with robust synthetic task construction. 🌟Nemotron-Terminal-Corpus: A massive, open-source dataset covering diverse terminal interactions, which contains explicit planning and execution traces for complex long-horizon tasks. And we’re releasing everything: 📦 Nemotron-Terminal-Corpus (Large-scale dataset) 🤖 Nemotron-Terminal models (8B, 14B, 32B) Paper: https://t.co/ORIZ01sav1 HF Daily: https://t.co/nSH4hu7I5D Models & Data: https://t.co/J1Zc22M95r Our tech report just hit the #1 spot on Hugging Face Daily Papers! We're also incredibly excited to see the open-source community putting our work to the test, with the Nemotron-Terminal-Corpus dataset currently trending at over 1,800 downloads and counting. We can't wait to see what the community build with it!

RenjiePi's tweet photo. Introducing Nemotron-Terminal: a systematic data engineering pipeline for scaling LLM Terminal Agents.

We bridge the gap between open models and proprietary models with a fully open synthetic-to-real trajectory pipeline.

🤯The payoff: SFT on our Nemotron-Terminal-Corpus boosts Qwen3-32B from 3.4% → 27.4% on Terminal-Bench 2.0 (+24.0), rivaling models multiple its size.

What makes it work?

🌟Terminal-Task-Gen: A lightweight data curation pipeline that seamlessly combines the adaptation of existing datasets with robust synthetic task construction.
🌟Nemotron-Terminal-Corpus: A massive, open-source dataset covering diverse terminal interactions, which contains explicit planning and execution traces for complex long-horizon tasks.

And we’re releasing everything:

📦 Nemotron-Terminal-Corpus (Large-scale dataset)
🤖 Nemotron-Terminal models (8B, 14B, 32B)

Paper: https://t.co/ORIZ01sav1
HF Daily: https://t.co/nSH4hu7I5D
Models & Data: https://t.co/J1Zc22M95r

Our tech report just hit the #1 spot on Hugging Face Daily Papers!
We're also incredibly excited to see the open-source community putting our work to the test, with the Nemotron-Terminal-Corpus dataset currently trending at over 1,800 downloads and counting.
We can't wait to see what the community build with it!

6

205

29

154

17K

HC_Tan_ retweeted

Zhijiang Guo @ZhijiangG

8 months ago

✨ Thrilled to share that our paper “ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts” has been accepted to the EMNLP 2025 Main Conference! 📂 Dataset & Code: https://t.co/scGCnmlIIY 📄 Paper: https://t.co/3iGtKAmQ08

ZhijiangG's tweet photo. ✨ Thrilled to share that our paper “ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts” has been accepted to the EMNLP 2025 Main Conference!
📂 Dataset & Code: https://t.co/scGCnmlIIY
📄 Paper: https://t.co/3iGtKAmQ08 https://t.co/Yz8lbOWEGe

1

14

6

1

460

HC_Tan_ retweeted

Zhijiang Guo @ZhijiangG

9 months ago

Unlock LLM reasoning with Adaptive RLVR! 🚀 DEPTH × BREADTH synergy boosts both Pass@1 & Pass@K.🥰 The core idea is simple yet effective: Hard problems get more rollouts, big batches keep entropy high. Paper: https://t.co/4LZMNSLZry Code: https://t.co/f6ShuF6mRH

0

21

7

6

2K

HC_Tan_ retweeted

Yuntian Deng

@yuntiandeng

11 months ago

Can we build an operating system entirely powered by neural networks? Introducing NeuralOS: towards a generative OS that directly predicts screen images from user inputs. Try it live: https://t.co/3PHYOl6B69 Paper: https://t.co/OEAUHu0hg3 Inspired by @karpathy's vision. 1/5

7

209

41

88

42K

HC_Tan_ retweeted

Siru Ouyang @Siru_Ouyang

about 1 year ago

🚀 Introducing RAST: Reasoning Activation via Small Model Transfer! ✨ RAST adjusts key "reasoning tokens" at decoding time using insights from smaller RL-tuned models — no full RL tuning for large models! ⚡ Efficient & Performant,🧠 Scalable & Easy,📉 Up to 50% less GPU memory!

Siru_Ouyang's tweet photo. 🚀 Introducing RAST: Reasoning Activation via Small Model Transfer!
✨ RAST adjusts key "reasoning tokens" at decoding time using insights from smaller RL-tuned models — no full RL tuning for large models!
⚡ Efficient & Performant,🧠 Scalable & Easy,📉 Up to 50% less GPU memory!

3

116

19

47

29K

HC_Tan_ retweeted

Yinya Huang @YinyaHuang

about 1 year ago

🤖⚛️Can AI truly see Physics? Test your model with the newly released SeePhys Benchmark! 🚀 🖼️Covering 2,000 vision-text multimodal physics problems spanning from middle school to doctoral qualification exams, the SeePhys benchmark systematically evaluates LLMs/MLLMs on tasks integrating complex scientific diagrams with theoretical derivations. 📊Experiments reveal that even SOTA models like Gemini-2.5-Pro and o4-mini achieve accuracy rates below 55%, with over 30% error rates on simple middle-school-level problems, highlighting significant challenges in multimodal reasoning. Key Features Highlighted: 🔎Vision-Text Integration: Explicitly emphasizes multimodal reasoning failures in interpreting diagrams (e.g., circuit schematics, coordinate systems). 🔎Cross-Domain Complexity: Tests models across 7 physics domains and 8 educational tiers, exposing weaknesses in both visual grounding and logical derivation. 🔎Open-Source Design: Fully reproducible framework for diagnosing AI's "visual illiteracy" in scientific contexts. 🎖️Project led by: @kaleb962, @HengLee29423, Terry Jingchen Zhang, @YinyaHuang 💼Joint work with an exceptional team: Zirong Liu, Peixin Qu, Jixi He, Jiaqi Chen, Yu-Jie Yuan, Jianhua Han, Hang Xu, Hanhui Li, @mrinmayasachan, Xiaodan Liang 🏁The benchmark is now open for evaluation at the ICML 2025 AI for MATH Workshop. Academic and industrial teams are invited to test their models and advance multimodal physics! ⚛️Project Page: https://t.co/Drk9jb7rgV 🤗Data: https://t.co/VydIc3gRBG 📜Paper: https://t.co/XYc8hMWJ4K 🏆Challenge Submission: https://t.co/GXDkRG9bYO ➡️Competition Guidelines: https://t.co/q0EOmJLWqj

YinyaHuang's tweet photo. 🤖⚛️Can AI truly see Physics? Test your model with the newly released SeePhys Benchmark! 🚀

🖼️Covering 2,000 vision-text multimodal physics problems spanning from middle school to doctoral qualification exams, the SeePhys benchmark systematically evaluates LLMs/MLLMs on tasks integrating complex scientific diagrams with theoretical derivations.
📊Experiments reveal that even SOTA models like Gemini-2.5-Pro and o4-mini achieve accuracy rates below 55%, with over 30% error rates on simple middle-school-level problems, highlighting significant challenges in multimodal reasoning.

Key Features Highlighted:
🔎Vision-Text Integration: Explicitly emphasizes multimodal reasoning failures in interpreting diagrams (e.g., circuit schematics, coordinate systems).
🔎Cross-Domain Complexity: Tests models across 7 physics domains and 8 educational tiers, exposing weaknesses in both visual grounding and logical derivation.
🔎Open-Source Design: Fully reproducible framework for diagnosing AI's "visual illiteracy" in scientific contexts.

🎖️Project led by: @kaleb962, @HengLee29423, Terry Jingchen Zhang, @YinyaHuang
💼Joint work with an exceptional team: Zirong Liu, Peixin Qu, Jixi He, Jiaqi Chen, Yu-Jie Yuan, Jianhua Han, Hang Xu, Hanhui Li, @mrinmayasachan, Xiaodan Liang

🏁The benchmark is now open for evaluation at the ICML 2025 AI for MATH Workshop. Academic and industrial teams are invited to test their models and advance multimodal physics!

⚛️Project Page: https://t.co/Drk9jb7rgV
🤗Data: https://t.co/VydIc3gRBG
📜Paper: https://t.co/XYc8hMWJ4K
🏆Challenge Submission: https://t.co/GXDkRG9bYO
➡️Competition Guidelines: https://t.co/q0EOmJLWqj

4

40

18

10

3K

HC_Tan_ retweeted

Yi Xu

@_yixu

about 1 year ago

🚀Let’s Think Only with Images. No language and No verbal thought.🤔 Let’s think through a sequence of images💭, like how humans picture steps in their minds🎨. We propose Visual Planning, a novel reasoning paradigm that enables models to reason purely through images.

_yixu's tweet photo. 🚀Let’s Think Only with Images.

No language and No verbal thought.🤔

Let’s think through a sequence of images💭, like how humans picture steps in their minds🎨.

We propose Visual Planning, a novel reasoning paradigm that enables models to reason purely through images.

15

1K

219

1K

230K

HC_Tan_ retweeted

Yingjia Wan

@Yingjia_Wan

about 1 year ago

Is your model faithfully translating math into formal languages like Lean? ⚖ Introducing "FormalAlign"! #ICLR2025 ⁉️To address the lack of scalable evaluation in autoformalization, we propose the FIRST method to evaluate semantic alignment between informal and formal languages.

Yingjia_Wan's tweet photo. Is your model faithfully translating math into formal languages like Lean?
⚖ Introducing "FormalAlign"! #ICLR2025
⁉️To address the lack of scalable evaluation in autoformalization, we propose the FIRST method to evaluate semantic alignment between informal and formal languages. https://t.co/EAGVl26gJl

2

77

26

32

8K

HC_Tan_ retweeted

yuxuan_yy @cerana99x

about 1 year ago

🚀Our new paper on #LLM ensembling is out! We propose UniTE, a novel approach that focuses on the union of top-k tokens from each model, avoiding full vocabulary alignment and reducing computational overhead. Check out our paper: https://t.co/nZUQqq7i0u @hahahawu2 @ZhijiangG

cerana99x's tweet photo. 🚀Our new paper on
#LLM ensembling is out!
We propose UniTE, a novel approach that focuses on the union of top-k tokens from each model, avoiding full vocabulary alignment and reducing computational overhead.
Check out our paper: https://t.co/nZUQqq7i0u
@hahahawu2 @ZhijiangG https://t.co/amdwLsaDwS

5

14

6

5

1K

HC_Tan_ retweeted

Han Wu @hahahawu2

about 1 year ago

💡Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging We comprehensively study existing model merging methods on efficient Long-to-Short LLM reasoning tasks, and find their huge potential in the field.

hahahawu2's tweet photo. 💡Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging

We comprehensively study existing model merging methods on efficient Long-to-Short LLM reasoning tasks, and find their huge potential in the field. https://t.co/IJtrlzt8Ce

1

17

10

3

970

HC_Tan_ retweeted

Yi Xu

@_yixu

about 1 year ago

🔥Are we ranking LLMs correctly?🔥 Large Language Models (LLMs) are widely used as automatic judges, but what if their rankings are unstable?😯Our latest study finds non-transitivity in LLM-as-a-judge evaluations—where A > B, B > C, but… C > A?! 🔄

_yixu's tweet photo. 🔥Are we ranking LLMs correctly?🔥

Large Language Models (LLMs) are widely used as automatic judges, but what if their rankings are unstable?😯Our latest study finds non-transitivity in LLM-as-a-judge evaluations—where A > B, B > C, but… C > A?! 🔄 https://t.co/y9hvznw7Nc

2

132

30

60

22K

Haochen TAN @HC_Tan_

over 1 year ago

Nice work!

Zhijiang Guo @ZhijiangG

over 1 year ago

🚀Exciting to see how recent advancements like OpenAI’s O1/O3 & DeepSeek’s R1 are pushing the boundaries! Check out our latest survey on Complex Reasoning with LLMs. Analyzed over 300 papers to explore the progress. Paper: https://t.co/k1HGQTA2kN Github: https://t.co/VpcNVcEBSg

ZhijiangG's tweet photo. 🚀Exciting to see how recent advancements like OpenAI’s O1/O3 & DeepSeek’s R1 are pushing the boundaries!
Check out our latest survey on Complex Reasoning with LLMs. Analyzed over 300 papers to explore the progress.
Paper: https://t.co/k1HGQTA2kN
Github: https://t.co/VpcNVcEBSg https://t.co/66aTpHL97H

2

158

62

58

12K

0

2

0

52

HC_Tan_ retweeted

Yinhong Liu @YinhongLiu2

over 1 year ago

🚨 New Paper Alert! 🚨 When using LLMs for judgements, ever wondered about the consistency of those judgments? 🤔 Check out our latest work, where we quantify, evaluate, and enhance the logical/preference consistency of LLMs. 📚 🔗 Read more: https://t.co/QqqUuPHQRX

YinhongLiu2's tweet photo. 🚨 New Paper Alert! 🚨
When using LLMs for judgements, ever wondered about the consistency of those judgments? 🤔
Check out our latest work, where we quantify, evaluate, and enhance the logical/preference consistency of LLMs. 📚

🔗 Read more: https://t.co/QqqUuPHQRX https://t.co/PMA52suy05

14

246

68

158

24K

HC_Tan_ retweeted

Jiao Sun

@sunjiao123sun_

over 1 year ago

Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡

sunjiao123sun_'s tweet photo. Mitigating racial bias from LLMs is a lot easier than removing it from humans!

Can’t believe this happened at the best AI conference @NeurIPSConf

We have ethical reviews for authors, but missed it for invited speakers? 😡 https://t.co/BjClBR9Kyl

175

4K

776

515

2M

HC_Tan_ retweeted

Zhijiang Guo @ZhijiangG

over 1 year ago

🤗Huge thanks to my friend @rayleizhu24 for presenting HydraLoRA for us. Welcome to drop by his poster about Vision Backbones at the poster session (East Exhibit Hall A-C #1500) on Fri 13 Dec 11 a.m. — 2 p.m.👋

ZhijiangG's tweet photo. 🤗Huge thanks to my friend @rayleizhu24 for presenting HydraLoRA for us. Welcome to drop by his poster about Vision Backbones at the poster session (East Exhibit Hall A-C #1500) on Fri 13 Dec 11 a.m. — 2 p.m.👋 https://t.co/EkhuxQCPUX

0

4

3

0

602

Haochen TAN

@HC_Tan_

Last Seen Users on Sotwe

Trends for you

Most Popular Users