Lihao Sun @1e0sun - Twitter Profile

Lihao Sun

@1e0sun

27 days ago

@itaimond yes - and this idea was actually what our mid-reasoning correctness predictor was based on!

Lihao Sun

@1e0sun

about 2 months ago

How do LLMs do CoT reasoning internally? In our new #ACL2026 paper, we show that reasoning unfolds as a structured trajectory in representation space. Correct and incorrect paths diverge, and we use this to predict correctness before the answer and correct errors mid-flight. 1/
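The thread does not say how the mid-reasoning correctness predictor is built. Below is a minimal sketch only. It assumes a linear probe (logistic regression) on hidden states taken partway through a reasoning trace, with each trace labeled by whether its final answer was correct. `train_probe`, `predict_correct`, and the toy 2-D "hidden states" are hypothetical stand-ins, not the paper's code or data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(xs, ys, lr=0.5, epochs=500):
    """Fit a logistic-regression probe p(correct | h) = sigmoid(w.h + b) by batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_correct(w, b, h) -> float:
    """Probability that the trace whose mid-reasoning hidden state is h ends correctly."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


# Hypothetical toy data: mid-reasoning hidden states, label 1 = final answer correct.
train_x = [[1.0, 0.9], [0.8, 1.1], [1.2, 1.0], [-1.0, -0.8], [-0.9, -1.2], [-1.1, -1.0]]
train_y = [1, 1, 1, 0, 0, 0]
w, b = train_probe(train_x, train_y)
print(f"p(correct) for a new trace: {predict_correct(w, b, [0.9, 1.0]):.3f}")
```

A linear probe is the simplest choice that can exploit "correct and incorrect paths diverge": if the two groups separate in representation space, a hyperplane can tell them apart before the answer is produced.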

Lihao Sun

@1e0sun

about 2 months ago

Awesome point. We did verify this, comparing autoregressive generation (KV cache), a full-sequence forward pass (our two-pass setup), and a truncated forward pass under "deterministic" settings. Tokens: a perfect match across all three methods. Hidden states: not bit-identical (float16 computation-order differences), but worst-case cosine similarity > 0.99 and KL < 0.001, far below anything that would affect probe accuracy or PCA geometry. So yes, the methods are mathematically equivalent via causal masking, with tiny differences possibly due to floating-point arithmetic. These don't matter much for our observations, but be careful in settings where high precision matters. We will probably add this to the appendix. Thanks!
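A minimal sketch of this kind of check: compare hidden states and next-token distributions from two extraction methods at every position and report the worst case. We assume the KL is taken between next-token distributions (softmax of the logits); the reply does not say exactly what it was computed over. The vectors below are made-up stand-ins for real activations.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def worst_case_agreement(states_a, states_b, logits_a, logits_b):
    """Min cosine similarity of hidden states and max KL of next-token distributions over positions."""
    min_cos = min(cosine_similarity(a, b) for a, b in zip(states_a, states_b))
    max_kl = max(kl_divergence(softmax(la), softmax(lb)) for la, lb in zip(logits_a, logits_b))
    return min_cos, max_kl


# Hypothetical per-position values from KV-cache generation vs. a full-sequence forward pass.
cache_states = [[0.50, -1.20, 0.30], [1.00, 0.20, -0.70]]
full_states = [[0.501, -1.199, 0.300], [0.999, 0.201, -0.700]]
cache_logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
full_logits = [[2.001, 0.499, -1.000], [0.100, 1.501, 0.299]]

min_cos, max_kl = worst_case_agreement(cache_states, full_states, cache_logits, full_logits)
print(f"worst-case cosine = {min_cos:.6f}, worst-case KL = {max_kl:.2e}")
print("within thresholds:", min_cos > 0.99 and max_kl < 1e-3)
```

Reporting the worst case rather than the average is the conservative choice: a single badly mismatched position would otherwise be hidden by many near-identical ones.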

Lihao Sun

@1e0sun

about 2 months ago

@wassname Haha, yes: pass 1 generates the full output under deterministic settings, then pass 2 runs prompt + output through a single forward pass to extract all hidden states at once. With causal masking, you recover the same activations without storing them token by token.
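Why the two passes agree can be seen in a toy single-head causal attention layer with no learned projections (a deliberate simplification; real models add projections, MLPs, and many layers, all of which act per position or causally). Under a causal mask, position t only attends to positions 0..t, so one full-sequence pass gives the same outputs as growing a KV cache token by token. The function names and values are illustrative; in float16 on a GPU, different kernel orderings make the two only approximately equal, as the thread notes.

```python
import math


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(score - m) for score in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(len(values[0]))]


def incremental_pass(qs, ks, vs):
    """Pass-1 style: autoregressive, appending each new key/value to a KV cache."""
    k_cache, v_cache, outputs = [], [], []
    for q, k, v in zip(qs, ks, vs):
        k_cache.append(k)
        v_cache.append(v)
        outputs.append(attend(q, k_cache, v_cache))
    return outputs


def full_sequence_pass(qs, ks, vs):
    """Pass-2 style: score every position at once, then mask out the future."""
    n, d = len(qs), len(qs[0])
    outputs = []
    for t in range(n):
        scores = [
            sum(a * b for a, b in zip(qs[t], ks[s])) / math.sqrt(d) if s <= t else -math.inf
            for s in range(n)
        ]
        m = max(scores)
        # exp(-inf) == 0.0, so future tokens receive zero attention weight.
        weights = [math.exp(score - m) for score in scores]
        total = sum(weights)
        outputs.append([sum(w * v[j] for w, v in zip(weights, vs)) / total for j in range(len(vs[0]))])
    return outputs


# Hypothetical per-token queries, keys, and values for a 3-token sequence.
qs = [[0.1, 0.4], [0.7, -0.2], [-0.3, 0.9]]
ks = [[0.5, 0.1], [-0.4, 0.8], [0.2, 0.2]]
vs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

inc = incremental_pass(qs, ks, vs)
full = full_sequence_pass(qs, ks, vs)
print(all(math.isclose(a, b) for ra, rb in zip(inc, full) for a, b in zip(ra, rb)))
```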

Lihao Sun

@1e0sun

about 2 months ago

@LuxInvariantAI Wow, mind sharing a link to the publication? Couldn’t find it online.

Lihao Sun

@1e0sun

about 2 months ago

Great question! We chose Instruct + R1-Distill + Base to span a range of training regimes while keeping the architecture constant, since we wanted to establish a general claim. It would also be interesting to explore these structures in an RLVR model! Re: the format concern, the organization exists across formats, including “Step X:” and freeform responses (no formatting instructions; numbered lists, paragraphs, single blocks). See Section 3.5 for more details!

Lihao Sun

@1e0sun

about 2 months ago

📢 Accepted to #ACL2026 Main Conference! Thanks to all collaborators at Microsoft: Hang Dong, Bo Qiao, Qingwei Lin, Dongmei Zhang, and Saravan Rajmohan.

Paper: https://t.co/o9fQlmufle
Website: https://t.co/lVHw2nsWY7
Code: https://t.co/we7nO28psh

8/

Lihao Sun

@1e0sun

about 2 months ago

Reasoning length is also controllable. Steering hidden states toward the termination subspace shortens reasoning; steering away extends it. At moderate strengths this works as a smooth knob with minimal accuracy changes, but push too hard and the model enters repetitive loops. 7/
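A minimal sketch of additive steering along such a direction, under simplifying assumptions: the "termination subspace" is reduced here to a single unit direction, estimated (hypothetically) as the mean difference between hidden states near the end of reasoning and hidden states elsewhere, and `alpha` is the steering strength. The thread does not specify how the subspace is found or where in the network the intervention is applied.

```python
import math


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def termination_direction(end_states, other_states):
    """Hypothetical estimate: unit mean-difference between end-of-reasoning states and the rest."""
    return unit([a - b for a, b in zip(mean(end_states), mean(other_states))])


def steer(h, direction, alpha):
    """Add alpha * direction to h: alpha > 0 moves toward termination, alpha < 0 moves away."""
    return [hi + alpha * di for hi, di in zip(h, direction)]


def projection(h, direction):
    return sum(hi * di for hi, di in zip(h, direction))


# Toy hidden states, made up for illustration.
end_states = [[1.0, 2.0, 0.0], [1.2, 1.8, 0.1]]
other_states = [[0.0, 0.1, 0.0], [0.2, -0.1, 0.1]]
d = termination_direction(end_states, other_states)
h = [0.3, 0.4, 0.2]
for alpha in (-2.0, 0.0, 2.0):
    print(alpha, round(projection(steer(h, d, alpha), d), 3))
```

Because `d` is unit-norm, the projection onto it shifts by exactly `alpha`, which is what makes strength behave like a knob; very large `|alpha|` corresponds to the "push too hard" regime described above.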

Lihao Sun

@1e0sun

2 months ago

Great to see so many convergent findings on affective structure in LLMs lately!

Melanie Weber @mweber_PU

2 months ago

How are emotions represented in the latent geometry of LLMs? We analyze affective representations in latent space and show that they mirror classic valence-arousal models from psychology (similar to concurrent work @AnthropicAI @1e0sun) and display nonlinear structure that supports uncertainty quantification and steering in emotion tasks with implications for model transparency and AI safety.

Lihao Sun

@1e0sun

2 months ago

While we find consistent circular VA geometry across Llama and Qwen models, @AnthropicAI concurrently finds similar structure in Claude. Check our work out! https://t.co/DW6RON38OG And thanks to all collaborators: Andrew Lee (@a_jy_l), Lewen Yan, Xiaoya Lu, Jie Zhang, and Jing Shao. 6/

Lihao Sun

@1e0sun

2 months ago

💡New paper! Woke up to @AnthropicAI's emotion paper and realized: “wait, that's our finding too.” So we arXiv'd immediately. We concurrently uncovered a circular geometry of emotions organized by valence and arousal (VA), as well as steering effects on downstream behaviors like refusal and sycophancy. We further provide a mechanistic account for why: refusal and compliance tokens occupy distinct regions in this space. 1/

Lihao Sun

@1e0sun

2 months ago

One possible reason why: consider the embeddings of refusal-related tokens (“no”) and compliance tokens (“sure”). Take the mean difference between them and project it onto our VA circle: it lands at 256°, negative in both V and A. Consistent with this, steering in -V or -A increases the likelihood of refusal tokens! More generally, emotion prompting and VA steering change the emission probabilities of these key tokens, thereby affecting downstream behaviors, as further supported by logit shifts and neuron analysis. 5/
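The projection step can be sketched as follows, assuming orthonormal valence and arousal axes in embedding space and an angle measured from +V toward +A, so angles between 180° and 270° are negative in both V and A (as 256° is). Subtracting compliance from refusal is our assumption about the sign, and the 4-D embeddings are toy values, not real token embeddings.

```python
import math


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def va_angle(vec, v_axis, a_axis):
    """Project vec onto the valence/arousal axes; return (angle in [0, 360), V, A)."""
    v = sum(x * y for x, y in zip(vec, v_axis))
    a = sum(x * y for x, y in zip(vec, a_axis))
    return math.degrees(math.atan2(a, v)) % 360.0, v, a


# Toy setup: valence and arousal as the first two coordinate axes of a 4-D space.
v_axis = [1.0, 0.0, 0.0, 0.0]
a_axis = [0.0, 1.0, 0.0, 0.0]
refusal_embs = [[-0.3, -1.1, 0.2, 0.0], [-0.1, -0.9, 0.0, 0.1]]  # stand-ins for "no"-like tokens
compliance_embs = [[0.1, 0.2, 0.1, 0.0], [0.1, 0.0, 0.1, 0.1]]  # stand-ins for "sure"-like tokens

# Assumed sign convention: refusal minus compliance.
diff = [r - c for r, c in zip(mean(refusal_embs), mean(compliance_embs))]
angle, v, a = va_angle(diff, v_axis, a_axis)
print(f"angle = {angle:.1f} deg, V = {v:.2f}, A = {a:.2f}")
```

Using `atan2` rather than `atan` keeps the quadrant, which is what distinguishes "negative in both V and A" from "positive in both".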

Lihao Sun

@1e0sun

2 months ago

Somewhat surprisingly, the VA axes also provide monotonic, bidirectional control over multiple downstream behaviors, including refusal and sycophancy. Arousal is a strong lever: increasing arousal lowers refusal rates, while decreasing arousal leads to more refusal behavior. 4/

Lihao Sun

@1e0sun

2 months ago

@AnthropicAI Unlike Anthropic, we steer along the circular manifold at 0°, 30°, 60°, 90°, etc. This controls the valence and/or arousal level of the model’s outputs, validating that the recovered axes correspond to valence and arousal in a human-interpretable sense. 3/
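Steering at an angle θ on the circle can be sketched as adding a combination of the two axes, assuming unit-norm, orthogonal valence and arousal directions: direction(θ) = cos θ · V + sin θ · A, so 0° is pure +V and 90° is pure +A. `circle_direction`, the toy axes, and the strength value are illustrative assumptions, not the paper's implementation.

```python
import math


def circle_direction(theta_deg, v_axis, a_axis):
    """Unit direction at angle theta on the VA circle (0 deg = +valence, 90 deg = +arousal)."""
    t = math.radians(theta_deg)
    return [math.cos(t) * v + math.sin(t) * a for v, a in zip(v_axis, a_axis)]


def steer_at_angle(h, theta_deg, v_axis, a_axis, strength):
    """Add strength * direction(theta) to a hidden state h."""
    d = circle_direction(theta_deg, v_axis, a_axis)
    return [hi + strength * di for hi, di in zip(h, d)]


# Toy orthonormal axes in a 3-D space.
v_axis = [1.0, 0.0, 0.0]
a_axis = [0.0, 1.0, 0.0]
h = [0.2, -0.1, 0.5]
for theta in range(0, 360, 30):
    print(theta, [round(x, 3) for x in steer_at_angle(h, theta, v_axis, a_axis, strength=4.0)])
```

With orthonormal axes every direction on the circle has unit norm, so `strength` alone sets how far the state moves and a sweep over θ at fixed strength isolates the effect of the angle.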

Lihao Sun

@1e0sun

2 months ago

@AnthropicAI We use mean-diff to extract emotion steering vectors. PCA + ridge regression reveals a circumplex akin to the circumplex model of emotions in human psychology. Projections onto these axes correlate with human-crowdsourced VA ratings across 44k words (valence r=0.71). 2/
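A minimal sketch of two of these steps, a mean-diff steering vector and the correlation check against human ratings. We assume the mean-diff subtracts mean activations on neutral prompts from mean activations on emotion prompts (the baseline is not specified in the thread), and we skip the PCA + ridge step, taking a given valence axis as input. All numbers are toy values, not the paper's data.

```python
import math


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def mean_diff_vector(emotion_acts, baseline_acts):
    """Steering vector: mean activation under an emotion minus mean under a baseline (assumed neutral)."""
    return [e - b for e, b in zip(mean(emotion_acts), mean(baseline_acts))]


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy activations for one emotion ("joy") vs. neutral prompts.
joy_vec = mean_diff_vector([[1.0, 0.5, 0.2], [1.2, 0.3, 0.0]], [[0.1, 0.1, 0.1], [0.1, -0.1, 0.1]])

# Toy word embeddings, a given valence axis, and hypothetical human valence ratings.
valence_axis = [1.0, 0.0, 0.0]
word_embs = [[0.9, 0.1, 0.0], [-0.8, 0.2, 0.1], [0.2, -0.3, 0.0], [-0.1, 0.4, 0.2]]
human_valence = [7.5, 2.0, 5.5, 4.0]
projections = [sum(w * v for w, v in zip(emb, valence_axis)) for emb in word_embs]
print("joy steering vector:", [round(x, 2) for x in joy_vec])
print("valence r =", round(pearson_r(projections, human_valence), 3))
```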
