Kevin Meng @mengk20 - Twitter Profile

Pinned Tweet

over 1 year ago

why do language models think 9.11 > 9.9? at @transluceAI we stumbled upon a surprisingly simple explanation - and a bugfix that doesn't use any re-training or prompting. turns out, it's about months, dates, September 11th, and... the Bible?

Transluce

@TransluceAI

over 1 year ago

Monitor: An Observability Interface for Language Models Research report: https://t.co/Nl88TcH8bh Live interface: https://t.co/jZAjCHd2uP (optimized for desktop)

mengk20 retweeted

Dami Choi @damichoi95

3 months ago

Code for our user modeling project is out now! https://t.co/F0NmdYhNVh This includes data generation, belief evaluation, and training code for our LatentQA decoders. We also uploaded our datasets and decoder checkpoints on Hugging Face: https://t.co/trUDGfDaME

mengk20 retweeted

Ziqian Zhong

@fjzzq2002

4 months ago

🔭 We’re releasing Hodoscope: an open-source tool for unsupervised behavior discovery. It lets you visually explore and compare agent behaviors at scale. It helped us discover a novel reward hacking vulnerability in Commit0 - with just a couple minutes of human effort.

mengk20 retweeted

Jacob Steinhardt @JacobSteinhardt

4 months ago

New blog post: "Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.

mengk20 retweeted

Transluce

@TransluceAI

4 months ago

Why does GPT-5.1 Codex score 6.5% worse than GPT-5 Codex on Terminal-Bench, with the same scaffold? 🧵 GPT-5.1 times out at ~2x the rate of GPT-5. Excluding timeouts, GPT-5.1 wins by 7.2%. We analyzed 256M+ tokens of traces and found this in under an hour. Here’s how 👇

mengk20 retweeted

Mike A. Merrill

@Mike_A_Merrill

5 months ago

The Terminal-Bench paper is here! Read it to learn where frontier models still fail and the secrets of how we sourced hundreds of high quality environments from our open source community. 🧵

mengk20 retweeted

Jacob Steinhardt @JacobSteinhardt

5 months ago

New blog post out: a position piece on "Turning Compute into Understanding", by training superhuman oversight assistants.

Kevin Meng

@mengk20

6 months ago

@vvhuang_ congrats vincent CBD is so cool!! 🤩

mengk20 retweeted

Sarah Schwettmann

@cogconfluence

6 months ago

All @TransluceAI work that I described in my NeurIPS mech interp workshop keynote is now out! ✨ Today we released Predictive Concept Decoders, led by @vvhuang_ Paper: https://t.co/fhAK9VozDZ Blog: https://t.co/53t4oenA1N And here's @damichoi95's work on scalably extracting latent representations of users from model internals: https://t.co/F8fs7rhaX7

mengk20 retweeted

vincent!

@vvhuang_

6 months ago

We trained a decoder to read the internal activations of an LLM and answer questions about what the model will think about or do next. We find that this decoder can understand LLM behaviors, even when the model itself is confused! (for instance, if the model has been jailbroken)

mengk20 retweeted

Transluce

@TransluceAI

6 months ago

Transluce is developing end-to-end interpretability approaches that directly train models to make predictions about AI behavior. Today we introduce Predictive Concept Decoders (PCD), a new architecture that embodies this approach.

mengk20 retweeted

Transluce

@TransluceAI

6 months ago

Transluce is running our end-of-year fundraiser for 2025. This is our first public fundraiser since launching late last year.

Kevin Meng

@mengk20

6 months ago

@lisabdunlap super cool, lisa!! congrats on the release, looking fw to playing with it :)

mengk20 retweeted

Transluce

@TransluceAI

6 months ago

We are proud to have helped get the AI Evaluator Forum off the ground! And excited to be working with such a great group of partners.

mengk20 retweeted

AI Evaluator Forum

@aievalforum

6 months ago

Today we are announcing the creation of the AI Evaluator Forum: a consortium of leading AI research organizations focused on independent, third-party evaluations. Founding AEF members: @TransluceAI @METR_Evals @RANDCorporation @halevals @SecureBio @collect_intel @Miles_Brundage

mengk20 retweeted

Dami Choi @damichoi95

7 months ago

Have you ever had ChatGPT give you personalized results out of nowhere that surprised you? Here, the model jumped straight to making recommendations in SF, even though I only asked for Korean food!

mengk20 retweeted

Transluce

@TransluceAI

7 months ago

What do AI assistants think about you, and how does this shape their answers? Because assistants are trained to optimize human feedback, how they model users drives issues like sycophancy, reward hacking, and bias. We provide data + methods to extract & steer these user models.

mengk20 retweeted

Transluce

@TransluceAI

7 months ago

Transluce is headed to #NeurIPS2025! ✈️ Interested in understanding model behavior at scale? Join us for lunch on Thursday 12/4 to learn more about our work and meet members of the team: https://t.co/nOmFyTlsVs

mengk20 retweeted

Aryaman Arora

@aryaman2020

7 months ago

🫡 new paper: neurons can be a sparse and interpretable basis for circuit tracing, once you make the right decisions about which neurons and how you circuit trace! i'm excited for how this affects future progress on circuits + automating interp

mengk20 retweeted

Transluce

@TransluceAI

7 months ago

Is your LM secretly an SAE? Most circuit-finding interpretability methods use learned features rather than raw activations, based on the belief that neurons do not cleanly decompose computation. In our new work, we show MLP neurons actually do support sparse, faithful circuits!

mengk20 retweeted

John Yang

@jyangballin

7 months ago

Super excited about @TransluceAI's work applying Docent to SWE-bench + mini-SWE-agent. Gemini 3 Pro results live. Great to see better tools to understand SWE-agent behaviors. Stay tuned for more soon! (CodeClash 👀)
