Tom Sawada @tsawada_ml - Twitter Profile

Tom Sawada @tsawada_ml

8 months ago

I will be at #EMNLP2025! DM me if you want to grab coffee!! 😊

0

122

Tom Sawada @tsawada_ml

11 months ago

The takeaway: You can "Train It and Forget It." The privacy & simplicity benefits of dropping the BPE merge list at inference may outweigh the minimal performance trade-offs, enabling more secure tokenization for deployed LLMs. Joint work with @kartik_goyal_ (4/4)

0

157

Tom Sawada @tsawada_ml

11 months ago

BPE merge lists in LLMs are a privacy risk. What if we just ignored them at inference? Our paper shows you can ditch the merge list without retraining. Merge-list-free tokenization has minimal impact on performance & can even improve it on some tasks. Paper: https://t.co/bQ8fn860H3 👇 (1/4)

2

12

2

2K

Tom Sawada @tsawada_ml

11 months ago

The results? Deliberately corrupting the merge list tanks performance. But our compression-based methods are robust, even *outperforming* the standard tokenizer on QA (MMLU/ARC) & open-ended generation. We saw only modest drops in machine translation. (3/4)

1

0

160
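A minimal sketch of what merge-list-free tokenization could look like, assuming "compression-based" means choosing the segmentation with the fewest tokens using only the vocabulary. This is an illustration under that assumption, not the paper's actual method; the function name `fewest_token_segmentation` and the toy vocabulary are hypothetical.

```python
from typing import Dict, List, Optional


def fewest_token_segmentation(text: str, vocab: Dict[str, int]) -> Optional[List[int]]:
    """Segment `text` into the fewest vocabulary tokens, without a merge list.

    Assumption: fewest tokens is used as a stand-in for "best compression".
    Returns token ids, or None if `text` cannot be covered by `vocab`.
    """
    n = len(text)
    max_len = max((len(tok) for tok in vocab), default=0)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: fewest tokens covering text[:i]
    back = [-1] * (n + 1)    # back[i]: start index of the last token in that cover
    best[0] = 0

    for end in range(1, n + 1):
        # Only look back as far as the longest token in the vocabulary.
        for start in range(max(0, end - max_len), end):
            if best[start] + 1 < best[end] and text[start:end] in vocab:
                best[end] = best[start] + 1
                back[end] = start

    if best[n] == inf:
        return None

    # Walk the back-pointers from the end to recover the token ids.
    ids: List[int] = []
    end = n
    while end > 0:
        start = back[end]
        ids.append(vocab[text[start:end]])
        end = start
    ids.reverse()
    return ids


# Hypothetical toy vocabulary: token string -> id.
toy_vocab = {"a": 0, "b": 1, "c": 2, "ab": 3, "bc": 4, "abc": 5}
print(fewest_token_segmentation("abcab", toy_vocab))
```

A standard BPE tokenizer would instead replay its learned merges in order; the sketch needs only the vocabulary, which is the property the thread highlights.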

Tom Sawada @tsawada_ml

11 months ago

LLMs don’t take tests like students. So why evaluate them like students? Our method decouples reasoning from answer selection. It’s automatic, scalable, and works with existing QA benchmarks. 📄 https://t.co/3dNk9foHL4 w/ Ryan Yan and @kartik_goyal_

0

1

0

107

Tom Sawada @tsawada_ml

11 months ago

Why do we evaluate LLMs using multiple-choice QA... when in practice, we ask them to generate open-ended answers? Standard evaluation rewards models for choosing the right letter, not for reasoning their way to an answer. A better alternative: Cascaded Information Disclosure

2

3

1

0

525

Tom Sawada @tsawada_ml

11 months ago

We tried using another LLM to “judge” the model’s reasoning. Turns out it’s unreliable — even when we feed it perfect explanations (!) But when we match explanations to answers, accuracy shoots up (>99%). No hallucinated grading.

1

2

0

121

Tom Sawada @tsawada_ml

over 1 year ago

@gan_chuang https://t.co/hVoFda4Cky

0

2

0

123

Tom Sawada @tsawada_ml

about 2 years ago

@brian_a_burns Michael Spivak <3

1

2

0

137

tsawada_ml retweeted

OpenAI

@OpenAI

over 2 years ago

Introducing the GPT Store: Over 3M GPTs have been created and now you can find the most useful versions of ChatGPT for you. https://t.co/rdR0jMEYgt

671

5K

1K

739

1M

tsawada_ml retweeted

Jim Fan

@DrJimFan

over 2 years ago

What did I tell you a few days ago? 2024 is the year of robotics. Mobile-ALOHA is an open-source robot hardware that can do dexterous, bimanual tasks like cooking a meal (with human teleoperation). Very soon, hardware will no longer bottleneck us on the quest for human-level, generally capable robots. The brain will be.
This work is done by 3 researchers with academic budget. What an incredible job! Stanford rocks! Congrats to @zipengfu @tonyzzhao @chelseabfinn
Academia is no longer the place for the biggest frontier LLMs, simply because of resource constraints. But robotics levels the playing field a bit between academia and industry, at least in the near term. More affordable hardware is the inevitable trend. Advice for aspiring PhD students: embrace robotics - less crowded, more impactful.
Website: https://t.co/gFcgiuTxrg
Hardware assembly tutorial (oh yes we need more of these!): https://t.co/FK5Twrgniz
Codebase: https://t.co/8FsfEXGfVg

69

2K

417

732

512K

tsawada_ml retweeted

Stephen McAleer

@McaleerStephen

over 2 years ago

This is an interesting paper that learns a process reward model without human annotations. The idea is to evaluate the accuracy of full reasoning traces generated from a given partial reasoning step. Nice to see Llemma-34B getting 47.3% on MATH! https://t.co/FtApBbuGha

4

180

25

142

23K

Tom Sawada @tsawada_ml

over 2 years ago

Come see my poster at the Math-AI workshop in Room 217-219!!

1

22

5

7

5K

tsawada_ml retweeted

Leopold Aschenbrenner

@leopoldasch

over 2 years ago

Intuitively, superhuman AI systems should "know" if they're acting safely. But can we "summon" such concepts from strong models with only weak supervision? Incredibly excited to finally share what we've been working on: weak-to-strong generalization. 1/ https://t.co/FiFGhrqqE0

12

458

54

252

197K

tsawada_ml retweeted

EleutherAI @AiEleuther

over 2 years ago

The first thing you need to build a high quality mathematics model is high quality mathematics data. Don't worry, we got your back! Hear the oral at the Math-AI Workshop! https://t.co/Zh8QAARF67

1

6

4

0

1K
