DT K @OnePitchOneSoul - Twitter Profile

By training a machine learning model to predict pitchers we can find the most unique ones. Tyler Rogers comes in at #1, with a 𝟵𝟵.𝟵% confidence just from a single pitch!

https://t.co/rxxHTHB0dJ
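A rough sketch of how a "most unique pitcher" ranking like this could be computed from a pitcher-identification model's output. It assumes uniqueness means the average probability the model assigns to the correct pitcher on held-out single pitches; the tweet does not spell out the exact metric. The function name `rank_by_uniqueness`, the input layout, and the toy probabilities are all hypothetical.

```python
from collections import defaultdict

def rank_by_uniqueness(predictions):
    """Rank pitchers by the mean probability given to the correct pitcher.

    predictions: list of (true_pitcher, {pitcher_name: probability}) pairs,
    one pair per held-out pitch (hypothetical layout, not the author's).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for true_pitcher, probs in predictions:
        # Confidence the model placed on the pitcher who actually threw it.
        totals[true_pitcher] += probs.get(true_pitcher, 0.0)
        counts[true_pitcher] += 1
    scores = {name: totals[name] / counts[name] for name in totals}
    # Highest mean confidence first, i.e. the most identifiable pitcher.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up probabilities for illustration only (not real model output).
toy = [
    ("Pitcher A", {"Pitcher A": 0.99, "Pitcher B": 0.01}),
    ("Pitcher A", {"Pitcher A": 0.97, "Pitcher B": 0.03}),
    ("Pitcher B", {"Pitcher A": 0.40, "Pitcher B": 0.60}),
    ("Pitcher B", {"Pitcher A": 0.55, "Pitcher B": 0.45}),
]
ranking = rank_by_uniqueness(toy)
```

Here `ranking` lists "Pitcher A" first because the toy model is nearly certain about every one of that pitcher's pitches.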

DT K

@OnePitchOneSoul

about 2 hours ago

@notbobkoz His 65mph Fastball was quite unique👀

DT K

@OnePitchOneSoul

about 2 hours ago

@KevinRuben3rd He threw 155 pitches across 2024/2025😂

DT K

@OnePitchOneSoul

about 16 hours ago

@wearefromstars Is this MLB data?

DT K

@OnePitchOneSoul

1 day ago

@Forkball_tho @TJStats I'll be answering that shortly 👍

DT K

@OnePitchOneSoul

1 day ago

Are stuff models just memorizing pitchers? One fun test is to make the model predict pitcher names (instead of ERA). With @TJStats tjStuff+, we get 49% accuracy using a *single* pitch and 73% using the whole season. (And tjStuff+ is less overfit than others!) So theoretically at most 73% of the predictiveness is banking on prior season results (RV, xwOBA, etc.)
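A minimal sketch of the single-pitch versus whole-season comparison described above. It assumes season-level predictions are made by summing per-pitch log-probabilities within each pitcher-season and picking the top name; the tweet does not say how the season figure was aggregated, so this is one plausible choice. The function names, the `(season_key, true_pitcher, probs)` layout, and the toy numbers are hypothetical.

```python
import math
from collections import defaultdict

def single_pitch_accuracy(pitches):
    """Share of pitches whose top predicted name is the true pitcher."""
    hits = sum(1 for _, true, probs in pitches
               if max(probs, key=probs.get) == true)
    return hits / len(pitches)

def season_accuracy(pitches, eps=1e-12):
    """Share of pitcher-seasons identified after pooling all their pitches.

    Assumes every probs dict covers the same set of candidate pitchers.
    """
    log_scores = defaultdict(lambda: defaultdict(float))
    truth = {}
    for season_key, true, probs in pitches:
        truth[season_key] = true
        for name, p in probs.items():
            # Summing logs multiplies per-pitch probabilities (naive pooling).
            log_scores[season_key][name] += math.log(max(p, eps))
    hits = sum(1 for key, scores in log_scores.items()
               if max(scores, key=scores.get) == truth[key])
    return hits / len(log_scores)

# Made-up probabilities for illustration only.
toy = [
    ("A-2024", "A", {"A": 0.60, "B": 0.40}),
    ("A-2024", "A", {"A": 0.45, "B": 0.55}),
    ("A-2024", "A", {"A": 0.70, "B": 0.30}),
    ("B-2024", "B", {"A": 0.55, "B": 0.45}),
    ("B-2024", "B", {"A": 0.20, "B": 0.80}),
]
per_pitch = single_pitch_accuracy(toy)
per_season = season_accuracy(toy)
```

In the toy data, individual pitches are sometimes misattributed, but pooling a season's pitches recovers both pitchers, which mirrors the gap between the single-pitch and whole-season numbers.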

DT K

@OnePitchOneSoul

1 day ago

@EliBenPorat You've guessed my next project haha

DT K

@OnePitchOneSoul

1 day ago

Yeah, the conclusion might've sounded too strong. But the point of stuff models is to predict out of sample (i.e. new pitchers). Is Misio great because his name is Misio or because he sits 101? Imo stuff models tend not to isolate the latter because the former adds predictiveness.

Eli Ben-Porat 🇨🇦 @EliBenPorat

1 day ago

This is interesting, but not sure I agree with your conclusion. I as a human would be able to predict 100% just seeing 1 Misiorowski, Clase, Jansen etc. signature pitch. But if another pitcher replicated those exact same physics they’d be equally amazing.
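One way to probe the "new pitchers" point is to evaluate on pitchers the model never saw in training. The sketch below splits pitch rows by pitcher so that no pitcher appears on both sides. It is an illustration under assumptions, not the method used in the thread; `split_by_pitcher`, the `"pitcher"` and `"velo"` keys, and the toy rows are hypothetical.

```python
import random

def split_by_pitcher(rows, test_fraction=0.2, seed=0):
    """Hold out whole pitchers so the test set contains only unseen names."""
    pitchers = sorted({row["pitcher"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(pitchers)
    # Always hold out at least one pitcher.
    n_test = max(1, round(len(pitchers) * test_fraction))
    test_names = set(pitchers[:n_test])
    train = [row for row in rows if row["pitcher"] not in test_names]
    test = [row for row in rows if row["pitcher"] in test_names]
    return train, test

# Made-up rows: five pitchers, two pitches each.
toy_rows = [
    {"pitcher": name, "velo": velo}
    for name, velos in {
        "P1": (95.1, 96.0), "P2": (88.4, 89.0), "P3": (101.2, 100.8),
        "P4": (92.3, 91.9), "P5": (83.0, 84.1),
    }.items()
    for velo in velos
]
train_rows, test_rows = split_by_pitcher(toy_rows)
```

If a model scores much worse on `test_rows` than on a random pitch-level split, that gap is a hint that it leans on pitcher identity rather than pitch characteristics.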

DT K

@OnePitchOneSoul

1 day ago

@djhogness @EliBenPorat Imo it's not so insane anymore! Teams are getting better at using process metrics instead of waiting for results to speak for themselves

DT K

@OnePitchOneSoul

1 day ago

@racketman45 @TJStats Not yet!

DT K

@OnePitchOneSoul

1 day ago

@RobertStock6 @903124S @TJStats https://t.co/V9ebWUgzGr

DT K

@OnePitchOneSoul

1 day ago

It's not a bad thing that models can guess pitchers per se. I think the interesting question is "how can we lessen the dependence on pitcher identities".

DT K

@OnePitchOneSoul

1 day ago

It's more so apparent for guys like Hendricks, Kershaw, and pitch-to-pitch relationship merchants. Was Brent Suter's changeup good in 2020-2021 because of, or despite, the lack of a velocity gap? I found that models like to avoid these questions by depending on pitcher identity, and therefore understate the effects.

DT K

@OnePitchOneSoul

1 day ago

@PatekLives @exstasiae @TJStats https://t.co/DnSbqACLEa

DT K

@OnePitchOneSoul

1 day ago

@903124S @RobertStock6 @TJStats Exactly

DT K

@OnePitchOneSoul

1 day ago

@903124S @TJStats Yeah using the same features and model setup

DT K

@OnePitchOneSoul

6 days ago

@taylor_turrisi @enosarris @choice_fielder 1. Yeah defo not identical. It's fuzzy what counts as "replication" though. 2. At this point it wouldn't improve my portfolio tbh. Stuff models are hard to build nevertheless and I thought it'd help a lot of people.

DT K

@OnePitchOneSoul

7 days ago

If anyone is seeing this, I'm looking for some advice. I replicated FanGraphs' Stuff+. I scraped every article, tweet, and podcast from @enosarris and @choice_fielder to derive the same 19 features, same 5-stage architecture, same CatBoost, same train/test split. 0.9 correlation to Stuff+, just as predictive. Is it okay to open source this? Am I technically copying code here?
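For reference, a correlation check between a replica's scores and the published Stuff+ values could look like the sketch below. It assumes Pearson correlation, which the tweet does not specify, and the two score lists are made-up placeholders rather than real Stuff+ numbers. The helper name `pearson` is hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Placeholder values for illustration only.
replica_scores = [95.0, 102.0, 110.0, 88.0]
published_scores = [97.0, 100.0, 112.0, 90.0]
r = pearson(replica_scores, published_scores)
```

A high `r` only says the two sets of scores move together; it does not by itself show that the features or architecture match.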