Smells Like ML

@smellslikeml

Building #ExperimentOps @remyxai Experiment orchestration for AI teams #XO #BeAnExperimenter #WhatsNext

San Francisco, CA

Joined October 2018

441 Following

929 Followers

4.9K Posts

Pinned Tweet

Smells Like ML @smellslikeml

4 days ago

In that benchmark comparison, do you even have the sample size to compare two models, or are you making decisions based on statistical noise? 2605.30315v1 offers a simple test. Outrider added it to our fork of lm-evaluation-harness. Install Outrider: https://t.co/apCnaQRZAW

2

4

2

0

121
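
To make the question in the post above concrete, here is a minimal sketch of one standard paired check, an exact McNemar test on per-item correctness. This is an assumption for illustration: the post does not describe the test in 2605.30315v1, and this is not claimed to be it or the code in the lm-evaluation-harness fork. The function name and the toy data are hypothetical.

```python
from math import comb


def mcnemar_exact_p(correct_a, correct_b):
    """Two-sided exact McNemar p-value from per-item correctness of two models.

    Hypothetical helper for illustration; both lists must score the same items
    in the same order.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same items")
    # Only items where the two models disagree say anything about the gap.
    a_only = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    b_only = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = a_only + b_only
    if n == 0:
        return 1.0
    k = min(a_only, b_only)
    # Under the null, each disagreement favors either model with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data (made up): 100 items, model A scores 98%, model B scores 92%.
correct_a = [True] * 8 + [False] * 2 + [True] * 90
correct_b = [False] * 8 + [True] * 2 + [True] * 90
print(mcnemar_exact_p(correct_a, correct_b))
```

On this toy data the six-point accuracy gap rests on only ten disagreements, and the p-value comes out near 0.11, so the comparison would not clear a 0.05 threshold.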

Smells Like ML @smellslikeml

4 days ago

pip install remyxai → discover research and turn it into reviewable PRs https://t.co/UppiG8G98C

0

0

0

0

39

Smells Like ML @smellslikeml

4 days ago

@artists_voyage Especially insidious for segment-level error analysis. You slice into subgroups to improve over a baseline, the CIs widen with each cut, and decisions made off the point estimates end up chasing noise.

0

2

0

0

12
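
A rough sketch of the widening described in the reply above, using the normal-approximation (Wald) interval for an accuracy estimate. The 80% accuracy and the subgroup sizes are made-up numbers, and the helper name is hypothetical.

```python
from math import sqrt


def wald_half_width(accuracy, n, z=1.96):
    """Half-width of the normal-approximation CI for an accuracy measured on n items."""
    return z * sqrt(accuracy * (1 - accuracy) / n)


# Each cut into a smaller subgroup shrinks n; the interval grows like 1 / sqrt(n).
for n in (1600, 400, 100, 25):
    print(f"n={n:5d}  half-width = {wald_half_width(0.8, n):.3f}")
```

Quartering the subgroup size doubles the half-width, so a slice of 25 items carries an interval of roughly plus or minus 16 points around an 80% point estimate.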

Smells Like ML @smellslikeml

6 days ago

Now @remyxai Outrider is on @github Marketplace! Schedule Claude to implement core methods from the most relevant papers for your repo using Github Actions! https://t.co/JqE1DKegF5

0

4

2

0

70

Smells Like ML @smellslikeml

6 days ago

'CrossView Suite' introduces CrossViewBench, which focuses on explicit alignment mechanisms and object-level consistency across views. It offers a strong framework for evaluating the fidelity of VQASynth's synthetic data and improving the robustness of the generated spatial questions.

0

0

0

0

29

Smells Like ML @smellslikeml

6 days ago

Outrider is the GitHub Action that matches new arXiv methods to your repo and drafts the PR https://t.co/apCnaQRZAW

2

3

2

0

129

Smells Like ML @smellslikeml

6 days ago

@remyxai @AnthropicAI Coming soon: evals on every PR. Your benchmark suite + datasets run against the diff, results linked to PR before setting ready for review. Design partner pilot opens soon — DM if interested.

0

0

0

0

24

Smells Like ML @smellslikeml

6 days ago

Under the hood: arXiv → @RemyxAI ranks weekly against your team's commit history → @AnthropicAI Claude Code drafts the integration → Outrider opens a draft PR with tests

1

0

0

0

29

Smells Like ML @smellslikeml

18 days ago

https://t.co/pTmRDMPBsb

0

0

0

0

25

Smells Like ML @smellslikeml

18 days ago

When code evolves, developers signal an implicit preference for the new over the old. Scale that analysis across many repos and patterns emerge. Taste is learnable too, even if OAI hasn't figured out selection yet.

4 months ago

taste is a new core skill

890

10K

1K

2K

3M

1

0

0

0

48

Smells Like ML @smellslikeml

21 days ago

Check out the dataset for VQASynth https://t.co/KgXhkjHsjO

0

0

0

0

20

Smells Like ML @smellslikeml

22 days ago

https://t.co/LmEDVeZGHF

1

1

1

0

96

Smells Like ML @smellslikeml

21 days ago

Uses a Gaussian Process to learn contributor preferences implicitly from repo merge history. Next, I'm applying the GP to synthesize a larger volume of preference data to help finetune an open-weight coding model with DPO and LoRA. https://t.co/8bqDhtqdTx

1

0

0

0

50

Smells Like ML @smellslikeml

22 days ago

smellslikeml's tweet photo. https://t.co/WkSsayXR1y

Katarzyna (Kasia) Kobalczyk @kasia_kobalczyk

about 1 month ago

Another paper accepted! #ICML2026 here we go 🇰🇷

0

2

0

0

193

0

4

1

0

125

Smells Like ML @smellslikeml

22 days ago

smellslikeml's tweet photo. https://t.co/fm7kFE8VBW

0

0

0

0

21

Smells Like ML @smellslikeml

23 days ago

The space of possible improvements to your AI model is large, while evaluation is costly. LILO learns efficiently from the decision maker's preferences, balancing exploration and exploitation in a principled way w/ Bayesian Optimization https://t.co/rpW3323Col

1

3

1

0

127

Smells Like ML @smellslikeml

23 days ago

used it to rank 11 arXiv papers by alignment, with spatial-reasoning benchmark and novel-view papers topping the list

1

0

0

0

43
