Benjamin Feuer @feuerbenjamin - Twitter Profile

Benjamin Feuer @FeuerBenjamin

6 months ago

So excited to be the SFT lead of this massive collaboration! The OpenThoughts team is 🔥

Negin Raoof

@NeginRaoof_

6 months ago

How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on TerminalBench, and sets a new bar on our newly released OpenThoughts-TB-Dev benchmark. (1/n)



Benjamin Feuer @FeuerBenjamin

6 months ago

@etash_guha @NeginRaoof_ Extraordinary collaboration! Maximum effort!


Benjamin Feuer @FeuerBenjamin

7 months ago

@micahgoldblum @gneubig Reproduce or extend: https://t.co/F6lGeSQAJp Interactive HTML plot with individual submission results: e.g. https://t.co/lsScUwS5ov Raw judgments: e.g. https://t.co/I9oJWsJMLw n / n


Benjamin Feuer @FeuerBenjamin

7 months ago

@micahgoldblum @gneubig Not only that, reviewers this year are *much* worse at punishing heavily AI-modified content. 2 / n https://t.co/l5Y7cMWJZ4


FeuerBenjamin retweeted

Thao Nguyen @thao_nguyen26

11 months ago

If you are attending #ICML2025, check out our DataWorld workshop on Sat July 19. We have updated the website with more info on speakers & accepted papers! https://t.co/K3U540rqoe

Also happy to chat offline about all things ✨ data ✨ https://t.co/RU1GHIqYqK


Benjamin Feuer @FeuerBenjamin

11 months ago

New research paper for you to read over your July 4th break (if you're US-based) --

Vision is a skeleton key! 🗝️ We convert a small VLM into an "everything classifier" by transforming data into visualizations that VLMs can naturally understand and reason about. We call it MARVIS: Modality Adaptive Reasoning over VISualizations.

Our MARVIS-3B model:
- Beats Gemini by 16% on average across 100s of vision and tabular tasks 🏆
- Gets within 2.5% of the best specialized model across 4 modalities ... 🎯
- Using just one 3B model ... 💪
- ... without exposing any P.I.I. (personally identifiable information) to the VLM ... 🔐
- And without requiring any model training! ⚡

Our GitHub: https://t.co/NKbPFSIhQi 💻
Our Paper: https://t.co/TjKbraivQ0 📄
Research Supported By: https://t.co/larCo1IOmv
Thanks to @LennartPurucker @Oussama_e


Benjamin Feuer @FeuerBenjamin

12 months ago

@allen_ai @CVPR @LambdaAPI @huggingface @Zhang_Yu_hui @NimrodShabtay @NHulkund @thao_nguyen26 @XiaohanWang96 @lschmidt3 @sainingxie @sarameghanbeery @georgiagkioxari And also our supporters @natolambert, Thomas Bordes, George Will


Benjamin Feuer @FeuerBenjamin

12 months ago

So excited to announce the DCVLR (Data Curation for Vision-Language Reasoning) competition at NeurIPS 2025, led by @Oumi_PBC and sponsored by @LambdaAPI! 🌟open-data 🌟 🤖 open-models 🤖 💻 open-source 💻 💪anyone can compete for free 💪 https://t.co/7FLCl255cK 🧵 1 / n


Benjamin Feuer @FeuerBenjamin

12 months ago

@allen_ai @CVPR @LambdaAPI @huggingface Thanks to Rohun Tripathi, Oussama Elachqar, @Zhang_Yu_hui, @NimrodShabtay, @NHulkund, Stefan Webb, @thao_nguyen26, Vishaal Udandarao, @XiaohanWang96, @lschmidt3, @sainingxie, Serena Yeung-Levy, Paul Pu Liang, @sarameghanbeery, @georgiagkioxari, Manos Koukoumidis!


FeuerBenjamin retweeted

Ryan Marten

@ryanmart3n

about 1 year ago

Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data scales. Full details are in our ✨new paper✨ - below we share the highlights: BTW, it also works on non-Qwen models😉 (1/N)



FeuerBenjamin retweeted

Neha Hulkund @NHulkund

about 1 year ago

📣We are extending our deadline to May 31st!📣 Looking forward to seeing everyone's submissions :)


FeuerBenjamin retweeted

Mike A. Merrill

@Mike_A_Merrill

about 1 year ago

Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr lots of room for improvement! https://t.co/qEczwCmyoQ



FeuerBenjamin retweeted

Thao Nguyen @thao_nguyen26

about 1 year ago

📢 Announcing our data-centric workshop at ICML 2025 on unifying data curation frameworks across domains!

📅 Deadline: May 24, AoE
🔗 Website: https://t.co/K3U540rqoe

We have an amazing lineup of speakers + panelists from various institutions and application areas. https://t.co/4YFAIvSSFG


Benjamin Feuer @FeuerBenjamin

about 1 year ago

@arankomatsuzaki, thanks for this important work! The bias in LM Arena also filters down to the LLM judge benchmarks designed to simulate it, as we showed in https://t.co/FKUxPdkanx. Happy to cross-cite if you're interested! Good luck with the paper!

