Mehdi Fatemi

7 days ago

@ajratner's talk from Scale at Meta covering how benchmarks must evolve, the data layer underneath, and the collaborations driving this work forward. https://t.co/1fEUhVuexK

mefatemi retweeted

AI and robotics researcher at Technion

23 days ago

Join us for late-afternoon boba and research. RSVP: https://t.co/boWl1s0gH6 🧋 Next up in the Snorkel AI Reading Group: Russell Yang (@StanfordLaw) on “JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment,” stemming from a collaboration with @harvey and Snorkel AI, with contributions from Charles Dickens. When evaluating AI systems in high-judgment domains like law, should we use rubrics or pairwise preference rankings? JudgmentBench is a new benchmark that enables direct comparison of these evaluation methods using expert legal judgments.

Principal Researcher at Froggy Team, Microsoft Research Montreal. Opinions are my own.

mefatemi retweeted

about 1 month ago

We’re building the data and environments behind the world’s most advanced AI systems. If you want to work on hard problems that matter, alongside people who hold a high bar and move fast without ego, Snorkel is the place. Learn more about open roles: https://t.co/luQ4eOnT4T

mefatemi retweeted

about 1 month ago

Join us for late-afternoon 🧋 boba and research. Details/RSVP: https://t.co/GK5lswnZbY Next up in the Snorkel AI Reading Group: @EchoShao8899 (@stanfordnlp) on “Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration,” recently presented at @iclr_conf.

mefatemi retweeted

about 2 months ago

🚀 We're #hiring a Research Scientist – RL Training @SnorkelAI. We need someone who's actually RLFT'd agents using complex environments (e.g. SWE-Bench/Terminal-Bench). Deep hands-on experience with GRPO, RLHF, DPO, reward modeling & frameworks like verl/SkyRL. 30B+ scale and deep expertise in RL algorithms. Come build SOTA coding agents with us! 📍 RWC / SF / NYC / Remote - US #ML #ReinforcementLearning #PostTraining

mefatemi retweeted

about 2 months ago

The question has moved past whether coding agents can produce code that works. The harder question is whether they can complete real software work safely, measurably, and repeatedly — and how we supervise it. Great piece by @realjustinbauer on the evolution of coding agents and why they need better data, evals, and environments: https://t.co/RBHhWfp6Nn

mefatemi retweeted

Alex Ratner

@ajratner

about 2 months ago

One major factor distorting our perception of AI capabilities: benchmark development now lags behind model development for the first time in AI history.
In traditional AI/ML: The rate of benchmark advancement (i.e. labeling a small-to-mid sized dataset) exceeded that of model development - and so benchmarks gave a pretty useful view of frontier capabilities. This made them canonical measures of AI progress.
Today: it's very difficult to create benchmarks that properly measure *real world* environments, scenarios, and tasks at the jagged frontier of AI capabilities - which itself has become an exponentially bigger space to measure - and are robust to rapid overfitting. Benchmarks show near saturated performance - even though models still have real capability gaps in practice.
One more reason why accelerating the pace of benchmark development - and doing so with the full power of open, academic communities - is so important!

mefatemi retweeted

about 2 months ago

@marklevinshow Carter's biggest foreign policy failure on full display. We traded a genuine ally in the Shah's Iran for a regime that chants "Death to America" and now we're dependent on countries that are allies only when it's convenient for them!

Mehdi Fatemi @mefatemi

about 2 months ago

If you are an RL expert with a track record of experience in LLM post-training, reach out directly with your resume! *** Please share with your network ***

about 2 months ago

Our #research team is looking for an #RL expert to join the team that's working on building a Coding Agent. Ping me if you have experience with RL for agents at scale and want to build something exciting (or apply through the links below) #Hiring

mefatemi retweeted

Reza Pahlavi

@PahlaviReza

about 2 months ago

Thank you for standing with the people of Iran, Sergey and Gerelyn.

mefatemi retweeted

about 2 months ago

🔬 Research Scientist – RL Training: https://t.co/Az5I3tv0Vh 🛠 Applied Research Engineer – Training Infra: https://t.co/BYJ8RJSqWt 📄 Paper: https://t.co/sUdFUdpuw9 🎉 Event: https://t.co/ycmGkqEjod

mefatemi retweeted

2 months ago

Exciting release! And we are looking for researchers with deep RL and LLM post-training expertise to help us build something amazing! #hiring #rl #post_training

mefatemi retweeted

Justin Bauer

@realjustinbauer

2 months ago

Our #MLSys2026 paper is live on arXiv 📄 We ran a systematic study of RLVR in low-data regimes across 3 procedurally generated benchmarks (counting, graph, spatial reasoning). Key finding: dataset composition matters more than dataset size. https://t.co/Z7ZuG1fLMD

mefatemi retweeted

Neil Stone

@DrNeilStone

3 months ago

Everyone is talking about uranium enrichment, oil prices, and the Strait of Hormuz. No one is talking about the Iranian people

mefatemi retweeted

3 months ago

Web agents are getting increasingly capable. Excited that our team @SnorkelAI partnered with @allen_ai on MolmoWeb where we managed the human trajectory annotations, verified correctness, and ensured quality control for the training data. Blog: https://t.co/7uWdB1qmVq

mefatemi retweeted