Stephen Bach

@stevebach

Asst. prof. @BrownCSDept. Working on improving how humans teach computers. Weak supervision, zero-shot learning, few-shot learning, and high-level knowledge.

Joined August 2007

502 Following

1.6K Followers

1.8K Posts

Stephen Bach @stevebach

14 days ago

@yong_zhengxin Thank you Yong! I am lucky to get to work with you

stevebach retweeted

Brown CS @BrownCSDept

24 days ago

@BrownCSDept Master's student Ilana Nguyen speaks at a United Nations panel focused on ensuring that AI expands opportunity. Learn more at Brown CS Blog: https://t.co/BfYrBh7AQZ

stevebach retweeted

Cristina Menghini @CriMenghini

about 2 months ago

Last week we launched Muse Spark at an acceptable risk level under our Advanced AI scaling framework, after multiple mitigation iterations. Today we’re releasing its first Safety & Preparedness Report documenting that decision. This was a long, cross-team effort, from catastrophic risk assessment to day-to-day model behavior. We hope this contributes to transparent discussion of responsible development of personal superintelligence. Running the evals, it was fascinating to watch the model’s safety profile take shape. Under the new framework, we’re also introducing our first assessment of loss of control risks, built on extensive threat modeling that’s still evolving. The report’s dense and there’s a lot of work ahead. You can find the full report here: https://t.co/erjgFHz4uc. We’re eager to hear feedback and improve.

stevebach retweeted

ACM Conference on AI and Agentic Systems

@CAISconf

2 months ago

The first @TheOfficialACM conference on agentic AI systems just got a boost. @SnorkelAI is joining as a sponsor of @CAISconf this May in San Jose. Stanford AI Lab roots, production AI focus, and a shared belief that this community needs a rigorous home. https://t.co/MxpkxyPWGO

stevebach retweeted

Deb Raji @rajiinio

2 months ago

I thought "AI for Science" was something like AlphaFold, i.e., using AI to creatively address computational bottlenecks for well-articulated scientific problems. Now I'm seeing more of "AI slop cosplaying as research paper", where the problems are fake, methods unverified, etc.

stevebach retweeted

Tiancheng Hu @ ICLR 2026

@tiancheng_hu

3 months ago

1/7 🧵 The GPT-4 technical report featured detailed calibration curves. Since then, not a single major model release has reported calibration. The field quietly stopped measuring whether models know what they don't know. Our new position paper argues this is a mistake. Here's why.

stevebach retweeted

Nihal Nayak @nihalcanrun

3 months ago

Targeted instruction tuning for LLMs involves selecting a subset of instructions from a candidate pool using a small query set from target tasks. Despite growing interest, we still lack guidance on what to select. Our new preprint brings clarity to this space (thread 👇).

stevebach retweeted

Alex Ratner

@ajratner

4 months ago

Simple (proposed!) rule for terminology around synthetic data: If a "synthetic generation" method uses model A to generate data that leads to gains on model B, where A >> B - this is distillation, not synthetic generation :) The true technical challenge of synthetic data is to use model A, plus some cleverness around system architecture and/or human-in-the-loop input (e.g. context eng, review/filtering, editing), to produce data that improves model B where B >= A.

stevebach retweeted

Yisong Yue

@yisongyue

4 months ago

I am saddened by the loss of Joe Halpern. I still remember taking his Reasoning About Uncertainty class during my first year as a PhD student at @Cornell. Joe leaves behind a tremendous legacy, not only in his research, but the lives of so many students he touched along the way. https://t.co/l4AZmHNUxK

stevebach retweeted

Alex Ratner

@ajratner

4 months ago

This week we launched the Open Benchmarks Grant with a $3M initial commitment from @SnorkelAI + partner support from @huggingface @togethercompute @PrimeIntellect @PyTorch @harborframework & others, in order to close the evaluation gap in AI. Our ability to measure AI has been outpaced by our ability to develop it - and open benchmarks are one of several critical, complementary tools to fix this. We're particularly interested in novel benchmarks that push and probe the frontier along three key vectors:
(1) Environment complexity --> E.g. complex, domain-specific context and tool/action spaces, human interaction, world modeling
(2) Autonomy horizon --> E.g. long horizon, non-stationary goals
(3) Output complexity --> E.g. complex outputs with nuanced, rubric-based evaluation / reward signals
Check out more detail + link to apply here! https://t.co/m1EftlAQTB

Stephen Bach @stevebach

4 months ago

Awesome that @SnorkelAI is investing in open evaluation for agents! We’ve always said that data is the bottleneck. With increasing model capabilities, it’s often the *evaluation* data that limits progress now. Excited to see what gets built!

vincent sunn chen

@vincentsunnchen

4 months ago

https://t.co/aNuEf6Yu9j

stevebach retweeted

Omar Khattab

@lateinteraction

4 months ago

PSA: If you're not currently following @jacobli99 and staying tuned, you really really should this week.

stevebach retweeted

Daniel Khashabi 🕊️

@DanielKhashabi

5 months ago

Postdoc positions: https://t.co/YtyQACsUWw Applications are due January 23, 2026. Positions are for 2 years with the possibility of an extension.

stevebach retweeted

Antonia Noori Farzan @antoniafarzan

6 months ago

I may be biased because she's a friend, but this piece by @H_Lev is the best first-person account I've read summing up the mood in Providence right now https://t.co/7OsdfhS5AF

stevebach retweeted

Brown Data Science Institute @Brown_DSI

6 months ago

The Data Science Institute is pleased to announce our inaugural 2026 Early Career Breakthrough Research Award recipients! Congratulations to Ying Ma @yingma0107 (@BrownBiostats), Loukas Gouskos (@brown_physics), and Kim Fernandes (@BrownAnthro)! https://t.co/eC0EG7hQcd

stevebach retweeted

Dylan Sam

@dylanjsam

6 months ago

I'm at NeurIPS this week! Excited to meet old/new friends and chat with people about training safer language models. I'm presenting a few works on safety pretraining, measuring diversity in data curation, and monitoring model behaviors --- more info below 👇

stevebach retweeted

Dyah Adila 🦄 @dyahadila_

6 months ago

⭐ New blog post! Most people think activation steering ≈ a cheap version of finetuning. But why does it sometimes work, and sometimes fall flat? We dug into this and found a surprisingly clear answer. Full breakdown here 👇 https://t.co/bQRMdFf4Bm

stevebach retweeted

Yeganeh Kordi @yeganekordi

6 months ago

How well do language models generalize to problems that are harder, or even easier, than the ones they’ve trained on? We show that LLMs don’t generalize across difficulty levels quite as much as you might think. 🧵

stevebach retweeted

Tal Linzen

@tallinzen

6 months ago

I too am recruiting PhD students this year! Things I think about: cognitively plausible LLMs, interpretability, evaluating and improving multi-turn interaction, LLMs for cognitive science and neuroscience, psycholinguistics... The deadline for Data Science is Dec 6 and for Linguistics Dec 18.

stevebach retweeted

Brown Research @BrownUResearch

6 months ago

ARIA, a Brown-based research consortium supported by a $20 million grant from the National Science Foundation, welcomed scientists from across the U.S. to kick off its five-year program with a launch event in Providence. @BrownUniversity https://t.co/fddeM3m2zn
