Jean Michel A. Sarr

@jmamathsarr

PhD CS Research Engineer @ Google (views my own). Building Distill in public: research memory, synthesis, and decision support.

Accra, Ghana

Joined April 2012

347 Following

502 Followers

1.1K Posts

Pinned Tweet

Jean Michel A. Sarr

@jmamathsarr

27 days ago

Reading more wasn’t my problem. Remembering and compounding insight was. I wrote about why I kept restarting things I cared about, and why I’m building Distill: a system that tracks papers, remembers what matters, and updates conclusions as new evidence arrives. Building it in public: https://t.co/Cl4sBf6r36

Jean Michel A. Sarr

@jmamathsarr

4 days ago

@jeremyphoward @Zai_org I went on https://t.co/Cxy28YRu25 for the first time. I asked whether the model had a laptop app similar to Claude Code or Codex, and it started talking about Gemini CLI. When I asked it who it was, it thought it was Gemini, even though the app showed GLM 4.7. Not great.

Jean Michel A. Sarr

@jmamathsarr

8 days ago

Inspiring article. The part about choosing your own problems particularly resonated with me. Big tech has a dramatic influence on the AI research landscape because it often dictates which problems are worth solving. But there are many cool problems out there that are not popular yet.

Jean Michel A. Sarr

@jmamathsarr

20 days ago

The topic I'm tracking will change. Maybe in 3 months, maybe 6. So the topic definition — thesis, scoring dimensions, source priorities, audience profile — lives in a single config file instead. The orchestration code knows nothing about what it's tracking. Pivoting to a new research area means swapping the file. Zero code changes to go from "data advantages in AI" to any other domain.
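A minimal sketch of what such a topic file and a topic-agnostic loader could look like. The field names, the JSON format, the example sources, and the weighted-average scoring are all assumptions made for illustration; the post does not show Distill's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicConfig:
    """Everything topic-specific lives here (hypothetical field names)."""
    thesis: str
    scoring_dimensions: dict[str, float]  # dimension name -> weight
    source_priorities: list[str]          # highest priority first
    audience_profile: str


def load_topic(raw_json: str) -> TopicConfig:
    """Parse a topic definition; raises TypeError on missing or extra keys."""
    return TopicConfig(**json.loads(raw_json))


def weighted_score(topic: TopicConfig, dimension_scores: dict[str, float]) -> float:
    """Topic-agnostic scoring: the weights come from the config, not the code."""
    total_weight = sum(topic.scoring_dimensions.values())
    weighted_sum = sum(
        weight * dimension_scores.get(name, 0.0)
        for name, weight in topic.scoring_dimensions.items()
    )
    return weighted_sum / total_weight


# Illustrative topic file contents; swapping this string changes the topic.
EXAMPLE_TOPIC = """
{
  "thesis": "data advantages in AI",
  "scoring_dimensions": {"relevance": 2.0, "novelty": 1.0},
  "source_priorities": ["arxiv", "blogs"],
  "audience_profile": "ML researchers"
}
"""

topic = load_topic(EXAMPLE_TOPIC)
score = weighted_score(topic, {"relevance": 0.9, "novelty": 0.3})
```

Nothing in `load_topic` or `weighted_score` mentions a research area, so pointing the pipeline at a new domain only requires a different config string or file.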

Jean Michel A. Sarr

@jmamathsarr

21 days ago

@lemire If the problem is the attitude of the contributing folks, then it surely wouldn't solve it. But if the problem is that code review is the bottleneck, then it would probably help.

Jean Michel A. Sarr

@jmamathsarr

21 days ago

Distill continuously ingests papers and scores each one for relevance. The obvious approach: fetch everything, then decide. But that means paying to read every paper you'll throw away. Instead, a cheap model runs on abstracts only — no full-text fetch. Anything below the relevance threshold gets dropped there. Full-text scoring only runs on what cleared the gate. Reading is the most expensive step in the pipeline. Every architectural decision around it should treat it that way.
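The gate described above can be sketched roughly as follows. This is an illustration under assumptions, not Distill's code: the `Paper` type, the function names, and the stand-in scorers are hypothetical, and the real system would call models where this sketch uses plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Paper:
    paper_id: str
    abstract: str


def gated_scoring(
    papers: Iterable[Paper],
    cheap_abstract_score: Callable[[str], float],
    fetch_full_text: Callable[[str], str],
    full_text_score: Callable[[str], float],
    threshold: float,
) -> dict[str, float]:
    """Score abstracts cheaply; fetch and score full text only above the gate."""
    results: dict[str, float] = {}
    for paper in papers:
        # Stage 1: cheap pass on the abstract only, no full-text fetch.
        if cheap_abstract_score(paper.abstract) < threshold:
            continue  # dropped here, so the expensive read never happens
        # Stage 2: the expensive read runs only for papers that cleared the gate.
        results[paper.paper_id] = full_text_score(fetch_full_text(paper.paper_id))
    return results


# Stand-in components for demonstration (all hypothetical).
fetched: list[str] = []


def demo_abstract_score(abstract: str) -> float:
    return 1.0 if "synthetic data" in abstract.lower() else 0.0


def demo_fetch(paper_id: str) -> str:
    fetched.append(paper_id)  # record which papers paid the fetch cost
    return f"full text of {paper_id}"


def demo_full_score(text: str) -> float:
    return 0.8


papers = [
    Paper("p1", "Synthetic data for alignment"),
    Paper("p2", "A survey of sorting networks"),
]
scores = gated_scoring(papers, demo_abstract_score, demo_fetch, demo_full_score, threshold=0.5)
```

In this demo only `p1` triggers a full-text fetch, which is the cost saving the gate is meant to provide.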

Jean Michel A. Sarr

@jmamathsarr

23 days ago

https://t.co/zLvsV9PAt6

Jean Michel A. Sarr

@jmamathsarr

6 months ago

@YiTayML @JeffDean @quocleix @benoitschilling @denny_zhou @leehsienloong Happy new year 🎊

Jean Michel A. Sarr

@jmamathsarr

6 months ago

@nikitabier Impact

Jean Michel A. Sarr

@jmamathsarr

7 months ago

In preference learning, who judges quality and how those judgments update the policy are two distinct decisions that people often mix together.

• Human-written principles (e.g., Constitutional AI) provide an interpretable judging mechanism, where explicit rules guide the model in labeling responses before those labels are used to train a reward model.
• Expert model judges such as GPT-4 generate preference labels that can either train a reward model for RL or feed directly into DPO for policy optimization.
• Self-judgment allows the model to prefer one response over the other, either by relying on emergent judging ability or by leveraging explicit judge training, which has been shown to outperform the emergent approach.
• Hybrid methods combine multiple sources of judgments, such as Constitutional AI mixing AI-labeled harmlessness with human-labeled helpfulness to balance safety and utility.

Decoupling who judges from how a judgment is applied gives you orthogonal control knobs over two fundamentally different parts of the system.

Have you found any other paradigms?

https://t.co/EFG2GXYjSE

[12/n]
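One way to picture the two knobs in code: the judge is any callable that picks a winner, and the update rule consumes the resulting pair without knowing who judged. The sketch below uses the standard per-pair DPO loss as the example update rule. The `Judge` signature, `label_pair`, and the length-based stand-in judge are assumptions for illustration only.

```python
import math
from typing import Callable, NamedTuple


class PreferencePair(NamedTuple):
    prompt: str
    chosen: str
    rejected: str


# "Who judges": any callable returning True when the first response is preferred.
# It could wrap a constitution-guided model, an expert model, or the policy itself.
Judge = Callable[[str, str, str], bool]


def label_pair(judge: Judge, prompt: str, a: str, b: str) -> PreferencePair:
    """Turn two responses into a labeled pair using whichever judge is plugged in."""
    if judge(prompt, a, b):
        return PreferencePair(prompt, a, b)
    return PreferencePair(prompt, b, a)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """'How judgments update the policy': per-pair DPO loss, -log(sigmoid(margin))."""
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def prefers_longer(prompt: str, a: str, b: str) -> bool:
    """Toy stand-in judge (hypothetical): prefer the longer response."""
    return len(a) >= len(b)


pair = label_pair(prefers_longer, "q", "short", "a longer answer")
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)  # zero margin
improved = dpo_loss(-0.5, -2.0, -1.0, -1.0)  # policy favors the chosen response
```

Swapping `prefers_longer` for another judge leaves `dpo_loss` untouched, and replacing `dpo_loss` with reward-model training leaves the judge untouched.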

Jean Michel A. Sarr

@jmamathsarr

7 months ago

RLHF can't scale. Here's why 🧵 I just published a 4-part research series digging into its fundamental limits and mapping the synthetic alignment methods taking over. Starting an n-day daily thread walking through the evidence, one insight at a time. Join me? Day 1/n: full roadmap https://t.co/af6LyQOYIp

Jean Michel A. Sarr

@jmamathsarr

7 months ago

In some cases, you want to refine responses to generate natural preference pairs. How do you do that?

You can:
- Use heuristics (e.g., bigger models produce better responses).
- In an online setting where you continue training the reward model on new data, sample preference pairs from the current reward model.
- With DPO, use the policy itself as a reward model to directly rank its own answers.
- Use Constitutional AI: generate a critique of a bad response, then apply a human-written constitution to revise it.
- Use self-play: let the model engage with itself in multi-turn conversation and select the latest refined answer.
- Use tree search: generate multiple responses, select the best, critique it, and generate improved ones until satisfied.

Have you used any of these methods before?

https://t.co/LIBbQans20

[11/n]
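As a rough illustration of the last bullet, here is a simplified generate, select, and revise loop that emits a (better, worse) pair at each improving step. It is a sketch under assumptions: `generate`, `score`, and `revise` are hypothetical callables standing in for model calls, and the stopping rule is one choice among many.

```python
from typing import Callable


def refine_into_pairs(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str], float],
    revise: Callable[[str, str], str],
    n_candidates: int = 4,
    max_rounds: int = 3,
    target: float = 0.9,
) -> tuple[str, list[tuple[str, str]]]:
    """Return the final answer and the (chosen, rejected) pairs found on the way."""
    candidates = [generate(prompt) for _ in range(n_candidates)]
    best = max(candidates, key=score)
    pairs: list[tuple[str, str]] = []
    for _ in range(max_rounds):
        if score(best) >= target:
            break  # satisfied, stop refining
        improved = revise(prompt, best)  # critique-and-rewrite step
        if score(improved) <= score(best):
            break  # the revision did not help, keep the current best
        pairs.append((improved, best))   # refined answer is preferred to its source
        best = improved
    return best, pairs


# Deterministic stand-ins for demonstration (all hypothetical).
drafts = iter(["ok", "fine"])


def demo_generate(prompt: str) -> str:
    return next(drafts)


def demo_score(text: str) -> float:
    return min(len(text) / 10, 1.0)


def demo_revise(prompt: str, text: str) -> str:
    return text + "!!!"


final, pairs = refine_into_pairs(
    "q", demo_generate, demo_score, demo_revise, n_candidates=2
)
```

Each pair records a refined answer next to the draft it improved on, which is what makes the pairs "natural": both sides answer the same prompt and differ mainly in the refinement.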

Jean Michel A. Sarr

@jmamathsarr

7 months ago

@zhansheng @NeurIPSConf I like vibes... and alignment. Happy to connect!

Jean Michel A. Sarr

@jmamathsarr

7 months ago

I just arrived in San Diego for NeurIPS! I'm excited to discuss the latest research in synthetic data, alignment, and more, and to meet new folks and discover the local food!

Jean Michel A. Sarr

@jmamathsarr

7 months ago

@kdqg1 Interesting, I'm curious to learn more.
