Dan Friedman @danfriedman0 - Twitter Profile

29 days ago

VLMs can compress text by rendering it as images, but accuracy collapses once images shrink below a certain resolution. We introduce LensVLM: teach the model to scan compressed images, then selectively decompress what it needs. Paper: https://t.co/D2ECVMH6C9

RoyXie_'s tweet photo.

danfriedman0 retweeted

Princeton PLI @PrincetonPLI

about 1 year ago

In a new blog post, @HowardYen1 and @xiye_nlp introduce HELMET and LongProc, two benchmarks from a recent effort to build a holistic test suite for evaluating long-context LMs. Read now: https://t.co/8wfh1Qp2ES

danfriedman0 retweeted

Michael Hu @michahu8

over 1 year ago

Training on a little 🤏 formal language BEFORE natural language can make pretraining more efficient! How and why does this work? The answer lies…Between Circuits and Chomsky. 🧵1/6👇

danfriedman0 retweeted

Alex Wettig @_awettig

over 1 year ago

🤔 Ever wondered how prevalent some type of web content is during LM pre-training? In our new paper, we propose WebOrganizer which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐 Key takeaway: domains help us curate better pre-training data! 🧵/N

danfriedman0 retweeted

Simon Park @parksimon0808

over 1 year ago

Does all LLM reasoning transfer to VLM? In the context of Simple-to-Hard generalization, we show: NO! We also give ways to reduce this modality imbalance.

Paper: https://t.co/S0HhYN7cvz
Code: https://t.co/GJsgZof2k7

@Abhishek_034 @chengyun01 @dingli_yu @anirudhg9119 @prfsanjeevarora

danfriedman0 retweeted

Tianyu Gao @gaotianyu1350

over 1 year ago

Introducing MeCo (metadata conditioning then cooldown), a remarkably simple method that accelerates LM pre-training by simply prepending source URLs to training documents. https://t.co/46dtUUVb0P

danfriedman0 retweeted

John Hewitt @johnhewtt

over 1 year ago

I’m hiring PhD students in computer science at Columbia! Our lab will tackle core challenges in understanding and controlling neural models that interact with language. For example:
- methods for LLM control
- discoveries of LLM properties
- pretraining for understanding

danfriedman0 retweeted

Xi Ye @xiye_nlp

over 1 year ago

🔔 I'm recruiting multiple fully funded MSc/PhD students @UAlberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!

danfriedman0 retweeted

Griffiths Computational Cognitive Science Lab @cocosci_lab

over 1 year ago

(1/5) Very excited to announce the publication of Bayesian Models of Cognition: Reverse Engineering the Mind. More than a decade in the making, it's a big (600+ pages) beautiful book covering both the basics and recent work: https://t.co/5dnLpcMQzu

danfriedman0 retweeted

Tom McCoy @RTomMcCoy

over 1 year ago

🤖🧠 I'll be considering applications for postdocs & PhD students to start at Yale in Fall 2025! If you are interested in the intersection of linguistics, cognitive science, & AI, I encourage you to apply! Postdoc link: https://t.co/8Ds8X3OQf9 PhD link: https://t.co/HQWq47B7ss

danfriedman0 retweeted

Angelina Wang @angelinawang.bsky.social @ang3linawang

over 1 year ago

I am recruiting PhD students for Fall 2025 at Cornell Tech! If you are interested in topics relating to machine learning fairness, algorithmic bias, or evaluation, apply and mention my name in your application: https://t.co/EU0Zu56Qo9 Also, go vote!

danfriedman0 retweeted

Aaron Mueller @amuuueller

over 1 year ago

I'm recruiting PhD students for our new lab, coming to Boston University in Fall 2025! Our lab aims to understand, improve, and precisely control how language is learned and used in natural language systems (such as language models). Details below!

danfriedman0 retweeted

Abhishek Panigrahi @Abhishek_034

over 1 year ago

Progressive distillation, where a student model learns from multiple checkpoints of the teacher, has been shown to improve the student–but why? We show it induces an implicit curriculum that accelerates training. Work w @BingbinL, @SadhikaMalladi, @risteski_a, @SurbhiGoel_

danfriedman0 retweeted

Tom McCoy @RTomMcCoy

over 1 year ago

🤖🧠NOW OUT IN PNAS🧠🤖

Language models show many surprising behaviors. E.g., they can count 30 items more easily than 29.

In Embers of Autoregression, we explain such effects by analyzing what LMs are trained to do: https://t.co/lJIWx89YpJ

Major updates since the preprint!

1/n

danfriedman0 retweeted

Akshara Prabhakar @aksh_555

over 1 year ago

🤖 NEW PAPER 🤖

Chain-of-thought reasoning (CoT) can dramatically improve LLM performance.

Q: But what *type* of reasoning do LLMs use when performing CoT? Is it genuine reasoning, or is it driven by shallow heuristics like memorization?

A: Both!

🔗 https://t.co/LR8VrBqxGk

1/n

danfriedman0 retweeted

John Yang @jyangballin

over 1 year ago

We're launching SWE-bench Multimodal to eval agents' ability to solve visual GitHub issues.
- 617 *brand new* tasks from 17 JavaScript repos
- Each task has an image!

Existing agents struggle here! We present SWE-agent Multimodal to remedy some issues.

Led w/ @_carlosejimenez 🧵

danfriedman0 retweeted

Tianyu Gao @gaotianyu1350

over 1 year ago

Very proud to introduce two of our recent long-context works:

HELMET (best long-context benchmark imo): https://t.co/xF5MwlJORz
ProLong (a cont’d training & SFT recipe + a SoTA 512K 8B model): https://t.co/PmaVyRRa4X

Here is a story of how we arrived there

danfriedman0 retweeted

Tianyu Gao @gaotianyu1350

almost 2 years ago

Meet ProLong, a Llama-3 based long-context chat model! https://t.co/zyZ0f5ucyI (64K here, 512K coming soon) ProLong uses a simple recipe (short/long pre-training data + short UltraChat, no synthetic instructions) and achieves top performance on a series of long-context tasks.

danfriedman0 retweeted

Mengzhou Xia @xiamengzhou

almost 2 years ago

🌟 Exciting update! Gemma2-9b + SimPO ranks at the top of AlpacaEval 2 (❗LC 72.4) and leads the WildBench leaderboard among similar-sized models 🚀

SimPO is at least as competitive as (and often outperforms) DPO across all benchmarks, despite its simplicity.

✨ Recipe: on-policy data annotated by a strong reward model + SimPO
💪 Strong performance on chat benchmarks (i.e., AlpacaEval 2, Arena-Hard and WildBench)
📈 Retains GSM8K and MMLU scores in ZeroEval
🔢 Understands that 9.11 is smaller than 9.8

🔗 More details at https://t.co/oMhhPespd7

🔬 Through extensive experiments, we find that
- gemma-2-9b-it exhibits significantly less catastrophic forgetting than Llama-3-8b-Instruct during fine-tuning and is more robust to different learning rates
- With a small learning rate, both DPO and SimPO can improve math domains
- SimPO has large gains over DPO when the SFT model is weaker, or the PO data is noisy. The gap is reduced when the model and data quality improve.
- We also made several major updates to our preprint, added more baselines (i.e., RRHF, SLiC-HF, and CPO), conducted KL divergence analysis since SimPO has no regularization, and investigated adding an additional SFT term.

🌟 More insights in our preprint: https://t.co/hok51xtACX. And we welcome feedback and look forward to discussions!

Joint work with @yumeng0818 and @danqi_chen. And many thanks to @yanndubs @billyuchenlin @infwinston @LiTianleli for maintaining the amazing benchmarks!

danfriedman0 retweeted

Tianyu Gao @gaotianyu1350

almost 2 years ago

If you are attending ICML this year, stop by our workshop on long-context foundation models! Schedule: https://t.co/xROLEaxvVO Also, RSVP for our social event with our sponsor @togethercompute on July 24: https://t.co/A53a7OxdBP 🥳
