Dan Roth @DanRothNLP - Twitter Profile

EMNLP 2026 @emnlpmeeting

7 months ago

Social Impact Award: "AccessEval: Benchmarking Disability Bias in Large Language Models" by Srikant Panda, Amit Agarwal, and Hitesh Laxmichand Patel https://t.co/tcT69fbM42 10/n

1

13

4

3

4K

DanRothNLP retweeted

Siyi Liu

@liusiyi64198

7 months ago

📷 New #EMNLP2025 Findings survey paper!

“Conflicts in Texts: Data, Implications, and Challenges” Paper: https://t.co/Tav9i9mYvP

Conflicts are everywhere in NLP — news articles reflecting different perspectives or opposing views, annotators who disagree, LLMs that hallucinate or contradict themselves, and personal/enterprise document collections that grow apart and are conflicting. Most research tackles these in isolation, and our survey provides the first unified view of conflicting information in NLP. We chart the path toward conflict-aware, reliable NLP systems.

Builds on our earlier work on:
- Multi-perspective dataset https://t.co/ZMk3RuTbWv and search https://t.co/9KJ01DascE
- Hallucination detection https://t.co/CUSXbakDeL
- Open-domain QA with conflicting contexts https://t.co/qVMyjFStgh

0

12

4

0

858

DanRothNLP retweeted

Tomer Wolfson @TomerWolfson

10 months ago

✨Yesterday we released MoNaCo, an @allen_ai benchmark of 1,315 hard human-written questions that, on average, require 43.3 documents per question!✨ The three aforementioned questions were actually some of the easier ones in MoNaCo 😉 (8/) https://t.co/Ad9FrWiwtn

1

3

1

0

629

DanRothNLP retweeted

Ai2 @allen_ai

10 months ago

MoNaCo evaluates complex question-answering with:
📚 1,315 multi‑step queries
🔎 Retrieval, filtering & aggregation across text and tables
🌟 Avg 43.3 distinct documents per query https://t.co/CgYJ41Vhlh

1

15

1

2

1K


DanRothNLP retweeted

Ai2 @allen_ai

10 months ago

LLMs power research, decision‑making, and exploration—but most benchmarks don’t test how well they stitch together evidence across dozens (or hundreds) of sources. Meet MoNaCo, our new eval for question-answering cross‑source reasoning. 👇

https://t.co/ilEihlTBdJ

10

225

37

94

22K

DanRothNLP retweeted

Weijia Shi

@WeijiaShi2

almost 2 years ago

Augmenting GPT-4o with Visual Sketchpad ✏️

We introduce Sketchpad agent, a framework that equips multimodal LLMs with a visual canvas and drawing tools 🎨 . Improving GPT-4o's performance in vision and math tasks 📈

🔗: https://t.co/I6ul5406E6 https://t.co/iYRh4BzMgJ

10

283

50

123

50K

DanRothNLP retweeted

Xingyu Fu

@XingyuFu2

almost 2 years ago

😺 This work is done with my amazing collaborators: @yujielu_10, Muyu He, @WilliamWangNLP, @DanRothNLP. YOU ARE THE BEST!!! 😎🔥 (n/n)

3

7

1

0

1K

DanRothNLP retweeted

Xingyu Fu

@XingyuFu2

almost 2 years ago

🔥Error Examples from DALL-E 3 👀More Visualizations: https://t.co/oiFPqKAiw1 (3/n)

1

11

1

0

1K

DanRothNLP retweeted

Xingyu Fu

@XingyuFu2

almost 2 years ago

🔥Highlights of the Commonsense-T2I benchmark:

📚Pairwise text prompts with minimum token change

⚙️Rigorous automatic evaluation with descriptions for expected outputs

❗️Even DALL-E 3 only achieves below 50% accuracy

(2/n) https://t.co/DDia27otjX

1

10

2

0

2K

DanRothNLP retweeted

Xingyu Fu

@XingyuFu2

almost 2 years ago

Can Text-to-Image models understand common sense? 🤔

Can they generate images that fit everyday common sense? 🤔

tldr; NO, they are far less intelligent than us 💁🏻‍♀️

Introducing Commonsense-T2I 💡 https://t.co/CTcmCbGUhX, a novel evaluation and benchmark designed to measure commonsense reasoning in T2I models 🔥🔥

Paper: https://t.co/Csu2Q0453s

(1/n)

7

130

39

46

49K

DanRothNLP retweeted

Zijian Wang

@zijianwang30

about 2 years ago

Best-fit Packing completely eliminates unnecessary truncations while retaining the same training efficiency as concatenation with <0.01% overhead tested on popular pre-training datasets like @TIIuae's RefinedWeb and @BigCodeProject's Stack.🧵5/n


1

3

1

635
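
For readers who want a concrete picture of the packing idea in this thread, here is a minimal sketch. It assumes the method resembles classic Best-Fit Decreasing bin packing over document lengths, with over-long documents first cut into full-length pieces. The function names (`concat_and_chunk`, `best_fit_pack`) and the toy lengths are hypothetical illustrations, not the authors' implementation.

```python
from typing import List


def concat_and_chunk(doc_lengths: List[int], max_len: int) -> int:
    """Baseline: concatenate all docs, split every max_len tokens,
    and count how many docs get cut across a chunk boundary.
    Assumes every length is at least 1."""
    truncated = 0
    pos = 0
    for n in doc_lengths:
        start, end = pos, pos + n
        # The doc occupies tokens [start, end); it is cut if its first
        # and last tokens land in different chunks.
        if (end - 1) // max_len != start // max_len:
            truncated += 1
        pos = end
    return truncated


def best_fit_pack(doc_lengths: List[int], max_len: int) -> List[List[int]]:
    """Assumed Best-Fit Decreasing packing: each piece goes into the
    sequence with the least remaining room that still fits it."""
    # Step 1: docs longer than max_len must be split (unavoidable).
    pieces: List[int] = []
    for n in doc_lengths:
        while n > max_len:
            pieces.append(max_len)
            n -= max_len
        if n > 0:
            pieces.append(n)

    # Step 2: place pieces from longest to shortest.
    bins: List[List[int]] = []  # piece lengths per training sequence
    free: List[int] = []        # remaining capacity per training sequence
    for p in sorted(pieces, reverse=True):
        best = -1
        for i, cap in enumerate(free):
            if cap >= p and (best == -1 or cap < free[best]):
                best = i
        if best == -1:
            bins.append([p])
            free.append(max_len - p)
        else:
            bins[best].append(p)
            free[best] -= p
    return bins


lengths = [5, 4, 6, 3, 2]
print(concat_and_chunk(lengths, 8))  # docs cut by the baseline
print(best_fit_pack(lengths, 8))     # sequences with no doc cut
```

On this toy input the baseline cuts two documents, while the packed layout keeps every document whole in the same number of sequences. The linear scan over open sequences is for readability only; a corpus-scale version would need a faster lookup structure.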

DanRothNLP retweeted

Zijian Wang

@zijianwang30

about 2 years ago

The common practice in LLM pre-training is to concat all docs then split into equal-length chunks. This is efficient but hurts data integrity: doc fragmentation leads to loss of info, and causes next-token prediction to be ungrounded, making model prone to hallucination.🧵2/n


1

4

2

0

1K

DanRothNLP retweeted

Zijian Wang

@zijianwang30

about 2 years ago

🚀Introducing "Fewer Truncations Improve Language Modeling" at #ICML2024

We tackle a fundamental issue in LLM pre-training: docs are often broken into pieces. Such truncation hinders model from learning to compose logically coherent and factually grounded content.

👇🧵1/n

4

45

10

7K

DanRothNLP retweeted

Xingyu Fu

@XingyuFu2

about 2 years ago

Can GPT-4V and Gemini-Pro perceive the world the way humans do? 🤔

Can they solve the vision tasks that humans can in the blink of an eye? 😉

tldr; NO, they are far worse than us 💁🏻‍♀️

Introducing BLINK👁 https://t.co/EGDh0bMnyJ, a novel benchmark that studies visual perception abilities NOT yet “emerged” in Multimodal LLMs 🔥🔥

Paper: https://t.co/teFGLiXU12

(1/n)

9

408

125

206

106K

DanRothNLP retweeted

Sopan Khosla @KhoslaSopan

almost 3 years ago

Super excited to announce that our "3rd Workshop on NLP for Medical Conversations" will be co-located with IJCNLP-AACL 2023!! Website and CFP: https://t.co/17pA4at3PL @aaclmeeting #AACL2023 #NLProc #NLP #AI #DigitalHealth #HealthTech #Healthcare

1

10

7

1

3K

DanRothNLP retweeted

vinayshekhar @vinayshekhar000

almost 3 years ago

We are thrilled to announce our second workshop on natural language interfaces, held in conjunction with the prestigious IJCNLP-AACL conference! In collaboration with researchers from AWS AI Labs, Google Research, Meta AI Research, and Microsoft Research, this workshop aims to

1

6

3

0

2K

DanRothNLP retweeted

Randall Hunt

@ranman

almost 3 years ago

I’ve been working with @awscloud’s #Bedrock service for a couple of months now at @caylentinc, and I’d like to share some of what I’ve learned. 🧵

8

313

90

124

90K

Dan Roth @DanRothNLP

about 3 years ago

Just out from AWS AI: https://t.co/k4NtrBZq2k

0

23

5

1

3K

DanRothNLP retweeted

Adam Seligman

@adamse

almost 4 years ago

https://t.co/duGiOK8wBP is really neat. Helps you code faster, checks for security vulns, discloses licenses of code it drew from, and works great for AWS APIs. Boom! @awscloud putting ML to work for developers

1

5

3

0

Dan Roth @DanRothNLP

almost 4 years ago

Excited to announce a new product from AWS AI: Amazon CodeWhisperer https://t.co/NpqG6IfaCE

0

10

1

0
