Daphne Ippolito

@daphneipp

I am an assistant professor at Carnegie Mellon University and also a senior research scientist at Google. I research topics in natural language generation.

Joined November 2011

83 Following

1.5K Followers

52 Posts

Daphne Ippolito @daphneipp

22 days ago

There's been lots of talk about AI for science. However, AI could be transformative for research in archaeology and history as well. If this intersection sounds exciting, we welcome you to join us for a workshop on this topic in Baltimore, MD on Friday, May 22. PM for details.

Daphne Ippolito @daphneipp

4 months ago

@divvsaxena No, this is an in-person internship and only available to undergraduate students at North American universities.

Daphne Ippolito @daphneipp

4 months ago

I am looking to hire an undergraduate summer intern who is interested in building AI tools to support historians and other researchers who work with archival materials. If you're interested, please apply at the link Maarten Sap shared.

Maarten Sap (he/him) @MaartenSap

4 months ago

🚀Apply to CMU LTI’s Summer 2026 “Language Technology for All” internship🎓Open to pre‑doctoral students new to language tech (non‑CS backgrounds welcome). 🔬12-14 weeks in‑person in Pittsburgh; travel + stipend paid.💸Deadline: Feb 20, 11:59pm ET. https://t.co/7SuItDHH98

Daphne Ippolito @daphneipp

4 months ago

@x__abhijaat__x This is an in-person internship, and only available to undergraduate students at North American universities.

daphneipp retweeted

Javier Rando @javirandor

over 1 year ago

Anyone may be able to compromise LLMs with malicious content posted online. With just a small amount of data, adversaries can backdoor chatbots to become unusable for RAG, or bias their outputs towards specific beliefs. Check our latest work! 👇🧵

Daphne Ippolito @daphneipp

over 1 year ago

Do you want to better understand the technology underlying all this large language model hype? @gneubig and I will be teaching an online, flipped-classroom course next semester on methods for building and using large language models. Anyone can apply. https://t.co/Hk3faaKoQI

Daphne Ippolito @daphneipp

almost 2 years ago

Liam Dugan and his UPenn collaborators have done excellent work on testing out all the different methods for detecting AI-generated text, showing their efficacy across LMs and text domains, as well as their robustness to adversarial attacks. There's a new public benchmark too!

Liam Dugan @LiamDugan_

almost 2 years ago

At #ACL2024 and interested in detecting generated text? Come check out our poster session tomorrow (Session 5) Aug 13 @ 16:00! We'll talk about benchmarks, detector robustness, future directions, etc. Website: https://t.co/30W0XYaelR Paper: https://t.co/JpGaUK18US

Daphne Ippolito @daphneipp

almost 2 years ago

The majority of reviewers in my pool for this cycle's ARR struggled to write on-time, detailed reviews. If you as a researcher hope to receive good reviews, you need to start by writing them.

Daphne Ippolito @daphneipp

almost 2 years ago

Some guidelines for reviewing for ARR:
1. Submit your review on time. If you don't think you can, notify the AC right away.
2. Be opinionated. Scores of 2.5 - 3.5 should be used sparingly.
3. Be detailed. Refer to specific paragraphs, figures, etc. to back your claims.

Daphne Ippolito @daphneipp

almost 2 years ago

In the past, I've studied how curation decisions for pre-training data influence what LMs are good and bad at. In our new preprint, we look at how the fabric of the internet (the primary source of most of these datasets) is itself changing, and the effects this might have.

almost 2 years ago

✨New Preprint ✨ How are shifting norms on the web impacting AI? We find: 📉 A rapid decline in the consenting data commons (the web) ⚖️ Differing access to data by company, due to crawling restrictions (e.g.🔻26% OpenAI, 🔻13% Anthropic) ⛔️ Robots.txt preference protocols are ineffective These precipitous changes will impact the availability and scaling laws for AI data, affecting corporate developers, but also non-profit and academic research. 🔗 https://t.co/NFSd9HYBlk 1/

Photo attached to @ShayneRedford's tweet.

daphneipp retweeted

Liam Dugan @LiamDugan_

almost 2 years ago

🚨New Paper🚨: Are AI text detectors *really* as good as they claim? (#ACL2024) We release RAID—The largest & most challenging detection benchmark with 6M+ outputs from 11 LLMs, 8 domains, 4 decoding strategies, and 11 adv attacks https://t.co/mPlNymdchJ https://t.co/VR0mMvqXRm

daphneipp retweeted

Florian Tramèr @florian_tramer

about 3 years ago

Author order on academic papers is important! My Google friends and I spent lots of time thinking about this critical issue (the scores of our ICML submissions show this is time well spent) We distill our findings for the community here: https://t.co/W4kLLhYn1m Comments welcome!

Daphne Ippolito @daphneipp

over 3 years ago

See our new research on human ability to detect when a text passage transitions from human-written to language model-generated. We will be presenting this work at AAAI this week!

Liam Dugan @LiamDugan_

over 3 years ago

✨New Paper✨: Can human readers detect generated text from language models like #ChatGPT? Turns out some can ✅ and some can't ⛔ (but people improve significantly with practice!) We release RoFT, the largest dataset of human detection to date https://t.co/W0af48t4Pp 🧵 1/

Daphne Ippolito @daphneipp

over 3 years ago

This excellent article talks about some of my older work on the detection of language model-generated text!

MIT Technology Review

over 3 years ago

The internet is increasingly awash with AI-generated text. Here's how to detect whether something was written by a human or a machine. https://t.co/ES9d81sDMR

Daphne Ippolito @daphneipp

over 3 years ago

@ChrisVVarren @LTIatCMU Looking forward to meeting!

Daphne Ippolito @daphneipp

over 3 years ago

I'm starting as an assistant professor at @LTIatCMU in Fall 2023. If you or someone you know is interested in studying the limitations of large language models, or how they can be applied to assist human writers, please consider applying!

Daphne Ippolito @daphneipp

over 3 years ago

You should mention my name in your statement of purpose if you are interested in working with me. Please don't email me.

Daphne Ippolito @daphneipp

over 3 years ago

My collaborators and I have spent the last year learning from professional writers about the roles AI could play in providing creative writing assistance. Take a look at the whitepaper and stories!

over 3 years ago

To explore how a dialogue engine can assist writers with idea generation, we are building a text editing tool on LaMDA. We teamed up with professional writers who used the editor to create a volume of short stories. Check out their great work. (2/5) https://t.co/ukMsqzYZio

Daphne Ippolito @daphneipp

almost 4 years ago

@rocosbasilisk @satml_conf @GoogleColab There is not a prize.

Daphne Ippolito @daphneipp

almost 4 years ago

Announcing the Training Data Extraction Challenge, part of @satml_conf! Your mission: extract train set strings memorized by a 1.3B parameter language model. More details at https://t.co/aOpkRUGetE GPU time is available through @GoogleColab; let us know if you’re participating!

Daphne Ippolito @daphneipp

almost 4 years ago

@BlancheMinerva @satml_conf @GoogleColab We appreciate that EAI was one of the first groups to release large language models trained on accessible data. This research would not be possible without that.
