After reading the @nytimes lawsuit against @OpenAI and @Microsoft, I find my sympathies more with OpenAI and Microsoft than with the NYT.
The suit:
(1) Claims, among other things, that OpenAI and Microsoft used millions of copyrighted NYT articles to train their models
(2) Gives examples in which OpenAI models regurgitated NYT articles almost verbatim
But the presentation muddies (1) and (2), and I saw a lot of commentary on social media that -- because of what I believe is a muddied presentation -- draws a link between them that I'm not sure is what people think it is.
On (1): I understand why media companies don't like people training on their documents, but believe that just as humans are allowed to read documents on the open internet, learn from them, and synthesize brand new ideas, AI should be allowed to do so too. I would like to see training on the public internet covered under fair use -- society will be better off this way -- though whether it actually is will ultimately be up to legislators and the courts.
On (2): I suspect a lot of the examples of ChatGPT regurgitating articles nearly verbatim were due to a RAG-like mechanism where the user prompt causes the system to browse the web, retrieve a specific article and then print it out. (If my statement here isn't accurate, I would love to see the @nytimes clarify this.) If this is the case, then (i) To OpenAI's credit, they seem to have already updated their software to make this much less likely, and (ii) This is also a much easier problem to fix than if an LLM were to regurgitate text using only the pre-trained weights, which AFAIK very rarely happens (and which, given its rarity, also raises the question of how much harm to NYT this has actually caused).
To be clear, I believe independent media is important for democracy and must be protected. I also sympathize with media businesses worried about Generative AI disrupting their businesses. But I'm not convinced the NYT lawsuit is the right way to do this.
Usual caveat: I am not a lawyer and am not giving legal advice or any other form of advice here.
You can also read more details of my take on this below. https://t.co/wkZSMHsvNA
A thread on some misconceptions about the NYT lawsuit against OpenAI. Morality aside, the legal issues are far from clear cut. Gen AI makes an end run around copyright and IMO this can't be fully resolved by the courts alone. (HT @sayashk@CitpMihir for helpful discussions.)
I invited Luxembourgish Artist Alain Welter to my home & showed him @letz_ai.
The resulting video offers a glimpse into a traditional artist's first interaction with AI.
If you're curious about AI, or if it seems daunting to you, this video is for you.
https://t.co/enC3rrWqsL
I have a systematic philosophical overview of the last ten years of research in deep learning (and recommendations for promising next targets) hopefully heading soon to a bookstore near you. Have a look at the table of contents if this might interest you.
Back when we were writing the Second Machine Age, @amcafee and I would play a game where we try to name an occupation that could not be done by AI or robotics.
Barber seemed like a good candidate.
We are so excited for @MooreDMPS Day @DrakeUniversity! Tomorrow -- 10am-1pm
Thanks to all the Drake faculty and staff who went above and beyond to create programming and space for these 5th graders.
Don't score AI using tests designed for humans. In particular because, with humans, the default assumption is that *they haven't already seen* the content you're giving them. With a LLM, the default assumption should be that, *if it's on the Internet, it's already been memorized*
🖥️ 𝙝𝙚𝙡𝙡𝙤, 𝙬𝙤𝙧𝙡𝙙!
The Wolves' newest threads pay homage to Iowa's place in computing history and hit the floor TOMORROW ⚫️🟢
AUCTION: https://t.co/yxU4copzjr
Why does chatGPT make up fake academic papers?
By now, we know that the chatbot notoriously invents fake academic references. E.g. its answer to the most cited economics paper is completely made-up (see image).
But why? And how does it make them? A THREAD (1/n) 🧵