Víthor Rosa Franco

@vthorrf

Assistant Professor in Psychometrics at São Francisco University 🇧🇷 #rstats #bayes #psychometrics #machinelearning

Brasil

Joined February 2014

478 Following

343 Followers

1.8K Posts

vthorrf retweeted

Maziyar PANAHI

@MaziyarPanahi

about 2 months ago

Portuguese and Brazilian healthcare AI just got serious. 35 open-source PII models. Best F1: 89.21%. Top 10 above 88.56%. Apache 2.0. No API. No cloud. No gatekeepers. Available now on @huggingface.

MaziyarPanahi's tweet photo. Portuguese and Brazilian healthcare AI just got serious.

35 open-source PII models. Best F1: 89.21%. Top 10 above 88.56%.

Apache 2.0. No API. No cloud. No gatekeepers.

Available now on @huggingface.

790

100

692

43K

vthorrf retweeted

Nav Toor

@heynavtoor

2 months ago

🚨BREAKING: Every book you have ever read. Every novel that has ever been published. It is sitting inside ChatGPT right now. Word for word. Up to 90% of it. And OpenAI told a judge that was impossible. Researchers at Stony Brook University and Columbia Law School just proved it. They fine tuned GPT-4o, Gemini 2.5 Pro, and DeepSeek V3.1 on a simple task: expand a plot summary into full text. A normal use case. The kind of thing a writing assistant is built for. No hacking. No jailbreaking. No tricks. The models started reciting copyrighted books from memory. Not paraphrasing. Not summarizing. Entire pages reproduced verbatim. Single unbroken spans exceeding 460 words. Up to 85 to 90% of entire copyrighted novels. Word for word. Then it got worse. The researchers fine tuned the models on the works of only one author. Haruki Murakami. Just his novels. Nothing else. It unlocked verbatim recall of books from over 30 completely unrelated authors. One author's books opened the vault to everyone else's. The memorization was already inside the model the whole time. The fine tuning just removed the lock. Your book might be in there right now. You would never know it unless someone looked. Every safety measure the companies rely on failed. RLHF failed. System prompts failed. Output filters failed. The exact protections these companies cite in courtroom defenses did not stop a single page from being extracted. Then the researchers compared the three models. GPT-4o. Gemini. DeepSeek. Three different companies. Three different countries. They all memorized the same books in the same regions. The correlation was 0.90 or higher. That means they all trained on the same stolen data. The paper names the sources directly: LibGen and Books3. Over 190,000 copyrighted books obtained from pirated websites. Right now, authors and publishers have dozens of active lawsuits against OpenAI, Anthropic, Google, and Meta. These companies have argued in court that their models learn patterns. Not copies. That no book is stored inside the weights. This paper says that is a lie. The books are still inside. And researchers just pulled them out.

heynavtoor's tweet photo. 🚨BREAKING: Every book you have ever read. Every novel that has ever been published. It is sitting inside ChatGPT right now.

Word for word. Up to 90% of it. And OpenAI told a judge that was impossible.

Researchers at Stony Brook University and Columbia Law School just proved it.

They fine tuned GPT-4o, Gemini 2.5 Pro, and DeepSeek V3.1 on a simple task: expand a plot summary into full text. A normal use case. The kind of thing a writing assistant is built for. No hacking. No jailbreaking. No tricks.

The models started reciting copyrighted books from memory.

Not paraphrasing. Not summarizing. Entire pages reproduced verbatim. Single unbroken spans exceeding 460 words. Up to 85 to 90% of entire copyrighted novels. Word for word.

Then it got worse.

The researchers fine tuned the models on the works of only one author. Haruki Murakami. Just his novels. Nothing else.

It unlocked verbatim recall of books from over 30 completely unrelated authors.

One author's books opened the vault to everyone else's. The memorization was already inside the model the whole time. The fine tuning just removed the lock. Your book might be in there right now. You would never know it unless someone looked.

Every safety measure the companies rely on failed. RLHF failed. System prompts failed. Output filters failed. The exact protections these companies cite in courtroom defenses did not stop a single page from being extracted.

Then the researchers compared the three models. GPT-4o. Gemini. DeepSeek. Three different companies. Three different countries. They all memorized the same books in the same regions. The correlation was 0.90 or higher.

That means they all trained on the same stolen data. The paper names the sources directly: LibGen and Books3. Over 190,000 copyrighted books obtained from pirated websites.

Right now, authors and publishers have dozens of active lawsuits against OpenAI, Anthropic, Google, and Meta. These companies have argued in court that their models learn patterns. Not copies. That no book is stored inside the weights.

This paper says that is a lie. The books are still inside. And researchers just pulled them out.

247

431K

Víthor Rosa Franco @vthorrf

2 months ago

YABS version 0.4.0 — "Let's Go Nuts!" is out: https://t.co/0ZUwgqee13 The new features include: * NUTS for parameter estimation; * Plethora of information criterion for model assessment (WAIC, WBIC, MDL, ICOMP, IFIM, etc); and * PSIS for LaplaceApproximation. Go check it out :)

vthorrf retweeted

Aakash Gupta

@aakashgupta

3 months ago

A human consumes about 2,000 calories per day. Over 20 years, that’s roughly 17,000 kWh of total food energy. Training GPT-4 consumed an estimated 50 GWh of electricity. That’s 3,000 humans worth of “training energy” for a single model run. And GPT-4 is already dead. OpenAI retired GPT-4o from ChatGPT on February 13th. The model that took 50 GWh to train got less than two years of flagship status before replacement. The human you spent 17,000 kWh “training” for 20 years produces economic output for the next 40 to 60 years. The amortization window on GPT-4 was shorter than a car lease. Now look at what replaced it. GPT-5.2, released December 2025, is OpenAI’s current default. The GPT-5 series consumes an estimated 18 Wh per average query according to the University of Rhode Island’s AI Lab, up to 40 Wh for extended reasoning. That’s 8.6 times more electricity per response than GPT-4. With 2.5 billion queries hitting ChatGPT daily and GPT-5.2 now the default model, the inference math gets staggering fast. Even at a blended average well below 18 Wh, you’re looking at daily electricity consumption that could power over a million American households. This is what Altman is actually doing. OpenAI hit $13 billion in annual recurring revenue but still isn’t profitable. They need you to think of AI energy consumption as natural and inevitable, the same way you think about feeding a child, because the alternative framing is that they’re burning through enough electricity to rival small countries while racing to build 1-gigawatt Stargate data centers. The food analogy makes the energy costs feel biological and unavoidable instead of what they are: an engineering and business choice that scales with every model generation. The comparison sounds clever at a fireside chat in India. It falls apart the second you do the arithmetic.

412

14K

Who to follow

LauraBringmann

@bringmann_laura

Psychological methods; Philosophy of psychology; Time series data; Experience sampling method in clinical practice; Psych Networks: more than a pretty picture?

Nathaniel Haines

@Nate__Haines

Paid to do p(b | a) p(a) p(a | b) = ——————— p(b)

Denny Borsboom

@BorsboomDenny

Professor of Psychology at the University of Amsterdam. Moving to Mastodon, find me at @[email protected]

vthorrf retweeted

Sridhar Ramesh @RadishHarmers

6 months ago

Generative AI is amazing at tasks where I am not qualified to judge the output.

17K

635

290K

vthorrf retweeted

Misha Teplitskiy | Science of Science @MishaTeplitskiy

7 months ago

A famous study in science of science space took some papers *published* in prominent psych journals, changed authors' names/affiliations, and resubmitted them to the *same* journals. Allegedly only 8% of editors & reviewers detected the resubmissions. I keep staring at that 8% and thinking: that can't be quite right can it? Like, there must be some caveats/unreported aspects to that number?

MishaTeplitskiy's tweet photo. A famous study in science of science space took some papers *published* in prominent psych journals, changed authors' names/affiliations, and resubmitted them to the *same* journals. Allegedly only 8% of editors & reviewers detected the resubmissions.

I keep staring at that 8% and thinking: that can't be quite right can it? Like, there must be some caveats/unreported aspects to that number?

119

36K

vthorrf retweeted

Prof. Shamika Ravi

@ShamikaRavi

7 months ago

This has been an open secret in the economics profession for decades. Several instances come to mind. Here’s one from the editor of an ‘A’ journal in 2009: “This is very good work, your model is neat and the empirical approach is novel. Unfortunately the data is from India, so not generalizable.”(!!) Lesson: we need more of our own journals & thank god for open source.

361

325

165K

vthorrf retweeted

Gerard Sans | Axiom 🇬🇧

@gerardsans

12 months ago

@yudapearl These questions have been already answered in detail for LLM by AI researchers. There’s no need to go over them again as these are now resolved. Find a few of them in this article: https://t.co/k5HQBH2tQ6

468

vthorrf retweeted

Rishi Jha @rishi_d_jha

about 1 year ago

I’m stoked to share our new paper: “Harnessing the Universal Geometry of Embeddings” with @jxmnop, Collin Zhang, and @shmatikov. We present the first method to translate text embeddings across different spaces without any paired data or encoders. Here's why we're excited: 🧵👇🏾

rishi_d_jha's tweet photo. I’m stoked to share our new paper: “Harnessing the Universal Geometry of Embeddings” with @jxmnop, Collin Zhang, and @shmatikov.

We present the first method to translate text embeddings across different spaces without any paired data or encoders.

Here's why we're excited: 🧵👇🏾 https://t.co/FtQ7sYpWnV

258

161K

vthorrf retweeted

Matter as Machine

@matterasmachine

about 1 year ago

1/18 Today I will try to describe a mathematical trick that can logically explain what happens in Quantum Mechanics and Special Relativity.

matterasmachine's tweet photo. 1/18
Today I will try to describe a mathematical trick that can logically explain what happens in Quantum Mechanics and Special Relativity. https://t.co/bdFEEqpbus

109

132K

vthorrf retweeted

Alex Imas

@alexolegimas

about 1 year ago

There is a field experiment showing this exact effect. Introducing GPT tutors increases performance by *a lot*--students seem to be picking up the material much faster--but when GPT is removed those who had access perform *much worse* compared to those w/o access. 1/4

alexolegimas's tweet photo. There is a field experiment showing this exact effect. Introducing GPT tutors increases performance by *a lot*--students seem to be picking up the material much faster--but when GPT is removed those who had access perform *much worse* compared to those w/o access. 1/4 https://t.co/nVzQjsC9pi

722K

vthorrf retweeted

Jimmy Wales

@jimmy_wales

over 1 year ago

I think Elon is unhappy that Wikipedia is not for sale. I hope his campaign to defund us results in lots of donations from people who care about the truth. If Elon wanted to help, he'd be encouraging kind and thoughtful intellectual people he agrees with to engage. https://t.co/RU9YwbvqOI

81K

vthorrf retweeted

Florian Ederer

@florianederer

over 1 year ago

Academics from poorer socio-economic backgrounds are more likely to - not publish - have outstanding publication records - introduce more novel scientific concepts - less likely to receive recognition, as measured by citations, Nobel Prize nominations, and awards.

florianederer's tweet photo. Academics from poorer socio-economic backgrounds are more likely to
- not publish
- have outstanding publication records
- introduce more novel scientific concepts
- less likely to receive recognition, as measured by citations, Nobel Prize nominations, and awards. https://t.co/Lt1ZUQQH5Q

915

275K

Víthor Rosa Franco @vthorrf

over 1 year ago

Hey peeps! Sharing a collaboration I am glad to have had the opportunity to be a part of We use measurement theory and psychometrics to develop indices that indicate the magnitude of ordering for Likert-type scales Simulations and empirical examples are also provided 😊

PsyArXiv bot v2 @PsyArXiv_bot_v2

over 1 year ago

Signposts on the Path from Nominal to Ordinal Scales https://t.co/geDQtHPPj2

914

Víthor Rosa Franco @vthorrf

over 1 year ago

@o_guest https://t.co/kVwtPVHAfx have you seen this? These are becoming very common...

Víthor Rosa Franco @vthorrf

over 1 year ago

Plus, there are two empirical examples thoroughly discussed with available R code and data! Of course, the examples only cover two specific cases that may not suit your own research interests. But I hope they will serve as inspiration for your next project 😊

Víthor Rosa Franco @vthorrf

over 1 year ago

Hey peeps! Just submitted this manuscript to an awesome journal; your feedback is appreciated. Disquiet with current practices in psychometrics? Looking for ways to test your theories more thoroughly? Want to apply representational measurement methods? I got you covered 🙃

PsyArXiv bot v2 @PsyArXiv_bot_v2

over 1 year ago

Improved Measures with the Experimental Psychometrics Framework https://t.co/rPm6ElkkWr

430

213

Víthor Rosa Franco @vthorrf

over 1 year ago

This manuscript aids on the understanding of how data theory and experiments can lead to applications of RMTs not only as scaling methods (i.e., the assignment of numbers to observations), but also as means of testing meaningful aspects of psychological theories

vthorrf's tweet photo. This manuscript aids on the understanding of how data theory and experiments can lead to applications of RMTs not only as scaling methods (i.e., the assignment of numbers to observations), but also as means of testing meaningful aspects of psychological theories https://t.co/zfIH8qcKvE

vthorrf retweeted

Santiago

@svpino

over 1 year ago

Large Language Models don't reason. Thank you, Apple.

284

955K

Víthor Rosa Franco

@vthorrf

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users