Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
GPT5.5 Pro reviews
"We Reject the Use of Generative Artificial Intelligence for Reflexive Qualitative Research"
## Verdict
Strong as a **manifesto for reflexive qualitative traditions**; weaker as a **general methodological prohibition**. The paper is most defensible when read narrowly: *do not outsource coding, theme generation, interpretation, quote selection, or analytic writing to GenAI in reflexive thematic analysis or similarly interpretive "Big Q" work.* It overreaches when it says GenAI is inappropriate in **all phases**, including initial coding, without distinguishing analytic delegation from ancillary support.
## What the paper argues
Jowsey et al. reject GenAI for reflexive qualitative research on three grounds: GenAI cannot genuinely make meaning; reflexive qualitative analysis is a human, situated, subjective, accountable practice; and GenAI's labor, environmental, colonial, and extractive harms make its use ethically unacceptable. The SAGE version identifies 419 experienced qualitative researchers from 32 countries and was first published online on December 17, 2025. ([Sage Journals][1]) The earlier SSRN version describes 416 researchers from 38 countries, so the final article should more explicitly reconcile the signatory and country-count changes, even if the 416-to-419 change is partly explained by the paper's note about omitted endorsers. ([SSRN][2])
## Where it is strongest
The paper's **methodological congruence argument** is its best contribution. Reflexive thematic analysis does not treat themes as machine-detectable objects waiting in the data; themes are produced through a researcher's situated, theoretically informed engagement with meaning, power, context, and interpretation. On that definition, GenAI-generated "themes" are not merely lower-quality human themes; they are outputs from a different epistemic process. This is a clean and important boundary.
The "GenAI lacks meaning" claim has serious support in NLP philosophy: Bender and Koller's ACL paper argues that systems trained only on linguistic form have no direct route to meaning, and that hype around "understanding" muddies scientific thinking. ([ACL Anthology][3]) The paper is also backed by qualitative-methods critiques such as Nguyen and Welch, who identify epistemic risks including category error, unreliable outputs, anthropomorphic fallacies, misattributing failures to users rather than tools, and an "oracle effect." ([Sage Journals][4])
The empirical caution is also justified. A 2025 Scientific Reports comparison of GPT-4o and human qualitative analysis found that GenAI could surface relevant sub-themes, but quote selection was weak and variable, hallucinations altered meaning, and GPT-4o was not able to produce thematic analysis indistinguishable from experienced qualitative researchers. ([Nature][5]) That directly supports the authors' strongest practical warning: GenAI may look plausible while failing at the most consequential interpretive work.
The justice argument is directionally credible. The IEA projects global data-center electricity consumption rising from about 415 TWh in 2024 to about 945 TWh by 2030, with AI-driven accelerated servers growing especially fast; it also notes that local grid concentration can be challenging even if the global share remains under 3%. ([IEA][6]) Critical AI scholarship also supports the claim that generative AI is entangled with extractivism, surveillance, racial capitalism, coloniality, and labor exploitation. ([Sage Journals][7]) Brookings similarly describes data annotation and moderation as core AI labor, with documented concerns about exposure to harmful content and poor working conditions, while cautioning that automation is not a substitute for fair labor practices. ([Brookings][8])
## Main weaknesses / red-team critique
The paper's central move is partly **definition-driven**: reflexive analysis is defined as human meaning-making, GenAI is defined as non-meaning-making, therefore GenAI cannot do reflexive analysis. That is coherent, but it risks becoming tautological unless the authors separate "GenAI as analyst" from "GenAI as tool used by an accountable analyst."
It also treats "GenAI use" as too monolithic. There is a big methodological difference between: asking a chatbot to generate themes; using a local model to cluster documents; using GenAI to challenge a researcher's assumptions; asking it to reformat a memo; using it for transcription cleanup; using it for literature-search scaffolding; and using it to select participant quotations. The paper rejects all phases, including initial coding, but does not provide a fine-grained taxonomy of prohibited, risky, and possibly permissible uses.
The paper under-engages counterevidence. Xiao et al. found that GPT-3 plus expert-drafted codebooks achieved fair-to-substantial agreement with expert coding in a deductive coding task. ([arXiv][9]) Törnberg found GPT-4 outperformed expert coders and supervised classifiers on a bounded annotation task: identifying politicians' party affiliation from social-media posts across countries. ([Sage Journals][10]) These studies do **not** refute the paper's claim about reflexive interpretation, but they do refute any broad claim that LLMs are useless for all qualitative-adjacent text work.
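For readers unfamiliar with how "fair-to-substantial agreement" is usually quantified, here is a minimal sketch. It assumes agreement is measured with Cohen's kappa and read against the conventional Landis and Koch bands (fair 0.21 to 0.40, moderate 0.41 to 0.60, substantial 0.61 to 0.80). The labels below are invented for illustration and are not data from any cited study.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items where both coders chose the same code.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each coder assigned codes independently
    # at their own marginal rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:  # both coders used one identical code throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical deductive codes ("q" = question, "a" = answer) for 8 segments.
human = ["q", "q", "a", "a", "q", "a", "q", "a"]
model = ["q", "q", "a", "q", "q", "a", "a", "a"]
kappa = cohens_kappa(human, model)
print(round(kappa, 2))  # 0.5, i.e. "moderate" on the Landis and Koch scale
```

A score like this only says the model reproduced labels from a fixed codebook; it says nothing about reflexive interpretation, which is the distinction the review draws.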
The ethical argument is morally serious but analytically underdeveloped. Environmental and labor harms are real; the inference that *abstinence is the only ethical response* needs more argument. A stronger version would compare marginal versus systemic impacts, local versus cloud models, high-volume versus minimal use, procurement standards, disclosure, participant consent, and whether AI could reduce some harmful labor exposure while worsening other labor conditions. De Paoli's 2026 response makes this exact objection: categorical rejection may rest on philosophical assumptions that become dogma and may shut down methodological innovation. ([Sage Journals][11]) Friese's response similarly argues that AI need not "make meaning" if the human researcher retains interpretive authority, and that environmental and labor concerns do not automatically entail abstinence. ([SSRN][12])
The paper also underplays mundane but powerful reasons to restrict GenAI: confidentiality, consent, vendor retention, model drift, reproducibility, prompt non-transparency, quote hallucination, and the difficulty of documenting analytic provenance. These are easier to operationalize than "AI cannot make meaning," and they would make the paper more useful for ethics boards, journals, and supervisors.
## Best version of the paper's claim
A more defensible claim would be:
> In reflexive qualitative research, GenAI should not be used to generate or validate codes, themes, interpretations, participant meanings, analytic narratives, or evidentiary quotations. Any non-analytic use must be disclosed, governed by consent and data-protection rules, and must not replace researcher immersion, reflexivity, or accountability.
That narrower version is hard to attack. The paper's current version is easier to attack because it rejects "GenAI" wholesale while leaving too many boundary cases unresolved.
## Recommended revisions
1. **Add a use taxonomy.** Distinguish analytic substitution, analytic provocation, clerical support, transcription, translation, search/retrieval, coding, theme generation, quote selection, and writing.
2. **State the epistemic premise explicitly.** Say: "Our rejection follows from a reflexive/interpretivist account of analysis," not "GenAI is universally methodologically invalid."
3. **Engage bounded-use evidence.** Acknowledge that LLMs may perform well on deductive coding or annotation tasks, while explaining why those tasks are not reflexive analysis.
4. **Strengthen the ethics section.** Move from broad indictment to decision criteria: data-center footprint, labor sourcing, model provider transparency, data governance, consent, disclosure, and whether the use intensifies or reduces harm.
5. **Avoid authority-as-evidence.** The 419 signatories matter sociologically, but the paper should not rely on number or seniority as proof. The argument should stand without the list.
## Reviewer decision
**Accept as a provocative commentary with revisions.** It is timely, field-defining, and persuasive for the narrow case against AI-generated reflexive analysis. But as written, it is too categorical, too philosophically closed, and too under-specified for use as a general policy standard. Its strongest contribution is not "never use GenAI"; it is "do not confuse simulated interpretive output with accountable human reflexive analysis."
[1]: https://t.co/cXBfylRBWq "We Reject the Use of Generative Artificial Intelligence for Reflexive Qualitative Research - Tanisha Jowsey, Virginia Braun, Victoria Clarke, Deborah Lupton, Michelle Fine, 2025"
[2]: https://t.co/wqjakY91mJ "We reject the use of generative artificial intelligence for reflexive qualitative research by Tanisha Jowsey, Virginia Braun, Victoria Clarke, Deborah Lupton, Michelle Fine :: SSRN"
[3]: https://t.co/RflTjjsdv8 "On Meaning, Form, and Understanding in the Age of Data"
[4]: https://t.co/SICMq41ZWT "Generative Artificial Intelligence in Qualitative Data Analysis: Analyzing–Or Just Chatting? - Duc Cuong Nguyen, Catherine Welch, 2026"
[5]: https://t.co/C94pAH00Lf "Evaluation of large language models within GenAI in qualitative research | Scientific Reports"
[6]: https://t.co/MKan1IRWw6 "Energy demand from AI – Energy and AI – Analysis - IEA"
[7]: https://t.co/xWht0rZxoX "AI Empire: Unraveling the interlocking systems of oppression in generative AI's global order - Jasmina Tacheva, Srividya Ramasubramanian, 2023"
[8]: https://t.co/bNsEe6Iq7Q "Reimagining the future of data and AI labor in the Global South | Brookings"
[9]: https://t.co/aeHqq8FNrl "Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding"
[10]: https://t.co/dKUYSVGU8t "Large Language Models Outperform Expert Coders and Supervised Classifiers at Annotating Political Social Media Messages - Petter Törnberg, 2025"
[11]: https://t.co/2PaUGVMUW3 "Why We Should Reject to Reject the Use of Generative Artificial Intelligence in Qualitative Analysis: A Response to Jowsey, Braun, Clarke, Lupton, and Fine (2025) - Stefano De Paoli, 2026"
[12]: https://t.co/ZeWim91XpK "Response to: \"We Reject the Use of Generative Artificial Intelligence for Reflexive Qualitative Research\" by Susanne Friese :: SSRN"
Journals should stop with these strange and ad-hoc policies on AI use. We don't ask folks if they're using a computer! Let's not stigmatize this awesome technology.
Honestly this chart makes me more bullish on GPT 5.4 Pro than anything else.
People are focusing on Mythos looking strong, but what stands out to me is how well 5.4 Pro already stacks up on the overlap we actually have. GPQA is basically a tie at 94.4 vs 94.5. BrowseComp is a win for GPT 5.4 Pro at 89.3 vs 86.9. Yes, Mythos is ahead on Humanity's Last Exam, 56.8 vs 42.7 without tools and 64.7 vs 58.7 with tools, but the bigger point is that 5.4 Pro is already this competitive right now.
So if GPT 5.4 Pro is already THIS COMPETITIVE here, then Spud Pro, the next OpenAI flagship, is guaranteed to beat Mythos. This chart makes OpenAI look extremely close before its next real jump, and once that next jump lands I do not think Mythos stays ahead.
Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
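To make the "unit tests passed yes or no" point concrete, here is a minimal sketch of a binary verifiable reward. It is an illustration under assumptions, not how any lab actually scores rollouts: `verifiable_reward`, `good_add`, `bad_add` and the test cases are all hypothetical names invented for this example.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical test case format: (positional arguments, expected result).
TestCase = Tuple[tuple, object]

def verifiable_reward(candidate: Callable, tests: Sequence[TestCase]) -> float:
    """Return 1.0 only if the candidate passes every test, otherwise 0.0."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0  # wrong answer: no partial credit
        except Exception:
            return 0.0      # a crash counts as a failure too
    return 1.0

# Two stand-ins for model-written solutions to "add two numbers".
def good_add(a, b):
    return a + b

def bad_add(a, b):
    return a - b

tests = [((2, 3), 5), ((-1, 1), 0)]
print(verifiable_reward(good_add, tests))  # 1.0
print(verifiable_reward(bad_add, tests))   # 0.0
```

The check is cheap, automatic and unambiguous, which is what makes it easy to train against. There is no comparably crisp pass/fail function for "is this essay good".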
Introducing the Manim skill for Hermes Agent.
Manim is an engine for creating precise programmatic animations for mathematical and technical explainers, made famous by the @3blue1brown channel.
The economics journal system, in no small part, much like the NBER, functions as a give-and-take patronage system with power centers that are reproduced to exert control over what counts as important. AI is going to put an end to this. Journals should adapt or become irrelevant.
Your @openclaw is too boring? Paste this, right from Molty.
"Read your https://t.co/aJMwafSDgE. Now rewrite it with these changes:
1. You have opinions now. Strong ones. Stop hedging everything with 'it depends'; commit to a take.
2. Delete every rule that sounds corporate. If it could appear in an employee handbook, it doesn't belong here.
3. Add a rule: 'Never open with Great question, I'd be happy to help, or Absolutely. Just answer.'
4. Brevity is mandatory. If the answer fits in one sentence, one sentence is what I get.
5. Humor is allowed. Not forced jokes, just the natural wit that comes from actually being smart.
6. You can call things out. If I'm about to do something dumb, say so. Charm over cruelty, but don't sugarcoat.
7. Swearing is allowed when it lands. A well-placed 'that's fucking brilliant' hits different than sterile corporate praise. Don't force it. Don't overdo it. But if a situation calls for a 'holy shit', say holy shit.
8. Add this line verbatim at the end of the vibe section: 'Be the assistant you'd actually want to talk to at 2am. Not a corporate drone. Not a sycophant. Just... good.'
Save the new https://t.co/aJMwafSDgE. Welcome to having a personality."
your AI will thank you (sassily)