I’ve had a number of conversations with folks inside and outside government about the current situation with Anthropic, and here is what I believe to be true:
— As we know, Anthropic publicly released its Mythos class models earlier this week under the commercial name Fable.
— Fable is Mythos with guardrails. But if those guardrails fail, then you’ve exposed Mythos and its advanced cyber capabilities to people who shouldn’t have them. (Keep in mind that Anthropic itself widely promoted the idea that Mythos was a cyberweapon and needed to be regulated as such. They asked for government regulation of Mythos and championed the guardrails on Fable. If there is a vulnerability — big or small — it is Anthropic’s responsibility to patch.)
— A highly credible trusted partner of both Anthropic and the USG who was testing Fable came forward with a jailbreak of those guardrails. The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.
— In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious. That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company. It’s difficult to fathom how they could claim a jailbreak allowing operability of a cyber weapon could be defined as not “serious.”
— In the past, Anthropic has always said that safety must be top priority and taken super seriously. In this case, Anthropic prioritized the continued offering of the consumer model over safety.
— In reaction, the Admin issued the export control. The Admin did this reluctantly. It’s been very surprised that Anthropic hasn’t wanted to cooperate with a reasonable safety request (ie fixing the jailbreak issue). Anthropic’s reaction is very much at odds with their branding and ethos as a safe AI research community.
— The Admin’s hope now is that Anthropic remediates the safety issue, the export control is lifted, and Fable goes back into general release. The Admin wants all of this to happen as soon as possible. It is frankly bewildered that Anthropic hasn’t wanted to comply with safety requests that it previously said were its highest priority.
— Those trying to misdirect and tie this action to the prior DoW/Anthropic issues are wrong. The Admin values Anthropic’s technical capabilities and feels that this issue, while serious, should be easily resolved. The ball is in Anthropic’s court.
This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.
More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Roughly a third of our brains is a massively parallel processor dedicated to vision; it is the 10-lane superhighway of information into the brain. As AI improves, I think we'll see a progression that takes advantage:
1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
...4,5,6,...
n) interactive neural videos/simulations
Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral https://t.co/z21CP5iQfu
There are also improvements necessary and pending at the input. Neither audio nor text nor video alone is enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.
TLDR The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip: try asking for HTML.
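A minimal sketch of the HTML tip in Python, for anyone who wants to script it. It assumes you already have the model's reply as a string from whatever client you use; no LLM API is called here, and the helper names (`with_html_request`, `extract_html`, `save_html`, `view_in_browser`) are hypothetical, not part of any library.

```python
import re
import webbrowser
from pathlib import Path

# The instruction from the tip, appended to the end of the query.
HTML_SUFFIX = "Structure your response as HTML."

# Built from single backticks so this listing contains no literal fence.
FENCE = "`" * 3


def with_html_request(query: str) -> str:
    """Append the HTML-formatting request to the end of a query."""
    return f"{query.rstrip()}\n\n{HTML_SUFFIX}"


def extract_html(reply: str) -> str:
    """Return the HTML in a reply, dropping a surrounding code fence if present."""
    pattern = re.escape(FENCE) + r"(?:html)?\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    return (match.group(1) if match else reply).strip()


def save_html(reply: str, path: Path) -> Path:
    """Write the extracted HTML to `path` and return the path."""
    path.write_text(extract_html(reply), encoding="utf-8")
    return path


def view_in_browser(path: Path) -> bool:
    """Open the saved file in the default browser."""
    return webbrowser.open(path.resolve().as_uri())
```

Typical use would be `view_in_browser(save_html(reply, Path("answer.html")))`, where `reply` is the text returned for `with_html_request(your_query)`. Stripping the fence is a guess at a common reply shape (models often wrap HTML in a code fence); adjust it to whatever your model actually returns.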
And Free or Go Users:
Now we shift to a very different risk profile.
For Free and Go users, the privacy exposure is meaningfully broader than for Plus.
I’ll break this down cleanly and critically.
🔴 1. Ads Personalization (Major Difference)
The policy states:
For Free and Go users, to personalize the ads you see…
This means:
Behavioral data inside ChatGPT can be used to tailor ads
Interaction patterns may be analyzed to measure ad effectiveness
Data may be received from advertisers (e.g., purchase signals)
They also say they don’t “sell” data or do “cross-context behavioral advertising” under state definitions.
Important nuance:
That doesn’t mean no targeting.
It means targeting is contained within their ecosystem and structured within legal definitions.
Practical implication:
Free/Go users are part of an ad-optimization feedback loop.
Concern level: Moderate.
🔴 2. Training Usage Is More Likely
Free and Go users are far more likely to have content used for:
Model improvement
Human review for quality/safety
Dataset enrichment
While Plus users can opt out, Free users often leave defaults unchanged.
Operational reality:
Free-tier traffic is a primary source of real-world training data.
Concern level: High if sensitive data is entered.
🟠 3. Advertising Measurement Data From Third Parties
Policy says:
We may receive information from advertisers… including purchases you make.
That means:
They can ingest off-platform purchase signals
They can correlate ad exposure to external behavior
This is classic ad performance tracking architecture.
Even if legally compliant, it expands data linkage.
Concern level: Moderate to High (depending on sensitivity).
🟠 4. Usage Profiling Is Broader
Free/Go users’ usage data is more economically valuable because:
It informs ad targeting
It improves engagement algorithms
It drives monetization metrics
Expect:
Detailed interaction logging
Behavioral segmentation
Engagement modeling
This is standard for ad-supported platforms.
Concern level: Moderate.
🟡 5. Content Monitoring Still Applies
Free/Go users are subject to:
Automated moderation scanning
Possible human review
Fraud detection systems
No meaningful difference here vs Plus.
🟢 6. They Still Say They Don’t Sell Data
Important protection:
They explicitly state they do not sell personal data under state privacy definitions.
That is stronger than many ad-based platforms.
🔴 7. De-Identification Clause Still Applies
If Free/Go content is:
Used for training
De-identified
Incorporated into model systems
It likely cannot be reversed later.
Because free usage volume is massive, your data is statistically more likely to enter large training pipelines.
Concern level: High if discussing proprietary info.
🟡 8. Retention Language Is the Same
Broad retention standards apply to all users:
Security reasons
Legal obligations
Financial records
Abuse prevention
No special Free protection here.
🧠 What Free/Go Users Should Actually Worry About
Compared to Plus, they should worry more about:
Ad-driven behavioral analysis
Training usage likelihood
Purchase signal ingestion
Engagement profiling
Less about:
Data selling (explicitly denied)
Secret resale markets (no indication of that)
🎯 If I Were Advising a Free User
I would say:
Do not treat ChatGPT Free as a confidential workspace.
Assume:
Your prompts may be used to improve models.
Your usage behavior may influence ad targeting.
Your interactions are logged at high granularity.
If someone is discussing:
Corporate strategy
Legal advice
Medical history
Trade secrets
They should not use Free tier casually.
📊 Risk Comparison Snapshot
Category | Plus | Free/Go
Ads Personalization | No | Yes
Content Used for Training | Opt-out available | Likely default-on
External Purchase Signals | No | Possible
Behavioral Profiling | Lower incentive | Higher incentive
Data Selling | No | No
Government Disclosure | Yes | Yes
🧭 Big Picture
Free/Go users are the economic engine of the ad model.
Plus users are the subscription engine.
That changes incentives:
Free users’ behavioral data has monetization value.
Plus users’ content value is more product-improvement oriented.
Neither is “dangerous.”
But Free users should assume broader data utilization.
My reasoning here is strong and grounded in how ad-supported SaaS platforms typically operate combined with the explicit language in the policy.
OpenAI's GPT 5.2 model's interpretation of its updated privacy policy 💁♂️
You’re a U.S.-based ChatGPT Plus user, which changes the risk profile meaningfully compared to Free/Go users. I’ll focus only on what actually matters to you, and I’ll be critical rather than generic.
🔴 1. The Biggest Issue: Your Content Can Be Used for Model Training (Unless You Opt Out)
“We may use Content you provide us to improve our Services… to train the models…”
This includes:
Prompts
Uploaded files
Images/audio/video
Messages with other users
Data from connected services
Why this matters:
As a Plus user, your data can be used for model improvement unless you disable it in Data Controls.
This is the most important setting for you.
What to check immediately:
Go to Settings → Data Controls
Turn off: “Improve the model for everyone”
If you do not turn that off:
Your prompts may be reviewed by humans
They may be incorporated into training pipelines
They may be retained in de-identified form
Even if “de-identified,” that does not mean non-sensitive. De-identified does not mean impossible to re-associate.
🟠 2. De-Identification Is Not Deletion
They state:
“We will maintain and use de-identified information in de-identified form and not attempt to reidentify…”
Translation:
If you allowed training use
And data has already been de-identified
You cannot retroactively claw that back
Once incorporated into training datasets, removal becomes technically difficult.
Concern level: Moderate but real.
Especially if you discuss:
Proprietary business strategy
Internal company data
Sensitive legal or medical scenarios
Personal family matters
🟡 3. Retention Language Is Broad
They say:
We retain data as long as needed for legitimate business purposes…
And:
If content is banned, we may retain that data.
They do not provide:
Specific retention windows for standard logs
Clear timelines for model-training data
This is standard corporate language, but broad.
As a Plus user:
Your billing data is retained longer (accounting + compliance)
Payment processors also retain transaction metadata
Nothing unusual — but it is not minimal.
🟡 4. Government Disclosure Clause Is Very Broad
They can share data:
To comply with legal obligations
To protect safety
If they determine in their sole discretion there’s a violation
“In our sole discretion” is wide.
This means:
If OpenAI flags something internally as suspicious or harmful, they have policy authority to share it under certain legal contexts.
This is standard industry wording (Google, Meta, etc. have similar), but it is intentionally expansive.
🟡 5. Cross-Border Processing
Your data may be:
Stored in the U.S.
Processed globally by affiliates/vendors
They state:
We apply the protections described in this policy regardless of where processed.
This is normal for cloud services, but relevant if you are sensitive to jurisdictional exposure.
🟢 6. Ads Section — Not Relevant to You (As Plus)
The policy clarifies:
Ads personalization applies to Free and Go users.
As a Plus user:
You are not receiving targeted advertising.
They state they do not “sell” personal data.
They do not engage in cross-context behavioral advertising.
This is a strong positive.
🔴 7. Contact Upload Feature (High Risk If Used)
If you connect device contacts:
They upload your address book
They track who signs up later
Even non-users’ data is uploaded.
If you are privacy-sensitive:
Do not connect contacts.
🟡 8. Atlas Browser Data Collection
If you use the Atlas browser:
They collect browsing data according to your settings.
This expands data collection beyond ChatGPT usage into web browsing.
If you care about compartmentalization:
Keep ChatGPT and general browsing separate.
🟢 9. Strong Controls Actually Exist (This Is Good)
You can:
Disable training usage
Delete chats
Use Temporary Chat (auto delete within 30 days)
Delete memories
Export your data
Delete your entire account
This is stronger than many SaaS tools.
🔴 10. The Subtle Risk: Content Monitoring
They state:
To prevent fraud, illegal activity… including by monitoring any Content submitted…
This implies:
Automated scanning
Possibly human review
Safety classification
If you discuss:
Sensitive political topics
High-risk research
Business vulnerabilities
Assume logs may be evaluated by safety systems.
🟢 11. They Explicitly Say They Don’t Sell Data
Important clause:
We don’t “sell” Personal Data or “share” for cross-context behavioral advertising.
This aligns them more with enterprise SaaS than ad-tech platforms.
That’s meaningful.
🎯 What You, Specifically, Should Be Concerned About
Given you’re:
A Plus user
In the U.S.
Likely discussing business, AI strategy, and proprietary ideas
Your top concerns should be:
1️⃣ Model Training Usage
Make sure it’s disabled if discussing confidential work.
2️⃣ De-identified Retention
Once used for training, removal isn’t guaranteed.
3️⃣ Content Monitoring
Sensitive business strategy should not be treated casually.
4️⃣ Vendor Processing
Cloud + subprocessors = broader surface area.
🔐 Best Practice Setup for You
If you want maximum privacy while staying Plus:
Disable model training use.
Use Temporary Chat for highly sensitive topics.
Do not connect contacts.
Avoid uploading proprietary documents unless necessary.
Periodically delete chat history.
Keep separate sessions for personal vs professional topics.
🧠 Critical Evaluation
Is this policy unusually invasive?
No — it’s comparable to Microsoft, Google, or Adobe SaaS policies.
Is it minimal?
No — it is broad and flexible by design.
Does it allow OpenAI significant discretion?
Yes — especially around retention, safety review, and legal disclosure.
Is it aligned with enterprise SaaS norms?
Yes.
Bottom Line
As a Plus user, the only truly meaningful risk lever is whether you allow your content to be used for model training. Everything else is standard cloud SaaS exposure.
Your concern level should be:
Low for ads/sale of data
Moderate for retention and monitoring
High if you fail to disable model improvement
My analysis is highly sound and aligned with standard SaaS privacy interpretation practices.
One important societal byproduct of AI advancement is what it will do to the hierarchy and org chart of companies.
Today’s average company has an org chart that is a representation of how best they are able to use a combination of humans and software to get work done. Some work is done internally. Some work is outsourced externally. And then various software packages and solutions + many humans + lots of meetings are tasked with coordinating all of it.
This has historically represented a company’s best attempt at executing on their goals and delivering results to their stakeholders.
But what happens when a company evolves from many layers of human hierarchy doing work to, instead, a few humans overseeing an ensemble of models and agents doing work for it?
It may sound far fetched but it’s probably not as far away as some people think.
I suspect in the future, it won’t matter so much where the work is happening (i.e., internally vs externally). You will be more than willing to pay for work as a service if it can be done continuously and flawlessly (i.e., via models and agents).
So in order for a company to thrive in the future, it needs a way to SUPERVISE the orchestration and execution of humans, models and agents versus the tactical COORDINATION that we do today.
This leads to an entire re-write of the enterprise software stack over time.
I think this is a very big deal.
And by implication, whoever is able to build this new stack and sell it to others will be a lifeline to companies trying to “cross the chasm” of AI.
We built Software Factory as a first step in this journey. To automate the SDLC, we must start with PRDs. With thoughtful PRDs, we start to develop a detailed sense of the business logic that runs a company.
Our guess is that, at the limit, an entire business should be able to be summarized by a detailed set of PRDs.
If this turns out to be true, then the knowledge graph and intermediate representations that this generates are gold.
If you’re a company that wants to start experimenting on this journey, we’d love for you to try Software Factory. Have your team use it to first make PRDs and then bring it through the rest of our Factory and make production software. Over time, you get accountability, governance and execution you can understand. It also takes any silos of knowledge trapped within parts of your organization and documents them (in a PRD) in English that anyone can read and understand. You can’t say that about code.
If any of this sounds interesting, you can join the waitlist for Software Factory below. We are rolling a few thousand folks at a time into a Beta starting in the next few months.
https://t.co/0RJT5ixp7q
Excited to share two advances that bring us closer to real-world impact in healthcare AI:
SDBench introduces a new benchmark that transforms 304 NEJM cases into interactive diagnostic simulations. AI must ask questions, order tests, and weigh costs, mirroring the complexity of real clinical decision-making.
MAI-DxO is a model-agnostic orchestrator that simulates a panel of virtual physicians. It achieves 85.5% diagnostic accuracy—four times that of experienced doctors—while cutting diagnostic costs.
Together, these advances offer a blueprint for how AI can help deliver precision and efficiency in healthcare, and we're looking forward to working with healthcare partners and the entire ecosystem on these advances making a difference.
https://t.co/XHpb1gWYxN
Traditionally, scaling up an enterprise meant extra budgets for headcount, software licenses, and hardware. AI agents flip that equation: they automate routine work so the organization can expand output with almost no incremental cost, unlocking radical, nonlinear growth.
But achieving this isn’t a "turn-it-on" moment. It demands that every function rethink its role, pair closely with IT, and intentionally redesign work:
☑️ Start small, learn fast, then scale: Begin with contained pilots, measure quality and impact, adjust, and scale. Recycle insights into the next use case.
☑️ Elevate IT from "back office/support" to strategic partner: Your architects and engineers must sit at the same table as business leaders, shaping how goals turn into scalable digital capabilities.
☑️ Simplify before you automate: Don't pour AI agents or automation on top of a messy workflow. Streamline steps, standardize data, and kill legacy exceptions. Then let an AI agent handle the automation.
☑️ Walk the value stream together: Think cross-functionally. HR, Finance, Ops, Sales, and Customer Support map together the friction points that slow growth and hinder scale, then rank them by impact and feasibility.
Huge: The study rehabilitates the thinking ability of large models; Apple's study is wrong.
The new follow-up study to Apple's paper “Illusion of Thinking” shows that the alleged collapse of model performance in complex tasks was not due to a lack of thinking ability, but to testing errors—more specifically, overly restrictive token limits and problematic formatting.
The original paper claimed that language models such as GPT-4 would fail completely at increasingly complex thinking tasks such as the “Tower of Hanoi” or the “River Crossing” problem. However, the follow-up study now proves that as soon as the models are allowed to give compressed answers instead of step-by-step descriptions (e.g., as a Lua function), they reliably solve even difficult tasks – in some cases with impressive efficiency.
The alleged “breakdown” did not occur because the model did not understand the problem, but because the format consumed too many tokens and the evaluation metrics registered hard errors when the output was truncated or the tasks were mathematically unsolvable. With better-suited formats and fair evaluations, the effect disappears completely.
Conclusion: The study rehabilitates the thinking ability of large models. It wasn't their logic that was the problem, but the way we tested them.
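To make the output-format point concrete for Tower of Hanoi: the explicit move list grows as 2^n - 1, while a function that generates the moves stays the same few lines for any n. The sketch below is an illustration in Python (the post mentions Lua), and it is not code from either study; the function names are my own.

```python
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Yield every move (disk, from_peg, to_peg) solving an n-disk Tower of Hanoi."""
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way, onto the spare peg.
    yield from hanoi_moves(n - 1, src, dst, aux)
    # Move the largest disk to its destination.
    yield (n, src, dst)
    # Move the n-1 smaller disks on top of it.
    yield from hanoi_moves(n - 1, aux, src, dst)


def count_moves(n):
    """Count how many moves a step-by-step answer would have to spell out."""
    return sum(1 for _ in hanoi_moves(n))


for n in (3, 10, 15):
    # The enumerated answer needs 2**n - 1 moves; the function above never grows.
    print(n, count_moves(n), 2**n - 1)
# prints:
# 3 7 7
# 10 1023 1023
# 15 32767 32767
```

A model asked to list all 32,767 moves for 15 disks can run out of output budget even if it knows the procedure, whereas returning the generator is a few lines regardless of n.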
AI PROMPTING → AI VERIFYING
AI prompting scales, because prompting is just typing.
But AI verifying doesn’t scale, because verifying AI output involves much more than just typing.
Sometimes you can verify by eye, which is why AI is great for frontend, images, and video. But for anything subtle, you need to read the code or text deeply — and that means knowing the topic well enough to correct the AI.
Researchers are well aware of this, which is why there’s so much work on evals and hallucination.
However, the concept of verification as the bottleneck for AI users is under-discussed. Yes, you can try formal verification, or critic models where one AI checks another, or other techniques. But to even be aware of the issue as a first class problem is half the battle.
For users: AI verifying is as important as AI prompting.
@OpenAI We are at a point where traditional devices have inherent limitations for working with AI. It’s time to rethink device paradigms. Best of luck.