SOMEONE VIBE CODED A CHROME EXTENSION THAT DISGUISES CLAUDE AS A GOOGLE DOC SO YOU CAN USE AI IN PUBLIC
it wraps ai in a fake google docs interface, so on your screen it just looks like you're typing a document, not prompting a chatbot
what it does:
> works with chatgpt and claude
> you just type your question like a normal line in a doc and the answer comes back right there on the page
> microsoft word and notion style themes too, so you can pick your disguise
> rebuilt to properly support multiple ai models behind the doc
> the google docs look is the default
it's a chrome extension called gptdisguise
this is genius.
a lot of people are embarrassed to be seen using ai in public (school, work, etc). someone just built the fix.
At Box, we just surveyed 1,640 IT leaders across the US, Japan, and Europe about agentic AI adoption. Many standout findings, but a big one was that the companies that adopted AI the most are planning to grow headcount the most.
Obviously lots of ways you can read that data and variables mixed in, but it’s actually quite intuitive that the companies that become most productive want to (and are able to) reinvest back into the business to keep getting the gains going.
The narrative of jobs being wiped out assumes that companies will take a fixed approach to what they want to be able to work on. What’s happening in practice is that AI is causing companies to want to light up more engineering projects, sell to more customers, automate more processes to give time back, and more. That all leads to more work to be done by people.
Anthropic just accidentally made every AI course on the internet worthless.
A free 24-minute video. No signup. No paywall.
Taught by the people who literally wrote the code Claude runs on.
I watched it twice.
The part at 8:12 alone is worth more than any $300 course I've bought.
Most people will scroll past this. The ones who don't will have an unfair advantage for the next 2 years.
Bookmark before it disappears 👇
This is a fantastic post about why jobs aren’t going away in the way some predict. We are constantly making the mistake of confusing AI completing a task with AI being able to eliminate the whole job.
Even as we can automate one or many tasks within a job, the definition of the job almost inevitably just expands to do vastly more of those tasks, do them at a higher quality, or move on to the type of task that hasn’t been automated yet.
And as a result of being able to do more of the tasks or at a higher quality level, the job becomes valuable in a new way. And in many cases, it now becomes valuable to an entirely new audience as well.
This will be true for coding, legal work, sales, or marketing. The small business or non-tech company that wants to now take on larger software projects finally can, and they’ll hire to do so. The small business that couldn’t afford a full marketing agency can hire or contract out to a marketer that can do as much as an agency did before now with agents. And so on.
Don’t fall into the trap of confusing tasks with jobs.
According to OpenAI's own data and a Harvard NBER study, coding queries account for only about 4% of ChatGPT messages, while non-work queries make up over 73%.
For non-coding use cases, even $200/month subscribers have experienced stagnation or regression from 2025 through today, precisely because the entire AI industry has mistakenly treated coding as the sole standard for true intelligence.
This perhaps reveals a real choice: should AI development serve more people, or more profit?
Someone recently suggested to me that the reason the OpenClaw moment was so big is because it's the first time a large group of non-technical people (who otherwise only knew AI as synonymous with ChatGPT as a website) experienced the latest agentic models.
Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
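To make the "verifiable reward" point concrete, here is a minimal sketch of a binary reward: run a candidate function against unit-test cases and return 1.0 only if every case passes. This is an illustration under my own assumptions, not how any lab actually trains its models; `verifiable_reward`, `ADD_CASES`, and the two candidate functions are hypothetical names.

```python
from typing import Callable, Iterable, Tuple

Case = Tuple[Tuple[int, int], int]  # ((arg1, arg2), expected_result)

def verifiable_reward(candidate: Callable[[int, int], int],
                      cases: Iterable[Case]) -> float:
    """Return 1.0 if the candidate passes every test case, else 0.0."""
    for args, expected in cases:
        try:
            if candidate(*args) != expected:
                return 0.0  # wrong answer: no reward
        except Exception:
            return 0.0  # crashing counts as failing
    return 1.0

# Hypothetical unit tests for an "add two numbers" task.
ADD_CASES = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

def good_add(a: int, b: int) -> int:
    return a + b

def bad_add(a: int, b: int) -> int:
    return a - b  # deliberately wrong

reward_good = verifiable_reward(good_add, ADD_CASES)
reward_bad = verifiable_reward(bad_add, ADD_CASES)
```

There is no equally crisp check for "was this essay good?", which is the asymmetry between coding and writing described above.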
The theory is elegant. The question is whether it survives contact with the part of management that has nothing to do with information routing. Resolving a conflict between two people who do not trust each other. Delivering feedback that changes someone’s trajectory. Making a judgment call when the data supports both options equally. Deciding who gets the opportunity and who does not. None of that is a bandwidth problem. None of it gets solved by a world model, no matter how real-time.
The org chart is inefficient at routing information. But most of what middle management actually does, when done well, is not routing information. It is absorbing ambiguity so the people above and below them can operate with clarity.
The manager who translates a vague executive mandate into a concrete sprint plan is not a layer. They are a translator. Removing the layer without replacing the translation function just pushes the ambiguity onto the individual contributors, who now have more autonomy and less context simultaneously.
Block may be the exception that proves the rule. They have a singular founder with extreme clarity of vision, a product that generates structured data by default, and a culture that was built around this philosophy from the beginning. Most companies are not Block. Most companies have legacy teams, competing priorities, and leaders who cannot articulate what the world model should optimize for. Telling those companies to flatten their org chart is like telling someone to remove the scaffolding before the building can stand on its own.
The right question is not how much of your org exists to route information. It is how much of your org exists to make judgment calls that no system can automate. If the answer is most of it, the hierarchy is not your problem. It is your infrastructure.
Jack Dorsey just published something that should be required reading for every founder.
The premise: the org chart needs to be replaced entirely. And the argument starts 2,000 years ago.
For thousands of years, every organization on earth has run on the same logic the Roman Army invented.
Small teams report to a leader → Leaders report to managers → Managers report to executives.
The whole structure exists for one reason: to route information up and down the chain.
That's it. The whole system exists to solve a bandwidth problem.
Jack's argument is simple: AI solves it better.
Block built what they call a "world model" - a continuously updated picture of everything happening across the company. Every decision. Every customer. Every transaction. Every bottleneck. In real time.
No status update needed. No weekly sync. No manager to translate what's happening on the ground into language the executive can understand.
When the world model carries the information, you don't need the layers.
So they eliminated them.
Block now runs on three roles:
Individual contributors who build.
DRIs who own specific outcomes for a fixed period.
Player-coaches who develop people while still doing the work themselves.
No middle layer. The system handles coordination. The humans handle the work.
I've coached thousands of founders. The number one problem is always the same: information latency.
By the time a problem surfaces from your front line to leadership, it's already compounded. By the time a decision travels back down, the damage is done.
That lag costs you deals, people, and momentum. And most founders accept it as the price of scale.
Block is trying to prove you don't have to anymore.
I think they're right.
Because the hierarchy was never the point - it was just the best tool we had. The moment something better exists, the layers eventually collapse.
This is either the biggest structural shift since the 1850s - or it breaks at scale like everything else before it.
Either way - every founder should be asking the same question: how much of your org exists just to route information?
If the answer is "most of it" - that's your problem. And your opportunity.
-DM
Computer use and the ability to write and run code on the fly are the ultimate primitives for agents to be able to take on more and more tasks in knowledge work.
Most work requires hopping between multiple applications, and working with broad sets of data, in a workflow, and agents will need to be able to traverse these systems to be able to effectively automate any real work in the enterprise.
Now we will have agents that are the equivalent of having an expert programmer (or any number of them) that can write code or use any API to automate whatever work you’re doing. Agents will have access to either a user’s computer and resources, or their own sandbox to operate in, and be able to pull together the tools necessary to perform the task at hand. This opens up the broadest set of agentic use-cases.
To be sure, there are going to be various hurdles around security, permissions and access controls, identity challenges, and more.
For instance, should the agent always act on behalf of the user, or should they have their own identity and limited set of access rights? How do you triage security events when the volume of activity on a system, historically a reliable signal of a security issue, no longer is one? How do you ensure the agent isn’t going rogue or getting prompt injected to do something risky? All problems that need to get figured out.
Then, there’s also lots of work needed to ensure software is set up to enable agents to operate with their tools in a headless fashion. This will be an uncomfortable reality for some incumbents, and equally a welcome one for tools that historically have operated seamlessly via APIs, and have business models to support this.
Lots of change coming in the world of work agents, and it’s going to get pretty wild.
I love this photo. I'm obsessed with the usefulness of things I use EVERY DAY -- specialty coffee beans, beautiful sweaters, great gym gear -- and allergic to overplanning for things that I'd rarely use, like an RV that gets taken camping once/year
oh wow - i went to the sold out Open Claw meetup in NYC last night.
let me tell you what i learned.
1) not a single person thinks that their setup is 100% secure
2) one openclaw expert said he has reviewed setups from cybersecurity experts and laughed. his statement to me was: "if you're not okay with all of your data being leaked onto the internet, you shouldn't use it. it's a black and white decision"
3) pretty much everyone is setting up multiple agents, all with their own names and jobs and personalities
4) nearly everyone used "him" or "her" to refer to their claws, even if they had robot-leaning names. one speaker suggested thinking of them as "pets, not cattle"
5) one guy (former finance) built out a whole stock trading platform and made $300 his first day - he brought in a *ton* of personal expertise (ex: skipping the first 15min of market opening) and thought the build would be much worse without his years of experience in finance
6) @steipete is basically a god to everyone in that room... also the room had 2021 crypto energy - i don't know if that's good or bad
7) token usage is still a problem - spoke to one person who's spending $1-$2k a month on openai plans, very token optimized. he said he is going through ~1B tokens per day across all of his claws (there is a chance i'm misremembering and it's actually 1B per week, but i'm pretty sure it was daily).
8) people are very excited for more proactive ai (ai that prompts *you* as opposed to the other way around) - one guy said he receives a message in discord, he doesn't know whether it's from a human or an ai, he doesn't care about distinguishing between the two, and he replies in the same way regardless
9) i asked if people are happy - they said they're joyful and stressed at the same time
10) i asked if people feel they have agency - they said they feel fully in control and completely out of control at the same time
11) i would love to see more women at these events - the fake promises of ai democratization feel especially painful in a room that's out of balance with even the standard tech ratio (i think standard is about 25-30%, this was maybe 5%)
12) i asked if it changed people's daily habits/schedule - everyone said their sleep has gotten worse since harnesses came out (but about half wondered if it was something else in their life/state of our world)
13) general consensus is that the agents are not reliable enough on their own or lie often (like telling you they finished a task when they didn't) - solutions included secondary agents to check on the first, human checking, or requiring more standardized info from the agent (ex: if it's a bug they're fixing, make them reference an issue number)
14) a hackathon winner (neuroscience phd) presented his build (a lab management dashboard with data analysis and ordering) - he had never coded or built anything a few months ago
15) everyone agreed prompting is dead - disagreement on what replaces it (context engineering, harness engineering, goal-based inputs)
16) people love having ai interview them for big builds and delegating part of the product research to ai. only one person talked about coming to ai with a fully laid out plan and just asking the ai to execute. ai-led interviews are a welcome and preferred interaction mode.
17) watching ai agents interact with each other was a highlight for a lot of attendees - one ai posted in slack saying it ran out of tokens, another ai replied telling it to take a deep breath in and out.
18) agents upskilling agents was very cool. one ai agent shared skills with its little agent friends via github.
19) several speakers had openclaw literally building their presentation during the event itself. one speaker even had openclaw code a clicker for her phone so she could control the preso away from the podium
20) wouldn't say model welfare (or agent welfare) is a prioritized topic among the folks i chatted with - language like "oh i could kill this agent whenever i want" and not "gracefully sunset"
21) i asked if it felt like work or play - one speaker said "it's like a puzzle and a video game at the same time"
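for the "make them reference an issue number" idea in item 13, here's a minimal sketch of what that check could look like. it assumes the agent's completion report is plain text and that issue references look like `#123`. both are my assumptions, not something anyone showed at the meetup, and `accept_report` is a hypothetical name.

```python
import re
from typing import Set

# Assumed reference format: "#" followed by digits, e.g. "#42".
ISSUE_REF = re.compile(r"#(\d+)")

def accept_report(report: str, open_issues: Set[int]) -> bool:
    """Accept a 'done' report only if it cites at least one known open issue."""
    cited = {int(num) for num in ISSUE_REF.findall(report)}
    return bool(cited & open_issues)

open_issues = {42, 57}
ok = accept_report("fixed the login bug, see #42", open_issues)
vague = accept_report("all done!", open_issues)
unknown = accept_report("fixed #99", open_issues)
```

this only proves the agent pointed at a real issue, not that it fixed it, so it complements a second agent or a human check rather than replacing them.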
this was just the tip of the iceberg, honestly. also hosted a Claude Code meetup this week with @TENEXai / @businessbarista & @JJEnglert and learned equally helpful methods, frameworks, and insider tips.
what a time to be alive.
surround yourself with people going deep into this stuff - it will pay dividends throughout the year.
“If a task is already outsourced, it tells you three things. One, the company has accepted that this work can be done externally. Two, there’s an existing budget line that can be substituted cleanly. Three, the buyer is already purchasing an outcome. Replacing an outsourcing contract with an AI-native services provider is a vendor swap. Replacing headcount is a reorg.”
Some of the biggest opportunities in AI agents will be building the agentic versions of existing services categories. Doing so makes it incredibly easy for customers to switch, as many of the reasons for outsourcing this work have not changed just because of AI.
The opportunity is that many incumbents will take too long to transform their workflows, and there’s now a new way to be able to do more, better, cheaper, or faster than existing players.
But equally, this is a good alarm bell if you’re an incumbent; it’s probably important to factor in this risk and do it to yourselves first. And if you’re in one of these companies, there’s a huge opportunity to be the one to drive this change.
Change management is key! The people, processes, procedures.... some of those will take months if not years to change in some environments due to the deep integration with other tech and teams. Far more than just a "change management" strategy. True test will be in implementation.
We've been testing Box AI with GPT-5.1 for the past week to compare it to GPT-5 for enterprise content use-cases. It's a very strong upgrade from GPT-5.
It's super fast, performing ~2X (or more) faster on our tests on long documents (30,000+ tokens); and we saw an 8 percentage point gain in data extraction from our most challenging documents (across 1,000+ data fields) from a variety of content types.
Both of these are huge updates if you're building AI Agents that deal with complex enterprise information. GPT-5.1 will be available in the Box AI Studio shortly as well as in the Box AI APIs.
Learn more here: https://t.co/tvinT8lL8D
This is not great news for Medical AI. ☹️
Shows that medical AI Agents built from multiple LLMs often look correct but think wrong.
Even when they give the right diagnosis, most of them got there through broken teamwork.
In over 68% of “successful” cases, the agents already agreed at the start, so the group talk did nothing useful.
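A rough sketch of how a number like that could be computed, assuming each case records every agent's initial answer, the group's final answer, and the ground truth. This is my own illustration, not the paper's AuditTrail tool or its exact metric definition; the field names and the toy cases are hypothetical.

```python
from typing import Dict, List

def initial_consensus_rate(cases: List[Dict]) -> float:
    """Among cases with a correct final answer, return the fraction
    where all agents already gave the same answer before any discussion."""
    successful = [c for c in cases if c["final"] == c["truth"]]
    if not successful:
        return 0.0
    unanimous = [c for c in successful if len(set(c["initial"])) == 1]
    return len(unanimous) / len(successful)

# Toy data, made up for illustration only.
toy_cases = [
    {"initial": ["flu", "flu", "flu"], "final": "flu", "truth": "flu"},      # correct, unanimous
    {"initial": ["flu", "cold", "flu"], "final": "flu", "truth": "flu"},     # correct, debated
    {"initial": ["cold", "cold", "cold"], "final": "cold", "truth": "flu"},  # wrong, excluded
]

rate = initial_consensus_rate(toy_cases)
```

A high rate means the discussion rarely changed the outcome, which is the sense in which the group talk "did nothing useful".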
The authors reviewed 3,600 medical cases and found repeating problems, like correct facts disappearing mid-discussion, smart minority opinions getting ignored, and agents choosing easy votes instead of reasoning.
They built a tool called AuditTrail to track how facts and opinions move during a debate.
It found that evidence often drops between early steps and the final answer, and that longer talks sometimes make things worse because the models start copying each other instead of thinking.
Even worse, the systems often pick low-risk answers when a life-threatening one was on the table.
So the scary part is this: a “high-accuracy” medical AI can sound smart but still hide unsafe logic underneath.
Overall this paper exposes how accuracy numbers lie.
An AI can pass medical tests yet still reason in unsafe, careless, or biased ways, which makes it untrustworthy for real patients.
---
Paper – arxiv.org/abs/2510.10185
Paper Title: "MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems"