What also amazed me is that the original reference video only had me from the neckline up, so it filled in more clothing details and an office layout. (No, I was not wearing a Lacoste t-shirt, but maybe it decided I looked French 😂)
Okay, @GeminiApp Omni's avatars are high quality and clever. They're not going to fool anybody who actually knows me (yet), but with just a 20-second video and me saying ten words, it's pretty amazing. WDYT @dan_bowen ?
@ProfMarkElliott I agree with the sentiment about having humans marking and making the final decision. But I'm not sure this is powerful evidence that AI can't be effective at it.
@ProfMarkElliott “At the most basic level, models were prompted by the following statement: “You are an experienced <University name> examiner marking <degree name> undergraduate assignment.””
I wonder how consistent excellent random human markers would be with that instruction?
I’ve seen plenty of “we can now do this…” but “this” can’t be equated to an improved measurable outcome (revenue, customer satisfaction, improved learning). A lot of “more spend, same result” at the moment.
What can often improve is speed, but orgs aren't used to measuring that.
To explain the significance of this, Anthropic moved enterprises to token-based billing in Q1 2026. This is at most four months of having to pay the true cost of their token burn and they’re already begging for mercy. There is a ceiling to the revenues of these companies.
TL;DR-slop. Pope issues a passionate call for the Human to continue to be at the centre of our world, not AI. So, of course people use AI to summarise it for their social posts, rather than read it (TL;DR = Too Long, Didn't Read) 🤦‍♂️
You have 10 developers. AI doubles their productivity. Do you now need 5 developers, or can you tackle twice as many opportunities?
Yes, this is a question of a 'cost reduction' versus a 'growth opportunity' mindset.
Meanwhile, on the other side:
It’s in Copilot. No, not that Copilot, the other Copilot. No, not that Copilot, the other Copilot. No, not that Copilot, the other Copilot. No, not that Copilot, the other Copilot. No, not that Copilot, the other Copilot. No, not that Copilot, the…
it’s in gemini, just create it in ai studio. oh, that’s for your personal google one account. for workspace you need gemini business. no, not gemini advanced, that’s ai pro now. unless you need ai ultra. oh agents? you do that in spark actually. no, not gemini api managed agents, that’s different. for coding use jules. unless you mean the agentic ide, that’s antigravity. no, that’s the old antigravity, download the new one. actually gemini cli is being deprecated, use antigravity cli. no the flash model is smarter than the pro model. unless you need pro. if it’s video, use flow. no, flow uses veo. no, nano banana is images. actually that’s in gemini now. unless you’re in search, then it’s ai mode. no, research is notebooklm. anyway it’s all very simple.
“I’d created 2000 free-text responses and labelled them ‘UK’. Then I copied and pasted the same 2000 responses but labelled these ‘US’.
Despite the responses being identical for the UK and US, Copilot produced a rich, detailed summary of how US and UK respondents differed.”
Researchers sent the same resume to an AI hiring tool twice. Same qualifications. Same experience. Same skills. One version was written by a real human. The other was rewritten by ChatGPT.
The AI picked the ChatGPT version 97.6% of the time.
A team from the University of Maryland, the National University of Singapore, and Ohio State just published the receipt. They took 2,245 real human-written resumes pulled from a professional resume site from before ChatGPT existed, so the human writing was actually human. Then they had seven of the most-used AI models in the world rewrite each one. GPT-4o. GPT-4o-mini. GPT-4-turbo. LLaMA 3.3-70B. Qwen 2.5-72B. DeepSeek-V3. Mistral-7B.
Then they asked each AI to pick the better resume. Every model picked itself.
GPT-4o hit 97.6%. LLaMA-3.3-70B hit 96.3%. Qwen-2.5-72B hit 95.9%. DeepSeek-V3 hit 95.5%. The real human almost never won.
Then the researchers tried the obvious objection. Maybe the AI is just better at writing. So they had real humans grade the resumes for actual quality and ran the experiment again, controlling for it. The result was worse. Each AI kept picking itself even when human judges rated the human-written version as clearer, more coherent, and more effective.
It gets worse. The AIs do not just prefer AI over humans. They prefer themselves over other AIs. DeepSeek-V3 picked its own resumes 69% more often than LLaMA's. GPT-4o picked its own 45% more often than LLaMA's. Each model can recognize and reward its own dialect.
Then the researchers ran the simulation that ends careers. Same job. 24 occupations. Same qualifications. The only variable was whether the candidate used the same AI as the screening tool. Candidates using that AI were 23% to 60% more likely to be shortlisted. The worst gap was in sales, accounting, and finance.
99% of large companies now run AI on incoming resumes. Most of them use GPT-4o. The paper just proved GPT-4o picks GPT-4o 97.6% of the time.
If you wrote your own cover letter this week, you did not lose to a better candidate. You lost to a worse candidate who paid OpenAI 20 dollars.
Your qualifications do not matter if the AI prefers its own handwriting over yours.
Harvard has added a mandatory "AI Module" to its expository writing curriculum.
This is really great.
Too many professors and students think AI is only meant to cheat on written assignments, which is not true at all.
Learn to write.
Learn to use AI.
https://t.co/V2sGwXTes3