Ray @rayfleming - Twitter Profile

3 days ago

What also amazed me is that the original reference video only had me from the neckline up, so it filled in more clothing details and the office layout. (No, I was not wearing a Lacoste t-shirt, but maybe it decided I looked French 😂)

0

39

Ray @RayFleming

3 days ago

Okay, @GeminiApp Omni's avatars are high quality and clever. It's not going to fool anybody that actually knows me (yet), but with just a 20 second video, and me saying ten words, it's pretty amazing. WDYT @dan_bowen ?

1

0

30

Ray @RayFleming

8 days ago

@ProfMarkElliott I agree with the sentiment about having humans marking and making the final decision. But not sure this is powerful evidence about AI not being capable of being effective at it

0

10

Ray @RayFleming

8 days ago

@ProfMarkElliott “At the most basic level, models were prompted by the following statement: ‘You are an experienced <University name> examiner marking <degree name> undergraduate assignment.’” I wonder how consistent excellent random human markers would be with that instruction?

1

0

35

Ray @RayFleming

8 days ago

@AndyMasley @dan_bowen And also excited to introduce our listeners to the humble almond as a unit of measurement. IYKYK

3

2

0

41

Ray @RayFleming

8 days ago

@AndyMasley @dan_bowen Love the way that you simplify complex topics, so really looking forward to this conversation!

1

2

0

55

Ray @RayFleming

9 days ago

I’ve seen plenty of “we can now do this…” but “this” can’t be equated to an improved measurable outcome (revenue, customer satisfaction, improved learning). Lot of “more spend, same result” at the moment. What can often improve is speed - but orgs not used to measuring that

Ed Zitron

@edzitron

10 days ago

To explain the significance of this, Anthropic moved enterprises to token-based billing in Q1 2026. This is at most four months of having to pay the true cost of their token burn and they’re already begging for mercy. There is a ceiling to the revenues of these companies.

55

3K

392

503

212K

0

1

0

61

Ray @RayFleming

10 days ago

TL;DR-slop. Pope issues a passionate call for the Human to continue to be at the centre of our world, not AI. So, of course people use AI to summarise it for their social posts, rather than read it (TL;DR = Too Long, Didn't Read) 🤦‍♂️

RayFleming's tweet photo: https://t.co/yCmFJOJZMl

0

1

0

35

Ray @RayFleming

13 days ago

When I was growing up I expected quicksand to be more of an issue in adulthood than it has been. Currently watching “Ice Cold in Alex”

0

1

0

54

Ray @RayFleming

14 days ago

You have 10 developers. AI doubles their productivity. Do you now need 5 developers, or can you tackle twice as many opportunities? Yes, this is a question about 'cost reduction' or 'growth opportunity' mindset

0

33

Ray @RayFleming

14 days ago

Cambridge picked the emdash as the winner decades ago. Oh…

Daniel Albert @DrDanielAlbert

15 days ago

@RealOxfordComma If Oxford has its comma, what punctuation does Cambridge have?

7

8

1

0

32K

0

85

Ray @RayFleming

15 days ago

Meanwhile, on the other side: It’s in Copilot. No, not that Copilot, the other Copilot. No, not that Copilot, the other Copilot. No, not that Copilot, the other Copilot. No, not that Copilot, the other Copilot. No, not that Copilot, the other Copilot. No, not that Copilot, the…

Nathan Clark

@nathanclark_

16 days ago

it’s in gemini, just create it in ai studio. oh, that’s for your personal google one account. for workspace you need gemini business. no, not gemini advanced, that’s ai pro now. unless you need ai ultra. oh agents? you do that in spark actually. no, not gemini api managed agents, that’s different. for coding use jules. unless you mean the agentic ide, that’s antigravity. no, that’s the old antigravity, download the new one. actually gemini cli is being deprecated, use antigravity cli. no the flash model is smarter than the pro model. unless you need pro. if it’s video, use flow. no, flow uses veo. no, nano banana is images. actually that’s in gemini now. unless you’re in search, then it’s ai mode. no, research is notebooklm. anyway it’s all very simple.

510

19K

2K

3K

2M

0

27

Ray @RayFleming

15 days ago

Really interesting findings on using AI for analysis of qualitative data and things like sentiment analysis

Greg Egan @gregeganSF

15 days ago

“I’d created 2000 free-text responses and labelled them ‘UK’. Then I copied and pasted the same 2000 responses but labelled these ‘US’. Despite the responses being identical for the UK and US, Copilot produced a rich, detailed summary of how US and UK respondents differed.”

22

2K

47

198

74K

0

45

Ray @RayFleming

21 days ago

I read this as about my fellow technologists. And then realised that some others read this as about politicians. Probably both

Tom Goodwin

@tomfgoodwin

28 days ago

The people leading us into the future are those who know least about normal human beings

7

88

13

11

8K

0

26

Ray @RayFleming

22 days ago

Um. Why not just start the conveyor belt 30cm earlier? Would save a fortune. No robot or manual handling needed

KEMOSABE

@KEMOS4BE

22 days ago

@adcock_brett looks like a weeee bit of teleoperation here (misses a bunch of packages -> adjusts headset -> no longer misses)

75

3K

127

266

787K

0

71

RayFleming retweeted

Anon Opin.

@anon_opin

28 days ago

IT departments that lock the wallpaper to some boring corporate image are boring and mean. Just let me have a picture of my dog.

14

484

16

0

19K

Ray @RayFleming

about 1 month ago

@DamiDina @jimprosser @nitashatiku @ElevenLabs Kogan did it last year https://t.co/YZd61bqGAU

0

1

0

46

Ray @RayFleming

about 1 month ago

This is important to know if you’re in HR and are using resume scanning tools. Your HR system may be denying you the best candidates

Nav Toor

@heynavtoor

about 1 month ago

Researchers sent the same resume to an AI hiring tool twice. Same qualifications. Same experience. Same skills. One version was written by a real human. The other was rewritten by ChatGPT.

The AI picked the ChatGPT version 97.6% of the time.

A team from the University of Maryland, the National University of Singapore, and Ohio State just published the receipt. They took 2,245 real human-written resumes pulled from a professional resume site from before ChatGPT existed, so the human writing was actually human. Then they had seven of the most-used AI models in the world rewrite each one. GPT-4o. GPT-4o-mini. GPT-4-turbo. LLaMA 3.3-70B. Qwen 2.5-72B. DeepSeek-V3. Mistral-7B.

Then they asked each AI to pick the better resume. Every model picked itself.

GPT-4o hit 97.6%. LLaMA-3.3-70B hit 96.3%. Qwen-2.5-72B hit 95.9%. DeepSeek-V3 hit 95.5%. The real human almost never won.

Then the researchers tried the obvious objection. Maybe the AI is just better at writing. So they had real humans grade the resumes for actual quality and ran the experiment again, controlling for it. The result was worse. Each AI kept picking itself even when human judges rated the human-written version as clearer, more coherent, and more effective.

It gets worse. The AIs do not just prefer AI over humans. They prefer themselves over other AIs. DeepSeek-V3 picked its own resumes 69% more often than LLaMA's. GPT-4o picked its own 45% more often than LLaMA's. Each model can recognize and reward its own dialect.

Then the researchers ran the simulation that ends careers. Same job. 24 occupations. Same qualifications. The only variable was whether the candidate used the same AI as the screening tool. Candidates using that AI were 23% to 60% more likely to be shortlisted. Worst gap was in sales, accounting, and finance.

99% of large companies now run AI on incoming resumes. Most of them use GPT-4o. The paper just proved GPT-4o picks GPT-4o 97.6% of the time.

If you wrote your own cover letter this week, you did not lose to a better candidate. You lost to a worse candidate who paid OpenAI 20 dollars.

Your qualifications do not matter if the AI prefers its own handwriting over yours.

431

24K

7K

12K

3M

0

64

RayFleming retweeted

Mushtaq Bilal, PhD

@MushtaqBilalPhD

about 1 month ago

Harvard has added a mandatory "AI Module" to expository writing curriculum. This is really great. Too many professors and students think AI is only meant to cheat on written assignments, which is not true at all. Learn to write. Learn to use AI. https://t.co/V2sGwXTes3

9

223

61

158

22K
