Jonathan Bennion

@rooftopzen

9 years of ML and DS in SF bay area | Applied Science | Distance runner (when thinking). Runs The Objective AI @

San Francisco, CA

Joined April 2008

189 Following

113 Followers

281 Posts

rooftopzen retweeted

Nav Toor

@heynavtoor

about 1 month ago

Researchers at EPFL proved your AI is lying to you.

Not sometimes. Most of the time.

They built one of the hardest hallucination tests ever made with Max Planck Institute. 950 questions. Four domains where being wrong actually hurts. Legal. Medical. Research. Coding.

Then they ran every top model on it.

The results.

GPT-5. Wrong 71.8% of the time.

Claude Opus 4.5. Wrong 60% of the time.

Gemini 3 Pro. Wrong 61.9% of the time.

DeepSeek Reasoner. Wrong 76.8% of the time.

These are the smartest AI models on Earth. The ones you trust with your career. Your health. Your money.

You think turning on web search fixes it.

It doesn't.

Claude Opus 4.5 with web search. Still wrong 30.2% of the time.

GPT-5.2 thinking with web search. Still wrong 38.2% of the time.

The internet attached. Still lying to you in 1 out of every 3 answers.

Now the part that should scare you.

Medical questions. The one place being wrong can kill you.

GPT-5 hallucinated 92.8% of the time on medical guidelines.

Claude Haiku 4.5 hallucinated 95.7% of the time.

Gemini 3 Flash hallucinated 89% of the time.

Nine out of ten medical answers from popular AI models. Wrong.

It gets worse.

The longer you talk to it, the more it lies.

Early mistakes cascade. The model starts citing its own earlier hallucinations as facts. Your third message is more wrong than your first.

The paper, in its own words: "hallucinations remain substantial even with web search."

This is what hundreds of millions of people are doing right now. Asking software that lies in the majority of its answers. About their health. About their job. About their legal case. About their code.

Most are not checking.

Most never will.

But please. Keep using ChatGPT for medical advice.

The doctors need a break.

https://t.co/dHBP5CDpTM

155

810

158K

rooftopzen retweeted

The Path Forward

@ThePathFoward25

about 2 months ago

Translation - Substitute the word "believe" with "Architecting". Lying sociopath.

rooftopzen retweeted

Big Brain AI

@realBigBrainAI

about 2 months ago

Yann LeCun (AMI Labs Founder): "The AI industry is completely LLM-pilled. Everybody is working on the same thing. They're all digging the same trench."

LeCun explains why no lab dares break from the pack: "They are stealing each other's engineers. So they can't afford to do something different because if they start going on a tangent, they're going to fall behind the other guys. And so they're all doing the same thing."

This groupthink is exactly what drove him out of Meta. "Meta also became LLM-pilled with sort of recent reshuffling. And it's fine, a strategic decision that maybe makes sense for them. It's just not what I'm interested in."

For @ylecun, the problem runs deeper than strategy. LLMs are missing something essential about how intelligence actually works: "I cannot imagine that we can build agentic systems without those systems having an ability to predict in advance what the consequences of their actions are going to be. The way we act in the world is that we can predict the consequences of our actions and that's what allows us to plan."

His broader critique is that the industry has mistaken fluency for intelligence. Language turned out to be the easy part. The hard part is the physical world. It's why we still don't have domestic robots or level-five self-driving cars, even though today's systems can pass the bar exam and write code.

120

394

942

288K

rooftopzen retweeted

Scott Stevenson

@scottastevenson

2 months ago

It’s time to expose a huge scam in AI startups: Contracted ARR

The reason many AI startups are crushing revenue records is because they are using a dishonest metric

The biggest funds in the world are supporting this and misleading journalists for PR coverage.

The setup: Company signs 3-year enterprise deals. Year 1 is discounted (say $1M), Year 2 steps up ($2M), Year 3 is full price ($3M).

They report $3M as “ARR” — even though they’re only collecting $1M right now.

The worst part: The customer has an opt-out option at 12 months! It’s not actually a 3 year contract.

In the chart below, by Q5 the company is trumpeting ~$100M “ARR” to press, while actual cash-generating, in-effect ARR is ~$35M. That’s ~3x inflation.

On top of this, enterprise AI companies are bundling full-time “forward deployed engineers” into deals massively reducing margins, sometimes producing Year 1 negative margins.

At some point customers are going to start triggering their opt-out clauses or aggressively negotiating down Year 3 pricing.

And a wave of enterprise AI companies may collapse.
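
The gap between the two metrics described in the post can be worked through in a few lines. This is a minimal sketch that uses only the post's illustrative figures ($1M / $2M / $3M step-up pricing, and ~$100M reported against ~$35M in effect); the function names `contracted_arr` and `in_effect_arr` are hypothetical labels chosen here, not standard accounting terms.

```python
# Illustrative only: yearly prices (in $M) for one 3-year deal, from the post's example.
yearly_price_musd = [1.0, 2.0, 3.0]


def contracted_arr(prices):
    """'Contracted ARR' as the post describes it: the final, full-price year."""
    return prices[-1]


def in_effect_arr(prices, current_year=1):
    """ARR actually being billed in the given contract year (1-indexed)."""
    return prices[current_year - 1]


reported = contracted_arr(yearly_price_musd)
billed = in_effect_arr(yearly_price_musd, current_year=1)
inflation = reported / billed

# The post's portfolio-level figures: ~$100M reported vs ~$35M in effect.
portfolio_inflation = 100 / 35

print(f"single deal: reported ${reported:.0f}M vs billed ${billed:.0f}M = {inflation:.1f}x")
print(f"portfolio:   {portfolio_inflation:.2f}x")
```

On the post's numbers, the single deal is overstated 3x in Year 1, and the portfolio figure of roughly 2.86x is what the post rounds to "~3x".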

166

142

797

958K

rooftopzen retweeted

Han

@HanchungLee

2 months ago

at this rate, ai engineers and influencers will discover data engineering by end of 2026, reinventing medallion architecture, analytics, etl, and finally learn about ai and machine learning.

15 years after the field has matured. https://t.co/8A1j71fIT9

173

11K

rooftopzen retweeted

snwy

@snwy_me

3 months ago

what the actual fuck is he talking about

284

11K

242

711

612K

Jonathan Bennion @rooftopzen

3 months ago

What lying looks like at collegiate level, saving for future entertainment

Rohan Paul

@rohanpaul_ai

3 months ago

UC Berkeley professor Stuart Russell: An AI with IQ 150 could upgrade itself to 170, then 250, very soon leaving humans way behind. A recent Meta paper also warned self-improving AI is promising but risky, as removing humans can worsen misalignment.

260

100

25K

Jonathan Bennion @rooftopzen

3 months ago

@garrytan

rooftopzen retweeted

Gary Marcus

@GaryMarcus

3 months ago

Don’t use a calculator until you can do the math on your own. Don’t vibe code until you can code – and debug and maintain code – on your own. It’s that simple.

573

24K

rooftopzen retweeted

François Chollet

@fchollet

3 months ago

This is more evidence that current frontier models remain completely reliant on content-level memorization, as opposed to higher-level generalizable knowledge (such as metalearning knowledge, problem-solving strategies...)

193

318

313K

Jonathan Bennion @rooftopzen

3 months ago

Sad effect of marketing anthropomorphism

Aakash Gupta

@aakashgupta

3 months ago

50% of all relationship advice on Reddit is “leave.” 15 years of data, 52 million comments, and the trend line only goes one direction.

A researcher filtered r/relationship_advice down to 1,166,592 quality comments and tracked what people actually recommend. In 2010, “End Relationship” sat around 30%. By 2025, it’s approaching 50%.

“Communicate” dropped from 22% to 14%. “Compromise” collapsed from 7% to 3%. “Give Space” fell from 25% to 13%. Every category that requires patience lost ground every single year.

The one category growing faster than “leave” is “Seek Therapy,” which went from 1% to 6%. The subreddit is slowly learning to say “this is above my pay grade.”

Train a model on this dataset and it would absolutely tell people to break up. The training data is 50% “leave” and climbing. The model wouldn’t be broken. It would be accurately reflecting what 52 million commenters actually believe about your relationship.

A 50% prior that you should leave, a 14% prior that you should talk about it, and a 6% prior that you need a professional. That’s not LLM psychosis. That’s the median human opinion on your relationship, backed by the largest advice dataset ever assembled.

491

16K

Jonathan Bennion @rooftopzen

4 months ago

@Global_Mil_Info @sentdefender The map is not southern Lebanon

271

Jonathan Bennion @rooftopzen

6 months ago

You’re 2 years behind, @sriramk - saw you’d deleted your AGI posts from earlier this year but I took screenshots for a future post

Sriram Krishnan

@sriramk

7 months ago

I've come to have mixed feelings on "AGI" and "ASI" as terms to convey where this technology is headed.

On one hand, AGI has played a key role in motivating talented people who obsessed over this problem (@demishassabis, @ShaneLegg, @ilyasut, many others) and enabled the flow of capital that made many of our advancements possible.

On the other hand, "AGI" or "ASI" or any variant currently actively harms discourse around how the most interesting technology of our lifetime gets built and used.

a) it's not an accurate description of where we're headed, at least how most people interpret the term. Look at the recent conversations with @karpathy and @dwarkesh_sp and you instantly see how far we are from anything resembling true human intelligence. No proof of takeoff, timelines keep expanding. We are building very useful technology which could transform how businesses work or how tech is built but has nothing to do with "general intelligence".

b) it's become so overloaded that I've found almost no two people define it the same way or agree on timelines (whether we've already reached it or are 30 years away). It's why every blog post on AGI has to conjure up its own local definition to proceed. I mostly subscribe to @random_walker's view on why "intelligence" is used in an almost incoherent way always.

c) most importantly, it invokes fear—connected to historical usage in sci-fi and philosophy (think 2001, Her, anything invoking the singularity) that has nothing to do with the tech tree we're actually on. Makes every AI discussion incredibly easy to anthropomorphize and detour into hypotheticals.

We may need a different term for what we're trying to build at the end of all this and what it means for business and society.

374

175

77K

rooftopzen retweeted

Shanaka Anslem Perera ⚡

@shanaka86

7 months ago

BREAKING: The $610 Billion AI Ponzi Scheme Just Collapsed

Last night at 4pm EST, something unprecedented happened. Nvidia stock rallied 5% on earnings, then crashed into negative territory within 18 hours. Wall Street algorithms detected what humans couldn’t: the numbers don’t add up.

Here’s what they found.

Nvidia reported $33.4 billion in unpaid bills, up 89% in one year. Customers who bought chips haven’t paid for them yet. The average wait time for payment stretched from 46 days to 53 days. That extra week represents $10.4 billion that may never arrive.

Meanwhile, Nvidia stockpiled $19.8 billion in unsold chips, up 32% in three months. But management claims demand is insane and supply is constrained. Both cannot be true. Either customers aren’t buying or they’re buying without cash.

The cash flow tells the real story. Nvidia generated $14.5 billion in actual cash but reported $19.3 billion in profit. The gap is $4.8 billion. Healthy chip companies like TSMC and AMD convert over 95% of profits to cash. Nvidia converts 75%. That’s distress level.

Here’s where it gets criminal.

Nvidia gave $2 billion to xAI. xAI borrowed $12.5 billion to buy Nvidia chips. Microsoft gave OpenAI $13 billion. OpenAI committed $50 billion to buy Microsoft cloud. Microsoft ordered $100 billion in Nvidia chips for that cloud. Oracle gave OpenAI $300 billion in cloud credits. OpenAI ordered Nvidia chips for Oracle data centers.

The same dollars circle through different companies and get counted as revenue multiple times. Nvidia books sales, but nobody actually pays. The bills age. The inventory piles up. The cash never comes.

AI company CEOs admitted it themselves last week. Airbnb’s CEO called it vibe revenue. OpenAI burns $9.3 billion per year but makes $3.7 billion. That’s a $5.6 billion annual loss. The $157 billion valuation requires $3.1 trillion in future profits that MIT research shows 95% of AI projects will never generate.

Peter Thiel sold $100 million in Nvidia on November 9. SoftBank dumped $5.8 billion on November 11. Michael Burry bought put options betting Nvidia crashes to $140 by March 2026.

Bitcoin, which tracks AI speculation, dropped from $126,000 in October to $89,567 today. That’s a 29% crash. AI startups hold $26.8 billion in Bitcoin as collateral for loans. When Nvidia falls another 40%, those loans default, forcing $23 billion in Bitcoin sales, crashing crypto to $52,000.

The timeline is now certain. February 2026, Nvidia reports fourth quarter and reveals how many bills aged past 60 days. March 2026, credit agencies downgrade. April 2026, the first restatement. The fraud that took 18 months to build unwinds in 90 days.

Fair value for Nvidia: $71 per share. Current price: $186. The math is simple.

This is the fastest moving financial fraud in history because algorithms detected it in real time. Human investors are 90 days behind.

Read the full data driven deep dive article here - https://t.co/sDEf5Mdrtc
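
Several of the percentages in the post above follow directly from the dollar figures it quotes, so they can be recomputed. This is a rough sketch that checks only that internal arithmetic, taking every input exactly as stated in the post; it assumes those inputs at face value and does not verify them or the conclusions drawn from them. The variable names are chosen here for illustration.

```python
# Figures exactly as quoted in the post (in $B unless noted); not independently verified.
cash_generated = 14.5
reported_profit = 19.3
cash_conversion = cash_generated / reported_profit  # share of profit converted to cash
cash_gap = reported_profit - cash_generated

openai_burn, openai_revenue = 9.3, 3.7
openai_loss = openai_burn - openai_revenue

btc_peak, btc_now = 126_000, 89_567  # USD per coin, as quoted
btc_drop = (btc_peak - btc_now) / btc_peak

dso_before, dso_after = 46, 53  # days to collect payment, as quoted
dso_increase = dso_after - dso_before

print(f"cash conversion:   {cash_conversion:.0%}")
print(f"profit minus cash: ${cash_gap:.1f}B")
print(f"OpenAI loss:       ${openai_loss:.1f}B")
print(f"Bitcoin drop:      {btc_drop:.0%}")
print(f"extra days to pay: {dso_increase}")
```

The recomputed values (about 75%, $4.8B, $5.6B, about 29%, and 7 days) match the figures stated in the post.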

27K

18K

Jonathan Bennion @rooftopzen

7 months ago

@DavidSacks My post last week on probable intent of your cohorts - you're not an engineer (you are sales) https://t.co/K1zDqZECO3

rooftopzen retweeted

Susan Zhang

@suchenzang

8 months ago

for just $6,800, you can earn bragging rights for having only spent 2 weeks "learning AI" before becoming a VP / Tech Lead of "agentic AI" teams 👀 https://t.co/14OAYUIvZe

119

17K

Jonathan Bennion @rooftopzen

8 months ago

@jxmnop thx - @garrytan is a hype dude without much understanding of tech and I usually ignore his posts but got upset at this one too. Let him fail on ROI.

Jack Morris

@jxmnop

8 months ago

this post is complete misinformation LLMs are lossy compressors! of *training data*. LLMs losslessly compress *prompts*, internally. that’s what this paper shows. source: i am the author of “Language Model Inversion”, the original paper on this

211

881

381K

Jonathan Bennion @rooftopzen

8 months ago

Means he’s outsourcing to India (it’s 2025 not 2023)

Exec Sum

@exec_sum

8 months ago

NEWS: Starbucks CEO Brian Niccol says the coffee giant is now “all-in on AI.”

249

588

152

Jonathan Bennion @rooftopzen

8 months ago

@athyuttamre obv you are intent on hiring robots to influence social opinion on a trash product - how is your conscience?

Atty Eleti

@athyuttamre

8 months ago

OpenAI is looking for a very online™ person who loves working with developers to run our social media.

DMs open if this is you! https://t.co/bcv2ndVYHi

488

134

72K

Jonathan Bennion @rooftopzen

9 months ago

@rohanpaul_ai spreading what is trending is better when you consider all angles first.. your history is now in training data and includes all the things

Rohan Paul

@rohanpaul_ai

9 months ago

Today’s edition of my newsletter just went out.

🔗 https://t.co/tEFjPubUgK

Consider subscribing, it’s free, and I write it every day.

🔬 Microsoft research finds AI is not yet ready for real-world medical diagnosis

🏆 Google DeepMind just released its very first robotics AI models, called Gemini Robotics 1.5 and Gemini Robotics-ER 1.5.

🛠️ New paper shows a great way to make RAG much faster and more accurate

💼 AI just passed a brutal finance exam most humans fail
