Jamie Patterson 🚅 @Jamie_Patterson - Twitter Profile

about 1 month ago

After multiple attempts to get different AI models to help me with my taxes, I was forced to give up and use a simple spreadsheet. AI couldn’t do simple math and kept inventing numbers or ignoring instructions. It was like hiring the world’s worst accountant.

0

18

Jamie Patterson 🚅 @Jamie_Patterson

about 1 month ago

Tried to use AI to convert my USD stock trades to CAD for my taxes. Claude demanded money, Gemini hallucinated its own numbers, and Grok just froze. Guess it's back to doing math the old-fashioned way.

1

0

51

Jamie_Patterson retweeted

JR Urbane Network

@JRUrbaneNetwork

about 1 month ago

When I was taking a JR train in Osaka last month, I kept wondering: Wouldn't it be nice if Toronto had something similar? A post on the history, struggles and hope Japan can bring to the development of regional rail in Toronto. Filled with video of JR-W trains too. Next post...

5

94

13

18

11K

Jamie_Patterson retweeted

Tina Yazdani

@TinaYazdani

about 2 months ago

I am no longer employed by CityNews. I am proud of my journalism at CityNews and I stand by my reporting. I will have more to say on this later but for now please stay tuned and thank you for those who have supported me.

792

12K

2K

622

1M

Jamie_Patterson retweeted

Nathaniel Arfin

@ArfinNathaniel

about 2 months ago

This is absolutely infuriating. Yazdani has been doing amazing work covering education in this province, and holding the minister to account. @CityNewsTO this is absurd and demands explanation.

29

615

275

14

15K

Jamie_Patterson retweeted

Massimo

@Rainmaker1973

2 months ago

The stock market you don’t know about.

35

2K

530

2K

169K

Jamie Patterson 🚅 @Jamie_Patterson

2 months ago

You’d have to be pretty dead inside to know this exists and be like “nah, not for me” #highspeedrail @altotrain

0

1

0

27

Jamie_Patterson retweeted

FastTrackTO @FastTrackTO

3 months ago

FastTrackTO is launching and releasing our 10-point plan to fix the Toronto streetcar network. A symbol of the city that could be great with achievable, common-sense, low-cost changes. The plan, if implemented, would transform the city. https://t.co/9RpJtWQwtM (1/2)

17

392

88

62

85K

Jamie Patterson 🚅 @Jamie_Patterson

3 months ago

@thesnlnetwork Looks like Al Nash is a secret "Gregg head" as he wore a VFA hat at the end of the show.

0

290

Jamie Patterson 🚅 @Jamie_Patterson

3 months ago

@OCATCOfficial one of the cast members of the new #snluk is a true cinephile. He wore a VFA hat at the end of the premiere last night. #vfa #victorvillefilmarchive

Saturday Night Network

@thesnlnetwork

3 months ago

Goodnight to the first ever episode of #SNLUK! 🇬🇧

167

2K

157

79

797K

0

1

0

179

Jamie Patterson 🚅 @Jamie_Patterson

3 months ago

Chuck Norris was single-handedly propping up the stock market. Monday is going to be a bloodbath.

0

63

Jamie_Patterson retweeted

Gul Dukat @realGulDukat

3 months ago

Lal just died. Good, I’m glad she’s dead. She can no longer hurt innocent people! President GUL DUKAT #StarTrek #Sisko197

22

663

63

15

18K

Jamie Patterson 🚅 @Jamie_Patterson

3 months ago

@EricONCA @RM_Transit To your point, a boy was killed by a GO train today, so yeah, accidents do happen. I wonder, though, whether people would learn to take train crossings more seriously if trains were more frequent. https://t.co/dbEu7FscXv

0

1

0

121

Jamie Patterson 🚅 @Jamie_Patterson

3 months ago

@RandyRen4 @alleria_eh @unicornflyers @theliamnissan @grok @grok is Ask Grok still available in Canada?

1

3

0

106

Jamie Patterson 🚅 @Jamie_Patterson

3 months ago

@RJCity1 @RealBryanClark

0

2

0

407

Jamie Patterson 🚅 @Jamie_Patterson

4 months ago

Why The U.S. Can’t Copy Japanese 7-Eleven | AB Explained https://t.co/nnRrqdHZvM via @YouTube

0

1

0

49

Jamie_Patterson retweeted

Alex Chalmers

@chalmermagne

4 months ago

Back in September 2024, @nathanbenaich and I wrote about this nonsense purveyor: https://t.co/3K9NWhQL9w Disappointing to see so many people taking this seriously.

20

633

80

295

171K

Jamie Patterson 🚅 @Jamie_Patterson

4 months ago

“LLMs reason just enough to sound convincing, but not enough to be reliable.”

God of Prompt

@godofprompt

4 months ago

🚨 Holy shit… Stanford just published the most uncomfortable paper on LLM reasoning I’ve read in a long time.

This isn’t a flashy new model or a leaderboard win. It’s a systematic teardown of how and why large language models keep failing at reasoning even when benchmarks say they’re doing great.

The paper does one very smart thing upfront: it introduces a clean taxonomy instead of more anecdotes. The authors split reasoning into non-embodied and embodied.

Non-embodied reasoning is what most benchmarks test and it’s further divided into informal reasoning (intuition, social judgment, commonsense heuristics) and formal reasoning (logic, math, code, symbolic manipulation).

Embodied reasoning is where models must reason about the physical world, space, causality, and action under real constraints.

Across all three, the same failure patterns keep showing up.

> First are fundamental failures baked into current architectures. Models generate answers that look coherent but collapse under light logical pressure. They shortcut, pattern-match, or hallucinate steps instead of executing a consistent reasoning process.

> Second are application-specific failures. A model that looks strong on math benchmarks can quietly fall apart in scientific reasoning, planning, or multi-step decision making. Performance does not transfer nearly as well as leaderboards imply.

> Third are robustness failures. Tiny changes in wording, ordering, or context can flip an answer entirely. The reasoning wasn’t stable to begin with; it just happened to work for that phrasing.

One of the most disturbing findings is how often models produce unfaithful reasoning. They give the correct final answer while providing explanations that are logically wrong, incomplete, or fabricated.

This is worse than being wrong, because it trains users to trust explanations that don’t correspond to the actual decision process.

Embodied reasoning is where things really fall apart. LLMs systematically fail at physical commonsense, spatial reasoning, and basic physics because they have no grounded experience.

Even in text-only settings, as soon as a task implicitly depends on real-world dynamics, failures become predictable and repeatable.

The authors don’t just criticize. They outline mitigation paths: inference-time scaling, analogical memory, external verification, and evaluations that deliberately inject known failure cases instead of optimizing for leaderboard performance.

But they’re very clear that none of these are silver bullets yet.

The takeaway isn’t that LLMs can’t reason.

It’s more uncomfortable than that.

LLMs reason just enough to sound convincing, but not enough to be reliable.

And unless we start measuring how models fail not just how often they succeed we’ll keep deploying systems that pass benchmarks, fail silently in production, and explain themselves with total confidence while doing the wrong thing.

That’s the real warning shot in this paper.

Paper: Large Language Model Reasoning Failures

265

7K

1K

7K

967K

0

33

Jamie_Patterson retweeted

rahim @DirectorVMH

4 months ago

I always loved watching old SCTV reruns for Catherine O'Hara (but my favourite parts were when they just got totally absurd). So this little excerpt from Monster Chiller Horror Theatre: Whispers of The Wolf was one of my fave, just out-there performances. She was one of a kind.

0

1

0

179
