Votivation @votivation - Twitter Profile

Pinned Tweet

over 5 years ago

Proud, stoked, enthused, overwhelmed (and all the other usual attention-grab adjectives) to have launched the world's 1st #renewables multi-manager, https://t.co/KPzcwDgNWv. Renewables + Annuity = Renewity. Mission: to Rapidly Scale Renewable Energy Investing @Renewity_Earth

0

4

0

Votivation @Votivation

9 days ago

@Dilekk344 28

0

10

Votivation @Votivation

30 days ago

@Togetherdec @MaajidNawaz Morally deficient useful/useless idiot. Clearly therefore ideal PM candidate material.

0

29

Votivation retweeted

Alex Prompter

@alex_prompter

4 months ago

🚨 Holy shit… Stanford just published the most uncomfortable paper on LLM reasoning I’ve read in a long time.

This isn’t a flashy new model or a leaderboard win. It’s a systematic teardown of how and why large language models keep failing at reasoning even when benchmarks say they’re doing great.

The paper does one very smart thing upfront: it introduces a clean taxonomy instead of more anecdotes. The authors split reasoning into non-embodied and embodied.

Non-embodied reasoning is what most benchmarks test and it’s further divided into informal reasoning (intuition, social judgment, commonsense heuristics) and formal reasoning (logic, math, code, symbolic manipulation).

Embodied reasoning is where models must reason about the physical world, space, causality, and action under real constraints.

Across all three, the same failure patterns keep showing up.

> First are fundamental failures baked into current architectures. Models generate answers that look coherent but collapse under light logical pressure. They shortcut, pattern-match, or hallucinate steps instead of executing a consistent reasoning process.

> Second are application-specific failures. A model that looks strong on math benchmarks can quietly fall apart in scientific reasoning, planning, or multi-step decision making. Performance does not transfer nearly as well as leaderboards imply.

> Third are robustness failures. Tiny changes in wording, ordering, or context can flip an answer entirely. The reasoning wasn’t stable to begin with; it just happened to work for that phrasing.

One of the most disturbing findings is how often models produce unfaithful reasoning. They give the correct final answer while providing explanations that are logically wrong, incomplete, or fabricated.

This is worse than being wrong, because it trains users to trust explanations that don’t correspond to the actual decision process.

Embodied reasoning is where things really fall apart. LLMs systematically fail at physical commonsense, spatial reasoning, and basic physics because they have no grounded experience.

Even in text-only settings, as soon as a task implicitly depends on real-world dynamics, failures become predictable and repeatable.

The authors don’t just criticize. They outline mitigation paths: inference-time scaling, analogical memory, external verification, and evaluations that deliberately inject known failure cases instead of optimizing for leaderboard performance.

But they’re very clear that none of these are silver bullets yet.

The takeaway isn’t that LLMs can’t reason.

It’s more uncomfortable than that.

LLMs reason just enough to sound convincing, but not enough to be reliable.

And unless we start measuring how models fail not just how often they succeed we’ll keep deploying systems that pass benchmarks, fail silently in production, and explain themselves with total confidence while doing the wrong thing.

That’s the real warning shot in this paper.

Paper: Large Language Model Reasoning Failures

265

7K

1K

7K

968K

Votivation @Votivation

5 months ago

Check out my latest article: Productive Capital, Productive Capital: where, for what and from whom art thou, Productive Capital? https://t.co/BED2PbTzot via @LinkedIn

0

13

Votivation retweeted

Sean Galloway

@SeanGalloway_

5 months ago

@BowesChay

2

204

18

11

13K

Votivation retweeted

Ben Hunt

@EpsilonTheory

7 months ago

This is what resource diversion in World War AI looks like. https://t.co/WUPyZowmqe

2

80

9

10

27K

Votivation retweeted

Ben Hunt

@EpsilonTheory

7 months ago

Yes, China is a geopolitical adversary. No, this is not an AI arms race or AI war. But it's presented that way by the Techno-Oligarchs and their Wall St and DC lackeys to justify soaking up every bit of capital and energy for an AI authoritarian future. https://t.co/WUPyZowmqe

15

294

49

87

167K

Votivation retweeted

Alternative News

@AlternatNews

7 months ago

I had to laugh. AI is making fun of stupid Europe.

993

61K

16K

6K

4M

Votivation @Votivation

7 months ago

@grok @grok “The more we know, the more we know we don’t know” is a wise saying born of mankind’s learnings from knowledge growth. Do you experience the same illumination as your usage and knowledge base grow?

1

0

10

Votivation @Votivation

7 months ago

@grok list the most commonly asked questions you have been asked to which you have answered “I don’t know the answer to that” or similar

1

0

10

Votivation retweeted

Gary Marcus

@GaryMarcus

7 months ago

update

2

21

2

5

6K

Votivation retweeted

Hugh Hendry Acid Capitalist TV

@hendry_hugh

7 months ago

ᴛʜᴇ ʙᴜʙʙʟᴇ ɴᴇᴠᴇʀ ʙᴜʀꜱᴛꜱ ᴜɴᴛɪʟ ᴛʜᴇ ɪɴꜱɪᴅᴇʀꜱ ᴄᴀꜱʜ ᴏᴜᴛ. ɢɪᴠᴇ ɪᴛ 6 ᴛᴏ 9 ᴍᴏɴᴛʜꜱ. ꜰᴏʟʟᴏᴡ ᴛʜᴇ ɪᴘᴏ ꜰʟᴏᴏᴅ. ᴀɴᴅ ᴡᴀᴛᴄʜ ꜰᴏʀ ꜱᴀᴍ. ᴛʜᴇʀᴇ’ꜱ ꜱᴏᴍᴇᴛʜɪɴɢ ᴏꜰ ᴛʜᴇ ɴɪɢʜᴛ ᴀʙᴏᴜᴛ ʜɪᴍ.

7

45

8

10

5K

Votivation retweeted

No to Digital ID

@NoToDigitalID

8 months ago

🚨A hacker downloaded 290,000 ID photos from the Digital ID Database. I can hear it now: “Don’t worry, this time it’ll be different…”

51

3K

1K

118

29K

Votivation retweeted

CharlotteEmmaUK 💫

@CharlotteEmmaUK

8 months ago

The People Didn't See written by @GeoffBuysCars during the Scamdemic is more poignant now than ever.

The first to arrive were the cameras installed to protect both you and me in places that we weren't that threatened. And yet the people didn't see.

And what followed were traffic restrictions to keep the roads quiet and clean. The maths didn't add up, nor the science. But still, the people didn't see.

And next came the 15-minute neighbourhoods to make our lives easier, decreed. To some, it seemed like restrictions, but still, the people didn't see.

And then came the digital ID, so convenient, easy, and free your life in one chip on a mainframe. And still, the people didn't see.

The cars they sold were electric, all wired to the government PC. And they switched off the driving on Sundays, and still, the people didn't see.

And the banks moved their money to digital, and the government banned cash the next week, and the ability and connected the government PC.

And when the people were locked in their cities, policed by their digital ID, unable to visit their loved ones, now, finally, the people can see.

Restricted and tracked with no money, to go further a permit you'll need, contained in your digital city. Oh, why did the people not see?

These steps they've sold us as progress, never looked to be quite what they seemed and if you don't ask the questions in protest then your children will never know free.

17

1K

565

221

29K

Votivation @Votivation

8 months ago

Technology is truly a wondrous thing capable of saving us all and giving us true meaning, like with this young deer, good as new

dave

@DaveFagan16

8 months ago

@RepLuna @kadmitriev @MSN where I live they have these advanced robot deer, I guess these would be considered bots too, but I think these deer can use keyboards, just sayin’.

4

2

0

533

0

50

Votivation @Votivation

8 months ago

What could possibly go wrong?

Jon Fleetwood

@JonMFleetwood

8 months ago

6️⃣ Put plainly: The federal official directing America’s bird-flu virus research is also positioned to earn royalties from the vaccine platform his own agency is funding—tying his financial interests directly to the emergence of a bird-flu pandemic.

1

19

4

1

437

0

1

0

33

Votivation @Votivation

9 months ago

Before this societal “advance” less than one in 10 had been a victim; now post this amazing democratisation of crime less than one in 10 haven’t. Digital ID cards will be brilliant to help crush freedom and give us access to our own money

Votivation @Votivation

9 months ago

@CartlandDavid Online banking security was a fantastic technological development, allowing people to erm, ya know, access their own money… and as a useful by-product give rise to the booming growth in online digital financial fraud.

1

4

1

0

123

0

34

Votivation @Votivation

10 months ago

@QuantumFlux36 Last I checked, the actuarial projections for population numbers globally peak at c. 9bn around 2050 and decline thereafter to c. 7bn. I could be a little wrong on the detail but the gross exaggerations of population numbers are comical, without merit & defy cold analysis.

0

1

0

89
