emissary @MrFixedIncome - Twitter Profile

4 days ago

@JimDMiller If you remove all the scary stuff from the original prompt and use language that is as neutral as possible, the transformer latches onto the word "photo" and assumes a family photo because of the other word, "restoration".


0

167

emissary @MrFixedIncome

4 days ago

@JimDMiller You guys realize you are seeding the image generation with a negative connotation by saying things like
> strange
> content
> don’t ask questions
> close your eyes
It’s actually perfectly clear what the ai is doing. The transformer picks up the relationship between those

2

3

0

3K

emissary @MrFixedIncome

4 days ago

@JimDMiller If you don’t specify that it should make up the photo itself, it gives a black square. If you use nicer, flowery language and specify, the transformer picks up on that and seeds the image generator to make a nicer picture. It’s literally how transformers work.


0

2

0

144

emissary @MrFixedIncome

10 days ago

@TheMindScourge So you think he’s joining OpenAI to work on product? He’s going there to label data or to work on a new model? Instead of the much more likely possibility that he is going there as an internal researcher, publishing AI-assisted papers for them to use in marketing materials?

0

158


emissary @MrFixedIncome

2 months ago

This meme was just a year and a half too early

0

1

0

136

emissary @MrFixedIncome

2 months ago

@JamrockHobo Planescape Torment… which I am sure most Disco Elysium fans have played

0

9

0

1

302

emissary @MrFixedIncome

2 months ago

> new ai paper saying ai sucks
> models from 2 years ago
> like clockwork

Nav Toor

@heynavtoor

2 months ago


🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves.

And the way they proved it is devastating.

Apple researchers took the most popular math benchmark in AI — GSM8K, a set of grade-school math problems — and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers.

Every model's performance dropped. Every single one. 25 state-of-the-art models tested.

But that wasn't the real experiment.

The real experiment broke everything.

They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.

Here's the actual example from the paper:

"Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"

The correct answer is 190. The size of the kiwis has nothing to do with the count.

A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.

But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185.

Llama did the same thing. Subtracted 5. Got 185.
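
As a quick check of the arithmetic in the kiwi example, here is a minimal Python sketch. The function name `count_kiwis` and the `smaller_on_sunday` parameter are illustrative assumptions, not anything from the paper or the tweet. The sketch only shows that the "smaller than average" clause never enters the count, and that subtracting it reproduces the wrong answer quoted above.

```python
def count_kiwis(friday: int, saturday: int, smaller_on_sunday: int = 0) -> int:
    """Total kiwis picked over the three days.

    `smaller_on_sunday` is accepted but deliberately unused:
    the size of a kiwi does not change how many there are.
    """
    sunday = 2 * friday  # Sunday is double Friday's pick
    return friday + saturday + sunday


correct = count_kiwis(44, 58, smaller_on_sunday=5)
mistaken = correct - 5  # the error described in the tweet: treating "five" as a subtraction

print(correct)   # 190
print(mistaken)  # 185
```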

They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction.

The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all.

Apple tested this across all models. They call the dataset "GSM-NoOp" — as in, the added clause is a no-operation. It does nothing. It changes nothing.

The results are catastrophic.

Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence.

GPT-4o dropped from 94.9% to 63.1%.

o1-mini dropped from 94.5% to 66.0%.

o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.
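
To put the three score pairs above on a common footing, the sketch below computes each drop in percentage points and as a share of the starting score. The figures are copied from the tweet as quoted and are not checked against the paper. The `drops` helper is a hypothetical name used only for illustration.

```python
# Accuracy before and after the irrelevant clause, as quoted in the tweet (percent).
scores = {
    "GPT-4o": (94.9, 63.1),
    "o1-mini": (94.5, 66.0),
    "o1-preview": (92.7, 77.4),
}


def drops(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in percent), rounded to one decimal."""
    points = before - after
    relative = 100.0 * points / before
    return round(points, 1), round(relative, 1)


for model, (before, after) in scores.items():
    points, relative = drops(before, after)
    print(f"{model}: -{points} points ({relative}% of the original score)")
```

A percentage-point drop and a relative drop are different quantities, so it helps to state which one a headline number refers to.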

Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause.

This means it's not a prompting problem. It's not a context problem. It's structural.

The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.

The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data."

And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."

They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse.

A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.

This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.

You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.

857

11K

3K

6K

2M

0

3

0

217

MrFixedIncome retweeted

Mini Modu @MinModulation

2 months ago

Resident Evil...Which I'm sure most Resident Evil fans have played...

10

4K

272

145

48K

emissary @MrFixedIncome

2 months ago

@meatballtimes There will be no mass unemployment. You only assume so because you cannot imagine what jobs will be created from freeing up human capital via automation. The jobs of tomorrow will be unimaginable by today’s standards, but there will still be abundant jobs.

1

0

44

emissary @MrFixedIncome

3 months ago

Just wait until you start noticing how every Unreal Engine 5 game has the same menu system and, more often, the same graphics options configuration

exQUIZitely 🕹️

@exQUIZitely

3 months ago

Game menus had a different vibe back then. I miss this style. The modern minimalistic design feels bland, and it's lacking soul. I am sure a lot of research goes into AB testing and optimization, but I will always prefer this...

266

8K

544

532

372K

0

2

0

174

emissary @MrFixedIncome

3 months ago

@BullyEsq This guy has been in the foreign ministry for 30 years

0

54

emissary @MrFixedIncome

3 months ago

@Tophmilio I know. I just reconnected with him 2 months ago and we shared a brief dm exchange. I don’t know what happened.

0

3

0

93

emissary @MrFixedIncome

3 months ago

Rest in peace yv

1

39

0

2K

emissary @MrFixedIncome

3 months ago

@midascabal Moron. This is an ETF of US oil companies. You are looking for CL

0

4

0

129

emissary @MrFixedIncome

3 months ago

@coolpatiens Yes

0

1

0

49

emissary @MrFixedIncome

3 months ago

@formershell 4gb of extra ram just to have mumbaisoft break notepad and try to sell you shit in the settings app

0

1

0

86

emissary @MrFixedIncome

3 months ago

@LRH_Superfan Not just any Chinese dude. Chiang Kai-Shek

5

34

0

1

4K

emissary @MrFixedIncome

3 months ago

@fuckyourputs Dude has been shit talking Fani Willis and endorsing candidates in between taunting Iran and providing war updates

0

1

0

327

emissary @MrFixedIncome

3 months ago

@fitnessfeelingz So you’re saying we should’ve sacrificed thousands of US troops in our bases to get a casus belli we didn’t need? After we bombed Natanz and Isfahan the last time, there is no way Iran would hold out on us after getting pelted by Israel again

0

113
