grsimari

@grsimari

Professor Emeritus of Logic for Computer Science and Artificial Intelligence, DCIC, ICIC, Univ. Nac. del Sur in Bahia Blanca, Argentina, WashU and UNS alumnus.

Joined November 2008

1K Following

492 Followers

3.3K Posts

grsimari @grsimari

1 day ago

An improved poster is coming up. This is just a heads up.

grsimari retweeted

Valerio Capraro

@ValerioCapraro

9 days ago

The Pope is making exactly our point. LLMs “may imitate or even simulate, but they do not understand.”

This is the core epistemic fault line.

Most AI evaluation is still based on one assumption: if a system statistically approximates human behaviour, then it is close to human intelligence.

But approximation is not intelligence.
Simulation is not understanding.

LLMs can produce the right answer without knowing why it is right. They can simulate empathy without feeling. They can imitate judgment without responsibility. They can generate coherent explanations without having a world to which those explanations are accountable.

Stop confusing behavioural similarity with cognitive equivalence.

Human understanding is embodied, affective, relational, motivational, and normative. It is not just the production of plausible text.

*
Full paper in the first reply

263

203K

grsimari retweeted

Nora Bär @norabar

11 days ago

Sensational finding by @GamarnikLab and team 👇 Twenty years after revealing a key mechanism in the replication of the dengue virus, they discovered that more than 20 viruses of that genus use it; a “master key” for an antiviral against all of them https://t.co/En7oaplfI5

596

228

15K

grsimari retweeted

Nora Bär @norabar

17 days ago

The only thing that needs to be done. Comply with the law 👇👇👇👇👇

grsimari retweeted

Nora Bär @norabar

about 2 months ago

#Cientificidiosinfin 👇

grsimari retweeted

Nora Bär @norabar

about 2 months ago

There is no innovation without basic science. There is no basic science without free public universities that foster the training of diverse talent 👇 https://t.co/52XCmr99tU

554

191

10K

grsimari @grsimari

about 2 months ago

@Fedocles @JuanPDAmato Could you explain to me how you lump the PICTs together with university salaries? The rest of what you say is regrettable. Neither Macri nor Milei has paid anything yet; they have only put us deeper in debt. The PICTs flourished without foreign debt.

grsimari retweeted

Nav Toor

@heynavtoor

2 months ago

🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves.

And the way they proved it is devastating.

Apple researchers took the most popular math benchmark in AI — GSM8K, a set of grade-school math problems — and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers.

Every model's performance dropped. Every single one. 25 state-of-the-art models tested.

But that wasn't the real experiment.

The real experiment broke everything.

They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.

Here's the actual example from the paper:

"Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"

The correct answer is 190. The size of the kiwis has nothing to do with the count.

A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.

But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185.

Llama did the same thing. Subtracted 5. Got 185.

They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction.

The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all.
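
The arithmetic in the kiwi example can be checked with a minimal sketch. The variable names are illustrative assumptions; the numbers come from the quoted problem, and the second total reproduces the mistake the post describes rather than any model's actual output.

```python
# Kiwi counts as stated in the quoted problem.
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number of kiwis he did on Friday"

# The clause about five smaller kiwis does not change how many there are.
correct_total = friday + saturday + sunday

# The mistake described in the post: treating the "five ... smaller" clause as a subtraction.
distracted_total = correct_total - 5

print(correct_total)     # 190
print(distracted_total)  # 185
```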

Apple tested this across all models. They call the dataset "GSM-NoOp" — as in, the added clause is a no-operation. It does nothing. It changes nothing.

The results are catastrophic.

Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence.

GPT-4o dropped from 94.9% to 63.1%.

o1-mini dropped from 94.5% to 66.0%.

o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.
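
As a rough way to compare the three figures above, the sketch below computes each drop in percentage points and relative to the starting accuracy. The accuracy values are the ones quoted in the post; the helper names are hypothetical and chosen only for illustration.

```python
# Accuracy in percent as quoted in the post: (before, after adding the irrelevant clause).
reported = {
    "GPT-4o": (94.9, 63.1),
    "o1-mini": (94.5, 66.0),
    "o1-preview": (92.7, 77.4),
}

def point_drop(before, after):
    """Absolute drop in percentage points, rounded to one decimal."""
    return round(before - after, 1)

def relative_drop(before, after):
    """Drop as a percentage of the starting accuracy, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)

drops = {
    name: (point_drop(before, after), relative_drop(before, after))
    for name, (before, after) in reported.items()
}

for name, (points, relative) in drops.items():
    print(f"{name}: {points} points, {relative}% relative to baseline")
```

The two measures differ because a drop in percentage points ignores how high the starting accuracy was, while the relative drop scales by it.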

Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause.

This means it's not a prompting problem. It's not a context problem. It's structural.

The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.

The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data."

And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."

They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse.

A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.

This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.

You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.

857

11K

grsimari retweeted

Marcelo Tedesco @MikeTangoAlfa

2 months ago

The courts confirmed that the Government must comply with the University Funding Law (Ley de Financiamiento Universitario)

With harsh criticism of the Executive, the Court of Appeals upheld that salaries, scholarships, and research programs must be updated.
#cumplanlaley https://t.co/VGxwOaod33

527

grsimari retweeted

Nora Bär @norabar

2 months ago

Happy Wednesday! Today 🌧️⛈️⛈️ high of 30°. #Cientificidio

grsimari retweeted

Gabriel Castro

@GabrielCastroOK

8 months ago

@El_Puchu How naive we were when we thought it was just a movie

grsimari retweeted

Universidad Nacional del Sur

@UNS_oficial

9 months ago

🏛️ The public university is a source of Argentine pride and we need to take care of it 👩‍🎓👨‍🎓

The Rectorate of the UNS expresses its deep concern over the presidential veto of the University Funding Law (Ley de Financiamiento Universitario) 👇 https://t.co/U8kCwXZEka

464

124

10K

grsimari @grsimari

9 months ago

AI is for losers: A Manifesto. While I do not entirely agree, there are many valid points in that text. https://t.co/gObvO4UJ1r

grsimari retweeted

Love Music

@khnh80044

9 months ago

Freddie is looking down and giving y'all a standing ovation. That's spectacular!😍💗 The most INSANE Bohemian Rhapsody Flashmob you will ever see!! With 30 musicians and singers in the STREET of Paris 😍 Credit: Julien Cohen Pianist

198K

54K

58K

10M

grsimari @grsimari

9 months ago

https://t.co/mb2pohD7Ng

grsimari @grsimari

9 months ago

Anthropic tells US judge it will pay $1.5 billion to settle author class action | CNN Business https://t.co/4tNMuojkIU

grsimari retweeted

Raph 🇧🇷

@RaphinaaV1

12 months ago

🐐 https://t.co/ssOskcZZXg

457

55K

474

923K

grsimari retweeted

Marcelo Tedesco @MikeTangoAlfa

about 1 year ago

It's spelled "Universidad Nacional del Sur". It's pronounced "best in the whole wide world".

639

grsimari retweeted

Valencia CF

@valenciacf

about 1 year ago

🦇 In light of the injustice and falsehoods committed against Valencia CF's supporters, the Club has demanded in writing an immediate correction from the documentary's production company regarding what happened at Mestalla, which does not correspond to reality. Truth and respect for our supporters must prevail. Valencia CF reserves the right to take any legal action to which it is entitled #RESPECT

226K

19K

10K

26M

grsimari retweeted

Luiza Jarovsky, PhD

@LuizaJarovsky

about 1 year ago

🚨 New study reveals that when used to summarize scientific research, generative AI is nearly five times LESS accurate than humans.

Many haven't realized, but Gen AI's accuracy problem is worse than initially thought. https://t.co/p0GQM6kK32

750

576K
