@Nano1205 @joanamaraldias It's simple: we need 5 to 6 million workers to sustain the pensions of those who will be retired 30 to 40 years from now. Where those 5-6 million will come from, I don't know. Either we have more children, or they come from abroad... Or else we say goodbye to pensions, or to the retirement age...
This paper from Harvard and MIT quietly answers the most important AI question nobody benchmarks properly:
Can LLMs actually discover science, or are they just good at talking about it?
The paper is called “Evaluating Large Language Models in Scientific Discovery”, and instead of asking models trivia questions, it tests something much harder:
Can models form hypotheses, design experiments, interpret results, and update beliefs like real scientists?
Here’s what the authors did differently 👇
• They evaluate LLMs across the full discovery loop: hypothesis → experiment → observation → revision
• Tasks span biology, chemistry, and physics, not toy puzzles
• Models must work with incomplete data, noisy results, and false leads
• Success is measured by scientific progress, not fluency or confidence
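To make the loop concrete, here is a minimal toy sketch of hypothesis → experiment → observation → revision. It is not the paper's benchmark or code: `ToyLab`, `discovery_loop`, the hidden linear law, the candidate slopes, and the fixed noise values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyLab:
    """Hypothetical environment hiding a linear law y = true_slope * x."""
    true_slope: float = 3.0
    # Fixed, made-up noise values so the run is reproducible.
    noise: tuple = (0.2, -0.1, 0.3, -0.2)
    runs: int = 0

    def run_experiment(self, x: float) -> float:
        # Return a noisy measurement and count how many experiments were run.
        y = self.true_slope * x + self.noise[self.runs % len(self.noise)]
        self.runs += 1
        return y


def discovery_loop(lab, candidate_slopes, tolerance=0.5, max_rounds=10):
    """Iterate until at most one candidate hypothesis survives."""
    surviving = list(candidate_slopes)
    history = []
    for round_no in range(1, max_rounds + 1):
        if len(surviving) <= 1:
            break
        hypothesis = surviving[0]   # hypothesis: current working guess
        x = float(round_no)         # experiment: choose an input to probe
        y = lab.run_experiment(x)   # observation: noisy measurement
        # revision: drop every candidate the observation contradicts,
        # including the working guess itself if it was wrong
        surviving = [s for s in surviving if abs(s * x - y) <= tolerance]
        history.append((hypothesis, x, y, list(surviving)))
    return surviving, history


lab = ToyLab()
surviving, history = discovery_loop(lab, [1.0, 2.0, 3.0, 4.0])
```

The point of the sketch is the revision step: the working guess gets no special protection, so it is discarded as soon as the evidence contradicts it.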
What they found is sobering.
LLMs are decent at suggesting hypotheses, but brittle at everything that follows.
✓ They overfit to surface patterns
✓ They struggle to abandon bad hypotheses even when evidence contradicts them
✓ They mistake correlation for causation
✓ They hallucinate explanations when experiments fail
✓ They optimize for plausibility, not truth
Most striking result:
`High benchmark scores do not correlate with scientific discovery ability.`
Some top models that dominate standard reasoning tests completely fail when forced to run iterative experiments and update theories.
Why this matters:
Real science is not one-shot reasoning.
It’s feedback, failure, revision, and restraint.
LLMs today:
• Talk like scientists
• Write like scientists
• But don’t think like scientists yet
The paper’s core takeaway:
Scientific intelligence is not language intelligence.
It requires memory, hypothesis tracking, causal reasoning, and the ability to say “I was wrong.”
Until models can reliably do that, claims about “AI scientists” are mostly premature.
This paper doesn’t hype AI. It defines the gap we still need to close.
And that’s exactly why it’s important.
#Business school leaders want staff to produce research that’s relevant to societal ills. However, incentives are designed to produce opposite results. Learn more: https://t.co/4XT5zhTvzl
By Stefan Stremersch, Russell S. Winer, and @nmcamacho #businesschoolhealth
OPEN ACCESS: New JAMS study offers a novel framework to help firms validate their #innovation ideas with smart #customerinsights.
https://t.co/3cD1v6Uz8m
This study on research incentives in business schools finds:
1: the number of papers gets too much weight, while creativity, relevance & awards get too little
2: profs feel insufficiently rewarded for research, while deans feel they are rewarded too much
https://t.co/blqCiVYfCT
#Business school leaders want staff to produce research that’s relevant to societal ills. However, incentives are designed to produce opposite results. Learn more: https://t.co/iZ1Zo7sjBo
Full JM article: https://t.co/LNBPtFu1Xs
#MarketingAcad #businesschoolhealth @nmcamacho
I love elegant visual illusions. They reveal the inner workings of our minds & how easy it is to confuse our perceptions with reality.
If we can’t get simple stationary cubes right, shouldn't we be warier of confident opinions on complex social systems? Beware of naive realism!
Cool clarification of the meaning of tweets and retweets by Philip E. Tetlock: Likes = interesting; Retweets = very interesting; Interesting ≠ endorsement
Collective layoffs can do more than harm staff morale: They also hurt sales and advertising effectiveness while increasing price sensitivity. Learn more: https://t.co/KaNzKeWbbA
By Vardit Landsman and Stefan Stremersch