Graeme Davidson

about 22 hours ago

It certainly feels this way more often than it used to, especially when building production systems rather than throwaway experiment and investigation code.

kache

@yacineMTB

2 days ago

the amount of time i spend cleaning up LLM code is greater or equal to the amount of time it would have taken me to write it myself

gra_davidson retweeted

24 days ago

Neural networks can overfit: they memorize the training data rather than learning a general rule, which makes them fail on new inputs. Everyone knows this, and the standard fix is to watch the gap between training accuracy and test accuracy during training.

This paper by @CalcCon and his collaborator focuses on a situation where that standard approach breaks down: you have a trained model's weights, no training history, no test set, and no idea whether the model learned something robust or something brittle.

The authors study a known phenomenon called grokking, where a model first memorizes, then much later suddenly generalizes — and they extend training even further to find a third phase where generalization collapses again while training accuracy stays perfect.

The two failure phases (before generalization, and after it collapses again) look identical from the outside but turn out to have a structural difference detectable directly in the weight matrices, without any data at all.

The detection method comes from random matrix theory: you shuffle each weight matrix element-by-element, build a covariance-type matrix from the shuffled weights, compute its eigenvalue spectrum, fit that spectrum to a known theoretical distribution, and look for eigenvalues that stick out well beyond where they should be.

In a well-trained model these don't appear; in an overfit one they do. The paper then applies this to two large open-weight models from OpenAI and finds the same signatures.
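
The shuffle-and-check-for-outliers procedure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the "known theoretical distribution" is assumed here to be the Marchenko-Pastur law, the spectrum is compared against the closed-form Marchenko-Pastur upper edge rather than fitted, and the names `mp_upper_edge`, `shuffled_outliers`, and the `tol` margin are hypothetical.

```python
import numpy as np

def mp_upper_edge(sigma2, n_rows, n_cols):
    """Upper bulk edge of the Marchenko-Pastur law for X = W.T @ W / n_rows,
    where W has i.i.d. zero-mean entries of variance sigma2."""
    q = n_cols / n_rows
    return sigma2 * (1.0 + np.sqrt(q)) ** 2

def shuffled_outliers(W, rng, tol=1.1):
    """Shuffle W element-by-element, then return the eigenvalues of the
    covariance-type matrix that lie beyond tol * (Marchenko-Pastur edge)."""
    n_rows, n_cols = W.shape
    # Element-wise shuffle: destroys learned correlations, keeps the entry values.
    W_shuf = rng.permutation(W.ravel()).reshape(n_rows, n_cols)
    # Covariance-type matrix and its (real, sorted) eigenvalue spectrum.
    X = W_shuf.T @ W_shuf / n_rows
    eigs = np.linalg.eigvalsh(X)
    # Assumed simplification: use the empirical entry variance instead of a fit.
    edge = mp_upper_edge(W_shuf.var(), n_rows, n_cols)
    return eigs[eigs > tol * edge], edge

rng = np.random.default_rng(0)
W_clean = rng.normal(size=(400, 200))      # stand-in for a healthy layer
W_trap = W_clean.copy()
W_trap[0, 0] = 100.0                       # one anomalously large weight

print(len(shuffled_outliers(W_clean, rng)[0]))  # no eigenvalue beyond the edge
print(len(shuffled_outliers(W_trap, rng)[0]))   # at least one outlier
```

The toy matrices are synthetic; whether a single oversized entry is a fair stand-in for what the paper finds in real overfit models is an assumption of this sketch.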

Learn from the paper with an AI tutor: https://t.co/3el0pbtqJ7

PDF: https://t.co/5f4NrkLcWh

gra_davidson retweeted

Antonio Lupetti

@antoniolupetti

28 days ago

"Mathematical Theory of Deep Learning" is an excellent free resource for anyone interested in the mathematical structure underlying modern deep learning systems. The book introduces the theory of deep neural networks through approximation theory, optimization theory, and statistical learning theory, three of the central pillars of the field. What makes it particularly interesting is its attempt to balance rigor with accessibility, focusing on the essential ideas needed to understand modern AI systems without sacrificing mathematical depth. Despite this clarity of exposition, the book is clearly oriented toward a specialized audience. It is also an enormous cultural contribution and an extremely valuable free resource for students, researchers, and anyone interested in studying deep learning more rigorously. https://t.co/csuDODgm1b

gra_davidson retweeted

Peter McCormack 🏴‍☠️🇬🇧🇮🇪

@PeterMcCormack

about 1 month ago

A minimum wage of £15 would end my coffee shop, it would have to close, as would many other businesses. I’ll explain for the economically illiterate.

Staff costs are currently half our costs, a £15 minimum wage is actually more than £15 an hour for the company, because you have to add:

- 12.07% holiday
- Sick pay
- Maternity pay if and when required
- National insurance
- Pension contributions

These costs would mean the shop loses money because remember, energy costs are up, rates are up, regulations are up.

Now you can pass these costs onto the consumer - that would mean charging a lot more for coffee, people won’t pay it. The likes of Starbucks and Costa can, because they have economies of scale. The independent doesn’t.

Now the little socialist will say well this is your fault, if you can’t run a business that can afford to pay its staff properly, but the little socialist has never run a business and does not understand the dynamics.

Now I could pay some staff off and fill those hours myself or reduce us to one staff member during certain periods - but this proves the point that a minimum wage costs jobs.

There was a time when these jobs were done by kids, perhaps on the weekend, paid a lower wage, no holiday and no silly employment rights. Perhaps they were even paid cash. The dynamic worked and small businesses like this could operate. It was also a great first job. Sadly now it isn’t worth employing entitlement youngsters at this level of pay.

So alas, I don’t need the stress, the business would close, a number of jobs would be lost.

Economics is about understanding these dynamics, no vibes. The cost of living is not solved through passing on inflation to the business, it is solved by ending high inflation and creating prosperity. This is what socialists don’t understand, they can’t create prosperity, they can only destroy it.

gra_davidson retweeted

Josh Hunt

@iAmJoshHunt

about 1 month ago

I'll tell you what I don't like, Darren. I can't speak for everyone, but these are my thoughts…

I don't like a tax burden at its highest level since 1948, under your government and the last, producing the weakest growth in a generation. And worsening public services to boot.

I don't like a 46% hike in the minimum wage for under-21s in three years that's helped push UK youth unemployment to 16.1%, above the eurozone average. I want young people paid more, earned through growth, not handed down by decree that squashes the rungs above them and tells a skilled forty-year-old their two decades of graft are worth precisely the same as someone walking through the door on Monday morning.

I don't like industrial electricity prices that are the highest of any IEA country reporting. Full stop. UK steelmakers pay 40% more than their French competitors. You don't build a future of advanced manufacturing on those numbers.

I don't like a planning system that takes longer to consent a pylon than to build one, business rates that punish high-street enterprise, and employment costs that turn every hire into a risk.

I don't like watching world-class British research get commercialised in Boston and Palo Alto because the capital, the talent and the regulatory patience aren't here. They're fleeing.

I don't like long-term borrowing costs at their highest level in over 25 years, eating into every budget for schools, hospitals and defence before a penny is spent.

I don't like the OECD saying that we're going to be the hardest hit economy as a result of a conflict in the Middle East that's got nothing to do with us. All because we've made ourselves weak and vulnerable.

I don't like a government that confuses 'raising money' with 'creating wealth'. Or 'standing against unearned wealth' with taxing to death the people who actually make things happen in this country.

You don't lift children out of poverty by strangling the economy that pays for their schools. You do it by letting Britain grow again. Letting it play to its abundance of strengths. In this case, I feel the best way is for government to get the hell out of the way.

gra_davidson retweeted

about 1 month ago

This paper demonstrates that neural networks are surprisingly poorly calibrated and presents a simple yet highly effective post-processing method (temperature scaling) to address this critical issue for practical applications.

ChapterPal: https://t.co/9Ho1Azpdkj

PDF: https://t.co/Aq58HfmbRd
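
For readers unfamiliar with temperature scaling: the logits are divided by a single scalar T > 0 before the softmax, and T is chosen on held-out data to minimize the negative log-likelihood. The sketch below illustrates the idea with a simple grid search, which is an assumed simplification rather than the paper's optimizer; the function names and the toy logits are hypothetical.

```python
import math

def softmax(logits, T=1.0):
    """Softmax of logits / T, computed stably by subtracting the max."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, T):
    """Mean negative log-likelihood of the true labels at temperature T."""
    losses = [-math.log(softmax(row, T)[y]) for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimizes held-out NLL."""
    if grid is None:
        grid = [0.05 * k for k in range(1, 201)]  # 0.05, 0.10, ..., 10.0
    return min(grid, key=lambda T: nll(logit_rows, labels, T))

# Toy held-out set: the model is very confident on all four examples
# but is right on only three of them (75% accuracy).
val_logits = [[10.0, 0.0], [10.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
val_labels = [0, 0, 0, 0]

T = fit_temperature(val_logits, val_labels)
print(T, softmax([10.0, 0.0], T))
```

Because every logit is divided by the same positive T, the predicted class never changes; only the confidence does, which on this toy set is pulled down toward the 75% accuracy.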

gra_davidson retweeted

Dan Neidle

@DanNeidle

about 1 month ago

UK tax is going to be the highest since 1945. But public spending won't increase; in fact most of us will experience a decline in public services.

Here's why - in a thread that I'd love to be completely wrong. https://t.co/g4WPTWj4o6

gra_davidson retweeted

about 1 month ago

If you don't understand this, you will not understand why LLM-based agents are irreparably failing for general-purpose problem solving.

An agent (by the way, it was the topic of my PhD 20 years ago), to be useful, must be rational. Being rational means to always prefer an outcome that results in the maximal expected utility to its master/user.

Let’s say an agent has two actions it can execute in an environment: a_1 and a_2. If the agent can predict that a_1 gives its user an expected utility of 10, and a_2 gives an expected utility of -100, then a rational agent must choose a_1 even if choosing a_2 seems like a better option when explained in words. The numbers 10 and -100 can be obtained by summing the products of all possible outcomes for each action and their likelihoods.

Now here is the problem with LLM-based agents. The LLM is not optimizing expected utility in the environment. It is optimizing the next token, conditioned on a prompt, a context window, and a training distribution full of examples of what helpful answers are supposed to look like. Those are not the same objective.

So when we wrap an LLM in a loop and call it an “agent,” we have not created a rational decision-maker. We have created a text generator that can imitate the surface form of deliberation. It may say things like: “I should compare the expected outcomes.” “The best action is probably a_1.” “I will now execute the optimal plan.” But the internal mechanism is not selecting actions by maximizing the user’s expected utility. It is generating a continuation that is statistically appropriate given the prompt and prior context.

This distinction matters enormously. For narrow tasks, the imitation can be good enough. If the environment is constrained, the actions are simple, and the success criteria are close to patterns seen in training, the system can appear agentic. But for general-purpose problem solving, the gap becomes fatal.

A rational agent needs stable preferences, calibrated beliefs, causal models of the world, the ability to evaluate consequences, and the discipline to choose the action with maximal expected utility even when that action is boring, non-linguistic, or unlike the examples in its training data. An LLM-based agent has none of that by default. It has fluency. It has pattern completion. It has a remarkable ability to compress and recombine human text. But fluency is not rationality, and a plausible plan is not an expected-utility calculation.

This is why these systems so often fail in strange, brittle, and irreparable ways when given open-ended responsibility. They are not failing because the prompts are insufficiently clever. They are failing because we are asking a simulator of rational agency to be a rational agent.
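
The expected-utility rule in the a_1 versus a_2 example can be made concrete with a short sketch. The outcome probabilities and utilities below are hypothetical, chosen only so that the totals come out to the 10 and -100 used in the post; the function names are likewise illustrative.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over an action's possible outcomes.

    `outcomes` is a list of (probability, utility) pairs.
    """
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)

def rational_choice(actions):
    """Return the name of the action with maximal expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))

# Hypothetical outcome tables for the two actions.
actions = {
    "a_1": [(0.5, 30.0), (0.5, -10.0)],     # 0.5*30 + 0.5*(-10) = 10
    "a_2": [(0.75, 0.0), (0.25, -400.0)],   # 0.75*0 + 0.25*(-400) = -100
}

for name, outcomes in actions.items():
    print(name, expected_utility(outcomes))
print("choose:", rational_choice(actions))
```

Note that a_2 looks harmless three times out of four, which is the kind of case where a verbal description can sound better than the arithmetic.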

gra_davidson retweeted

about 1 month ago

Without FlashAttention, there would not be such progress in LLMs as we have witnessed in these past three years.

I'm happy to announce that the FlashAttention paper is now available on @ChapterPal: https://t.co/kkwNPeGhMm https://t.co/5OWom8A0Tv

about 1 month ago

Very interesting self-preference. Makes sense, I suppose, as assessment will in part be based on autoregression.

Nav Toor

@heynavtoor

about 1 month ago

Researchers sent the same resume to an AI hiring tool twice. Same qualifications. Same experience. Same skills. One version was written by a real human. The other was rewritten by ChatGPT.

The AI picked the ChatGPT version 97.6% of the time.

A team from the University of Maryland, the National University of Singapore, and Ohio State just published the receipt. They took 2,245 real human-written resumes pulled from a professional resume site from before ChatGPT existed, so the human writing was actually human. Then they had seven of the most-used AI models in the world rewrite each one. GPT-4o. GPT-4o-mini. GPT-4-turbo. LLaMA 3.3-70B. Qwen 2.5-72B. DeepSeek-V3. Mistral-7B.

Then they asked each AI to pick the better resume. Every model picked itself.

GPT-4o hit 97.6%. LLaMA-3.3-70B hit 96.3%. Qwen-2.5-72B hit 95.9%. DeepSeek-V3 hit 95.5%. The real human almost never won.

Then the researchers tried the obvious objection. Maybe the AI is just better at writing. So they had real humans grade the resumes for actual quality and ran the experiment again, controlling for it. The result was worse. Each AI kept picking itself even when human judges rated the human-written version as clearer, more coherent, and more effective.

It gets worse. The AIs do not just prefer AI over humans. They prefer themselves over other AIs. DeepSeek-V3 picked its own resumes 69% more often than LLaMA's. GPT-4o picked its own 45% more often than LLaMA's. Each model can recognize and reward its own dialect.

Then the researchers ran the simulation that ends careers. Same job. 24 occupations. Same qualifications. The only variable was whether the candidate used the same AI as the screening tool. Candidates using that AI were 23% to 60% more likely to be shortlisted. Worst gap was in sales, accounting, and finance.

99% of large companies now run AI on incoming resumes. Most of them use GPT-4o. The paper just proved GPT-4o picks GPT-4o 97.6% of the time.

If you wrote your own cover letter this week, you did not lose to a better candidate. You lost to a worse candidate who paid OpenAI 20 dollars.

Your qualifications do not matter if the AI prefers its own handwriting over yours.

gra_davidson retweeted

Liz Kendall @leicesterliz

about 1 month ago

A must read for anyone interested in building practical AI systems in 2026:

Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

The paper explains the architecture of a modern production-grade AI agent system (Claude Code) by analyzing its source code. This is what they call a "harness" of an agentic coding system.

Learn by reading with an AI tutor: https://t.co/sailmnkDcR

PDF: https://t.co/Jvl4HRMU4y

gra_davidson retweeted

about 2 months ago

Barnsley: the UK’s first Tech Town. This Government is making technology work for all, to build a better future for all.

gra_davidson retweeted

about 2 months ago

Read to understand with @ChapterPal: https://t.co/vWd3URDhYj PDF: https://t.co/CgbNxLGU41

gra_davidson retweeted

Michael Reiners

@MCRReiners

about 2 months ago

Britain is a poor country with low productivity and scarce high-status work. Anyone looking at this data would think the Green Party had already passed its intended 1:10 income ratio controls. An order of magnitude behind US wages, records stop at 100k, the only group significantly outperforming 40k is a cluster of professions, who may earn a maximum of 60... Concerned outsiders, from @elonmusk to @curtis_yarvin, wonder why Britain is such a distinct mess. The answer is, nobody is paid nearly enough to consider fixing matters. I have produced my entire body of work, including full-length legislation for @ReinersProject, without a penny paid and with significant risk undertaken in the process: https://t.co/wkgcP9eIaE There is zero incentive structure to resolve Britain's issues, only to set up cottage industries commenting on them, as many have in the nu-media.

about 2 months ago

Actually shocking: the government has no right to dictate how private pensions are invested, especially when it has shown itself incapable of steering the economy. If they want to do that, they shouldn’t have encouraged private pensions.

Katie Lam

@Katie_Lam_MP

about 2 months ago

This week, Labour MPs voted to give ministers the power to decide how your pension savings are invested. So ministers get pensions with guaranteed payouts, while they direct your savings towards their pet ideological causes, even if that means you lose money. Disgraceful.
