LLMs Are Not Reading Your Code
We keep calling LLMs "AI coding assistants." But writing code and understanding code are not the same thing. Researchers from Virginia Tech and Carnegie Mellon University just ran 750,000 debugging experiments across 10 models to determine how well LLMs actually understand code.
The results show that you should not blindly trust your AI coding assistant when debugging.
Here is what they found:
1. A renamed variable breaks the debugger
Researchers created a bug, confirmed that the LLM found it, then made changes that don't touch the bug at all, such as renaming a variable or adding a comment. In 78% of cases, the model could no longer find the same bug. The bug was still there. The variable names and comments changed, and that was enough.
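To make "changes that don't touch the bug" concrete, here is a minimal Python sketch of one such change: renaming a local variable. It is an illustration only, not the researchers' tooling. The `average` function, its off-by-one bug, and the helper names are all hypothetical.

```python
import ast

# Hypothetical buggy function: divides by len(values) - 1 instead of len(values).
BUGGY_SOURCE = '''
def average(values):
    total = 0
    for v in values:
        total += v
    return total / (len(values) - 1)
'''


class RenameVariable(ast.NodeTransformer):
    """Rename every use of one variable name; nothing else is touched."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node


def rename_variable(source, old, new):
    # ast.unparse needs Python 3.9+ and drops comments from the output.
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


def load_function(source, name):
    namespace = {}
    exec(source, namespace)
    return namespace[name]


mutated_source = rename_variable(BUGGY_SOURCE, "total", "acc")
original = load_function(BUGGY_SOURCE, "average")
mutated = load_function(mutated_source, "average")

# Both versions return the same wrong answer: the bug is untouched.
print(original([2, 4, 6]), mutated([2, 4, 6]))
```

The correct mean of `[2, 4, 6]` is 4.0, and both versions return 6.0. A debugger that understood the code should flag the same line in both.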
2. Dead code is a trap
Adding code that never runs reduced bug-detection accuracy to 20.38%. Models treated dead code as live and flagged it as the source of the bug, even though the bug was on another line. So, LLMs cannot reliably distinguish "this runs" from "this never runs."
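As a rough illustration of what "code that never runs" looks like, the Python sketch below pads a buggy function with an unreachable branch and checks that behavior is identical. The `clamp` function and its bug are hypothetical examples, not taken from the study.

```python
# Hypothetical buggy function: the upper bound check returns lo instead of hi.
BUGGY = '''
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return lo  # bug: should return hi
    return x
'''

# Same function with an unreachable branch added in front of the real logic.
WITH_DEAD_CODE = '''
def clamp(x, lo, hi):
    if False:
        hi = lo  # unreachable: this branch never executes
    if x < lo:
        return lo
    if x > hi:
        return lo  # bug: should return hi
    return x
'''


def load(source):
    namespace = {}
    exec(source, namespace)
    return namespace["clamp"]


original, padded = load(BUGGY), load(WITH_DEAD_CODE)

# Compare both versions across values below, inside, and above the range.
inputs = [(x, 0, 10) for x in range(-5, 16)]
same_behavior = all(original(*args) == padded(*args) for args in inputs)
print(same_behavior)
```

The `hi = lo` line looks suspicious, which is what makes it a tempting wrong answer, but it never executes. The real bug is still the second `return lo`.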
3. Models read top-to-bottom, not logically
56% of correctly found bugs were in the first quarter of the file. Only 6% were in the last quarter. The further down the code, the less attention the model pays to it. If the bug lives in the bottom half of your file, the model is already less likely to find it.
4. Function reordering alone cut accuracy by 83%
Changing the order of functions in a Java file caused an 83% drop in debugging accuracy, even though the code itself was unchanged. Where the code physically sits in the file matters more to the model than what the code does. That points to pattern recognition, not real code understanding.
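The study reordered functions in Java files. The Python sketch below shows the same idea on a hypothetical three-function module. It assumes every top-level statement is a function definition, so reversing the order cannot change behavior.

```python
import ast

# Hypothetical module: perimeter has a bug, 2 * w + h instead of 2 * (w + h).
SOURCE = '''
def area(w, h):
    return w * h

def perimeter(w, h):
    return 2 * w + h

def describe(w, h):
    return (area(w, h), perimeter(w, h))
'''


def reorder_functions(source):
    """Return the module with its top-level definitions in reverse order."""
    tree = ast.parse(source)
    tree.body.reverse()
    return ast.unparse(tree)  # Python 3.9+


def run_describe(source, w, h):
    namespace = {}
    exec(source, namespace)
    return namespace["describe"](w, h)


reordered = reorder_functions(SOURCE)

# Names are resolved at call time, so both layouts give the same result.
print(run_describe(SOURCE, 3, 4), run_describe(reordered, 3, 4))
```

Both layouts compute the same wrong perimeter. Only the position of the buggy function in the file has changed.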
5. Newer models hardly move the needle
Claude improved ~1% between 3.7 and 4.5 Sonnet on this task. Gemini improved by ~1.8%. Every model release comes with a new benchmark leaderboard and new headlines. But the ability to reason about code under realistic conditions is improving slowly.
6. These were best-case conditions
The study used single-file programs of ~250 lines, each with a clear description of what the code should do. The authors say this was intentional: they wanted best-case conditions. Real production code is multi-file, cross-module, and poorly documented, so models will likely perform worse on it.
Here are three things worth changing based on the research:
🔹 Pass execution context, not just code. When asking an LLM to debug, include test output, stack traces, and failure messages alongside the source. Without runtime details, the model is guessing from the code alone.
🔹 Don't trust it on deep-file bugs. If the suspect code is in the bottom third of a long file, the model will have trouble finding it. Consider splitting the context or feeding the relevant function directly.
🔹 Clean up dead code before using AI debugging tools. Commented-out blocks and unreachable branches will mislead the model. It cannot reliably filter them out.
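The first two tips can be combined in one step. The Python sketch below is one possible approach, not a prescribed workflow: it pulls a single function out of a larger file and pairs it with the captured traceback. The sample module, the helper names, and the prompt wording are hypothetical, and no particular LLM API is assumed. The sketch only builds the text you would send.

```python
import ast
import traceback

# Hypothetical module: parse_price fails on prices that use a decimal comma.
SOURCE = '''
def parse_price(text):
    return float(text.strip("$"))

def total(prices):
    return sum(parse_price(p) for p in prices)
'''


def extract_function(source, name):
    """Return the source text of one top-level function, or None if absent."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return ast.get_source_segment(source, node)
    return None


def capture_failure(source, func_name, *args):
    """Run the function and return the formatted traceback if it raises."""
    namespace = {}
    exec(compile(source, "<snippet>", "exec"), namespace)
    try:
        namespace[func_name](*args)
    except Exception:
        return traceback.format_exc()
    return None


def build_debug_prompt(function_source, failure):
    """Combine the narrowed-down source with the runtime evidence."""
    return (
        "Find the bug in this function.\n\n"
        "Source:\n" + function_source + "\n\n"
        "Observed failure:\n" + failure
    )


failure = capture_failure(SOURCE, "total", ["$1.50", "2,00"])
# The suspect function is picked by hand here, based on the traceback.
prompt = build_debug_prompt(extract_function(SOURCE, "parse_price"), failure)
print(prompt)
```

The prompt contains only the suspect function and the actual error. That keeps the relevant code at the top of the context and gives the model runtime evidence to work from.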
We rate AI coding tools on HumanEval. That tests whether a model can write a function from a description, but it says nothing about finding a bug in code the model didn't write.
Those are different problems. We're using the wrong benchmark.
@Santander_Ar what is the advantage of being a Black customer if, to fix an error you caused, I have to keep calling over and over with no resolution, and the only solution offered comes at a cost to me?
#Config2023 Launch 1: Dev Mode
A new space in Figma for developers with features that help translate design into code, faster.
Read more: https://t.co/p1PZzRQ7RX
Here are all the ways you can use Dev Mode
#Config2023 Launch 2: Variables
You can now use variables to make adaptable designs: we're talking different brand themes, device formats, and more. And yup, variables can be exported as tokens in case that's helpful.
See variables in action
#Config2023 launches bridge the gap between design and development, all in Figma.
- Dev Mode, a new space for developers
- Variables
- Advanced prototyping
- Auto layout updates
- Font picker
- File browser redesign
Plus, we previewed the future of Figma with AI and announced the acquisition of @diagram. https://t.co/RB3qHFSSPz
@KirstenMinshall It's quite astonishing how much devs of all experience levels really don't grok narrowing/slicing end-to-end to deliver something sooner (and with less risk/faster feedback). To be fair, it's a skill that requires lots of *intentional* practice. Believing it's possible first…
"Heroes do not have the need to be known as heroes, they just do what heroes do because it is right and it must be done."
Sir Nicholas Winton rescued 669 children from the Holocaust.
Some of the survivors surprised him 50 years later.