Meta found that forcing an LLM to show its work, step by step, with evidence for every claim, nearly halves its error rate when verifying code patches
the technique is embarrassingly simple: a structured template the model has to fill in before it's allowed to say "yes" or "no"
no fine-tuning. no new architecture. just a checklist that won't let the model skip steps
Meta researchers created a mandatory checklist that forces AI to trace code line by line instead of blindly guessing.
This structured approach boosted the accuracy of checking real-world code updates to an impressive 93%.
Usually, when we ask an AI to check if a software update works, it just looks at the names of the functions and makes a very confident guess.
If we want to be absolutely sure the code works, human developers normally have to run it on slow, expensive test servers.
This paper changes that dynamic entirely by introducing a strict template that forces the AI to write down the exact path the code takes and provide hard evidence for every single claim it makes.
Because the AI is forced to slow down and show its work step by step, it catches deeply hidden bugs and judges whether patches work with 93% accuracy.
The big deal here is that tech companies can now use AI to automatically and reliably verify millions of lines of code without ever paying for the massive computing costs required to actually execute that software.
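To make the "checklist before verdict" idea concrete, here is a minimal Python sketch. It is not the paper's actual template: the names `Claim`, `PatchReview`, `missing_steps`, and `accept_verdict`, the two sections, and the "file:line" evidence format are all assumptions made up for illustration. The only point it shows is that a "yes" or "no" gets rejected unless every claim comes with evidence.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    statement: str
    evidence: str = ""  # hypothetical format, e.g. "parser.py:42"


@dataclass
class PatchReview:
    # Hypothetical sections; the paper's real template may differ.
    premises: list[Claim] = field(default_factory=list)
    trace: list[Claim] = field(default_factory=list)
    verdict: str = ""  # "yes" or "no"


REQUIRED_SECTIONS = ("premises", "trace")


def missing_steps(review: PatchReview) -> list[str]:
    """List every gap that should block the verdict."""
    problems = []
    for section in REQUIRED_SECTIONS:
        claims = getattr(review, section)
        if not claims:
            problems.append(f"{section}: section is empty")
        for i, claim in enumerate(claims, start=1):
            if not claim.evidence.strip():
                problems.append(f"{section}[{i}]: no evidence for '{claim.statement}'")
    if review.verdict not in ("yes", "no"):
        problems.append("verdict: must be 'yes' or 'no'")
    return problems


def accept_verdict(review: PatchReview) -> str:
    """Return the verdict only if the checklist is fully filled in."""
    problems = missing_steps(review)
    if problems:
        raise ValueError("incomplete review: " + "; ".join(problems))
    return review.verdict
```

In this sketch, a confident guess with no cited lines raises an error in `accept_verdict`, while a review whose premises and trace steps each point at a location in the code passes through. In a real setup the rejected review would presumably be handed back to the model to fill in the gaps, but that loop is left out here.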
----
Paper Link – arxiv.org/abs/2603.01896
Paper Title: "Agentic Code Reasoning"
🔔 Announcing our paper on Natural Language Outlines for Code!
Our vision 🔮 - NL Outlines empower human developers with new forms of AI assistance throughout the software development process 🚀
Paper: https://t.co/2jMPKzXdyW
FSE'25 presentation: https://t.co/Yu7WinLhS4
🧵👇
@schandra is giving the 5th keynote (second industry talk) of @ConfForge on "AI for Software Engineering at Google: Progress and Path Ahead" :) Packed room with many standing to hear about Satish's experience at @Google :) If you are at @ICSEconf, pls join us :)
Exciting News! The ICSE 2013 paper "SemFix: Program Repair via Semantic Analysis" that started our journey in program repair has been recognized with the Most Influential Paper Award ten years later in 2023. Congrats to @AbhikRoychoudh1 and all co-authors!
https://t.co/LFiVwzuKpw
Happy New Year everyone! We are looking forward to your contributions to ESEC/FSE 2023!! We will be posting updates and introducing our PC over the next few months here. Reminder that the research track paper submissions are due on February 2nd! https://t.co/cbqNTYYVFZ
#curryon Facebook created a product that analyses their codebases and the fixes people made, then code-reviews pull requests by proposing similar fixes to be added. Seems very nifty
We have built a new system that leverages machine learning to more efficiently detect potential regressions in a proposed code change. This predictive test selection method has doubled the efficiency of Facebook's continuous integration system.
https://t.co/NCiDLIP1sc
Facebook has built a tool called Getafix that automatically finds fixes for code bugs and offers the patch to engineers to approve. Here's how it works.
https://t.co/NywvWKsOPE
Lots of great talks at @nl4se at @FSEconf ---wrapping up with a grand finale, Keynote by Satish Chandra about "Big Code @facebook" ! Starting 3:30pm. Don't miss it!