1/ In the AI era, specifications are becoming the primary interaction mechanism between humans and coding agents. Inferring, validating, and maintaining them is foundational to trustworthy software engineering.
Measuring learning, not just capability, is fundamental to evaluating continual learning methods. Continual Learning Bench is built around exactly that!
It was a pleasure to contribute to this. Check out the full write-up:
Very excited to announce the v1.0 release of SlopCodeBench:
- Doubling the size of the dataset
- @harborframework support
- scb-check: a CLI that flags slop anti-patterns
- Way more model results
https://t.co/RQkB8wdzAu
https://t.co/36qQR3azeE
🧵
Our MLSys 2026 paper is live on arXiv: “Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes.”
@realjustinbauer @Walshe_tech @pham_derek @harit_v @ArminPCM @fredsala and @paroma_varma present a comprehensive empirical study of open-source SLMs after RLVR in low-data regimes, revealing that dataset composition matters more than dataset size for scaling performance across number counting, graph, and spatial reasoning tasks.
Read the paper: https://t.co/dZL8uygPfa
Our #MLSys2026 paper is live on arXiv 📄
We ran a systematic study of RLVR in low-data regimes across 3 procedurally generated benchmarks (counting, graph, spatial reasoning).
Key finding: dataset composition matters more than dataset size.
https://t.co/Z7ZuG1fLMD
We found that agents generate progressively worse code with each iteration. Real developers do not.
SlopCodeBench is the only eval that faithfully measures quality degradation on iterative, long-horizon coding tasks.
https://t.co/JXGHC4w0bv
https://t.co/RQkB8wdzAu
🧵
Often, a bug I have had to fix has been tied directly to overly verbose or defensive code from an AI agent that has eroded my project as the specs changed. Why isn't there an eval for this?
There is now. Very excited to announce SlopCodeBench https://t.co/RQkB8wdzAu
What's the future of AI for Software Engineering?
🤖 Join Alex Gu (@minimario1729) (MIT; StarCoder, CRUXEval, LeanDojo contributor) tomorrow at the #DL4C workshop! He'll cover current challenges in AI for SE and promising directions for what lies ahead. #ICLR2025 #ICLR
We're excited to welcome our fourth speaker @taoyds, Assistant Prof at @HKUniversity and director of @XLangNLP. His groundbreaking work in grounding language into code and actions spans digital and physical environments, aiming to democratize data science and enhance human-computer interaction.
Join us at the 3rd Deep Learning for Code workshop at #ICLR2025, this Monday in Singapore! 🇸🇬
#DL4Code #NLP #AI
Go beyond code completion! Baptiste Rozière (@b_roziere) presents his work at @MistralAI automating code tasks (IDE tools, agents) for better developer focus. Catch him this Sunday at #DL4C! #ICLR2025 #iclr
Just 6 days until #DL4C! 🗓️ Daniel Fried (CMU / Meta AI) @dan_fried @AIatMeta will be sharing insights on how inducing functions from code makes LLM agents smarter and more efficient. Don't miss it! See you Sunday! #ICLR2025 #iclr
🚀 ICLR week is upon us! Join us at the #DL4C Workshop to hear Xingyao Wang (@xingyaow_) discuss LLMs evolving into SE agents, covering the CodeAct framework (code exec as action), the OpenHands platform (dev-like generalist agents), & SWE-Gym (real-world task training). @iclr_conf
Lots of great papers in – thank you all!
Reviews start tomorrow. Last chance to sign up as a reviewer for #DL4C: https://t.co/SRLqaeYhBw
@iclr_conf #iclr #iclr2025 #iclr25
🚀 Excited to share that the 3rd Deep Learning for Code workshop is back at @iclr_conf '25! This year we’ll focus on emergent challenges in the field, e.g., agents, post-training, developer productivity, open science, and benchmarking for code. Submit by Feb 3: https://t.co/MXqVqNfRgX 🧵⬇️
@sameercassim007 @vfsglobalcare I had the same issue. This is a DNS problem, as you can see here: https://t.co/rDmhrOEl2S
For me, what worked was using my phone. But you can also change your DNS server in your network settings.
What does it mean for the LLMs of Code to be open and responsible? Our fourth invited talk by Harm de Vries (@harm_devries) from @ServiceNowRSRCH and Leandro von Werra (@lvwerra) from @huggingface is about BigCode (https://t.co/ADKG0CUeyv)! #ICML2023
Thank you to all authors who submitted their works to the DL4C workshop @ ICLR 2023!
Instructions and Upload Form for Authors of Accepted Papers: https://t.co/jBuXEyNHp9
Deadline: May 1st
📢We are excited to announce our incredible lineup of speakers for the Second DL4C Workshop @ ICLR 2023!
Link: https://t.co/fNIJehxylc
The speakers are: