Devjeet Roy

about 1 month ago

Measuring learning, not just capability, is fundamental to evaluating continual learning methods. Continual Learning Bench is built around exactly that! It was a pleasure to contribute to this. Check out the full write-up:

devjeetrr retweeted

PhD in Software Engineering

about 2 months ago

Very excited to announce the v1.0 of SlopCodeBench release:
- Doubling the size of the dataset
- @harborframework support
- scb-check: a CLI that flags slop anti-patterns
- Way more model results

https://t.co/RQkB8wdzAu
https://t.co/36qQR3azeE

🧵 https://t.co/HvVYoRrpEr

(GOrlanski's tweet photo)

devjeetrr retweeted

Snorkel AI

@SnorkelAI

about 2 months ago

Our MLSys 2026 paper is live on arXiv: “Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes.”

@realjustinbauer @Walshe_tech @pham_derek @harit_v @ArminPCM @fredsala and @paroma_varma present a comprehensive empirical study of open-source SLMs after RLVR in low-data regimes, revealing that dataset composition matters more than dataset size for scaling performance across number counting, graph, and spatial reasoning tasks.

Read the paper: https://t.co/dZL8uygPfa

devjeetrr retweeted

Justin Bauer

@realjustinbauer

about 2 months ago

Our #MLSys2026 paper is live on arXiv 📄 We ran a systematic study of RLVR in low-data regimes across 3 procedurally generated benchmarks (counting, graph, spatial reasoning). Key finding: dataset composition matters more than dataset size. https://t.co/Z7ZuG1fLMD

devjeetrr retweeted

3 months ago

We found that agents generate progressively worse code with each iteration. Real developers do not.

SlopCodeBench is the only eval that faithfully measures quality degradation on iterative, long-horizon coding tasks.

https://t.co/JXGHC4w0bv
https://t.co/RQkB8wdzAu
🧵 https://t.co/dOvNkrFv2c

(GOrlanski's tweet photo)

devjeetrr retweeted

Alex Gu @minimario1729

6 months ago

AI coding assistants generate a lot of slop. Now there's a way to measure it: SlopCodeBench!

devjeetrr retweeted

6 months ago

If you want to add your agent to the harness/run the evals/or anything related, please reach out. I would be more than glad to help 😀

devjeetrr retweeted

6 months ago

Often, a bug I have had to fix has been tied directly to overly verbose or defensive code from an AI agent that has eroded my project as the specs changed. Why isn't there an eval for this?

There is now. Very excited to announce SlopCodeBench https://t.co/RQkB8wdzAu https://t.co/FnyCCdt3co

(GOrlanski's tweet photo)

devjeetrr retweeted

about 1 year ago

What's the future of AI for Software Engineering? 🤖 Join Alex Gu (@minimario1729) (MIT; StarCoder, CRUXEval, LeanDojo contributor) tomorrow at the #DL4C workshop! He'll cover current challenges in AI for SE and promising directions for what lies ahead. #ICLR2025 #ICLR

devjeetrr retweeted

about 1 year ago

We're excited to welcome our fourth speaker @taoyds, Assistant Prof at @HKUniversity and director of @XLangNLP. His groundbreaking work in grounding language into code and actions spans digital and physical environments, aiming to democratize data science and enhance human-computer interaction. Join us at the 3rd Deep Learning for Code workshop at #ICLR2025, this Monday in Singapore! 🇸🇬 #DL4Code #NLP #AI

devjeetrr retweeted

about 1 year ago

Go beyond code completion! Baptiste Rozière (@b_roziere) presents his work at @MistralAI automating code tasks (IDE tools, agents) for better developer focus. Catch him this Sunday at #DL4C! #ICLR2025 #iclr

devjeetrr retweeted

about 1 year ago

Just 6 days until #DL4C! 🗓️ Daniel Fried (CMU / Meta AI) @dan_fried @AIatMeta will be sharing insights on how inducing functions from code makes LLM agents smarter and more efficient. Don't miss it! See you Sunday! #ICLR2025 #iclr

devjeetrr retweeted

about 1 year ago

🚀 ICLR week is upon us! Join us at the #DL4C Workshop to hear Xingyao Wang (@xingyaow_) discuss LLMs evolving into SE agents, covering the CodeAct framework (code exec as action), the OpenHands platform (dev-like generalist agents), & SWE-Gym (real-world task training). @iclr_conf

devjeetrr retweeted

over 1 year ago

Lots of great papers in – thank you all! Review will start from tmrw. Last chance to sign up to be a reviewer for #DL4C https://t.co/SRLqaeYhBw @iclr_conf #iclr #iclr2025 #iclr25

devjeetrr retweeted

over 1 year ago

🚀 Excited to share the 3rd Deep Learning for Code workshop is back at @iclr_conf'25! This year we’ll focus on emergent challenges in the field, e.g., agents, post-training, developer productivity, open science, and benchmarking for code.

Submit by Feb 3: https://t.co/MXqVqNfRgX 🧵⬇️

Devjeet Roy @devjeetrr

over 1 year ago

@sameercassim007 @vfsglobalcare I had the same issue. This is a DNS problem, as you can see here https://t.co/rDmhrOEl2S. For me, what worked is to use my phone. But you can also change your DNS server in your network settings.

devjeetrr retweeted

about 3 years ago

What does it mean for the LLMs of Code to be open and responsible? Our fourth invited talk by Harm de Vries (@harm_devries) from @ServiceNowRSRCH and Leandro von Werra (@lvwerra) from @huggingface is about BigCode (https://t.co/ADKG0CUeyv)! #ICML2023

devjeetrr retweeted

about 3 years ago

Thank you to all authors who submitted their works to the DL4C workshop @ ICLR 2023! Instructions and Upload Form for Authors of Accepted Papers: https://t.co/jBuXEyNHp9 Deadline: May 1st

devjeetrr retweeted