🚨 Are Large Language Models Memorizing Bug Benchmarks? 🚨
There’s growing concern that LLMs for SE are prone to data leakage, but no one has quantified it... until now. 🕵️‍♂️ We measured leakage in benchmarks like Defects4J and SWEBenchLite.
https://t.co/1yT36DO18Q
Findings👇
Apparently EU users don't get new AI versions till several months after US users, because AI companies have to get regulatory signoff first. Was the cost of overregulation ever so clear? Every user in Europe is several months behind, in a field that changes every several months.
Thrilled to announce our new work TestGenEval, a benchmark that measures unit test generation and test completion capabilities. This work was done in collaboration with the FAIR CodeGen team.
Preprint: https://t.co/wsDx4H28IH
Leaderboard: https://t.co/hE95PJyjPY
Looking for students!
@johny_dry and I proposed Meerkat, a live, distributed, reactive PL, in an Onward! vision paper this year. We'd love to work with a Ph.D. student in the joint @CMUPortugal program who wants to help make this a reality! Due Dec 11.
https://t.co/gKcUHYTPXF
🚀 Curious about GitSEED, accepted at @SIGCSE_Virtual, and how it’s transforming programming education? Check out my full thread on Bluesky for all the details! 🌟 #SIGCSE #CSforALL #GitLab
🔗https://t.co/3xA69VnTYG
(4/4) 📈 Llama 3.1 (70B)—trained on far more data—shows less leakage than older and smaller models like CodeGen and CodeLlama. Its higher NLL and lower 5-gram match show limited signs of leakage.
Read the preprint: https://t.co/1yT36DO18Q
#AI #SoftwareEngineering #DataLeakage
(3/4) 📜 5-gram match reveals memorization: We used 5-gram match to check if models generated nearly identical outputs when given the same input. CodeGen scored 82% on Defects4J!
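For readers unfamiliar with the metric, here is a minimal Python sketch of one plausible way to compute a 5-gram match. The thread does not define the metric precisely, so everything here is an assumption for illustration: whitespace tokenization, the hypothetical function names `ngrams` and `ngram_match`, and the choice to score the fraction of the reference's 5-grams that reappear in the model's output. The preprint's actual procedure may differ.

```python
from collections import Counter


def ngrams(tokens, n=5):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_match(generated, reference, n=5):
    """Fraction of the reference's n-grams that also occur in the generated text.

    Assumption: whitespace tokenization and clipped counts. This is an
    illustrative definition, not necessarily the one used in the preprint.
    """
    gen_counts = Counter(ngrams(generated.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(ref_counts.values())
    if total == 0:
        # Reference is shorter than n tokens, so there is nothing to match.
        return 0.0
    overlap = sum(min(count, gen_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total


if __name__ == "__main__":
    reference = "if ( x == null ) { return 0 ; }"
    generated = "if ( x == null ) { return 1 ; }"
    print(f"5-gram match: {ngram_match(generated, reference):.2f}")
```

Under this reading, a score near 1.0 means the model reproduced the reference almost verbatim, which is the signal the thread treats as evidence of memorization.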
I'm excited to share that I'll be joining @UCIrvine as an Assistant Professor starting in Fall 2025! I'm looking forward to working with amazing colleagues and students to empower programmers using AI and HCI. Come join us—I’m recruiting students!
Do you work with Ansible? We want to hear from you! 🎯 This study is led by a joint team from @CarnegieMellon and @istecnico
Plus, if you agree to an interview, you'll be entered into a $100 raffle! 💸
Click here to participate: https://t.co/hT0ztiAAeI
I am on the #SoftwareEngineering #Teaching Job Market!
After co-creating & co-instructing two software engineering courses at CMU, I am looking for teaching-focused faculty positions in SE. If you know anyone who is hiring, please let me know! Retweets welcome.
Excited to announce that I will present our work "Understanding Misconfigurations in ROS: An Empirical Study and Current Approaches" at ISSTA 2024 (@issta_conf)! 🎉🤖
Find out more about our work: https://t.co/DjKnh9LjaH 🧵 [1/3]
@OpenRoboticsOrg @rosorg
Excited to present our work "A Lightweight Polyglot Code Transformation Language" next week at @PLDI2024!
https://t.co/vHzG25V1Ma
Are you dealing with large-scale refactoring/migrations? Is your codebase multilingual? We’ve got something for you! 🧵👇 [1/n]
@UberEng If you're going to be at @PLDI, come attend my talk on June 28th at 10:40!
This amazing work was led by @ask1604, Lázaro Clapp, @rajbarik, and Murali Krishna Ramanathan at @UberEng's Programming Systems Group! [5/5]
At @UberEng, we’ve fully automated stale feature flag clean-up and performed API migrations with this toolkit! Our tools, built on top of PolyglotPiranha, have landed over 2000 PRs and cleaned up over 200k lines of code! [4/n]
Are you an undergraduate interested in doing research at @SCSatCMU this summer? Then sign up to join our information sessions at https://t.co/z3c7AOcy9P (Please RT!)