My subconscious is officially a professor! I slept badly last night and realized that the classic "unprepared/late for class" dream was no longer about taking an exam but rather about showing up to class not knowing what to teach!
Achievement unlocked: Finally got a paper rejected for *only* validating our proposed algorithm by differential fuzzing against a reference oracle, but not formally proving correctness / verifying equivalence.
On the flip side, it forced me to learn to use Verus, and Claude helped prove most of the dang thing in a few hours (after I hand-wrote the specs). Exciting times.
Ooh, front page.
I was reading the Raft paper over the weekend while also teaching my toddler the card game Spot It! (a.k.a. Dobble). Turns out the math behind the simple game is really cool, and led me to write this post:
https://t.co/IVgiGNCwaj
My first two PhD students defended their theses this week. Congratulations to Dr. @vasumvikram and Dr. @aoli_al for completing fantastic dissertations on various aspects of automated testing including fuzzing, property-based testing, and concurrency. Very proud of you both!
PL educators: when should we introduce Rust to students? Asking for a colleague with a teenage kid who knows a bit of Python (and maybe Racket?). Is it important to first learn something like C/C++/Java to understand static typing, memory layouts, etc., or should they just dive into Rust?
Incredibly proud of my (first solo-advised) PhD student @vasumvikram, who joins @AnthropicAI this week in the evals team.
Vasu's PhD research uncovered various nuances of generator-based fuzzing, including the finding that coverage guidance is largely unnecessary in the AI age.
"What is test coverage in distributed systems?"
I'm excited to be discussing this at BugBash this year: a really neat conference about software reliability organized by @AntithesisHQ in DC.
Check out the speaker list and get your tickets at https://t.co/Ljj6lACPvI!
On this week's episode of the BugBash podcast, @rocallahan tells the story of how the rr debugger came into being, enabling time-travel debugging on Firefox.
No giveaway this week, rr was enough of a gift to devs everywhere. If you've ever used GDB, rr, PyTrace, or one of their many cousins, you'll want to give this one a listen!
My students wrote a blog post (https://t.co/iqsAE9n3Xp) on this problem showing where AI agents struggle with concurrency, using examples such as WorkStealQueue and Kafka's DefaultStateUpdater.
Feedback is welcome!
Ever used AI to fix tricky race conditions and flaky tests? Not pretty, is it?
Check out "Spaghetti Bench 🍝: A SWE-Agent Benchmark for Concurrency Bug Tasks"
Turns out it's a HARD problem on its own, but can be made easier with access to deterministic replay tools like Fray!