Happy to share that our work "Cottontail: LLM-Driven Concolic Execution for Structured Test Input Generation" will appear at S&P'26!
Paper: https://t.co/vIxZD5BGE2
Code: https://t.co/NxuD4wwNF4
Special thanks to @nim_gnoes_eel, @JNUYUXIAN, @spinpx, @LingxiaoJiang, and @mboehme_ ♥️
What happens if you write buggy code and misconfigure the experimental setup when evaluating a fuzzer’s performance? Wrong and misleading conclusions!
We found several fatal bugs and wrong experimental settings in MLFuzz (https://t.co/KoDGAUYJ95), a revisit of NEUZZ published at the top-tier software engineering conference ASE 2023 (@AndreasZeller, @ASE_conf). The following bugs lead to wrong and misleading conclusions in MLFuzz:
• An initialization bug ⇒ Failed setup of persistent-mode fuzzing.
• A program crash ⇒ Unexpected early termination of NEUZZ.
• An error in training dataset collection ⇒ A poorly-trained neural network model.
• An error in result collection ⇒ Incomplete code coverage report.
We confirmed these bugs with the MLFuzz authors and wrote a rebuttal paper (https://t.co/Pyp3RalEqt) to explain the errors in MLFuzz and summarize the lessons for a fair and scientific fuzzing experiment/revisit.
1. Ensure the correctness of the code implementation. Careful and rigorous debugging is needed. If you patch a prior work, double-check that your settings and patch are correct, and seek help from the original developers if needed. MLFuzz introduced 3 implementation bugs that led to wrong experimental results and conclusions.
2. Diverse benchmark selection. Try to evaluate your fuzzer on multiple benchmarks, such as FuzzBench, Magma, and UniFuzz.
3. Uniform code coverage metric. Convert different code coverage metrics, such as AFL XOR hash, LLVM coverage sanitizer (pruned), LLVM coverage sanitizer (unpruned), and AFL++ code coverage, into a uniform one by replaying test cases.
4. Complete test case collection. Be sure to collect all the test cases generated by the fuzzer.
5. Uniform fuzzing mode. Ensure all fuzzers run in the same mode, either the default mode or the faster persistent mode. An apple-to-banana comparison like the one in MLFuzz only leads to wrong conclusions.
6. Open-source your fuzzing corpus. Fuzzing is an optimization process, and different seed corpora (starting points) can lead to drastically different results.
https://t.co/X4u8bPKPYZ Wei Cao and I did most of this work and wrote the first draft while we were at Ant Group. However, they removed us from the author list. Sad story. This work was shepherded by Alex Liu. However, he is not on the list either.
@AndreasZeller @ririnicolae @MaxCamillo @FSEconf Andreas, you are a renowned researcher in the fuzzing community, and your fuzzing book is amazing. But this work draws a completely WRONG conclusion due to a careless comparison of a file-retrieval fuzzer against an in-memory fuzzer, where the fuzzing throughput gap is up to 10X.
We presented HOPPER, which generates fuzzing test cases for libraries automatically via interpretative fuzzing. It transforms the problem of library fuzzing into the problem of interpreter fuzzing. The paper can be found at https://t.co/b0ao9C8pL6
@dgryski We do plan to release the software in the future. Whether Angora works with other languages depends on the taint analysis engine. We used DFSan in the paper, and Angora also supports libdft now.
Very excited to announce that my first paper “Understanding Linux Malware” was accepted @IEEESSP 2018! A study on more than 10k #Linux #malware documenting challenges and Linux-specific malicious techniques. With @emd3l @reyammer @balzarot https://t.co/uK56uu2BgI
“We figured out a way to trick your voice assistants into responding to our commands, but since it might be too obvious to you if we do that, we embedded our commands in songs, and every time your voice assistant hears our songs it executes our commands”.
🔥 This is fine 🔥 https://t.co/vP0vHbO9rC