Billions of Tokens Later: Scaling LLM Fuzzing in Practice

2025-07-18

Summary: An overview, drawn from the post cited under Source, of large-scale experiments using LLMs for black-box and white-box fuzzing of compilers and related projects.

The post reports on scaling LLM-based fuzzing, following earlier experiments with documentation-driven compiler testing. The experiments covered black-box fuzzing, white-box agentic bug finding, deduplication, manual review, and model comparisons. Targets included Tact, FunC, Tolk, TON-related libraries, and others.

The post reports 112 real issues found across the experiments. It also explains that deduplication is a major part of the workflow, because large LLM runs produce many repeated or low-value findings. One practical conclusion is that many narrower fuzzing runs, each on a different topic, can work better than one huge run on a single topic.
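
To make the deduplication step concrete, here is a minimal Python sketch of one possible approach: fingerprint each finding after normalizing its text, then keep one finding per fingerprint. This is an illustration, not the method used in the post. The field names (target, summary), the normalization rules, and the sample findings are all hypothetical. A real pipeline would likely also need fuzzy or semantic matching and human review.

```python
import hashlib
import re


def fingerprint(finding: dict) -> str:
    """Hash a normalized (target, summary) pair so trivially different reports collide.

    The 'target' and 'summary' keys are assumed names for this sketch.
    """
    text = f"{finding['target']} {finding['summary']}".lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)  # drop punctuation, collapse whitespace
    text = re.sub(r"\d+", "0", text)         # mask line numbers, sizes, and ids
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()[:16]


def deduplicate(findings: list) -> list:
    """Keep the first finding per fingerprint and count the repeats it absorbed."""
    unique = {}
    for finding in findings:
        key = fingerprint(finding)
        if key in unique:
            unique[key]["duplicates"] += 1
        else:
            unique[key] = {**finding, "duplicates": 0}
    return list(unique.values())


# Made-up findings for illustration only; they are not results from the post.
sample = [
    {"target": "Tact", "summary": "Compiler crashes on empty struct at line 12"},
    {"target": "tact", "summary": "Compiler crashes on empty struct at line 40."},
    {"target": "FunC", "summary": "Wrong constant folding for negative shifts"},
]
result = deduplicate(sample)
for item in result:
    print(item["target"], "|", item["summary"], "| duplicates:", item["duplicates"])
```

Exact-match fingerprinting like this only catches near-identical reports. It is a cheap first pass that shrinks the pile before slower comparison or manual review.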

Free Basics version: AI can help find software bugs, but results must be filtered, deduplicated, and reviewed by humans.

Source: Daniil Sedov, Gusarich's thoughts, gusarich.com/blog/billions-of-tokens-later