Automating Documentation Testing: How the Drasi Team Leveraged GitHub Copilot to Catch Silent Bugs

The Drasi team automated documentation testing with AI agents, turning manual checks into continuous monitoring that catches silent breaks.

Oa5678 Stack · 2026-05-02 10:08:30 · Open Source

Overview: A Wake-Up Call for Open-Source Documentation

For early-stage open-source projects, the “Getting Started” guide is often a developer’s first impression. If a command fails, an output doesn’t match, or a step is unclear, most users don’t file a bug report; they simply move on. Drasi, a CNCF sandbox project designed to detect data changes and trigger immediate reactions, experienced this challenge firsthand. Maintained by a small team of four engineers within Microsoft Azure’s Office of the CTO, the project moves quickly: comprehensive tutorials exist, but code ships faster than manual testing can validate them.

This gap remained hidden until late 2025, when GitHub updated its Dev Container infrastructure and raised the minimum Docker version. The update broke the Docker daemon connection, rendering every tutorial nonfunctional. Because the team relied on manual testing, the extent of the damage wasn’t immediately apparent; anyone attempting to use Drasi during that window hit a wall.

This incident forced a critical realization: with advanced AI coding assistants, documentation testing can be transformed into a monitoring problem—actively catching breaks as they occur.

Why Documentation Breaks: Two Root Causes

Documentation tends to fail for two primary reasons: the curse of knowledge and silent drift.

The Curse of Knowledge

Experienced developers write documentation with implicit context. When they write “wait for the query to bootstrap,” they know to run drasi list query and watch for the Running status—or even better, use the drasi wait command. A new user, however, lacks this context. Similarly, an AI agent reads instructions literally and doesn’t infer missing steps. New users get stuck on the “how,” while documentation only describes the “what.”
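
To make the gap concrete, here is roughly what that one-line instruction expands to once the implicit context is spelled out. The query name is hypothetical, and the exact drasi wait arguments may differ across CLI versions:

    # Tutorial instruction: "wait for the query to bootstrap"
    # What an experienced user actually does:
    drasi list query          # poll until the query's status reads Running

    # Or block until the query is ready ("my-query" is a hypothetical name;
    # verify the exact arguments against your installed CLI):
    drasi wait query my-query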

Silent Drift

Unlike code, documentation doesn’t fail loudly. Renaming a configuration file in the codebase causes an immediate build failure. But when documentation still references the old filename, nothing alerts the team. Drift accumulates silently until a user reports confusion.

This problem compounds for tutorials like Drasi’s, which spin up sandbox environments with Docker, k3d, and sample databases. When any upstream dependency changes—a deprecated flag, a version bump, or a new default—tutorials can break without anyone noticing.
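
Even before an AI agent enters the picture, the filename half of this problem can be caught mechanically. Below is a minimal sketch of such a check, assuming tutorials reference YAML files by repo-relative paths under a docs/ directory (both assumptions, not Drasi’s actual layout):

    #!/usr/bin/env bash
    # Report .yaml files mentioned in the docs that no longer exist in the repo.
    grep -rhoE '[A-Za-z0-9_./-]+\.yaml' docs/ | sort -u |
      while read -r ref; do
        [ -e "$ref" ] || echo "stale reference: $ref"
      done

A check like this catches renames, but not behavioral drift such as a deprecated flag or a changed default; that is what motivates the agent-based approach described next.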

The Solution: AI Agents as Synthetic Users

To address this, the team reframed tutorial testing as a simulation problem. They built an AI agent that acts as a “synthetic new user.” The agent has three critical characteristics (sketched as a prompt after this list):

  • Naïve: It has no prior knowledge of Drasi—only what is explicitly stated in the tutorial.
  • Literal: It executes every command exactly as written. If a step is missing, it fails.
  • Unforgiving: It verifies every expected output. If the documentation says “You should see ‘Success’” and the CLI returns silently, the agent flags it and fails fast.
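
Those three traits map directly onto the agent’s instructions. A minimal sketch of such a prompt, illustrative rather than the team’s actual wording:

    You are a brand-new user with no prior knowledge of Drasi. Work through
    the tutorial below one step at a time.
    1. Execute every command exactly as written, in order. Do not improvise,
       install extra tools, or infer steps that are not stated.
    2. After each command, compare the actual output with any output the
       tutorial shows. A mismatch, an error, or silence where output is
       promised counts as a failure.
    3. On the first failure, stop and report the step, the command, the
       expected output, and the actual output.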

The Stack: GitHub Copilot CLI and Dev Containers

The solution was built with the GitHub Copilot CLI and Dev Containers. The Copilot CLI lets the agent drive the terminal just as a human would: it reads each step of the tutorial (potentially parsed into a structured format), executes the commands, and listens for their outputs. Dev Containers provide consistent, disposable environments, ensuring every run starts fresh and isolated.
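
On the environment side, a Dev Container definition along the following lines gives every run a fresh sandbox with Docker and k3d available. This is an illustrative devcontainer.json (the format permits comments), not Drasi’s actual configuration:

    {
      "name": "drasi-tutorial-test",
      "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
      "features": {
        // Docker-in-Docker so the tutorial's containers run inside the sandbox
        "ghcr.io/devcontainers/features/docker-in-docker:2": {}
      },
      // Install k3d for the Kubernetes-based tutorial steps
      "postCreateCommand": "curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash"
    }

Because the container is disposable, a failed run leaves nothing behind to contaminate the next one.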

By combining these tools, the team created a test harness that runs automatically after every code change. When a pull request updates the codebase, the agent runs through the tutorials in Dev Containers. If anything breaks—due to a dependency change, a confusing instruction, or a missing verification—the agent reports the failure immediately. This turns manual, sporadic checks into continuous monitoring.
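
The CI wiring for that harness can be as small as the following GitHub Actions workflow, which rebuilds the Dev Container on each pull request and hands the tutorial to the agent. The prompt file path is hypothetical, and the Copilot CLI’s programmatic-mode flags should be verified against the installed version:

    name: tutorial-agent
    on:
      pull_request:
    jobs:
      run-tutorials:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # Build the Dev Container and run the synthetic user inside it.
          # Assumes the Copilot CLI is installed in the container image;
          # 'copilot -p' runs it non-interactively with the given prompt.
          - uses: devcontainers/ci@v0.3
            with:
              runCmd: copilot -p "$(cat prompts/doc-tester.md)" --allow-all-tools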

Impact and Future Directions

This approach dramatically reduced the time between a documentation break and its detection. Instead of waiting for user complaints, the team now catches issues proactively. The synthetic user agent helps uncover both the curse of knowledge (by detecting gaps in instructions) and silent drift (by flagging unexpected command outcomes).

Looking ahead, the Drasi team plans to extend this to more complex scenarios, like verifying multi-step workflows and checking outputs against expected values. This technique can be adopted by any open-source project seeking to keep documentation reliable without draining maintainer resources.

With the GitHub Copilot CLI acting as an automated tester, documentation quality no longer depends on manual review cycles; it becomes an integral part of continuous integration, ensuring that the “Getting Started” experience always works for real users.
