How to Build an Autonomous Fleet of AI Coding Agents: A Step-by-Step Guide

Published 2026-05-04 11:07:15 · AI & Machine Learning

Introduction

Imagine a team of seven virtual AI agents that test your product, triage issues, post release notes, and even fix bugs—all running autonomously in your CI pipeline. That’s exactly what the Coding Agent Sandboxes team at Docker built with their Fleet of agent roles, powered by Claude Code skills. This guide walks you through creating your own fleet of agent personas that work on local machines and in CI, shipping faster with fewer manual interventions. By the end, you’ll have a blueprint for constructing a virtual agent team that uses judgment, not just scripts, to handle real-world tasks.

Source: www.docker.com

What You Need

  • Claude Code CLI – The core command-line tool for interacting with Claude and loading skill files.
  • A sandbox or isolation environment – Docker’s Coding Agent Sandboxes (sbx) or any microVM-based isolation that provides secure, autonomous runtime for agents.
  • GitHub repository – To host skill files, workflows, and your product code (tested across macOS, Linux, and Windows).
  • CI/CD platform – GitHub Actions, GitLab CI, or similar that can run matrix builds for multiple OSes.
  • Markdown editor – For creating skill files that define agent personas, responsibilities, and allowed tools.
  • Copy of your product or CLI – The application your agents will test, triage, and release.
  • Access to Claude API or local model – To allow agents to think and act autonomously.

Step-by-Step Instructions

Step 1: Define Your Agent Roles (Personas)

Start by listing the tasks you want to automate. For the Docker Fleet, these included exploratory testing, CLI integration testing, issue triage, release note generation, load testing, documentation review, and bug fixing. For each task, write a skill file (a Markdown document) that describes the agent’s persona—its expertise, decision-making style, and constraints. A good skill file does not contain step-by-step scripts; instead, it says, “You are the build engineer. You know how to compile and package releases across platforms. You decide when to run integration tests vs. smoke tests.” This distinction is crucial because agents need judgment, not just instructions. When a test fails unexpectedly, a script stops; a role investigates.
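As a sketch, here is what a judgment-focused skill file for a hypothetical CLI-tester persona might look like. The path matches the one used later in this guide, but the wording and section layout are illustrative, not the Docker team's actual files:

```shell
# Create a judgment-focused skill file for a hypothetical CLI-tester persona.
# The wording is illustrative; adapt the expertise and constraints to your product.
mkdir -p skills
cat > skills/cli-tester.md <<'EOF'
# Role: CLI Tester

You are an experienced CLI tester for our product.

## Expertise
- You know the CLI's subcommands, flags, and expected exit codes.
- You can distinguish a crash from an intentional error message.

## Judgment, not scripts
- Decide for yourself which command sequences are worth probing.
- When a command fails unexpectedly, investigate: rerun with verbose
  flags, inspect logs, and minimize the reproduction before reporting.

## Constraints
- Never push code or close issues; only report findings.
EOF
echo "wrote skills/cli-tester.md"
```

Note what is absent: there is no list of commands to run. The persona's constraints bound what it may do; how it gets there is left to the agent.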

Step 2: Write Skill Files Locally First

Never start by writing a GitHub workflow. Instead, open a terminal and invoke your skill directly with Claude Code. For example, to create a CLI tester skill (/cli-tester), you might begin with: claude code --skill ./skills/cli-tester.md. Watch the agent think, execute commands, and report findings. Tweak the skill file until it behaves correctly in your local environment. This local-first approach accelerates iteration cycles from minutes to seconds—you see confusion immediately and fix it. Remember: the same skill file will run identically in CI later.
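A minimal local iteration wrapper, assuming the `claude code --skill` invocation shown above (the guard lets the sketch degrade gracefully when the CLI is not installed; adjust flags to match your CLI version):

```shell
#!/bin/sh
# Iterate on a skill locally: run it, capture the transcript, inspect, edit, rerun.
# Assumes the invocation form shown in the article; adjust to your CLI version.
SKILL=./skills/cli-tester.md
LOG=./cli-tester.log

if command -v claude >/dev/null 2>&1; then
    claude code --skill "$SKILL" 2>&1 | tee "$LOG"
else
    echo "claude CLI not found; install it before iterating" | tee "$LOG"
fi
echo "transcript saved to $LOG"
```

Keeping the transcript in a log file makes the edit-run-inspect loop fast: you can diff agent behavior between skill-file revisions.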

Step 3: Create a Sandbox Environment for Autonomy

Agents need full autonomy without risking your host system. Use a sandbox like Docker’s sbx (Coding Agent Sandboxes) that provides microVM-based isolation. Each agent gets its own Docker daemon, network, and filesystem. Configure your sandbox to mount the workspace, set environment variables, and grant networking access if needed. The sandbox ensures agents can install dependencies, start services, and test upgrades without affecting your machine. Test locally that agents can run inside the sandbox and still load their skill files.
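If you do not have sbx handy, a plain `docker run` command conveys the shape of the setup: workspace mounted, configuration passed via environment variables, networking enabled. This is a rough stand-in, not the sbx CLI's real interface, and a container gives weaker isolation than a microVM:

```shell
# Rough stand-in for a sandbox launch: mount the workspace, pass config
# via environment variables, allow networking. The image name and sbx's
# actual flags are placeholders; this only illustrates what a sandbox needs.
SANDBOX_CMD="docker run --rm \
  -v $(pwd):/workspace \
  -w /workspace \
  -e ANTHROPIC_API_KEY \
  --network bridge \
  my-agent-image claude code --skill ./skills/cli-tester.md"
echo "$SANDBOX_CMD"
```

Whatever isolation layer you choose, verify locally that an agent inside it can read its skill file and reach any services it must test.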

Step 4: Wire One Skill into CI

Pick the simplest agent role (e.g., a release note generator) and create a GitHub Actions workflow that runs it. The workflow should check out the code, set up the sandbox environment, and invoke the exact same skill file you tested locally. For example:

  1. Use a matrix strategy for macOS, Linux, and Windows runners.
  2. Install Claude Code CLI and any necessary dependencies.
  3. Start your sandbox with appropriate configuration (mounting repo, enabling networking).
  4. Run claude code --skill ./skills/release-notes.md inside the sandbox.
  5. Capture output and push artifacts (e.g., release notes markdown) back to the repository.

Debug any CI-only issues (environment variables, path differences) but keep the skill file unchanged. The goal is a single source of truth for agent behavior.
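The steps above can be sketched as a minimal workflow. The install step and artifact path are placeholders for whatever your environment needs; only the checkout/upload actions are standard:

```yaml
# .github/workflows/release-notes.yml -- minimal sketch; the install script
# and output path are hypothetical placeholders, not a tested pipeline.
name: release-notes-agent
on:
  schedule:
    - cron: "0 6 * * *"   # nightly
  workflow_dispatch:

jobs:
  release-notes:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Claude Code CLI   # replace with your install method
        run: ./ci/install-claude.sh
      - name: Run the skill (same file as local runs)
        run: claude code --skill ./skills/release-notes.md
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - uses: actions/upload-artifact@v4
        with:
          name: release-notes-${{ matrix.os }}
          path: release-notes.md
```

Notice that the `run` line is the same invocation you used locally; the workflow adds environment, not behavior.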

Step 5: Expand the Fleet with More Roles

Once the first agent runs reliably in CI, add additional roles one by one. For each new role:

  • Write its skill file locally and test iteratively.
  • Add a separate CI job or parallel workflow that triggers on a schedule (nightly) or on pull requests.
  • Ensure each agent has its own set of allowed tools (e.g., the triage agent can only read issues and comment, not push code).
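One way to express a per-role tool allowlist is in the skill file itself. The frontmatter field name below is an assumption, so check your CLI's documentation; the same constraint can also simply be stated in the persona prose:

```shell
# Sketch of a per-role tool allowlist in skill-file frontmatter.
# The "allowed-tools" field name is an assumption -- verify against your
# CLI's docs; stating the constraint in prose works as a fallback.
mkdir -p skills
cat > skills/issue-triage.md <<'EOF'
---
name: issue-triage
description: Reads new issues, categorizes them, assigns priority.
allowed-tools: read-issues, comment-on-issues
---
# Role: Issue Triage

You read newly filed issues, label them, and set a priority.
You never push code and never close issues yourself.
EOF
echo "wrote skills/issue-triage.md"
```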

Common fleet roles from the Docker example:

  • Exploratory Tester – Runs random CLI commands and flags crashes.
  • Integration Tester – Tests upgrade paths, network configs, and cross-platform behavior.
  • Issue Triage Agent – Reads new issues, categorizes them, and assigns priority.
  • Release Manager – Generates release notes, bumps version numbers, and creates tags.
  • Load Tester – Simulates sustained usage and reports resource leaks.
  • Documentation Reviewer – Checks for outdated docs and suggests updates.
  • Bug Fix Bot – For simple issues, proposes and tests fixes autonomously.

Step 6: Implement Cross-Fleet Collaboration

When multiple agents share a backlog (e.g., release notes need test results), let them communicate through shared artifacts. For instance, the integration tester can produce a test report JSON that the release manager reads to decide which features to include. Agents can also collaborate via issue comments—the triage agent can tag the bug fix bot for a confirmed issue. Use the CI pipeline to orchestrate: job A (tester) completes, then job B (release notes) triggers and reads the output. This creates a virtual team that works asynchronously in production.
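A concrete hand-off, sketched with hypothetical file names and a deliberately simple pass/fail check: the integration tester emits a JSON report, and the release-manager job gates on it before drafting notes:

```shell
# Job A (integration tester) writes a machine-readable report...
# (file name and report shape are illustrative)
cat > test-report.json <<'EOF'
{"passed": ["multi-arch build", "upgrade 1.2->1.3"], "failed": []}
EOF

# ...and job B (release manager) gates on it before drafting notes.
if grep -q '"failed": \[\]' test-report.json; then
    echo "All integration tests green; including tested features in notes."
else
    echo "Failures present; release manager should hold the notes." >&2
    exit 1
fi
```

In a real pipeline the report would travel between jobs as a CI artifact rather than a shared filesystem, but the contract is the same: one agent writes, the next reads and decides.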

Step 7: Monitor and Iterate on Skill Files

Treat your fleet as a living system. Review agent logs daily. When an agent makes a wrong decision (e.g., closing a valid issue as a duplicate), adjust its skill file to clarify criteria. Use version control for skill files—each change is tracked. Because agents run both locally and in CI, you can reproduce any misbehavior instantly on your laptop. Over time, you’ll develop a library of refined personas that handle more edge cases without human intervention.

Tips for Success

  • Start small – Build one agent role (like a release note generator) before rolling out a full fleet. Prove the local-first, CI-second workflow first.
  • Use judgment-focused definitions – Resist the urge to write brittle scripts. A good skill file teaches the agent how to think, not just what to do.
  • Debug locally – Every CI failure should first be reproduced on your machine. That’s where you can see the agent’s reasoning and fix the skill file in seconds.
  • Keep CI environment minimal – The workflow should only set up the sandbox and call the skill. Avoid environment-specific tweaks that break portability.
  • Document your fleet – Maintain a README.md that lists all agent roles, their responsibilities, and how to run them locally. New team members can then contribute new skills or debug existing ones.
  • Monitor resource usage – Autonomous agents can consume tokens and compute time. Set limits on iteration counts, retries, and runtime durations to keep costs predictable.
  • Iterate often – Your agents’ behavior will improve over time. Treat skill files as code: review, refactor, and test regularly.
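One cheap runtime guardrail, using the standard `timeout` utility: cap each agent run's wall-clock time so a looping agent cannot burn tokens indefinitely. The skill path and budget below are arbitrary examples; token and iteration caps belong in the skill file or CLI configuration:

```shell
# run_with_budget wraps any agent invocation with a wall-clock cap.
# GNU/BSD `timeout` exits with status 124 when the limit is hit.
run_with_budget() {
    budget="$1"; shift
    timeout "$budget" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "run exceeded $budget; check the transcript for a loop" >&2
    fi
    return "$status"
}

# Usage (the skill path is illustrative):
#   run_with_budget 15m claude code --skill ./skills/load-tester.md

# Demonstrate the cap with a stand-in long-running command:
run_with_budget 1 sleep 5 || echo "capped (agent run was cut off)"
```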

By following these steps, you can build a virtual AI agent fleet that accelerates shipping, reduces manual toil, and scales with your product—just like the Docker Coding Agent Sandboxes team did. Start with one role, master the local-first pattern, and gradually grow your autonomous crew.