7 AI Agent Roles That Supercharge Docker's Shipping Speed

Docker's virtual team of 7 AI agents uses skill files to autonomously test, triage, and fix bugs, shipping faster with a local-first, CI-second approach.

At Docker, the Coding Agent Sandboxes team (known as "sbx") has pioneered a new way to accelerate development: a virtual team of seven AI agents that work autonomously in CI. These agents don't just automate tasks—they take on personas like tester, build engineer, or release manager, each equipped with a skill file that defines their role and decision-making process. Built on top of secure microVM-based sandboxes, this "Fleet" tests products, triages issues, posts release notes, and even fixes bugs. The result? Faster shipping cycles and a more resilient codebase. Here are seven key things to know about how this Fleet works.

1. The Fleet: A Virtual Team of Autonomous Agents

The Fleet consists of seven distinct AI agent roles, each with a defined persona and responsibilities. These agents run entirely within Docker's CI pipelines, operating inside isolated microVM sandboxes that provide their own Docker daemon, network, and filesystem. This setup ensures they never interfere with the host system. Originally a simple sandbox management tool, the project evolved to create a "miniature team" that handles tasks from exploratory testing to bug fixing. The agents are not scripted to follow rigid steps; instead, they use role-based reasoning to investigate and solve problems independently. For example, when a test fails unexpectedly, the agent doesn't stop—it starts investigating why and reports findings.

Source: www.docker.com

2. Skills Over Scripts: Role Descriptions That Enable Judgment

At the heart of the Fleet is the concept of "skills." A skill is a markdown file that describes an agent's persona, responsibilities, and allowed tools, similar to a job description. Unlike traditional scripts that dictate exact steps—"run this command, then that"—a skill empowers the agent to use judgment. For instance, the build engineer role defines what the agent knows about build processes and how to make decisions when a compilation fails. This distinction is critical: a script stops at an unexpected error, but a role investigates and adapts. The same skill file works whether the agent runs on a developer's laptop or in CI, ensuring consistent behavior and faster debugging.
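To make the idea concrete, a skill file might look like the sketch below. The headings, the role name, and details such as the toolchain are invented for illustration; Docker's actual skill format is not published in this article:

```markdown
# Role: Build Engineer

## Persona
You are the build engineer for the sbx project. You understand the
build toolchain, the build targets, and how release binaries are produced.

## Responsibilities
- Build binaries for macOS, Linux, and Windows.
- When a compilation fails, investigate: read the error, inspect recent
  changes, and propose or apply a fix.
- Report what you found and what you did, not just pass/fail.

## Allowed tools
- shell (build commands, version control)
- filesystem (read/write within the sandbox)
```

Note that nothing here prescribes exact commands; the file describes knowledge and judgment, which is what separates a role from a script.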

3. Local First, CI Second: Debugging in Seconds, Not Minutes

The Fleet's design principle is simple: every skill runs on your machine first. The team never writes a skill directly for CI. Instead, they invoke it locally, watch the agent think, and tweak the skill until it performs correctly. This iterative process takes seconds because you see the agent struggle and correct course instantly. Only after local validation do they wire the skill into a GitHub workflow. The alternative—debugging through commit-push-wait-read-logs cycles—would waste minutes per iteration. This approach ensures that the agent's behavior is predictable and reliable before it scales to nightly CI runs across macOS, Linux, and Windows.
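The loop itself can be sketched in a few lines of shell. The `agent` command and its arguments are illustrative stand-ins, stubbed here so the sketch is self-contained and runnable:

```shell
# Hypothetical local-first loop. "agent" is a stub standing in for
# whatever runner invokes a skill file; the real command likely differs.
agent() { echo "running skill: $2"; }

# 1. Invoke the skill on your machine and watch the agent reason.
agent run skills/cli-tester.md

# 2. Edit the skill file, re-run, repeat until the behavior is right.
# 3. Only then reference the same file from a CI workflow.
```

The payoff is the feedback interval: an edit-and-rerun cycle on a laptop takes seconds, while a commit-push-wait cycle through CI takes minutes.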

4. The /cli-tester: An Exploratory Tester That Runs Cross-Platform

One of the most active roles in the Fleet is the /cli-tester, an exploratory tester that exercises the sbx CLI. It builds binaries, runs commands against different sandbox configurations, and uncovers edge cases. The agent operates by simulating real-world usage: it creates, starts, stops, and removes sandboxes while mounting workspaces and configuring networking. Every night, this agent runs on macOS, Linux, and Windows runners to catch platform-specific bugs and resource leaks. Its reports feed directly into the issue tracker, providing detailed logs and reproduction steps. Because the skill was first honed locally, the agent's testing is thorough and its findings actionable, reducing manual QA effort significantly.
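A single exploratory pass over the sandbox lifecycle described above might look like this. The `sbx` subcommands and flags are assumptions, stubbed here so the sketch runs on its own:

```shell
# One exploratory pass over a sandbox lifecycle. "sbx" is stubbed;
# the real CLI's subcommands and flags may differ.
sbx() { echo "sbx $*"; }

sbx create probe --mount "$PWD:/workspace"   # isolated microVM sandbox
sbx start probe
sbx exec probe -- docker info                # sandbox has its own Docker daemon
sbx stop probe
sbx rm probe                                 # verify nothing leaks afterwards
```

The agent repeats variations of this pass (different mounts, networking options, abrupt shutdowns) and writes up anything anomalous rather than merely reporting exit codes.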

5. Seamless Transitions: One Skill, Two Runtimes

A key architectural insight is that the same skill file—identical in content—runs both locally and in CI. The CI pipeline simply sets up the environment (e.g., checks out code, installs dependencies) and then calls the skill. No special "CI version" or translation layer is needed. This unity eliminates the common problem of environment-specific bugs, where code works on a developer's laptop but fails in automated testing. It also means that any improvements made to the skill during local debugging instantly benefit the CI runs. For the team, this reduces cognitive overhead: they only need to understand one system, not two. The result is faster iteration and more reliable automation.
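A CI job wired this way stays thin. The GitHub Actions workflow below is an illustrative assumption (job names and the agent invocation are invented); the point is that CI only prepares the environment and then calls the very same skill file used locally:

```yaml
# Illustrative nightly workflow: checkout, then invoke the identical
# skill file used on a developer's laptop. No CI-specific skill version.
name: nightly-fleet
on:
  schedule:
    - cron: "0 3 * * *"
jobs:
  cli-tester:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: agent run skills/cli-tester.md
```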


6. Triage, Release Notes, and Bug Fixes: A Multi-Role Workforce

Beyond testing, the Fleet handles essential but tedious maintenance tasks. Agents are assigned to triage incoming issues by categorizing severity, reproducing bugs, and suggesting fixes based on historical patterns. Another role automatically generates release notes by analyzing merged pull requests and summarizing changes in a user-friendly format. A third role even applies patches for minor bugs, running them through the same test suite before committing. Each role operates independently but shares a common communication channel—the sandbox's filesystem—so they can pass artifacts and logs without human intervention. This division of labor turns work that would otherwise occupy a team of developers into a scalable virtual workforce that runs 24/7.
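Filesystem-based hand-off is simple enough to sketch directly. The paths and file names below are illustrative, not Docker's actual layout:

```shell
# Minimal sketch of filesystem-based hand-off between roles.
# Paths and report contents are illustrative.
mkdir -p /tmp/fleet/artifacts

# The tester role drops its findings as a plain file...
echo "FAIL: sandbox leaked a network namespace on the Windows runner" \
  > /tmp/fleet/artifacts/cli-tester-report.txt

# ...and the triage role later picks the report up, no human in the loop.
grep "FAIL" /tmp/fleet/artifacts/cli-tester-report.txt
```

Using plain files as the channel keeps the roles decoupled: any role can produce or consume artifacts without knowing which other role is on the far side.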

7. Faster Shipping and a More Resilient Future

The impact of the Fleet is measurable: releases ship faster because testing and reporting happen continuously, and the issue backlog no longer requires dedicated human hours. The team has cut manual QA time by over 40%, and critical bugs are caught earlier in the pipeline. Looking ahead, Docker plans to expand the Fleet to handle more complex tasks, such as automated code reviews and performance benchmarking. The architecture also opens possibilities for external contributions: anyone can write a skill file, run it locally, and then submit it to the Fleet. This democratization of automation could transform how teams manage code quality at scale. For now, the Fleet proves that a well-designed virtual team can dramatically accelerate development without sacrificing reliability.

Conclusion
Docker's Coding Agent Sandboxes team has demonstrated that a fleet of AI agents, guided by simple skill files and running on secure sandboxes, can autonomously handle many aspects of software development. From exploratory testing across platforms to triaging issues and generating release notes, these agents operate with the same skill locally and in CI, enabling rapid iteration. The result is a faster, more resilient shipping process that frees human developers to focus on creative challenges. As AI agent technology evolves, this model may well become a standard practice in software engineering.