Over the past few weeks, I've built something that doesn't have a standard name yet: an AI development team. Not a single AI assistant that helps you code, but a coordinated system of specialized AI agents that take a Jira ticket, design the UI, implement the frontend and backend in parallel, review their own code for security issues, run tests, generate reports, and update Jira with the results — all from a single command.
This document explains how it works. Not the marketing version — the actual architecture, the design decisions, the things that worked and the things that didn't. It's written for engineers and technical leads who want to understand what's possible with agent orchestration today, and what the practical constraints are.
This is not a product. It's a working prototype that runs against real Jira tickets and real codebases. The commands are Markdown files. The agents are Claude instances with specialized prompts. The "team" is a set of conventions for how those instances coordinate through files and tool calls. There is no custom infrastructure — it runs entirely within Claude Code's existing agent framework.
Claude Code has a feature called custom slash commands. You write a Markdown file in .claude/commands/, and it becomes a command you can run in the terminal. The Markdown file is the prompt — it tells Claude what role to play, what steps to follow, what tools to use, and when to spawn sub-agents.
The insight that makes this work: a sufficiently detailed Markdown prompt is an executable specification. If you describe a 5-phase pipeline with decision points, agent handoffs, and file-based communication — Claude will execute it. The Markdown file is simultaneously the documentation and the implementation.
```markdown
# Example: .claude/commands/jira.md (simplified)
You are the **Jira Ticket Orchestrator**.
Fetch the ticket, analyze it, plan the work,
coordinate implementation agents, and update Jira.

## PHASE 1: Fetch & Analyze
Fetch the ticket via REST API...
Ask the user for project paths...

## PHASE 2: Design (if UI changes)
Launch a UI Designer agent in Paper.design...
Present designs to user for approval...

## PHASE 3: Implementation
Launch Frontend + Backend agents in parallel...

## PHASE 4: Report & Update Jira
Generate HTML report, post to Jira...
```
That's it. No framework, no SDK, no deployment pipeline. A Markdown file in a folder. When a user types /jira FO-2847, Claude reads the Markdown, and the pipeline begins.
The system has three layers:
| Layer | What It Is | Examples |
|---|---|---|
| Commands | Markdown files that define pipelines. Each command is a self-contained workflow with phases, decision points, and agent specifications. | /jira, /new-feature, /full-pipeline, /unit-test, /deps |
| Agents | Claude instances spawned by commands. Each agent has a specialized role, receives a detailed prompt, and works independently. Agents have no memory of each other — they communicate through files. | Security Auditor, Frontend Developer, UI Designer, Test Engineer, Code Analyst |
| Integrations | External systems that agents interact with via APIs, MCP servers, or CLI tools. | Jira REST API, JAM MCP, Paper MCP, Playwright, Docker, Git |
Each agent is a Claude instance with a specialized prompt. When the orchestrator spawns an agent, it passes the full context that agent needs: the ticket details, the stack profile, the file paths, and the specific task. Agents don't know about each other — they each do their job and write their output to a report file.
The orchestrator isn't a separate agent — it's the main Claude instance running the command. It's the only one that can interact with the user. It reads the command Markdown, executes the phases, spawns sub-agents, and presents results for user approval.
Sub-agents cannot interact with the user. Only the main Claude instance (the orchestrator) can ask questions, present options, and get approvals. This is a Claude Code constraint, not a design choice — but it's actually a good constraint. It means all user-facing decisions flow through a single point, which makes the pipeline predictable and auditable.
Creates visual mockups in Paper.design using MCP tools. Reads the feature plan, existing code patterns, and any screenshots of the current app. Produces artboards with HTML/CSS designs and writes a design specification.
Tools used: mcp__paper__create_artboard, mcp__paper__write_html, mcp__paper__get_screenshot, mcp__paper__get_jsx
Output: Paper artboards + reports/feature-ui-plan.md
Key constraint: The user MUST review and approve designs before implementation begins. This is a hard gate — no code is written until designs are approved.
Implements UI changes based on the approved design spec and feature plan. Creates components, pages, routes, hooks. Updates navigation. Follows the existing project's patterns and the API contract exactly.
Input: Feature plan + UI design spec + existing codebase patterns
Output: reports/feature-ui-implementation.md
Runs in parallel with: Backend Developer (when scope is full-stack)
Implements API endpoints, database schema changes, middleware. Uses parameterized SQL, proper auth, input validation. Follows the API contract from the feature plan exactly.
Input: Feature plan + existing codebase patterns
Output: reports/feature-backend-implementation.md
Also used for: Fixing issues found by Security Auditor and Quality Engineer in the dev-team loop
Scans every source file for OWASP Top 10 vulnerabilities: injection, broken auth, data exposure, XSS, access control, misconfiguration. Categorizes findings as CRITICAL, WARNING, or INFO.
Output: reports/security-audit.md
Runs in parallel with: Code Quality Engineer
Scans for code smells, bugs, anti-patterns: type safety issues, error handling gaps, performance problems, dead code, API design issues. Same severity categorization.
Output: reports/quality-audit.md
Writes unit tests for every fix. Covers edge cases and security regression scenarios. Runs all tests (existing + new) and iterates until 100% pass. Supports 9 stacks, including Java/JUnit, C#/xUnit, JS/Jest, Python/pytest, Go, Rust, PHP, and Ruby.
Output: reports/test-report.md or reports/unit-test-report.html
Reviews ONLY the git diff — not the entire codebase. Checks new code for SQL injection, XSS, command injection, hardcoded secrets, null refs, race conditions, missing auth. Auto-fixes critical issues.
Output: reports/code-analysis.md
Key distinction from Security Auditor: The Security Auditor scans the whole codebase. The Code Analyst reviews only what changed. One is for audits, the other is for pull request-style review.
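The diff-scoped review can be sketched in a few lines. This is an illustration, not the actual Code Analyst (which is a prompt, not a script); the function names are hypothetical, but the git invocation is the standard way to list changed files.

```python
import subprocess

def changed_files(diff_output: str) -> list[str]:
    """Parse `git diff --name-only` output into a list of paths."""
    return [line.strip() for line in diff_output.splitlines() if line.strip()]

def diff_against(base: str = "main") -> list[str]:
    """Ask git which files changed relative to a base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return changed_files(out.stdout)
```

The point of the separation: a whole-codebase audit and a changed-files review are different tools with roughly a 10x cost difference, so they get different agents.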
Compiles all agent reports into a single, polished HTML report. Dark theme, collapsible sections, severity badges, code snippets, testing checklists. The report is the artifact that gets attached to the Jira ticket.
Input: All reports/*.md files from other agents
Output: reports/master-report.html or reports/feature-report.html or reports/jira-{KEY}-report.html
Agents don't talk to each other directly. They communicate through three mechanisms:
Every agent writes its output to a file in reports/. The orchestrator reads these files and passes relevant content to downstream agents. For example:
- reports/security-audit.md
- reports/quality-audit.md
- reports/fixes-applied.md

This is effectively a message-passing system where the messages are Markdown files. It's simple, inspectable (you can read the files), and robust (files don't disappear if an agent crashes).
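A minimal sketch of that handoff, assuming hypothetical file names: each agent writes Markdown into reports/, and the orchestrator stitches the relevant reports into the next agent's prompt.

```python
from pathlib import Path

def build_fix_prompt(reports_dir: Path) -> str:
    """Compose a downstream agent's prompt from upstream report files."""
    sections = []
    for name in ("security-audit.md", "quality-audit.md"):
        path = reports_dir / name
        if path.exists():  # an agent may have been skipped; tolerate gaps
            sections.append(f"## {name}\n{path.read_text()}")
    return "Fix every CRITICAL finding below.\n\n" + "\n\n".join(sections)
```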
When the orchestrator spawns an agent, it includes all necessary context directly in the prompt. The agent receives: the Jira ticket details, the stack profile, the file paths, the design spec, and whatever else it needs. This is redundant with the file system, but it ensures the agent has everything without needing to read files first.
Agents interact with external systems via tool calls: Jira REST API (via curl), Paper MCP tools, JAM MCP tools, Docker CLI, Git commands, Playwright. The orchestrator handles Jira authentication centrally (credentials are in .env), and agents inherit this through sourcing the same file.
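For concreteness, the ticket fetch the orchestrator performs with curl looks like this in Python. The base URL, email, and token are placeholders standing in for the .env values; the endpoint itself is the standard Jira Cloud REST API v3 issue lookup with basic auth (email + API token).

```python
import base64
import urllib.request

def jira_issue_request(base_url: str, key: str, email: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for a Jira Cloud issue."""
    auth = base64.b64encode(f"{email}:{token}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/rest/api/3/issue/{key}",  # Jira Cloud REST API v3
        headers={"Authorization": f"Basic {auth}", "Accept": "application/json"},
    )
```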
```
User: /jira FO-2847
  │
  ▼
Orchestrator ──── curl ────▶ Jira REST API (fetch ticket)
  │            ◄── ticket JSON ──┘
  │
  ├── Detect JAM links? ──── mcp ────▶ JAM MCP (video analysis)
  │
  ├── Ask user for project paths (AskUserQuestion)
  │
  ├── UI changes? ──── spawn ────▶ UI Designer Agent ──── mcp ────▶ Paper MCP
  │                  ◄── reports/feature-ui-plan.md ──┘
  │
  ├── User approves designs? (AskUserQuestion)
  │
  ├── spawn (parallel) ──▶ Frontend Dev Agent ──▶ reports/feature-ui-implementation.md
  │                  └──▶ Backend Dev Agent ──▶ reports/feature-backend-implementation.md
  │
  ├── spawn ──▶ Code Analyst ──▶ reports/code-analysis.md
  │
  ├── spawn ──▶ Doc Lead ──▶ reports/jira-FO-2847-report.html
  │
  ├── curl ──▶ Jira (upload report + screenshots + comment)
  │
  └── curl ──▶ Jira (transition status + log time)
```
Each command is a Markdown file in .claude/commands/. Some are standalone (single agent, single task), others are orchestrators that spawn multiple agents across phases. Here's the full inventory:
| Command | What It Does | Phases |
|---|---|---|
| /jira FO-2847 | Full ticket-to-resolution pipeline. Fetches the ticket, detects JAM recordings, classifies scope, asks for project paths, designs in Paper (if UI), implements, reviews code, generates a report, and updates Jira (comment + attachments + status + time). | 5 phases |
| /jira sprint | Batch mode. Fetches available Scrum teams dynamically, the user picks a team, it fetches the sprint tickets, the user selects which to process, and each runs through the full pipeline sequentially with continue/skip/stop controls. | 5 phases per ticket |
| /jira teams | Lists all available Scrum teams from Jira. Instant lookup, no processing. | Instant |
| /jam {url} | Analyzes JAM bug recordings via MCP. Fetches video analysis, console logs, network requests, and user events. Accepts a URL, JAM ID, or Jira ticket key. | MCP-based |
| /tempo | Time logging: /tempo addTime FO-2847 2h "Bug fix". Also supports getTime, getWeek, and getMonth for viewing logged time. | Instant |
The Jira pipeline always asks the user which project directories to work in. It never guesses. This was a deliberate decision after the AI incorrectly routed work to the wrong codebase. The current working directory (d:\Kunder\247\AIComp) is the orchestration project — it contains the commands and configs, not the actual code. The real codebases are at separate paths like D:\Kunder\247\Finago\control-backend-api.
| Command | What It Does | Agents Involved |
|---|---|---|
| /new-feature | 6-phase pipeline: plan the feature, capture existing UI screenshots, design in Paper (user approval gate), parallel frontend + backend implementation, code analysis, master report. Optional handoff to /full-pipeline for testing. | Orchestrator, UI Designer, Frontend Dev, Backend Dev, Code Analyst, Doc Lead |
| /code-analysis | Reviews only the git diff. Checks for security issues, logic errors, and quality problems. Auto-fixes critical issues. | Code Analyst (single agent) |
| /full-pipeline | End-to-end run: the /dev-team loop plus Playwright, Docker build/deploy/test, and a master report. | 9+ agents across all phases |

| Command | What It Does | Agents Involved |
|---|---|---|
| /dev-team | Iterative quality loop: parallel security + quality scan, fix all issues, write tests, re-scan. Repeats until zero new findings; typically 2-3 rounds. | Security Auditor, Quality Engineer, Backend Developer, Test Engineer, Doc Lead |
| /security-audit | Standalone OWASP Top 10 scan. | Security Auditor |
| /quality-audit | Standalone code quality scan. | Quality Engineer |
| /fix-all | Fix all findings from audit reports. | Backend Developer |
| /test-all | Write tests for all fixes and run them. | Test Engineer |
| /master-report | Compile all reports into the master HTML report. | Documentation Lead |
| Command | What It Does | Modes |
|---|---|---|
| /unit-test | Maps existing test coverage, identifies gaps, creates unit tests, then runs and fixes them iteratively (up to 5 rounds). Supports 9 language stacks. | * (full scan), {file} (single file), --fix-ignored (rehabilitate disabled tests) |
| /playwright-test | Runs Playwright E2E browser tests. Analyzes failures, distinguishes app bugs from test bugs, writes missing tests, generates an HTML report. | Single mode |
| Command | What It Does | Modes |
|---|---|---|
| /deps | Scans dependencies for CVEs (with exploitability assessment), outdated packages (staleness score), and license risks (GPL/copyleft detection). Produces a 0-100 health score with an A-F letter grade. Can auto-fix safe updates or export Dependabot/Renovate configs. | Full audit, --vuln-only, --outdated, --license |
| Command | What It Does |
|---|---|
/docker-build | Build Docker image, validate (size, non-root, health check, no secrets), security scan. |
/docker-deploy | Deploy with docker compose, wait for health check, test endpoints, collect container info. |
/docker-test | Full integration test suite against live container. |
/docker-teardown | Gracefully tear down containers, report freed resources. |
/full-pipeline | All of the above: dev-team loop + Playwright + Docker build/deploy/test + master report. |
| Command | What It Does |
|---|---|
/create "description" | Context-aware feature creator. Reads ProjectType from .env, adapts role (Game/App/SaaS Dev), designs in Paper, generates HTML plan with mockup screenshots (auto-opens in browser), implements, verifies with Playwright. |
/create-project "description" | Full project creator from scratch. Asks clarifying questions, designs architecture + UI in Paper, generates HTML plan, builds with full agent team (backend + frontend + security + tests + Docker), delivers running application. |
/bug "description" | Context-aware bug fixer. Analyzes pasted screenshots, diagnoses root cause, applies minimum fix, verifies with Playwright, saves timestamped report. |
/verify | E2E verification with Playwright. Uses project profile (.claude/project-profile.json) for login and navigation. Takes before/after screenshots, generates self-contained HTML report with clickable lightbox. Auto-checks profile completeness before running. |
/changelog | Reads reports from .claude/unprocessed_reports/ (created by /create and /bug), generates beautiful HTML changelog with features (blue) and bug fixes (amber). Moves processed reports to prevent double-counting. |
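The /changelog bookkeeping — read each unprocessed report, then move it aside so the next run doesn't count it twice — can be sketched as follows. The function and directory names are hypothetical; only the read-then-archive pattern reflects what the command does.

```python
import shutil
from pathlib import Path

def collect_and_archive(unprocessed: Path, processed: Path) -> list[str]:
    """Read every pending report, then move it so it is counted exactly once."""
    processed.mkdir(parents=True, exist_ok=True)
    entries = []
    for report in sorted(unprocessed.glob("*.md")):
        entries.append(report.read_text())
        shutil.move(str(report), str(processed / report.name))
    return entries
```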
| Command | What It Does |
|---|---|
/git sync {branch} | Merge latest from a branch into current. Pre-flight checks, fetch, merge. With --fix-merge-errors: AI-powered per-file conflict resolution. With --all: sync all projects. |
/git status | Quick branch overview: ahead/behind, uncommitted changes, stashes, last commit. |
| Command | What It Does |
|---|---|
/repo-setup {url} | Clone, analyze stack, install deps, configure env, build, test, start. Produces 12-section HTML setup guide. Detects dependent repos and offers to set them up too. |
/repo-setup {org_url} | Organization scan: fetches all repos via GitHub API, maps relationships (depends-on, frontend-for, shared-library), calculates startup order, generates architecture documentation with CSS diagrams. |
/repo-setup --auto-setup | Auto-clone, install, build, test all repos without prompting. Compatible with --search filter. |
- docker compose build — validate image size, non-root user, health check, no secrets in layers.
- docker compose up -d — health check, test endpoints, collect container ID, IP, ports, network.

| System | Protocol | What It's Used For |
|---|---|---|
| Jira | REST API (curl) | Fetch tickets, post comments, upload attachments, transition status, log time. Credentials in .env. |
| JAM (jam.dev) | MCP Server | Analyze bug recordings: video analysis, console logs, network requests, user events, screenshots. Auto-detected in Jira ticket descriptions and comments. |
| Paper (paper.design) | MCP Server | Create UI mockups: artboards, HTML/CSS designs, screenshots, JSX export. Used in both /jira (for UI-related tickets) and /new-feature. |
| Docker | CLI | Build images, deploy containers, run integration tests, teardown. |
| Playwright | CLI (npx) | Browser-based E2E testing, screenshot capture for before/after comparisons. |
| Git | CLI | Diff detection for code analysis, branch management. |
Decision: Agents communicate through reports/*.md files, not through shared memory or context.
Why: Sub-agents in Claude Code start with a fresh context. They have no memory of the parent or other agents. Files are the natural handoff mechanism — they're persistent, inspectable, and don't depend on context window management. An agent that writes to reports/security-audit.md produces an artifact that any other agent (or human) can read.
Decision: The pipeline never auto-detects which codebase to work in. It always asks.
Why: The AI incorrectly assumed a ticket belonged to gateway-backend when it was actually control-backend-api. In a multi-project environment, the cost of working in the wrong codebase is catastrophic — you're modifying the wrong code. Asking takes 5 seconds. Fixing a wrong-codebase mistake takes much longer.
Decision: Two mandatory approval points: (1) the feature plan, and (2) the UI design. No code is written until both are approved.
Why: Agent work is cheap to redo at the planning stage but expensive at the implementation stage. A wrong plan means multiple agents building the wrong thing. Catching misunderstandings at the plan/design stage saves massive amounts of token spend and time.
Decision: Security + Quality scans run in parallel. Frontend + Backend implementation run in parallel. But design must complete before implementation, and implementation must complete before code review.
Why: True parallelism saves time (two agents working simultaneously is faster than two agents working sequentially). But dependencies must be respected: you can't implement a UI that hasn't been designed, and you can't review code that hasn't been written.
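The dependency rule reads naturally as a fork-join. This toy sketch uses plain functions standing in for spawned Claude agents (all names are illustrative): design runs alone, frontend and backend run in parallel on its output, and review waits for both.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(design, frontend, backend, review):
    spec = design()                       # sequential: everything depends on the design
    with ThreadPoolExecutor(max_workers=2) as pool:
        fe = pool.submit(frontend, spec)  # parallel pair: independent of each other
        be = pool.submit(backend, spec)
        results = [fe.result(), be.result()]
    return review(results)                # sequential: needs both implementations
```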
Decision: Commands like /unit-test and /deps detect the project's tech stack at runtime and adapt their behavior. The same command works for Java/Maven, C#/.NET, JavaScript/npm, Python/pip, Go, Rust, PHP, and Ruby.
Why: We work across multiple stacks (Java backend, React frontend, .NET APIs). Having separate commands per stack would be unmaintainable. Runtime detection lets one command serve all projects.
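Runtime stack detection boils down to sniffing well-known manifest files in the project root. The real commands do this in-prompt; the mapping below is a simplified assumption, not their actual detection logic.

```python
from pathlib import Path

# Assumed manifest-to-stack mapping (simplified).
MANIFESTS = {
    "pom.xml": "java/maven",
    "package.json": "javascript/npm",
    "requirements.txt": "python/pip",
    "go.mod": "go",
    "Cargo.toml": "rust",
    "composer.json": "php",
    "Gemfile": "ruby",
}

def detect_stack(project_dir: Path) -> str:
    if any(project_dir.glob("*.csproj")):  # .NET projects are per-file, not fixed-name
        return "csharp/dotnet"
    for manifest, stack in MANIFESTS.items():
        if (project_dir / manifest).exists():
            return stack
    return "unknown"
```

Detecting "java/maven" then lets the prompt include JUnit patterns rather than Jest, which is exactly the token saving the cost table above describes.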
Decision: Every pipeline generates an HTML report, regardless of whether the user asked for one.
Why: Reports serve three purposes: (1) they're the artifact that gets uploaded to Jira, (2) they're the communication mechanism between agents, and (3) they're the audit trail. If something goes wrong, the report tells you what each agent did and found.
Used in /dev-team and /unit-test. Scan, fix, verify, re-scan. Continue until zero findings. This is the most reliable pattern because it's self-correcting: if a fix introduces a new issue, the next scan catches it.
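The loop, with the safety cap the cost table mentions, is a few lines of control flow. This is a sketch of the pattern, with scan and fix standing in for agent runs:

```python
def loop_until_clean(scan, fix, max_rounds: int = 5) -> int:
    """Scan, fix, re-scan until a scan reports zero findings. Returns rounds used."""
    for round_no in range(1, max_rounds + 1):
        findings = scan()
        if not findings:
            return round_no       # clean: the last scan found nothing
        fix(findings)
    return max_rounds             # cap reached; remaining findings go in the report
```

The cap matters: without it, an unfixable finding turns into an infinite token burn.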
Used in /jira and /new-feature. Classify the work as UI-only, backend-only, or full-stack, then branch the pipeline accordingly. UI-only skips backend agents. Backend-only skips design and frontend. Full-stack runs everything.
Launch two or more agents simultaneously, wait for all to complete, then merge their outputs into the next step. Used for Security + Quality scanning, and for Frontend + Backend implementation.
Present results to the user with explicit options (Accept / Change / Reject). Block pipeline progression until the user responds. Used for plan approval, design review, and Jira update decisions.
Commands accept either a full path or a project name (e.g., control-backend-api). The system resolves names to paths based on known project directories. If resolution fails, it asks the user. This makes commands feel like CLI tools — short, memorable invocations.
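A hypothetical resolver for that convention: a known-projects registry turns short names into full paths, and unknown names fall through to None so the orchestrator asks the user instead of guessing (per the "always ask" rule). The registry contents here are placeholders.

```python
from pathlib import Path
from typing import Optional

# Assumed registry; the real paths live in the user's configuration.
KNOWN_PROJECTS = {
    "control-backend-api": Path(r"D:\Kunder\247\Finago\control-backend-api"),
}

def resolve_project(name_or_path: str) -> Optional[Path]:
    candidate = Path(name_or_path)
    if candidate.is_absolute():
        return candidate                       # full path given: use as-is
    return KNOWN_PROJECTS.get(name_or_path)    # None means: ask the user
```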
This caused the AI to work in the wrong codebase. Never again. Always ask.
A single agent trying to scan, fix, test, and report in one go produces worse results than specialized agents. The context window gets polluted with too many concerns, and the agent loses focus. Specialization works.
Even a one-line color change benefits from a Paper mockup. The user seeing the change before it's implemented catches misunderstandings that are trivial to fix at design time and expensive to fix after implementation.
Newly created tests frequently fail on the first run due to mock misconfiguration, wrong assertions, or compilation errors. The fix loop (run, analyze failures, fix, re-run, up to 5 iterations) is essential. Without it, you'd hand the user a pile of broken tests.
Agent orchestration is expensive in tokens. Every agent spawn is a fresh context. Every tool call costs tokens. Every report file that gets read costs tokens. Here's what we've learned about managing costs:
| Strategy | Impact | Example |
|---|---|---|
| Scope-based branching | High | Backend-only tickets skip UI Designer + Frontend Dev + Paper MCP — saves ~40% of tokens |
| Parallel execution | Medium (time) | Security + Quality scans run simultaneously — wall clock time cut in half, same token cost |
| Targeted code review | High | Code Analyst reviews only the git diff, not the entire codebase — 10x fewer tokens than a full scan |
| Stack detection | Medium | Detecting "Java + Maven" means the agent prompt includes JUnit patterns, not Jest. No wasted exploration. |
| Fix loop cap | Safety | Unit test fix loop caps at 5 iterations. Prevents infinite token burn on unfixable tests. |
In a traditional software project, you write code and the code runs. In this system, you write prompts and the prompts run. The quality of the output is directly proportional to the quality of the prompt. A vague prompt produces vague results. A prompt that specifies exact file paths, exact output formats, and exact decision criteria produces reliable, reproducible results. We treat our .claude/commands/ Markdown files with the same rigor as production code.
A human developer can be told "fix the bug" and figure out the rest. An agent needs: which project directory, what language, what test framework, what files to read first, what patterns to follow, where to write the output, and what to do when it encounters an edge case. The more structure you provide, the better the output.
We initially worried about agents writing to the wrong files, overwriting each other's output, or producing incompatible formats. In practice, this almost never happens. Agents are good at following naming conventions when you tell them explicitly what to write and where.
Every user approval point adds 30-60 seconds of human time. But each gate prevents 5-15 minutes of wasted agent work when the plan or design is wrong. The math is clear: always gate before expensive operations.
Single-pass pipelines (scan once, fix once, done) miss things. The loop pattern (scan, fix, re-scan, repeat until clean) catches issues introduced by fixes, previously masked issues, and interaction effects between changes. It consistently produces cleaner output than single-pass.
Before JAM integration, bug tickets had text descriptions and maybe a screenshot. With JAM MCP, the agent can analyze the actual video recording: see what the user clicked, read the console errors, check the network requests. It turns a vague bug report into a structured, actionable analysis. The auto-detection of JAM links in Jira tickets means this happens automatically — no extra steps.
| Priority | Feature | Why |
|---|---|---|
| High | /pr — Smart PR Creator | Closes the loop: Jira ticket → implement → PR → Jira update. Currently the pipeline stops before creating a PR; this is the missing piece. |
| Medium | Git integration in /jira | Auto-create feature branches per ticket, auto-commit with the ticket reference, link PRs to Jira. |
| Medium | Slack notifications | Post to a team channel when a ticket is resolved, with a link to the report and key stats. |
| Low | CI/CD integration | Trigger builds after implementation, verify deployment, link build status to Jira. |
| Low | Sprint retrospective (/retro) | Analyze the completed sprint: tickets resolved, time logged, code quality metrics, churn areas. Auto-generate a retro report. |
| Command | Category | Agents | Output |
|---|---|---|---|
/jira {key} | Project Mgmt | Up to 6 | reports/jira-{KEY}-report.html |
/jira sprint | Project Mgmt | Up to 6/ticket | Sequential processing |
/jira teams | Project Mgmt | 0 | Team list (terminal) |
/jam {url} | Bug Analysis | 0 (MCP) | JAM analysis (terminal) |
/tempo | Time Tracking | 0 | Jira worklog |
/new-feature | Feature Dev | 4-5 | reports/feature-report.html |
/dev-team | Code Quality | 5 (iterative) | reports/master-report.html |
/full-pipeline | End-to-End | 9+ | reports/master-report.html + Docker |
/unit-test | Testing | 1-3 | reports/unit-test-report.html |
/playwright-test | Testing | 1 | reports/playwright-report/ |
/deps | Security | 1-2 | reports/deps-audit-report.html |
/security-audit | Security | 1 | reports/security-audit.md |
/quality-audit | Quality | 1 | reports/quality-audit.md |
/fix-all | Implementation | 1 | reports/fixes-applied.md |
/test-all | Testing | 1 | reports/test-report.md |
/code-analysis | Review | 1 | reports/code-analysis.md |
/master-report | Documentation | 1 | reports/master-report.html |
/docker-build | DevOps | 1 | reports/docker-build-report.md |
/docker-deploy | DevOps | 1 | reports/docker-deploy-report.md |
/docker-test | DevOps | 1 | reports/docker-integration-test-report.md |
/docker-teardown | DevOps | 1 | reports/docker-teardown-report.md |
/create "desc" | Universal | 0-1 | reports/feature-plan.html + .claude/unprocessed_reports/ |
/create-project "desc" | Universal | 5+ | reports/project-plan.html + reports/project-delivery-report.html |
/bug "desc" | Universal | 0-1 | .claude/unprocessed_reports/ |
/verify | Testing | 0 | reports/verification-report.html |
/changelog | Documentation | 0 | reports/changelog.html |
/git sync {branch} | Git Ops | 0-1 | reports/git-sync-report.md |
/repo-setup {url} | Onboarding | 1-3 | reports/repo-setup-guide.html |
/report | Reporting | 0 | reports/change-report.html |
/impact-scan "desc" | Analysis | 0 | reports/impact-scan-report.html |
Everything described in this document runs on Claude Code (Opus 4.6) with no custom infrastructure. The entire system is 28 Markdown files in .claude/commands/, an .env file with Jira credentials, and two MCP server connections (JAM + Paper). There is no server, no database, no deployment pipeline. If you have Claude Code, you can copy the Markdown files and have the same team.