Over the past few weeks, I've built something that doesn't have a standard name yet: an AI development team. Not a single AI assistant that helps you code, but a coordinated system of specialized AI agents that take a Jira ticket, design the UI, implement the frontend and backend in parallel, review their own code for security issues, run tests, generate reports, and update Jira with the results — all from a single command.
This document explains how it works. Not the marketing version — the actual architecture, the design decisions, the things that worked and the things that didn't. It's written for engineers and technical leads who want to understand what's possible with agent orchestration today, and what the practical constraints are.
This is not a product. It's a working prototype that runs against real Jira tickets and real codebases. The commands are Markdown files. The agents are Claude instances with specialized prompts. The "team" is a set of conventions for how those instances coordinate through files and tool calls. There is no custom infrastructure — it runs entirely within Claude Code's existing agent framework.
Claude Code has a feature called custom slash commands. You write a Markdown file in .claude/commands/, and it becomes a command you can run in the terminal. The Markdown file is the prompt — it tells Claude what role to play, what steps to follow, what tools to use, and when to spawn sub-agents.
The insight that makes this work: a sufficiently detailed Markdown prompt is an executable specification. If you describe a 5-phase pipeline with decision points, agent handoffs, and file-based communication — Claude will execute it. The Markdown file is simultaneously the documentation and the implementation.
```markdown
# Example: .claude/commands/jira.md (simplified)
You are the **Jira Ticket Orchestrator**.
Fetch the ticket, analyze it, plan the work,
coordinate implementation agents, and update Jira.

## PHASE 1: Fetch & Analyze
Fetch the ticket via REST API...
Ask the user for project paths...

## PHASE 2: Design (if UI changes)
Launch a UI Designer agent in Paper.design...
Present designs to user for approval...

## PHASE 3: Implementation
Launch Frontend + Backend agents in parallel...

## PHASE 4: Report & Update Jira
Generate HTML report, post to Jira...
```
That's it. No framework, no SDK, no deployment pipeline. A Markdown file in a folder. When a user types /jira FO-2847, Claude reads the Markdown, and the pipeline begins.
The system has three layers:
| Layer | What It Is | Examples |
|---|---|---|
| Commands | Markdown files that define pipelines. Each command is a self-contained workflow with phases, decision points, and agent specifications. | /jira, /new-feature, /full-pipeline, /unit-test, /deps |
| Agents | Claude instances spawned by commands. Each agent has a specialized role, receives a detailed prompt, and works independently. Agents have no memory of each other — they communicate through files. | Security Auditor, Frontend Developer, UI Designer, Test Engineer, Code Analyst |
| Integrations | External systems that agents interact with via APIs, MCP servers, or CLI tools. | Jira REST API, JAM MCP, Paper MCP, Playwright, Docker, Git |
Each agent is a Claude instance with a specialized prompt. When the orchestrator spawns an agent, it passes the full context that agent needs: the ticket details, the stack profile, the file paths, and the specific task. Agents don't know about each other — they each do their job and write their output to a report file.
The orchestrator isn't a separate agent — it's the main Claude instance running the command. It's the only one that can interact with the user. It reads the command Markdown, executes the phases, spawns sub-agents, and presents results for user approval.
Sub-agents cannot interact with the user. Only the main Claude instance (the orchestrator) can ask questions, present options, and get approvals. This is a Claude Code constraint, not a design choice — but it's actually a good constraint. It means all user-facing decisions flow through a single point, which makes the pipeline predictable and auditable.
Creates visual mockups in Paper.design using MCP tools. Reads the feature plan, existing code patterns, and any screenshots of the current app. Produces artboards with HTML/CSS designs and writes a design specification.
Tools used: mcp__paper__create_artboard, mcp__paper__write_html, mcp__paper__get_screenshot, mcp__paper__get_jsx
Output: Paper artboards + reports/feature-ui-plan.md
Key constraint: The user MUST review and approve designs before implementation begins. This is a hard gate — no code is written until designs are approved.
Implements UI changes based on the approved design spec and feature plan. Creates components, pages, routes, hooks. Updates navigation. Follows the existing project's patterns and the API contract exactly.
Input: Feature plan + UI design spec + existing codebase patterns
Output: reports/feature-ui-implementation.md
Runs in parallel with: Backend Developer (when scope is full-stack)
Implements API endpoints, database schema changes, middleware. Uses parameterized SQL, proper auth, input validation. Follows the API contract from the feature plan exactly.
Input: Feature plan + existing codebase patterns
Output: reports/feature-backend-implementation.md
Also used for: Fixing issues found by Security Auditor and Quality Engineer in the dev-team loop
Scans every source file for OWASP Top 10 vulnerabilities: injection, broken auth, data exposure, XSS, access control, misconfiguration. Categorizes findings as CRITICAL, WARNING, or INFO.
Output: reports/security-audit.md
Runs in parallel with: Code Quality Engineer
Scans for code smells, bugs, anti-patterns: type safety issues, error handling gaps, performance problems, dead code, API design issues. Same severity categorization.
Output: reports/quality-audit.md
Writes unit tests for every fix. Covers edge cases and security regression scenarios. Runs all tests (existing + new) and iterates until 100% pass. Supports 9 stacks, including Java/JUnit, C#/xUnit, JS/Jest, Python/pytest, Go, Rust, PHP, and Ruby.
Output: reports/test-report.md or reports/unit-test-report.html
Reviews ONLY the git diff — not the entire codebase. Checks new code for SQL injection, XSS, command injection, hardcoded secrets, null refs, race conditions, missing auth. Auto-fixes critical issues.
Output: reports/code-analysis.md
Key distinction from Security Auditor: The Security Auditor scans the whole codebase. The Code Analyst reviews only what changed. One is for audits, the other is for pull request-style review.
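The diff-scoped review can be sketched in a few lines. This is an illustration, not the actual Code Analyst (which is a prompt, not a script); the function names are hypothetical, but the git invocation is the standard way to list changed files.

```python
import subprocess

def changed_files(diff_output: str) -> list[str]:
    """Parse `git diff --name-only` output into a list of paths."""
    return [line.strip() for line in diff_output.splitlines() if line.strip()]

def diff_against(base: str = "main") -> list[str]:
    """Ask git which files changed relative to a base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return changed_files(out.stdout)
```

The point of the separation: a whole-codebase audit and a changed-files review are different tools with roughly a 10x cost difference, so they get different agents.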
Compiles all agent reports into a single, polished HTML report. Dark theme, collapsible sections, severity badges, code snippets, testing checklists. The report is the artifact that gets attached to the Jira ticket.
Input: All reports/*.md files from other agents
Output: reports/master-report.html or reports/feature-report.html or reports/jira-{KEY}-report.html
Agents don't talk to each other directly. They communicate through three mechanisms:
Every agent writes its output to a file in reports/. The orchestrator reads these files and passes relevant content to downstream agents. For example:
- reports/security-audit.md
- reports/quality-audit.md
- reports/fixes-applied.md

This is effectively a message-passing system where the messages are Markdown files. It's simple, inspectable (you can read the files), and robust (files don't disappear if an agent crashes).
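A minimal sketch of that handoff, assuming hypothetical file names: each agent writes Markdown into reports/, and the orchestrator stitches the relevant reports into the next agent's prompt.

```python
from pathlib import Path

def build_fix_prompt(reports_dir: Path) -> str:
    """Compose a downstream agent's prompt from upstream report files."""
    sections = []
    for name in ("security-audit.md", "quality-audit.md"):
        path = reports_dir / name
        if path.exists():  # an agent may have been skipped; tolerate gaps
            sections.append(f"## {name}\n{path.read_text()}")
    return "Fix every CRITICAL finding below.\n\n" + "\n\n".join(sections)
```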
When the orchestrator spawns an agent, it includes all necessary context directly in the prompt. The agent receives: the Jira ticket details, the stack profile, the file paths, the design spec, and whatever else it needs. This is redundant with the file system, but it ensures the agent has everything without needing to read files first.
Agents interact with external systems via tool calls: Jira REST API (via curl), Paper MCP tools, JAM MCP tools, Docker CLI, Git commands, Playwright. The orchestrator handles Jira authentication centrally (credentials are in .env), and agents inherit this through sourcing the same file.
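For concreteness, the ticket fetch the orchestrator performs with curl looks like this in Python. The base URL, email, and token are placeholders standing in for the .env values; the endpoint itself is the standard Jira Cloud REST API v3 issue lookup with basic auth (email + API token).

```python
import base64
import urllib.request

def jira_issue_request(base_url: str, key: str, email: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for a Jira Cloud issue."""
    auth = base64.b64encode(f"{email}:{token}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/rest/api/3/issue/{key}",  # Jira Cloud REST API v3
        headers={"Authorization": f"Basic {auth}", "Accept": "application/json"},
    )
```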
```
User: /jira FO-2847
  │
  ▼
Orchestrator ──── curl ────▶ Jira REST API (fetch ticket)
  │            ◄── ticket JSON ──┘
  │
  ├── Detect JAM links? ──── mcp ────▶ JAM MCP (video analysis)
  │
  ├── Ask user for project paths (AskUserQuestion)
  │
  ├── UI changes? ──── spawn ────▶ UI Designer Agent ──── mcp ────▶ Paper MCP
  │                  ◄── reports/feature-ui-plan.md ──┘
  │
  ├── User approves designs? (AskUserQuestion)
  │
  ├── spawn (parallel) ──▶ Frontend Dev Agent ──▶ reports/feature-ui-implementation.md
  │                  └──▶ Backend Dev Agent ──▶ reports/feature-backend-implementation.md
  │
  ├── spawn ──▶ Code Analyst ──▶ reports/code-analysis.md
  │
  ├── spawn ──▶ Doc Lead ──▶ reports/jira-FO-2847-report.html
  │
  ├── curl ──▶ Jira (upload report + screenshots + comment)
  │
  └── curl ──▶ Jira (transition status + log time)
```
Each command is a Markdown file in .claude/commands/. Some are standalone (single agent, single task), others are orchestrators that spawn multiple agents across phases. Here's the full inventory:
| Command | What It Does | Phases |
|---|---|---|
| /jira FO-2847 | Full ticket-to-resolution pipeline. Fetches the ticket, detects JAM recordings, classifies scope, asks for project paths, designs in Paper (if UI), implements, reviews code, generates a report, and updates Jira (comment + attachments + status + time). | 5 phases |
| /jira sprint | Batch mode. Fetches available Scrum teams dynamically, the user picks a team, it fetches the sprint tickets, the user selects which to process, and each runs through the full pipeline sequentially with continue/skip/stop controls. | 5 phases per ticket |
| /jira teams | Lists all available Scrum teams from Jira. Instant lookup, no processing. | Instant |
| /jam {url} | Analyzes JAM bug recordings via MCP. Fetches video analysis, console logs, network requests, and user events. Accepts a URL, JAM ID, or Jira ticket key. | MCP-based |
| /tempo | Time logging: /tempo addTime FO-2847 2h "Bug fix". Also supports getTime, getWeek, and getMonth for viewing logged time. | Instant |
The Jira pipeline always asks the user which project directories to work in. It never guesses. This was a deliberate decision after the AI incorrectly routed work to the wrong codebase. The current working directory (d:\Kunder\247\AIComp) is the orchestration project — it contains the commands and configs, not the actual code. The real codebases are at separate paths like D:\Kunder\247\Finago\control-backend-api.
| Command | What It Does | Agents Involved |
|---|---|---|
| /new-feature | 6-phase pipeline: plan the feature, capture existing UI screenshots, design in Paper (user approval gate), parallel frontend + backend implementation, code analysis, master report. Optional handoff to /full-pipeline for testing. | Orchestrator, UI Designer, Frontend Dev, Backend Dev, Code Analyst, Doc Lead |
| /code-analysis | Reviews only the git diff. Checks for security issues, logic errors, and quality problems. Auto-fixes critical issues. | Code Analyst (single agent) |
| /full-pipeline | End-to-end run: the /dev-team loop plus Playwright, Docker build/deploy/test, and a master report. | 9+ agents across all phases |

| Command | What It Does | Agents Involved |
|---|---|---|
| /dev-team | Iterative quality loop: parallel security + quality scan, fix all issues, write tests, re-scan. Repeats until zero new findings; typically 2-3 rounds. | Security Auditor, Quality Engineer, Backend Developer, Test Engineer, Doc Lead |
| /security-audit | Standalone OWASP Top 10 scan. | Security Auditor |
| /quality-audit | Standalone code quality scan. | Quality Engineer |
| /fix-all | Fix all findings from audit reports. | Backend Developer |
| /test-all | Write tests for all fixes and run them. | Test Engineer |
| /master-report | Compile all reports into the master HTML report. | Documentation Lead |
| Command | What It Does | Modes |
|---|---|---|
| /unit-test | Maps existing test coverage, identifies gaps, creates unit tests, then runs and fixes them iteratively (up to 5 rounds). Supports 9 language stacks. | * (full scan), {file} (single file), --fix-ignored (rehabilitate disabled tests) |
| /playwright-test | Runs Playwright E2E browser tests. Analyzes failures, distinguishes app bugs from test bugs, writes missing tests, generates an HTML report. | Single mode |
| Command | What It Does | Modes |
|---|---|---|
| /deps | Scans dependencies for CVEs (with exploitability assessment), outdated packages (staleness score), and license risks (GPL/copyleft detection). Produces a 0-100 health score with an A-F letter grade. Can auto-fix safe updates or export Dependabot/Renovate configs. | Full audit, --vuln-only, --outdated, --license |
| Command | What It Does |
|---|---|
/docker-build | Build Docker image, validate (size, non-root, health check, no secrets), security scan. |
/docker-deploy | Deploy with docker compose, wait for health check, test endpoints, collect container info. |
/docker-test | Full integration test suite against live container. |
/docker-teardown | Gracefully tear down containers, report freed resources. |
/full-pipeline | All of the above: dev-team loop + Playwright + Docker build/deploy/test + master report. |
| Command | What It Does |
|---|---|
/create "description" | Context-aware feature creator. Reads ProjectType from .env, adapts role (Game/App/SaaS Dev), designs in Paper, generates HTML plan with mockup screenshots (auto-opens in browser), implements, verifies with Playwright. |
/create-project "description" | Full project creator from scratch. Asks clarifying questions, designs architecture + UI in Paper, generates HTML plan, builds with full agent team (backend + frontend + security + tests + Docker), delivers running application. |
/bug "description" | Context-aware bug fixer. Analyzes pasted screenshots, diagnoses root cause, applies minimum fix, verifies with Playwright, saves timestamped report. |
/verify | E2E verification with Playwright. Uses project profile (.claude/project-profile.json) for login and navigation. Takes before/after screenshots, generates self-contained HTML report with clickable lightbox. Auto-checks profile completeness before running. |
/changelog | Reads reports from .claude/unprocessed_reports/ (created by /create and /bug), generates beautiful HTML changelog with features (blue) and bug fixes (amber). Moves processed reports to prevent double-counting. |
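The /changelog bookkeeping — read each unprocessed report, then move it aside so the next run doesn't count it twice — can be sketched as follows. The function and directory names are hypothetical; only the read-then-archive pattern reflects what the command does.

```python
import shutil
from pathlib import Path

def collect_and_archive(unprocessed: Path, processed: Path) -> list[str]:
    """Read every pending report, then move it so it is counted exactly once."""
    processed.mkdir(parents=True, exist_ok=True)
    entries = []
    for report in sorted(unprocessed.glob("*.md")):
        entries.append(report.read_text())
        shutil.move(str(report), str(processed / report.name))
    return entries
```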
| Command | What It Does |
|---|---|
/git sync {branch} | Merge latest from a branch into current. Pre-flight checks, fetch, merge. With --fix-merge-errors: AI-powered per-file conflict resolution. With --all: sync all projects. |
/git status | Quick branch overview: ahead/behind, uncommitted changes, stashes, last commit. |
| Command | What It Does |
|---|---|
/repo-setup {url} | Clone, analyze stack, install deps, configure env, build, test, start. Produces 12-section HTML setup guide. Detects dependent repos and offers to set them up too. |
/repo-setup {org_url} | Organization scan: fetches all repos via GitHub API, maps relationships (depends-on, frontend-for, shared-library), calculates startup order, generates architecture documentation with CSS diagrams. |
/repo-setup --auto-setup | Auto-clone, install, build, test all repos without prompting. Compatible with --search filter. |
- docker compose build — validate image size, non-root user, health check, no secrets in layers.
- docker compose up -d — health check, test endpoints, collect container ID, IP, ports, network.

| System | Protocol | What It's Used For |
|---|---|---|
| Jira | REST API (curl) | Fetch tickets, post comments, upload attachments, transition status, log time. Credentials in .env. |
| JAM (jam.dev) | MCP Server | Analyze bug recordings: video analysis, console logs, network requests, user events, screenshots. Auto-detected in Jira ticket descriptions and comments. |
| Paper (paper.design) | MCP Server | Create UI mockups: artboards, HTML/CSS designs, screenshots, JSX export. Used in both /jira (for UI-related tickets) and /new-feature. |
| Docker | CLI | Build images, deploy containers, run integration tests, teardown. |
| Playwright | CLI (npx) | Browser-based E2E testing, screenshot capture for before/after comparisons. |
| Git | CLI | Diff detection for code analysis, branch management. |
Decision: Agents communicate through reports/*.md files, not through shared memory or context.
Why: Sub-agents in Claude Code start with a fresh context. They have no memory of the parent or other agents. Files are the natural handoff mechanism — they're persistent, inspectable, and don't depend on context window management. An agent that writes to reports/security-audit.md produces an artifact that any other agent (or human) can read.
Decision: The pipeline never auto-detects which codebase to work in. It always asks.
Why: The AI incorrectly assumed a ticket belonged to gateway-backend when it was actually control-backend-api. In a multi-project environment, the cost of working in the wrong codebase is catastrophic — you're modifying the wrong code. Asking takes 5 seconds. Fixing a wrong-codebase mistake takes much longer.
Decision: Two mandatory approval points: (1) the feature plan, and (2) the UI design. No code is written until both are approved.
Why: Agent work is cheap to redo at the planning stage but expensive at the implementation stage. A wrong plan means multiple agents building the wrong thing. Catching misunderstandings at the plan/design stage saves massive amounts of token spend and time.
Decision: Security + Quality scans run in parallel. Frontend + Backend implementation run in parallel. But design must complete before implementation, and implementation must complete before code review.
Why: True parallelism saves time (two agents working simultaneously is faster than two agents working sequentially). But dependencies must be respected: you can't implement a UI that hasn't been designed, and you can't review code that hasn't been written.
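The dependency rule reads naturally as a fork-join. This toy sketch uses plain functions standing in for spawned Claude agents (all names are illustrative): design runs alone, frontend and backend run in parallel on its output, and review waits for both.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(design, frontend, backend, review):
    spec = design()                       # sequential: everything depends on the design
    with ThreadPoolExecutor(max_workers=2) as pool:
        fe = pool.submit(frontend, spec)  # parallel pair: independent of each other
        be = pool.submit(backend, spec)
        results = [fe.result(), be.result()]
    return review(results)                # sequential: needs both implementations
```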
Decision: Commands like /unit-test and /deps detect the project's tech stack at runtime and adapt their behavior. The same command works for Java/Maven, C#/.NET, JavaScript/npm, Python/pip, Go, Rust, PHP, and Ruby.
Why: We work across multiple stacks (Java backend, React frontend, .NET APIs). Having separate commands per stack would be unmaintainable. Runtime detection lets one command serve all projects.
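Runtime stack detection boils down to sniffing well-known manifest files in the project root. The real commands do this in-prompt; the mapping below is a simplified assumption, not their actual detection logic.

```python
from pathlib import Path

# Assumed manifest-to-stack mapping (simplified).
MANIFESTS = {
    "pom.xml": "java/maven",
    "package.json": "javascript/npm",
    "requirements.txt": "python/pip",
    "go.mod": "go",
    "Cargo.toml": "rust",
    "composer.json": "php",
    "Gemfile": "ruby",
}

def detect_stack(project_dir: Path) -> str:
    if any(project_dir.glob("*.csproj")):  # .NET projects are per-file, not fixed-name
        return "csharp/dotnet"
    for manifest, stack in MANIFESTS.items():
        if (project_dir / manifest).exists():
            return stack
    return "unknown"
```

Detecting "java/maven" then lets the prompt include JUnit patterns rather than Jest, which is exactly the token saving the cost table above describes.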
Decision: Every pipeline generates an HTML report, regardless of whether the user asked for one.
Why: Reports serve three purposes: (1) they're the artifact that gets uploaded to Jira, (2) they're the communication mechanism between agents, and (3) they're the audit trail. If something goes wrong, the report tells you what each agent did and found.
Used in /dev-team and /unit-test. Scan, fix, verify, re-scan. Continue until zero findings. This is the most reliable pattern because it's self-correcting: if a fix introduces a new issue, the next scan catches it.
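The loop, with the safety cap the cost table mentions, is a few lines of control flow. This is a sketch of the pattern, with scan and fix standing in for agent runs:

```python
def loop_until_clean(scan, fix, max_rounds: int = 5) -> int:
    """Scan, fix, re-scan until a scan reports zero findings. Returns rounds used."""
    for round_no in range(1, max_rounds + 1):
        findings = scan()
        if not findings:
            return round_no       # clean: the last scan found nothing
        fix(findings)
    return max_rounds             # cap reached; remaining findings go in the report
```

The cap matters: without it, an unfixable finding turns into an infinite token burn.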
Used in /jira and /new-feature. Classify the work as UI-only, backend-only, or full-stack, then branch the pipeline accordingly. UI-only skips backend agents. Backend-only skips design and frontend. Full-stack runs everything.
Launch two or more agents simultaneously, wait for all to complete, then merge their outputs into the next step. Used for Security + Quality scanning, and for Frontend + Backend implementation.
Present results to the user with explicit options (Accept / Change / Reject). Block pipeline progression until the user responds. Used for plan approval, design review, and Jira update decisions.
Commands accept either a full path or a project name (e.g., control-backend-api). The system resolves names to paths based on known project directories. If resolution fails, it asks the user. This makes commands feel like CLI tools — short, memorable invocations.
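A hypothetical resolver for that convention: a known-projects registry turns short names into full paths, and unknown names fall through to None so the orchestrator asks the user instead of guessing (per the "always ask" rule). The registry contents here are placeholders.

```python
from pathlib import Path
from typing import Optional

# Assumed registry; the real paths live in the user's configuration.
KNOWN_PROJECTS = {
    "control-backend-api": Path(r"D:\Kunder\247\Finago\control-backend-api"),
}

def resolve_project(name_or_path: str) -> Optional[Path]:
    candidate = Path(name_or_path)
    if candidate.is_absolute():
        return candidate                       # full path given: use as-is
    return KNOWN_PROJECTS.get(name_or_path)    # None means: ask the user
```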
This caused the AI to work in the wrong codebase. Never again. Always ask.
A single agent trying to scan, fix, test, and report in one go produces worse results than specialized agents. The context window gets polluted with too many concerns, and the agent loses focus. Specialization works.
Even a one-line color change benefits from a Paper mockup. The user seeing the change before it's implemented catches misunderstandings that are trivial to fix at design time and expensive to fix after implementation.
Newly created tests frequently fail on the first run due to mock misconfiguration, wrong assertions, or compilation errors. The fix loop (run, analyze failures, fix, re-run, up to 5 iterations) is essential. Without it, you'd hand the user a pile of broken tests.
Agent orchestration is expensive in tokens. Every agent spawn is a fresh context. Every tool call costs tokens. Every report file that gets read costs tokens. Here's what we've learned about managing costs:
| Strategy | Impact | Example |
|---|---|---|
| Scope-based branching | High | Backend-only tickets skip UI Designer + Frontend Dev + Paper MCP — saves ~40% of tokens |
| Parallel execution | Medium (time) | Security + Quality scans run simultaneously — wall clock time cut in half, same token cost |
| Targeted code review | High | Code Analyst reviews only the git diff, not the entire codebase — 10x fewer tokens than a full scan |
| Stack detection | Medium | Detecting "Java + Maven" means the agent prompt includes JUnit patterns, not Jest. No wasted exploration. |
| Fix loop cap | Safety | Unit test fix loop caps at 5 iterations. Prevents infinite token burn on unfixable tests. |
In a traditional software project, you write code and the code runs. In this system, you write prompts and the prompts run. The quality of the output is directly proportional to the quality of the prompt. A vague prompt produces vague results. A prompt that specifies exact file paths, exact output formats, and exact decision criteria produces reliable, reproducible results. We treat our .claude/commands/ Markdown files with the same rigor as production code.
A human developer can be told "fix the bug" and figure out the rest. An agent needs: which project directory, what language, what test framework, what files to read first, what patterns to follow, where to write the output, and what to do when it encounters an edge case. The more structure you provide, the better the output.
We initially worried about agents writing to the wrong files, overwriting each other's output, or producing incompatible formats. In practice, this almost never happens. Agents are good at following naming conventions when you tell them explicitly what to write and where.
Every user approval point adds 30-60 seconds of human time. But each gate prevents 5-15 minutes of wasted agent work when the plan or design is wrong. The math is clear: always gate before expensive operations.
Single-pass pipelines (scan once, fix once, done) miss things. The loop pattern (scan, fix, re-scan, repeat until clean) catches issues introduced by fixes, previously masked issues, and interaction effects between changes. It consistently produces cleaner output than single-pass.
Before JAM integration, bug tickets had text descriptions and maybe a screenshot. With JAM MCP, the agent can analyze the actual video recording: see what the user clicked, read the console errors, check the network requests. It turns a vague bug report into a structured, actionable analysis. The auto-detection of JAM links in Jira tickets means this happens automatically — no extra steps.
| Priority | Feature | Why |
|---|---|---|
| High | /pr — Smart PR Creator | Closes the loop: Jira ticket → implement → PR → Jira update. Currently the pipeline stops before creating a PR; this is the missing piece. |
| Medium | Git integration in /jira | Auto-create feature branches per ticket, auto-commit with the ticket reference, link PRs to Jira. |
| Medium | Slack notifications | Post to a team channel when a ticket is resolved, with a link to the report and key stats. |
| Low | CI/CD integration | Trigger builds after implementation, verify deployment, link build status to Jira. |
| Low | Sprint retrospective (/retro) | Analyze the completed sprint: tickets resolved, time logged, code quality metrics, churn areas. Auto-generate a retro report. |
| Command | Category | Agents | Output |
|---|---|---|---|
/jira {key} | Project Mgmt | Up to 6 | reports/jira-{KEY}-report.html |
/jira sprint | Project Mgmt | Up to 6/ticket | Sequential processing |
/jira teams | Project Mgmt | 0 | Team list (terminal) |
/jam {url} | Bug Analysis | 0 (MCP) | JAM analysis (terminal) |
/tempo | Time Tracking | 0 | Jira worklog |
/new-feature | Feature Dev | 4-5 | reports/feature-report.html |
/dev-team | Code Quality | 5 (iterative) | reports/master-report.html |
/full-pipeline | End-to-End | 9+ | reports/master-report.html + Docker |
/unit-test | Testing | 1-3 | reports/unit-test-report.html |
/playwright-test | Testing | 1 | reports/playwright-report/ |
/deps | Security | 1-2 | reports/deps-audit-report.html |
/security-audit | Security | 1 | reports/security-audit.md |
/quality-audit | Quality | 1 | reports/quality-audit.md |
/fix-all | Implementation | 1 | reports/fixes-applied.md |
/test-all | Testing | 1 | reports/test-report.md |
/code-analysis | Review | 1 | reports/code-analysis.md |
/master-report | Documentation | 1 | reports/master-report.html |
/docker-build | DevOps | 1 | reports/docker-build-report.md |
/docker-deploy | DevOps | 1 | reports/docker-deploy-report.md |
/docker-test | DevOps | 1 | reports/docker-integration-test-report.md |
/docker-teardown | DevOps | 1 | reports/docker-teardown-report.md |
/create "desc" | Universal | 0-1 | reports/feature-plan.html + .claude/unprocessed_reports/ |
/create-project "desc" | Universal | 5+ | reports/project-plan.html + reports/project-delivery-report.html |
/bug "desc" | Universal | 0-1 | .claude/unprocessed_reports/ |
/verify | Testing | 0 | reports/verification-report.html |
/changelog | Documentation | 0 | reports/changelog.html |
/git sync {branch} | Git Ops | 0-1 | reports/git-sync-report.md |
/repo-setup {url} | Onboarding | 1-3 | reports/repo-setup-guide.html |
/report | Reporting | 0 | reports/change-report.html |
/impact-scan "desc" | Analysis | 0 | reports/impact-scan-report.html |
Everything described in this document runs on Claude Code (Opus 4.6) with no custom infrastructure. The entire system is 28 Markdown files in .claude/commands/, an .env file with Jira credentials, and two MCP server connections (JAM + Paper). There is no server, no database, no deployment pipeline. If you have Claude Code, you can copy the Markdown files and have the same team.