Series: Overview | Process & Methodology (you are here) | Engineering Thinking | Meta-Skills | Glossary
Contents
- Who This Is For
- The Filter: What to Learn and What to Skip
- 1. Decomposition
- 2. Iteration Over Specification
- 3. Scope Management / MVP
- 4. Checkpoint Discipline
- 5. Acceptance Criteria
- 6. Fixed Time, Variable Scope
- 7. Spikes
- 8. Vertical Slicing
- 9. WIP Limits
- What to Skip (and Why)
- When to Re-read This
Who This Is For
You’ve started using AI coding tools like Claude Code, Cursor, or Copilot. You can get things built. But you’ve noticed that software engineers get dramatically better results, and the gap isn’t about typing code faster.
The gap is a set of mental models that SWEs absorb over years of practice. This guide extracts the ones that transfer directly to AI orchestration and skips the rest.
The Filter: What to Learn and What to Skip
Most software engineering methodology exists to coordinate teams of engineers: Scrum, SAFe, sprint planning, story points, estimation, CI/CD pipelines, code review checklists. If you’re working solo with AI, you don’t need any of that – it’s overhead for a problem you don’t have.
What transfers is the individual-craft layer: how to structure work, manage complexity, evaluate output, and decide what to build.
One nuance: as AI tools mature, you’ll find yourself orchestrating multiple AI agents simultaneously – one researching, one coding, one reviewing. At that point, some team-coordination concepts become relevant again, reframed. You’re not managing a team of people, but you are managing parallel workers with different contexts and capabilities. The vocabulary of PRs, branches, and code review starts to matter – not as ceremony, but as concepts for keeping parallel work from colliding.
This guide teaches the 9 highest-leverage concepts – the ones that change how you structure every AI session.
The Concepts
1. Decomposition
The principle: Break a vague goal into a sequence of small, testable steps. Each step should produce a verifiable result before you move to the next.
The SWE vocabulary: SWEs call individual steps “tasks” or “tickets.” A body of work too big to complete in one sitting is called an “epic” and needs to be broken down further. The act of breaking work down is “decomposition” or “task breakdown.” When a task can’t be broken down further, it’s “atomic.”
Why it matters for AI orchestration: AI tools work dramatically better with specific, bounded requests than with vague, ambitious ones. “Build me a dashboard that shows sales metrics” will produce mediocre results. “Read this CSV file and print the column names” → verify → “Compute the monthly averages for the ‘revenue’ column” → verify → “Create a bar chart of those monthly averages” → verify will produce exactly what you want, because you catch mistakes at each step.
The failure mode it prevents: The “200 lines of wrong” problem. You ask for something big, the AI produces a wall of code, something is broken somewhere in the middle, and you can’t tell where. You’ve lost more time than if you’d gone step by step.
In practice: Claude Code calls this “Explore first, then plan, then code” and has a dedicated Plan Mode for it. Cursor has a similar “think first” approach. The tool doesn’t matter – the habit does: understand the problem before solving it.
Example: You want to build a personal finance tracker. Instead of describing the whole app, start with: “Create a Python script that reads my bank CSV export and prints each transaction’s date, description, and amount.” Run it. Verify the output looks right. Then: “Add a function that categorises each transaction based on keywords in the description.” Run it. Verify. Build up.
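The first two steps of that build-up can be sketched in a few lines. This is a sketch under assumptions – the column names (“Date”, “Description”, “Amount”) and the keyword rules are placeholders you’d match to your bank’s actual export:

```python
import csv

def print_transactions(path):
    """Step 1: read the bank CSV export and print each transaction.

    Run this and eyeball the output before prompting for the next step.
    """
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            print(row["Date"], row["Description"], row["Amount"])

def categorise(description, rules):
    """Step 2: return the first category whose keyword appears
    in the transaction description; fall back to 'uncategorised'."""
    desc = description.lower()
    for keyword, category in rules.items():
        if keyword in desc:
            return category
    return "uncategorised"
```

The point isn’t the code itself: each function is small enough to run and verify on its own before you ask for the next piece.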
One more thing – start in the middle. Don’t start with the easy, boring parts. Start with the piece that is core, novel, and risky. If that part doesn’t work, nothing else matters. When using AI, the temptation is to scaffold the boring setup first. Instead, prove the hard part works, then fill in the rest.
Try this: Before your next AI session, write down 3-5 sub-steps for what you want to accomplish. Then reorder them: put the riskiest or most uncertain step first. Prompt for one at a time.
2. Iteration Over Specification
The principle: Work in tight loops – small change, verify, next change – rather than writing a detailed specification upfront and hoping the AI gets it all right. (Concept #1 is about how to plan – breaking a goal into steps. This concept is about how to execute – the rhythm of build-verify-build, and what to do when something goes wrong mid-session.)
The SWE vocabulary: This is the core of “agile” development (as opposed to “waterfall,” where you specify everything upfront and build it all at once). Each loop is an “iteration.” The verify step is a “feedback loop.” SWEs say “ship early, ship often” or “fail fast” – the point is to discover problems quickly, not to avoid them entirely.
Why it matters for AI orchestration: AI makes mistakes. That’s not a bug, it’s a fact you design around. Small iterations mean you catch mistakes immediately, when they’re cheap to fix. Giant prompts mean the mistake is buried somewhere in a large output and you can’t tell where – or worse, you don’t notice it at all.
The failure mode it prevents: The “specification trap.” You spend 30 minutes writing a detailed prompt describing everything you want. The AI misunderstands one assumption on line 3, and everything downstream is wrong. You’ve wasted the 30 minutes of writing AND the time debugging.
The anti-pattern: “Repeated corrections” – when you keep trying to fix something by adding more corrections to the same conversation, you’re polluting the AI’s memory with failed approaches. After two failed corrections, it’s better to start fresh with a clearer prompt. This is the iteration principle applied to conversations: don’t keep patching, restart the loop.
The right heuristic: One task per prompt, regardless of context length. A 50-line prompt that gives rich context for a single decision is fine. A 5-line prompt that asks for three unrelated things is too much.
Example: Instead of “Build a contact form with email validation, a database backend, error handling, and a confirmation email,” try: “Build a basic HTML form with name and email fields that submits to a /contact endpoint.” Verify it works. Then: “Add email format validation.” Verify. Then: “Store submissions in a SQLite database.” Verify. Each step is small enough that if something breaks, you know exactly what caused it.
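As a sketch of what one such small step looks like in isolation, here is the email-format validation. The regex is a deliberate simplification (“looks like name@domain.tld”), not a full RFC 5322 validator – which is exactly the scoping the step-by-step approach encourages:

```python
import re

# Deliberately simple: local part, "@", domain containing a dot,
# and no whitespace anywhere. Good enough for a personal contact form.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_valid_email(value: str) -> bool:
    """Return True if the value matches the simplified email pattern."""
    return bool(EMAIL_RE.match(value))
```

Because the step is this small, a failure here can only mean one thing: the validation logic, not the form, not the database.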
Try this: In your next AI session, notice when you’re about to write a prompt longer than 3-4 sentences. Ask yourself: “Am I describing one thing or several?” If several, split them.
3. Scope Management / MVP
The principle: Always ask: “What’s the smallest thing I can build that tells me whether this approach works?” Build that first. Expand only after it’s validated.
The SWE vocabulary: “MVP” stands for Minimum Viable Product – the smallest version that delivers value and provides learning. “Scope” is what’s included in a project. “Scope creep” is when a project gradually expands beyond its original boundaries (“while we’re at it, let’s also add…”). SWEs “cut scope” to ship on time – they remove features rather than extending deadlines. Some practitioners call this “scope hammering,” for the repeated force it takes to bang the scope down until it fits the time box. Useful questions: “Could we ship without this? What happens if we don’t do this? Is this a must-have or a nice-to-have?”
Why it matters for AI orchestration: AI makes ambitious projects feel deceptively easy to start. You can scaffold a whole app in minutes. But complexity compounds. Every feature interacts with every other feature. An app with 3 features has 3 potential interaction points; an app with 10 features has 45. The AI doesn’t warn you when you’re crossing the complexity threshold where things start breaking in unexpected ways.
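Those interaction counts are just combinatorics: with n features there are n·(n−1)/2 potential feature pairs. A one-liner makes the growth visible:

```python
from math import comb

def interaction_points(features: int) -> int:
    """Potential pairwise interactions between features: n choose 2."""
    return comb(features, 2)

# 3 features -> 3 pairs; 10 features -> 45 pairs
print(interaction_points(3), interaction_points(10))
```

Doubling the feature count roughly quadruples the ways features can interfere with each other, which is why complexity compounds rather than adds.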
The failure mode it prevents: The “90% done, 90% to go” trap. You build most of an app quickly, then spend 5x longer on the last 10% because everything interacts with everything else. If you’d built the MVP first, you’d have known whether the approach was viable before investing all that time.
The anti-pattern: the “kitchen sink session” – mixing too many unrelated tasks into one conversation. Each unrelated concern dilutes the AI’s focus and fills its working memory with noise.
The calibration data (from early vibe-coding reports): Budget 60% of your time for testing and fixing, not building. If you think a project will take 10 hours, you’ll spend 4 hours building and 6 hours debugging. Knowing this changes how you scope: if you only have 2 hours, you can afford to build about 50 minutes’ worth of features.
Example: You want a personal CRM. The MVP isn’t “a full CRM with contacts, notes, reminders, email integration, and a dashboard.” The MVP is “a spreadsheet-like page that lists people’s names and the last time I contacted them.” Build that. Use it for a week. Then decide what to add.
Try this: Before your next project, write down every feature you want. Then cross out half of them. Then cross out half again. What’s left is your MVP. If you can build that in one sitting and it’s useful on its own, you’ve scoped it right.
4. Checkpoint Discipline
The principle: Save working states frequently. Never be more than a few minutes of work away from a version that works. When the AI goes off the rails, rewind to the last good state instead of trying to fix the mess.
The SWE vocabulary: This is what “version control” is for. Git is the tool SWEs use – it tracks every change to every file, so you can always go back. A commit is a saved snapshot (“checkpoint”). A branch is a parallel version where you can try something risky without affecting the main version. A pull request (PR) is a request that the maintainer of the main branch “pull” your changes in – it’s how teams review changes before they become permanent. Rebase and merge are two ways to combine branches.
You don’t need to learn git commands (the AI will handle that). But understanding the concepts – snapshots, branches, rollback – changes how you think about risk. Every experiment is safe if you can undo it.
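Git implements all of this properly (diffs, history, branches). The toy sketch below only illustrates the mental model – snapshot a working directory, restore it on demand – and is not how you should actually manage checkpoints:

```python
import shutil
import tempfile
from pathlib import Path

class Checkpoints:
    """Toy illustration of the commit/rewind mental model."""

    def __init__(self, workdir):
        self.workdir = Path(workdir)
        # Snapshots live outside the working directory.
        self.store = Path(tempfile.mkdtemp(prefix="checkpoints-"))
        self.count = 0

    def commit(self) -> int:
        """Save a full snapshot of the working directory; return its id."""
        self.count += 1
        shutil.copytree(self.workdir, self.store / f"cp{self.count}")
        return self.count

    def rewind(self, n: int) -> None:
        """Throw away the current state and restore snapshot n."""
        shutil.rmtree(self.workdir)
        shutil.copytree(self.store / f"cp{n}", self.workdir)
```

The asymmetry is the point: committing is cheap and constant, so rewinding is always cheaper than repairing.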
Why it matters for AI orchestration: AI is unreliable in ways that are hard to predict. It might work perfectly for 20 minutes, then make a change that breaks everything. Without checkpoints, you’re stuck trying to manually undo the damage. With checkpoints, you rewind 30 seconds and try a different approach. This makes you bolder – you can say “try this risky thing” knowing the cost of failure is near zero.
In practice: Most AI coding tools have some form of checkpoint or undo. Claude Code creates a checkpoint on every action; Cursor has similar rollback. The mantra: “Make it work, make it right, make it fast” – in that order, and often you stop after step 1.
The failure mode it prevents: The “dig yourself deeper” problem. The AI makes a mistake, you try to fix it, the fix creates a new problem, you try to fix that – three iterations later, the code is a mess and you can’t remember what the working version looked like. Rewind, don’t repair.
Example: You’re building a script that processes photos. It works for resizing. You ask the AI to add watermarking, and suddenly nothing works – not even resizing. Instead of debugging the watermark code, rewind to the version where resizing worked. Then add watermarking from a clean state, with a more specific prompt.
Try this: Before asking the AI to make a risky change, say “Let’s commit what we have first” (or in Claude Code, just note that a checkpoint exists). If the change breaks things, rewind instead of debugging.
5. Acceptance Criteria
The principle: Before asking the AI to build something, write down how you’ll know it works. “I’ll know this is done when ___.”
The SWE vocabulary: “Acceptance criteria” are the conditions that must be true for work to be considered complete. In agile, a “user story” follows the format: “As a [who], I want [what], so that [why].” The acceptance criteria are the testable conditions attached to that story. “Definition of done” is the team’s checklist for what “complete” means (tests pass, documentation updated, reviewed by someone).
A related concept is TDD (Test-Driven Development) – writing the test before writing the code. You define what “correct” looks like, then build until the test passes. You don’t need to write literal test code, but the principle – define success before building – transfers directly.
Why it matters for AI orchestration: Giving the AI a way to verify its own work is the single highest-leverage thing you can do. When you provide clear success criteria, the AI can self-check. When you don’t, it guesses what “done” means, and its guess often diverges from yours.
The failure mode it prevents: The “it looks right” trap. The AI produces something that seems to work on first glance, but you haven’t defined what “working” means, so you don’t test the edge cases. You discover the bugs later, when they’re harder to fix and you’ve lost the context.
The anti-pattern: “Trust-then-verify gap” – shipping without verification. The fix is simple: define verification before implementation.
Example: Before asking the AI to build an email parser, write: “I’ll know this works when: (1) it extracts the sender, subject, and date from a .eml file, (2) it handles emails with no subject line without crashing, (3) the output is a JSON object I can pipe to another tool.” Now you have three testable conditions, and you can verify each one.
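Those three criteria translate almost directly into runnable checks. The sketch below uses Python’s standard `email` module; `parse_eml` and its exact output shape are assumptions for illustration, not a prescribed design:

```python
import json
from email import message_from_string

def parse_eml(raw: str) -> str:
    """Hypothetical parser shaped by the acceptance criteria above."""
    msg = message_from_string(raw)
    return json.dumps({
        "sender": msg["From"],            # criterion 1: extract the sender...
        "subject": msg["Subject"] or "",  # criterion 2: no subject -> no crash
        "date": msg["Date"],              # criterion 1: ...and the date
    })                                    # criterion 3: JSON, pipeable to another tool
```

Each criterion becomes an assertion you (or the AI) can re-run after every change, which is what makes them criteria rather than hopes.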
Try this: Complete this sentence before your next prompt: “I’ll know this works when ___.” If you can’t, you’re not ready to build yet – you’re still figuring out what you want.
6. Fixed Time, Variable Scope
The principle: Instead of asking “how long will this take?”, ask “how much time is this worth?” Set a time budget, then adjust your ambition to fit within it.
The SWE vocabulary: This comes from Basecamp’s Shape Up methodology. They call the time budget an “appetite” – how much time you’re willing to spend, not an estimate of how long it will take. The key constraint is a circuit breaker: if it’s not working within the time box, you stop and re-evaluate the approach, rather than extending the deadline. “Fixed time, variable scope” is the opposite of how most people work (“fixed scope, variable time” – which is how projects run over budget).
Why it matters for AI orchestration: AI makes it dangerously easy to keep going. “Just one more feature,” “let me fix this one more thing.” Without a time box, you can burn an entire afternoon on something that should have taken 30 minutes – or that should have been abandoned after 30 minutes. Your time has a dollar value. Every hour spent polishing a side project is an hour not spent on something higher-leverage.
The failure mode it prevents: The “sunk cost spiral.” You’ve spent 2 hours and it’s 80% working. Surely another 30 minutes will finish it? But the last 20% takes as long as the first 80% (this is almost always true in software). If you’d set a 2-hour appetite, you’d either ship the 80% version or decide the remaining 20% isn’t worth the time.
The mindset shift: Shape Up (Ch.14) says: when evaluating your work, compare down to baseline, not up to ideal. Don’t ask “is this perfect?” Ask “is this better than what exists today?” There’s always a better version. But if what you’ve built in 1 hour solves a problem that previously took 30 minutes of manual work every week, ship it. This is especially important with AI – you can always ask for one more improvement, but the question is whether the current version already delivers value.
Example: You want to automate your expense reporting. Set an appetite: “I’ll spend 1 hour on this.” In that hour, you might get a script that reads receipts from a folder and generates a CSV. That’s useful. You might not get PDF receipt parsing, multi-currency support, or email integration – and that’s fine. Ship the 1-hour version and use it. If it’s valuable enough, invest more time later.
Try this: Before your next AI project, write down your appetite: “I’m willing to spend ___ on this.” Set a timer. When it goes off, assess: is this working? If yes, commit and stop. If no, stop and reconsider the approach.
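Any kitchen timer works; if you prefer to keep the budget inside the session itself, a minimal helper is enough. A sketch, not a prescribed tool:

```python
import time

class Appetite:
    """A fixed time budget for a session.

    When it runs out, the question is not "can I finish?" but
    "is this working, and is more time still worth it?"
    """

    def __init__(self, minutes: float):
        self.deadline = time.monotonic() + minutes * 60

    def remaining(self) -> float:
        """Seconds left in the budget (never negative)."""
        return max(0.0, self.deadline - time.monotonic())

    def spent(self) -> bool:
        return self.remaining() == 0.0
```

Using `time.monotonic()` rather than wall-clock time means the budget can’t be confused by clock changes mid-session.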
Further reading: Shape Up by Basecamp โ free at basecamp.com/shapeup. Short, practical, and the best single book on product thinking for builders.
7. Spikes
The principle: When you don’t know if an approach will work, don’t commit to building it. Instead, run a small, time-boxed investigation first. If the spike fails, you’ve lost 30 minutes, not 3 days.
The SWE vocabulary: The term comes from Extreme Programming (XP). A “spike” is a short, throwaway experiment to answer a specific question: “Can this library do what I need?” “Is this API fast enough?” “Will this approach work on my data?” Spikes are estimated, executed, and demonstrated like any other task – they produce knowledge, not shippable code. The output is a decision: proceed, pivot, or abandon.
Why it matters for AI orchestration: AI makes it easy to jump straight into building. You describe what you want, the AI starts producing code, and 45 minutes later you discover the entire approach is unworkable – the API doesn’t support what you need, the library can’t handle your file format, the data is shaped differently than you assumed. A spike catches this in 10 minutes.
The failure mode it prevents: The “sunk cost build.” You invest hours into an approach before discovering a fundamental blocker. Because you’ve already invested the time, you try to work around the blocker instead of choosing a different approach. The workaround creates complexity. The complexity creates bugs. A spike at the start would have surfaced the blocker before any of that investment.
The key discipline: Spikes are time-boxed and the output is a decision, not a product. “I’ll spend 30 minutes determining if this approach is feasible” is a spike. “I’ll start building and see how it goes” is not – that’s just building without a plan.
Example: You want to build a tool that reads handwritten notes from photos and converts them to text. Before building the whole pipeline, spike the OCR step: “Take this one photo of my handwriting and try three different OCR approaches. Which one produces usable output?” If none of them work on your handwriting, you’ve saved yourself from building an elaborate pipeline around a core capability that doesn’t exist.
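The shape of a spike is the same in any domain: try each candidate on one real sample, record what happened, decide. A generic harness might look like the sketch below, where the experiment functions stand in for, say, three OCR libraries:

```python
def run_spike(experiments, sample):
    """Try each candidate approach on one real sample and report results.

    `experiments` maps a name to a callable. The returned dict is
    knowledge for a decision (proceed / pivot / abandon), not
    shippable code -- look at it, decide, throw this away.
    """
    results = {}
    for name, attempt in experiments.items():
        try:
            results[name] = ("ok", attempt(sample))
        except Exception as exc:
            # A crash is a result too: it rules the approach out early.
            results[name] = ("failed", repr(exc))
    return results
```

For the handwriting example, each callable would run one OCR library on the same photo; you eyeball the outputs and pick (or abandon) from evidence.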
Try this: Next time you’re about to start building something that depends on a technology or approach you haven’t used before, stop. Spend 15-30 minutes testing just the uncertain part in isolation. Only proceed to the full build if the spike succeeds.
8. Vertical Slicing
The principle: Build a thin, end-to-end version first, rather than completing one layer at a time. Prove the whole flow works before making any part of it good.
The SWE vocabulary: A “vertical slice” cuts through all layers of a system – user interface, logic, data storage – to deliver one complete feature. The opposite is “horizontal slicing,” where you build one layer at a time: finish the database schema, then the backend API, then the frontend. Horizontal slicing feels orderly but delays integration – you don’t discover that the layers don’t fit together until the end. The related concept of a “walking skeleton” is an even thinner version: the smallest possible implementation that connects all the pieces and actually runs, even if every piece is crude.
Why it matters for AI orchestration: AI is excellent at building individual layers. Ask it to design a database schema and it’ll produce a thorough one. Ask it to build an API and it’ll be well-structured. Ask it to create a frontend and it’ll look polished. But when you try to connect them, nothing fits – the API returns data in a format the frontend doesn’t expect, the database schema is missing a field the API needs. Vertical slicing forces integration early, when mismatches are cheap to fix.
The failure mode it prevents: The “integration nightmare.” You build three beautiful layers in three separate sessions. On day four you try to connect them and spend longer debugging the integration than you spent building the individual pieces. Each layer made assumptions about the others that turned out to be wrong.
Example: You’re building a personal dashboard that shows your exercise data. Horizontal approach: design the database → build the data import → build the API → build the chart. You might spend a full session on the database schema before discovering the data import doesn’t give you what you expected. Vertical approach: “Import one day of exercise data from one CSV into a SQLite table, query it, and display it as a single number on a webpage.” Ugly, minimal, but the entire pipeline works end-to-end. Now you know the data format, the query, and the display all connect. Improve each layer from there.
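Here is roughly what that thinnest slice could look like using only the standard library. The column names and the stub HTML “display” are stand-ins for whatever your real export and page look like:

```python
import csv
import io
import sqlite3

def vertical_slice(csv_text: str) -> str:
    """Thinnest end-to-end pipeline: CSV -> SQLite -> query -> 'display'.

    Every layer is deliberately crude; the point is proving they connect.
    Column names ("day", "minutes") are assumptions about the export.
    """
    # Layer 1: import the raw data
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Layer 2: store it
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE exercise (day TEXT, minutes INTEGER)")
    db.executemany("INSERT INTO exercise VALUES (?, ?)",
                   [(r["day"], int(r["minutes"])) for r in rows])
    # Layer 3: query it
    (total,) = db.execute("SELECT SUM(minutes) FROM exercise").fetchone()
    # Layer 4: display one number on a stub "page"
    return f"<html><body>Total minutes: {total}</body></html>"
```

Once this runs, the data format, schema, query, and display are all proven to fit; each layer can then be improved without integration surprises.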
Try this: Next time you’re planning a multi-step build, ask: “What’s the thinnest version that touches every layer?” Build that first, even if each layer is embarrassingly basic.
9. WIP Limits
The principle: Explicitly cap how many things you’re working on at the same time. Finish things before starting new things. The most productive state is one task in progress, not five.
The SWE vocabulary: “WIP” stands for Work In Progress. A “WIP limit” is an explicit maximum on how many tasks can be in progress simultaneously. This comes from Kanban, a lean methodology that visualises work as cards on a board moving from “To Do” through “In Progress” to “Done.” The key Kanban insight is that limiting WIP increases throughput – counterintuitively, you finish more by starting less. The reason: context switching has a real cost. Every time you switch tasks, you lose 10-20 minutes reloading context (for humans) or an entire conversation’s worth of context (for AI).
Why it matters for AI orchestration: AI sessions are especially vulnerable to scope creep. You start fixing a bug, notice an unrelated improvement, ask the AI to do that too, notice a third thing, and 90 minutes later you have three half-finished tasks and nothing committed. Each unrelated tangent pollutes the AI’s context with information irrelevant to the other tasks (see Context Management). A WIP limit of one forces you to finish or explicitly park the current task before starting a new one.
The failure mode it prevents: The “everything in progress, nothing done” trap. You have six projects with uncommitted changes, you can’t remember the state of any of them, and starting any session requires 15 minutes of “where was I?” A WIP limit of one or two means you always know exactly what you’re working on and your checkpoints are clean.
In practice: This doesn’t mean you can only care about one project. It means at any given moment, only one or two things are actively being worked on. Everything else is either queued (hasn’t started) or parked (started, checkpointed, waiting for its turn). The discipline is: when you feel the urge to start something new mid-session, write it down and keep going on the current task.
Example: You’re building an expense tracker and halfway through you think “I should also set up that backup script.” Instead of switching, you write “backup script” on your to-do list and continue with the expense tracker. You commit the working expense tracker. Then you start the backup script with a fresh context. Both tasks get done faster than if you’d interleaved them.
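The discipline can even be made mechanical. A toy sketch of a board that enforces a hard WIP limit:

```python
class Board:
    """Minimal Kanban-style tracker with a hard WIP limit."""

    def __init__(self, wip_limit: int = 1):
        self.wip_limit = wip_limit
        self.in_progress, self.parked, self.done = [], [], []

    def start(self, task: str) -> bool:
        """Start a task, or park it if the WIP limit is already reached."""
        if len(self.in_progress) >= self.wip_limit:
            # The discipline: write it down, don't start it.
            self.parked.append(task)
            return False
        self.in_progress.append(task)
        return True

    def finish(self, task: str) -> None:
        self.in_progress.remove(task)
        self.done.append(task)
```

The `parked` list is your “write it down and keep going” to-do list: nothing is lost, but nothing new gets context until the current task is done.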
Try this: At the start of your next session, write down the one thing you’re going to finish. When you’re tempted to branch off, write the new idea down and return to the task. Commit before starting anything else.
What to Skip (and Why)
These practices solve coordination problems across teams of engineers. If you’re working solo with AI, the overhead isn’t worth it – yet.
| Practice | What it solves | Why you can skip it |
|---|---|---|
| Scrum / SAFe | Synchronising 5-50 engineers on shared priorities | You have one brain and one AI. Your priorities are in your head. |
| Sprint planning / estimation | Forecasting delivery timelines for stakeholders | You’re building for yourself. Just start and timebox (see concept #6). |
| CI/CD pipelines | Automated testing and deployment for production software | Relevant when you’re deploying services. Not for scripts and personal tools. |
| Code review checklists | Catching bugs before they reach production in a team | Your “review” is running the code and seeing if it works. |
| Story points | Abstracting effort estimates for team-level planning | A solution to a problem you don’t have. |
The nuance: As you orchestrate more AI agents in parallel (one researching, one coding, one reviewing), some of these concepts become relevant again. The PR model (propose changes → review → merge) is useful when your “reviewer” agent checks your “coder” agent’s work. Branching matters when two agents are modifying the same files. You’ll know when you need these because the coordination problems they solve will start actually happening to you.
When to Re-read This
Diagnostic triggers – if a session goes sideways, it’s usually one of these:
- “I spent hours and have nothing to show for it” → Re-read #3 (Scope Management) and #6 (Fixed Time, Variable Scope). You probably didn’t set an appetite or define an MVP.
- “The AI produced a mess and I can’t tell what went wrong” → Re-read #1 (Decomposition) and #2 (Iteration). Your steps were too big.
- “It seemed to work but then broke in a way I didn’t expect” → Re-read #5 (Acceptance Criteria). You didn’t define what “working” means.
- “I kept fixing things and it got worse” → Re-read #4 (Checkpoint Discipline). Rewind, don’t repair.
- “I started something new before finishing the last thing” → Re-read #9 (WIP Limits). Finish or park before switching.
- “I built each piece separately and now they don’t fit together” → Re-read #8 (Vertical Slicing). Build thin end-to-end first.
- “I committed to an approach and it turned out to be wrong” → Re-read #7 (Spikes). Investigate before committing.
- “The AI started giving worse and worse responses” → Re-read Context Management. Your context is polluted.
Next: Engineering Thinking – technical debt, separation of concerns, state, debugging, AI reliability