Series: Overview | Process & Methodology | Engineering Thinking (you are here) | Meta-Skills | Glossary
Contents
- 10. Technical Debt
- 11. Separation of Concerns
- 12. State and Data Flow
- 13. Debugging as a Discipline
- 14. AI Reliability Calibration
10. Technical Debt
The principle: Not all shortcuts are equal. The question is: “Am I going to touch this again?” If yes, invest in doing it properly. If no, the shortcut is free.
The SWE vocabulary: “Technical debt” is the cost of shortcuts that save time now but create work later; like financial debt, it accrues interest. A quick hack that works today becomes a bug magnet next month when you try to build on top of it. “Refactoring” is the act of paying down tech debt: restructuring code to be cleaner without changing what it does. A “code smell” is a sign that something is messy: not broken, but likely to cause problems. “Legacy code” is old code that still runs but nobody fully understands anymore; changing it is risky because you can’t predict the side effects.
Why it matters for AI orchestration: AI generates code fast. That speed is seductive: you can build something in an hour that would have taken days. But AI-generated code is often quick-and-dirty by default. It works for the immediate case but falls apart when you try to extend it. If you’re building a one-off script you’ll run once and throw away, that’s fine; take on all the debt you want. But if you’re building something you’ll use every week, or something other people will use, you need to tell the AI: “Make this maintainable, not just functional.”
The failure mode it prevents: The “house of cards” problem. You build feature A with a shortcut. Feature B works around A’s shortcut. Feature C depends on B’s workaround. Now you can’t change A without breaking B and C. The AI doesn’t track these dependencies across sessions; that’s your job.
The mental model: Think of your projects in two categories: throwaway (scripts, experiments, one-off data processing) and infrastructure (tools you’ll reuse, things other people will touch, anything you’ll build on top of). Throwaway code gets no investment in quality: ship it messy, use it, delete it. Infrastructure gets explicit instructions to the AI: “Add error handling. Use clear variable names. Write comments explaining why, not what.”
Example: You ask the AI to write a script that renames your photo files based on EXIF date. If you’re doing this once for a specific folder, let the AI write it however it wants; who cares if the code is ugly. But if you want a tool you’ll use after every shoot, say: “Make this a reusable script with clear argument handling, error messages for files without EXIF data, and a dry-run mode so I can preview changes before they happen.”
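The “infrastructure” version of that request could look roughly like this sketch in Python. The EXIF lookup is stubbed out (a real version would read the DateTimeOriginal tag, for instance with the Pillow library); the point is the shape the prompt asks for: argument handling, a skip path for files without a date, and a dry-run mode.

```python
import argparse
from datetime import datetime
from pathlib import Path
from typing import List, Optional, Tuple

def get_exif_date(path: Path) -> Optional[str]:
    # Stub for illustration: a real version would read the EXIF capture
    # date. Here we fall back to the file's modification time.
    return datetime.fromtimestamp(path.stat().st_mtime).strftime("%Y-%m-%d")

def plan_renames(folder: Path) -> List[Tuple[Path, Path]]:
    # Build the (old, new) rename pairs without touching any files,
    # so a dry run can show exactly what would happen.
    plan = []
    for photo in sorted(folder.glob("*.jpg")):
        date = get_exif_date(photo)
        if date is None:
            print(f"skipping {photo.name}: no date available")
            continue
        plan.append((photo, photo.with_name(f"{date}_{photo.name}")))
    return plan

def main() -> None:
    parser = argparse.ArgumentParser(description="Rename photos by capture date")
    parser.add_argument("folder", type=Path)
    parser.add_argument("--dry-run", action="store_true",
                        help="preview renames without changing any files")
    args = parser.parse_args()
    for old, new in plan_renames(args.folder):
        print(f"{old.name} -> {new.name}")
        if not args.dry_run:
            old.rename(new)

# main() would be invoked when run from the command line
```

The throwaway version would be a ten-line loop with none of this; the difference is exactly the one sentence of quality instructions in the prompt.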
Try this: Before your next AI prompt, ask yourself: “Is this throwaway or infrastructure?” If throwaway, let speed win. If infrastructure, add one sentence to your prompt about quality: “Make this maintainable” or “Handle edge cases” or “Add comments explaining the logic.”
11. Separation of Concerns
The principle: Keep things that change for different reasons in different places. Don’t mix unrelated responsibilities into the same file, function, or conversation.
The SWE vocabulary: “Separation of concerns” is the idea that each piece of a system should handle one thing. “Modularity” means building in self-contained pieces that can be changed independently. “Coupling” is how much separate modules depend on each other: tightly coupled code means changing one module breaks another. “Cohesion” is related but not simply coupling’s opposite: it measures how closely the parts within a single module belong together. The goal is low coupling (modules should be independent of each other) and high cohesion (each module should do one thing well). Separation of concerns gets you both at once: put related things together, keep unrelated things apart. The “single responsibility principle” is another way of saying the same thing: a module should have one reason to change.
Why it matters for AI orchestration: This concept applies at every level of working with AI:
- Sessions: Don’t mix unrelated projects in one conversation. Research for your website and debugging your photo script use different context. Mixing them wastes the AI’s limited working memory on irrelevant information.
- Prompts: “Set up the database AND design the homepage” is two concerns. The AI will do both adequately but neither well. Split them.
- Agents: If your AI tool supports sub-agents or parallel sessions, use them. One agent researches, another implements. Their contexts don’t pollute each other.
- Code: When the AI generates a 200-line file that handles user input, data processing, and output formatting, ask it to split these into separate functions or files. If you later want to change the output format, you shouldn’t have to worry about accidentally breaking the input handling.
The failure mode it prevents: The “tangled mess” problem. Everything is connected to everything else, so you can’t change anything without unexpected side effects. The AI adds a new feature and breaks an existing one because they shared code that should have been separate.
Example: You’re building a tool that fetches weather data, processes it, and emails you a daily summary. Instead of asking the AI to build it as one monolithic script, ask for three pieces: (1) a function that fetches weather data and returns it as structured data, (2) a function that formats that data into a readable summary, (3) a function that sends an email with a given body. Now you can change your email provider without touching the weather logic, or swap the data source without changing the formatting.
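The three-piece version could be sketched like this in Python. The fetching and emailing are stubbed (a real version might call a weather API and use smtplib or an email service); what matters is that each function owns one concern and knows nothing about the others.

```python
from dataclasses import dataclass

@dataclass
class Weather:
    city: str
    temp_c: float
    conditions: str

def fetch_weather(city: str) -> Weather:
    # Concern 1: getting the data. Stubbed here; a real version would
    # call a weather API and parse the response into this structure.
    return Weather(city=city, temp_c=18.0, conditions="partly cloudy")

def format_summary(weather: Weather) -> str:
    # Concern 2: presentation. Knows nothing about where the data came from.
    return f"Daily weather for {weather.city}: {weather.temp_c:.0f}°C, {weather.conditions}"

def send_email(body: str) -> None:
    # Concern 3: delivery. Takes any body text; swap the provider here
    # without touching the weather or formatting code.
    print(body)

send_email(format_summary(fetch_weather("Oslo")))
```

Because the pieces only meet at narrow interfaces (a `Weather` object, a string), each one can be replaced independently, which is the whole point of the exercise.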
Try this: Next time the AI generates a long file, ask: “Can you split this into separate functions/files by responsibility?” If it’s already in one conversation with mixed topics, start a fresh session for the next topic.
12. State and Data Flow
The principle: Almost every bug is either “wrong data went to the wrong place” or “state got out of sync.” Understanding where data lives and how it moves lets you diagnose problems instead of staring blankly.
The SWE vocabulary: “State” is stored information that can change over time. A user’s login status is state. The items in a shopping cart are state. A counter that tracks how many times a button was clicked is state. State lives somewhere: in memory (gone when the program stops), in a file (persists on disk), in a database (persists and can be queried), or in a remote service accessed via an API.
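The in-memory versus persistent distinction can be seen in a few lines of Python. The click counter below exists in two forms: a variable that vanishes when the process exits, and a file-backed version that survives restarts.

```python
import json
from pathlib import Path

# In-memory state: lives only while the program runs.
clicks = 0
clicks += 1  # gone the moment the process exits

# File-backed state: survives between runs.
def load_clicks(state_file: Path) -> int:
    if state_file.exists():
        return json.loads(state_file.read_text())["clicks"]
    return 0

def save_clicks(state_file: Path, count: int) -> None:
    state_file.write_text(json.dumps({"clicks": count}))
```

Running `save_clicks(path, load_clicks(path) + 1)` on each click gives a counter that keeps its value across restarts; the `clicks` variable above restarts from zero every time. Many “it worked yesterday” bugs come down to confusing these two kinds of state.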
“Client” and “server” describe the two sides of most web and networked software. The client is what runs on the user’s device (a web browser, a mobile app). The server is what runs remotely and responds to requests. When you load a webpage, your browser (client) sends a request to a server, which sends back HTML. This back-and-forth is the request/response cycle.
A database is a structured store for persistent data. SQL databases (PostgreSQL, SQLite) store data in tables with rows and columns. NoSQL databases (MongoDB, Redis) store data in other formats. When the AI sets up a project with a database, data flows: user action → client → server → database → server → client → screen.
Why it matters for AI orchestration: “It works on my machine” is the most common frustration in software, and it’s almost always a state problem. The AI builds something that works perfectly in one context but breaks in another because data is coming from a different source, a file is missing, or state from a previous session is gone. If you can trace the data flow (“where does this data come from? where does it go? what could be different between my test and the real situation?”) you can diagnose most problems.
The failure mode it prevents: The “it worked yesterday” mystery. Something breaks and you have no idea why. Usually the answer is: the data changed, the environment changed, or state from a previous run is either missing or stale. If you understand data flow, you can ask the right questions instead of guessing.
Example: You build a script that reads a CSV, processes it, and writes results to a new file. It works perfectly. Next week you run it again and it crashes. Without understanding data flow, you’d paste the error into the AI and hope. With it, you ask: “What changed? Is the CSV format the same? Is the file in the expected location? Does the output directory exist?” You check the CSV and find the bank added a new column. The data changed; now you know exactly what to tell the AI to fix.
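That kind of check can even be automated. A small sketch in Python (the column names are hypothetical, standing in for whatever the bank export actually contains) compares the file’s header against what the rest of the script assumes:

```python
import csv
from pathlib import Path

# Hypothetical: the columns the rest of the script was built to expect.
EXPECTED_COLUMNS = {"date", "amount", "description"}

def check_input(path: Path) -> list[str]:
    # Inspect the data at its source before blaming the code.
    # Returns a list of human-readable problems, empty if all is well.
    if not path.exists():
        return [f"file not found: {path}"]
    with path.open(newline="") as f:
        header = next(csv.reader(f), None)
    if not header:
        return ["file is empty"]
    problems = []
    missing = EXPECTED_COLUMNS - set(header)
    extra = set(header) - EXPECTED_COLUMNS
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected new columns: {sorted(extra)}")
    return problems
```

Running this first turns “it crashed” into “the export now has a `currency` column the script doesn’t expect”, which is a prompt the AI can act on directly.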
Try this: Next time something breaks, before pasting the error to the AI, ask yourself: “Where does the data come from, and what could be different about it right now?” Spend 30 seconds tracing the flow. You’ll often find the answer yourself, or give the AI a much more useful prompt than “it’s broken.”
13. Debugging as a Discipline
The principle: Debugging is a systematic process of narrowing down where things went wrong: not guessing, not pasting the error into a new prompt and hoping.
The SWE vocabulary: A stack trace is the error output that shows which functions were called, in what order, leading to the crash; it’s a map of where things went wrong. An error message usually contains three useful pieces: what went wrong, where it happened, and sometimes why. A reproducible bug is one you can trigger reliably; these are much easier to fix than intermittent ones. A root cause is the actual source of the problem, as opposed to the symptom (what you see).
Why it matters for AI orchestration: When something breaks, the quality of what you tell the AI determines the quality of the fix. “It doesn’t work” gets you a generic response. “It crashes on line 47 with a KeyError when the ‘email’ field is missing from the input” gets you a targeted fix in seconds. The skill isn’t writing code; it’s reading errors and describing problems precisely enough that the AI can act on them.
The failure mode it prevents: The “shotgun debugging” loop. Something breaks, you paste the error, the AI changes something, a different thing breaks, you paste that error, the AI changes something else. Five iterations later you’ve introduced three new bugs and fixed none. Systematic debugging is faster even though it feels slower.
The method (the scientific method, applied to bugs):
- Read the error message. Actually read it. The last line of a stack trace usually says what went wrong. The line numbers tell you where. Most people skip this step.
- Reproduce it. Can you make it happen again? If yes, you have a handle on it. If no, note what’s different between when it works and when it doesn’t.
- Form a hypothesis. “I think it’s crashing because the input file has empty rows.” Don’t start fixing until you have a theory.
- Test the hypothesis. Check: does the input file have empty rows? If yes, your theory is confirmed; now you have a precise request for the AI. If no, form a new hypothesis.
- Isolate the problem. If you can’t figure it out, simplify. Remove features until it works, then add them back one at a time until it breaks again. The last thing you added is the problem.
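The hypothesis-testing step can often be a few lines of throwaway code. Taking the empty-rows theory above, a quick Python check settles it before anyone starts changing the script:

```python
import csv
from pathlib import Path

def count_blank_rows(path: Path) -> int:
    # Hypothesis: "the script crashes because the input has empty rows."
    # This confirms or refutes it; nothing gets "fixed" until we know.
    with path.open(newline="") as f:
        return sum(
            1 for row in csv.reader(f)
            if not any(cell.strip() for cell in row)
        )
```

If the count is nonzero, you now have a precise request (“skip blank rows in the input”); if it’s zero, you’ve cheaply eliminated one theory and move to the next.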
Example: Your web scraping script used to work but now returns empty results. Instead of “fix my scraper,” you investigate: the script runs without errors (so it’s not crashing), but the output is empty (the symptom). You check: is the website still accessible? Yes. Has the page structure changed? You view the page source and see they redesigned; the CSS class names are different. Now you tell the AI: “The page at [URL] was redesigned. The data used to be in divs with class ‘product-item’ but that class no longer exists. Find the new structure and update the selectors.”
Try this: Next time something breaks, resist the urge to immediately paste the error into a new prompt. Instead: (1) read the error message for 15 seconds, (2) form one hypothesis about what’s wrong, (3) include that hypothesis when you ask the AI for help. “I think the problem is X because Y; can you check?”
14. AI Reliability Calibration
The principle: AI is highly reliable for some tasks and unreliable for others. Knowing which is which saves you from both over-trusting and under-trusting.
The SWE vocabulary: This doesn’t come from traditional SWE; it’s specific to working with AI. But SWEs have a related concept: knowing the limits of your tools. A hammer is great for nails and terrible for screws. AI coding tools have the same kind of profile: excellent at some things, dangerously confident at others.
Where AI is reliable (verify quickly; these are unlikely to have problems):
- Boilerplate and standard patterns: setting up a new project, writing CRUD (Create, Read, Update, Delete) operations, generating configuration files. AI has seen millions of these and reproduces them accurately.
- Data transformation: parsing CSV files, converting between formats, extracting data from structured sources. Repetitive, rule-based work is AI’s sweet spot.
- Well-documented tools and libraries: if there are thousands of examples of something on the internet, the AI has seen them and can reproduce the patterns reliably.
- Explaining code and concepts: AI is excellent at reading code and explaining what it does in plain language. Use this freely.
Where AI is unreliable (verify carefully, or do it yourself):
- Novel architecture: if you’re building something that doesn’t look like anything the AI has seen before, it will confidently produce something that looks right but has subtle structural problems.
- Security: AI generates code with security vulnerabilities regularly. It will store passwords in plain text, expose API keys, or skip input validation unless explicitly told not to. Never trust AI-generated code to be secure without review.
- Performance at scale: code that works for 100 records may break at 100,000. AI optimises for correctness, not for performance, unless specifically prompted.
- Complex state management: when multiple things need to stay in sync (user sessions, database transactions, concurrent operations), AI tends to miss edge cases.
- Anything that requires knowing your specific context: your business rules, your deployment environment, your users’ specific needs. AI can only work with what you tell it.
The reinforcement insight (from Geoffrey Huntley): AI becomes more reliable when you give it feedback loops: fast ways to know if the output is correct. Type systems, automated tests, linters, and strict formatting requirements all constrain the AI’s output space. The more guardrails you provide, the more reliable the output. This is why Acceptance Criteria is so powerful: it’s a feedback loop.
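A feedback loop can be as small as a handful of executable checks. In this hypothetical Python sketch, the function stands in for AI-generated code and the assertions are acceptance criteria the output must pass before you trust it:

```python
def normalise_phone(raw: str) -> str:
    # Stand-in for an AI-generated helper (hypothetical example):
    # strip everything except digits and prefix a plus sign.
    digits = "".join(ch for ch in raw if ch.isdigit())
    return "+" + digits

# Acceptance criteria written as executable checks: a fast feedback loop.
# If the AI's version fails one, you know immediately and precisely what
# to feed back instead of eyeballing the code.
assert normalise_phone("(555) 123-4567") == "+5551234567"
assert normalise_phone("+44 20 7946 0958") == "+442079460958"
```

Writing the checks before asking for the code is even better: they double as an unambiguous specification.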
Example: You ask the AI to build a login system. It will happily produce one that works: you can log in, see your account, log out. But it might store passwords as plain text, allow SQL injection in the login form, or forget to invalidate sessions on logout. These are the kinds of things AI gets wrong silently. For anything security-related, either use established libraries (tell the AI: “use bcrypt for password hashing”) or have someone who understands security review the output.
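To make the password point concrete, here is a minimal sketch of the idea using only Python’s standard library (PBKDF2 as a stand-in; in a real project, prefer an established library such as bcrypt, as suggested above): store a salted, deliberately slow hash, never the password itself.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # deliberately slow; use a high count in production

def hash_password(password: str) -> bytes:
    # Store this salt+key blob, never the plain-text password.
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt + key

def verify_password(password: str, stored: bytes) -> bool:
    salt, key = stored[:16], stored[16:]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)  # constant-time comparison
```

Note the two details AI-generated logins routinely miss: a fresh random salt per password, and a constant-time comparison when verifying.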
Try this: Before handing off a task to the AI, mentally categorise it: is this boilerplate (trust it) or novel/security-critical (verify carefully)? Adjust your verification effort accordingly. Don’t spend 20 minutes reviewing a standard project setup. Do spend 20 minutes reviewing anything that handles user data or money.
Previous: Process & Methodology (9 process & methodology concepts) | Next: Meta-Skills (post-incident learning, context management, prompt-as-spec)