Series: Overview | Process & Methodology | Engineering Thinking | Meta-Skills (you are here) | Glossary




15. Post-Incident Learning

The principle: When something goes wrong, capture the mechanism that allowed it — not just the fix. “Only get something wrong once.”

The SWE vocabulary: A “postmortem” (or “post-incident review”) is a structured analysis of what went wrong after an incident. The key discipline is separating what happened (the timeline) from what allowed it to happen (the mechanism). SWEs also distinguish between the root cause (the actual source) and contributing factors (things that made it worse). “Blameless postmortems” focus on systems, not people — the question isn’t “who screwed up?” but “what about our process allowed this to happen?” A “retrospective” is a lighter version: a regular team check-in on what’s working and what isn’t.

Why it matters for AI orchestration: You’ll make the same mistakes repeatedly unless you build a feedback loop. The AI won’t remember your past mistakes across sessions (unless you explicitly tell it). And the mistakes that matter most aren’t the ones where code crashed — those are obvious. The expensive mistakes are process mistakes: spending 3 hours on something that should have been scoped to 30 minutes, or trusting AI output that should have been verified, or not checkpointing before a risky change.

The failure mode it prevents: The “Groundhog Day” problem. You hit the same frustration every few weeks — the AI clobbers a file you cared about, or a project spirals out of scope, or you deploy something broken. Each time you fix it in the moment but don’t change anything about how you work. The same situation triggers the same mistake.

The method: After something goes wrong (or surprisingly right), spend 2 minutes answering:

  1. What happened? (The facts, not the narrative.)
  2. What allowed it to happen? (The mechanism — what check was missing, what assumption was wrong?)
  3. What would prevent it next time? (A specific, checkable change — not “be more careful.”)

The third answer is the valuable part. “Be more careful” is not a mechanism — it’s a wish. “Always run the script on test data before real data” is a mechanism. “Add a dry-run flag to every destructive script” is even better — it changes the system, not just your intentions.
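The dry-run idea takes only a few lines to wire in. A minimal sketch, assuming a hypothetical cleanup script (`delete_old_exports` and its `.tmp` target are invented for illustration); note that the safe behaviour is the default and deletion must be asked for explicitly:

```python
import argparse
from pathlib import Path

def delete_old_exports(folder: Path, dry_run: bool = True) -> list[Path]:
    """Delete .tmp files under folder; in dry-run mode, only report them."""
    doomed = sorted(folder.rglob("*.tmp"))
    for path in doomed:
        if dry_run:
            print(f"[dry-run] would delete {path}")
        else:
            path.unlink()
    return doomed

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Clean up temporary export files.")
    parser.add_argument("folder", type=Path)
    # The mechanism: destructive behaviour is opt-in, not opt-out.
    parser.add_argument("--delete", action="store_true",
                        help="actually delete (default is a dry run)")
    return parser

# Usage: python cleanup.py ~/exports            -> prints what would be deleted
#        python cleanup.py ~/exports --delete   -> really deletes
```

The design choice worth copying is the inversion: the flag enables the dangerous path, so forgetting the flag costs you nothing.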

Example: You ask the AI to reorganise your project’s file structure. It moves files around but breaks all the internal links between them. You fix the links manually โ€” takes an hour. Postmortem: What happened? The AI moved files without updating references. What allowed it? You didn’t tell it about the internal links, and you didn’t check before approving the changes. What would prevent it? Before any file-moving operation, prompt: “List all references to these files across the project and update them too.” And checkpoint before the operation so you can rewind if it misses some.
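The “list all references first” check can itself be scripted rather than left to the AI's diligence. A rough sketch, assuming a plain-text project where references are literal filenames (`find_references` is a hypothetical helper, not a real tool):

```python
from pathlib import Path

def find_references(project_root: Path, filenames: list[str]) -> dict[str, list[Path]]:
    """Map each filename to the project files whose text mentions it."""
    refs: dict[str, list[Path]] = {name: [] for name in filenames}
    for path in sorted(project_root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for name in filenames:
            # Skip the file itself; count every other file that mentions it.
            if name in text and path.name != name:
                refs[name].append(path)
    return refs
```

Running this before approving a move gives you a concrete checklist of links that must be updated, instead of a hope that none were missed.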

Try this: Next time something goes wrong in an AI session, write down one sentence: “This happened because ___.” If the blank is “I wasn’t careful enough,” push harder. What specific check or step would have caught it?


16. Context Management

The principle: Your AI’s context window is finite and degrades as it fills. Manage it like a scarce resource — because it is one.

The SWE vocabulary: A “context window” is the amount of text the AI can hold in its working memory at once. Think of it as a desk: the bigger the desk, the more documents you can spread out and reference simultaneously. But even a big desk gets cluttered. When the context fills up, the AI starts losing track of things mentioned earlier in the conversation — it’s like papers sliding off the edge of the desk.

“Token” is the unit of measurement — roughly ¾ of a word. Current AI models have context windows of 100K-1M tokens, but performance degrades well before the limit: many practitioners report that quality noticeably drops once the window is around 50-60% full.
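That ¾-of-a-word rule of thumb works out to roughly four characters per token in English, which is enough for back-of-envelope budgeting. A minimal sketch, assuming a hypothetical 200K-token window; real tokenizers vary by model, so treat this as estimation only:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text.
    Real tokenizers differ by model; use this for budgeting, not billing."""
    return max(1, len(text) // 4)

def context_fill(texts: list[str], window: int = 200_000) -> float:
    """Fraction of a context window consumed by a list of messages/files."""
    return sum(estimate_tokens(t) for t in texts) / window
```

Pasting a 400KB log file, by this estimate, burns about half of a 200K-token window in one message.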

Why it matters for AI orchestration: This is arguably the single most important concept specific to AI orchestration, because it’s invisible. You can’t see the context filling up. The AI doesn’t warn you. It just gets gradually worse — more forgetful, less coherent, more likely to contradict its earlier work. If you’ve ever noticed the AI giving worse responses late in a long conversation, this is why.

The anti-patterns:

  • Kitchen sink session — doing unrelated tasks in one conversation. Your photo script context is polluting your website debugging context.
  • Infinite exploration — asking the AI to research broadly without constraints. It fills the context with tangential information.
  • Repeated corrections — each failed attempt stays in context, confusing the AI’s model of what you actually want. After two failed corrections, starting fresh is almost always faster than continuing.
  • Pasting entire files when you only need a section. Context is a budget — spend it on what matters.

The techniques:

  • Start fresh between unrelated tasks. A new conversation costs nothing and gives the AI a clean desk.
  • Use sub-agents or side conversations for research so exploration doesn’t pollute your implementation session.
  • Compact or summarise when a conversation gets long. Some tools have built-in commands for this.
  • Front-load context. Put the most important information at the start of the conversation (system instructions, project context files) where the AI attends to it most strongly.
  • Be specific about what to reference. “Read lines 40-60 of server.py” is better than “look at the server code” which might load the whole file.
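The last technique is easy to automate on your side of the conversation. A small sketch, assuming a hypothetical `excerpt` helper for pasting just the lines that matter instead of the whole file:

```python
from pathlib import Path

def excerpt(path: str, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive) of a file,
    numbered and ready to paste into a prompt."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    chunk = lines[start - 1:end]
    return "\n".join(f"{start + i}: {line}" for i, line in enumerate(chunk))

# Usage: print(excerpt("server.py", 40, 60))  # paste the output, not the file
```

Keeping the line numbers in the pasted excerpt also lets the AI refer back to exact locations when it answers.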

The analogy: Context management is to AI orchestration what working memory is to human cognition. You can only hold about 7 things in your head at once. When you try to juggle too many tasks, you drop things. The same is true for AI — the more you load into one conversation, the more it “forgets.” Treat conversation boundaries like tab groups in your browser: separate topics into separate contexts.

Example: You’re debugging a server issue and also want to draft an email. If you do both in one session, the server error logs and code snippets fill up context that the email draft doesn’t need. The AI might even start using technical language in your email because it’s primed by the debugging context. Two separate sessions: one for debugging, one for the email. Both get better results.

Try this: If a conversation has been going for more than 30 minutes, or if the AI starts repeating itself or forgetting things you mentioned earlier, start a fresh session. Copy over only the essential context (a 3-line summary of where you are and what’s left to do).


17. Prompt-as-Spec

The principle: The best prompts function like engineering specifications — they define what success looks like, provide relevant context, and constrain the solution space. The difference between a mediocre result and an excellent one is almost always the quality of the prompt, not the capability of the AI.

The SWE vocabulary: A “specification” (spec) is a document that describes exactly what a piece of software should do — its inputs, outputs, constraints, and success criteria. A “requirements document” describes what needs to be built from the user’s perspective. A “design document” describes how it will be built from the engineer’s perspective. SWEs spend significant time on specs before writing code because a clear spec prevents rework. The same principle applies to prompts.

“Constraints” are boundaries on the solution. In engineering: “must run on a Raspberry Pi” or “must handle 10,000 users” or “must work offline.” In prompting: “use Python, not JavaScript” or “keep it under 50 lines” or “don’t use any external libraries.” Constraints are powerful because they narrow the solution space — the AI has fewer wrong directions to wander in.

Why it matters for AI orchestration: A vague prompt gets a vague result. “Make me a website” is a feature request — it conveys intent but not enough information to produce what you want. “Build a single-page site with my name, a bio paragraph, and links to three projects. Use plain HTML and CSS, no frameworks. Make it look clean and minimal — think Hacker News, not Apple” is a spec — it conveys intent plus constraints plus examples plus aesthetic direction. The AI can execute the second prompt on the first try. The first prompt will require 5 rounds of “no, not like that.”

The spectrum from bad to good:

  1. Feature request: “Build me a dashboard.” (What dashboard? For what data? What should it show?)
  2. Wish: “Build me a dashboard that shows my sales data.” (Getting warmer — but which metrics? What time range? What data source?)
  3. Brief: “Build a dashboard that reads from sales.csv and shows monthly revenue as a bar chart and top 10 products as a table.” (This is workable.)
  4. Spec: “Build a dashboard: (1) Read sales.csv (columns: date, product, amount, quantity). (2) Show a bar chart of monthly revenue (sum of amount, grouped by month). (3) Show a table of top 10 products by total revenue. (4) Use Python with matplotlib for the chart and print the table to stdout. (5) Handle missing or malformed rows by skipping them and printing a count of skipped rows at the end.” (This gets built correctly on the first try.)
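To see why level 4 works, here is one way its data-handling clauses could be satisfied — a sketch using only the standard library (the matplotlib bar chart would simply plot `monthly_revenue`'s keys and values; the column names and ISO-style dates are taken from the spec, not guaranteed beyond it):

```python
import csv
from collections import defaultdict

def load_sales(path):
    """Read sales.csv (date, product, amount, quantity); skip malformed rows
    and report how many were skipped, exactly as clause (5) requires."""
    rows, skipped = [], 0
    with open(path, newline="") as f:
        for raw in csv.DictReader(f):
            try:
                rows.append({
                    "month": raw["date"][:7],      # "2024-03-15" -> "2024-03"
                    "product": raw["product"],
                    "amount": float(raw["amount"]),
                })
            except (KeyError, TypeError, ValueError):
                skipped += 1
    return rows, skipped

def monthly_revenue(rows):
    """Clause (2): sum of amount, grouped by month, in chronological order."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["month"]] += r["amount"]
    return dict(sorted(totals.items()))

def top_products(rows, n=10):
    """Clause (3): top n products by total revenue."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["product"]] += r["amount"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Every function maps to a numbered clause in the spec, which is exactly what makes the result checkable on the first try.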

The “let AI interview you” pattern: For larger features, don’t try to write the perfect spec yourself. Instead, describe what you want at a high level and ask the AI to interview you: “I want to build X. Before you start, ask me clarifying questions about requirements, constraints, and edge cases.” The AI will surface questions you wouldn’t have thought to answer — and the answers become the spec.

Example: Instead of “write a script that backs up my photos,” try: “Write a Python script that: (1) takes a source directory and destination directory as command-line arguments, (2) copies all .jpg and .raw files from source to destination, preserving the folder structure, (3) skips files that already exist in the destination (compare by filename and size), (4) prints a summary at the end: X files copied, Y files skipped, Z errors. If a file can’t be copied, log the error and continue — don’t crash.”
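Given that spec, the core of the script almost writes itself. A sketch of the copy logic, assuming a hypothetical `backup_photos` helper (the command-line argument parsing from clause (1) is omitted here):

```python
import shutil
from pathlib import Path

def backup_photos(source: Path, dest: Path) -> tuple[int, int, int]:
    """Copy .jpg/.raw files from source to dest, preserving folder structure.
    Skips files already present with the same name and size; logs errors
    on individual files instead of crashing. Returns (copied, skipped, errors)."""
    copied = skipped = errors = 0
    for src in sorted(source.rglob("*")):
        if src.suffix.lower() not in {".jpg", ".raw"} or not src.is_file():
            continue
        target = dest / src.relative_to(source)
        # Clause (3): same filename and same size counts as already backed up.
        if target.exists() and target.stat().st_size == src.stat().st_size:
            skipped += 1
            continue
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 preserves timestamps too
            copied += 1
        except OSError as exc:
            print(f"error copying {src}: {exc}")
            errors += 1
    print(f"{copied} files copied, {skipped} skipped, {errors} errors")
    return copied, skipped, errors
```

Running it twice is a quick sanity check: the second run should copy nothing and skip everything, which is precisely what the spec's clause (3) predicts.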

Try this: Before your next prompt, add one constraint and one success criterion that you wouldn’t have included otherwise. “Use Python 3, no external libraries” (constraint). “I’ll know this works when it processes my test folder of 50 photos in under 10 seconds” (success criterion). Notice how the result improves.


Previous: Engineering Thinking — technical debt, separation of concerns, state, debugging, AI reliability | Next: Glossary — ~40 terms explained in plain English