Pipeline from requirements to code, each stage catching what the previous one missed

The Full Pipeline: Five Stages from Requirements to Code

This is article 6 in “Taming AI Coding Agents with TDD.” The first four covered requirements disambiguation with the GEAR protocol, tech spec guardrails, test documents before test code, and convergent review loops. Article 5 upgraded the review layer with procedural justice. This one strings everything together into a single pipeline you can actually run. The Complete Pipeline: Product Design → Tech Spec → Test Plan → Test Code → Production Code, with a Ralph Loop reviewing each of the five stages. Each stage has its own inputs, outputs, and review rules: ...

2026-04-30 · 9 min · Alex Wang
Procedural justice encoded: adversarial review where every decision is verifiable

Procedural Justice Encoded: Making Every Step of AI Review Verifiable

My Ralph Loop review mechanism had a hidden problem. v0.2’s flow was straightforward: find issues → fix → confirm convergence. In part 4 of this series, I mentioned that if the creator disagrees with the reviewer’s judgment, they can present evidence in the next round for reassessment. But that was one sentence in the rules — not a formal protocol. Nobody was checking whether the review itself was sound. The reviewer might mislabel severity. The main agent might blindly accept bad suggestions. ...

2026-04-30 · 10 min · Alex Wang
Ralph Loop: multi-round convergent review, two consecutive clean rounds to exit

AI Errors Converge, They Don't Randomize: The Review Loop That Catches What You Miss

This is the fourth article in “Taming AI Coding Agents with TDD.” The first covered test-driven requirements anchoring, the second introduced the GEAR protocol for disambiguation, the third laid out what the tech spec must nail down. This one covers the last line of defense: review. The Problem the Tech Spec Cannot Solve Article 3 ended with an uncomfortable admission. The PRD locks down “what to build.” The tech spec locks down “how to build it.” Together they compress the AI’s improvisation space down to implementation details. That is a huge improvement. ...

2026-04-29 · 11 min · Alex Wang
PRD to tech spec: documents as guardrails, not burden

Why PRD Alone Is Not Enough: What the Tech Spec Must Cover in AI-Assisted Development

This is the third article in the “Taming AI Coding Agents with TDD” series. The first covered test-driven requirements anchoring, the second covered the GEAR protocol for requirements disambiguation. This one fills the gap between them: after the PRD is done, what must the tech spec cover? Requirements Locked, Code Still Wrong Before the second Aristotle refactor, I spent two full days writing requirements. Following the structured approach from the previous article, I captured every acceptance criterion, boundary condition, error path, and platform constraint[1]. The AI consumed the document and passed all 37 static assertions plus the end-to-end tests. The codebase was split into four files by responsibility. Information flow was switched from push to pull. ...

2026-04-29 · 11 min · Alex Wang
Structured requirements vs one-liner: the trap of AI auto-filling gaps

Why AI-Assisted Development Needs Structured Requirements First: Lessons from the GEAR Protocol

This is the second article in the “Taming AI Coding Agents with TDD” series. The first article covered requirement anchoring at the test layer[1]. Tests assume clear requirements. This one goes upstream, to the practice of disambiguating requirements before a single line of code gets written. The v1 Lesson: One-Line Requirement, 371 Lines of Pollution Aristotle v1 had no GEAR protocol[2]. No role separation. The entire reflection feature lived in a single 371-line SKILL.md. The requirement was roughly one sentence: the system should detect when a user corrects an AI mistake, then generate a reusable rule. ...

2026-04-25 · 8 min · Alex Wang
Requirement anchoring: test plan before test code before business code

Write Test Plans Before Test Code: Requirement Anchoring in AI Development

This is the first article in the series “Taming AI Coding Agents with TDD.” The series has one thesis: AI-assisted development demands stricter process discipline than traditional development, and here is exactly how to enforce it at every step. The series follows the pipeline order — requirements, design, testing, review, implementation. This article starts at the testing layer. During Aristotle’s third refactoring, the test plan document was where I learned the hardest lesson. I’ll cover this layer first, then work backward and forward in subsequent posts. ...

2026-04-23 · 16 min · Alex Wang
Context rot: an easily overlooked problem in AI coding

Context Rot: An Easily Overlooked Problem in AI Coding

Yesterday someone in a group chat said GPT-5.4 performed worse than Doubao. When they asked questions, the model would often give irrelevant answers without even reading the question. I asked a few follow-up questions and found they had fed it a lot of documents, and the conversation had gone on for many turns. This probably wasn’t the model’s problem—it was context rot. I’ve had similar experiences myself. After talking to a model for a long time, it starts “forgetting” what we discussed earlier, or repeats mistakes that were already corrected. The model hasn’t gotten stupider. The conversation has just gotten too long. ...

2026-04-18 · 14 min · Alex Wang
Seven human-AI collaboration patterns from the Aristotle project

Looking Back: Seven Human-AI Collaboration Patterns in the Aristotle Project

Five articles in. Time to step back and look at the path itself. Aristotle: Teaching AI to Reflect on Its Mistakes covered the design philosophy and initial implementation. claude-code-reflect: Same Metacognition, Different Soil told the story of porting across platforms. Trust Boundaries: One Idea, Two Systems proposed a trust tiering model. From Scars to Armor: Harness Engineering in Practice validated the theory through refactoring. A Markdown’s Three Lives: From Static Rules to a Git-Backed MCP Server evolved the rule storage from append-only to the GEAR protocol. ...

2026-04-16 · 11 min · Alex Wang
A Markdown's three lives: from static rules to Git-backed MCP Server

A Markdown's Three Lives: From Static Rules to Git-Backed MCP Server

The previous article, From Scars to Armor: Harness Engineering in Practice, ended with Aristotle having a streamlined router (SKILL.md compressed from 371 lines to 84), an on-demand progressive disclosure architecture, and a working reflect→review→confirm workflow. But one thread never got pulled: Where do confirmed rules actually live? This article follows that thread. It wasn’t planned from the start. Three concrete problems in actual use forced the design out, step by step. ...

2026-04-16 · 21 min · Alex Wang
From scars to armor: Progressive Disclosure architecture reforged from four defects

From Scars to Armor: Harness Engineering in Practice

Three articles in. Back to code — and a hard look in the mirror. The first post, Aristotle: Teaching AI to Reflect on Its Mistakes, covered the design philosophy and a smooth implementation — three commits in one go. The second, claude-code-reflect: Same Metacognition, Different Soil, described the adaptation cost of moving the same philosophy to Claude Code — continuous iteration from V1 to V3. The third, Trust Boundaries: The Same Idea on Open and Closed Platforms, proposed a tiered trust model and a harness engineering framework. ...

2026-04-11 · 14 min · Alex Wang