The bug loop: four rounds of root cause diagnosis and regression tests breaking the spiral

The Bug Loop You Can't Escape: Root Cause Diagnosis with AI

1. The Loop That Never Ends. A few days ago, the Aristotle project [1] — aimed at fully implementing the GEAR protocol — finally validated all its core technical pathways. The codebase had gone through its third refactoring, core features were working, and testing was complete. Right before merging the development branch into main for release, I ran a manual test and discovered that SKILL.md instructions weren’t being executed correctly — the model received the action but never called task() to launch a background subagent; instead, it loaded LEARN.md. As I investigated, more bugs kept surfacing: ...

2026-05-01 · 17 min
Pipeline from requirements to code, each stage catching what the previous one missed

The Full Pipeline: Five Stages from Requirements to Code

This is article 6 in “Taming AI Coding Agents with TDD.” The first four covered requirements disambiguation with the GEAR protocol, tech spec guardrails, test documents before test code, and convergent review loops. Article 5 upgraded the review layer with procedural justice. This one strings everything together into a single pipeline you can actually run. The Complete Pipeline: Product Design → Tech Spec → Test Plan → Test Code → Production Code, with a Ralph Loop gating every stage. Each stage has its own inputs, outputs, and review rules: ...

2026-04-30 · 9 min
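The staged pipeline in the excerpt above can be sketched as a gate at each hand-off: no artifact flows downstream until its review loop passes. A minimal sketch, where run_pipeline, draft_for, and the boolean review() standing in for a full Ralph Loop are all illustrative names, not the series’ actual API:

```python
# Hypothetical sketch: each stage produces an artifact that must pass its own
# review gate before the next stage may consume it. All names are illustrative.

STAGES = [
    "Product Design",
    "Tech Spec",
    "Test Plan",
    "Test Code",
    "Production Code",
]

def review(stage: str, artifact: str) -> bool:
    """Stand-in for a Ralph Loop review; True means the artifact converged."""
    return artifact.endswith("(reviewed)")

def run_pipeline(draft_for):
    artifacts = []
    for stage in STAGES:
        artifact = draft_for(stage)
        # The gate: an unreviewed artifact never reaches the next stage.
        if not review(stage, artifact):
            raise RuntimeError(f"{stage} failed review; pipeline halts here")
        artifacts.append(artifact)
    return artifacts

# A run where every stage converges yields all five artifacts in order.
print(run_pipeline(lambda s: f"{s} artifact (reviewed)"))
```

The point of the sketch is the halt-on-failure behavior: a stage that never converges stops the pipeline rather than handing a shaky artifact downstream.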
Procedural justice encoded: adversarial review where every decision is verifiable

Procedural Justice Encoded: Making Every Step of AI Review Verifiable

My Ralph Loop review mechanism had a hidden problem. v0.2’s flow was straightforward: find issues → fix → confirm convergence. In part 4 of this series, I mentioned that if the creator disagrees with the reviewer’s judgment, they can present evidence in the next round for reassessment. But that was one sentence in the rules — not a formal protocol. Nobody was checking whether the review itself was sound. The reviewer might mislabel severity. The main agent might blindly accept bad suggestions. ...

2026-04-30 · 10 min
Ralph Loop: multi-round convergent review, two consecutive clean rounds to exit

AI Errors Converge, They Don't Randomize: The Review Loop That Catches What You Miss

This is article 4 in “Taming AI Coding Agents with TDD.” The first covered test-driven requirements anchoring, the second introduced the GEAR protocol for disambiguation, the third laid out what the tech spec must nail down. This one covers the last line of defense: review. The Problem the Tech Spec Cannot Solve Article 3 ended with an uncomfortable admission. The PRD locks down “what to build.” The tech spec locks down “how to build it.” Together they compress the AI’s improvisation space down to implementation details. That is a huge improvement. ...

2026-04-29 · 11 min
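The exit rule named in the entry above, two consecutive clean rounds, can be sketched as a streak counter that any new finding resets. A minimal sketch, where find_issues, fix, and max_rounds are invented stand-ins, not the series’ actual review interface:

```python
# Hypothetical sketch of the convergence rule: review repeatedly, and exit
# only after two consecutive rounds report zero issues. Names are illustrative.

def review_until_convergence(find_issues, fix, max_rounds: int = 20) -> int:
    clean_streak = 0
    for round_no in range(1, max_rounds + 1):
        issues = find_issues()
        if issues:
            clean_streak = 0      # any finding resets the streak
            fix(issues)
        else:
            clean_streak += 1     # a clean round extends the streak
        if clean_streak == 2:     # two consecutive clean rounds: converged
            return round_no
    raise RuntimeError("review did not converge within max_rounds")

# Simulated run: two rounds with findings, then two clean rounds.
remaining = [["bug-a", "bug-b"], ["bug-c"], [], []]
rounds = review_until_convergence(lambda: remaining.pop(0), lambda issues: None)
print(rounds)  # → 4
```

Requiring two clean rounds in a row, rather than one, guards against a single lucky pass: a fix applied in the last dirty round still has to survive a fresh look before the loop may exit.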
PRD to tech spec: documents as guardrails, not burden

Why PRD Alone Is Not Enough: What the Tech Spec Must Cover in AI-Assisted Development

This is article 3 in the “Taming AI Coding Agents with TDD” series. The first covered test-driven requirements anchoring, the second covered the GEAR protocol for requirements disambiguation. This one fills the gap between them: after the PRD is done, what must the tech spec cover? Requirements Locked, Code Still Wrong Before the second Aristotle refactor, I spent two full days writing requirements. Following the structured approach from the previous article, I captured every acceptance criterion, boundary condition, error path, and platform constraint[1]. The AI consumed the document and passed all 37 static assertions plus the end-to-end tests. The codebase was split into four files by responsibility. Information flow was switched from push to pull. ...

2026-04-29 · 11 min
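The four dimensions the excerpt above says a structured requirement must capture can be sketched as a record with a completeness check, so an entry with an empty dimension is flagged before the AI ever sees it. A minimal sketch; the class, field names, and is_complete check are all invented for illustration:

```python
# Hypothetical sketch of one structured requirement entry, following the four
# dimensions named in the excerpt. All names here are illustrative.
from dataclasses import dataclass, field

@dataclass
class Requirement:
    summary: str
    acceptance_criteria: list = field(default_factory=list)
    boundary_conditions: list = field(default_factory=list)
    error_paths: list = field(default_factory=list)
    platform_constraints: list = field(default_factory=list)

    def is_complete(self) -> bool:
        # A one-liner leaves every dimension empty and fails this check,
        # which is exactly the gap the AI would otherwise auto-fill.
        return all([self.acceptance_criteria, self.boundary_conditions,
                    self.error_paths, self.platform_constraints])
```

The check makes the contrast in the description line concrete: a one-line requirement fails it immediately, while a fully structured entry passes only once each dimension has at least one item.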
Structured requirements vs one-liner: the trap of AI auto-filling gaps

Why AI-Assisted Development Needs Structured Requirements First: Lessons from the GEAR Protocol

This is article 2 in the “Taming AI Coding Agents with TDD” series. The first article covered requirement anchoring at the test layer[1]. Tests assume clear requirements. This one goes upstream — to the practice of disambiguating requirements before a single line of code gets written. The v1 Lesson: One-Line Requirement, 371 Lines of Pollution Aristotle v1 had no GEAR protocol[2]. No role separation. The entire reflection feature lived in a single 371-line SKILL.md. The requirement was roughly one sentence: the system should detect when a user corrects an AI mistake, then generate a reusable rule. ...

2026-04-25 · 8 min
Requirement anchoring: test plan before test code before business code

Write Test Plans Before Test Code: Requirement Anchoring in AI Development

This is the first article in the series “Taming AI Coding Agents with TDD.” The series has one thesis: AI-assisted development demands stricter process discipline than traditional development, and here is exactly how to enforce it at every step. The series follows the pipeline order — requirements, design, testing, review, implementation. This article starts at the testing layer. During Aristotle’s third refactoring, the test plan document was where I learned the hardest lesson. I’ll cover this layer first, then work backward and forward in subsequent posts. ...

2026-04-23 · 16 min