Chuanxilu for Skilled Homo sapiens

PRD to tech spec: documents as guardrails, not burden

Why PRD Alone Is Not Enough: What the Tech Spec Must Cover in AI-Assisted Development

in the “Taming AI Coding Agents with TDD” series. The first covered test-driven requirements anchoring, the second covered the GEAR protocol for requirements disambiguation. This one fills the gap between them: after the PRD is done, what must the tech spec cover? Requirements Locked, Code Still Wrong Before the second Aristotle refactor, I spent two full days writing requirements. Following the structured approach from the previous article, I captured every acceptance criterion, boundary condition, error path, and platform constraint[1]. The AI consumed the document, passed all 37 static assertions plus end-to-end tests. The codebase was split into four files by responsibility. Information flow was switched from push to pull. ...

Structured requirements vs one-liner: the trap of AI auto-filling gaps

Why AI-Assisted Development Needs Structured Requirements First: Lessons from the GEAR Protocol

in the “Taming AI Coding Agents with TDD” series. The first article covered requirement anchoring at the test layer[1]. Tests assume clear requirements. This one goes upstream — to the practice of disambiguating requirements before a single line of code gets written. The v1 Lesson: One-Line Requirement, 371 Lines of Pollution Aristotle v1 had no GEAR protocol[2]. No role separation. The entire reflection feature lived in a single 371-line SKILL.md. The requirement was roughly one sentence: the system should detect when a user corrects an AI mistake, then generate a reusable rule. ...

Requirement anchoring: test plan before test code before business code

Write Test Plans Before Test Code: Requirement Anchoring in AI Development

This is the first article in the series “Taming AI Coding Agents with TDD.” The series has one thesis: AI-assisted development demands stricter process discipline than traditional development, and here is exactly how to enforce it at every step. The series follows the pipeline order — requirements, design, testing, review, implementation. This article starts at the testing layer. During Aristotle’s third refactoring, the test plan document was where I learned the hardest lesson. I’ll cover this layer first, then work backward and forward in subsequent posts. ...

Context Rot: An Easily Overlooked Problem in AI Coding

Yesterday someone in a group chat said GPT-5.4 performed worse than Doubao. When they asked questions, the model would often give irrelevant answers without even reading the question. I asked a few follow-up questions and found they had fed it a lot of documents, and the conversation had gone on for many turns. This probably wasn’t the model’s problem—it was context rot. I’ve had similar experiences myself. After talking to a model for a long time, it starts “forgetting” what we discussed earlier, or repeats mistakes that were already corrected. The model hasn’t gotten stupider. The conversation has just gotten too long. ...

Seven human-AI collaboration patterns from the Aristotle project

Looking Back: Seven Human-AI Collaboration Patterns in the Aristotle Project

Five articles in. Time to step back and look at the path itself. Aristotle: Teaching AI to Reflect on Its Mistakes covered the design philosophy and initial implementation. claude-code-reflect: Same Metacognition, Different Soil told the story of porting across platforms. Trust Boundaries: One Idea, Two Systems proposed a trust tiering model. From Scars to Armor: Harness Engineering in Practice validated the theory through refactoring. A Markdown’s Three Lives: From Static Rules to a Git-Backed MCP Server evolved the rule storage from append-only to the GEAR protocol. ...

A Markdown's Three Lives: From Static Rules to Git-Backed MCP Server

The previous article, From Scars to Armor: Harness Engineering in Practice, ended with Aristotle having a streamlined router (SKILL.md compressed from 371 lines to 84), an on-demand progressive disclosure architecture, and a working reflect→review→confirm workflow. But one thread never got pulled: Where do confirmed rules actually live? This article follows that thread. It wasn’t planned from the start. Three concrete problems in actual use forced the design out, step by step. ...

From scars to armor: Progressive Disclosure architecture reforged from four defects

From Scars to Armor: Harness Engineering in Practice

Three articles in. Back to code — and a hard look in the mirror. The first post, Aristotle: Teaching AI to Reflect on Its Mistakes, covered the design philosophy and a smooth implementation — three commits in one go. The second, claude-code-reflect: Same Metacognition, Different Soil, described the adaptation cost of moving the same philosophy to Claude Code — continuous iteration from V1 to V3. The third, Trust Boundaries: The Same Idea on Open and Closed Platforms, proposed a tiered trust model and a harness engineering framework. ...

Trust boundary checkpoint between open and constrained AI ecosystems

Trust Boundaries: The Same Idea on Open and Closed Platforms

Fundamentum autem est iustitiae fides, id est dictorum conventorumque constantia et veritas. — Cicero, De Officiis The foundation of justice is fides — constancy and truthfulness in words and agreements. The first two posts told the story of two projects. Aristotle: Teaching AI to Reflect on Its Mistakes runs on OpenCode — three commits, done. claude-code-reflect: Same Metacognition, Different Soil runs on Claude Code — V1 through V3, hitting walls the entire way. ...

Same metacognition landing on different platform soils

claude-code-reflect: Same Metacognition, Different Soil

Same metacognitive ability, different soil. The growing patterns look nothing alike. My previous post, Aristotle: Teaching AI to Reflect on Its Mistakes, had three core principles: immediate trigger, session isolation, human in the loop. These sound platform-agnostic. But when I moved the same philosophy to Claude Code, I discovered something: platform differences are much larger than expected. First Hurdle: Plugin System Differences Claude Code’s plugin and OpenCode’s skill are completely different systems. Just getting the plugin installed and recognized took several rounds of struggle. ...

Aristotle: Teaching AI to Reflect on Its Mistakes

“Knowing yourself is the beginning of all wisdom.” — Aristotle Every time I work with an AI coding assistant, I run into the same problem. Mistakes that were corrected get repeated in the next session. The model isn’t stupid. There’s a structural gap in memory. For example. Last week I corrected a mistake the model made. It apologized, I accepted, we kept working. Today I started a new session, and the same mistake appeared again. ...