A row of dim review dimension slots with only one glowing, then fully lit after new modules are added — but the version on the right, weighed down by math symbols, has gone dark again

Dimension Experiments: Can a 36-Year-Old Book Fix Your Review Coverage?

Series: Classic Theory Meets Agent Practice (Part 3)
Part 1: Dual-Pass Review: Why You Can’t Have Both Recall and Precision · Part 2: Strategy Genes: Pruning Review Prompts with Genetic Algorithm Thinking
TL;DR: Two controlled experiments. Code review dimensions went from 8 to 11, and known-issue detection went from 1/6 to 6/6. Design review introduced axiomatic design dimensions, and detection also went from 1/6 to 6/6. But the version with a math formula showed that more dimensions are not always better: computation consumed review attention, and findings dropped 35%. Run controlled experiments with known issues as the reference, and you learn which dimensions actually work. ...

2026-05-25 · 9 min · Alex Wang
A bloated prompt pruned into compact strategy genes, with redundant fragments removed and core constraints preserved

Strategy Genes: Pruning Review Prompts with Genetic Algorithm Thinking

Series: Classic Theory Meets Agent Practice (Part 2)
Previous: Dual-Pass Review: Why Recall and Precision Cannot Both Win
TL;DR: A review prompt went from 317 lines to 135 lines (-58%), and review quality improved by 29%. What I removed was not useful procedure, but redundant content the model could infer on its own. What stayed were strategy genes: irreplaceable constraints, negative examples, and tone locks.
The previous post covered dual-pass review: splitting one review agent into a “find everything” pass and a “filter hard” pass. The valid find rate went from 75% to 92%. But it left one problem open: what the “find everything” pass chooses to report or ignore is still affected by prompt wording. ...

2026-05-24 · 10 min · Alex Wang
Two funnels side by side — the left one wide-mouthed catching many candidate issues, the right one narrow filtering only the valuable findings

Cascade Retrieval: A 15-Year-Old IR Trick Fixed My Design Review Agent

Series: Classic Theory Meets Agent Practice (Part 1)
TL;DR: A design review agent needs to find every issue AND avoid false positives. One agent can’t do both. Borrowing cascade retrieval, a 15-year-old method from information retrieval, I split the agent into two passes: a Recall Pass that casts a wide net, and a Precision Pass that filters strictly. Real defects get caught earlier, and the risk of rework during development drops. ...

2026-05-22 · 9 min · Alex Wang
Context Rot: An Easily Overlooked Problem in AI Coding

Yesterday someone in a group chat said GPT-5.4 performed worse than Doubao. When they asked questions, the model would often give irrelevant answers without even reading the question. I asked a few follow-up questions and found they had fed it a lot of documents, and the conversation had gone on for many turns. This probably wasn’t the model’s problem—it was context rot. I’ve had similar experiences myself. After talking to a model for a long time, it starts “forgetting” what we discussed earlier, or repeats mistakes that were already corrected. The model hasn’t gotten stupider. The conversation has just gotten too long. ...

2026-04-18 · 14 min · Alex Wang
Looking Back: Seven Human-AI Collaboration Patterns in the Aristotle Project

Five articles in. Time to step back and look at the path itself. Aristotle: Teaching AI to Reflect on Its Mistakes covered the design philosophy and initial implementation. claude-code-reflect: Same Metacognition, Different Soil told the story of porting across platforms. Trust Boundaries: One Idea, Two Systems proposed a trust tiering model. From Scars to Armor: Harness Engineering in Practice validated the theory through refactoring. A Markdown’s Three Lives: From Static Rules to a Git-Backed MCP Server evolved the rule storage from append-only to the GEAR protocol. ...

2026-04-16 · 11 min · Alex Wang
A Markdown's Three Lives: From Static Rules to Git-Backed MCP Server

The previous article, From Scars to Armor: Harness Engineering in Practice, ended with Aristotle having a streamlined router (SKILL.md compressed from 371 lines to 84), an on-demand progressive disclosure architecture, and a working reflect→review→confirm workflow. But one thread never got pulled: Where do confirmed rules actually live? This article follows that thread. It wasn’t planned from the start. Three concrete problems in actual use forced the design out, step by step. ...

2026-04-16 · 21 min · Alex Wang
From scars to armor: Progressive Disclosure architecture reforged from four defects

From Scars to Armor: Harness Engineering in Practice

Three articles in. Back to code — and a hard look in the mirror. The first post, Aristotle: Teaching AI to Reflect on Its Mistakes, covered the design philosophy and a smooth implementation — three commits in one go. The second, claude-code-reflect: Same Metacognition, Different Soil, described the adaptation cost of moving the same philosophy to Claude Code — continuous iteration from V1 to V3. The third, Trust Boundaries: The Same Idea on Open and Closed Platforms, proposed a tiered trust model and a harness engineering framework. ...

2026-04-11 · 14 min · Alex Wang
Trust boundary checkpoint between open and constrained AI ecosystems

Trust Boundaries: The Same Idea on Open and Closed Platforms

Fundamentum autem est iustitiae fides, id est dictorum conventorumque constantia et veritas. — Cicero, De Officiis The foundation of justice is fides — constancy and truthfulness in words and agreements. The first two posts told the story of two projects. Aristotle: Teaching AI to Reflect on Its Mistakes runs on OpenCode — three commits, done. claude-code-reflect: Same Metacognition, Different Soil runs on Claude Code — V1 through V3, hitting walls the entire way. ...

2026-04-06 · 16 min · Alex Wang
Same metacognition landing on different platform soils

claude-code-reflect: Same Metacognition, Different Soil

Same metacognitive ability, different soil. The growth patterns look nothing alike. My previous post, Aristotle: Teaching AI to Reflect on Its Mistakes, had three core principles: immediate trigger, session isolation, human in the loop. These sound platform-agnostic. But when I moved the same philosophy to Claude Code, I discovered that the platform differences are much larger than expected.
First Hurdle: Plugin System Differences
Claude Code’s plugins and OpenCode’s skills are completely different systems. Just getting the plugin installed and recognized took several rounds of struggle. ...

2026-04-06 · 9 min · Alex Wang
Aristotle reflection system concept

Aristotle: Teaching AI to Reflect on Its Mistakes

“Knowing yourself is the beginning of all wisdom.” — Aristotle
Every time I work with an AI coding assistant, I run into the same problem: mistakes that were corrected get repeated in the next session. The model isn’t stupid. There’s a structural gap in memory. For example, last week I corrected a mistake the model made. It apologized, I accepted, we kept working. Today I started a new session, and the same mistake appeared again. ...

2026-04-06 · 6 min · Alex Wang