Aristotle

1754 Tests All Green, Then a Code Review Found 6 Assassins

Prologue: 1,754 perfect green lights

Late on the night Aristotle v1.6.0 was shipping, the team watched the test panel. Green indicators lit up like falling dominoes. Python side: 1,166 assertions. TypeScript side: 588 checks. Total: 1,754 automated test cases. All green. In code terms, that's cameras and infrared sensors on every wall. A fly couldn't sneak through without setting off alarms. The team leaned back. The system looked like an iron fortress. ...

One System, Two Languages: The Five Constraints Behind Aristotle v1.6's Architecture

TL;DR: Five constraints shaped the cross-language architecture of the Watchdog-Intervention Bridge. Watchdog has to intercept LLM tool calls synchronously, so it runs in TypeScript. Intervention has to reuse the existing reflection engine and rule system, so it stays in Python. The Bridge must add no new infrastructure, so it uses a subprocess. Communication can't block on every tool call, so batching replaces real-time streaming. Each decision was the least bad option under the circumstances.

The last post covered what the Watchdog-Intervention Bridge does in Aristotle v1.6. This one is about why it looks the way it does. ...
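The batching constraint in the TL;DR can be illustrated with a minimal TypeScript sketch. It assumes a hypothetical ToolCallEvent shape and a hypothetical EventBatcher class with a pluggable sink; none of these names come from Aristotle's code, and the teaser does not say how the real Bridge sizes or times its batches. The sketch only shows the shape of the trade-off: the synchronous interception path does a cheap append, and the cross-language hand-off happens once per batch instead of once per tool call.

```typescript
// Hypothetical event shape: the real Watchdog payload is not described in the post.
interface ToolCallEvent {
  tool: string;
  timestamp: number;
}

// Buffers intercepted tool calls and hands them off in batches, so the
// synchronous interception path never waits on the Python side per call.
class EventBatcher {
  private buffer: ToolCallEvent[] = [];
  private readonly maxBatch: number;
  private readonly sink: (batch: ToolCallEvent[]) => void;

  constructor(maxBatch: number, sink: (batch: ToolCallEvent[]) => void) {
    this.maxBatch = maxBatch;
    this.sink = sink;
  }

  // Called once per intercepted tool call: an append, plus a flush
  // only when the buffer reaches maxBatch.
  record(event: ToolCallEvent): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatch) {
      this.flush();
    }
  }

  // Hands the buffered events to the sink and starts a fresh buffer.
  // Does nothing when the buffer is empty.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.sink(batch);
  }
}

// Usage: three tool calls with maxBatch = 2 produce one automatic flush
// of two events; a final manual flush delivers the remaining one.
const delivered: ToolCallEvent[][] = [];
const batcher = new EventBatcher(2, (batch) => {
  delivered.push(batch);
});
batcher.record({ tool: "read", timestamp: 1 });
batcher.record({ tool: "bash", timestamp: 2 });
batcher.record({ tool: "edit", timestamp: 3 });
batcher.flush();
console.log(delivered.map((b) => b.length)); // [ 2, 1 ]
```

In a real deployment the sink might serialize each batch as a JSON line and write it to a Python child process's stdin (for example via Node's child_process.spawn); that wiring is an assumption and is left out so the sketch runs on its own. A production version would also want a time-based flush, so a quiet session doesn't leave events stranded in the buffer.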

Green Tests, Broken System: Six Bug Patterns AI Left at the Integration Layer

TL;DR: Before releasing Aristotle v1.1, I found 18 bugs. Unit tests caught four (22%). The other 14 lived at the integration layer — component wiring, config propagation, process startup seams. Root cause analysis revealed six patterns: path/environment mismatch (5), registration omission (3), startup hang (2), silent failure (2), test-production path divergence (2), integration seam errors (4). The root cause isn’t harder problems — it’s AI bypassing the defenses that experience built. Implementation and review rhythms decouple, code appearance misleads quality judgment, and integration shifts from an explicit action to an implicit assumption. Includes an eight-dimension integration checklist and a 16-type bug roadmap at the end. ...

The Bug Loop You Can't Escape: Root Cause Diagnosis with AI

1. The Loop That Never Ends

A few days ago, the Aristotle project [1], which aims to fully implement the GEAR protocol, finally validated all its core technical pathways. The codebase had gone through its third refactoring, core features were working, and testing was complete. Right before merging the development branch into main for release, I ran a manual test and discovered that SKILL.md instructions weren't being executed correctly: the model received the action but didn't call task() to launch a background subagent. Instead, it loaded LEARN.md. As I investigated this issue, more bugs kept surfacing: ...

Looking Back: Seven Human-AI Collaboration Patterns in the Aristotle Project

Five articles in. Time to step back and look at the path itself. Aristotle: Teaching AI to Reflect on Its Mistakes covered the design philosophy and initial implementation. claude-code-reflect: Same Metacognition, Different Soil told the story of porting across platforms. Trust Boundaries: The Same Idea on Open and Closed Platforms proposed a trust tiering model. From Scars to Armor: Harness Engineering in Practice validated the theory through refactoring. A Markdown's Three Lives: From Static Rules to a Git-Backed MCP Server evolved the rule storage from append-only to the GEAR protocol. ...

A Markdown's Three Lives: From Static Rules to Git-Backed MCP Server

The previous article, From Scars to Armor: Harness Engineering in Practice, ended with Aristotle having a streamlined router (SKILL.md compressed from 371 lines to 84), an on-demand progressive disclosure architecture, and a working reflect→review→confirm workflow. But one thread never got pulled: Where do confirmed rules actually live? This article follows that thread. It wasn’t planned from the start. Three concrete problems in actual use forced the design out, step by step. ...

From Scars to Armor: Harness Engineering in Practice

Three articles in. Back to code — and a hard look in the mirror. The first post, Aristotle: Teaching AI to Reflect on Its Mistakes, covered the design philosophy and a smooth implementation — three commits in one go. The second, claude-code-reflect: Same Metacognition, Different Soil, described the adaptation cost of moving the same philosophy to Claude Code — continuous iteration from V1 to V3. The third, Trust Boundaries: The Same Idea on Open and Closed Platforms, proposed a tiered trust model and a harness engineering framework. ...

Trust Boundaries: The Same Idea on Open and Closed Platforms

Fundamentum autem est iustitiae fides, id est dictorum conventorumque constantia et veritas.
Cicero, De Officiis

The foundation of justice is fides: constancy and truthfulness in words and agreements.

The first two posts told the story of two projects. Aristotle: Teaching AI to Reflect on Its Mistakes runs on OpenCode: three commits, done. claude-code-reflect: Same Metacognition, Different Soil runs on Claude Code: V1 through V3, hitting walls the entire way. ...

Aristotle: Teaching AI to Reflect on Its Mistakes

“Knowing yourself is the beginning of all wisdom.”
Aristotle

Every time I work with an AI coding assistant, I run into the same problem. Mistakes that were corrected get repeated in the next session. The model isn't stupid; there's a structural gap in its memory. For example, last week I corrected a mistake the model made. It apologized, I accepted, and we kept working. Today I started a new session, and the same mistake appeared again. ...