
A 4-Variable A/B Test — Why Positive Examples Harm Prompt Performance
TL;DR: A 4-variable A/B test on Why Articulation, covering structure, tone, position, and examples. Positive examples made output worse. The model imitated instead of reasoning. Open-ended prompts improved quality directionally and cut tokens by 33%.

Series: Why Make AI Articulate Why Before Acting (Article 2)
Previous: From Anthropic’s Alignment Research to a Prompt Design Insight

Where We Left Off

Anthropic’s alignment research [1] landed on a sharp insight: teaching a model why beats telling it what. I took that idea and built Why Articulation into my TDD Pipeline, a mechanism that forces the model to explain its understanding before it writes any code. Early results looked good. ...








