An experiment dashboard where every expected metric shows red — except one gauge in the corner, glowing green

What a Failed Experiment Got Right

Series: Breaking to Build: TDD Process Iterations (first post)
TL;DR: I refined Phase 6 (pre-release testing) of the TDD Pipeline from step-driven to principle-driven. The goal was better output. I didn’t get it: the refined version was worse at drilling into individual bugs and building evidence chains. But comparing the two outputs showed that each version was stronger along different dimensions. The refined version was better at checking component gaps and scanning for patterns across bugs. Those differences pointed to a judgment call: Phase 6 doesn’t need refining. It needs a layer on top of it. That layer later became Phase 7. ...

2026-05-19 · 5 min · Alex Wang
Watercolor style: three guide lines in signal-light colors converging on a door — green labeled reasoning, yellow labeled trade-offs, red labeled assumptions — symbolizing the three signals that trigger CoT

When Should You Ask AI to 'Think Step by Step'? Three Signals

Tip Card: When Should You Ask AI to “Think Step by Step”?
Adding “please reason step by step” at the end of your prompt is Chain-of-Thought (CoT). It is deceptively simple, yet remarkably effective in the right situations. The question is when to add it. The answer is straightforward: watch for three signals, and if any of them apply, add it.
Signal 1: The Problem Requires Multi-Step Reasoning
“If I save 30% of my monthly income at 4% annual interest, compounded, how much will I have after 10 years?” ...

2026-05-19 · 2 min · Alex Wang
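To see why the savings question in Signal 1 counts as multi-step, here is a minimal Python sketch of the arithmetic. The original question leaves three things open: the income, when deposits happen, and how often interest compounds. The sketch therefore assumes a hypothetical monthly income of 5,000, end-of-month deposits, and monthly compounding of the 4% annual rate.

```python
# Hypothetical inputs: the question does not state income, deposit timing,
# or compounding frequency, so these are illustrative assumptions.
MONTHLY_INCOME = 5_000.0   # assumed monthly income
SAVINGS_RATE = 0.30        # 30% of income saved each month
ANNUAL_RATE = 0.04         # 4% annual interest
YEARS = 10


def future_value_of_savings(income: float, savings_rate: float,
                            annual_rate: float, years: int) -> float:
    """Future value of equal end-of-month deposits with monthly compounding."""
    # Step 1: how much is saved each month
    deposit = income * savings_rate
    # Step 2: convert the annual rate to a monthly rate
    monthly_rate = annual_rate / 12
    # Step 3: count the number of monthly deposits
    n = years * 12
    # Step 4: ordinary-annuity formula FV = P * ((1 + r)^n - 1) / r
    return deposit * ((1 + monthly_rate) ** n - 1) / monthly_rate


if __name__ == "__main__":
    fv = future_value_of_savings(MONTHLY_INCOME, SAVINGS_RATE, ANNUAL_RATE, YEARS)
    print(f"{fv:,.2f}")
```

Under these assumptions the result is roughly 221,000, against 180,000 in total deposits. Changing any single assumption also changes the answer. Examples include deposits at the start of each month or annual compounding. Each step is a place where a one-shot answer can slip.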
Watercolor style: a workbench with five woodworking sketches progressing from rough to refined, symbolizing how the same request gets rewritten from vague to precise

RBGO Rewrites in 5 Real Scenarios: Vague Prompt vs. Precise Prompt

We covered the RBGO (Role-Background-Goal-Output) framework in the previous post. But there’s a gap between knowing the framework and actually using it: how do you translate “I want…” into those four elements? Below are 5 common everyday scenarios. Each starts with the vague version (what most people actually write), followed by the RBGO rewrite and a breakdown of what changed and why.
Scenario 1: Writing a Work Email
Vague version: ...

2026-05-18 · 6 min · Alex Wang
Three objects on warm cream: a compass, a crossed-out stamp, and a blank card with a hand-drawn arrow

The Upgrade — New Template and Three Transferable Lessons

TL;DR: A before-and-after comparison of the upgraded Why Articulation template, plus three transferable lessons: give principles, not examples; lock critical steps with a mandatory tone; and trust the model’s self-organization. Experiment limitations included.
Series: Why Make AI Articulate Why Before Acting (Article 3)
Previous: A 4-Variable A/B Test — Why Positive Examples Harm Prompt Performance
Recap
Article 1 started from Anthropic’s alignment research: teaching a model why rather than what cut misalignment from 22% to 3% (about 7×) and achieved equivalent results with 1/28 of the data [1]. I adapted this into Why Articulation, a mechanism that forces AI to explain purpose, risks, and approach before writing any code. ...

2026-05-17 · 8 min · Alex Wang
Watercolor illustration: a rough pencil sketch on the left transforming into a polished drawing on the right, connected by a soft arrow, symbolizing the rewrite from vague to precise

Practice: Rewrite Your First Question with RBGO

Today’s Practice
Recall the first question you asked AI today (or recently). The more casual, the better. Don’t cherry-pick. Ask it again exactly as-is and save the answer.
Now rewrite the same question using the RBGO framework:
R (Role): Who should AI play, e.g. “senior ops manager”, “strict tech reviewer”, “patient teacher”
B (Background): Your specific situation, e.g. target users, budget, timeline
G (Goal): What you want, e.g. a strategy, a troubleshooting approach, an email draft
O (Output): What format, e.g. 3 recommendations, table format, under 300 words
Save the rewritten answer too. Put both side by side. ...

2026-05-17 · 2 min · Alex Wang
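If you rewrite prompts often, you can fill the four RBGO slots from a small template. Below is a minimal Python sketch. The `RBGOPrompt` class and its one-line-per-element layout are hypothetical choices of mine; the practice above does not prescribe how the four elements should be joined. The sample values reuse the examples from the practice list.

```python
from dataclasses import dataclass


@dataclass
class RBGOPrompt:
    """Hypothetical container for the four RBGO elements."""
    role: str        # R: who the AI should play
    background: str  # B: your specific situation
    goal: str        # G: what you want
    output: str      # O: the format of the answer

    def render(self) -> str:
        # Join the four elements into one prompt, one element per line.
        return "\n".join([
            f"Role: You are a {self.role}.",
            f"Background: {self.background}",
            f"Goal: {self.goal}",
            f"Output: {self.output}",
        ])


if __name__ == "__main__":
    prompt = RBGOPrompt(
        role="senior ops manager",
        background="Target users: first-time buyers. Budget: $2,000. Timeline: 3 weeks.",
        goal="A launch strategy.",
        output="3 recommendations in table format, under 300 words.",
    )
    print(prompt.render())
```

Keeping the slots separate makes it easy to spot which element your original casual question was missing.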
Watercolor still life: rough unpolished stone beside a faceted gemstone, symbolizing the refinement from vague to precise prompts

AI Path L0→L1 Upgrade Guide (2): From Vague Questions to Precise Instructions

📖 This is Part 2 of 5 in the “AI Path L0→L1 Upgrade Guide” series. Part 1: Understanding Your Tools · Part 2: From Vague Questions to Precise Instructions · Part 3: Turning AI Into Your Collaboration Partner (coming soon) · Part 4: Building Your Personal System (coming soon) · Part 5: Graduation & Next Steps (coming soon) In the last part we covered how LLMs actually work, how their memory operates, and the key differences between major platforms. Starting this week, we move into practice — how to turn what you want to say into instructions that AI can understand precisely. ...

2026-05-16 · 6 min · Alex Wang
Left: a stamp copying identical patterns. Right: freeform marks for independent thinking. Red X marks the imitation path as wrong

A 4-Variable A/B Test — Why Positive Examples Harm Prompt Performance

TL;DR: A 4-variable A/B test on Why Articulation covering structure, tone, position, and examples. Positive examples made output worse: the model imitated instead of reasoning. Open-ended prompts showed a directional improvement in quality and cut tokens by 33%.
Series: Why Make AI Articulate Why Before Acting (Article 2)
Previous: From Anthropic’s Alignment Research to a Prompt Design Insight
Where We Left Off
Anthropic’s alignment research [1] landed on a sharp insight: teaching a model why beats telling it what. I took that idea and built Why Articulation into my TDD Pipeline. It is a mechanism that forces the model to explain its understanding before it writes any code. Early results looked good. ...

2026-05-15 · 8 min · Alex Wang
Watercolor illustration: three artisan tools on a warm wooden workbench — a wide terracotta bowl, an elegant glass carafe, and a segmented wooden organizer — each suited for different tasks, no ranking implied

Pick Your AI by the Job, Not the Ranking

Tried ChatGPT, Claude, Gemini, DeepSeek… and still can’t decide which one to stick with? Here’s the thing: that’s the wrong question. There is no universally best AI, only the one that fits what you’re doing right now. What’s your scenario? “I want a general-purpose assistant for everything” → ChatGPT. As of May 2026, the default is GPT-5.5: well-rounded, with the richest plugin ecosystem. If you pick just one, this is a solid choice. ...

2026-05-15 · 1 min · Alex Wang
An arched gateway inscribed with WHY, two rods of different length and color on the ground

From Anthropic's Alignment Research to a Prompt Design Insight

TL;DR: Anthropic’s alignment research shows that teaching a model why works better than teaching it what — misalignment dropped from 22% to 3%. This post breaks down four experiments and distills three lessons you can use in prompt design. I ran an A/B test comparing two prompt strategies. One group got positive examples — “do it like this.” The other got no examples. Instead, the AI had to explain why a choice was correct before acting on it. ...

2026-05-14 · 7 min · Alex Wang
Watercolor illustration: a cluttered desk on the left, a neat filing cabinet on the right, separated by a dashed line — symbolizing working memory vs. long-term memory

Your AI Has a Desk and a Filing Cabinet

Ever noticed your AI suddenly ignoring something you said ten minutes ago? Or opened a fresh chat and had to explain your entire project from scratch? Here’s why. Your AI actually has two kinds of memory, and understanding both changes how you work with it.
The Desk: Working Memory
Working memory is everything inside your current conversation. Think of it as a desk with limited surface area. A few documents fit comfortably. Stack too many, and older pages slide right off the edge. ...

2026-05-14 · 2 min · Alex Wang