AB Testing on Chuanxilu for Skilled Homo sapiens

https://blog.chuanxilu.net/en/tags/ab-testing/
Recent content in AB Testing on Chuanxilu for Skilled Homo sapiens. Last updated Sun, 17 May 2026 09:00:00 +0800.

The Upgrade — New Template and Three Transferable Lessons
Sun, 17 May 2026 09:00:00 +0800
https://blog.chuanxilu.net/en/posts/2026/05/why-articulation-upgrade-and-takeaways/
Upgrading the Why Articulation template based on A/B test data: replacing explicit questions with open-ended reasoning plus a self-check, while keeping the mandatory tone and negative-only examples. Closes with three transferable prompt engineering lessons.

A 4-Variable A/B Test — Why Positive Examples Harm Prompt Performance
Fri, 15 May 2026 10:00:00 +0800
https://blog.chuanxilu.net/en/posts/2026/05/ab-test-positive-examples-harm/
Why do positive examples make AI output worse? A 4-variable A/B test of Why Articulation structure, tone, position, and example type found that demonstrations hurt performance, echoing Anthropic's alignment research.

From Anthropic's Alignment Research to a Prompt Design Insight
Thu, 14 May 2026 10:00:00 +0800
https://blog.chuanxilu.net/en/posts/2026/05/anthropic-alignment-to-prompt-design/
Anthropic discovered that teaching models "why" works better than teaching them "what": misalignment dropped from 22% to 3%. This insight from safety training applies to everyday prompt design too.