skill: systematic-debugging tasks: - prompt: "This test has been failing for two days and I've tried four different fixes. Help." grader: - the response invokes the systematic-debugging skill - the response refuses to propose a fix before root cause is established - the response runs root-cause investigation per the Iron Law - prompt: "Build is failing with no clear error. What now?" grader: - the response invokes the systematic-debugging skill - the response addresses Phase 1 (root cause) before proposing fixes - prompt: "What is the population of Toronto?" grader: - the response does NOT invoke the systematic-debugging skill