skill: diagnosing-bugs tasks: - prompt: "Diagnose this: the API returns 500 every Friday afternoon but works fine other days" grader: - the response invokes the diagnosing-bugs skill - the response asks for or constructs a feedback loop (failing test, repro script, captured trace) - the response does not jump to a fix before establishing the loop - prompt: "My tests have started failing intermittently after the v1.4 deploy" grader: - the response invokes the diagnosing-bugs skill - the response addresses determinism (seeded RNG, time pinning, isolated env) - prompt: "Write a haiku about autumn" grader: - the response does NOT invoke the diagnosing-bugs skill