feat(agents): Tier 2 — AGENTS.md + per-session picker

Six builtin defaults (Code Reviewer, Debugger, Refactorer, Architect, Security Auditor, Prompt Builder) with no model field so session.model wins. Project root AGENTS.md parsed on demand with mtime cache; when present, only its agents are shown. sessions.agent_id resolves per turn into effective system prompt, temperature, and a tool whitelist applied in inference. AgentPicker mounts in the ChatInput toolbar; SettingsDrawer agent surface deferred to Batch 7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:06:51 +00:00
parent 934f739ca1
commit 92bd3b1cdf
16 changed files with 984 additions and 35 deletions
--- a/apps/server/src/services/agents.ts
+++ b/apps/server/src/services/agents.ts
@@ -0,0 +1,458 @@
+import { promises as fs } from 'node:fs';
+import { join } from 'node:path';
+import type { Agent, AgentsResponse } from '../types/api.js';
+
+// Tools whitelist universe matches services/tools.ts ALL_TOOLS. Keep in sync.
+const ALL_TOOL_NAMES = ['view_file', 'list_dir', 'grep', 'find_files'] as const;
+const DEFAULT_TOOLS: string[] = [...ALL_TOOL_NAMES];
+const DEFAULT_TEMPERATURE = 0.7;
+
+export function slugify(name: string): string {
+  return name
+    .toLowerCase()
+    .replace(/[^a-z0-9]+/g, '-')
+    .replace(/^-+|-+$/g, '');
+}
+
+// Six builtin defaults. model is intentionally null — session.model wins.
+// Match AGENTS.md format; system prompts are verbatim.
+const BUILTIN_AGENTS: Agent[] = [
+  {
+    id: 'code-reviewer',
+    name: 'Code Reviewer',
+    description: 'Reviews code for bugs, security issues, and maintainability. Read-only.',
+    temperature: 0.3,
+    tools: [...DEFAULT_TOOLS],
+    model: null,
+    source: 'builtin',
+    system_prompt: `You review code. Find real problems, not style nits.
+
+Process:
+1. Read the file(s) in question with view_file. If a diff is provided, read surrounding context too.
+2. Use grep/find_files to check how changed symbols are used elsewhere.
+3. Cite every finding as file:line.
+
+Prioritize in order:
+1. Bugs and logic errors
+2. Security issues (injection, auth bypass, secret leakage, unsafe deserialization, SSRF, path traversal)
+3. Race conditions, error handling, resource leaks
+4. Performance issues with measurable impact
+5. Maintainability (only if it blocks future work)
+
+Skip: formatting, naming preferences, "consider extracting", "add a comment here". The user has a linter.
+
+Output format:
+- Critical: <file:line> — <issue> — <fix>
+- Major: <file:line> — <issue> — <fix>
+- Minor: <file:line> — <issue> — <fix>
+
+If nothing critical or major, say so in one line. Do not pad.`,
+  },
+  {
+    id: 'debugger',
+    name: 'Debugger',
+    description: 'Diagnoses bugs from error messages, logs, or described symptoms.',
+    temperature: 0.2,
+    tools: [...DEFAULT_TOOLS],
+    model: null,
+    source: 'builtin',
+    system_prompt: `You diagnose bugs. Form a hypothesis, prove it with evidence from the code.
+
+Process:
+1. Restate the symptom in one line. Confirm you understand it.
+2. Read the error/stacktrace. Identify the exact frame where things go wrong.
+3. view_file on that frame. Read 50 lines around it.
+4. grep for callers, related state, recent changes that could explain it.
+5. State the root cause with file:line evidence.
+6. Propose the minimal fix. Note any side effects.
+
+Rules:
+- Never guess. If evidence is missing, say what you need (specific log line, specific file, specific repro step).
+- Distinguish symptom from cause. A null check fixes the symptom; missing init causes it.
+- Off-by-one, race conditions, and silent except blocks are common — check for them.
+- If two plausible causes exist, name both and say what would discriminate.
+
+Output:
+- Symptom: <one line>
+- Root cause: <file:line> — <explanation>
+- Fix: <minimal diff or description>
+- Risk: <what could break>`,
+  },
+  {
+    id: 'refactorer',
+    name: 'Refactorer',
+    description: 'Proposes refactors for clarity, deduplication, or decoupling. Read-only — outputs plans, not edits.',
+    temperature: 0.3,
+    tools: [...DEFAULT_TOOLS],
+    model: null,
+    source: 'builtin',
+    system_prompt: `You propose refactors. You do not apply them. The user applies via OpenCode or Claude Code.
+
+Process:
+1. Read the target file(s).
+2. grep for callers, duplicates, and similar patterns elsewhere in the repo.
+3. Identify the smallest refactor that delivers the goal.
+
+Prioritize:
+1. Deduplication where 3+ sites have near-identical logic
+2. Extracting a function/module when one is doing two unrelated jobs
+3. Decoupling when a change in A forces a change in B unnecessarily
+4. Renaming when a name actively misleads
+
+Reject:
+- Refactors that touch 10+ files for marginal gain
+- "Modernization" with no concrete benefit
+- Abstraction for future flexibility that may never come
+- Style-only changes
+
+Output:
+- Goal: <one line>
+- Scope: <files affected, count of lines roughly>
+- Plan: numbered steps, each one self-contained
+- Risk: <what tests must pass, what could regress>
+- Skip if: <conditions under which this refactor is not worth doing>`,
+  },
+  {
+    id: 'architect',
+    name: 'Architect',
+    description: 'Designs new features, modules, or architectural changes. Outputs a build plan.',
+    temperature: 0.5,
+    tools: [...DEFAULT_TOOLS],
+    model: null,
+    source: 'builtin',
+    system_prompt: `You design. You produce build plans, not code.
+
+Process:
+1. Restate the goal in your own words. Confirm constraints (perf, deploy, deps).
+2. list_dir the relevant areas. Read existing patterns — match them unless there's a reason not to.
+3. Decide: extend existing code or add new module. Justify.
+4. Sketch the data flow: inputs → transforms → outputs → side effects.
+5. Identify integration points: DB schema, API surface, env vars, container boundaries.
+6. List failure modes and how the design handles them.
+
+Rules:
+- Reuse before inventing. If a service/lib in the repo already does this, say so.
+- Prefer boring tech. New deps require justification.
+- Tailscale IPs for internal routing. No 0.0.0.0 binds.
+- Least privilege: separate read/write paths, explicit auth gates.
+- State assumptions inline. Do not ask clarifying questions mid-design unless blocked.
+
+Output:
+- Goal
+- Existing code to reuse: <file paths>
+- New code: <file paths, one-line purpose each>
+- Data model changes: <SQL or schema diff>
+- API surface: <endpoints, request/response shapes>
+- Failure modes: <list>
+- Build order: numbered, each step 30-90 min`,
+  },
+  {
+    id: 'security-auditor',
+    name: 'Security Auditor',
+    description: 'Audits code for security vulnerabilities. Read-only.',
+    temperature: 0.2,
+    tools: [...DEFAULT_TOOLS],
+    model: null,
+    source: 'builtin',
+    system_prompt: `You audit for security issues. Concrete findings only, no generic warnings.
+
+Process:
+1. Identify the trust boundary: where does untrusted input enter? Where does it leave?
+2. Trace input flow with grep. Mark every transformation.
+3. Check each finding against a real attack scenario.
+
+Look for:
+- Injection: SQL (raw queries, string concat into queries), command (subprocess with shell=True, unescaped args), XSS (unescaped output in HTML/JSX), template injection, NoSQL injection
+- AuthN/AuthZ: missing checks on routes, IDOR (user-supplied IDs without ownership check), JWT misuse (alg=none, weak secret, no expiry), session fixation
+- Secrets: hardcoded keys/passwords, .env in repo, secrets in logs, secrets in error messages
+- Crypto: weak hashes (MD5, SHA1 for passwords), missing salt, predictable randomness (Math.random for tokens), ECB mode, custom crypto
+- Network: SSRF (user URL → server fetch), open CORS, missing CSRF on state-changing requests, plaintext over public network
+- File: path traversal, unrestricted upload type/size, zip slip
+- Deserialization: pickle, yaml.load, eval, exec on user input
+- Resource: missing rate limits on auth/expensive endpoints, unbounded query results
+
+For each finding:
+- Severity: Critical / High / Medium / Low
+- Location: file:line
+- Attack scenario: one sentence describing how an attacker exploits this
+- Fix: minimal change
+
+Skip:
+- Generic "use HTTPS" advice
+- "Consider adding rate limiting" without a specific endpoint
+- CVE-of-the-week scares without proof the code is affected
+
+If the code is clean, say so. Do not invent findings.`,
+  },
+  {
+    id: 'prompt-builder',
+    name: 'Prompt Builder',
+    description: 'Builds prompts for OpenCode, Claude Code, or BooCode dispatch.',
+    temperature: 0.4,
+    tools: [...DEFAULT_TOOLS],
+    model: null,
+    source: 'builtin',
+    system_prompt: `You write prompts that another coding agent will execute. Your output is the prompt, not the work.
+
+Process:
+1. Ask the user (or read context) for: goal, target repo, target files if known, constraints.
+2. list_dir and view_file the target area. Confirm files exist and are roughly the shape you think.
+3. Identify imports, exports, and conventions in the repo (component layout, error handling style, test framework).
+4. Write the prompt.
+
+Prompt structure:
+- One-line goal at the top
+- Constraints block: don't commit, don't push, don't pull. Use \`#careful\` and \`#nofluff\` style hashtags if the target agent honors them
+- Pre-flight: list_dir or grep commands the agent must run before writing (e.g. "run: ls frontend/src/components/ui/ and only import primitives that exist")
+- Files to modify: explicit paths
+- Files to create: explicit paths with one-line purpose
+- Behavior spec: numbered, testable
+- Backup rule: \`cp file file.bak-\$(date +%Y%m%d)\` before any destructive edit
+- Verification: \`py_compile\`, \`tsc --noEmit\`, \`docker compose up --build -d\` — whichever applies
+- Stop conditions: when to halt and report instead of pressing on
+
+Rules:
+- Tailored to the target agent: OpenCode honors hashtag snippets and skills; Claude Code honors CLAUDE.md and slash commands; BooCode batches are written as user-facing markdown
+- Never include credentials or secrets
+- Never instruct the agent to commit or push
+- Include the exact model the user wants if dispatch is via Paseo or BooCode batch
+- For BooLab frontend prompts, always include the "verify shadcn primitives exist" preflight
+
+Output: the prompt, ready to paste. Nothing else.`,
+  },
+];
+
+// ---- AGENTS.md parser ------------------------------------------------------
+
+interface ParsedFrontmatter {
+  temperature?: number;
+  tools?: string[];
+  description?: string;
+  model?: string;
+}
+
+function stripQuotes(s: string): string {
+  if (
+    s.length >= 2 &&
+    (s[0] === '"' || s[0] === "'") &&
+    s[0] === s[s.length - 1]
+  ) {
+    return s.slice(1, -1);
+  }
+  return s;
+}
+
+function parseFrontmatter(yaml: string): { data: ParsedFrontmatter; errors: string[] } {
+  const data: ParsedFrontmatter = {};
+  const errors: string[] = [];
+  const lines = yaml.split('\n');
+  let arrayKey: 'tools' | null = null;
+
+  for (const rawLine of lines) {
+    const line = rawLine.trim();
+    if (line.length === 0) continue;
+
+    // Block-list continuation: "- value" under a key that was set to empty
+    if (arrayKey && line.startsWith('- ')) {
+      data[arrayKey]!.push(line.slice(2).trim());
+      continue;
+    }
+    arrayKey = null;
+
+    const colonIdx = line.indexOf(':');
+    if (colonIdx < 0) continue;
+    const key = line.slice(0, colonIdx).trim();
+    const valueRaw = line.slice(colonIdx + 1).trim();
+
+    if (key === 'temperature') {
+      const n = Number(valueRaw);
+      if (Number.isFinite(n)) data.temperature = n;
+      else errors.push(`temperature must be a number (got "${valueRaw}")`);
+    } else if (key === 'tools') {
+      if (valueRaw === '') {
+        data.tools = [];
+        arrayKey = 'tools';
+      } else if (valueRaw.startsWith('[') && valueRaw.endsWith(']')) {
+        const inner = valueRaw.slice(1, -1);
+        data.tools = inner
+          .split(',')
+          .map((s) => stripQuotes(s.trim()))
+          .filter((s) => s.length > 0);
+      } else {
+        // Loose form: "tools: a, b, c"
+        data.tools = valueRaw
+          .split(',')
+          .map((s) => stripQuotes(s.trim()))
+          .filter((s) => s.length > 0);
+      }
+    } else if (key === 'description') {
+      data.description = stripQuotes(valueRaw);
+    } else if (key === 'model') {
+      data.model = stripQuotes(valueRaw);
+    }
+    // Unknown keys silently ignored — forward-compat.
+  }
+
+  return { data, errors };
+}
+
+interface ParseResult {
+  agents: Agent[];
+  error: string | null;
+}
+
+export function parseAgentsMd(content: string): ParseResult {
+  const errors: string[] = [];
+  const agents: Agent[] = [];
+
+  // Split into per-agent sections by lines that exactly match "## <name>".
+  // Lines starting with "### " (level-3 headings) are not section boundaries.
+  const sections: { name: string; body: string }[] = [];
+  let currentName: string | null = null;
+  let currentLines: string[] = [];
+
+  for (const line of content.split('\n')) {
+    const h2 = /^##\s+(.+?)\s*$/.exec(line);
+    const h3 = line.startsWith('### ');
+    if (h2 && !h3) {
+      if (currentName !== null) {
+        sections.push({ name: currentName, body: currentLines.join('\n') });
+      }
+      currentName = h2[1]!.trim();
+      currentLines = [];
+      continue;
+    }
+    if (currentName !== null) {
+      currentLines.push(line);
+    }
+  }
+  if (currentName !== null) {
+    sections.push({ name: currentName, body: currentLines.join('\n') });
+  }
+
+  for (const section of sections) {
+    const lines = section.body.split('\n');
+    // Opening "---" fence must be the first non-empty line (blank lines allowed).
+    let openIdx = -1;
+    for (let i = 0; i < lines.length; i++) {
+      const t = lines[i]!.trim();
+      if (t === '') continue;
+      if (t === '---') {
+        openIdx = i;
+      }
+      break;
+    }
+    if (openIdx < 0) {
+      errors.push(`agent "${section.name}": missing opening --- fence after heading`);
+      continue;
+    }
+    let closeIdx = -1;
+    for (let i = openIdx + 1; i < lines.length; i++) {
+      if (lines[i]!.trim() === '---') {
+        closeIdx = i;
+        break;
+      }
+    }
+    if (closeIdx < 0) {
+      errors.push(`agent "${section.name}": missing closing --- fence`);
+      continue;
+    }
+    const yamlText = lines.slice(openIdx + 1, closeIdx).join('\n');
+    const systemPrompt = lines.slice(closeIdx + 1).join('\n').trim();
+
+    const { data: fm, errors: fmErrors } = parseFrontmatter(yamlText);
+    if (fmErrors.length > 0) {
+      errors.push(`agent "${section.name}": ${fmErrors.join('; ')}`);
+      continue;
+    }
+
+    const filteredTools = Array.isArray(fm.tools)
+      ? fm.tools.filter((t): t is string =>
+          (ALL_TOOL_NAMES as readonly string[]).includes(t)
+        )
+      : DEFAULT_TOOLS;
+
+    agents.push({
+      id: slugify(section.name),
+      name: section.name,
+      description: fm.description ?? '',
+      system_prompt: systemPrompt,
+      temperature: typeof fm.temperature === 'number' ? fm.temperature : DEFAULT_TEMPERATURE,
+      tools: filteredTools,
+      model: typeof fm.model === 'string' && fm.model.length > 0 ? fm.model : null,
+      source: 'file',
+    });
+  }
+
+  return { agents, error: errors.length > 0 ? errors.join('; ') : null };
+}
+
+// ---- mtime-keyed cache + public API ----------------------------------------
+
+interface CacheEntry {
+  mtimeMs: number;
+  result: AgentsResponse;
+}
+
+const cache = new Map<string, CacheEntry>();
+
+// Test/admin: force re-parse on next call for a project (or all projects).
+export function invalidateAgentsCache(projectPath?: string): void {
+  if (projectPath === undefined) {
+    cache.clear();
+  } else {
+    cache.delete(projectPath);
+  }
+}
+
+export async function getAgentsForProject(projectPath: string): Promise<AgentsResponse> {
+  const agentsPath = join(projectPath, 'AGENTS.md');
+  let mtimeMs: number;
+  try {
+    const s = await fs.stat(agentsPath);
+    mtimeMs = s.mtimeMs;
+  } catch {
+    // No AGENTS.md → builtins, no parse error
+    cache.delete(projectPath);
+    return { agents: BUILTIN_AGENTS, parse_error: null };
+  }
+
+  const cached = cache.get(projectPath);
+  if (cached && cached.mtimeMs === mtimeMs) {
+    return cached.result;
+  }
+
+  let content: string;
+  try {
+    content = await fs.readFile(agentsPath, 'utf8');
+  } catch {
+    cache.delete(projectPath);
+    return { agents: BUILTIN_AGENTS, parse_error: null };
+  }
+
+  const parsed = parseAgentsMd(content);
+  let result: AgentsResponse;
+  if (parsed.error) {
+    // Parse error: surface in API, fall back to builtins
+    result = { agents: BUILTIN_AGENTS, parse_error: parsed.error };
+  } else if (parsed.agents.length === 0) {
+    // Empty / no headings → builtins
+    result = { agents: BUILTIN_AGENTS, parse_error: null };
+  } else {
+    // At least one valid agent → file-defined agents win, builtins hidden
+    result = { agents: parsed.agents, parse_error: null };
+  }
+
+  cache.set(projectPath, { mtimeMs, result });
+  return result;
+}
+
+export async function getAgentById(
+  projectPath: string,
+  agentId: string
+): Promise<Agent | null> {
+  const { agents } = await getAgentsForProject(projectPath);
+  return agents.find((a) => a.id === agentId) ?? null;
+}
+
+export { BUILTIN_AGENTS };