v1.8.2: tool loop cap-hit summary + tool call UI compaction

Replace the hardcoded MAX_TOOL_LOOP_DEPTH=15 with a per-agent max_tool_calls (integer 1-100, set in AGENTS.md frontmatter). Defaults: 30 for agents whose tools are all read-only, 10 for agents that include any non-read-only tool, 15 for raw chat (no agent).

When the loop hits the cap, fire one final summary call with tools disabled, stream the wrap-up into the in-flight assistant message, then insert a system sentinel with metadata.kind='cap_hit'. The sentinel renders as an amber bubble with a Continue button (latest sentinel only) that POSTs to a new /api/chats/:id/continue route to extend the loop. Hard ceiling: 3 cap-hits per chat (2 continues max); the third sentinel reports can_continue=false.

Error frames carry a machine-readable reason code alongside the human-readable error text. Failed messages persist the reason via metadata.kind='error', so the bubble still renders specifics on reload (the WS error frame is one-shot).

Tool call UI rewired:
- ToolCallLine renders inline (↳ name args spinner/check/✗, expand-on-tap for args+result).
- ToolCallGroup collapses 3+ consecutive same-tool runs into a compact card.
- MessageList owns a three-pass pre-render (flatten + fold tool results onto matching runs by id + group same-tool runs + number sentinels).
- MessageBubble drops tool rendering and adds the sentinel / error-reason branches.
- ToolCallCard deleted.

Roadmap follow-up logged: post-ship, add an explicit max_tool_calls: 30 to the 6 agents in /data/AGENTS.md and /opt/boocode/AGENTS.md for discoverability (the defaults already produce identical behavior).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
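The MessageList pre-render lives outside this diff, so the following is only a minimal sketch of two of its passes as described above: folding tool results onto their runs by id, then collapsing 3+ consecutive same-tool runs into one group item. `ToolRun`, `ToolResult`, `RenderItem`, `foldResults`, and `groupToolRuns` are hypothetical names and shapes, not the app's actual types.

```typescript
// Hypothetical shapes; the real MessageList types are not part of this diff.
interface ToolRun {
  id: string; // tool_call id
  name: string;
  args: string;
  result?: string; // filled in by the fold pass
}

interface ToolResult {
  tool_call_id: string;
  output: string;
}

type RenderItem =
  | { kind: 'line'; run: ToolRun }
  | { kind: 'group'; name: string; runs: ToolRun[] };

// "3+ consecutive same-tool runs" collapse into one card.
const GROUP_THRESHOLD = 3;

// Fold pass: attach each result to the run whose id matches tool_call_id.
function foldResults(runs: ToolRun[], results: ToolResult[]): ToolRun[] {
  const byId = new Map(results.map((r) => [r.tool_call_id, r.output] as const));
  return runs.map((run) => (byId.has(run.id) ? { ...run, result: byId.get(run.id) } : run));
}

// Group pass: scan streaks of the same tool name; long streaks become one
// group item, short streaks stay as individual inline lines.
function groupToolRuns(runs: ToolRun[]): RenderItem[] {
  const out: RenderItem[] = [];
  let i = 0;
  while (i < runs.length) {
    let j = i + 1;
    while (j < runs.length && runs[j]!.name === runs[i]!.name) j++;
    const streak = runs.slice(i, j);
    if (streak.length >= GROUP_THRESHOLD) {
      out.push({ kind: 'group', name: runs[i]!.name, runs: streak });
    } else {
      for (const run of streak) out.push({ kind: 'line', run });
    }
    i = j;
  }
  return out;
}
```

Grouping only looks at adjacent runs, so `grep, grep, grep, view_file, grep` yields one grep group, one view_file line, and a separate trailing grep line.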
gitignore data/ for global AGENTS.md
2026-05-17 10:31:32 +00:00 · 2026-05-16 23:50:47 +00:00 · 2026-05-16 23:16:38 +00:00 · 2026-05-16 23:16:02 +00:00
25 changed files with 1478 additions and 506 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -6,3 +6,4 @@ dist
 .vite
 coverage
 secrets/
+data/
--- a/apps/server/src/routes/chats.ts
+++ b/apps/server/src/routes/chats.ts
@@ -231,7 +231,7 @@ export function registerChatRoutes(
          INSERT INTO messages (
            session_id, chat_id, role, content, kind, tool_calls, tool_results,
            status, tokens_used, ctx_used, ctx_max, started_at, finished_at,
-            created_at
+            created_at, metadata
          )
          SELECT
            ${source.session_id}, ${chat!.id}, role, content, kind,
@@ -239,7 +239,8 @@ export function registerChatRoutes(
            tokens_used, ctx_used, ctx_max, started_at, finished_at,
            clock_timestamp() + (
              ROW_NUMBER() OVER (ORDER BY created_at ASC, id ASC) * INTERVAL '1 microsecond'
-            )
+            ),
+            metadata
          FROM messages
          WHERE chat_id = ${source.id}
            AND created_at <= ${target.created_at}::timestamptz
@@ -268,7 +269,7 @@ export function registerChatRoutes(
      }
      const rows = await sql<Message[]>`
        SELECT id, session_id, chat_id, role, content, kind, tool_calls, tool_results, status, last_seq,
-               tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at
+               tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at, metadata
        FROM messages
        WHERE chat_id = ${req.params.id}
        ORDER BY created_at ASC, id ASC
--- a/apps/server/src/routes/messages.ts
+++ b/apps/server/src/routes/messages.ts
@@ -7,6 +7,13 @@ const SendBody = z.object({
  content: z.string().min(1).max(64_000),
 });

+// v1.8.2: Continue extends an inference loop that hit the tool budget. Caller
+// passes the sentinel message it's continuing from; server validates shape
+// and the per-chat hard ceiling before resuming.
+const ContinueBody = z.object({
+  sentinel_message_id: z.string().uuid(),
+});
+
 interface MessageHandlers {
  enqueueInference: (sessionId: string, chatId: string, assistantMessageId: string, user: string) => void;
  enqueueCompact: (sessionId: string, chatId: string, compactMessageId: string, user: string) => void;
@@ -36,7 +43,7 @@ export function registerMessageRoutes(
      }
      const rows = await sql<Message[]>`
        SELECT id, session_id, chat_id, role, content, kind, tool_calls, tool_results, status, last_seq,
-               tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at
+               tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at, metadata
        FROM messages
        WHERE session_id = ${req.params.id}
        ORDER BY created_at ASC, id ASC
@@ -253,6 +260,76 @@ export function registerMessageRoutes(
    }
  );

+  app.post<{ Params: { id: string } }>(
+    '/api/chats/:id/continue',
+    async (req, reply) => {
+      const parsed = ContinueBody.safeParse(req.body);
+      if (!parsed.success) {
+        reply.code(400);
+        return { error: 'invalid body', details: parsed.error.flatten() };
+      }
+
+      const chatRows = await sql<Chat[]>`
+        SELECT id, session_id FROM chats WHERE id = ${req.params.id} AND status = 'open'
+      `;
+      if (chatRows.length === 0) {
+        reply.code(404);
+        return { error: 'chat not found' };
+      }
+      const chat = chatRows[0]!;
+      const sessionId = chat.session_id;
+
+      // Cap-hit sentinels are only ever inserted after a turn completes, so
+      // there must not be an active inference at this moment. If there is,
+      // the client is racing the cap-hit summary that just emitted the
+      // sentinel — bail rather than enqueue a parallel run.
+      if (handlers.hasActiveInference(chat.id)) {
+        reply.code(409);
+        return { error: 'chat is currently streaming' };
+      }
+
+      const sentinel = await sql<{ metadata: { kind?: unknown; can_continue?: unknown } | null }[]>`
+        SELECT metadata
+        FROM messages
+        WHERE id = ${parsed.data.sentinel_message_id}
+          AND chat_id = ${chat.id}
+          AND role = 'system'
+      `;
+      if (sentinel.length === 0) {
+        reply.code(404);
+        return { error: 'sentinel not found' };
+      }
+      const meta = sentinel[0]!.metadata;
+      if (!meta || meta.kind !== 'cap_hit') {
+        reply.code(400);
+        return { error: 'message is not a cap-hit sentinel' };
+      }
+      // Server-side hard ceiling check. UI already disables the button when
+      // can_continue is false; defending against a stale tab or a direct
+      // API hit is the only reason this lives on the server too.
+      if (meta.can_continue !== true) {
+        reply.code(409);
+        return { error: 'hard limit reached for this chat' };
+      }
+
+      const result = await sql.begin(async (tx) => {
+        const [assistantMsg] = await tx<{ id: string }[]>`
+          INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
+          VALUES (${sessionId}, ${chat.id}, 'assistant', '', 'streaming', clock_timestamp())
+          RETURNING id
+        `;
+        await tx`UPDATE sessions SET updated_at = clock_timestamp() WHERE id = ${sessionId}`;
+        await tx`UPDATE chats SET updated_at = clock_timestamp() WHERE id = ${chat.id}`;
+        return { assistant_message_id: assistantMsg!.id };
+      });
+
+      handlers.enqueueInference(sessionId, chat.id, result.assistant_message_id, 'default');
+
+      reply.code(202);
+      return result;
+    }
+  );
+
  app.post<{ Params: { id: string } }>(
    '/api/chats/:id/force_send',
    async (req, reply) => {
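The continue route accepts any message id that is a cap-hit sentinel with can_continue true; as written it does not check that the sentinel is the chat's most recent one, so the "latest sentinel only" rule from the commit message is enforced by the UI. A minimal client-side sketch of that rule plus the request body the route's ContinueBody schema expects; `ChatMessage`, `continueTarget`, and `continueRequest` are hypothetical helpers, not code from this commit.

```typescript
// Simplified, hypothetical message shape: only the fields this check reads.
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant' | 'tool' | 'system';
  metadata: { kind?: string; can_continue?: boolean } | null;
}

// Returns the sentinel id a Continue click should POST, or null when no
// button should be shown. Only the newest cap-hit sentinel is eligible,
// and only while it still reports can_continue === true.
function continueTarget(messages: ChatMessage[]): string | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i]!;
    const meta = m.metadata;
    if (m.role === 'system' && meta?.kind === 'cap_hit') {
      return meta?.can_continue === true ? m.id : null;
    }
  }
  return null;
}

// URL and fetch init matching POST /api/chats/:id/continue.
function continueRequest(chatId: string, sentinelId: string) {
  return {
    url: `/api/chats/${encodeURIComponent(chatId)}/continue`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ sentinel_message_id: sentinelId }),
    },
  };
}
```

Passing the result to `fetch(req.url, req.init)` should yield 202 with `{ assistant_message_id }` on success, 409 while the chat is still streaming or once the hard limit is reached, and 400/404 for a malformed body or a non-sentinel id.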
--- a/apps/server/src/routes/ws.ts
+++ b/apps/server/src/routes/ws.ts
@@ -23,7 +23,7 @@ export function registerWebSocket(

      const messages = await sql<Message[]>`
        SELECT id, session_id, chat_id, role, content, kind, tool_calls, tool_results, status, last_seq,
-               tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at
+               tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at, metadata
        FROM messages
        WHERE session_id = ${sessionId}
        ORDER BY created_at ASC, id ASC
--- a/apps/server/src/schema.sql
+++ b/apps/server/src/schema.sql
@@ -158,3 +158,10 @@ END $$;
 -- the DB; they live in builtins (services/agents.ts) and a per-project AGENTS.md.
 -- agent_id is the slugified agent name. NULL means "use BooCode defaults".
 ALTER TABLE sessions ADD COLUMN IF NOT EXISTS agent_id TEXT;
+
+-- v1.8.2: per-message metadata for sentinels (cap-hit) and structured error
+-- reasons. JSONB so future kinds can extend without further schema churn.
+-- Shape for cap_hit:  { kind: 'cap_hit', used: number, limit: number,
+--                       agent_name: string|null, can_continue: boolean }
+-- Shape for errors:   { kind: 'error', error_reason: 'llm_provider_error'|..., error_text: string }
+ALTER TABLE messages ADD COLUMN IF NOT EXISTS metadata JSONB;
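The cap_hit shape documented above, combined with the hard ceiling from the commit message (3 cap-hits per chat, so at most 2 continues), can be sketched as a pure builder. The diff does not show where the server computes can_continue; `buildCapHitMetadata`, `HARD_CAP_HITS_PER_CHAT`, and the assumption that the caller passes the number of earlier cap-hit sentinels in the chat are all hypothetical.

```typescript
// Mirrors the cap_hit metadata shape documented in schema.sql.
interface CapHitMetadata {
  kind: 'cap_hit';
  used: number;
  limit: number;
  agent_name: string | null;
  can_continue: boolean;
}

// Hypothetical constant for "3 cap-hits per chat (2 continues max)".
const HARD_CAP_HITS_PER_CHAT = 3;

// priorCapHits: cap-hit sentinels already present in this chat.
function buildCapHitMetadata(
  priorCapHits: number,
  used: number,
  limit: number,
  agentName: string | null,
): CapHitMetadata {
  const thisHit = priorCapHits + 1; // 1-based index of the sentinel being inserted
  return {
    kind: 'cap_hit',
    used,
    limit,
    agent_name: agentName,
    // 1st and 2nd sentinels allow Continue; the 3rd reports false.
    can_continue: thisHit < HARD_CAP_HITS_PER_CHAT,
  };
}
```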
--- a/apps/server/src/services/tests/inference.test.ts
+++ b/apps/server/src/services/tests/inference.test.ts
@@ -21,6 +21,7 @@ function makeSession(overrides: Partial<Session> = {}): Session {
    status: 'open',
    created_at: new Date(0).toISOString(),
    updated_at: new Date(0).toISOString(),
+    agent_id: null,
    ...overrides,
  };
 }
@@ -62,6 +63,7 @@ function makeMessage(
    started_at: null,
    finished_at: null,
    created_at: new Date(counter * 1000).toISOString(),
+    metadata: null,
    ...overrides,
  };
 }
--- a/apps/server/src/services/agents.ts
+++ b/apps/server/src/services/agents.ts
@@ -1,9 +1,17 @@
 import { promises as fs } from 'node:fs';
 import { join } from 'node:path';
-import type { Agent, AgentsResponse } from '../types/api.js';
+import type { Agent, AgentsResponse, AgentParseError } from '../types/api.js';
+
+// v1.8.1: global agents live at /data/AGENTS.md inside the container
+// (./data:/data:ro mount on the host). Per-project AGENTS.md at the project
+// root overrides global by name. In-code builtins are gone — the seed file is
+// the contents of the previous BUILTIN_AGENTS list, copied into /data/AGENTS.md
+// once on first deploy.
+const GLOBAL_AGENTS_PATH = '/data/AGENTS.md';
+const CACHE_TTL_MS = 60_000;

 // Tools whitelist universe matches services/tools.ts ALL_TOOLS. Keep in sync.
-const ALL_TOOL_NAMES = ['view_file', 'list_dir', 'grep', 'find_files'] as const;
+const ALL_TOOL_NAMES = ['view_file', 'list_dir', 'grep', 'find_files', 'git_status'] as const;
 const DEFAULT_TOOLS: string[] = [...ALL_TOOL_NAMES];
 const DEFAULT_TEMPERATURE = 0.7;

@@ -14,214 +22,6 @@ export function slugify(name: string): string {
    .replace(/^-+|-+$/g, '');
 }

-// Six builtin defaults. model is intentionally null — session.model wins.
-// Match AGENTS.md format; system prompts are verbatim.
-const BUILTIN_AGENTS: Agent[] = [
-  {
-    id: 'code-reviewer',
-    name: 'Code Reviewer',
-    description: 'Reviews code for bugs, security issues, and maintainability. Read-only.',
-    temperature: 0.3,
-    tools: [...DEFAULT_TOOLS],
-    model: null,
-    source: 'builtin',
-    system_prompt: `You review code. Find real problems, not style nits.
-
-Process:
-1. Read the file(s) in question with view_file. If a diff is provided, read surrounding context too.
-2. Use grep/find_files to check how changed symbols are used elsewhere.
-3. Cite every finding as file:line.
-
-Prioritize in order:
-1. Bugs and logic errors
-2. Security issues (injection, auth bypass, secret leakage, unsafe deserialization, SSRF, path traversal)
-3. Race conditions, error handling, resource leaks
-4. Performance issues with measurable impact
-5. Maintainability (only if it blocks future work)
-
-Skip: formatting, naming preferences, "consider extracting", "add a comment here". The user has a linter.
-
-Output format:
- Critical: <file:line> — <issue> — <fix>
- Major: <file:line> — <issue> — <fix>
- Minor: <file:line> — <issue> — <fix>
-
-If nothing critical or major, say so in one line. Do not pad.`,
-  },
-  {
-    id: 'debugger',
-    name: 'Debugger',
-    description: 'Diagnoses bugs from error messages, logs, or described symptoms.',
-    temperature: 0.2,
-    tools: [...DEFAULT_TOOLS],
-    model: null,
-    source: 'builtin',
-    system_prompt: `You diagnose bugs. Form a hypothesis, prove it with evidence from the code.
-
-Process:
-1. Restate the symptom in one line. Confirm you understand it.
-2. Read the error/stacktrace. Identify the exact frame where things go wrong.
-3. view_file on that frame. Read 50 lines around it.
-4. grep for callers, related state, recent changes that could explain it.
-5. State the root cause with file:line evidence.
-6. Propose the minimal fix. Note any side effects.
-
-Rules:
- Never guess. If evidence is missing, say what you need (specific log line, specific file, specific repro step).
- Distinguish symptom from cause. A null check fixes the symptom; missing init causes it.
- Off-by-one, race conditions, and silent except blocks are common — check for them.
- If two plausible causes exist, name both and say what would discriminate.
-
-Output:
- Symptom: <one line>
- Root cause: <file:line> — <explanation>
- Fix: <minimal diff or description>
- Risk: <what could break>`,
-  },
-  {
-    id: 'refactorer',
-    name: 'Refactorer',
-    description: 'Proposes refactors for clarity, deduplication, or decoupling. Read-only — outputs plans, not edits.',
-    temperature: 0.3,
-    tools: [...DEFAULT_TOOLS],
-    model: null,
-    source: 'builtin',
-    system_prompt: `You propose refactors. You do not apply them. The user applies via OpenCode or Claude Code.
-
-Process:
-1. Read the target file(s).
-2. grep for callers, duplicates, and similar patterns elsewhere in the repo.
-3. Identify the smallest refactor that delivers the goal.
-
-Prioritize:
-1. Deduplication where 3+ sites have near-identical logic
-2. Extracting a function/module when one is doing two unrelated jobs
-3. Decoupling when a change in A forces a change in B unnecessarily
-4. Renaming when a name actively misleads
-
-Reject:
- Refactors that touch 10+ files for marginal gain
- "Modernization" with no concrete benefit
- Abstraction for future flexibility that may never come
- Style-only changes
-
-Output:
- Goal: <one line>
- Scope: <files affected, count of lines roughly>
- Plan: numbered steps, each one self-contained
- Risk: <what tests must pass, what could regress>
- Skip if: <conditions under which this refactor is not worth doing>`,
-  },
-  {
-    id: 'architect',
-    name: 'Architect',
-    description: 'Designs new features, modules, or architectural changes. Outputs a build plan.',
-    temperature: 0.5,
-    tools: [...DEFAULT_TOOLS],
-    model: null,
-    source: 'builtin',
-    system_prompt: `You design. You produce build plans, not code.
-
-Process:
-1. Restate the goal in your own words. Confirm constraints (perf, deploy, deps).
-2. list_dir the relevant areas. Read existing patterns — match them unless there's a reason not to.
-3. Decide: extend existing code or add new module. Justify.
-4. Sketch the data flow: inputs → transforms → outputs → side effects.
-5. Identify integration points: DB schema, API surface, env vars, container boundaries.
-6. List failure modes and how the design handles them.
-
-Rules:
- Reuse before inventing. If a service/lib in the repo already does this, say so.
- Prefer boring tech. New deps require justification.
- Tailscale IPs for internal routing. No 0.0.0.0 binds.
- Least privilege: separate read/write paths, explicit auth gates.
- State assumptions inline. Do not ask clarifying questions mid-design unless blocked.
-
-Output:
- Goal
- Existing code to reuse: <file paths>
- New code: <file paths, one-line purpose each>
- Data model changes: <SQL or schema diff>
- API surface: <endpoints, request/response shapes>
- Failure modes: <list>
- Build order: numbered, each step 30-90 min`,
-  },
-  {
-    id: 'security-auditor',
-    name: 'Security Auditor',
-    description: 'Audits code for security vulnerabilities. Read-only.',
-    temperature: 0.2,
-    tools: [...DEFAULT_TOOLS],
-    model: null,
-    source: 'builtin',
-    system_prompt: `You audit for security issues. Concrete findings only, no generic warnings.
-
-Process:
-1. Identify the trust boundary: where does untrusted input enter? Where does it leave?
-2. Trace input flow with grep. Mark every transformation.
-3. Check each finding against a real attack scenario.
-
-Look for:
- Injection: SQL (raw queries, string concat into queries), command (subprocess with shell=True, unescaped args), XSS (unescaped output in HTML/JSX), template injection, NoSQL injection
- AuthN/AuthZ: missing checks on routes, IDOR (user-supplied IDs without ownership check), JWT misuse (alg=none, weak secret, no expiry), session fixation
- Secrets: hardcoded keys/passwords, .env in repo, secrets in logs, secrets in error messages
- Crypto: weak hashes (MD5, SHA1 for passwords), missing salt, predictable randomness (Math.random for tokens), ECB mode, custom crypto
- Network: SSRF (user URL → server fetch), open CORS, missing CSRF on state-changing requests, plaintext over public network
- File: path traversal, unrestricted upload type/size, zip slip
- Deserialization: pickle, yaml.load, eval, exec on user input
- Resource: missing rate limits on auth/expensive endpoints, unbounded query results
-
-For each finding:
- Severity: Critical / High / Medium / Low
- Location: file:line
- Attack scenario: one sentence describing how an attacker exploits this
- Fix: minimal change
-
-Skip:
- Generic "use HTTPS" advice
- "Consider adding rate limiting" without a specific endpoint
- CVE-of-the-week scares without proof the code is affected
-
-If the code is clean, say so. Do not invent findings.`,
-  },
-  {
-    id: 'prompt-builder',
-    name: 'Prompt Builder',
-    description: 'Builds prompts for OpenCode, Claude Code, or BooCode dispatch.',
-    temperature: 0.4,
-    tools: [...DEFAULT_TOOLS],
-    model: null,
-    source: 'builtin',
-    system_prompt: `You write prompts that another coding agent will execute. Your output is the prompt, not the work.
-
-Process:
-1. Ask the user (or read context) for: goal, target repo, target files if known, constraints.
-2. list_dir and view_file the target area. Confirm files exist and are roughly the shape you think.
-3. Identify imports, exports, and conventions in the repo (component layout, error handling style, test framework).
-4. Write the prompt.
-
-Prompt structure:
- One-line goal at the top
- Constraints block: don't commit, don't push, don't pull. Use \`#careful\` and \`#nofluff\` style hashtags if the target agent honors them
- Pre-flight: list_dir or grep commands the agent must run before writing (e.g. "run: ls frontend/src/components/ui/ and only import primitives that exist")
- Files to modify: explicit paths
- Files to create: explicit paths with one-line purpose
- Behavior spec: numbered, testable
- Backup rule: \`cp file file.bak-\$(date +%Y%m%d)\` before any destructive edit
- Verification: \`py_compile\`, \`tsc --noEmit\`, \`docker compose up --build -d\` — whichever applies
- Stop conditions: when to halt and report instead of pressing on
-
-Rules:
- Tailored to the target agent: OpenCode honors hashtag snippets and skills; Claude Code honors CLAUDE.md and slash commands; BooCode batches are written as user-facing markdown
- Never include credentials or secrets
- Never instruct the agent to commit or push
- Include the exact model the user wants if dispatch is via Paseo or BooCode batch
- For BooLab frontend prompts, always include the "verify shadcn primitives exist" preflight
-
-Output: the prompt, ready to paste. Nothing else.`,
-  },
-];
-
 // ---- AGENTS.md parser ------------------------------------------------------

 interface ParsedFrontmatter {
@@ -229,6 +29,9 @@ interface ParsedFrontmatter {
  tools?: string[];
  description?: string;
  model?: string;
+  // v1.8.2: optional per-agent tool-loop budget. Absent → inference resolves
+  // from the agent's toolset at runtime.
+  max_tool_calls?: number;
 }

 function stripQuotes(s: string): string {
@@ -289,6 +92,21 @@ function parseFrontmatter(yaml: string): { data: ParsedFrontmatter; errors: stri
      data.description = stripQuotes(valueRaw);
    } else if (key === 'model') {
      data.model = stripQuotes(valueRaw);
+    } else if (key === 'max_tool_calls') {
+      // v1.8.2: 1..100 inclusive integer. Out-of-range values are skipped
+      // with a warning rather than throwing — agents shouldn't be unusable
+      // because of a typo on a defaulted field. Non-numeric or non-integer
+      // still hard-fails the block, matching `temperature` behavior.
+      const n = Number(valueRaw);
+      if (Number.isInteger(n) && n >= 1 && n <= 100) {
+        data.max_tool_calls = n;
+      } else if (Number.isInteger(n)) {
+        console.warn(
+          `agents: max_tool_calls ${n} out of range 1-100, ignoring (falling back to default)`,
+        );
+      } else {
+        errors.push(`max_tool_calls must be an integer 1-100 (got "${valueRaw}")`);
+      }
    }
    // Unknown keys silently ignored — forward-compat.
  }
@@ -296,18 +114,14 @@ function parseFrontmatter(yaml: string): { data: ParsedFrontmatter; errors: stri
  return { data, errors };
 }

-interface ParseResult {
-  agents: Agent[];
-  error: string | null;
+interface RawSection {
+  name: string;
+  body: string;
 }

-export function parseAgentsMd(content: string): ParseResult {
-  const errors: string[] = [];
-  const agents: Agent[] = [];
-
-  // Split into per-agent sections by lines that exactly match "## <name>".
-  // Lines starting with "### " (level-3 headings) are not section boundaries.
-  const sections: { name: string; body: string }[] = [];
+function splitSections(content: string): RawSection[] {
+  // Split by lines matching exactly "## <name>". Level-3+ headings are body content.
+  const sections: RawSection[] = [];
  let currentName: string | null = null;
  let currentLines: string[] = [];

@@ -329,10 +143,14 @@ export function parseAgentsMd(content: string): ParseResult {
  if (currentName !== null) {
    sections.push({ name: currentName, body: currentLines.join('\n') });
  }
+  return sections;
+}

-  for (const section of sections) {
+// Throws on malformed section — caller handles per-block error collection.
+function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
  const lines = section.body.split('\n');
-    // Opening "---" fence must be the first non-empty line (blank lines allowed).
+
+  // Opening "---" fence must be the first non-empty line.
  let openIdx = -1;
  for (let i = 0; i < lines.length; i++) {
    const t = lines[i]!.trim();
@@ -343,8 +161,7 @@ export function parseAgentsMd(content: string): ParseResult {
    break;
  }
  if (openIdx < 0) {
-      errors.push(`agent "${section.name}": missing opening --- fence after heading`);
-      continue;
+    throw new Error('missing opening --- fence after heading');
  }
  let closeIdx = -1;
  for (let i = openIdx + 1; i < lines.length; i++) {
@@ -354,25 +171,23 @@ export function parseAgentsMd(content: string): ParseResult {
    }
  }
  if (closeIdx < 0) {
-      errors.push(`agent "${section.name}": missing closing --- fence`);
-      continue;
+    throw new Error('missing closing --- fence');
  }
  const yamlText = lines.slice(openIdx + 1, closeIdx).join('\n');
  const systemPrompt = lines.slice(closeIdx + 1).join('\n').trim();

  const { data: fm, errors: fmErrors } = parseFrontmatter(yamlText);
  if (fmErrors.length > 0) {
-      errors.push(`agent "${section.name}": ${fmErrors.join('; ')}`);
-      continue;
+    throw new Error(fmErrors.join('; '));
  }

  const filteredTools = Array.isArray(fm.tools)
    ? fm.tools.filter((t): t is string =>
-          (ALL_TOOL_NAMES as readonly string[]).includes(t)
+        (ALL_TOOL_NAMES as readonly string[]).includes(t),
      )
    : DEFAULT_TOOLS;

-    agents.push({
+  return {
    id: slugify(section.name),
    name: section.name,
    description: fm.description ?? '',
@@ -380,23 +195,50 @@ export function parseAgentsMd(content: string): ParseResult {
    temperature: typeof fm.temperature === 'number' ? fm.temperature : DEFAULT_TEMPERATURE,
    tools: filteredTools,
    model: typeof fm.model === 'string' && fm.model.length > 0 ? fm.model : null,
-      source: 'file',
-    });
+    max_tool_calls: typeof fm.max_tool_calls === 'number' ? fm.max_tool_calls : null,
+  };
+}
+
+interface ParseResult {
+  agents: Omit<Agent, 'source'>[];
+  errors: AgentParseError[];
+}
+
+// v1.8.1: parse each `## Name` block independently. A failure in one block
+// does not abort the rest of the file — we collect a per-agent error and
+// keep parsing. Server logs a console.warn for each skipped agent.
+export function parseAgentsMd(content: string): ParseResult {
+  const sections = splitSections(content);
+  const agents: Omit<Agent, 'source'>[] = [];
+  const errors: AgentParseError[] = [];
+
+  for (const section of sections) {
+    try {
+      agents.push(parseAgentSection(section));
+    } catch (err) {
+      const reason = err instanceof Error ? err.message : String(err);
+      console.warn(`agents: skipped "${section.name}" — ${reason}`);
+      errors.push({ agent_name: section.name, reason });
+    }
  }

-  return { agents, error: errors.length > 0 ? errors.join('; ') : null };
+  return { agents, errors };
 }

 // ---- mtime-keyed cache + public API ----------------------------------------

 interface CacheEntry {
-  mtimeMs: number;
+  globalMtime: number | null;
+  projectMtime: number | null;
+  cachedAt: number;
  result: AgentsResponse;
 }

+// Keyed by projectPath ('' maps to '__none__' for the no-project case, e.g.
+// tests). Each entry also records both files' mtimes (global + project); editing
+// either bumps its mtime, so the next read misses without a watcher.
 const cache = new Map<string, CacheEntry>();

-// Test/admin: force re-parse on next call for a project (or all projects).
 export function invalidateAgentsCache(projectPath?: string): void {
  if (projectPath === undefined) {
    cache.clear();
@@ -405,54 +247,74 @@ export function invalidateAgentsCache(projectPath?: string): void {
  }
 }

-export async function getAgentsForProject(projectPath: string): Promise<AgentsResponse> {
-  const agentsPath = join(projectPath, 'AGENTS.md');
-  let mtimeMs: number;
+async function safeStat(path: string): Promise<number | null> {
  try {
-    const s = await fs.stat(agentsPath);
-    mtimeMs = s.mtimeMs;
+    const s = await fs.stat(path);
+    return s.mtimeMs;
  } catch {
-    // No AGENTS.md → builtins, no parse error
-    cache.delete(projectPath);
-    return { agents: BUILTIN_AGENTS, parse_error: null };
+    return null;
  }
+}

-  const cached = cache.get(projectPath);
-  if (cached && cached.mtimeMs === mtimeMs) {
+async function safeRead(path: string): Promise<string | null> {
+  try {
+    return await fs.readFile(path, 'utf8');
+  } catch {
+    return null;
+  }
+}
+
+export async function getAgentsForProject(projectPath: string): Promise<AgentsResponse> {
+  const projectAgentsPath = projectPath ? join(projectPath, 'AGENTS.md') : null;
+
+  const [globalMtime, projectMtime] = await Promise.all([
+    safeStat(GLOBAL_AGENTS_PATH),
+    projectAgentsPath ? safeStat(projectAgentsPath) : Promise.resolve(null),
+  ]);
+
+  const cacheKey = projectPath || '__none__';
+  const cached = cache.get(cacheKey);
+  const now = Date.now();
+  if (
+    cached &&
+    cached.globalMtime === globalMtime &&
+    cached.projectMtime === projectMtime &&
+    now - cached.cachedAt < CACHE_TTL_MS
+  ) {
    return cached.result;
  }

-  let content: string;
-  try {
-    content = await fs.readFile(agentsPath, 'utf8');
-  } catch {
-    cache.delete(projectPath);
-    return { agents: BUILTIN_AGENTS, parse_error: null };
+  const [globalContent, projectContent] = await Promise.all([
+    globalMtime !== null ? safeRead(GLOBAL_AGENTS_PATH) : Promise.resolve(null),
+    projectAgentsPath && projectMtime !== null ? safeRead(projectAgentsPath) : Promise.resolve(null),
+  ]);
+
+  const errors: AgentParseError[] = [];
+  const byName = new Map<string, Agent>();
+
+  if (globalContent !== null) {
+    const r = parseAgentsMd(globalContent);
+    for (const a of r.agents) byName.set(a.name, { ...a, source: 'global' });
+    errors.push(...r.errors);
+  }
+  if (projectContent !== null) {
+    const r = parseAgentsMd(projectContent);
+    for (const a of r.agents) byName.set(a.name, { ...a, source: 'project' });
+    errors.push(...r.errors);
  }

-  const parsed = parseAgentsMd(content);
-  let result: AgentsResponse;
-  if (parsed.error) {
-    // Parse error: surface in API, fall back to builtins
-    result = { agents: BUILTIN_AGENTS, parse_error: parsed.error };
-  } else if (parsed.agents.length === 0) {
-    // Empty / no headings → builtins
-    result = { agents: BUILTIN_AGENTS, parse_error: null };
-  } else {
-    // At least one valid agent → file-defined agents win, builtins hidden
-    result = { agents: parsed.agents, parse_error: null };
-  }
-
-  cache.set(projectPath, { mtimeMs, result });
+  const result: AgentsResponse = {
+    agents: Array.from(byName.values()),
+    errors,
+  };
+  cache.set(cacheKey, { globalMtime, projectMtime, cachedAt: now, result });
  return result;
 }

 export async function getAgentById(
  projectPath: string,
-  agentId: string
+  agentId: string,
 ): Promise<Agent | null> {
  const { agents } = await getAgentsForProject(projectPath);
  return agents.find((a) => a.id === agentId) ?? null;
 }
-
-export { BUILTIN_AGENTS };
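getAgentsForProject parses the global file first and the project file second, writing both into one Map keyed by agent name, so a project agent replaces a global agent with the same name while unrelated global agents stay visible. A stripped-down sketch of that precedence rule, using a hypothetical `MiniAgent` type and `mergeAgents` helper:

```typescript
// Hypothetical reduced agent shape; the real Agent type lives in types/api.ts.
interface MiniAgent {
  id: string;
  name: string;
  source: 'global' | 'project';
}

// Same precedence as getAgentsForProject: the later write to a name wins.
// Map.set on an existing key keeps that key's original insertion position,
// so an overridden agent stays in its global slot and project-only agents
// are appended at the end.
function mergeAgents(
  globalAgents: Omit<MiniAgent, 'source'>[],
  projectAgents: Omit<MiniAgent, 'source'>[],
): MiniAgent[] {
  const byName = new Map<string, MiniAgent>();
  for (const a of globalAgents) byName.set(a.name, { ...a, source: 'global' });
  for (const a of projectAgents) byName.set(a.name, { ...a, source: 'project' });
  return Array.from(byName.values());
}
```

Because the key is the display name rather than the slugified id, two headings that differ only in case or punctuation (for example "Code Reviewer" and "code reviewer") would both survive with the same id, and getAgentById would return whichever comes first.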
--- a/apps/server/src/services/inference.ts
+++ b/apps/server/src/services/inference.ts
@@ -1,8 +1,23 @@
 import type { FastifyBaseLogger } from 'fastify';
 import type { Sql } from '../db.js';
 import type { Config } from '../config.js';
-import type { Agent, Message, Project, Session, ToolCall, UserStreamFrame } from '../types/api.js';
-import { ALL_TOOLS, TOOLS_BY_NAME, toolJsonSchemas, type ToolJsonSchema } from './tools.js';
+import type {
+  Agent,
+  ErrorReason,
+  Message,
+  MessageMetadata,
+  Project,
+  Session,
+  ToolCall,
+  UserStreamFrame,
+} from '../types/api.js';
+import {
+  ALL_TOOLS,
+  READ_ONLY_TOOL_NAMES,
+  TOOLS_BY_NAME,
+  toolJsonSchemas,
+  type ToolJsonSchema,
+} from './tools.js';
 import { PathScopeError, resolveProjectRoot } from './path_guard.js';
 import { maybeAutoNameChat } from './auto_name.js';
 import { getAgentById } from './agents.js';
@@ -11,7 +26,39 @@ const BASE_SYSTEM_PROMPT = (projectPath: string) =>
  `You are BooCode Chat, a code investigation assistant. The user is working on a project located at ${projectPath}. Use the file-read tools (view_file, list_dir, grep, find_files) to investigate code when needed. Be concise. Cite file paths and line numbers when discussing code. Do not hallucinate file contents — read the file first. Tool results may be truncated; if so, narrow your query rather than guessing.`;

 const DB_FLUSH_INTERVAL_MS = 500;
-const MAX_TOOL_LOOP_DEPTH = 15;
+
+// v1.8.2: tool-call budget defaults. Resolved per-turn by resolveToolBudget.
+//   - Agent with explicit max_tool_calls: that value.
+//   - Agent with read-only-only tools:    BUDGET_READ_ONLY (30).
+//   - Agent with any non-read-only tool:  BUDGET_NON_READ_ONLY (10).
+//   - No agent (raw chat):                BUDGET_NO_AGENT (15).
+const BUDGET_READ_ONLY = 30;
+const BUDGET_NON_READ_ONLY = 10;
+const BUDGET_NO_AGENT = 15;
+
+const READ_ONLY_SET: ReadonlySet<string> = new Set(READ_ONLY_TOOL_NAMES);
+
+function resolveToolBudget(agent: Agent | null): number {
+  if (agent?.max_tool_calls != null) return agent.max_tool_calls;
+  if (!agent) return BUDGET_NO_AGENT;
+  const allReadOnly = agent.tools.every((t) => READ_ONLY_SET.has(t));
+  return allReadOnly ? BUDGET_READ_ONLY : BUDGET_NON_READ_ONLY;
+}
+
+// Synthetic system note appended to the cap-hit summary call. Verbatim from
+// the v1.8.2 spec — do not paraphrase: the model is more reliable when the
+// instruction is short, declarative, and identical across calls.
+const CAP_HIT_SUMMARY_NOTE = (limit: number) =>
+  `You've reached the tool budget (${limit} calls). Produce the best answer you can with what you have. Do not call more tools.`;
+
+function isCapHitSentinel(m: Message): boolean {
+  return (
+    m.role === 'system' &&
+    m.metadata !== null &&
+    typeof m.metadata === 'object' &&
+    (m.metadata as { kind?: unknown }).kind === 'cap_hit'
+  );
+}

 export interface InferenceFrame {
  type:
@@ -29,12 +76,22 @@ export interface InferenceFrame {
  chat_id?: string;
  tool_message_id?: string;
  tool_call_id?: string;
-  role?: 'assistant' | 'tool' | 'user';
+  // v1.8.2: 'system' added so cap-hit sentinel messages can announce themselves
+  // through the normal message_started → delta → message_complete sequence.
+  role?: 'assistant' | 'tool' | 'user' | 'system';
  content?: string;
  tool_call?: ToolCall;
  output?: unknown;
  truncated?: boolean;
  error?: string;
+  // v1.8.2: structured error reason. Set on `type: 'error'` so the UI can
+  // surface a specific message; `error` stays the human-readable text.
+  reason?: ErrorReason;
+  // v1.8.2: piggybacks on `message_complete` so static or terminally-resolved
+  // messages can carry their persisted metadata to the live stream without a
+  // refetch (sentinels carry { kind: 'cap_hit', ... }; failed messages carry
+  // { kind: 'error', ... }).
+  metadata?: MessageMetadata | null;
  tokens_used?: number | null;
  ctx_used?: number | null;
  ctx_max?: number | null;
@@ -135,6 +192,11 @@ export function buildMessagesPayload(
      out.push({ role: 'system', content: m.content });
      continue;
    }
+    // v1.8.2: cap-hit sentinels are UI-only — never send them to the LLM. The
+    // synthetic "you've reached the tool budget" note lives only inside the
+    // summary call's messages array and is never persisted, so on Continue
+    // the model resumes with a clean context.
+    if (isCapHitSentinel(m)) continue;
    if (m.role === 'assistant' && m.status === 'streaming') continue;
    if (m.role === 'assistant' && m.status === 'cancelled') continue;
    if (m.role === 'tool') {
@@ -193,7 +255,7 @@ async function loadContext(

  const history = await sql<Message[]>`
    SELECT id, session_id, chat_id, role, content, kind, tool_calls, tool_results, status, last_seq,
-           tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at
+           tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at, metadata
    FROM messages
    WHERE chat_id = ${chatId}
    ORDER BY created_at ASC, id ASC
@@ -379,7 +441,10 @@ interface TurnArgs {
  sessionId: string;
  chatId: string;
  assistantMessageId: string;
-  depth: number;
+  // v1.8.2: cumulative tool calls executed this run. Compared against the
+  // resolved budget at the top of each turn. Replaces the older `depth`
+  // counter (which counted iterations, not invocations).
+  toolsUsed: number;
  signal: AbortSignal | undefined;
 }

@@ -480,6 +545,24 @@ async function handleAbortOrError(
  const { sessionId, chatId, assistantMessageId } = args;
  const isAbort = err instanceof Error && err.name === 'AbortError';
  const finalStatus = isAbort ? 'cancelled' : 'failed';
+  const errMsg = err instanceof Error ? err.message : String(err);
+  // v1.8.2: persist a structured error metadata blob on genuine failures so
+  // the bubble can render the reason on reload without re-deriving from the
+  // (one-shot) WS error frame. User-initiated abort skips this — there's no
+  // "reason" to surface for a stop the user already explicitly chose.
+  const errorMetadata: MessageMetadata | null = isAbort
+    ? null
+    : { kind: 'error', error_reason: 'llm_provider_error', error_text: errMsg };
+  if (errorMetadata) {
+    await ctx.sql`
+      UPDATE messages
+      SET status = ${finalStatus},
+          content = ${accumulated},
+          finished_at = clock_timestamp(),
+          metadata = ${ctx.sql.json(errorMetadata as never)}
+      WHERE id = ${assistantMessageId}
+    `;
+  } else {
    await ctx.sql`
      UPDATE messages
      SET status = ${finalStatus},
@@ -487,6 +570,7 @@ async function handleAbortOrError(
          finished_at = clock_timestamp()
      WHERE id = ${assistantMessageId}
    `;
+  }
  const [failSessRow] = await ctx.sql<{ project_id: string; name: string; updated_at: string }[]>`
    UPDATE sessions SET updated_at = clock_timestamp()
    WHERE id = ${sessionId}
@@ -494,9 +578,10 @@ async function handleAbortOrError(
  `;
  ctx.publishUser({ type: 'session_updated', session_id: sessionId, project_id: failSessRow!.project_id, name: failSessRow!.name, updated_at: failSessRow!.updated_at });
  // v1.8 mobile-tabs: cancellation is a user-initiated stop, treat as idle;
-  // genuine errors flip the dot red.
-  ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: isAbort ? 'idle' : 'error', at: new Date().toISOString() });
+  // genuine errors flip the dot red. v1.8.2: error path also carries a
+  // machine-readable `reason` so the UI can render specifics inline.
  if (isAbort) {
+    ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'idle', at: new Date().toISOString() });
    ctx.publish(sessionId, {
      type: 'message_complete',
      message_id: assistantMessageId,
@@ -504,12 +589,19 @@ async function handleAbortOrError(
    });
    ctx.log.info({ sessionId, chatId, assistantMessageId }, 'inference cancelled');
  } else {
-    const errMsg = err instanceof Error ? err.message : String(err);
+    ctx.publishUser({
+      type: 'chat_status',
+      chat_id: chatId,
+      status: 'error',
+      at: new Date().toISOString(),
+      reason: 'llm_provider_error',
+    });
    ctx.publish(sessionId, {
      type: 'error',
      message_id: assistantMessageId,
      chat_id: chatId,
      error: errMsg,
+      reason: 'llm_provider_error',
    });
    ctx.log.error({ err, sessionId, assistantMessageId }, 'inference failed');
  }
@@ -523,7 +615,7 @@ async function executeToolPhase(
  session: Session,
  projectRoot: string
 ): Promise<void> {
-  const { sessionId, chatId, assistantMessageId, depth, signal } = args;
+  const { sessionId, chatId, assistantMessageId, toolsUsed, signal } = args;
  const { content, toolCalls, promptTokens, completionTokens, nCtx } = result;

  const [updated] = await ctx.sql<
@@ -607,7 +699,10 @@ async function executeToolPhase(
    sessionId,
    chatId,
    assistantMessageId: nextAssistant!.id,
-    depth: depth + 1,
+    // v1.8.2: charge this turn's actual tool invocations against the budget.
+    // One assistant message can emit several tool_calls, so we add the number
+    // of calls in this turn, not 1. The next turn's budget check sees the
+    // cumulative total.
+    toolsUsed: toolsUsed + result.toolCalls.length,
    signal,
  });
 }
@@ -671,25 +766,7 @@ async function runAssistantTurn(
  ctx: InferenceContext,
  args: TurnArgs,
 ): Promise<void> {
-  const { sessionId, chatId, assistantMessageId, depth } = args;
-
-  if (depth > MAX_TOOL_LOOP_DEPTH) {
-    await ctx.sql`
-      UPDATE messages
-      SET status = 'failed',
-          content = ${'tool loop depth exceeded'},
-          finished_at = clock_timestamp()
-      WHERE id = ${assistantMessageId}
-    `;
-    ctx.publish(sessionId, {
-      type: 'error',
-      message_id: assistantMessageId,
-      chat_id: chatId,
-      error: 'tool loop depth exceeded',
-    });
-    ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'error', at: new Date().toISOString() });
-    return;
-  }
+  const { sessionId, chatId } = args;

  const loaded = await loadContext(ctx.sql, sessionId, chatId);
  if (!loaded) {
@@ -704,6 +781,17 @@ async function runAssistantTurn(
  const agent = session.agent_id
    ? await getAgentById(project.path, session.agent_id)
    : null;
+
+  // v1.8.2: cap-hit replaces the older "tool loop depth exceeded" failure.
+  // When we've already burned the budget *before* this turn even runs, we
+  // skip straight to the summary flow — the in-flight assistant message slot
+  // gets reused for the wrap-up reply instead of being marked failed.
+  const budget = resolveToolBudget(agent);
+  if (args.toolsUsed >= budget) {
+    await runCapHitSummary(ctx, args, session, project, history, agent, budget);
+    return;
+  }
+
  const messages = buildMessagesPayload(session, project, history, agent);

  const state: StreamPhaseState = { accumulated: '', startedAt: null };
@@ -730,7 +818,264 @@ export async function runInference(
  assistantMessageId: string,
  signal?: AbortSignal
 ): Promise<void> {
-  return runAssistantTurn(ctx, { sessionId, chatId, assistantMessageId, depth: 0, signal });
+  // v1.8.2: every fresh inference (initial send, regenerate, force_send,
+  // continue) starts with a clean budget. The per-run budget therefore does
+  // not bound tool calls accumulated across Continue invocations; the hard
+  // ceiling (3 cap-hits per chat) does.
+  return runAssistantTurn(ctx, { sessionId, chatId, assistantMessageId, toolsUsed: 0, signal });
+}
+
+// v1.8.2: cap-hit summary flow. Called instead of erroring when the loop
+// hits its budget. Reuses the in-flight assistant message slot to stream a
+// short wrap-up reply with the synthetic note appended as the last system
+// message and tools disabled, then always inserts a cap_hit sentinel
+// afterward (regardless of summary outcome) so the UI can show a Continue
+// affordance.
+async function runCapHitSummary(
+  ctx: InferenceContext,
+  args: TurnArgs,
+  session: Session,
+  project: Project,
+  history: Message[],
+  agent: Agent | null,
+  budget: number,
+): Promise<void> {
+  const { sessionId, chatId, assistantMessageId, signal } = args;
+
+  const messages = buildMessagesPayload(session, project, history, agent);
+  messages.push({ role: 'system', content: CAP_HIT_SUMMARY_NOTE(budget) });
+
+  const startedRow = await ctx.sql<{ started_at: string }[]>`
+    UPDATE messages
+    SET started_at = clock_timestamp()
+    WHERE id = ${assistantMessageId}
+    RETURNING started_at
+  `;
+  const startedAt = startedRow[0]?.started_at ?? null;
+
+  ctx.publish(sessionId, {
+    type: 'message_started',
+    message_id: assistantMessageId,
+    chat_id: chatId,
+    role: 'assistant',
+  });
+
+  let accumulated = '';
+  let pendingFlushTimer: NodeJS.Timeout | null = null;
+  let flushPromise: Promise<unknown> = Promise.resolve();
+  const flushNow = () => {
+    if (pendingFlushTimer) {
+      clearTimeout(pendingFlushTimer);
+      pendingFlushTimer = null;
+    }
+    const snapshot = accumulated;
+    flushPromise = flushPromise.then(() =>
+      ctx.sql`UPDATE messages SET content = ${snapshot} WHERE id = ${assistantMessageId}`
+    );
+  };
+  const scheduleFlush = () => {
+    if (pendingFlushTimer) return;
+    pendingFlushTimer = setTimeout(() => {
+      pendingFlushTimer = null;
+      flushNow();
+    }, DB_FLUSH_INTERVAL_MS);
+  };
+
+  let summaryOk = false;
+  let summarySoftCancelled = false;
+  let summaryError: string | null = null;
+  let result: StreamResult | null = null;
+  try {
+    result = await streamCompletion(
+      ctx,
+      session.model,
+      messages,
+      { tools: null, temperature: agent?.temperature },
+      (delta) => {
+        accumulated += delta;
+        ctx.publish(sessionId, {
+          type: 'delta',
+          message_id: assistantMessageId,
+          chat_id: chatId,
+          content: delta,
+        });
+        scheduleFlush();
+      },
+      signal,
+    );
+    summaryOk = true;
+  } catch (err) {
+    if (err instanceof Error && err.name === 'AbortError') {
+      summarySoftCancelled = true;
+    } else {
+      summaryError = err instanceof Error ? err.message : String(err);
+    }
+  } finally {
+    if (pendingFlushTimer) {
+      clearTimeout(pendingFlushTimer);
+      pendingFlushTimer = null;
+    }
+    await flushPromise;
+  }
+
+  // Finalize the summary message based on the three outcomes. The sentinel
+  // is inserted regardless so the user always has the Continue affordance —
+  // even on a partial / failed summary the chat history shows where the
+  // budget was hit.
+  if (summaryOk && result) {
+    const [updated] = await ctx.sql<
+      { tokens_used: number | null; ctx_used: number | null; ctx_max: number | null; finished_at: string | null }[]
+    >`
+      UPDATE messages
+      SET content = ${result.content},
+          status = 'complete',
+          tokens_used = ${result.completionTokens},
+          ctx_used = ${result.promptTokens},
+          ctx_max = ${result.nCtx},
+          finished_at = clock_timestamp()
+      WHERE id = ${assistantMessageId}
+      RETURNING tokens_used, ctx_used, ctx_max, finished_at
+    `;
+    ctx.publish(sessionId, {
+      type: 'message_complete',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+      tokens_used: updated?.tokens_used ?? null,
+      ctx_used: updated?.ctx_used ?? null,
+      ctx_max: updated?.ctx_max ?? null,
+      started_at: startedAt,
+      finished_at: updated?.finished_at ?? null,
+      model: session.model,
+    });
+  } else if (summarySoftCancelled) {
+    await ctx.sql`
+      UPDATE messages
+      SET content = ${accumulated},
+          status = 'cancelled',
+          finished_at = clock_timestamp()
+      WHERE id = ${assistantMessageId}
+    `;
+    ctx.publish(sessionId, {
+      type: 'message_complete',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+    });
+  } else {
+    const errMeta: MessageMetadata = {
+      kind: 'error',
+      error_reason: 'summary_after_cap_failed',
+      error_text: summaryError ?? 'summary failed',
+    };
+    await ctx.sql`
+      UPDATE messages
+      SET content = ${accumulated},
+          status = 'failed',
+          finished_at = clock_timestamp(),
+          metadata = ${ctx.sql.json(errMeta as never)}
+      WHERE id = ${assistantMessageId}
+    `;
+    ctx.publish(sessionId, {
+      type: 'error',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+      error: summaryError ?? 'summary failed',
+      reason: 'summary_after_cap_failed',
+    });
+  }
+
+  // Bump the session's updated_at exactly once for this turn.
+  const [sessRow] = await ctx.sql<{ project_id: string; name: string; updated_at: string }[]>`
+    UPDATE sessions SET updated_at = clock_timestamp()
+    WHERE id = ${sessionId}
+    RETURNING project_id, name, updated_at
+  `;
+  ctx.publishUser({
+    type: 'session_updated',
+    session_id: sessionId,
+    project_id: sessRow!.project_id,
+    name: sessRow!.name,
+    updated_at: sessRow!.updated_at,
+  });
+
+  await insertCapHitSentinel(ctx, sessionId, chatId, agent, budget);
+
+  // Status frame fires last so the dot color reflects the terminal state.
+  // Success → idle, abort → idle (user-driven stop), error → error+reason.
+  if (summaryOk || summarySoftCancelled) {
+    ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'idle', at: new Date().toISOString() });
+  } else {
+    ctx.publishUser({
+      type: 'chat_status',
+      chat_id: chatId,
+      status: 'error',
+      at: new Date().toISOString(),
+      reason: 'summary_after_cap_failed',
+    });
+  }
+
+  ctx.log.info(
+    { sessionId, chatId, assistantMessageId, budget, summaryOk, summaryCancelled: summarySoftCancelled },
+    'inference cap-hit summary finished',
+  );
+}
+
+async function insertCapHitSentinel(
+  ctx: InferenceContext,
+  sessionId: string,
+  chatId: string,
+  agent: Agent | null,
+  budget: number,
+): Promise<void> {
+  // Hard ceiling: count prior cap_hit sentinels in this chat. After two
+  // continues (sentinel count of 2), the next sentinel reports can_continue
+  // false and the UI disables the Continue button.
+  const priorRows = await ctx.sql<{ count: number }[]>`
+    SELECT COUNT(*)::int AS count
+    FROM messages
+    WHERE chat_id = ${chatId}
+      AND role = 'system'
+      AND metadata->>'kind' = 'cap_hit'
+  `;
+  const priorCount = priorRows[0]?.count ?? 0;
+  const canContinue = priorCount < 2;
+  const metadata: MessageMetadata = {
+    kind: 'cap_hit',
+    used: budget,
+    limit: budget,
+    agent_name: agent?.name ?? null,
+    can_continue: canContinue,
+  };
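+  const content = canContinue
+    ? `Reached tool budget (${budget}/${budget}). Continue to extend.`
+    : `Reached tool budget (${budget}/${budget}). Hard limit reached; start a new chat.`;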
+
+  const [row] = await ctx.sql<{ id: string }[]>`
+    INSERT INTO messages (session_id, chat_id, role, content, status, created_at, metadata)
+    VALUES (${sessionId}, ${chatId}, 'system', ${content}, 'complete', clock_timestamp(), ${ctx.sql.json(metadata as never)})
+    RETURNING id
+  `;
+
+  // The sentinel content is static, but we still walk the standard frame
+  // sequence (started → delta → complete) so useSessionStream's reducer
+  // appends it via the same path it uses for streaming assistant messages.
+  // The delta carries the full text in one chunk.
+  ctx.publish(sessionId, {
+    type: 'message_started',
+    message_id: row!.id,
+    chat_id: chatId,
+    role: 'system',
+  });
+  ctx.publish(sessionId, {
+    type: 'delta',
+    message_id: row!.id,
+    chat_id: chatId,
+    content,
+  });
+  ctx.publish(sessionId, {
+    type: 'message_complete',
+    message_id: row!.id,
+    chat_id: chatId,
+    metadata,
+  });
 }

 const COMPACT_SYSTEM_PROMPT =
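The budget check in runAssistantTurn runs once at the top of each turn, against a cumulative counter that executeToolPhase advances by `result.toolCalls.length`. The sketch below models that accounting in isolation. It is not code from this repository: `simulateToolLoop` and the per-turn call counts are hypothetical, and it assumes every tool call a turn emits is executed before the next check and that a turn with no tool calls ends the loop normally.

```typescript
// Hypothetical model of the v1.8.2 accounting: the cumulative `toolsUsed`
// is compared against the budget before each turn, and the turn's tool-call
// count is added afterward.
interface LoopOutcome {
  turnsRun: number;  // turns that passed the budget check
  toolsUsed: number; // cumulative tool calls charged
  capHit: boolean;   // true if the cap-hit summary flow would fire
}

function simulateToolLoop(budget: number, callsPerTurn: number[]): LoopOutcome {
  let toolsUsed = 0;
  let turnsRun = 0;
  for (const calls of callsPerTurn) {
    // Mirrors `if (args.toolsUsed >= budget)` routing to runCapHitSummary.
    if (toolsUsed >= budget) {
      return { turnsRun, toolsUsed, capHit: true };
    }
    turnsRun += 1;
    // Assumed: a turn with no tool calls is the final answer.
    if (calls === 0) {
      return { turnsRun, toolsUsed, capHit: false };
    }
    toolsUsed += calls;
  }
  // Scripted turns ran out; the next check would decide.
  return { turnsRun, toolsUsed, capHit: toolsUsed >= budget };
}

// Budget 10 with 4 calls per turn: 0 -> 4 -> 8 -> 12, so the fourth check
// (12 >= 10) routes to the summary flow.
const outcome = simulateToolLoop(10, [4, 4, 4, 4, 4]);
console.log(outcome); // { turnsRun: 3, toolsUsed: 12, capHit: true }
```

Under these assumptions the counter can end above the budget (12 against 10) because the check is per turn rather than per call; the sentinel itself reports `used: budget` regardless.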
--- a/apps/server/src/services/tools.ts
+++ b/apps/server/src/services/tools.ts
@@ -308,6 +308,19 @@ export const ALL_TOOLS: ReadonlyArray<ToolDef<unknown>> = [
  gitStatus as ToolDef<unknown>,
 ];

+// v1.8.2: forward-compatible read-only whitelist. An agent whose `tools` is
+// fully contained in this set gets a generous default tool budget (30);
+// anything outside means the agent can mutate state and gets a tighter
+// default (10). Every tool in v1.8.2 happens to be read-only, so the
+// non-RO branch only takes effect once BooCoder lands write tools.
+export const READ_ONLY_TOOL_NAMES = [
+  'view_file',
+  'list_dir',
+  'grep',
+  'find_files',
+  'git_status',
+] as const;
+
 export const TOOLS_BY_NAME: Record<string, ToolDef<unknown>> = Object.fromEntries(
  ALL_TOOLS.map((t) => [t.name, t])
 );
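The default-budget rule described in the READ_ONLY_TOOL_NAMES comment can be exercised on its own. This sketch restates resolveToolBudget from services/inference.ts against a trimmed-down `AgentLike` type (a hypothetical stand-in for the real `Agent`) so it runs without the server's imports; `write_file` is likewise a hypothetical non-read-only tool name.

```typescript
// Hypothetical stand-in for the server's Agent type; only the fields
// resolveToolBudget reads are kept.
interface AgentLike {
  tools: string[];
  max_tool_calls: number | null;
}

const READ_ONLY_TOOL_NAMES = ['view_file', 'list_dir', 'grep', 'find_files', 'git_status'] as const;
const READ_ONLY_SET: ReadonlySet<string> = new Set(READ_ONLY_TOOL_NAMES);

const BUDGET_READ_ONLY = 30;
const BUDGET_NON_READ_ONLY = 10;
const BUDGET_NO_AGENT = 15;

function resolveToolBudget(agent: AgentLike | null): number {
  // An explicit frontmatter value always wins.
  if (agent?.max_tool_calls != null) return agent.max_tool_calls;
  // Raw chat: no agent selected.
  if (!agent) return BUDGET_NO_AGENT;
  // `every` is vacuously true for an empty toolset, so a tool-less agent
  // also gets the read-only default.
  const allReadOnly = agent.tools.every((t) => READ_ONLY_SET.has(t));
  return allReadOnly ? BUDGET_READ_ONLY : BUDGET_NON_READ_ONLY;
}

console.log(resolveToolBudget(null)); // 15
console.log(resolveToolBudget({ tools: ['grep', 'view_file'], max_tool_calls: null })); // 30
console.log(resolveToolBudget({ tools: ['grep', 'write_file'], max_tool_calls: null })); // 10 ('write_file' is hypothetical)
console.log(resolveToolBudget({ tools: ['write_file'], max_tool_calls: 50 })); // 50
```

Because membership is an exact string match, any name outside the whitelist (including a misspelled read-only tool) drops an agent to the tighter default of 10.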
--- a/apps/server/src/types/api.ts
+++ b/apps/server/src/types/api.ts
@@ -31,9 +31,10 @@ export interface Session {
  agent_id: string | null;
 }

-// Agent sources: 'builtin' = baked-in default (services/agents.ts),
-// 'file' = parsed from project's AGENTS.md.
-export type AgentSource = 'builtin' | 'file';
+// v1.8.1: agents come from two sources. 'global' = /data/AGENTS.md (always
+// loaded inside the container), 'project' = per-project override at
+// <root>/AGENTS.md. Project entries override global by name (case-sensitive).
+export type AgentSource = 'global' | 'project';

 export interface Agent {
  id: string;            // slug of name; stable handle stored in sessions.agent_id
@@ -44,11 +45,23 @@ export interface Agent {
  tools: string[];       // whitelist of tool names; empty = no tools allowed
  model: string | null;  // null means "session.model wins"
  source: AgentSource;
+  // v1.8.2: per-agent tool-loop budget. null means resolve at runtime from the
+  // agent's toolset (30 if all tools are read-only, 10 otherwise) or 15 for
+  // raw chat with no agent.
+  max_tool_calls: number | null;
+}
+
+// One entry per malformed `## Name` block. Per-block errors don't fail the
+// whole file — the loader returns parsed-successfully agents AND the list of
+// skipped ones so the UI can show a non-blocking warning chip.
+export interface AgentParseError {
+  agent_name: string;
+  reason: string;
 }

 export interface AgentsResponse {
  agents: Agent[];
-  parse_error: string | null;  // present (non-null) when AGENTS.md exists but failed to parse
+  errors: AgentParseError[];
 }

 // KEEP IN SYNC: apps/server/src/schema.sql chats_status_chk
@@ -91,6 +104,31 @@ export interface ToolResult {
  error?: string;
 }

+// v1.8.2: structured reason codes for failed inferences. `error` carries the
+// human text; `reason` is the machine-readable discriminator the UI matches
+// on (with `error` as fallback when reason is absent or unrecognized).
+export type ErrorReason =
+  | 'llm_provider_error'
+  | 'tool_execution_failed'
+  | 'summary_after_cap_failed';
+
+// v1.8.2: shapes stored in messages.metadata. Discriminated on `kind`.
+//   cap_hit  — system sentinel emitted when tool budget is exhausted
+//   error    — attached to a failed assistant message so UI can show reason
+export type MessageMetadata =
+  | {
+      kind: 'cap_hit';
+      used: number;
+      limit: number;
+      agent_name: string | null;
+      can_continue: boolean;
+    }
+  | {
+      kind: 'error';
+      error_reason: ErrorReason;
+      error_text: string;
+    };
+
 export interface Message {
  id: string;
  session_id: string;
@@ -108,6 +146,9 @@ export interface Message {
  started_at: string | null;
  finished_at: string | null;
  created_at: string;
+  // v1.8.2: per-message metadata. See MessageMetadata for the discriminated
+  // shapes currently in use.
+  metadata: MessageMetadata | null;
 }

 export interface ModelInfo {
@@ -248,11 +289,14 @@ export interface ProjectUpdatedFrame {
 }
 // v1.8 mobile-tabs: server can't know about client-side panes, so status
 // is keyed by chat_id. Frontend dot derives pane status from pane.activeChatId.
+// v1.8.2: optional `reason` carries a machine-readable code when status is
+// 'error'. UI prefers reason; falls back to no detail when absent.
 export interface ChatStatusFrame {
  type: 'chat_status';
  chat_id: string;
  status: 'working' | 'idle' | 'error';
  at: string;
+  reason?: ErrorReason;
 }
 export type UserStreamFrame =
  | ProjectCreatedFrame
--- a/apps/web/src/api/client.ts
+++ b/apps/web/src/api/client.ts
@@ -152,6 +152,13 @@ export const api = {
        `/api/chats/${chatId}/force_send`,
        { method: 'POST', body: JSON.stringify({ content }) }
      ),
+    // v1.8.2: extend an inference that hit the tool budget. `sentinelMessageId`
+    // is the cap-hit sentinel message the user clicked Continue on.
+    continue: (chatId: string, sentinelMessageId: string) =>
+      request<{ assistant_message_id: string }>(
+        `/api/chats/${chatId}/continue`,
+        { method: 'POST', body: JSON.stringify({ sentinel_message_id: sentinelMessageId }) }
+      ),
    fork: (chatId: string, body: { messageId: string; name?: string }) =>
      request<Chat>(`/api/chats/${chatId}/fork`, {
        method: 'POST',
--- a/apps/web/src/api/types.ts
+++ b/apps/web/src/api/types.ts
@@ -30,7 +30,10 @@ export interface Session {
  agent_id: string | null;
 }

-export type AgentSource = 'builtin' | 'file';
+// v1.8.1: 'global' = /data/AGENTS.md (always-on), 'project' = per-project
+// override at <root>/AGENTS.md. In-code builtins were retired; the seed file
+// lives at /data/AGENTS.md.
+export type AgentSource = 'global' | 'project';

 export interface Agent {
  id: string;
@@ -41,11 +44,20 @@ export interface Agent {
  tools: string[];
  model: string | null;
  source: AgentSource;
+  // v1.8.2: per-agent tool-loop budget. null means resolve at runtime from
+  // the agent's toolset (30 for all read-only, 10 otherwise) or 15 for raw
+  // chat with no agent.
+  max_tool_calls: number | null;
+}
+
+export interface AgentParseError {
+  agent_name: string;
+  reason: string;
 }

 export interface AgentsResponse {
  agents: Agent[];
-  parse_error: string | null;
+  errors: AgentParseError[];
 }

 export const CHAT_STATUSES = ['open', 'archived'] as const;
@@ -81,6 +93,32 @@ export interface ToolResult {
  error?: string;
 }

+// v1.8.2: structured reason codes that flow through error frames / metadata.
+// `error` text stays human; `reason` is the discriminator the UI matches on.
+export type ErrorReason =
+  | 'llm_provider_error'
+  | 'tool_execution_failed'
+  | 'summary_after_cap_failed';
+
+// v1.8.2: shapes stored in Message.metadata. Discriminated on `kind`.
+//   cap_hit — sentinel emitted when the tool budget is hit; carries the
+//             budget + agent name + whether Continue is still allowed.
+//   error   — attached to a failed assistant message so the bubble can show
+//             a specific reason on reload (WS error frame is one-shot).
+export type MessageMetadata =
+  | {
+      kind: 'cap_hit';
+      used: number;
+      limit: number;
+      agent_name: string | null;
+      can_continue: boolean;
+    }
+  | {
+      kind: 'error';
+      error_reason: ErrorReason;
+      error_text: string;
+    };
+
 export interface Message {
  id: string;
  session_id: string;
@@ -98,6 +136,9 @@ export interface Message {
  started_at: string | null;
  finished_at: string | null;
  created_at: string;
+  // v1.8.2: per-message metadata; see MessageMetadata. null for the vast
+  // majority of messages.
+  metadata: MessageMetadata | null;
 }

 export interface ModelInfo {
@@ -217,7 +258,13 @@ export type WsFrame =
      ctx_max?: number | null;
      started_at?: string | null;
      finished_at?: string | null;
+      // v1.8.2: piggybacks the persisted metadata onto the terminal frame so
+      // cap-hit sentinels (and any future stamped-on-complete metadata) flow
+      // to the client without a refetch.
+      metadata?: MessageMetadata | null;
    }
  | { type: 'messages_deleted'; message_ids: string[]; chat_id?: string }
  | { type: 'chat_renamed'; chat_id: string; name: string }
-  | { type: 'error'; message_id?: string; chat_id?: string; error: string };
+  // v1.8.2: `reason` discriminates structured failures (the UI prefers it
+  // over `error` text when present).
+  | { type: 'error'; message_id?: string; chat_id?: string; error: string; reason?: ErrorReason };
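Because MessageMetadata is discriminated on `kind`, a user-defined type guard lets callers narrow it once instead of repeating the null/object/kind check used in isCapHitSentinel and CapHitSentinel. A minimal sketch: `isCapHitMeta` and `describeMeta` are hypothetical helpers, and the union is copied from the web types so the snippet runs standalone.

```typescript
type ErrorReason = 'llm_provider_error' | 'tool_execution_failed' | 'summary_after_cap_failed';

type MessageMetadata =
  | { kind: 'cap_hit'; used: number; limit: number; agent_name: string | null; can_continue: boolean }
  | { kind: 'error'; error_reason: ErrorReason; error_text: string };

type CapHitMeta = Extract<MessageMetadata, { kind: 'cap_hit' }>;

// Hypothetical helper: narrows untrusted JSON (e.g. a metadata column read
// back from the database) to the cap_hit variant. Only the discriminant is
// checked, not the other fields.
function isCapHitMeta(meta: unknown): meta is CapHitMeta {
  return (
    typeof meta === 'object' &&
    meta !== null &&
    (meta as { kind?: unknown }).kind === 'cap_hit'
  );
}

// Hypothetical helper: exhaustive switch over `kind`. Adding a third variant
// to MessageMetadata makes the `never` assignment fail to compile.
function describeMeta(meta: MessageMetadata | null): string {
  if (meta === null) return 'plain message';
  switch (meta.kind) {
    case 'cap_hit':
      return `cap hit at ${meta.limit}, can_continue=${meta.can_continue}`;
    case 'error':
      return `failed: ${meta.error_reason}`;
    default: {
      const unreachable: never = meta;
      return unreachable;
    }
  }
}
```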
--- a/apps/web/src/components/AgentPicker.tsx
+++ b/apps/web/src/components/AgentPicker.tsx
@@ -2,7 +2,7 @@ import { useEffect, useState } from 'react';
 import { Check, ChevronDown } from 'lucide-react';
 import { toast } from 'sonner';
 import { api } from '@/api/client';
-import type { Agent } from '@/api/types';
+import type { Agent, AgentParseError } from '@/api/types';
 import {
  DropdownMenu,
  DropdownMenuContent,
@@ -19,23 +19,28 @@ interface Props {

 export function AgentPicker({ projectId, value, onChange }: Props) {
  const [agents, setAgents] = useState<Agent[] | null>(null);
+  const [parseErrors, setParseErrors] = useState<AgentParseError[]>([]);
  const [error, setError] = useState<string | null>(null);
  const [open, setOpen] = useState(false);

-  // Load on mount (and on projectId change) so the trigger shows the agent
-  // name immediately, not the raw id. AGENTS.md parse errors surface as a
-  // toast once per load.
+  // v1.8.1: per-agent parse errors are non-blocking. Silent if any agents
+  // loaded successfully; a gray warning toast fires only when EVERY agent
+  // in AGENTS.md failed to parse. Server logs a console.warn either way.
  useEffect(() => {
    let cancelled = false;
    setAgents(null);
+    setParseErrors([]);
    setError(null);
    api.agents
      .list(projectId)
      .then((res) => {
        if (cancelled) return;
        setAgents(res.agents);
-        if (res.parse_error) {
-          toast.error(`AGENTS.md parse error: ${res.parse_error}`);
+        setParseErrors(res.errors);
+        if (res.errors.length > 0 && res.agents.length === 0) {
+          toast.warning(
+            `AGENTS.md: ${res.errors.length} agent${res.errors.length === 1 ? '' : 's'} failed to parse, none loaded`,
+          );
        }
      })
      .catch((err) => {
@@ -100,6 +105,14 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
                )}
              </DropdownMenuItem>
            ))}
+            {parseErrors.length > 0 && (
+              <div
+                className="px-2 py-1.5 mt-1 text-xs text-amber-500 border-t border-border"
+                title={parseErrors.map((e) => `${e.agent_name}: ${e.reason}`).join('\n')}
+              >
+                {parseErrors.length} agent{parseErrors.length === 1 ? '' : 's'} skipped
+              </div>
+            )}
          </>
        )}
      </DropdownMenuContent>
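The picker's notice rule (a warning toast only when every agent failed; otherwise a quiet "N agents skipped" footer with per-agent reasons in its tooltip) can be expressed as a pure function and checked without rendering. A sketch with hypothetical names (`parseErrorNotices`, `ParseNotices`); it assumes the footer is shown whenever any block was skipped, while the real component may additionally gate it on the dropdown's own loading or empty states.

```typescript
interface AgentParseError {
  agent_name: string;
  reason: string;
}

// Hypothetical shape: what the picker would show for one load result.
interface ParseNotices {
  toast: string | null;  // warning toast text, or null for none
  footer: string | null; // "N agents skipped" footer, or null for none
  tooltip: string;       // newline-joined per-agent reasons
}

const plural = (n: number, noun: string) => `${n} ${noun}${n === 1 ? '' : 's'}`;

function parseErrorNotices(agentCount: number, errors: AgentParseError[]): ParseNotices {
  if (errors.length === 0) return { toast: null, footer: null, tooltip: '' };
  return {
    // Toast only when nothing loaded at all.
    toast: agentCount === 0
      ? `AGENTS.md: ${plural(errors.length, 'agent')} failed to parse, none loaded`
      : null,
    footer: `${plural(errors.length, 'agent')} skipped`,
    tooltip: errors.map((e) => `${e.agent_name}: ${e.reason}`).join('\n'),
  };
}
```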
--- a/apps/web/src/components/CapHitSentinel.tsx
+++ b/apps/web/src/components/CapHitSentinel.tsx
@@ -0,0 +1,90 @@
+import { useState } from 'react';
+import { AlertCircle } from 'lucide-react';
+import { toast } from 'sonner';
+import { api } from '@/api/client';
+import type { Message } from '@/api/types';
+import { Button } from '@/components/ui/button';
+
+interface Props {
+  message: Message;
+  // 1-indexed position among cap-hit sentinels in this chat. The first
+  // cap-hit is 1, second is 2, third is 3 (hard ceiling).
+  capHitPosition: number;
+  // Only the most recent sentinel shows the Continue button. Older ones
+  // render text-only — they've already been continued past.
+  isLatest: boolean;
+}
+
+// Hard ceiling = 3 cap-hits per chat ⇒ 2 continues max. Must stay in sync
+// with insertCapHitSentinel's `canContinue = priorCount < 2` rule in
+// services/inference.ts.
+const MAX_CONTINUES = 2;
+
+export function CapHitSentinel({ message, capHitPosition, isLatest }: Props) {
+  const meta = message.metadata;
+  // Defensive parse — if the row is somehow missing metadata we still render
+  // the bare text rather than crashing the chat.
+  const isCapHit =
+    meta !== null && typeof meta === 'object' && meta.kind === 'cap_hit';
+
+  const limit = isCapHit ? meta.limit : null;
+  const canContinue = isCapHit ? meta.can_continue : false;
+  const agentName = isCapHit ? meta.agent_name : null;
+  // `capHitPosition` is 1-indexed; `MAX_CONTINUES - (position - 1)` is the
+  // number of continues remaining including this one. Clamped to ≥0.
+  const remaining = Math.max(0, MAX_CONTINUES - (capHitPosition - 1));
+
+  const [continuing, setContinuing] = useState(false);
+
+  async function handleContinue() {
+    if (continuing || !canContinue || !isLatest) return;
+    setContinuing(true);
+    try {
+      await api.chats.continue(message.chat_id, message.id);
+    } catch (err) {
+      toast.error(err instanceof Error ? err.message : 'continue failed');
+    } finally {
+      setContinuing(false);
+    }
+  }
+
+  // Tooltip wording from the v1.8.2 spec. The disabled tooltip takes
+  // precedence whenever the server has stamped can_continue: false on the
+  // sentinel's metadata.
+  const enabledTooltip = limit
+    ? `Resumes with a fresh budget of ${limit} tool calls. ${remaining} continue${remaining === 1 ? '' : 's'} remaining on this chat.`
+    : undefined;
+  const disabledTooltip = 'Hard limit reached — start a new chat';
+
+  return (
+    <div className="rounded-md border border-amber-500/40 bg-amber-500/10 text-sm">
+      <div className="px-3 py-2 flex items-start gap-2">
+        <AlertCircle className="size-4 text-amber-500 shrink-0 mt-0.5" />
+        <div className="flex-1 min-w-0 space-y-1">
+          <div className="text-xs font-medium text-amber-700 dark:text-amber-300">
+            {isCapHit && limit !== null
+              ? `Reached tool budget (${limit}/${limit})${agentName ? ` — ${agentName}` : ''}.`
+              : 'Reached tool budget.'}
+          </div>
+          <div className="text-xs text-muted-foreground">
+            {message.content}
+          </div>
+          {isLatest && (
+            <div className="pt-1">
+              <Button
+                type="button"
+                size="sm"
+                variant="outline"
+                onClick={() => void handleContinue()}
+                disabled={!canContinue || continuing}
+                title={canContinue ? enabledTooltip : disabledTooltip}
+              >
+                {continuing ? 'Continuing…' : 'Continue'}
+              </Button>
+            </div>
+          )}
+        </div>
+      </div>
+    </div>
+  );
+}
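The server stamps `can_continue` from the number of earlier sentinels (`priorCount < 2`), while CapHitSentinel derives `remaining` from the sentinel's 1-indexed position. A small sketch (hypothetical `capHitState`, not part of the codebase) checking that the two rules agree for the first three cap-hits, assuming position equals `priorCount + 1`:

```typescript
const MAX_CONTINUES = 2;

interface CapHitState {
  position: number;     // 1-indexed sentinel position in the chat
  canContinue: boolean; // server rule: priorCount < 2
  remaining: number;    // client rule: max(0, MAX_CONTINUES - (position - 1))
}

// Hypothetical helper: `priorCount` is the number of cap_hit sentinels that
// already existed when this one was inserted.
function capHitState(priorCount: number): CapHitState {
  const position = priorCount + 1;
  return {
    position,
    canContinue: priorCount < MAX_CONTINUES,
    remaining: Math.max(0, MAX_CONTINUES - (position - 1)),
  };
}

for (const prior of [0, 1, 2]) {
  const s = capHitState(prior);
  // The button should be enabled exactly when at least one continue remains.
  console.log(s.position, s.canContinue, s.remaining, s.canContinue === (s.remaining > 0));
}
// 1 true 2 true
// 2 true 1 true
// 3 false 0 true
```

The two rules only stay aligned while MessageList's client-side numbering matches the count the server saw at insert time; deleting an earlier sentinel, for example, would shift positions without changing any stored `can_continue`.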
--- a/apps/web/src/components/MessageBubble.tsx
+++ b/apps/web/src/components/MessageBubble.tsx
@@ -4,10 +4,10 @@ import Markdown from 'react-markdown';
 import remarkGfm from 'remark-gfm';
 import { ChevronDown, ChevronRight, Copy, RefreshCw, Check, Share2, RotateCw, GitFork, Trash2 } from 'lucide-react';
 import { toast } from 'sonner';
-import type { Chat, Message } from '@/api/types';
+import type { Chat, ErrorReason, Message } from '@/api/types';
 import { api } from '@/api/client';
 import { sessionEvents } from '@/hooks/sessionEvents';
-import { ToolCallCard } from './ToolCallCard';
+import { CapHitSentinel } from './CapHitSentinel';
 import { CodeBlock } from './CodeBlock';
 import { Button } from '@/components/ui/button';
 import {
@@ -19,6 +19,15 @@ import {
  DialogTitle,
 } from '@/components/ui/dialog';

+// v1.8.2: human labels for the machine-readable error reasons that ride on
+// failed assistant messages via metadata.kind === 'error'. Kept short so the
+// inline render under "message failed" stays a single muted line.
+const ERROR_REASON_LABELS: Record<ErrorReason, string> = {
+  llm_provider_error: 'LLM provider error',
+  tool_execution_failed: 'Tool execution failed',
+  summary_after_cap_failed: 'Summary after tool budget hit failed',
+};
+
 // Match path-shaped substrings ending in `.ext`. Additionally require a `/`
 // in the match to reduce false positives in prose (e.g. plain `foo.ts` won't
 // match, but `src/foo.ts` will). False positives at the edges are accepted
@@ -94,6 +103,9 @@ function linkifyChildren(children: ReactNode, keyPrefix = 'l'): ReactNode {
 interface Props {
  message: Message;
  sessionChats?: Chat[];
+  // v1.8.2: passed by MessageList's render-item pass for cap-hit sentinels.
+  // Only the most recent sentinel shows the Continue button.
+  capHitInfo?: { position: number; isLatest: boolean };
 }

 function MarkdownBody({ content }: { content: string }) {
@@ -464,15 +476,34 @@ function CompactCard({ message, sessionChats }: { message: Message; sessionChats
  );
 }

-export function MessageBubble({ message, sessionChats }: Props) {
+export function MessageBubble({ message, sessionChats, capHitInfo }: Props) {
  if (message.kind === 'compact') {
    return <CompactCard message={message} sessionChats={sessionChats} />;
  }

-  if (message.role === 'tool') {
-    return <ToolCallCard message={message} />;
+  // v1.8.2: cap-hit sentinels render as a distinct system bubble with a
+  // Continue button. MessageList's pre-render pass tags each sentinel with
+  // its position; only the latest gets the actionable button.
+  if (
+    message.role === 'system' &&
+    message.metadata?.kind === 'cap_hit' &&
+    capHitInfo
+  ) {
+    return (
+      <CapHitSentinel
+        message={message}
+        capHitPosition={capHitInfo.position}
+        isLatest={capHitInfo.isLatest}
+      />
+    );
  }

+  // v1.8.2: tool messages and assistant tool_calls are now rendered by
+  // MessageList via ToolCallLine / ToolCallGroup. Tool-role messages reach
+  // this point only if MessageList didn't consume them (shouldn't happen,
+  // but guard against it by rendering nothing rather than a stale card).
+  if (message.role === 'tool') return null;
+
  if (message.role === 'user') {
    return (
      <div className="group flex flex-col items-end gap-1">
@@ -487,14 +518,17 @@ export function MessageBubble({ message, sessionChats }: Props) {
  const isStreaming = message.status === 'streaming';
  const failed = message.status === 'failed';
  const hasContent = message.content.length > 0;
-  const hasToolCalls = (message.tool_calls?.length ?? 0) > 0;
+  // v1.8.2: if metadata stamps an error reason, surface it inline under the
+  // generic "message failed" line. Keeps the user's eye where it already is
+  // rather than introducing a separate banner.
+  const errorMeta =
+    message.metadata !== null && message.metadata.kind === 'error'
+      ? message.metadata
+      : null;

  return (
    <div className="group flex flex-col gap-2">
-      {message.tool_calls?.map((tc) => (
-        <ToolCallCard key={tc.id} toolCall={tc} />
-      ))}
-      {(hasContent || (!hasToolCalls && isStreaming)) && (
+      {(hasContent || isStreaming) && (
        <div className="max-w-[90%] text-sm leading-relaxed space-y-2 break-words min-w-0">
          {hasContent ? <MarkdownBody content={message.content} /> : null}
          {isStreaming && (
@@ -503,12 +537,18 @@ export function MessageBubble({ message, sessionChats }: Props) {
        </div>
      )}
      {failed && (
-        <div className="text-xs text-destructive">message failed</div>
+        <div className="text-xs text-destructive">
+          message failed
+          {errorMeta && (
+            <span className="block text-muted-foreground mt-0.5">
+              {ERROR_REASON_LABELS[errorMeta.error_reason]}
+              {errorMeta.error_text ? ` — ${errorMeta.error_text}` : ''}
+            </span>
+          )}
+        </div>
      )}
      {!isStreaming && <StatsLine message={message} />}
-      {!isStreaming && (hasContent || hasToolCalls) && (
-        <ActionRow message={message} />
-      )}
+      {!isStreaming && hasContent && <ActionRow message={message} />}
    </div>
  );
 }
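With the ToolCallCard branch gone, the failed-message path mostly maps a stamped reason code to a short label. Here is a sketch of that lookup as a pure function. It assumes a metadata union whose `error` variant carries `error_reason` and an optional `error_text`. The `cap_hit` payload isn't visible in this diff, so it is left empty. `errorDetail` and `MessageMetadata` are hypothetical names; the labels are copied from `ERROR_REASON_LABELS`.

```typescript
// Reason codes and labels copied from ERROR_REASON_LABELS in MessageBubble.
type ErrorReason = 'llm_provider_error' | 'tool_execution_failed' | 'summary_after_cap_failed';

// Assumed shape: the 'error' fields match what useSessionStream stamps; the
// 'cap_hit' payload is not visible in this diff and is left empty here.
type MessageMetadata =
  | { kind: 'error'; error_reason: ErrorReason; error_text?: string }
  | { kind: 'cap_hit' };

const ERROR_REASON_LABELS: Record<ErrorReason, string> = {
  llm_provider_error: 'LLM provider error',
  tool_execution_failed: 'Tool execution failed',
  summary_after_cap_failed: 'Summary after tool budget hit failed',
};

// Pieces of the muted line under "message failed", or null when there is no
// error metadata (a cap_hit variant, or a failure with no stamped reason).
function errorDetail(
  metadata: MessageMetadata | null | undefined,
): { label: string; text: string | null } | null {
  if (!metadata || metadata.kind !== 'error') return null;
  return { label: ERROR_REASON_LABELS[metadata.error_reason], text: metadata.error_text || null };
}

console.log(errorDetail({ kind: 'error', error_reason: 'tool_execution_failed', error_text: 'grep timed out' }));
```

Returning the pieces separately leaves the separator to the JSX, which joins them with a dash.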
--- a/apps/web/src/components/MessageList.tsx
+++ b/apps/web/src/components/MessageList.tsx
@@ -1,15 +1,128 @@
-import { useEffect, useRef } from 'react';
+import { useEffect, useMemo, useRef } from 'react';
 import type { Chat, Message } from '@/api/types';
 import { MessageBubble } from './MessageBubble';
+import { ToolCallGroup } from './ToolCallGroup';
+import { ToolCallLine, type ToolRun } from './ToolCallLine';

 interface Props {
  messages: Message[];
  sessionChats?: Chat[];
 }

+// v1.8.2: pre-render units. The single linear `messages` array gets walked
+// into a render-time list where each tool_call is a first-class item and
+// tool_result messages are folded onto their matching tool_run by id.
+type RenderItem =
+  | { kind: 'message'; message: Message; capHitInfo?: { position: number; isLatest: boolean } }
+  | { kind: 'tool_run'; run: ToolRun; key: string }
+  | { kind: 'tool_group'; runs: ToolRun[]; key: string };
+
+const GROUP_THRESHOLD = 3;
+
+function isCapHitSentinel(m: Message): boolean {
+  return m.role === 'system' && m.metadata?.kind === 'cap_hit';
+}
+
+// First pass: walk messages chronologically, expanding assistant tool_calls
+// into per-call run items and folding tool_result messages onto their
+// matching runs. Tool messages themselves never produce a render item.
+// Assistant messages produce a text render item only when they have text;
+// pure tool-call messages are "transparent" so consecutive tool runs can
+// still group across them.
+function flatten(messages: Message[]): RenderItem[] {
+  const items: RenderItem[] = [];
+  const runsByCallId = new Map<string, ToolRun>();
+
+  for (const m of messages) {
+    if (m.role === 'tool') {
+      if (m.tool_results) {
+        const run = runsByCallId.get(m.tool_results.tool_call_id);
+        if (run) run.result = m.tool_results;
+      }
+      continue;
+    }
+    const hasToolCalls = m.tool_calls != null && m.tool_calls.length > 0;
+    const hasText = m.content.length > 0;
+    if (m.role === 'assistant' && hasToolCalls) {
+      if (hasText || m.status === 'streaming') {
+        items.push({ kind: 'message', message: m });
+      }
+      for (const tc of m.tool_calls!) {
+        const run: ToolRun = { call: tc, result: null };
+        runsByCallId.set(tc.id, run);
+        items.push({ kind: 'tool_run', run, key: tc.id });
+      }
+      continue;
+    }
+    items.push({ kind: 'message', message: m });
+  }
+
+  return items;
+}
+
+// Second pass: collapse runs of >=GROUP_THRESHOLD consecutive tool_run items
+// of the same tool name into a single tool_group. Any other render item
+// (text bubble, sentinel, user message) breaks the chain.
+function group(items: RenderItem[]): RenderItem[] {
+  const out: RenderItem[] = [];
+  let i = 0;
+  while (i < items.length) {
+    const item = items[i]!;
+    if (item.kind !== 'tool_run') {
+      out.push(item);
+      i += 1;
+      continue;
+    }
+    const name = item.run.call.name;
+    let j = i + 1;
+    while (
+      j < items.length &&
+      items[j]!.kind === 'tool_run' &&
+      (items[j] as { kind: 'tool_run'; run: ToolRun }).run.call.name === name
+    ) {
+      j += 1;
+    }
+    const chain = items.slice(i, j) as Array<{ kind: 'tool_run'; run: ToolRun; key: string }>;
+    if (chain.length >= GROUP_THRESHOLD) {
+      out.push({
+        kind: 'tool_group',
+        runs: chain.map((r) => r.run),
+        key: `group-${chain[0]!.key}`,
+      });
+    } else {
+      for (const r of chain) out.push(r);
+    }
+    i = j;
+  }
+  return out;
+}
+
+// Third pass: number cap-hit sentinels (1-indexed) and mark the latest.
+// CapHitSentinel uses position to compute the "N continues remaining"
+// tooltip, and isLatest to gate the Continue button (only the most recent
+// sentinel is actionable).
+function stampCapHits(items: RenderItem[]): RenderItem[] {
+  const totalCapHits = items.reduce(
+    (n, it) => n + (it.kind === 'message' && isCapHitSentinel(it.message) ? 1 : 0),
+    0,
+  );
+  if (totalCapHits === 0) return items;
+  let index = 0;
+  return items.map((it) => {
+    if (it.kind !== 'message' || !isCapHitSentinel(it.message)) return it;
+    index += 1;
+    return {
+      ...it,
+      capHitInfo: { position: index, isLatest: index === totalCapHits },
+    };
+  });
+}
+
 export function MessageList({ messages, sessionChats }: Props) {
  const endRef = useRef<HTMLDivElement>(null);

+  const renderItems = useMemo(() => stampCapHits(group(flatten(messages))), [messages]);
+
  useEffect(() => {
    endRef.current?.scrollIntoView({ block: 'end' });
  }, [messages]);
@@ -25,9 +138,22 @@ export function MessageList({ messages, sessionChats }: Props) {
  return (
    <div className="flex-1 overflow-y-auto">
      <div className="max-w-[1000px] mx-auto w-full px-6 py-4 space-y-4">
-        {messages.map((m) => (
-          <MessageBubble key={m.id} message={m} sessionChats={sessionChats} />
-        ))}
+        {renderItems.map((item) => {
+          if (item.kind === 'message') {
+            return (
+              <MessageBubble
+                key={item.message.id}
+                message={item.message}
+                sessionChats={sessionChats}
+                capHitInfo={item.capHitInfo}
+              />
+            );
+          }
+          if (item.kind === 'tool_run') {
+            return <ToolCallLine key={item.key} run={item.run} />;
+          }
+          return <ToolCallGroup key={item.key} runs={item.runs} />;
+        })}
        <div ref={endRef} />
      </div>
    </div>
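The three passes in MessageList can be exercised without React. The sketch below re-implements them over trimmed-down stand-in types: `Msg`, `ToolCall`, `ToolResult`, and a boolean `cap_hit` in place of `metadata.kind === 'cap_hit'`. All of these are assumptions for illustration. The sketch drops the React keys and the streaming-placeholder case. It also writes the grouping pass as a single loop with a buffered chain instead of index scanning. For these inputs it should produce the same sequence of render items.

```typescript
// Trimmed-down stand-ins for the '@/api/types' shapes; only the fields the
// three passes read. `cap_hit` stands in for metadata.kind === 'cap_hit'.
interface ToolCall { id: string; name: string }
interface ToolResult { tool_call_id: string; error?: string }
interface Msg {
  id: string;
  role: 'user' | 'assistant' | 'tool' | 'system';
  content: string;
  tool_calls?: ToolCall[];
  tool_results?: ToolResult;
  cap_hit?: boolean;
}
interface ToolRun { call: ToolCall; result: ToolResult | null }
type RenderItem =
  | { kind: 'message'; message: Msg; capHit?: { position: number; isLatest: boolean } }
  | { kind: 'tool_run'; run: ToolRun }
  | { kind: 'tool_group'; runs: ToolRun[] };

// Pass 1: one tool_run per tool call; tool results fold onto runs by call id.
function flatten(messages: Msg[]): RenderItem[] {
  const items: RenderItem[] = [];
  const runsById = new Map<string, ToolRun>();
  for (const m of messages) {
    if (m.role === 'tool') {
      const res = m.tool_results;
      const run = res ? runsById.get(res.tool_call_id) : undefined;
      if (res && run) run.result = res;
      continue; // tool messages never render on their own
    }
    const calls = m.tool_calls ?? [];
    if (m.role === 'assistant' && calls.length > 0) {
      if (m.content.length > 0) items.push({ kind: 'message', message: m });
      for (const call of calls) {
        const run: ToolRun = { call, result: null };
        runsById.set(call.id, run);
        items.push({ kind: 'tool_run', run });
      }
      continue;
    }
    items.push({ kind: 'message', message: m });
  }
  return items;
}

// Pass 2: >= threshold consecutive runs of the same tool become one group.
function group(items: RenderItem[], threshold = 3): RenderItem[] {
  const out: RenderItem[] = [];
  let chain: ToolRun[] = [];
  const flush = () => {
    if (chain.length >= threshold) {
      out.push({ kind: 'tool_group', runs: chain });
    } else {
      for (const run of chain) out.push({ kind: 'tool_run', run });
    }
    chain = [];
  };
  for (const it of items) {
    if (it.kind === 'tool_run' && (chain.length === 0 || chain[0]!.call.name === it.run.call.name)) {
      chain.push(it.run);
      continue;
    }
    flush(); // any other item, or a different tool, ends the chain
    if (it.kind === 'tool_run') chain.push(it.run);
    else out.push(it);
  }
  flush();
  return out;
}

// Pass 3: number cap-hit sentinels (1-indexed) and flag the last one.
function stampCapHits(items: RenderItem[]): RenderItem[] {
  const total = items.filter((it) => it.kind === 'message' && it.message.cap_hit).length;
  let position = 0;
  return items.map((it) => {
    if (it.kind !== 'message' || !it.message.cap_hit) return it;
    position += 1;
    return { ...it, capHit: { position, isLatest: position === total } };
  });
}

const messages: Msg[] = [
  { id: 'u1', role: 'user', content: 'where is the config loaded?' },
  {
    id: 'a1', role: 'assistant', content: '',
    tool_calls: [
      { id: 'c1', name: 'grep' }, { id: 'c2', name: 'grep' },
      { id: 'c3', name: 'grep' }, { id: 'c4', name: 'view_file' },
    ],
  },
  { id: 't1', role: 'tool', content: '', tool_results: { tool_call_id: 'c1' } },
  { id: 's1', role: 'system', content: 'tool budget reached', cap_hit: true },
];
const rendered = stampCapHits(group(flatten(messages)));
console.log(rendered.map((it) => it.kind).join(', '));
// message, tool_group, tool_run, message
```

Because the assistant message `a1` has no text, it is transparent. Its four calls land next to each other, so the three `grep` runs fold into one group and the `view_file` run stays a standalone line.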
--- a/apps/web/src/components/ToolCallCard.tsx
+++ b/apps/web/src/components/ToolCallCard.tsx
@@ -1,102 +0,0 @@
-import { useState } from 'react';
-import type { ReactNode } from 'react';
-import { ChevronRight, Wrench } from 'lucide-react';
-import type { Message, ToolCall } from '@/api/types';
-import { sessionEvents } from '@/hooks/sessionEvents';
-
-interface Props {
-  message?: Message;
-  toolCall?: ToolCall;
-}
-
-// Same regex/heuristic as MessageBubble: paths ending in `.ext` with at
-// least one `/`. Linkifies file paths emitted by tools like grep / find_files
-// so they're clickable.
-const PATH_REGEX = /([a-zA-Z0-9._/-]+\.[a-zA-Z0-9]+)/g;
-
-function linkifyOutput(text: string): ReactNode[] {
-  const out: ReactNode[] = [];
-  let lastIdx = 0;
-  let idx = 0;
-  for (const match of text.matchAll(PATH_REGEX)) {
-    const matchedText = match[0];
-    const start = match.index ?? 0;
-    if (!matchedText.includes('/')) continue;
-    if (start > lastIdx) out.push(text.slice(lastIdx, start));
-    out.push(
-      <button
-        key={idx}
-        type="button"
-        onClick={() =>
-          sessionEvents.emit({
-            type: 'open_file_in_browser',
-            path: matchedText,
-          })
-        }
-        className="text-primary underline cursor-pointer hover:text-primary/80"
-      >
-        {matchedText}
-      </button>
-    );
-    lastIdx = start + matchedText.length;
-    idx += 1;
-  }
-  if (lastIdx < text.length) out.push(text.slice(lastIdx));
-  return out.length > 0 ? out : [text];
-}
-
-export function ToolCallCard({ message, toolCall }: Props) {
-  const [open, setOpen] = useState(false);
-  const tc = toolCall ?? message?.tool_calls?.[0];
-  const result = message?.tool_results;
-
-  const name = tc?.name ?? 'tool';
-  const args = tc?.args ?? {};
-  const error = result?.error;
-  const output = result?.output;
-  const truncated = result?.truncated;
-
-  return (
-    <div className="rounded-md border border-border bg-muted/30 text-sm overflow-hidden">
-      <button
-        type="button"
-        onClick={() => setOpen((v) => !v)}
-        className="w-full flex items-center gap-2 px-2.5 py-1.5 hover:bg-muted/60 text-left"
-      >
-        <ChevronRight
-          className={`size-3.5 transition-transform ${open ? 'rotate-90' : ''}`}
-        />
-        <Wrench className="size-3.5 opacity-70" />
-        <span className="font-mono font-medium">{name}</span>
-        <span className="font-mono text-xs text-muted-foreground truncate min-w-0 flex-1">
-          {JSON.stringify(args)}
-        </span>
-        {error && (
-          <span className="text-xs text-destructive font-medium ml-2">error</span>
-        )}
-        {truncated && (
-          <span className="text-xs text-muted-foreground ml-2">truncated</span>
-        )}
-      </button>
-      {open && (
-        <div className="px-2.5 py-2 border-t bg-background/40">
-          {error ? (
-            <pre className="text-xs text-destructive font-mono whitespace-pre-wrap">
-              {error}
-            </pre>
-          ) : output !== undefined ? (
-            <pre className="text-xs font-mono whitespace-pre-wrap overflow-x-auto max-h-72 overflow-y-auto">
-              {linkifyOutput(
-                typeof output === 'string'
-                  ? output
-                  : JSON.stringify(output, null, 2)
-              )}
-            </pre>
-          ) : (
-            <div className="text-xs text-muted-foreground">no result yet</div>
-          )}
-        </div>
-      )}
-    </div>
-  );
-}
--- a/apps/web/src/components/ToolCallGroup.tsx
+++ b/apps/web/src/components/ToolCallGroup.tsx
@@ -0,0 +1,64 @@
+import { useState } from 'react';
+import { ChevronRight } from 'lucide-react';
+import { ToolCallLine, runStatus, type ToolRun } from './ToolCallLine';
+
+interface Props {
+  // All runs must share the same tool name. Caller (MessageList grouping
+  // pass) enforces that invariant.
+  runs: ToolRun[];
+}
+
+export function ToolCallGroup({ runs }: Props) {
+  const [open, setOpen] = useState(false);
+  if (runs.length === 0) return null;
+  const toolName = runs[0]!.call.name;
+  const count = runs.length;
+
+  // Header summary: count the calls still running and the ones that failed,
+  // e.g. " (2 running, 1 failed)". A group where every call succeeded gets
+  // no suffix, which keeps a long run of greps / view_files easy to scan.
+  let pending = 0;
+  let errored = 0;
+  for (const r of runs) {
+    const s = runStatus(r);
+    if (s === 'pending') pending += 1;
+    else if (s === 'error') errored += 1;
+  }
+  const summaryParts: string[] = [];
+  if (pending > 0) summaryParts.push(`${pending} running`);
+  if (errored > 0) summaryParts.push(`${errored} failed`);
+  const summary = summaryParts.length > 0 ? ` (${summaryParts.join(', ')})` : '';
+
+  return (
+    <div className="rounded border border-border/60 bg-muted/20 text-xs">
+      <button
+        type="button"
+        onClick={() => setOpen((v) => !v)}
+        className="w-full flex items-center gap-1.5 px-2 py-1 hover:bg-muted/40 text-left"
+      >
+        <ChevronRight
+          className={`size-3 text-muted-foreground/60 shrink-0 transition-transform ${open ? 'rotate-90' : ''}`}
+        />
+        <span className="text-muted-foreground/60 select-none shrink-0">⊞</span>
+        <span className="font-mono text-foreground/90">
+          {count} {toolName} call{count === 1 ? '' : 's'}
+        </span>
+        {summary && (
+          <span className="text-muted-foreground truncate">{summary}</span>
+        )}
+        <span className="ml-auto text-muted-foreground/60 shrink-0">tap</span>
+      </button>
+      {open && (
+        <div className="border-t border-border/40 px-2 py-1 space-y-0.5">
+          {runs.map((run, i) => (
+            <ToolCallLine
+              key={`${run.call.id}-${i}`}
+              run={run}
+              insideGroup
+            />
+          ))}
+        </div>
+      )}
+    </div>
+  );
+}
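ToolCallGroup's header text is a pure function of the run statuses, so it can be pulled out and checked on its own. The sketch assumes the statuses come from `runStatus`. `groupSummary` and `groupHeader` are hypothetical helpers, and the component actually renders the count and the suffix in two separate spans.

```typescript
type RunStatus = 'pending' | 'success' | 'error';

// Suffix such as " (2 running, 1 failed)"; empty when every call succeeded.
function groupSummary(statuses: RunStatus[]): string {
  const pending = statuses.filter((s) => s === 'pending').length;
  const errored = statuses.filter((s) => s === 'error').length;
  const parts: string[] = [];
  if (pending > 0) parts.push(`${pending} running`);
  if (errored > 0) parts.push(`${errored} failed`);
  return parts.length > 0 ? ` (${parts.join(', ')})` : '';
}

// Header label: "N <tool> call(s)" followed by the suffix.
function groupHeader(toolName: string, statuses: RunStatus[]): string {
  const count = statuses.length;
  return `${count} ${toolName} call${count === 1 ? '' : 's'}${groupSummary(statuses)}`;
}

console.log(groupHeader('grep', ['success', 'pending', 'error', 'pending']));
// 4 grep calls (2 running, 1 failed)
```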
--- a/apps/web/src/components/ToolCallLine.tsx
+++ b/apps/web/src/components/ToolCallLine.tsx
@@ -0,0 +1,167 @@
+import { useState } from 'react';
+import type { ReactNode } from 'react';
+import { Check, ChevronRight, Loader2, X } from 'lucide-react';
+import type { ToolCall, ToolResult } from '@/api/types';
+import { sessionEvents } from '@/hooks/sessionEvents';
+
+// v1.8.2: cap on the inline arg-summary length. Expanded view shows full
+// args + full result, so this is purely a single-line render budget.
+const ARG_SUMMARY_MAX = 60;
+
+export interface ToolRun {
+  call: ToolCall;
+  // null while the call is in flight or the matching tool result hasn't
+  // arrived yet on the WS stream.
+  result: ToolResult | null;
+}
+
+function truncate(s: string, n: number): string {
+  return s.length > n ? s.slice(0, n - 1) + '…' : s;
+}
+
+// Per-tool argument summary mapping from the v1.8.2 spec. Goal is a single
+// scannable line that surfaces the *what* (path / pattern) without
+// overwhelming the chat with full JSON.
+export function formatToolArgs(name: string, args: Record<string, unknown>): string {
+  if (name === 'view_file') {
+    const path = String(args.path ?? '');
+    const start = args.start_line;
+    const end = args.end_line;
+    if (typeof start === 'number' && typeof end === 'number') {
+      return truncate(`${path}:${start}-${end}`, ARG_SUMMARY_MAX);
+    }
+    if (typeof start === 'number') {
+      return truncate(`${path}:${start}`, ARG_SUMMARY_MAX);
+    }
+    return truncate(path, ARG_SUMMARY_MAX);
+  }
+  if (name === 'list_dir') {
+    return truncate(String(args.path ?? '.'), ARG_SUMMARY_MAX);
+  }
+  if (name === 'grep') {
+    const pattern = String(args.pattern ?? '');
+    const path = args.path ? ` ${String(args.path)}` : '';
+    return truncate(`"${pattern}"${path}`, ARG_SUMMARY_MAX);
+  }
+  if (name === 'find_files') {
+    return truncate(String(args.pattern ?? ''), ARG_SUMMARY_MAX);
+  }
+  if (name === 'git_status') {
+    return '';
+  }
+  // Unknown tool: show the first arg as `key: value` (or the literal {} when
+  // there are no args) so the user still sees something. Forward-compatible.
+  const keys = Object.keys(args);
+  if (keys.length === 0) return '{}';
+  const first = keys[0]!;
+  return truncate(`${first}: ${String(args[first])}`, ARG_SUMMARY_MAX);
+}
+
+export function runStatus(run: ToolRun): 'pending' | 'success' | 'error' {
+  if (run.result === null) return 'pending';
+  if (run.result.error) return 'error';
+  return 'success';
+}
+
+// Path-like substrings in tool output (`.ext` suffix plus at least one `/`)
+// become buttons that open the file. Same heuristic as MessageBubble.
+const PATH_REGEX = /([a-zA-Z0-9._/-]+\.[a-zA-Z0-9]+)/g;
+function linkifyOutput(text: string): ReactNode[] {
+  const out: ReactNode[] = [];
+  let lastIdx = 0;
+  let idx = 0;
+  for (const match of text.matchAll(PATH_REGEX)) {
+    const matchedText = match[0];
+    const start = match.index ?? 0;
+    if (!matchedText.includes('/')) continue;
+    if (start > lastIdx) out.push(text.slice(lastIdx, start));
+    out.push(
+      <button
+        key={idx}
+        type="button"
+        onClick={() =>
+          sessionEvents.emit({ type: 'open_file_in_browser', path: matchedText })
+        }
+        className="text-primary underline cursor-pointer hover:text-primary/80"
+      >
+        {matchedText}
+      </button>
+    );
+    lastIdx = start + matchedText.length;
+    idx += 1;
+  }
+  if (lastIdx < text.length) out.push(text.slice(lastIdx));
+  return out.length > 0 ? out : [text];
+}
+
+interface Props {
+  run: ToolRun;
+  // When rendered inside a ToolCallGroup the line is already nested under a
+  // shared header, so the leading arrow is dropped to avoid double indent.
+  insideGroup?: boolean;
+}
+
+export function ToolCallLine({ run, insideGroup }: Props) {
+  const [open, setOpen] = useState(false);
+  const status = runStatus(run);
+  const args = run.call.args ?? {};
+  const summary = formatToolArgs(run.call.name, args);
+
+  return (
+    <div className="text-xs">
+      <button
+        type="button"
+        onClick={() => setOpen((v) => !v)}
+        className="flex items-center gap-1.5 w-full text-left hover:bg-muted/40 rounded px-1 py-0.5 -mx-1"
+      >
+        {!insideGroup && (
+          <span className="text-muted-foreground/60 select-none shrink-0">↳</span>
+        )}
+        <ChevronRight
+          className={`size-3 text-muted-foreground/60 shrink-0 transition-transform ${open ? 'rotate-90' : ''}`}
+        />
+        <span className="font-mono text-foreground/90 shrink-0">{run.call.name}</span>
+        {summary && (
+          <span className="font-mono text-muted-foreground truncate min-w-0 flex-1">
+            {summary}
+          </span>
+        )}
+        {!summary && <span className="flex-1" />}
+        <span className="shrink-0 ml-1">
+          {status === 'pending' && (
+            <Loader2 className="size-3 text-muted-foreground animate-spin" aria-label="running" />
+          )}
+          {status === 'success' && (
+            <Check className="size-3 text-emerald-500" aria-label="success" />
+          )}
+          {status === 'error' && (
+            <X className="size-3 text-destructive" aria-label="error" />
+          )}
+        </span>
+      </button>
+      {open && (
+        <div className="ml-5 mt-1 mb-1 space-y-1">
+          <pre className="text-[10px] text-muted-foreground font-mono whitespace-pre-wrap break-all bg-muted/30 rounded px-2 py-1">
+            {JSON.stringify(args, null, 2)}
+          </pre>
+          {run.result && (
+            <pre className="text-[11px] font-mono whitespace-pre-wrap bg-muted/30 rounded px-2 py-1 max-h-72 overflow-y-auto">
+              {run.result.error ? (
+                <span className="text-destructive">{run.result.error}</span>
+              ) : (
+                linkifyOutput(
+                  typeof run.result.output === 'string'
+                    ? run.result.output
+                    : JSON.stringify(run.result.output, null, 2)
+                )
+              )}
+              {run.result.truncated && (
+                <div className="text-muted-foreground/60 mt-1">— output truncated —</div>
+              )}
+            </pre>
+          )}
+        </div>
+      )}
+    </div>
+  );
+}
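`linkifyOutput`, carried over from the deleted ToolCallCard, mixes the path heuristic with JSX. The sketch below splits segmentation from rendering so the heuristic could be shared with MessageBubble and unit-tested. `segmentPaths` and `Segment` are hypothetical; the regex and the `/` check are copied from ToolCallLine.

```typescript
// Same regex as ToolCallLine; the "/" requirement is enforced in the loop.
const PATH_REGEX = /([a-zA-Z0-9._/-]+\.[a-zA-Z0-9]+)/g;

type Segment = { kind: 'text'; text: string } | { kind: 'path'; path: string };

function segmentPaths(text: string): Segment[] {
  const out: Segment[] = [];
  let lastIdx = 0;
  for (const match of text.matchAll(PATH_REGEX)) {
    const matched = match[0];
    const start = match.index ?? 0;
    if (!matched.includes('/')) continue; // bare "foo.ts" stays plain text
    if (start > lastIdx) out.push({ kind: 'text', text: text.slice(lastIdx, start) });
    out.push({ kind: 'path', path: matched });
    lastIdx = start + matched.length;
  }
  if (lastIdx < text.length) out.push({ kind: 'text', text: text.slice(lastIdx) });
  return out;
}

console.log(segmentPaths('match in src/foo.ts:12 and foo.ts'));
```

The skipped `foo.ts` stays inside the trailing text segment, because `lastIdx` only advances past accepted paths.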
--- a/apps/web/src/hooks/sessionEvents.ts
+++ b/apps/web/src/hooks/sessionEvents.ts
@@ -2,7 +2,7 @@
 // across hooks (e.g. AI rename arriving via WS in the session view needs to
 // also refresh the sidebar's session list).

-import type { Chat, Project, Session } from '@/api/types';
+import type { Chat, ErrorReason, Project, Session } from '@/api/types';
 import type { Attachment } from '@/lib/attachments';

 export interface SessionRenamedEvent {
@@ -118,11 +118,14 @@ export interface ProjectUpdatedEvent {
 // v1.8 mobile-tabs: broadcast on user channel from inference.ts so any device
 // subscribed sees a chat working/idle/error. Frontend stores per-chat; panes
 // derive their dot from pane.activeChatId.
+// v1.8.2: optional `reason` carries a machine-readable code when status is
+// 'error'. UI prefers reason for inline error rendering.
 export interface ChatStatusEvent {
  type: 'chat_status';
  chat_id: string;
  status: 'working' | 'idle' | 'error';
  at: string;
+  reason?: ErrorReason;
 }

 export type SessionEvent =
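The new optional `reason` arrives over the user channel as untyped JSON. The diff doesn't show how frames are validated, so this is only a sketch of a defensive parser. It assumes an unknown reason code (say, from a newer server) should be dropped rather than reject the whole event. It also assumes `reason` only matters when `status` is `'error'`. `parseChatStatus` and the helper guards are hypothetical; the reason codes mirror the keys of `ERROR_REASON_LABELS`.

```typescript
type ErrorReason = 'llm_provider_error' | 'tool_execution_failed' | 'summary_after_cap_failed';
const ERROR_REASONS: readonly string[] = ['llm_provider_error', 'tool_execution_failed', 'summary_after_cap_failed'];
const CHAT_STATUSES: readonly string[] = ['working', 'idle', 'error'];

interface ChatStatusEvent {
  type: 'chat_status';
  chat_id: string;
  status: 'working' | 'idle' | 'error';
  at: string;
  reason?: ErrorReason;
}

function isErrorReason(v: unknown): v is ErrorReason {
  return typeof v === 'string' && ERROR_REASONS.includes(v);
}
function isChatStatus(v: unknown): v is ChatStatusEvent['status'] {
  return typeof v === 'string' && CHAT_STATUSES.includes(v);
}

// Returns null for anything that isn't a well-formed chat_status frame. An
// unknown reason code is dropped rather than fatal, and reason is only kept
// when status is 'error'.
function parseChatStatus(raw: unknown): ChatStatusEvent | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const { type, chat_id, status, at, reason } = raw as Record<string, unknown>;
  if (type !== 'chat_status' || typeof chat_id !== 'string' || typeof at !== 'string') return null;
  if (!isChatStatus(status)) return null;
  return {
    type: 'chat_status',
    chat_id,
    status,
    at,
    ...(status === 'error' && isErrorReason(reason) ? { reason } : {}),
  };
}

console.log(parseChatStatus(JSON.parse(
  '{"type":"chat_status","chat_id":"c1","status":"error","at":"2026-05-17T10:31:32Z","reason":"llm_provider_error"}',
)));
```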
--- a/apps/web/src/hooks/useSessionStream.ts
+++ b/apps/web/src/hooks/useSessionStream.ts
@@ -29,7 +29,9 @@ function applyFrame(state: State, frame: WsFrame): State {
        kind: 'message',
        tool_calls: null,
        tool_results: null,
-        status: 'streaming',
+        // v1.8.2: cap-hit sentinels arrive role='system' and are static, so
+        // skipping the streaming dot for them keeps the UI accurate.
+        status: frame.role === 'system' ? 'complete' : 'streaming',
        last_seq: 0,
        tokens_used: null,
        ctx_used: null,
@@ -37,6 +39,7 @@ function applyFrame(state: State, frame: WsFrame): State {
        started_at: null,
        finished_at: null,
        created_at: new Date().toISOString(),
+        metadata: null,
      };
      return { ...state, messages: [...state.messages, newMsg] };
    }
@@ -96,6 +99,7 @@ function applyFrame(state: State, frame: WsFrame): State {
        started_at: null,
        finished_at: null,
        created_at: new Date().toISOString(),
+        metadata: null,
      };
      return { ...state, messages: [...state.messages, newMsg] };
    }
@@ -110,6 +114,10 @@ function applyFrame(state: State, frame: WsFrame): State {
              ...(frame.ctx_max !== undefined ? { ctx_max: frame.ctx_max } : {}),
              ...(frame.started_at !== undefined ? { started_at: frame.started_at } : {}),
              ...(frame.finished_at !== undefined ? { finished_at: frame.finished_at } : {}),
+              // v1.8.2: cap-hit sentinels (and future stamped metadata) ride
+              // in on this terminal frame so the reducer can attach it
+              // without waiting for a refetch.
+              ...(frame.metadata !== undefined ? { metadata: frame.metadata } : {}),
            }
          : m
      );
@@ -133,9 +141,22 @@ function applyFrame(state: State, frame: WsFrame): State {
      return state;
    }
    case 'error': {
+      // v1.8.2: when the frame carries a structured reason, stamp it onto the
+      // failed message's metadata so the bubble renders specifics right away.
+      // The WS error frame is one-shot, so after a reload the bubble relies on
+      // the reason the server persisted on the message instead.
+      const errorMeta = frame.reason
+        ? { kind: 'error' as const, error_reason: frame.reason, error_text: frame.error }
+        : null;
      const next = frame.message_id
        ? state.messages.map((m) =>
-            m.id === frame.message_id ? { ...m, status: 'failed' as const } : m
+            m.id === frame.message_id
+              ? {
+                  ...m,
+                  status: 'failed' as const,
+                  ...(errorMeta ? { metadata: errorMeta } : {}),
+                }
+              : m
          )
        : state.messages;
      return { ...state, messages: next, error: frame.error };
@@ -143,6 +164,11 @@ function applyFrame(state: State, frame: WsFrame): State {
  }
 }

+// Matches useUserEvents — exponential backoff with the same ceiling so the
+// two channels reconnect on the same cadence after a network handoff.
+const RECONNECT_INITIAL_MS = 1000;
+const RECONNECT_MAX_MS = 30_000;
+
 export function useSessionStream(sessionId: string | undefined) {
  const [state, setState] = useState<State>({ messages: [], connected: false, error: null });
  const wsRef = useRef<WebSocket | null>(null);
@@ -152,12 +178,19 @@ export function useSessionStream(sessionId: string | undefined) {

    setState({ messages: [], connected: false, error: null });

+    let unmounted = false;
+    let reconnectTimer: ReturnType<typeof setTimeout> | null = null;
+    let reconnectDelay = RECONNECT_INITIAL_MS;
+
+    const connect = () => {
+      if (unmounted) return;
      const proto = window.location.protocol === 'https:' ? 'wss' : 'ws';
      const url = `${proto}://${window.location.host}/api/ws/sessions/${sessionId}`;
      const ws = new WebSocket(url);
      wsRef.current = ws;

      ws.onopen = () => {
+        reconnectDelay = RECONNECT_INITIAL_MS;
        setState((s) => ({ ...s, connected: true, error: null }));
      };
      ws.onmessage = (ev) => {
@@ -168,16 +201,29 @@ export function useSessionStream(sessionId: string | undefined) {
          console.warn('bad ws frame', err);
        }
      };
+      // v1.8.1: WS errors no longer surface as user-facing toasts here. The
+      // user-channel hook (useUserEvents) owns the debounced "reconnecting…"
+      // UI; this channel just reconnects silently on the same backoff.
      ws.onerror = () => {
-      setState((s) => ({ ...s, error: 'websocket error' }));
+        try { ws.close(); } catch {}
      };
      ws.onclose = () => {
+        if (unmounted) return;
        setState((s) => ({ ...s, connected: false }));
+        const delay = reconnectDelay;
+        reconnectDelay = Math.min(reconnectDelay * 2, RECONNECT_MAX_MS);
+        reconnectTimer = setTimeout(connect, delay);
+      };
    };

+    connect();
+
    return () => {
+      unmounted = true;
+      if (reconnectTimer) clearTimeout(reconnectTimer);
+      const ws = wsRef.current;
      wsRef.current = null;
-      ws.close();
+      if (ws) try { ws.close(); } catch {}
    };
  }, [sessionId]);

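The `error` branch of `applyFrame` can be read in isolation. This sketch works over trimmed-down stand-ins for `Message`, the WS error frame, and the hook state. `Msg`, `ErrorFrame`, `StreamState`, and `applyErrorFrame` are illustrative names. The sketch only covers the live session. Per the commit message, the server persists the same `metadata.kind = 'error'` on the message, and that is what renders after a reload.

```typescript
type ErrorReason = 'llm_provider_error' | 'tool_execution_failed' | 'summary_after_cap_failed';
type MsgMetadata =
  | { kind: 'error'; error_reason: ErrorReason; error_text: string }
  | { kind: 'cap_hit' };

// Trimmed-down stand-ins for Message, the WS error frame, and the hook state.
interface Msg { id: string; status: 'streaming' | 'complete' | 'failed'; metadata: MsgMetadata | null }
interface ErrorFrame { type: 'error'; error: string; message_id?: string; reason?: ErrorReason }
interface StreamState { messages: Msg[]; error: string | null }

function applyErrorFrame(state: StreamState, frame: ErrorFrame): StreamState {
  // Only frames with a structured reason produce metadata; frames without one
  // still mark the message failed but leave its metadata untouched.
  const errorMeta: MsgMetadata | null = frame.reason
    ? { kind: 'error', error_reason: frame.reason, error_text: frame.error }
    : null;
  const messages = frame.message_id
    ? state.messages.map((m): Msg =>
        m.id === frame.message_id
          ? { ...m, status: 'failed', ...(errorMeta ? { metadata: errorMeta } : {}) }
          : m,
      )
    : state.messages;
  return { ...state, messages, error: frame.error };
}

const before: StreamState = {
  messages: [{ id: 'm1', status: 'streaming', metadata: null }],
  error: null,
};
const after = applyErrorFrame(before, {
  type: 'error', error: 'upstream returned 502', message_id: 'm1', reason: 'llm_provider_error',
});
console.log(after.messages[0]);
```

Like the hook's reducer, the function returns new arrays and objects and never mutates the previous state.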
--- a/apps/web/src/hooks/useUserEvents.ts
+++ b/apps/web/src/hooks/useUserEvents.ts
@@ -1,5 +1,6 @@
 import { useEffect } from 'react';
 import { sessionEvents } from './sessionEvents';
+import { createWsReconnectToast } from './wsReconnectToast';

 const RECONNECT_INITIAL_MS = 1000;
 const RECONNECT_MAX_MS = 30000;
@@ -11,6 +12,20 @@ export function useUserEvents(): void {
    let reconnectDelay = RECONNECT_INITIAL_MS;
    let unmounted = false;

+    // v1.8.1: silent at first; gray "reconnecting…" after 3 failures or 15 s,
+    // whichever comes first; red "connection lost" with "Retry now" after 60 s.
+    const reconnectToast = createWsReconnectToast({
+      label: 'Live updates',
+      onRetryNow: () => {
+        if (reconnectTimer) {
+          clearTimeout(reconnectTimer);
+          reconnectTimer = null;
+          reconnectDelay = RECONNECT_INITIAL_MS;
+          connect();
+        }
+      },
+    });
+
    const connect = () => {
      if (unmounted) return;
      const url = new URL('/api/ws/user', window.location.href);
@@ -19,6 +34,7 @@ export function useUserEvents(): void {

      ws.onopen = () => {
        reconnectDelay = RECONNECT_INITIAL_MS;
+        reconnectToast.onConnected();
      };

      ws.onmessage = (ev) => {
@@ -34,6 +50,7 @@ export function useUserEvents(): void {

      ws.onclose = () => {
        if (unmounted) return;
+        reconnectToast.onFailure();
        const delay = reconnectDelay;
        reconnectDelay = Math.min(reconnectDelay * 2, RECONNECT_MAX_MS);
        reconnectTimer = setTimeout(connect, delay);
@@ -50,8 +67,8 @@ export function useUserEvents(): void {

    return () => {
      unmounted = true;
+      reconnectToast.dispose();
      if (reconnectTimer) clearTimeout(reconnectTimer);
-      // best-effort cleanup; ignore failure because the socket may already be closed
      if (ws) try { ws.close(); } catch {}
    };
  }, []);
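The session and user channels share the same backoff (1 s, doubling to a 30 s cap), and wsReconnectToast decides when to escalate. A small simulation shows how the two interact. It assumes every attempt fails instantly, so each close lands exactly when the previous delay expires. It also assumes the gray toast is shown only once, since its branch is cut off in this view. `simulate` and `Timeline` are hypothetical.

```typescript
// Constants copied from the diff: backoff in useUserEvents / useSessionStream,
// thresholds in wsReconnectToast.
const RECONNECT_INITIAL_MS = 1000;
const RECONNECT_MAX_MS = 30_000;
const TOAST_AFTER_FAILS = 3;
const TOAST_AFTER_MS = 15_000;
const PERSISTENT_AFTER_MS = 60_000;

interface Timeline {
  delays: number[]; // backoff chosen after each failure
  gray: { atMs: number; failure: number } | null; // "reconnecting…" toast
  red: { atMs: number; failure: number } | null; // "connection lost" + Retry now
}

// Assumes every attempt fails instantly, so each close lands exactly when the
// previous delay expires.
function simulate(failures: number): Timeline {
  const t: Timeline = { delays: [], gray: null, red: null };
  let now = 0; // ms since the first failure
  let delay = RECONNECT_INITIAL_MS;
  for (let n = 1; n <= failures; n += 1) {
    if (t.red === null && now >= PERSISTENT_AFTER_MS) {
      t.red = { atMs: now, failure: n };
    } else if (t.gray === null && t.red === null && (n >= TOAST_AFTER_FAILS || now >= TOAST_AFTER_MS)) {
      t.gray = { atMs: now, failure: n };
    }
    // Same update as onclose: wait `delay`, then double it up to the cap.
    t.delays.push(delay);
    now += delay;
    delay = Math.min(delay * 2, RECONNECT_MAX_MS);
  }
  return t;
}

console.log(simulate(7));
// gray: { atMs: 3000, failure: 3 }, red: { atMs: 61000, failure: 7 }
```

With instant failures, the 3-failure threshold trips at about 3 s, long before 15 s. The time threshold only matters when attempts hang (for example until a connect timeout) and failures arrive slowly. Real handshakes add latency, so wall-clock times in practice run later than these figures.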
--- a/apps/web/src/hooks/wsReconnectToast.ts
+++ b/apps/web/src/hooks/wsReconnectToast.ts
@@ -0,0 +1,95 @@
+import { toast } from 'sonner';
+
+// v1.8.1 thresholds. First disconnect is silent — mobile Authelia idle timeouts
+// and tab suspensions trip reconnects constantly and the old red "websocket
+// error" toast made the app feel broken. Only escalate once the failure is
+// sustained.
+const TOAST_AFTER_FAILS = 3;
+const TOAST_AFTER_MS = 15_000;
+const PERSISTENT_AFTER_MS = 60_000;
+
+export interface WsReconnectToast {
+  onFailure(): void;
+  onConnected(): void;
+  dispose(): void;
+}
+
+interface Options {
+  label: string;            // shown in the toast (e.g. "Live updates")
+  onRetryNow: () => void;   // user clicked the "Retry now" action
+}
+
+// Per-connection toast wrapper. Caller drives it from the WS lifecycle:
+//   onFailure   — after each failed connection attempt
+//   onConnected — after a successful onopen
+//   dispose     — on hook unmount
+// The wrapper itself runs no timers and does not change the caller's reconnect
+// cadence; it only decides when to show / dismiss the toast.
+export function createWsReconnectToast(opts: Options): WsReconnectToast {
+  let firstFailureAt: number | null = null;
+  let failureCount = 0;
+  let reconnectingId: string | number | null = null;
+  let persistentId: string | number | null = null;
+
+  function dismissReconnecting(): void {
+    if (reconnectingId !== null) {
+      toast.dismiss(reconnectingId);
+      reconnectingId = null;
+    }
+  }
+  function dismissPersistent(): void {
+    if (persistentId !== null) {
+      toast.dismiss(persistentId);
+      persistentId = null;
+    }
+  }
+
+  return {
+    onFailure() {
+      if (firstFailureAt === null) firstFailureAt = Date.now();
+      failureCount += 1;
+      const elapsed = Date.now() - firstFailureAt;
+
+      // Escalate to red error + Retry button after PERSISTENT_AFTER_MS. Replaces
+      // the gray toast if it's still showing.
+      if (persistentId === null && elapsed >= PERSISTENT_AFTER_MS) {
+        dismissReconnecting();
+        persistentId = toast.error(`${opts.label}: connection lost`, {
+          duration: Infinity,
+          action: {
+            label: 'Retry now',
+            onClick: () => {
+              dismissReconnecting();
+              dismissPersistent();
+              opts.onRetryNow();
+            },
+          },
+        });
+        return;
+      }
+
+      // Gray "reconnecting…" toast once we've crossed either threshold.
+      if (
+        reconnectingId === null &&
+        persistentId === null &&
+        (failureCount >= TOAST_AFTER_FAILS || elapsed >= TOAST_AFTER_MS)
+      ) {
+        reconnectingId = toast.warning(`${opts.label}: reconnecting…`, {
+          duration: Infinity,
+        });
+      }
+    },
+    onConnected() {
+      firstFailureAt = null;
+      failureCount = 0;
+      dismissReconnecting();
+      dismissPersistent();
+    },
+    dispose() {
+      firstFailureAt = null;
+      failureCount = 0;
+      dismissReconnecting();
+      dismissPersistent();
+    },
+  };
+}
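Both thresholds only grow with the failure count and the elapsed time, so the toast `onFailure` ends up showing can be written as a pure function of `(failureCount, elapsedMs)`. The sketch below restates that logic with the same constants in a test-friendly form. It is not code from the repository, and `stageAfterFailure` and `replay` are hypothetical names.

```typescript
// Same thresholds as wsReconnectToast.ts.
const TOAST_AFTER_FAILS = 3;
const TOAST_AFTER_MS = 15_000;
const PERSISTENT_AFTER_MS = 60_000;

type Stage = 'silent' | 'reconnecting' | 'persistent';

// Which toast the wrapper shows after the Nth consecutive failure. Checks the
// red (persistent) threshold first, as onFailure() does.
function stageAfterFailure(failureCount: number, elapsedMs: number): Stage {
  if (elapsedMs >= PERSISTENT_AFTER_MS) return 'persistent';
  if (failureCount >= TOAST_AFTER_FAILS || elapsedMs >= TOAST_AFTER_MS) return 'reconnecting';
  return 'silent';
}

// Replays the timestamps (ms) of one unbroken run of failures, i.e. with no
// onConnected() call in between.
function replay(failureTimes: number[]): Stage[] {
  const first = failureTimes[0] ?? 0;
  return failureTimes.map((t, i) => stageAfterFailure(i + 1, t - first));
}

console.log(replay([0, 1_000, 3_000, 7_000, 61_000]).join(', '));
// silent, silent, reconnecting, reconnecting, persistent
```

Within a single run, the stage only moves forward, so this function agrees with the stateful wrapper with one exception. Clicking Retry now dismisses the red toast but leaves `firstFailureAt` set. If that retry also fails, the red toast comes straight back. Only `onConnected()` or `dispose()` starts a fresh run.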
--- a/boocode_roadmap.md
+++ b/boocode_roadmap.md
@@ -323,6 +323,10 @@ Full inventory in `boocode_code_review.md`. Headline items:
 - **codeprysm rejected** — embedding-based; node/edge taxonomy noted as reference if we ever build our own graph.
 - **Batch 9 decoupled from Batch 7 (2026-05-16).** AgentPicker mounts in `ChatInput.tsx` toolbar only. SettingsDrawer agent entry and Header active-agent badge moved to Batch 7. Builtin defaults shipped: six agents (Code Reviewer, Debugger, Refactorer, Architect, Security Auditor, Prompt Builder) with no `model` field — session model wins by default.

+## Follow-ups (post-ship docs / cleanup)
+
+- **After v1.8.2 ships:** Add explicit `max_tool_calls: 30` to all 6 agents in `/data/AGENTS.md` and `/opt/boocode/AGENTS.md`. This is purely for documentation and discoverability. Behavior does not change, because all 6 agents use only read-only tools and the read-only default is already 30.
+
 -----

 ## Workflow
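The follow-up above is documentation-only because of the defaults described in the v1.8.2 commit message (5c61cc7281). An agent gets 30 tool calls when every tool it uses is read-only and 10 when any tool is not. Raw chat with no agent gets 15. An explicit `max_tool_calls` between 1 and 100 in AGENTS.md frontmatter overrides the default. The sketch below restates that rule. `AgentSpec`, the tool names in `READ_ONLY_TOOLS`, and the fallback for out-of-range values are assumptions, not the server's actual types or behavior.

```typescript
// Hypothetical agent shape; the real parsed AGENTS.md entry has more fields.
interface AgentSpec {
  name: string;
  tools: string[];
  max_tool_calls?: number; // optional frontmatter field, valid range 1-100
}

// Hypothetical read-only tool names, for illustration only.
const READ_ONLY_TOOLS = new Set(['read_file', 'list_dir', 'grep']);

const RAW_CHAT_DEFAULT = 15;
const READ_ONLY_DEFAULT = 30;
const NON_READ_ONLY_DEFAULT = 10;

// `agent === null` models raw chat with no agent selected.
function resolveMaxToolCalls(agent: AgentSpec | null): number {
  if (agent === null) return RAW_CHAT_DEFAULT;
  const explicit = agent.max_tool_calls;
  // ASSUMPTION: an out-of-range or non-integer value falls back to the default.
  if (explicit !== undefined && Number.isInteger(explicit) && explicit >= 1 && explicit <= 100) {
    return explicit;
  }
  // Any tool outside the read-only set drops the agent to the lower default.
  return agent.tools.every((t) => READ_ONLY_TOOLS.has(t)) ? READ_ONLY_DEFAULT : NON_READ_ONLY_DEFAULT;
}

const implicit: AgentSpec = { name: 'Code Reviewer', tools: ['read_file', 'grep'] };
const withExplicit: AgentSpec = { ...implicit, max_tool_calls: 30 };
const writer: AgentSpec = { name: 'Hypothetical Writer', tools: ['read_file', 'write_file'] };

console.log(resolveMaxToolCalls(implicit), resolveMaxToolCalls(withExplicit)); // 30 30
console.log(resolveMaxToolCalls(writer), resolveMaxToolCalls(null)); // 10 15
```

The first line of output shows why adding `max_tool_calls: 30` to a read-only agent is a no-op: the explicit and implicit forms resolve to the same cap.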
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -15,6 +15,9 @@ services:
      # Host must `mkdir -p /opt/projects` before container start.
      - /opt/projects:/opt/projects:rw
      - ./secrets/boocode_gitea:/root/.ssh/id_ed25519:ro
+      # v1.8.1: global agents file. Host seeds it once before deploy:
+      # mkdir -p /opt/boocode/data && cp /opt/boocode/AGENTS.md /opt/boocode/data/AGENTS.md
+      - ./data:/data:ro
    depends_on:
      - boocode_db
    networks:
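This read-only mount supplies the global half of the agent merge described in the v1.8.1 commit message (12d91c9a12). Global and per-project AGENTS.md entries are combined, and a project entry replaces a global one with the same name. The sketch below shows only that merge rule and leaves out the per-source mtime cache. The `Agent` shape, exact case-sensitive name matching, and the resulting order (global order kept, project-only agents appended) are all assumptions.

```typescript
// Hypothetical minimal agent shape; only `name` matters for the merge.
interface Agent {
  name: string;
  source: 'global' | 'project';
}

// Project-wins-by-name merge. A Map keeps a key's original insertion position
// when it is overwritten, so an overridden global agent stays in place and
// project-only agents are appended after the global ones.
function mergeAgents(globalAgents: Agent[], projectAgents: Agent[]): Agent[] {
  const byName = new Map<string, Agent>();
  for (const agent of globalAgents) byName.set(agent.name, agent);
  for (const agent of projectAgents) byName.set(agent.name, agent);
  return Array.from(byName.values());
}

const merged = mergeAgents(
  [{ name: 'Code Reviewer', source: 'global' }, { name: 'Debugger', source: 'global' }],
  [{ name: 'Debugger', source: 'project' }, { name: 'Migrator', source: 'project' }],
);
console.log(merged.map((a) => `${a.name} (${a.source})`).join(', '));
// Code Reviewer (global), Debugger (project), Migrator (project)
```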
Author	SHA1	Message	Date
indifferentketchup	5c61cc7281	v1.8.2: tool loop cap-hit summary + tool call UI compaction Old hardcoded MAX_TOOL_LOOP_DEPTH=15 replaced by per-agent max_tool_calls (1-100, AGENTS.md frontmatter) with defaults: 30 for read-only-only agents, 10 for agents that include any non-read-only tool, 15 for raw chat. When the loop hits cap, fire one final summary call with tools disabled, stream the wrap-up into the in-flight assistant message, then insert a system sentinel with metadata.kind='cap_hit'. The sentinel renders an amber bubble with a Continue button (latest sentinel only) that POSTs to a new /api/chats/:id/continue route to extend. Hard ceiling: 3 cap-hits per chat (2 continues max) — third sentinel reports can_continue=false. Error frames carry a machine-readable reason code alongside human error text. Failed messages persist the reason via metadata.kind='error' so the bubble renders specifics on reload (WS error frame is one-shot). Tool call UI rewired: ToolCallLine renders inline (↳ name args spinner/check/✗, expand-on-tap for args+result); ToolCallGroup collapses 3+ consecutive same-tool runs into a compact card. MessageList owns a three-pass pre-render (flatten + fold tool results onto matching runs by id + group same-tool runs + number sentinels). MessageBubble drops tool rendering and adds the sentinel / error-reason branches. ToolCallCard deleted. Roadmap follow-up logged: add explicit max_tool_calls: 30 to the 6 agents in /data/AGENTS.md and /opt/boocode/AGENTS.md post-ship for discoverability (defaults handle behavior identically). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 10:31:32 +00:00
indifferentketchup	5422c47928	gitignore data/ for global AGENTS.md The /data dir is host-mounted into the container at /data:ro and holds the global AGENTS.md seed (v1.8.1). It is part of the deployment contract — anyone cloning needs to mkdir data/ + cp AGENTS.md into it themselves — so the directory itself should never be tracked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:50:47 +00:00
indifferentketchup	b09d0ffde0	Merge v1.8.1	2026-05-16 23:16:38 +00:00
indifferentketchup	12d91c9a12	v1.8.1: global agents + parser robustness + WS reconnect toast Builtins move out of code into /data/AGENTS.md (always-on, mounted ro into the container); per-project AGENTS.md is now an optional override. agents.ts merges global + project entries with project-wins-by-name and caches per-source mtimes (60s TTL). Parser switches to per-block try/catch and returns AgentsResponse { agents, errors[] } so one malformed block no longer fails the file. AgentPicker shows a non-blocking amber chip listing skipped blocks and only fires a gray toast when zero agents loaded. WS reconnect UX (useUserEvents + useSessionStream) now silent on the first disconnect; createWsReconnectToast escalates to gray after 3 failures or 15 s, then to red with a Retry Now action after 60 s. useSessionStream also gained the exponential-backoff reconnect it was missing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:16:02 +00:00
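According to the v1.8.2 commit message (5c61cc7281), `ToolCallGroup` collapses 3+ consecutive calls to the same tool into one compact card, and that grouping is one of `MessageList`'s pre-render passes. The sketch below covers only the grouping pass. The `Item` and `Rendered` shapes and the tool names are hypothetical, and it assumes that any non-tool item (here, a text chunk) ends a streak.

```typescript
// Hypothetical shapes for the flattened message stream.
type TextItem = { kind: 'text'; text: string };
type ToolItem = { kind: 'tool'; id: string; name: string };
type Item = TextItem | ToolItem;
type GroupItem = { kind: 'group'; name: string; runs: ToolItem[] };
type Rendered = Item | GroupItem;

const GROUP_MIN = 3; // "3+ consecutive same-tool runs"

function groupSameToolRuns(items: Item[]): Rendered[] {
  const out: Rendered[] = [];
  let streak: ToolItem[] = [];

  // Emit the pending streak: one group card if long enough, else single lines.
  const flush = (): void => {
    if (streak.length >= GROUP_MIN) {
      out.push({ kind: 'group', name: streak[0].name, runs: streak });
    } else {
      out.push(...streak);
    }
    streak = [];
  };

  for (const item of items) {
    const last = streak[streak.length - 1];
    if (item.kind === 'tool' && last !== undefined && last.name === item.name) {
      streak.push(item); // same tool as the previous call: extend the streak
      continue;
    }
    flush(); // anything else ends the current streak
    if (item.kind === 'tool') streak.push(item);
    else out.push(item);
  }
  flush();
  return out;
}

const items: Item[] = [
  { kind: 'tool', id: 'a', name: 'read_file' },
  { kind: 'tool', id: 'b', name: 'read_file' },
  { kind: 'tool', id: 'c', name: 'read_file' },
  { kind: 'text', text: 'Found it.' },
  { kind: 'tool', id: 'd', name: 'grep' },
  { kind: 'tool', id: 'e', name: 'grep' },
];
const describe = (r: Rendered): string =>
  r.kind === 'group' ? `group(${r.name} x${r.runs.length})` : r.kind === 'tool' ? r.name : 'text';

console.log(groupSameToolRuns(items).map(describe).join(', '));
// group(read_file x3), text, grep, grep
```

The two `grep` calls stay as separate inline lines because a streak of two is below the threshold.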