v1.13.17-cross-repo-reads: on-demand read access to paths outside the project root

When the agent needed context from another repo, pathGuard rejected every read with no recovery path. This batch adds a reactive request_read_access flow: pathGuard's error now hints at the tool, the model emits a structured request, the inference loop pauses (same mechanism as ask_user_input), the user picks Allow/Deny via inline chips, and subsequent reads under the granted root succeed for the rest of the session.

Schema: sessions.allowed_read_paths TEXT[] NOT NULL DEFAULT ARRAY[]::TEXT[] (idempotent ADD COLUMN IF NOT EXISTS).

Grant unit (design D1): nearest registered projects.path ancestor → nearest repo-shaped ancestor (.git/ / package.json / go.mod / Cargo.toml) under PROJECT_ROOT_WHITELIST → else refuse. grant_resolver.ts walks ancestors with a per-iteration whitelist invariant check so symlinked input can't escape the whitelist mid-walk (Sam's checkpoint-1 ask).

Path-guard: optional extraRoots arg threaded from session.allowed_read_paths through executeToolCall to view_file / list_dir / grep / find_files. The ToolDef.execute signature gets an optional third param; non-FS tools ignore it. view_file re-anchors the secret-guard check on basename(real) whenever a relative path starts with "../" so .env / id_rsa* etc. still deny across grant roots.

Endpoint: POST /api/chats/:id/grant_read_access mirrors /answer_user_input. On 'allow' it re-resolves the grant root (state may have changed since prompt; auto-falls to denial reason text on failure, not 500), array_appends to sessions.allowed_read_paths with in-memory dedup, then publishes tool_result + session_updated frames and enqueues the next assistant turn.

PATCH /api/sessions/:id allowed_read_paths supports revocation only. Zod refines absolute + no traversal markers; runtime findUnauthorizedAdditions guard rejects any entry not already present in the row, so a malicious curl -X PATCH -d '{"allowed_read_paths":["/etc"]}' returns 400 instead of bypassing the grant flow (Sam's compliance-review action item).

Frontend: RequestReadAccessCard renders pending (path + reason + Allow/Deny) and answered (granted/denied summary with the resolved root) variants; MessageList.flatten/group special-cases the tool name; SettingsPane adds a per-session grants list with per-row revoke that PATCHes the shortened array.

Tests: 11 grant_resolver, 8 path_guard, 8 sessions PATCH subset, including explicit cases for symlink escape mid-walk, walk-bound termination at whitelist root, /etc bypass attempt via PATCH, and nearest-project disambiguation. 292 total server tests green.

Pairs with v1.13.16-xml-parser: the model now self-recovers from both a wrong tool name AND from a refused path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 21:45:52 +00:00
parent 2e1a81de72
commit b52c5df705
21 changed files with 1610 additions and 41 deletions
--- a/openspec/changes/v1.13.17-cross-repo-reads/proposal.md
+++ b/openspec/changes/v1.13.17-cross-repo-reads/proposal.md
@@ -0,0 +1,185 @@
+# v1.13.17-cross-repo-reads — on-demand read access to another repo (draft, 2026-05-22)
+
+BooChat sessions are scoped to one project root. When the agent needs context from another repo (e.g. `/opt/forks/codecontext` to investigate a dependency), `pathGuard` rejects every read-tool call and the agent has no recovery path.
+
+This batch adds a reactive `ask_user_input`-style flow that the agent triggers on `PathScopeError`. User approves once per session per project root; subsequent reads under that root succeed without further prompting.
+
+## Trigger flow
+
+1. Model emits `view_file("/opt/forks/codecontext/go.mod")` while session is scoped to `/opt/boocode`.
+2. `pathGuard` throws `PathScopeError`. Existing tool wrapper catches it and returns the error to the model. **The error message now ends with a hint:** `"Use request_read_access(path, reason) to ask the user for permission."`
+3. Model self-issues `request_read_access("/opt/forks/codecontext/go.mod", "investigating codecontext fork to write design doc")` on the next turn.
+4. The new tool emits a pending tool-call frame (same pause mechanism as `ask_user_input`); inference loop pauses.
+5. Frontend renders approve/deny chips with the path + reason.
+6. User picks Allow → append the grant root to `session.allowed_read_paths`, resume inference, tool returns `"granted: /opt/forks/codecontext"`. Model retries the original `view_file` on the next turn.
+7. User picks Deny → tool returns `"denied"` without mutating session state; model decides what to do next.
+
+## Decisions (draft — override in dispatch if different)
+
+### D1. Grant unit = nearest registered project root, then nearest repo-shaped ancestor under the path whitelist, then refuse
+
+When user approves access to `/opt/forks/codecontext/go.mod`:
+- If a row in `projects.path` is an ancestor of the requested path → grant the project's root path.
+- Else if `PROJECT_ROOT_WHITELIST` env (default `/opt`) is an ancestor and some directory between the whitelist root and the requested path looks like a repo root (`.git/`, `package.json`, `go.mod`, or `Cargo.toml` present) → grant the nearest such ancestor of the requested path (e.g. `/opt/forks/codecontext`).
+- Else → refuse without prompting. Tool returns `"denied: path outside permitted scope"`. No user prompt fires.
+
+Why: granting the literal path is too narrow (next file in the same repo re-prompts). Granting an arbitrary parent dir over-scopes. The nearest repo-shaped directory is the natural unit.
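
A minimal sketch of the D1 resolution order, under stated assumptions: `resolveGrantRoot`, `isUnder`, and the injected `looksLikeRepo` probe are hypothetical names, the walk is purely lexical (no symlink resolution, which a real resolver would need), and refusing to grant the whitelist root itself is an assumption rather than something this proposal states.

```typescript
import * as path from "node:path";

// Hypothetical probe: true when `dir` holds a repo marker
// (.git/, package.json, go.mod, or Cargo.toml). Injected so the
// walk stays pure and testable without touching the filesystem.
type RepoProbe = (dir: string) => boolean;

// True when `child` equals `parent` or sits below it (lexical check only).
function isUnder(parent: string, child: string): boolean {
  const rel = path.posix.relative(parent, child);
  return rel === "" || (rel !== ".." && !rel.startsWith("../"));
}

export function resolveGrantRoot(
  requested: string,
  projectRoots: string[],
  whitelistRoot: string,
  looksLikeRepo: RepoProbe,
): string | null {
  const target = path.posix.normalize(requested);
  if (!path.posix.isAbsolute(target)) return null;

  // Rule 1: nearest (longest) registered project root that is an ancestor.
  const owners = projectRoots
    .filter((root) => isUnder(root, target))
    .sort((a, b) => b.length - a.length);
  const nearest = owners[0];
  if (nearest !== undefined) return nearest;

  // Rule 2: walk upward while strictly inside the whitelist and return the
  // first repo-shaped directory. Assumption: the whitelist root is never granted.
  let dir = path.posix.dirname(target);
  while (dir !== whitelistRoot && isUnder(whitelistRoot, dir)) {
    if (looksLikeRepo(dir)) return dir;
    dir = path.posix.dirname(dir);
  }

  // Rule 3: nothing matched, so refuse without prompting.
  return null;
}
```

With `/opt/boocode` registered and a probe that reports `/opt/forks/codecontext` as repo-shaped, a request for `/opt/forks/codecontext/go.mod` resolves to `/opt/forks/codecontext`, while `/etc/passwd` resolves to `null`.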
+
+### D2. Persistence = per-session, no expiry
+
+`sessions.allowed_read_paths` is the source of truth. Grants stick until the session is archived. A new session in the same project re-prompts on the first cross-repo read.
+
+Why: per-chat is too granular for the typical workflow (Sam investigates the same fork across multiple chats in one investigation session). Per-project is too broad (different sessions in the same project might have different scope needs). Per-session is the natural unit and matches `session.web_search_enabled`'s scope.
+
+### D3. Secret-file deny list applies across all grant roots
+
+`is_secret_path` in `secret_guard.ts` filters filenames (`.env`, `*.pem`, `credentials.json`, etc.) regardless of which root they're under. The check is post-`pathGuard`, so it already runs on the resolved path. No change needed.
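
To illustrate why D3 needs no per-root handling, here is a hedged sketch of a filename-based deny check. `looksSecret` and the pattern list are hypothetical stand-ins, not the contents of `secret_guard.ts`; the point is only that a check keyed on the resolved path's basename gives the same answer under any grant root.

```typescript
import * as path from "node:path";

// Illustrative patterns only; the real deny list lives in secret_guard.ts
// and may differ.
const SECRET_PATTERNS: RegExp[] = [
  /^\.env$/,
  /\.pem$/,
  /^credentials\.json$/,
  /^id_rsa/,
];

// Decides on the basename of an already-resolved path, so the directory
// (and therefore the grant root) never influences the result.
export function looksSecret(resolvedPath: string): boolean {
  const name = path.posix.basename(resolvedPath);
  return SECRET_PATTERNS.some((pattern) => pattern.test(name));
}
```

Under these assumed patterns, `/opt/forks/some-repo/.env` is refused and `/opt/forks/some-repo/go.mod` is not, regardless of which root was granted.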
+
+### D4. Revocation UI = chat-settings panel + automatic clear on archive
+
+- Settings panel under the session-info popover: lists current `allowed_read_paths` with a per-row delete button.
+- Session archive deletes the row (no need to clear allowed_read_paths separately — the row goes).
+- No expiry timer.
+
+Optional v1.13.18 follow-up if Sam wants it: a `/clear_grants` slash command for power users. Out of scope for v1.13.17.
+
+## Schema
+
+```sql
+-- v1.13.17: session-scoped cross-repo read grants. Populated via the
+-- request_read_access tool's approve path; never written by other code.
+ALTER TABLE sessions
+  ADD COLUMN IF NOT EXISTS allowed_read_paths text[] NOT NULL DEFAULT ARRAY[]::text[];
+```
+
+No CHECK constraint — values are absolute paths validated at write time against the projects table + whitelist heuristic.
+
+## New tool: `request_read_access`
+
+```ts
+// apps/server/src/services/request_read_access.ts (new)
+
+export const requestReadAccessInput = z.object({
+  path: z.string().min(1),
+  reason: z.string().min(1).max(500),
+});
+
+export const requestReadAccess: ToolDef<...> = {
+  name: 'request_read_access',
+  description:
+    'Ask the user for read-only access to a path outside the current ' +
+    'session\'s project scope. Use when pathGuard rejected a read ' +
+    'attempt and the path is plausibly under another known repo. ' +
+    'Returns "granted: <root>" or "denied".',
+  inputSchema: requestReadAccessInput,
+  jsonSchema: { ... },
+  category: 'read_only',
+  async execute(input, projectRoot) {
+    // Validate path: must be absolute, must be under PROJECT_ROOT_WHITELIST
+    // (default /opt), must NOT already be under the session's primary
+    // projectRoot (silly to ask for what's already in scope).
+    // Validation failures return sentinel without prompting the user.
+
+    // Emit pending-grant tool result (parallel of ask_user_input's pause
+    // sentinel). Inference loop pauses on this kind=pending_grant marker.
+    // User picks Allow/Deny via a new POST /api/messages/:id/grant endpoint.
+    // On Allow: derive grant root per D1 + UPDATE sessions SET
+    //   allowed_read_paths = array_append(allowed_read_paths, <root>);
+    //   resume inference; tool returns "granted: <root>".
+    // On Deny: resume immediately; tool returns "denied".
+  },
+};
+```
+
+Registered in `ALL_TOOLS` + `READ_ONLY_TOOL_NAMES`. Available to all agents by default (no agent's `tools` whitelist needs to be updated to grant access — the tool registry's filter is per-agent).
+
+## `pathGuard` extension
+
+```ts
+// apps/server/src/services/path_guard.ts — current signature:
+//   pathGuard(projectRoot, requestedPath): Promise<string>
+//
+// Extended:
+//   pathGuard(projectRoot, requestedPath, extraRoots?: string[]): Promise<string>
+//
+// Tries primary projectRoot first; on PathScopeError, walks extraRoots and
+// accepts the first root whose tree contains the resolved requestedPath.
+// Throws PathScopeError if no root accepts.
+```
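
The extended signature can be sketched as follows. This is a simplified, synchronous, lexical version under stated assumptions: `guardPath` and `isUnder` are hypothetical names, the `PathScopeError` class here is a stand-in for the real one, and symlink resolution (which the real `pathGuard` presumably performs, given its `Promise` return type) is left out.

```typescript
import * as path from "node:path";

// Stand-in for the real error type; only the name comes from the proposal.
export class PathScopeError extends Error {}

// True when `candidate` equals `root` or sits below it (lexical check only).
function isUnder(root: string, candidate: string): boolean {
  const rel = path.posix.relative(root, candidate);
  return rel === "" || (rel !== ".." && !rel.startsWith("../"));
}

export function guardPath(
  projectRoot: string,
  requestedPath: string,
  extraRoots: string[] = [],
): string {
  // Relative paths are anchored at the primary project root.
  const resolved = path.posix.resolve(projectRoot, requestedPath);

  // Primary root first, then each granted root in order.
  for (const root of [projectRoot, ...extraRoots]) {
    if (isUnder(root, resolved)) return resolved;
  }

  // No root accepted the path: fail with the recovery hint for the model.
  throw new PathScopeError(
    `${requestedPath} is outside the permitted roots. ` +
      "Use request_read_access(path, reason) to ask the user for permission.",
  );
}
```

Calling `guardPath("/opt/boocode", "/opt/forks/codecontext/go.mod")` throws, while the same call with `["/opt/forks/codecontext"]` as the third argument returns the path unchanged.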
+
+Every tool that calls `pathGuard` (currently `view_file`, `list_dir`, `grep`, `find_files`, `view_truncated_output`) threads `session.allowed_read_paths` through `executeToolCall`. The `Session` interface already flows through `TurnArgs`; tool-phase just needs to forward `session.allowed_read_paths` as the third arg.
+
+## Pause/resume infrastructure reuse
+
+The pending-grant pause uses the **same mechanism as `ask_user_input`**:
+- Tool insert with `payload.output = null` + `payload.kind = 'pending_grant'`.
+- `pausingForUserInput` branch in `tool-phase.ts` is widened to also catch pending grants.
+- `chat_status` flips to `waiting_for_input` per the v1.12.1 5-state model.
+
+New endpoint `POST /api/messages/:tool_msg_id/grant` (parallel of the existing `/answer`):
+- Body: `{ decision: 'allow' | 'deny' }`.
+- Resolves grant root per D1 if Allow. UPDATEs `sessions.allowed_read_paths`. UPDATEs tool message with output. Resumes inference via existing enqueue path.
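
The state change behind the endpoint can be sketched as a pure function. `applyGrantDecision` and `GrantOutcome` are hypothetical names, and the database UPDATE and inference resume are deliberately left out; the sketch only shows how a decision plus a resolved grant root (or `null` when resolution fails) could map to the next `allowed_read_paths` value and the tool output strings used in this proposal.

```typescript
type Decision = "allow" | "deny";

interface GrantOutcome {
  allowedReadPaths: string[]; // next value for sessions.allowed_read_paths
  toolOutput: string; // text handed back to the model
}

export function applyGrantDecision(
  current: readonly string[],
  decision: Decision,
  grantRoot: string | null,
): GrantOutcome {
  // Deny never mutates session state.
  if (decision === "deny") {
    return { allowedReadPaths: [...current], toolOutput: "denied" };
  }

  // Allow, but the grant root could not be resolved: treat as a refusal.
  if (grantRoot === null) {
    return {
      allowedReadPaths: [...current],
      toolOutput: "denied: path outside permitted scope",
    };
  }

  // Allow: append the root unless it is already present.
  const next = current.includes(grantRoot)
    ? [...current]
    : [...current, grantRoot];
  return { allowedReadPaths: next, toolOutput: `granted: ${grantRoot}` };
}
```

Keeping this step pure would let the endpoint handler stay thin: resolve the root, call the function, persist `allowedReadPaths`, and write `toolOutput` to the tool message.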
+
+## Frontend changes (in scope; small)
+
+- `MessageBubble.tsx`: render `pending_grant` tool messages with Allow/Deny chips + the path + reason text. Wires to `api.messages.grant(toolMsgId, decision)`.
+- New API client method `api.messages.grant`.
+- Settings popover: `allowed_read_paths` list with per-row delete (calls `PATCH /api/sessions/:id` with the modified array).
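
Because the settings popover only ever sends a shortened array, the server can treat any new entry in a PATCH as an attempt to bypass the grant flow. The sketch below shows one way to detect that; the function name matches the guard mentioned in the commit message, but its signature and body here are assumptions.

```typescript
// Returns every proposed entry that is not already stored on the session row.
// An empty result means the PATCH only removes (or keeps) existing grants.
export function findUnauthorizedAdditions(
  stored: readonly string[],
  proposed: readonly string[],
): string[] {
  const known = new Set(stored);
  return proposed.filter((entry) => !known.has(entry));
}
```

A handler could reject the request with a 400 whenever the returned list is non-empty, for example when a session holding `/opt/forks/codecontext` receives a PATCH containing `/etc`.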
+
+## Hard rules
+
+- No git commit, no git push, no git pull during dispatch. Sam commits manually.
+- Backup every file before edit per the standard convention.
+- TS strict, no `any`.
+- No new deps.
+- Schema migration is **additive only** (ADD COLUMN IF NOT EXISTS), idempotent on re-run.
+- Tool is **read-only** — no path under `allowed_read_paths` can ever be written by BooChat (no write tools registered today; this is a structural guarantee).
+- Secret-file deny list still runs unconditionally on resolved paths.
+
+## Stop checkpoints
+
+1. After recon (read existing path_guard + ask_user_input + answer endpoint patterns): stop, hand back the recon report.
+2. After code edits, before schema migration applies: stop, hand back the diff.
+3. After schema migration applies in dev: stop, run smoke plan, report.
+
+## Smoke plan
+
+1. **Approve flow.** Send a chat in a `/opt/boocode` session asking the agent to investigate `/opt/forks/codecontext/go.mod`. Confirm:
+   - `pathGuard` throws on the first attempt; tool result includes the `request_read_access` hint.
+   - Agent calls `request_read_access`; tool-call frame lands; chat status flips to `waiting_for_input`.
+   - Frontend renders Allow/Deny chips with the path + reason.
+   - Pick Allow → grant root resolves to `/opt/forks/codecontext` (per D1); `sessions.allowed_read_paths` shows the entry; agent retries `view_file` successfully on the next turn.
+2. **Deny flow.** Same setup; pick Deny. Confirm session state unchanged, tool returns `"denied"`, agent gives up or asks differently.
+3. **Persistence.** In the same session, a second `view_file` against a different file under `/opt/forks/codecontext/` succeeds without re-prompting.
+4. **Cross-session isolation.** Open a fresh session in the boocode project, try the same path — re-prompts (allowed_read_paths is empty on the new session).
+5. **Secret-file deny still fires.** Approve access to a repo that contains a `.env` file. Try `view_file('/opt/forks/some-repo/.env')`. Confirm refused via `is_secret_path`, not via pathGuard scope.
+6. **Out-of-scope refusal.** Try `request_read_access('/etc/passwd', 'system file')`. Tool validates against the whitelist + repo-shape heuristic, returns `"denied: path outside permitted scope"` without prompting the user.
+
+## Done when
+
+- New `request_read_access` tool + `POST /api/messages/:id/grant` endpoint shipped.
+- `path_guard.ts` extended; all read tools forward `allowed_read_paths`.
+- `MessageBubble.tsx` renders pending-grant bubbles; settings popover lists + clears grants.
+- Schema migration applied (sessions.allowed_read_paths).
+- Smoke plan green.
+- v1.13.17-cross-repo-reads tag + CHANGELOG entry + roadmap retrospective bullet.
+
+## Files expected to touch
+
+- `apps/server/src/schema.sql` — new column
+- `apps/server/src/services/request_read_access.ts` — NEW
+- `apps/server/src/services/path_guard.ts` — extra-roots param + helpful PathScopeError message
+- `apps/server/src/services/tools.ts` — register the new tool, update view_file / list_dir / grep / find_files / view_truncated_output to thread allowed_read_paths
+- `apps/server/src/services/inference/tool-phase.ts` — pause-on-pending-grant branch (alongside ask_user_input)
+- `apps/server/src/routes/messages.ts` — new `/grant` endpoint
+- `apps/server/src/types/api.ts` — `Session.allowed_read_paths`
+- `apps/web/src/api/client.ts` — `api.messages.grant`
+- `apps/web/src/api/types.ts` — `Session.allowed_read_paths`
+- `apps/web/src/components/MessageBubble.tsx` — render pending_grant chips
+- `apps/web/src/components/` — settings-popover grants list (file TBD during impl)
+
+Estimate: ~120 LoC across backend + frontend + schema. Single batch.
+
+## Open questions for dispatch
+
+The four design decisions above are my recommendations. Override any of them in the dispatch and I'll update the proposal before recon. The most likely candidates for override are **D1** (grant unit: you may want exact-path-only for tighter scoping, accepting the re-prompt cost) and **D4** (revocation UI: you may want it deferred entirely).