v1.13.17-cross-repo-reads: on-demand read access to paths outside the project root

When the agent needed context from another repo, pathGuard rejected every read
with no recovery path. This batch adds a reactive request_read_access flow:
pathGuard's error now hints at the tool, the model emits a structured request,
the inference loop pauses (same mechanism as ask_user_input), the user picks
Allow/Deny via inline chips, and subsequent reads under the granted root succeed
for the rest of the session.

Schema: sessions.allowed_read_paths TEXT[] NOT NULL DEFAULT ARRAY[]::TEXT[]
(idempotent ADD COLUMN IF NOT EXISTS).

Grant unit (design D1): nearest registered projects.path ancestor →
nearest repo-shaped ancestor (.git/ / package.json / go.mod / Cargo.toml)
under PROJECT_ROOT_WHITELIST → else refuse. grant_resolver.ts walks
ancestors with a per-iteration whitelist invariant check so symlinked
input can't escape the whitelist mid-walk (Sam's checkpoint-1 ask).

Path-guard: optional extraRoots arg threaded from session.allowed_read_paths
through executeToolCall to view_file / list_dir / grep / find_files. The
ToolDef.execute signature gets an optional third param; non-FS tools
ignore it. view_file re-anchors the secret-guard check on basename(real)
whenever a relative path starts with "../" so .env / id_rsa* etc. still
deny across grant roots.

Endpoint: POST /api/chats/:id/grant_read_access mirrors /answer_user_input.
On 'allow' it re-resolves the grant root (state may have changed since
prompt — auto-falls to denial reason text on failure, not 500), array_appends
to sessions.allowed_read_paths with in-memory dedup, then publishes
tool_result + session_updated frames and enqueues the next assistant turn.

PATCH /api/sessions/:id allowed_read_paths supports revocation only. Zod
refines absolute + no traversal markers; runtime findUnauthorizedAdditions
guard rejects any entry not already present in the row, so a malicious
curl -X PATCH -d '{"allowed_read_paths":["/etc"]}' returns 400 instead of
bypassing the grant flow (Sam's compliance-review action item).

Frontend: RequestReadAccessCard renders pending (path + reason + Allow/Deny)
and answered (granted/denied summary with the resolved root) variants;
MessageList.flatten/group special-cases the tool name; SettingsPane adds a
per-session grants list with per-row revoke that PATCHes the shortened
array.

Tests: 11 grant_resolver, 8 path_guard, 8 sessions PATCH subset, including
explicit cases for symlink escape mid-walk, walk-bound termination at
whitelist root, /etc bypass attempt via PATCH, and nearest-project
disambiguation. 292 total server tests green.

Pairs with v1.13.16-xml-parser — the model now self-recovers from both
a wrong tool name AND from a refused path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 21:45:52 +00:00
parent 2e1a81de72
commit b52c5df705
21 changed files with 1610 additions and 41 deletions

@@ -0,0 +1,185 @@
# v1.13.17-cross-repo-reads — on-demand read access to another repo (draft, 2026-05-22)
BooChat sessions are scoped to one project root. When the agent needs context from another repo (e.g. `/opt/forks/codecontext` to investigate a dependency), `pathGuard` rejects every read tool and the agent has no recovery path.
This batch adds a reactive `ask_user_input`-style flow that the agent triggers on `PathScopeError`. User approves once per session per project root; subsequent reads under that root succeed without further prompting.
## Trigger flow
1. Model emits `view_file("/opt/forks/codecontext/go.mod")` while session is scoped to `/opt/boocode`.
2. `pathGuard` throws `PathScopeError`. Existing tool wrapper catches it and returns the error to the model. **The error message now ends with a hint:** `"Use request_read_access(path, reason) to ask the user for permission."`
3. Model self-issues `request_read_access("/opt/forks/codecontext/go.mod", "investigating codecontext fork to write design doc")` on the next turn.
4. The new tool emits a pending tool-call frame (same pause mechanism as `ask_user_input`); inference loop pauses.
5. Frontend renders approve/deny chips with the path + reason.
6. User picks Allow → append the grant root to `session.allowed_read_paths`, resume inference, tool returns `"granted: /opt/forks/codecontext"`. Model retries the original `view_file` on the next turn.
7. User picks Deny → tool returns `"denied"` without mutating session state; model decides what to do next.
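Step 2 can be sketched as a small wrapper that appends the hint only to scope errors. This is a minimal illustration, not the actual tool wrapper: `makePathScopeError`, `toToolErrorText`, and the base message wording are hypothetical. Only the hint string is taken from the flow above.

```ts
// Hint text quoted from step 2 of the trigger flow.
const READ_ACCESS_HINT =
  "Use request_read_access(path, reason) to ask the user for permission.";

// Hypothetical stand-in for the real PathScopeError in path_guard.ts.
// A plain Error tagged by name keeps the sketch independent of class details.
function makePathScopeError(requestedPath: string, projectRoot: string): Error {
  const err = new Error(
    "path " + requestedPath + " is outside project root " + projectRoot
  );
  err.name = "PathScopeError";
  return err;
}

// Hypothetical wrapper step: turn a thrown value into the text the model sees.
// Only scope errors get the hint; every other error passes through unchanged.
function toToolErrorText(err: unknown): string {
  if (err instanceof Error && err.name === "PathScopeError") {
    return err.message + ". " + READ_ACCESS_HINT;
  }
  return err instanceof Error ? err.message : String(err);
}
```

Keeping the hint in one constant means the wording the model is expected to react to lives in a single place.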
## Decisions (draft — override in dispatch if different)
### D1. Grant unit = nearest registered project root, then nearest path-whitelist ancestor, then refuse
When user approves access to `/opt/forks/codecontext/go.mod`:
- If a row in `projects.path` is an ancestor of the requested path → grant the project's root path.
- Else if `PROJECT_ROOT_WHITELIST` env (default `/opt`) is an ancestor and some ancestor directory of the requested path below the whitelist root looks like a repo root (`.git/`, `package.json`, `go.mod`, or `Cargo.toml` present) → grant the nearest such directory (e.g. `/opt/forks/codecontext`).
- Else → refuse without prompting. Tool returns `"denied: path outside permitted scope"`. No user prompt fires.
Why: granting the literal path is too narrow (next file in the same repo re-prompts). Granting an arbitrary parent dir over-scopes. The nearest repo-shaped directory is the natural unit.
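The D1 resolution order can be sketched as a pure function. This is a lexical sketch under stated assumptions, not the shipped `grant_resolver.ts`: it assumes normalized absolute POSIX paths without trailing slashes and a whitelist root other than `/`, it takes the repo-shape test as an injected callback (`looksLikeRepo`, hypothetical) instead of touching the filesystem, and it does no symlink resolution. The name `resolveGrantRootSketch` is illustrative.

```ts
// True when `p` equals `root` or lies below it (segment-aware prefix check).
function isAncestorOrSelf(root: string, p: string): boolean {
  return p === root || p.slice(0, root.length + 1) === root + "/";
}

// Parent directory of an absolute path; "/" is its own parent.
function parentDirOf(p: string): string {
  const i = p.lastIndexOf("/");
  return i <= 0 ? "/" : p.slice(0, i);
}

function resolveGrantRootSketch(
  requested: string,
  projectPaths: string[],
  whitelistRoot: string,
  looksLikeRepo: (dir: string) => boolean
): string | null {
  // 1. Nearest registered project ancestor: the longest matching path wins.
  let best: string | null = null;
  for (const proj of projectPaths) {
    if (
      isAncestorOrSelf(proj, requested) &&
      (best === null || proj.length > best.length)
    ) {
      best = proj;
    }
  }
  if (best !== null) return best;

  // 2. Walk upward while strictly below the whitelist root, re-checking the
  //    whitelist bound on every iteration. The whitelist root is never granted.
  if (!isAncestorOrSelf(whitelistRoot, requested)) return null;
  let dir = requested;
  while (dir !== whitelistRoot && isAncestorOrSelf(whitelistRoot, dir)) {
    if (looksLikeRepo(dir)) return dir;
    dir = parentDirOf(dir);
  }

  // 3. Nothing suitable: refuse without prompting.
  return null;
}
```

Injecting `looksLikeRepo` keeps the walk testable without a filesystem; a real resolver would presumably check for `.git/`, `package.json`, `go.mod`, or `Cargo.toml` on disk.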
### D2. Persistence = per-session, no expiry
`sessions.allowed_read_paths` is the source of truth. Grants stick until the session is archived. A new session in the same project re-prompts on the first cross-repo read.
Why: per-chat is too granular for the typical workflow (Sam investigates the same fork across multiple chats in one investigation session). Per-project is too broad (different sessions in the same project might have different scope needs). Per-session is the natural unit and matches `session.web_search_enabled`'s scope.
### D3. Secret-file deny list applies across all grant roots
`is_secret_path` in `secret_guard.ts` filters filenames (`.env`, `*.pem`, `credentials.json`, etc.) regardless of which root they're under. The check is post-`pathGuard`, so it already runs on the resolved path. No change needed.
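A basename-based deny check of this kind might look like the sketch below. It is not the contents of `secret_guard.ts`: the function name `isSecretPathSketch` and the two lists are hypothetical and include only the patterns named above (`.env`, `*.pem`, `credentials.json`); the real list is longer ("etc.").

```ts
// Illustrative subset of the deny list; the real list lives in secret_guard.ts.
const SECRET_BASENAMES = [".env", "credentials.json"];
const SECRET_SUFFIXES = [".pem"];

// Last path segment of a POSIX path.
function baseNameOf(p: string): string {
  return p.slice(p.lastIndexOf("/") + 1);
}

// Decides on the file name alone, so the answer is the same under the
// primary project root and under any granted root.
function isSecretPathSketch(resolvedPath: string): boolean {
  const base = baseNameOf(resolvedPath);
  if (SECRET_BASENAMES.indexOf(base) !== -1) return true;
  return SECRET_SUFFIXES.some(
    (suffix) =>
      base.length > suffix.length && base.slice(-suffix.length) === suffix
  );
}
```

Because the check ignores the directory part entirely, adding a grant root cannot widen what it allows.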
### D4. Revocation UI = chat-settings panel + automatic clear on archive
- Settings panel under the session-info popover: lists current `allowed_read_paths` with a per-row delete button.
- Session archive deletes the row (no need to clear allowed_read_paths separately — the row goes).
- No expiry timer.
Optional v1.13.18 follow-up if Sam wants it: a `/clear_grants` slash command for power users. Out of scope for v1.13.17.
## Schema
```sql
-- v1.13.17: session-scoped cross-repo read grants. Populated via the
-- request_read_access tool's approve path; never written by other code.
ALTER TABLE sessions
ADD COLUMN IF NOT EXISTS allowed_read_paths text[] NOT NULL DEFAULT ARRAY[]::text[];
```
No CHECK constraint — values are absolute paths validated at write time against the projects table + whitelist heuristic.
## New tool: `request_read_access`
```ts
// apps/server/src/services/request_read_access.ts (new)
export const requestReadAccessInput = z.object({
  path: z.string().min(1),
  reason: z.string().min(1).max(500),
});

export const requestReadAccess: ToolDef<...> = {
  name: 'request_read_access',
  description:
    'Ask the user for read-only access to a path outside the current ' +
    'session\'s project scope. Use when pathGuard rejected a read ' +
    'attempt and the path is plausibly under another known repo. ' +
    'Returns "granted: <root>" or "denied".',
  inputSchema: requestReadAccessInput,
  jsonSchema: { ... },
  category: 'read_only',
  async execute(input, projectRoot) {
    // Validate path: must be absolute, must be under PROJECT_ROOT_WHITELIST
    // (default /opt), must NOT already be under the session's primary
    // projectRoot (silly to ask for what's already in scope).
    // Validation failures return sentinel without prompting the user.
    //
    // Emit pending-grant tool result (parallel of ask_user_input's pause
    // sentinel). Inference loop pauses on this kind=pending_grant marker.
    // User picks Allow/Deny via a new POST /api/messages/:id/grant endpoint.
    //
    // On Allow: derive grant root per D1 + UPDATE sessions SET
    //   allowed_read_paths = array_append(allowed_read_paths, <root>);
    //   resume inference; tool returns "granted: <root>".
    // On Deny: resume immediately; tool returns "denied".
  },
};
```
Registered in `ALL_TOOLS` + `READ_ONLY_TOOL_NAMES`. Available to all agents by default (no agent's `tools` whitelist needs to be updated to grant access — the tool registry's filter is per-agent).
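The validation step described in the `execute` comments could be sketched as follows. This is an assumption-laden illustration, not the shipped code: `checkReadAccessRequest` and `isUnderRoot` are hypothetical names, the already-in-scope sentinel wording is invented for the example, and the traversal rejection is an extra precaution the comments do not spell out. Only `"denied: path outside permitted scope"` comes from the design.

```ts
// Sentinel quoted from D1 and the smoke plan.
const OUT_OF_SCOPE = "denied: path outside permitted scope";
// Hypothetical sentinel; the design does not specify this wording.
const ALREADY_IN_SCOPE = "denied: path already in session scope";

type RequestCheck =
  | { kind: "prompt" } // safe to show Allow/Deny to the user
  | { kind: "rejected"; result: string }; // return sentinel, no prompt

function isUnderRoot(root: string, p: string): boolean {
  return p === root || p.slice(0, root.length + 1) === root + "/";
}

function checkReadAccessRequest(
  requestedPath: string,
  projectRoot: string,
  whitelistRoot: string
): RequestCheck {
  const isAbsolute = requestedPath.charAt(0) === "/";
  // Reject ".." segments so a lexical prefix check cannot be fooled.
  const hasTraversal = requestedPath.split("/").indexOf("..") !== -1;
  if (!isAbsolute || hasTraversal || !isUnderRoot(whitelistRoot, requestedPath)) {
    return { kind: "rejected", result: OUT_OF_SCOPE };
  }
  if (isUnderRoot(projectRoot, requestedPath)) {
    return { kind: "rejected", result: ALREADY_IN_SCOPE };
  }
  return { kind: "prompt" };
}
```

Returning a tagged result instead of throwing lets the caller hand the sentinel straight back to the model without involving the pause machinery.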
## `pathGuard` extension
```ts
// apps/server/src/services/path_guard.ts — current signature:
// pathGuard(projectRoot, requestedPath): Promise<string>
//
// Extended:
// pathGuard(projectRoot, requestedPath, extraRoots?: string[]): Promise<string>
//
// Tries primary projectRoot first; on PathScopeError, walks extraRoots and
// returns the first one that resolves the requestedPath inside its tree.
// Throws PathScopeError if no root accepts.
```
Every tool that calls `pathGuard` (currently `view_file`, `list_dir`, `grep`, `find_files`, `view_truncated_output`) threads `session.allowed_read_paths` through `executeToolCall`. The `Session` interface already flows through `TurnArgs`; tool-phase just needs to forward `session.allowed_read_paths` as the third arg.
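The extended lookup order can be illustrated with a synchronous, purely lexical sketch. It differs from the real `pathGuard` in ways that matter: the real function is async and presumably resolves symlinks, while this one only normalizes `.` and `..` segments. Resolving relative paths against the primary `projectRoot` is an assumption, and `pathGuardSketch`, `normalizeAbs`, and `isInsideRoot` are hypothetical names.

```ts
// Collapse "." and ".." segments of an absolute POSIX path.
function normalizeAbs(absPath: string): string {
  const out: string[] = [];
  for (const seg of absPath.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      out.pop();
      continue;
    }
    out.push(seg);
  }
  return "/" + out.join("/");
}

function isInsideRoot(root: string, p: string): boolean {
  return p === root || p.slice(0, root.length + 1) === root + "/";
}

function pathGuardSketch(
  projectRoot: string,
  requestedPath: string,
  extraRoots: string[] = []
): string {
  // Assumption: relative paths are anchored at the primary project root.
  const absolute =
    requestedPath.charAt(0) === "/"
      ? requestedPath
      : projectRoot + "/" + requestedPath;
  const candidate = normalizeAbs(absolute);

  // Primary root first, then each granted root in order.
  const roots = [projectRoot].concat(extraRoots);
  for (const root of roots) {
    if (isInsideRoot(root, candidate)) return candidate;
  }

  const err = new Error("path " + candidate + " is outside every permitted root");
  err.name = "PathScopeError";
  throw err;
}
```

With no `extraRoots` the sketch behaves like the two-argument form, which is what makes the third parameter safe to add as optional.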
## Pause/resume infrastructure reuse
The pending-grant pause uses the **same mechanism as `ask_user_input`**:
- Tool insert with `payload.output = null` + `payload.kind = 'pending_grant'`.
- `pausingForUserInput` branch in `tool-phase.ts` is widened to also catch pending grants.
- `chat_status` flips to `waiting_for_input` per the v1.12.1 5-state model.
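The widened pause check might reduce to a predicate like the one below. The payload shape and the function name `isPausingForUser` are hypothetical, and `'pending_user_input'` is an assumed marker for the existing `ask_user_input` pause; only `output = null` and `kind = 'pending_grant'` come from the bullets above.

```ts
// Hypothetical subset of a tool message payload.
interface ToolMessagePayload {
  kind?: string;
  output: string | null;
}

// 'pending_user_input' is an assumed name for the ask_user_input marker.
const PAUSING_KINDS = ["pending_user_input", "pending_grant"];

// A tool message pauses inference while it has no output yet and carries
// one of the pause markers.
function isPausingForUser(payload: ToolMessagePayload): boolean {
  return (
    payload.output === null &&
    payload.kind !== undefined &&
    PAUSING_KINDS.indexOf(payload.kind) !== -1
  );
}
```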
New endpoint `POST /api/messages/:tool_msg_id/grant` (parallel of the existing `/answer`):
- Body: `{ decision: 'allow' | 'deny' }`.
- Resolves grant root per D1 if Allow. UPDATEs `sessions.allowed_read_paths`. UPDATEs tool message with output. Resumes inference via existing enqueue path.
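The state change behind the endpoint can be sketched as two pure functions: one that validates the body and one that computes the new grant list and the tool output. The names `parseDecision` and `applyDecision` are hypothetical. Deduplicating before append and falling back to a denial when the root no longer resolves follow the commit summary; database access and the inference re-enqueue are left out.

```ts
type Decision = "allow" | "deny";

interface GrantOutcome {
  allowedReadPaths: string[];
  toolOutput: string;
}

// Accept only { decision: 'allow' | 'deny' }; anything else yields null.
function parseDecision(body: unknown): Decision | null {
  if (typeof body !== "object" || body === null) return null;
  const decision = (body as { decision?: unknown }).decision;
  return decision === "allow" || decision === "deny" ? decision : null;
}

// `resolvedRoot` is the D1 result computed at answer time, or null when
// resolution fails (for example, state changed since the prompt).
function applyDecision(
  decision: Decision,
  current: string[],
  resolvedRoot: string | null
): GrantOutcome {
  if (decision === "deny") {
    return { allowedReadPaths: current, toolOutput: "denied" };
  }
  if (resolvedRoot === null) {
    return {
      allowedReadPaths: current,
      toolOutput: "denied: path outside permitted scope",
    };
  }
  // Skip the append when the root is already granted.
  const next =
    current.indexOf(resolvedRoot) === -1
      ? current.concat([resolvedRoot])
      : current;
  return { allowedReadPaths: next, toolOutput: "granted: " + resolvedRoot };
}
```

Treating a failed re-resolution as a denial keeps the handler from surfacing a server error for what is really a scope decision.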
## Frontend changes (in scope; small)
- `MessageBubble.tsx`: render `pending_grant` tool messages with Allow/Deny chips + the path + reason text. Wires to `api.messages.grant(toolMsgId, decision)`.
- New API client method `api.messages.grant`.
- Settings popover: `allowed_read_paths` list with per-row delete (calls `PATCH /api/sessions/:id` with the modified array).
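Since the per-row delete sends the whole shortened array, the server has to make sure a PATCH can only remove entries. The commit summary names a `findUnauthorizedAdditions` guard for this; the sketch below is a guess at its shape (hence the `Sketch` suffix), not the shipped function.

```ts
// Returns every proposed entry that is not already granted on the session.
// A non-empty result means the PATCH tries to add a path and should be
// rejected (the commit summary describes a 400 for this case).
function findUnauthorizedAdditionsSketch(
  current: string[],
  proposed: string[]
): string[] {
  return proposed.filter((p) => current.indexOf(p) === -1);
}

// Usage: revoking one of two grants is fine, adding "/etc" is not.
const currentGrants = ["/opt/forks/codecontext", "/opt/forks/other"];
const revokeOne = findUnauthorizedAdditionsSketch(currentGrants, [
  "/opt/forks/other",
]);
const sneakEtc = findUnauthorizedAdditionsSketch(currentGrants, ["/etc"]);
```

Here `revokeOne` is empty and `sneakEtc` contains `/etc`, so only the first request would be applied.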
## Hard rules
- No git commit, no git push, no git pull during dispatch. Sam commits manually.
- Backup every file before edit per the standard convention.
- TS strict, no `any`.
- No new deps.
- Schema migration is **additive only** (ADD COLUMN IF NOT EXISTS), idempotent on re-run.
- Tool is **read-only** — no path under `allowed_read_paths` can ever be written by BooChat (no write tools registered today; this is a structural guarantee).
- Secret-file deny list still runs unconditionally on resolved paths.
## Stop checkpoints
1. After recon (read existing path_guard + ask_user_input + answer endpoint patterns): stop, hand back the recon report.
2. After code edits, before schema migration applies: stop, hand back the diff.
3. After schema migration applies in dev: stop, run smoke plan, report.
## Smoke plan
1. **Approve flow.** Send a chat in a `/opt/boocode` session asking the agent to investigate `/opt/forks/codecontext/go.mod`. Confirm:
- `pathGuard` throws on the first attempt; tool result includes the `request_read_access` hint.
- Agent calls `request_read_access`; tool-call frame lands; chat status flips to `waiting_for_input`.
- Frontend renders Allow/Deny chips with the path + reason.
- Pick Allow → grant root resolves to `/opt/forks/codecontext` (per D1); `sessions.allowed_read_paths` shows the entry; agent retries `view_file` successfully on the next turn.
2. **Deny flow.** Same setup; pick Deny. Confirm session state unchanged, tool returns `"denied"`, agent gives up or asks differently.
3. **Persistence.** In the same session, a second `view_file` against a different file under `/opt/forks/codecontext/` succeeds without re-prompting.
4. **Cross-session isolation.** Open a fresh session in the boocode project, try the same path — re-prompts (allowed_read_paths is empty on the new session).
5. **Secret-file deny still fires.** Approve access to a repo that contains a `.env` file. Try `view_file('/opt/forks/some-repo/.env')`. Confirm refused via `is_secret_path`, not via pathGuard scope.
6. **Out-of-scope refusal.** Try `request_read_access('/etc/passwd', 'system file')`. Tool validates against the whitelist + repo-shape heuristic, returns `"denied: path outside permitted scope"` without prompting the user.
## Done when
- New `request_read_access` tool + `POST /api/messages/:id/grant` endpoint shipped.
- `path_guard.ts` extended; all read tools forward `allowed_read_paths`.
- `MessageBubble.tsx` renders pending-grant bubbles; settings popover lists + clears grants.
- Schema migration applied (sessions.allowed_read_paths).
- Smoke plan green.
- v1.13.17-cross-repo-reads tag + CHANGELOG entry + roadmap retrospective bullet.
## Files expected to touch
- `apps/server/src/schema.sql` — new column
- `apps/server/src/services/request_read_access.ts` — NEW
- `apps/server/src/services/path_guard.ts` — extra-roots param + helpful PathScopeError message
- `apps/server/src/services/tools.ts` — register the new tool, update view_file / list_dir / grep / find_files / view_truncated_output to thread allowed_read_paths
- `apps/server/src/services/inference/tool-phase.ts` — pause-on-pending-grant branch (alongside ask_user_input)
- `apps/server/src/routes/messages.ts` — new `/grant` endpoint
- `apps/server/src/types/api.ts` — `Session.allowed_read_paths`
- `apps/web/src/api/client.ts` — `api.messages.grant`
- `apps/web/src/api/types.ts` — `Session.allowed_read_paths`
- `apps/web/src/components/MessageBubble.tsx` — render pending_grant chips
- `apps/web/src/components/` — settings-popover grants list (file TBD during impl)
Estimate: ~120 LoC across backend + frontend + schema. Single batch.
## Open questions for dispatch
The four design decisions above are my recommendations. Override any of them in the dispatch and I'll update the proposal before recon. Most likely-overridable: **D1** (grant unit — you may want exact-path-only for tighter scoping, accepting the re-prompt cost) and **D4** (revocation UI — you may want it deferred entirely).