Files
boocode/docs/DEFERRED-WORK.md
indifferentketchup 314adaae48 docs: reconcile roadmap, README, and deferred work for v2.2 ship state
Mark v2.2/v2.2.1 shipped and v2.3 planned in roadmap and README; fix
DEFERRED-WORK §2 (ACP probe skip is planned, not resolved).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-26 15:27:16 +00:00

15 KiB
Raw Blame History

Deferred work — post stale cleanup (2026-05-26)

This document describes work intentionally not shipped in the 2026-05-26 stale/simplify batch. Each item needs a product or architecture decision before implementation. See also STALE-DEPRECATED.md for what was resolved in that batch.

Last updated: 2026-05-26


Summary

Item Category User impact Effort Risk if left alone
Task cancel → abort ACP/PTY child Correctness / UX High — Stop does not kill external agents Medium Zombie processes, stuck running tasks, orphaned worktrees
Skip ACP cold probe when DB fresh Performance Medium — composer open can stall 530s on cache miss Medium (v2.3 batch) Slow provider picker; repeated ACP spawns on every snapshot rebuild
Unified packages/types Maintainability Low (dev-only) MediumHigh Type drift between server, coder, web
Large file splits Maintainability None directly Medium per file Harder reviews, merge conflicts
Retire apps/coder/web/ fallback SPA Scope / ops Low — Sam uses CoderPane Medium Dual UI maintenance, divergent API client

1. Task cancel → abort external ACP/PTY child

Current behavior

External agent tasks (opencode, cursor, claude, qwen, goose, etc.) flow through apps/coder/src/services/dispatcher.ts path B:

  1. Create git worktree (worktrees.ts → local hostExec)
  2. Spawn ACP or PTY child with an AbortController (ac) passed as signal
  3. Stream assistant output over session WS
  4. Diff worktree → pending_changes
  5. Cleanup worktree

The abort signal is created per task but never registered anywhere cancel can reach:

// dispatcher.ts — ac is local to dispatchExternal(); no Map<taskId, AbortController>
const ac = new AbortController();
// ...
await dispatchViaAcp({ ..., signal: ac.signal });
// or
await dispatchViaPty({ ..., signal: ac.signal });

Cancel paths today:

Route What it does Gap
POST /api/tasks/:id/cancel Sets DB state = 'cancelled', calls cancelPendingPermission, tries inference.cancel() on session chats inference.cancel only affects native boocode streaming — not ACP/PTY children
POST /api/sessions/:sessionId/stop Same — loops open chats, calls inference.cancel Same gap for external tasks tied to that session
Dispatcher stop() on shutdown Sets stopping = true; in-flight external task may finish or mark cancelled at end No immediate child kill

dispatchViaAcp and dispatchViaPty do honor signal when aborted (worktree ops, stream read, spawn teardown). The wiring from HTTP cancel → ac.abort() is missing.

There is also no frontend calling task cancel today (grep across apps/web finds no cancelTask usage). Stop in CoderPane targets native inference only.

Symptoms users would see

  • Click Stop / cancel task → DB says cancelled but agent process keeps running on host
  • Assistant row may stay streaming until the child exits naturally
  • Worktree under /tmp/booworktrees/<taskId> may linger until dispatcher cleanup or manual removal
  • Permission prompts cancel correctly (cancelPendingPermission) — that part works

Proposed implementation

Phase A — backend registry (minimum viable)

  1. Add Map<taskId, AbortController> (or { ac, childPid? }) in dispatcher module scope
  2. On dispatchExternal start: activeAbort.set(taskId, ac)
  3. On task completion/failure/cleanup: activeAbort.delete(taskId)
  4. Export cancelExternalTask(taskId: string): boolean from dispatcher
  5. In POST /api/tasks/:id/cancel:
    • Call cancelExternalTask(taskId) before or instead of inference.cancel when task is external (execution_path or agent !== boocode)
    • Mark assistant message cancelled / failed and publish message_complete + chat_status: idle on session WS
  6. Wire POST /api/sessions/:sessionId/stop to find running external tasks for that session and abort them too

Phase B — child process kill

  • PTY: store child ref from spawn; on abort, child.kill('SIGTERM') then SIGKILL after timeout
  • ACP: acp-dispatch already uses signal; ensure createAcpNdJsonStream cancel kills the underlying spawn (verify in acp-stream.ts)
  • Worktree: on abort mid-flight, call cleanupWorktree in finally block

Phase C — frontend

  • CoderPane Stop button: if provider !== 'boocode' and a task_id is active, call POST /api/coder/tasks/:id/cancel (needs API client method)
  • Show cancelled state in UI (task row or composer status)

Files likely touched

  • apps/coder/src/services/dispatcher.ts — registry, export cancel
  • apps/coder/src/routes/tasks.ts — call dispatcher cancel
  • apps/coder/src/routes/messages.ts — session stop for external tasks
  • apps/coder/src/services/acp-dispatch.ts, pty-dispatch.ts — verify signal → kill
  • apps/web/src/api/client.ts, CoderPane.tsx — Stop wiring

Acceptance criteria

  • Cancel a running opencode/cursor task → child process gone within 5s (ps / pgrep on host)
  • Task row state = 'cancelled', assistant message not left streaming
  • Worktree directory removed (best-effort)
  • Native boocode cancel unchanged
  • No regression on permission-denied / blocked flows

Open questions

  • Should cancel be best-effort (mark cancelled even if kill fails) or fail closed until process dies?
  • Arena parallel tasks: cancel one contestant vs whole arena?
  • Blocked task waiting on permission: cancel already resolves waiter — confirm ACP session teardown order

2. Skip ACP cold probe when DB models are fresh

Status: Planned — openspec/changes/v2-3-provider-lifecycle/. Not shipped (no v2.3 tag; all tasks unchecked).

Current behavior (v2.2)

apps/coder/src/services/provider-snapshot.ts on cache miss:

  1. Reads available_agents.models as a fallback when building each entry
  2. Still cold-probes every installed ACP provider on every rebuild (probeAcpProvider in buildProviderEntry) — DB models do not skip the probe
  3. In-memory snapshot cache TTL is 5 minutes (CACHE_TTL_MS); opening AgentComposerBar calls GET /api/providers/snapshot via useProviderSnapshot
  4. POST /api/providers/refresh clears cache and forces a full rebuild (all probes again)
  5. Uninstalled agents are omitted from the snapshot (return null) — not listed as unavailable
  6. No config-file enable/disable; providers are hardcoded in provider-registry.ts

persistProbedModels() writes probe results back to available_agents (including last_probed_at), but nothing reads last_probed_at to skip tier-2 probes yet. There is no PROVIDER_PROBE_TTL_MS env var.

Planned behavior (v2.3)

See design.md:

  • Tier-1 fast binary check + tier-2 ACP session only when stale or explicitly refreshed
  • PROVIDER_PROBE_TTL_MS (default 24h) gate on available_agents.last_probed_at
  • Return loading synchronously on cache miss; complete via inflight promise
  • Always list registered providers (ready | unavailable | error | loading); respect enabled from /data/coder-providers.json
  • Cold probe only on POST /api/providers/refresh (or TTL expiry), not on every composer open

Why deferred past v2.2

v2.2 shipped the snapshot wire shape and ACP dispatch stack. Lifecycle semantics (config registry, enable/disable, probe TTL, settings UI) were scoped as the follow-on v2.3 batch to avoid mixing two large behavior changes in one tag.

Acceptance criteria (when v2.3 ships)

  • Second GET /api/providers/snapshot within TTL does not invoke probeAcpProvider (mock assert in tests)
  • Disabled provider visible in settings, absent from composer
  • Explicit refresh repopulates models; warm open is sub-second

3. Unified packages/types for provider snapshot JSON

Current behavior

Provider snapshot shapes are duplicated (not byte-identical exports):

Location Types
apps/coder/src/services/provider-types.ts ProviderSnapshotEntry, AgentCommand, ProviderModel, …
apps/web/src/api/types.ts Same names, hand-maintained
WS frames agent_commands frame in ws-frames.ts (server + web copy)

Today drift is caught by:

  • TypeScript at compile time when fields are added to one side only
  • provider-snapshot.test.ts on coder
  • Manual review during v2.2 batch

There is no packages/ workspace yet (monorepo is apps/server, apps/web, apps/coder, apps/booterm only).

Options

A. Zod schema + inferred types (lighter)

  • Add ProviderSnapshotEntrySchema in server or coder
  • Web imports schema for runtime validation on fetch (optional)
  • Parity test: expect(webTypeKeys).toEqual(zodShapeKeys) — similar to existing ws-frames.test.ts

B. Shared packages/types package

packages/types/
  package.json          # @boocode/types
  src/provider.ts       # interfaces + zod
  src/ws-frames.ts      # move duplicated unions here
  • apps/coder, apps/web, apps/server depend on workspace:*
  • Requires NodeNext export map, build order (tsc for types package first)
  • Biggest payoff if WS frames + provider + task types all move

C. Status quo + discipline

  • Keep duplicated TS interfaces
  • Add one test file importing both sides and asserting structural equality

Tradeoffs

Approach Setup cost Runtime safety Best when
A — Zod in coder Low Validate API responses Snapshot is coder-only API
B — packages/types High Full shared source Many cross-app types growing
C — parity test Lowest Compile-time only Rare schema changes

Recommendation

Start with A or C unless planning a broader “shared types” initiative (tasks, permissions, arena). Full packages/types is justified when a third consumer appears or WS frame duplication becomes painful again.

Files likely touched (option B)

  • New packages/types/
  • Root pnpm-workspace.yaml
  • apps/web/tsconfig, apps/coder/tsconfig — path references
  • Strip duplicate blocks from apps/web/src/api/types.ts

4. Large file splits (refactor when touched)

Not user-facing defects — deferred to avoid scope creep in the stale batch. Split when the next feature touches the file heavily.

4.1 CoderPane.tsx (~663 lines)

Responsibilities today (single component):

  • Session WS + polling fallback for messages
  • Pending changes fetch/apply/discard
  • External vs native send paths (AgentComposerBar, slash commands, skill invoke)
  • Permission card + agent commands hint
  • Provider snapshotdriven composer state

Suggested extractions:

Hook / module Owns
useCoderMessages(sessionId) WS subscribe, poll-when-disconnected, temp-id dedup
usePendingChanges(sessionId) List, apply, discard, diff preview trigger
CoderComposerFooter AgentComposerBar + hints + permission card layout

Trigger to do it: next CoderPane feature (e.g. task cancel UI, multi-task inbox) or when file exceeds ~800 lines.

4.2 ChatInput.tsx (~669 lines)

Responsibilities: attachments, drag/drop, paste-as-file, @ mentions, slash picker, agent picker row, context bar, send/submit, registry for send-to-chat.

Suggested extractions:

Hook / module Owns
useSlashCommandInput(...) slashState, isSlashCommandToken, picker open/close
useFileMention(...) @ detection, file index fetch, popover state
useAttachments(...) chips, drop, paste, MAX_ATTACHMENTS

Slash helpers already live in lib/slash-command.ts; hooks would consume them.

4.3 dispatcher.ts (~483 lines)

Two paths in one file:

  • Path A: native boocode — enqueue inference, wait for message completion
  • Path B: external — worktree, ACP/PTY, diff, pending_changes

Suggested split:

  • dispatcher-native.ts — poll loop pick-up for provider = boocode or no agent
  • dispatcher-external.ts — path B + abort registry (pairs with item §1)
  • dispatcher.ts — thin poll loop + routing

4.4 AgentComposerBar.tsx (~323 lines)

Optional split: provider/model/mode/thinking pickers vs persistence (AgentSessionConfig localStorage/session). Lower priority — file is manageable.

4.5 acp-dispatch.ts (~294 lines)

Stream setup already extracted to acp-stream.ts. Remaining split candidate: session lifecycle (create → prompt loop → teardown) vs permission/tool side effects.

General rule

Split only when:

  1. A new feature needs a clear seam, or
  2. Review feedback repeatedly hits the same file, or
  3. Tests need isolated units (e.g. dispatcher external path)

Avoid drive-by splits — they churn blame without shipping user value.


5. Retire apps/coder/web/ fallback SPA

What it is

Standalone Vite React app (@boocode/coder-web) built into apps/coder/web/dist/ and served by BooCoder Fastify at :9502 when no BooChat proxy is in front.

Primary UI: CoderPane inside BooChat SPA (apps/web) — API via /api/coder/* proxy, WS to :9502.

Fallback UI: direct visit to http://100.114.205.53:9502 — own api/client.ts, ChatPane, DiffPane, duplicated types.

Why it was kept

  • Host-only debugging without full BooChat stack
  • Historical path before CoderPane integration
  • Zero dependency on Authelia-protected BooChat origin

Cost of keeping

  • Separate build step in coder deploy
  • Duplicate message/stream types and API paths
  • Features land in CoderPane first; fallback rots unless manually updated

Options

Option When to choose
Keep Still use :9502 directly for debugging or demos
Freeze Stop feature work; build only on release for emergency access
Remove Always use BooChat; :9502 serves health + WS + API only (or minimal static “open in BooChat” page)

Removal checklist (if chosen)

  1. Confirm Sam never uses standalone UI (bookmarks, systemd docs, BOOCODER.md)
  2. Remove apps/coder/web/ package and static serve from apps/coder/src/index.ts
  3. Update Dockerfile/coder build scripts
  4. Keep WS + REST routes — CoderPane depends on them
  5. Optional: single-page static “Use code.indifferentketchup.com” at /

Suggested batch ordering

If picking these up as openspec batches:

  1. Task cancel abort — highest correctness gap; unblocks honest Stop button in CoderPane
  2. v2.3 provider lifecycle — probe TTL, config registry, enable/disable (includes §2 above)
  3. CoderPane hook extraction — natural follow-on when adding cancel UI
  4. Zod parity or packages/types — when next WS/provider field is added
  5. Retire coder/web — only after explicit “I dont use :9502 UI” confirmation