# Openspec Cleanup & High-Value Batch Implementation

## TL;DR

> **Quick Summary**: Clean up the `openspec/` folder structure (delete 11 stub files, move 5 misplaced proposals, add missing `.openspec.yaml` files), then implement 5 high-value batches: llama-cache-and-spec, pty-enhancements, results-page, token-analyzer-ui, and enhanced-file-panel.
>
> **Deliverables**:
> - Clean openspec folder: stubs removed, archived/ accurate, all batches schema-compliant
> - llama-server KV cache quantization + ngram speculative decoding enabled
> - PTY exit notifications and session metadata
> - `/results` page for orchestrator runs and arena battles (new route)
> - `/analytics` page for token usage dashboard (new route)
> - Enhanced file panel: side-by-side diff, hide whitespace, wrap lines, expand/collapse all
>
> **Estimated Effort**: Medium-Large
> **Parallel Execution**: YES — 3 waves + final verification
> **Critical Path**: Cleanup → Backend impls → Frontend impls → Integration

---

## Context

### Original Request

Analyze `openspec/` folder for structural issues, cross-reference against git tags, and create a work plan for implementing the high-value openspec batch proposals.

### Interview Summary

**Key Discussions**:
- `openspec/changes/` has 22 active batches (all uncommitted, all unshipped) plus `archived/` with 29 entries
- 11 stub files in archived/ are pure noise (49-66 bytes each, "Status: Shipped. Archived." only)
- 5 misplaced 2026-06-07 proposals were dumped in archived/ — they're active design docs, not shipped batches
- 6 active batches missing `.openspec.yaml`; `openspec/config.yaml` is empty
- Active proposals overlap: multiple batches cover evaluation, memory, and workflow engine territory

**Research Findings**:
- Git tag cross-reference confirms all folder-based archived entries match shipped tags
- 3 stub files reference wrong tags (v1.13.12→v1.13.14, v1.14.x→v1.13.19, etc.)
- All 22 active batches have zero git references — pure filesystem artifacts
- No active batch has shipped yet — zero can be archived

### Metis Review

Identified gaps:
1. **Deduplication needed**: 2026-06-07 proposals overlap with active changes/ — merging must happen before cleanup is complete
2. **Prioritization needed**: 22 batches can't all ship at once — need clear tiers
3. **User sign-off needed**: Which Tier 1-2 batches to include in this plan vs defer

---

## Work Objectives

### Core Objective

Restore openspec structural integrity and ship the 5 highest-value, lowest-effort batch proposals.

### Concrete Deliverables

- Clean openspec: stubs deleted (11 files ~573 bytes), misplaced proposals moved (5 folders), `.openspec.yaml` files added (6 batches), config.yaml populated
- llama-cache-and-spec: KV cache quantization (Q4_0) + ngram speculative decoding enabled
- pty-enhancements: PTY exit notifications, session metadata, X-Agent-Flags
- results-page: `/results` route with Analysis Runs + Arena Battles tabs
- token-analyzer-ui: `/analytics` route with token usage dashboard
- enhanced-file-panel: side-by-side diff toggle, hide whitespace, wrap long lines, expand/collapse all

### Must Have

- All 11 stub files removed from archived/
- 5 misplaced 2026-06-07 proposals moved from archived/ into `changes/` (or merged into existing batches)
- `.openspec.yaml` added to all 6 missing batches
- `openspec/config.yaml` gets a `context:` block and `rules:` block
- llama-server restarts with new flags (verify via `ps aux | grep llama`)
- `/results` page loads without 404 and shows real data from existing API endpoints
- `/analytics` page loads and shows token aggregates
- Side-by-side diff renders correctly for files with wide lines

### Must NOT Have (Guardrails)

- **NO** breaking changes to existing routes or API contracts
- **NO** new database tables or migrations (all data sources already exist)
- **NO** external API dependencies (no cloud embedding models)
- **NO**
behavioral engine or Pregel state machine work (deferred to future batch)
- **NO** touching the conductor flow runner or orchestrator pipeline
- **NO** CSS framework changes (stay on Tailwind v4 / shadcn/ui)
- **NO** backend changes unless explicitly required by the batch scope

### Spec Framework Integration

- **Detected Framework**: OpenSpec (folder structure only — no CLI)
- **Config File**: `openspec/config.yaml`
- **Active Specs**: 22 batch folders in `openspec/changes/`
- **Available Commands**: Manual folder/file operations (no OpenSpec CLI)

---

## Verification Strategy

> **ZERO HUMAN INTERVENTION** — ALL verification is agent-executed.

### Test Decision

- **Infrastructure exists**: YES (vitest in apps/server, apps/coder)
- **Automated tests**: Tests-after (no TDD — these are config/frontend changes)
- **Framework**: vitest for backend, Playwright for frontend verification

### QA Policy

Every task includes agent-executed QA scenarios. Evidence saved to `.omo/evidence/`.

- **Frontend**: Playwright — navigate, assert DOM elements, screenshot
- **Backend**: Bash (curl) — send requests, assert status + response
- **Config/Restart**: Bash — check processes, verify new flags
- **File operations**: Bash — verify files exist/deleted with `test -f` / `test !
-f`

---

## Execution Strategy

```
Wave 1 (Structural Cleanup — quick, MAX PARALLEL):
├── Task 1: Delete 11 stub files from archived/ [quick]
├── Task 2: Move 5 misplaced 2026-06-07 proposals → changes/ [writing]
├── Task 3: Add .openspec.yaml to 6 missing batches [quick]
├── Task 4: Populate openspec/config.yaml with project context [writing]
└── Task 5: Add shipped status metadata to archived/ entries [quick]

Wave 2 (Backend — moderate, MAX PARALLEL):
├── Task 6: llama-cache-and-spec — KV cache + ngram flags [unspecified-high]
├── Task 7: pty-enhancements — exit notifications + session metadata [unspecified-high]
└── Task 8: token-analyzer-ui — backend API endpoints [unspecified-high]

Wave 3 (Frontend — moderate, MAX PARALLEL):
├── Task 9: results-page — /results route [visual-engineering]
├── Task 10: token-analyzer-ui — /analytics route [visual-engineering]
└── Task 11: enhanced-file-panel — diff modes + UI [visual-engineering]

Wave FINAL (Verification — 4 parallel reviews):
├── Task F1: Plan compliance audit [oracle]
├── Task F2: Code quality + type check [unspecified-high]
├── Task F3: Real QA — execute every scenario [unspecified-high + playwright]
└── Task F4: Scope fidelity check [deep]

Critical Path: Cleanup → Backend → Frontend → Integration
Parallel Speedup: ~60% faster than sequential
Max Concurrent: 5 (Wave 1)
```

---

## TODOs

- [ ] 1. Delete 11 stub files from archived/

  **What to do**:
  - Remove these 11 files from `openspec/changes/archived/`:
    - `v1.13.12-skills-audit.md` (57B, wrong tag ref)
    - `v1.13.15-codecontext-synth.md` (62B)
    - `v1.13.17-cross-repo-reads.md` (61B)
    - `v1.13.18-codecontext-file-path.md` (66B)
    - `v1.13.20-drop-legacy-cols.md` (61B)
    - `v1.14-outer-loop.md` (52B)
    - `v1.14.1-mcp-poc.md` (51B)
    - `v1.14.x-html-artifact-panes.md` (63B, wrong tag ref)
    - `v1.15-mcp-multi.md` (51B)
    - `v2.0-boocoder.md` (49B)
    - `v2.2-paseo-providers.md` (222B)
  - Each file contains ONLY "# Title\n\n**Status:** Shipped.
Archived.\n" — zero documentation value
  - Git history preserves the knowledge; CHANGELOG.md + tags are the authoritative record

  **Must NOT do**:
  - Do NOT delete any folder-based archived entries (they have real content)
  - Do NOT delete `boocode_batch10.md` or handoff files (they're valuable)

  **Recommended Agent Profile**:
  - **Category**: `quick`
  - **Skills**: `[]`
  - **Justification**: Trivial file deletion — no domain skills needed

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 1 (with Tasks 2-5)
  - **Blocks**: F1-F4
  - **Blocked By**: None

  **References**:
  - `openspec/changes/archived/` — target directory
  - `openspec/README.md` — schema definition
  - `~/.gitconfig` — no special config needed

  **Acceptance Criteria**:
  - [ ] `test ! -f openspec/changes/archived/v1.13.12-skills-audit.md` → success for all 11 files
  - [ ] `ls openspec/changes/archived/*.md` shows only allowed files (boocode_batch10.md, handoff_*)

  **QA Scenarios**:

  ```
  Scenario: Verify stubs deleted
    Tool: Bash
    Preconditions: Clean working tree
    Steps:
      1. For each stub file, run: test ! -f openspec/changes/archived/{filename}
      2. Assert: all 11 commands return exit code 0 (file does not exist)
      3. List remaining .md files: ls openspec/changes/archived/*.md
      4. Assert: only boocode_batch10.md and handoff_*.md files remain
    Expected Result: 11 stubs absent, 3 valuable files present
    Evidence: .omo/evidence/task-1-stubs-deleted.txt

  Scenario: Valuable files preserved
    Tool: Bash
    Preconditions: Stubs deleted
    Steps:
      1. test -f openspec/changes/archived/boocode_batch10.md
      2. test -f openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
      3. test -f openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
    Expected Result: All 3 return exit code 0
    Evidence: .omo/evidence/task-1-valuables-preserved.txt
  ```

  **Evidence to Capture**:
  - `task-1-stubs-deleted.txt` — confirmation each stub is gone
  - `task-1-valuables-preserved.txt` — confirmation valuable files remain

  **Commit**: YES
  - Message: `chore(openspec): delete 11 stub archive files with zero documentation value`
  - Files: openspec/changes/archived/v1.13.12-skills-audit.md, ...

- [ ] 2. Move 5 misplaced 2026-06-07 proposals from archived/ to changes/

  **What to do**:
  - Move these 5 folders from `openspec/changes/archived/2026-06-07-*` to `openspec/changes/*`:
    1. `archived/2026-06-07-boocontext/` → `changes/boocontext/` (partially shipped in v2.8.0)
    2. `archived/2026-06-07-eval-sandbox-agent-runtime/` → merge into `changes/import-llm-evaluator/` and `changes/import-pregel-engine/` (overlapping scope)
    3. `archived/2026-06-07-hybrid-workflow-engine/` → merge into `changes/orchestrator-flow-advanced/`
    4. `archived/2026-06-07-memory-context-engineering/` → merge into `changes/memory-context/`
    5. `archived/2026-06-07-port-audit-parlant-patterns/` → merge into `changes/add-behavioral-engine/` and `changes/audit-harness-integration/`
  - For merges (2-5): append relevant content from the 2026-06-07 proposal into the existing batch's proposal.md, tasks.md, design.md. The 2026-06-07 versions are "grand vision" — extract the concrete specs relevant to the narrower active batch.
  - For `boocontext/` (1): move as-is since it's a new slug with no direct collision.
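
The move-versus-merge rule above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the plan's prescribed procedure: the helper names `move_as_is` and `merge_append`, the `## Merged from` provenance heading, and the temp-directory fixture are all hypothetical, and it appends whole files where the real merge should extract only the specs relevant to the narrower batch.

```shell
set -euo pipefail

# Hypothetical sandbox so the sketch runs anywhere; in the real repo,
# point ROOT at the repository root instead of a temp directory.
ROOT="$(mktemp -d)"
ARCHIVED="$ROOT/openspec/changes/archived"
CHANGES="$ROOT/openspec/changes"

# Minimal fixture mirroring two of the five cases (contents are placeholders).
mkdir -p "$ARCHIVED/2026-06-07-boocontext" \
         "$ARCHIVED/2026-06-07-memory-context-engineering" \
         "$CHANGES/memory-context"
printf '# boocontext\n' > "$ARCHIVED/2026-06-07-boocontext/proposal.md"
printf '# Context engineering\n' > "$ARCHIVED/2026-06-07-memory-context-engineering/proposal.md"
printf '# memory-context\n' > "$CHANGES/memory-context/proposal.md"

# Case 1: new slug with no collision, so move the folder as-is.
move_as_is() {
  local src="$1" dest="$2"
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing batch: $dest" >&2
    return 1
  fi
  mv "$src" "$dest"
}

# Cases 2-5: append under a provenance heading; never overwrite the target.
merge_append() {
  local src_dir="$1" dest_dir="$2" file="$3"
  [ -f "$src_dir/$file" ] || return 0   # nothing to merge for this file
  {
    printf '\n## Merged from %s\n\n' "$(basename "$src_dir")"
    cat "$src_dir/$file"
  } >> "$dest_dir/$file"
}

move_as_is "$ARCHIVED/2026-06-07-boocontext" "$CHANGES/boocontext"

for f in proposal.md tasks.md design.md; do
  merge_append "$ARCHIVED/2026-06-07-memory-context-engineering" "$CHANGES/memory-context" "$f"
done

# Remove the archived source only after its content has been appended.
rm -r "$ARCHIVED/2026-06-07-memory-context-engineering"
```

Appending with `>>` keeps the existing proposal text intact, and deleting the archived folder last means a failed merge never loses the source content.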

  **Must NOT do**:
  - Do NOT delete the content of the 2026-06-07 folders — merge, don't discard
  - Do NOT create duplicate batch slugs
  - Do NOT overwrite existing proposal content — append/extend

  **Recommended Agent Profile**:
  - **Category**: `writing`
  - **Skills**: `[]`
  - **Justification**: File organization + content merging — technical writing task

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 1 (with Tasks 1, 3-5)
  - **Blocks**: F1-F4
  - **Blocked By**: None

  **References**:
  - `openspec/changes/archived/2026-06-07-*/` — source folders
  - `openspec/changes/import-llm-evaluator/` — target for eval overlap
  - `openspec/changes/import-pregel-engine/` — target for graph overlap
  - `openspec/changes/orchestrator-flow-advanced/` — target for workflow overlap
  - `openspec/changes/memory-context/` — target for memory overlap
  - `openspec/changes/add-behavioral-engine/` — target for port patterns
  - `openspec/changes/audit-harness-integration/` — target for audit patterns

  **Acceptance Criteria**:
  - [ ] `openspec/changes/boocontext/` exists with proposal.md + tasks.md + design.md + specs/
  - [ ] `openspec/changes/import-llm-evaluator/` proposal.md now references eval-sandbox content
  - [ ] `openspec/changes/import-pregel-engine/` proposal.md now references graph engine content
  - [ ] `openspec/changes/orchestrator-flow-advanced/` proposal.md now references hybrid workflow
  - [ ] `openspec/changes/memory-context/` proposal.md now references context engineering
  - [ ] `openspec/changes/add-behavioral-engine/` and `audit-harness-integration/` now reference port patterns
  - [ ] `test ! -d openspec/changes/archived/2026-06-07-eval-sandbox-agent-runtime/` for each moved folder

  **QA Scenarios**:

  ```
  Scenario: boocontext moved
    Tool: Bash
    Preconditions: Files moved
    Steps:
      1. test -f openspec/changes/boocontext/proposal.md
      2. test -f openspec/changes/boocontext/tasks.md
      3. test ! -f openspec/changes/archived/2026-06-07-boocontext/proposal.md
    Expected Result: Files exist in new location, not in old
    Evidence: .omo/evidence/task-2-boocontext-moved.txt
  ```

  ```
  Scenario: Merged proposals updated
    Tool: Bash
    Preconditions: Files merged
    Steps:
      1. grep -l "eval-sandbox\|graph engine\|hybrid workflow\|context engineering\|port patterns" openspec/changes/*/proposal.md
      2. Assert: the listed files include each merged batch's proposal.md, i.e. each one references the 2026-06-07 source
    Expected Result: grep finds references in the right target files
    Evidence: .omo/evidence/task-2-merges-verified.txt
  ```

  **Evidence to Capture**:
  - `task-2-boocontext-moved.txt`
  - `task-2-merges-verified.txt`

  **Commit**: YES (groups with Task 1)
  - Message: `chore(openspec): move 5 misplaced proposals from archived/ → changes/, merge overlapping content`
  - Files: openspec/changes/boocontext/*, openspec/changes/*/proposal.md, openspec/changes/*/tasks.md

- [ ] 3. Add .openspec.yaml to 6 missing batches

  **What to do**:
  - Create `.openspec.yaml` in each of these 6 active batches:
    - `enhanced-file-panel/`
    - `llama-cache-and-spec/`
    - `memory-v2-hybrid-search/`
    - `omo-paseo-bridge/`
    - `orchestrator-flow-advanced/`
    - `results-page/`
  - Each file must contain:
    ```yaml
    schema: spec-driven
    created: 2026-06-07
    ```

  **Must NOT do**:
  - Do NOT modify existing proposal.md or tasks.md content
  - Do NOT add .openspec.yaml to batches that already have one

  **Recommended Agent Profile**:
  - **Category**: `quick`
  - **Skills**: `[]`
  - **Justification**: Trivial boilerplate file creation

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 1 (with Tasks 1, 2, 4, 5)
  - **Blocks**: F1-F4
  - **Blocked By**: None

  **References**:
  - `openspec/changes/add-3tier-memory/.openspec.yaml` — template

  **Acceptance Criteria**:
  - [ ] All 6 created files contain `schema: spec-driven`
  - [ ] `find openspec/changes/ -name ".openspec.yaml" | wc -l` counts all expected files

  **QA Scenarios**:

  ```
  Scenario: All .openspec.yaml files present
    Tool: Bash
    Preconditions: Files created
    Steps:
      1. For each batch: test -f openspec/changes/{batch}/.openspec.yaml
      2. For each: grep -q "schema: spec-driven" openspec/changes/{batch}/.openspec.yaml
    Expected Result: All 6 files exist with correct content
    Evidence: .omo/evidence/task-3-openspec-yaml-added.txt
  ```

  **Evidence to Capture**:
  - `task-3-openspec-yaml-added.txt`

  **Commit**: YES (groups with Task 1)
  - Message: `chore(openspec): add .openspec.yaml to 6 missing batch folders`
  - Files: openspec/changes/enhanced-file-panel/.openspec.yaml, ...

- [ ] 4. Populate openspec/config.yaml with project context

  **What to do**:
  - Replace the empty `openspec/config.yaml` with a populated version:
    ```yaml
    schema: spec-driven
    context: |
      Tech stack: TypeScript, React 18, Vite, Tailwind v4, shadcn/ui, Fastify, PostgreSQL 16, pnpm workspaces
      Apps: BooChat (read-only chat), BooCoder (write tools + agent dispatch), BooTerm (PTY terminals), Orchestrator (multi-agent conductor)
      Infrastructure: Docker Compose, Tailscale (100.114.205.53), Authelia auth, llama-swap inference
      Monorepo: apps/server, apps/web, apps/booterm, apps/coder, packages/contracts
      Commits: conventional commits, strict TypeScript, NodeNext module resolution
      Testing: vitest (server + coder), Playwright (web E2E), no root tsconfig
    rules:
      proposal:
        - Every proposal must have a "Why" section explaining the motivation
        - Every proposal must have a "What Changes" section enumerating deliverables
        - Include "Must Have" / "Must NOT Have" guardrails
        - Reference shipped git tags when applicable
      tasks:
        - Tasks must be ordered by dependency, not priority
        - Each task is one atomic change (file, config, or command)
        - Parallel tasks go in the same wave
    ```

  **Must NOT do**:
  - Do NOT delete the `schema: spec-driven` line

  **Recommended Agent Profile**:
  - **Category**: `writing`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 1 (with Tasks 1-3, 5)
  - **Blocks**: F1-F4
  - **Blocked By**:
None

  **References**:
  - `openspec/config.yaml` — current (empty) file
  - `/home/samkintop/opt/boocode/CLAUDE.md` — source for context info

  **Acceptance Criteria**:
  - [ ] `grep -q "context:" openspec/config.yaml` → success
  - [ ] `grep -q "rules:" openspec/config.yaml` → success
  - [ ] config.yaml has more than 500 bytes (was 20 bytes)

  **QA Scenarios**:

  ```
  Scenario: config.yaml populated
    Tool: Bash
    Preconditions: File written
    Steps:
      1. wc -c openspec/config.yaml → assert > 500 bytes
      2. grep -q "context:" openspec/config.yaml
      3. grep -q "rules:" openspec/config.yaml
      4. grep -q "schema: spec-driven" openspec/config.yaml
    Expected Result: All assertions pass
    Evidence: .omo/evidence/task-4-config-populated.txt
  ```

  **Evidence to Capture**:
  - `task-4-config-populated.txt`

  **Commit**: YES (groups with Task 1)
  - Message: `chore(openspec): populate config.yaml with project context and rules`
  - Files: openspec/config.yaml

- [ ] 5. Add shipped-status metadata to 10 archived folder entries

  **What to do**:
  - Add frontmatter or status line to each archived folder's proposal.md documenting the shipped version:
    - `agent-status-normalize/` → `v2.7.6`
    - `claude-sdk-sessionstore/` → `v2.7.5`
    - `contracts-ssot/` → `v2.7.13`
    - `license-debt-mit/` → `v2.7.0`
    - `mistake-tracker-file-ledger/` → `v2.7.4`
    - `orchestrator/` → `v2.7.17`
    - `sampling-streamjson-tokens/` → `v2.7.3`
    - `v2-3-provider-lifecycle/` → `v2.5.4`–`v2.5.13`
    - `v2-6-persistent-agent-sessions/` → `v2.6.4`–`v2.6.8`
    - `write-edit-robustness/` → `v2.7.1`
  - Add line after the `## Why` section heading: `` **Shipped in:** `v2.7.6-agent-status-normalize` `` (or equivalent)

  **Must NOT do**:
  - Do NOT change the body of the proposal beyond the shipped annotation
  - Do NOT add shipped annotations to the 2026-06-07 batches (they're not shipped)

  **Recommended Agent Profile**:
  - **Category**: `quick`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 1 (with Tasks 1-4)
  - **Blocks**: F1-F4
  - **Blocked
By**: None

  **References**:
  - Git tags: `v2.7.0-mit`, `v2.7.1-write-edit-robustness`, etc.

  **Acceptance Criteria**:
  - [ ] All 10 archived batch proposals contain "Shipped in:" referencing a git tag
  - [ ] `grep -r "Shipped in:" openspec/changes/archived/*/proposal.md | wc -l` = 10

  **QA Scenarios**:

  ```
  Scenario: All archived batches annotated
    Tool: Bash
    Preconditions: Files edited
    Steps:
      1. grep -rl "Shipped in:" openspec/changes/archived/*/proposal.md | wc -l
      2. Assert: exactly 10 files contain "Shipped in:"
    Expected Result: 10 files annotated
    Evidence: .omo/evidence/task-5-shipped-annotations.txt
  ```

  **Evidence to Capture**:
  - `task-5-shipped-annotations.txt`

  **Commit**: YES (groups with Task 1)
  - Message: `chore(openspec): add shipped-in version annotations to 10 archived batch proposals`
  - Files: openspec/changes/archived/*/proposal.md

---

## TODOs (Wave 2)

- [ ] 6. llama-cache-and-spec — Enable KV cache quantization + ngram speculative decoding

  **What to do**:
  - Edit `apps/server/src/services/inference/providers/llama.ts` (or the llama args validator `llama-args-validator.ts`) to allow `--cache-type-k q4_0` and `--spec-type ngram-mod` through the shadowing lists
  - Change the base llama-server args to include:
    - `--cache-type-k q4_0` (4-bit KV cache, ~4× VRAM reduction)
    - `--spec-type ngram-mod` (ngram speculative decoding, 2-3× tok/s on code)
  - Verify the sidecar validator (`sidecar/validator.go`) also allows these flags through
  - Read `apps/server/src/services/inference/llama-args-validator.ts` and `sidecar/validator.go` to understand the current blocklist
  - Add the two flags to the allowlist instead of the shadow list
  - Update the sidecar Dockerfile or config if needed

  **Must NOT do**:
  - Do NOT change any other llama-server args
  - Do NOT enable KV cache quantization for Q8_0 or Q3_K (only Q4_0)
  - Do NOT add a separate draft model (ngram is self-contained)

  **Recommended Agent Profile**:
  - **Category**: `unspecified-high`
  - **Skills**: `[]`
  - **Justification**:
Requires understanding llama.cpp arg validation across two codebases

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 2 (with Tasks 7-8)
  - **Blocks**: F1-F4
  - **Blocked By**: Task 1-5 (Wave 1)

  **References**:
  - `apps/server/src/services/inference/llama-args-validator.ts` — current arg blocklist/allowlist
  - `sidecar/validator.go` — sidecar validation (if exists)
  - `docker-compose.yml` or sidecar Dockerfile — restart config
  - `openspec/changes/llama-cache-and-spec/proposal.md` — full spec

  **Acceptance Criteria**:
  - [ ] `--cache-type-k q4_0` present in llama-server args after restart
  - [ ] `--spec-type ngram-mod` present in llama-server args after restart
  - [ ] llama-server starts without errors
  - [ ] Inference still works (send test message)

  **QA Scenarios**:

  ```
  Scenario: KV cache quantization enabled
    Tool: Bash
    Preconditions: Server restarted after changes
    Steps:
      1. ps aux | grep llama-server | grep -o "cache-type-k q4_0"
      2. Assert: output matches "q4_0"
    Expected Result: KV cache quantization is active
    Evidence: .omo/evidence/task-6-kv-cache-enabled.txt
  ```

  ```
  Scenario: Speculative decoding enabled
    Tool: Bash
    Preconditions: Server restarted
    Steps:
      1. ps aux | grep llama-server | grep -o "spec-type ngram-mod"
      2. Assert: output matches "ngram-mod"
    Expected Result: Ngram speculative decoding is active
    Evidence: .omo/evidence/task-6-ngram-enabled.txt
  ```

  ```
  Scenario: Inference still works
    Tool: Bash (curl)
    Preconditions: Server running with new flags
    Steps:
      1. curl -s -o /dev/null -w "%{http_code}" http://100.114.205.53:9500/api/health
      2. Assert: HTTP 200
    Expected Result: Server is healthy and serving
    Evidence: .omo/evidence/task-6-health-check.txt
  ```

  **Evidence to Capture**:
  - `task-6-kv-cache-enabled.txt` — grep output showing the flag
  - `task-6-ngram-enabled.txt` — grep output showing the flag
  - `task-6-health-check.txt` — health check confirmation

  **Commit**: YES
  - Message: `perf(llama): enable KV cache quantization (q4_0) + ngram speculative decoding`
  - Files: apps/server/src/services/inference/llama-args-validator.ts, sidecar/validator.go (if needed)

- [ ] 7. pty-enhancements — PTY exit notifications + session metadata

  **What to do**:
  - Add `notifyOnExit` support to the PTY session manager (likely in `apps/booterm/`)
  - When a PTY process exits AND `notifyOnExit` was set:
    - Emit an event/message to the agent channel with: session ID, title, exit code, total output lines, last line of output
  - Add session metadata fields: agent ID that spawned it, task ID, optional title
  - Add `pty_list` endpoint that returns metadata for all sessions
  - Wire `X-Agent-Flags` header support for agent identification
  - Read `apps/booterm/` to understand the current PTY architecture

  **Must NOT do**:
  - Do NOT change the existing pty_spawn interface (add notifyOnExit as optional param)
  - Do NOT implement sandbox or circuit breaker (out of scope for this wave)
  - Do NOT add new database tables (metadata lives in-memory or in existing session store)

  **Recommended Agent Profile**:
  - **Category**: `unspecified-high`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 2 (with Tasks 6, 8)
  - **Blocks**: F1-F4
  - **Blocked By**: Task 1-5 (Wave 1)

  **References**:
  - `apps/booterm/src/` — PTY session management code
  - `apps/coder/src/services/` — agent dispatch that spawns PTYs
  - `openspec/changes/pty-enhancements/proposal.md` — full spec
  - `apps/server/src/services/inference/` — inference pipeline that may need to handle notifications

  **Acceptance Criteria**:
  - [ ] `notifyOnExit`
optional parameter on pty_spawn works
  - [ ] On process exit with notifyOnExit=true, agent receives notification
  - [ ] `pty_list` returns session metadata
  - [ ] `X-Agent-Flags` header is recognized

  **QA Scenarios**:

  ```
  Scenario: notifyOnExit triggers notification
    Tool: Bash + tmux
    Preconditions: booterm running
    Steps:
      1. Start a short PTY with notifyOnExit=true: sleep 1
      2. Wait 2 seconds for completion
      3. Check notification was delivered
    Expected Result: Exit notification received with title, exit code, last line
    Evidence: .omo/evidence/task-7-notify-on-exit.txt
  ```

  ```
  Scenario: pty_list shows metadata
    Tool: Bash (curl)
    Preconditions: PTY sessions exist
    Steps:
      1. curl http://localhost:9501/api/pty/list 2>/dev/null
      2. Assert: response contains session metadata fields
    Expected Result: Metadata returned for each session
    Evidence: .omo/evidence/task-7-pty-list.txt
  ```

  **Evidence to Capture**:
  - `task-7-notify-on-exit.txt` — notification evidence
  - `task-7-pty-list.txt` — pty_list response

  **Commit**: YES
  - Message: `feat(booterm): PTY exit notifications + session metadata + X-Agent-Flags`
  - Files: apps/booterm/src/*.ts, apps/coder/src/services/*.ts

- [ ] 8.
token-analyzer-ui — Backend API endpoints for token analytics

  **What to do**:
  - Add read-only API endpoints to serve aggregate token data:
    - `GET /api/coder/token-analytics/sessions` — per-session token usage (input, output, cost)
    - `GET /api/coder/token-analytics/tools` — per-tool cost breakdown (from tool_cost_stats view)
    - `GET /api/coder/token-analytics/trends` — token usage over time
  - Reuse existing data sources:
    - `agent_sessions.input_tokens`, `agent_sessions.output_tokens`, `agent_sessions.cost`
    - `tool_cost_stats` view (per-tool 100-call rolling window)
    - `tasks.token_breakdown` JSONB column
  - Implement in `apps/coder/src/routes/` (follow existing route patterns)
  - Add proper error handling, pagination for large result sets, and date filtering

  **Must NOT do**:
  - Do NOT create new database tables or migrations
  - Do NOT add token tracking logic (data is already accumulated)
  - Do NOT add real-time streaming (data is historical aggregate)

  **Recommended Agent Profile**:
  - **Category**: `unspecified-high`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 2 (with Tasks 6-7)
  - **Blocks**: Task 10 (frontend depends on backend)
  - **Blocked By**: Task 1-5 (Wave 1)

  **References**:
  - `apps/coder/src/routes/` — existing route patterns
  - `apps/server/src/schema.sql` — `tool_cost_stats` view definition
  - `apps/coder/CLAUDE.md` — coder conventions, route registration
  - `packages/contracts/` — shared types for response schemas
  - `openspec/changes/token-analyzer-ui/proposal.md` — full spec

  **Acceptance Criteria**:
  - [ ] `GET /api/coder/token-analytics/sessions?project_id=X` returns 200 with token data
  - [ ] `GET /api/coder/token-analytics/tools?project_id=X` returns 200 with tool breakdown
  - [ ] `GET /api/coder/token-analytics/trends?project_id=X` returns 200 with trend data
  - [ ] All endpoints respect `project_id` filtering
  - [ ] Empty data returns valid empty arrays (not errors)

  **QA Scenarios**:

  ```
  Scenario: Sessions endpoint works
    Tool: Bash (curl)
    Preconditions: Server running, project exists
    Steps:
      1. curl -s "http://localhost:3000/api/coder/token-analytics/sessions?project_id=1"
      2. Assert: HTTP 200
      3. Assert: response is valid JSON with expected fields
    Expected Result: Session token data returned
    Evidence: .omo/evidence/task-8-sessions-endpoint.txt
  ```

  ```
  Scenario: Empty data returns valid response
    Tool: Bash (curl)
    Preconditions: Server running
    Steps:
      1. curl -s "http://localhost:3000/api/coder/token-analytics/sessions?project_id=999"
      2. Assert: HTTP 200
      3. Assert: response contains empty array (not error)
    Expected Result: Graceful empty state
    Evidence: .omo/evidence/task-8-empty-data.txt
  ```

  **Evidence to Capture**:
  - `task-8-sessions-endpoint.txt` — successful API response
  - `task-8-empty-data.txt` — graceful empty handling

  **Commit**: YES
  - Message: `feat(coder): add token-analytics API endpoints for session/tool/trend data`
  - Files: apps/coder/src/routes/token-analytics.ts, apps/coder/src/services/token-analytics.ts

---

## TODOs (Wave 3)

- [ ] 9. results-page — /results route for orchestrator runs + arena battles

  **What to do**:
  - Add sidebar nav button with `ScrollText` icon (lucide-react), **above** the Token Analytics button
  - Create new `/results` route page with two tabs:
    - "Analysis Runs" — list orchestrator flow runs (research, code-review, investigate, etc.)
    - "Arena Battles" — list battle history
  - Each tab shows: status dot, name/type, band/battle-type, model, timing, error indicator
  - Completed runs show "View Report" link; completed battles show "View Analysis"
  - Uses existing API endpoints (no backend changes needed):
    - `GET /api/coder/runs?project_id=X`
    - `GET /api/coder/battles?project_id=X`
  - Requires `project_id` context — load from sidebar on mount, or show project selector
  - Follow existing route patterns in web (React Router routes, lazy loading)

  **Must NOT do**:
  - Do NOT create new API endpoints
  - Do NOT modify existing API contracts
  - Do NOT add pagination beyond what the API already provides
  - Do NOT add real-time updates (static list, refreshed on mount)

  **Recommended Agent Profile**:
  - **Category**: `visual-engineering`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 3 (with Tasks 10-11)
  - **Blocks**: F1-F4
  - **Blocked By**: Task 1-5 (Wave 1)

  **References**:
  - `apps/web/src/routes/` — existing route patterns (analytics, settings)
  - `apps/web/src/components/sidebar/` — nav button patterns
  - `apps/web/src/api/` — existing API client
  - `openspec/changes/results-page/proposal.md` — full spec
  - `apps/coder/src/routes/runs.ts` — runs endpoint
  - `apps/coder/src/routes/battles.ts` — battles endpoint

  **Acceptance Criteria**:
  - [ ] Sidebar shows "Results" button with ScrollText icon above Token Analytics
  - [ ] Clicking navigates to `/results`
  - [ ] "Analysis Runs" tab loads and displays orchestrator flow history
  - [ ] "Arena Battles" tab loads and displays battle history
  - [ ] Completed runs show "View Report" link
  - [ ] Empty state shown when no data
  - [ ] Error state shown on API failure

  **QA Scenarios**:

  ```
  Scenario: Nav button renders
    Tool: Playwright
    Preconditions: Web app loaded
    Steps:
      1. Navigate to /
      2. Look for sidebar nav button with text "Results"
      3. Assert: button exists and links to /results
    Expected Result: Results nav button present
    Evidence: .omo/evidence/task-9-nav-button.png
  ```

  ```
  Scenario: Results page loads
    Tool: Playwright
    Preconditions: Web app loaded, project exists
    Steps:
      1. Navigate to /results
      2. Wait for "Analysis Runs" tab to appear
      3. Assert: tab shows list of runs or empty state
    Expected Result: Page loads with data
    Evidence: .omo/evidence/task-9-results-page.png
  ```

  **Evidence to Capture**:
  - `task-9-nav-button.png` — screenshot of sidebar with Results button
  - `task-9-results-page.png` — screenshot of /results page with data

  **Commit**: YES
  - Message: `feat(web): add /results page for orchestrator runs and arena battle history`
  - Files: apps/web/src/routes/results.tsx, apps/web/src/components/sidebar/*.tsx

- [ ] 10. token-analyzer-ui — /analytics dashboard route

  **What to do**:
  - Add sidebar nav button with appropriate icon, **above Settings** button
  - Create new `/analytics` route page showing token usage dashboard:
    - Aggregate token usage across sessions (total input/output tokens)
    - Per-tool cost breakdown (bar chart or table)
    - Per-session token history (list or mini chart)
    - Per-provider cost comparison
  - Reuse existing data from the backend endpoints created in Task 8
  - Follow the same route/nav patterns as results-page

  **Must NOT do**:
  - Do NOT add new charting libraries (use what's already available)
  - Do NOT implement real-time updates

  **Recommended Agent Profile**:
  - **Category**: `visual-engineering`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 3 (with Tasks 9, 11)
  - **Blocks**: F1-F4
  - **Blocked By**: Tasks 1-5 (Wave 1), Task 8 (backend endpoints)

  **References**:
  - Same as Task 9 + Task 8 endpoints
  - `openspec/changes/token-analyzer-ui/proposal.md` — full spec
  - `apps/web/src/components/` — existing chart/list components

  **Acceptance Criteria**:
  - [ ] Sidebar shows "Token Analytics" button above Settings
  - [ ] `/analytics`
    loads and shows token dashboard
  - [ ] Per-session, per-tool, per-provider breakdowns visible
  - [ ] Empty state shown when no data

  **QA Scenarios**:

  ```
  Scenario: Token Analytics nav button renders
    Tool: Playwright
    Preconditions: Web app loaded
    Steps:
      1. Navigate to /
      2. Look for "Token Analytics" button in sidebar
      3. Assert: button exists above Settings
    Expected Result: Nav button present
    Evidence: .omo/evidence/task-10-nav-button.png
  ```

  ```
  Scenario: Analytics dashboard loads
    Tool: Playwright
    Preconditions: Web app loaded
    Steps:
      1. Navigate to /analytics
      2. Wait for dashboard content to render
      3. Assert: token usage data is visible
    Expected Result: Dashboard shows data
    Evidence: .omo/evidence/task-10-analytics-dashboard.png
  ```

  **Evidence to Capture**:
  - `task-10-nav-button.png`
  - `task-10-analytics-dashboard.png`

  **Commit**: YES
  - Message: `feat(web): add /analytics route for token usage dashboard`
  - Files: `apps/web/src/routes/analytics.tsx`, `apps/web/src/components/sidebar/*.tsx`

- [ ] 11.
  enhanced-file-panel — Side-by-side diff, hide whitespace, wrap lines, expand/collapse all

  **What to do**:
  - Add side-by-side diff toggle to the Git diff tab in the file panel
  - Add "Hide whitespace" checkbox that filters whitespace-only changes
  - Add "Wrap long lines" toggle for diff display
  - Add "Expand All" / "Collapse All" buttons for file-level diffs
  - Implement in `apps/web/src/components/` following existing file panel patterns
  - Read `apps/web/src/components/` to find the existing diff rendering components

  **Must NOT do**:
  - Do NOT implement inline diff comments (deferred)
  - Do NOT implement in-browser file editing (deferred)
  - Do NOT change the backend diff generation logic

  **Recommended Agent Profile**:
  - **Category**: `visual-engineering`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 3 (with Tasks 9-10)
  - **Blocks**: F1-F4
  - **Blocked By**: Tasks 1-5 (Wave 1)

  **References**:
  - `apps/web/src/components/` — existing file panel and diff components
  - `apps/web/src/hooks/` — hooks for diff state management
  - `openspec/changes/enhanced-file-panel/proposal.md` — full spec
  - `apps/server/src/routes/projects.ts` — git diff backend route

  **Acceptance Criteria**:
  - [ ] Side-by-side diff toggles correctly
  - [ ] Hide whitespace checkbox filters whitespace changes
  - [ ] Wrap long lines toggle works
  - [ ] Expand/Collapse All buttons toggle all files
  - [ ] All changes are frontend-only (no new API calls)

  **QA Scenarios**:

  ```
  Scenario: Side-by-side diff renders
    Tool: Playwright
    Preconditions: Repo with uncommitted changes
    Steps:
      1. Open file panel
      2. Click Git tab
      3. Toggle side-by-side view
      4. Assert: diff renders in two columns
    Expected Result: Side-by-side diff visible
    Evidence: .omo/evidence/task-11-side-by-side.png
  ```

  ```
  Scenario: Hide whitespace works
    Tool: Playwright
    Preconditions: Diff has whitespace changes
    Steps:
      1. Open diff with whitespace changes
      2. Check "Hide whitespace"
      3.
         Assert: whitespace-only hunks hidden
    Expected Result: Whitespace-only changes filtered
    Evidence: .omo/evidence/task-11-hide-whitespace.png
  ```

  ```
  Scenario: Expand/Collapse All toggles
    Tool: Playwright
    Preconditions: Multiple files changed
    Steps:
      1. Click "Collapse All"
      2. Assert: all files collapsed to summary
      3. Click "Expand All"
      4. Assert: all files expanded
    Expected Result: Bulk toggle works
    Evidence: .omo/evidence/task-11-expand-collapse.png
  ```

  **Evidence to Capture**:
  - `task-11-side-by-side.png`
  - `task-11-hide-whitespace.png`
  - `task-11-expand-collapse.png`

  **Commit**: YES
  - Message: `feat(web): enhanced file panel — side-by-side diff, hide whitespace, wrap lines, expand/collapse all`
  - Files: `apps/web/src/components/*.tsx`, `apps/web/src/hooks/*.ts`

---

## Final Verification Wave

- [ ] F1. **Plan Compliance Audit** — `oracle`
  Read the plan end-to-end. For each "Must Have": verify implementation exists (read file, curl endpoint, run command). For each "Must NOT Have": search codebase for forbidden patterns — reject with file:line if found. Check evidence files exist in .omo/evidence/. Compare deliverables against plan.
  Output: `Must Have [N/N] | Must NOT Have [N/N] | Tasks [N/N] | VERDICT: APPROVE/REJECT`

- [ ] F2. **Code Quality Review** — `unspecified-high`
  Run `tsc --noEmit` for any changed apps + `bun test`. Review all changed files for: `as any`/`@ts-ignore`, empty catches, console.log in prod, commented-out code, unused imports.
  Output: `Build [PASS/FAIL] | Lint [PASS/FAIL] | Tests [N pass/N fail] | Files [N clean/N issues] | VERDICT`

- [ ] F3. **Real Manual QA** — `unspecified-high`
  Start from clean state. Execute EVERY QA scenario from EVERY task — follow exact steps, capture evidence. Test cross-task integration (features working together, not isolation). Test edge cases: empty state, invalid input, missing project_id. Save to `.omo/evidence/final-qa/`.
  Output: `Scenarios [N/N pass] | Integration [N/N] | Edge Cases [N tested] | VERDICT`

- [ ] F4.
  **Scope Fidelity Check** — `deep`
  For each task: read "What to do", read actual diff. Verify 1:1 — everything in scope was built (no missing), nothing beyond scope was built (no creep). Check "Must NOT do" compliance.
  Output: `Tasks [N/N compliant] | Contamination [CLEAN/N issues] | Unaccounted [CLEAN/N files] | VERDICT`

---

## Commit Strategy

- **1-5** (grouped): `chore(openspec): cleanup openspec folder structure — delete stubs, move proposals, add metadata, populate config`
- **6**: `perf(llama): enable KV cache quantization (q4_0) + ngram speculative decoding`
- **7**: `feat(booterm): PTY exit notifications + session metadata + X-Agent-Flags`
- **8**: `feat(coder): add token-analytics API endpoints`
- **9**: `feat(web): add /results page for orchestrator runs and arena battle history`
- **10**: `feat(web): add /analytics route for token usage dashboard`
- **11**: `feat(web): enhanced file panel — side-by-side diff, hide whitespace, wrap lines, expand/collapse all`

---

## Success Criteria

### Verification Commands
```bash
# OpenSpec cleanup
test ! -f openspec/changes/archived/v1.13.12-skills-audit.md
test -d openspec/changes/boocontext/
test -f openspec/changes/enhanced-file-panel/.openspec.yaml
grep -q "context:" openspec/config.yaml

# llama-cache-and-spec
ps aux | grep llama-server | grep -o "cache-type-k q4_0"
ps aux | grep llama-server | grep -o "spec-type ngram-mod"

# PTY enhancements
curl -s http://localhost:9501/api/pty/list | jq '.'

# Results page
curl -s "http://localhost:3000/api/coder/runs?project_id=1" | jq '.'

# Token analytics
curl -s "http://localhost:3000/api/coder/token-analytics/sessions?project_id=1" | jq '.'

# Enhanced file panel
# (visual verification via Playwright)
```

### Final Checklist
- [ ] 11 stub files deleted from archived/
- [ ] 5 misplaced proposals moved/merged into changes/
- [ ] 6 .openspec.yaml files added
- [ ] config.yaml populated with context + rules
- [ ] 10 archived proposals annotated with shipped versions
- [ ] llama-server running with KV cache Q4_0 + ngram
- [ ] PTY exit notifications working
- [ ] `/results` page renders and loads data
- [ ] `/analytics` page renders and loads data
- [ ] Side-by-side diff, hide whitespace, wrap lines, expand/collapse all working
- [ ] All type checks pass
- [ ] All QA scenarios pass
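
One way to satisfy the Task 11 "Hide whitespace" criterion without touching the backend is a pure client-side filter over already-parsed hunks. The sketch below is a minimal illustration, not the project's implementation: the `DiffLine` and `DiffHunk` shapes and the function names `isWhitespaceOnlyHunk` and `visibleHunks` are assumptions, since the plan does not describe the data model used by the existing diff components.

```typescript
// Hypothetical data model: the plan does not specify how diff hunks are represented.
type DiffLineKind = "add" | "del" | "context";

interface DiffLine {
  kind: DiffLineKind;
  text: string;
}

interface DiffHunk {
  header: string; // e.g. "@@ -1 +1 @@"
  lines: DiffLine[];
}

// Concatenate the text of all lines of one kind with every whitespace character removed.
function compact(lines: DiffLine[], kind: DiffLineKind): string {
  return lines
    .filter((line) => line.kind === kind)
    .map((line) => line.text.replace(/\s+/g, ""))
    .join("");
}

// True when the removed and added sides are identical once whitespace is ignored.
function isWhitespaceOnlyHunk(hunk: DiffHunk): boolean {
  return compact(hunk.lines, "del") === compact(hunk.lines, "add");
}

// Returns the hunks to render; the input array is never mutated.
function visibleHunks(hunks: DiffHunk[], hideWhitespace: boolean): DiffHunk[] {
  return hideWhitespace ? hunks.filter((h) => !isWhitespaceOnlyHunk(h)) : hunks;
}

// Example input: one spacing-only change and one real change.
const exampleHunks: DiffHunk[] = [
  {
    header: "@@ -1 +1 @@",
    lines: [
      { kind: "del", text: "foo( a )" },
      { kind: "add", text: "foo(a)" },
    ],
  },
  {
    header: "@@ -5 +5 @@",
    lines: [
      { kind: "del", text: "return 1;" },
      { kind: "add", text: "return 2;" },
    ],
  },
];

console.log(visibleHunks(exampleHunks, true).map((h) => h.header)); // logs ["@@ -5 +5 @@"]
```

Keeping the filter as a pure function means the checkbox state can live in a hook under `apps/web/src/hooks/` while the backend diff route stays unchanged, which matches the "frontend-only" acceptance criterion. Note that this comparison ignores all whitespace, including indentation changes, so languages where indentation is significant would have such hunks hidden too.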