# Openspec Cleanup & High-Value Batch Implementation

## TL;DR

> **Quick Summary**: Clean up the `openspec/` folder structure (delete 11 stub files, move 5 misplaced proposals, add missing `.openspec.yaml` files), then implement 5 high-value batches: llama-cache-and-spec, pty-enhancements, results-page, token-analyzer-ui, and enhanced-file-panel.
>
> **Deliverables**:
> - Clean openspec folder: stubs removed, archived/ accurate, all batches schema-compliant
> - llama-server KV cache quantization + ngram speculative decoding enabled
> - PTY exit notifications and session metadata
> - `/results` page for orchestrator runs and arena battles (new route)
> - `/analytics` page for token usage dashboard (new route)
> - Enhanced file panel: side-by-side diff, hide whitespace, wrap lines, expand/collapse all
>
> **Estimated Effort**: Medium-Large
> **Parallel Execution**: YES — 3 waves + final verification
> **Critical Path**: Cleanup → Backend impls → Frontend impls → Integration

---

## Context

### Original Request
Analyze `openspec/` folder for structural issues, cross-reference against git tags, and create a work plan for implementing the high-value openspec batch proposals.

### Interview Summary
**Key Discussions**:
- `openspec/changes/` has 22 active batches (all uncommitted, all unshipped) plus `archived/` with 29 entries
- 11 stub files in archived/ are pure noise (49-66 bytes each, apart from the 222-byte `v2.2-paseo-providers.md`; "Status: Shipped. Archived." only)
- 5 misplaced 2026-06-07 proposals were dumped in archived/ — they're active design docs, not shipped batches
- 6 active batches missing `.openspec.yaml`; `openspec/config.yaml` is empty
- Active proposals overlap: multiple batches cover evaluation, memory, and workflow engine territory

**Research Findings**:
- Git tag cross-reference confirms all folder-based archived entries match shipped tags
- 3 stub files reference wrong tags (v1.13.12→v1.13.14, v1.14.x→v1.13.19, etc.)
- All 22 active batches have zero git references — pure filesystem artifacts
- No active batch has shipped yet — zero can be archived

### Metis Review
Identified gaps:
1. **Deduplication needed**: 2026-06-07 proposals overlap with active changes/ — merging must happen before cleanup is complete
2. **Prioritization needed**: 22 batches can't all ship at once — need clear tiers
3. **User sign-off needed**: Which Tier 1-2 batches to include in this plan vs defer

---

## Work Objectives

### Core Objective
Restore openspec structural integrity and ship the 5 highest-value, lowest-effort batch proposals.

### Concrete Deliverables
- Clean openspec: stubs deleted (11 files, ~795 bytes in total per the sizes listed in Task 1), misplaced proposals moved (5 folders), `.openspec.yaml` files added (6 batches), config.yaml populated
- llama-cache-and-spec: KV cache quantization (Q4_0) + ngram speculative decoding enabled
- pty-enhancements: PTY exit notifications, session metadata, X-Agent-Flags
- results-page: `/results` route with Analysis Runs + Arena Battles tabs
- token-analyzer-ui: `/analytics` route with token usage dashboard
- enhanced-file-panel: side-by-side diff toggle, hide whitespace, wrap long lines, expand/collapse all

### Must Have
- All 11 stub files removed from archived/
- 5 misplaced 2026-06-07 proposals moved from archived/ into `changes/` (or merged into existing batches)
- `.openspec.yaml` added to all 6 missing batches
- `openspec/config.yaml` gets a `context:` block and `rules:` block
- llama-server restarts with new flags (verify via `ps aux | grep llama`)
- `/results` page loads without 404 and shows real data from existing API endpoints
- `/analytics` page loads and shows token aggregates
- Side-by-side diff renders correctly for files with wide lines

### Must NOT Have (Guardrails)
- **NO** breaking changes to existing routes or API contracts
- **NO** new database tables or migrations (all data sources already exist)
- **NO** external API dependencies (no cloud embedding models)
- **NO** behavioral engine or Pregel state machine work (deferred to future batch)
- **NO** touching the conductor flow runner or orchestrator pipeline
- **NO** CSS framework changes (stay on Tailwind v4 / shadcn/ui)
- **NO** backend changes unless explicitly required by the batch scope

### Spec Framework Integration
- **Detected Framework**: OpenSpec (folder structure only — no CLI)
- **Config File**: `openspec/config.yaml`
- **Active Specs**: 22 batch folders in `openspec/changes/`
- **Available Commands**: Manual folder/file operations (no OpenSpec CLI)

---

## Verification Strategy

> **ZERO HUMAN INTERVENTION** — ALL verification is agent-executed.

### Test Decision
- **Infrastructure exists**: YES (vitest in apps/server, apps/coder)
- **Automated tests**: Tests-after (no TDD — these are config/frontend changes)
- **Framework**: vitest for backend, Playwright for frontend verification

### QA Policy
Every task includes agent-executed QA scenarios. Evidence saved to `.omo/evidence/`.

- **Frontend**: Playwright — navigate, assert DOM elements, screenshot
- **Backend**: Bash (curl) — send requests, assert status + response
- **Config/Restart**: Bash — check processes, verify new flags
- **File operations**: Bash — verify files exist/deleted with `test -f` / `test ! -f`
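
The evidence convention above could be wrapped in a small helper so that every check records its command, output, and exit code in one file. This is a minimal sketch: `record_evidence` and the `EVIDENCE_DIR` override are hypothetical names, not part of the existing tooling.

```shell
# Hypothetical helper: run a check and save its output under .omo/evidence/.
# $1 = evidence file name, remaining arguments = the command to run.
record_evidence() {
  re_dir="${EVIDENCE_DIR:-.omo/evidence}"
  re_file="$re_dir/$1"
  shift
  mkdir -p "$re_dir"
  {
    printf '$ %s\n' "$*"
    "$@"
    re_rc=$?
    printf 'exit code: %s\n' "$re_rc"
  } > "$re_file" 2>&1
  # Propagate the check's own exit status so callers can still assert on it.
  return "$re_rc"
}
```

For example, `record_evidence task-1-stubs-deleted.txt test ! -f openspec/changes/archived/v2.0-boocoder.md` would both run the check and leave a transcript behind.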

---

## Execution Strategy

```
Wave 1 (Structural Cleanup — quick, MAX PARALLEL):
├── Task 1: Delete 11 stub files from archived/ [quick]
├── Task 2: Move 5 misplaced 2026-06-07 proposals → changes/ [writing]
├── Task 3: Add .openspec.yaml to 6 missing batches [quick]
├── Task 4: Populate openspec/config.yaml with project context [writing]
├── Task 5: Add shipped status metadata to archived/ entries [quick]

Wave 2 (Backend — moderate, MAX PARALLEL):
├── Task 6: llama-cache-and-spec — KV cache + ngram flags [unspecified-high]
├── Task 7: pty-enhancements — exit notifications + session metadata [unspecified-high]
├── Task 8: token-analyzer-ui — backend API endpoints [unspecified-high]

Wave 3 (Frontend — moderate, MAX PARALLEL):
├── Task 9: results-page — /results route [visual-engineering]
├── Task 10: token-analyzer-ui — /analytics route [visual-engineering]
├── Task 11: enhanced-file-panel — diff modes + UI [visual-engineering]

Wave FINAL (Verification — 4 parallel reviews):
├── Task F1: Plan compliance audit [oracle]
├── Task F2: Code quality + type check [unspecified-high]
├── Task F3: Real QA — execute every scenario [unspecified-high + playwright]
└── Task F4: Scope fidelity check [deep]

Critical Path: Cleanup → Backend → Frontend → Integration
Parallel Speedup: ~60% faster than sequential
Max Concurrent: 5 (Wave 1)
```
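
The wave rule (every task in a wave starts together, and the next wave begins only once all of them have finished) could be driven by a small shell function. A sketch, assuming each task can be launched as a single shell command; `run_wave` and the task command strings are hypothetical.

```shell
# Hypothetical wave runner: start every task command in the background,
# then wait for all of them. Returns non-zero if any task failed.
run_wave() {
  rw_pids=""
  for rw_cmd in "$@"; do
    sh -c "$rw_cmd" &
    rw_pids="$rw_pids $!"
  done
  rw_status=0
  for rw_pid in $rw_pids; do
    # wait returns the exit status of that specific background task
    wait "$rw_pid" || rw_status=1
  done
  return "$rw_status"
}
```

Chaining calls with `&&` (one `run_wave` per wave) keeps a later wave from starting if an earlier one failed.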

---

## TODOs (Wave 1)

- [ ] 1. Delete 11 stub files from archived/

  **What to do**:
  - Remove these 11 files from `openspec/changes/archived/`:
    - `v1.13.12-skills-audit.md` (57B, wrong tag ref)
    - `v1.13.15-codecontext-synth.md` (62B)
    - `v1.13.17-cross-repo-reads.md` (61B)
    - `v1.13.18-codecontext-file-path.md` (66B)
    - `v1.13.20-drop-legacy-cols.md` (61B)
    - `v1.14-outer-loop.md` (52B)
    - `v1.14.1-mcp-poc.md` (51B)
    - `v1.14.x-html-artifact-panes.md` (63B, wrong tag ref)
    - `v1.15-mcp-multi.md` (51B)
    - `v2.0-boocoder.md` (49B)
    - `v2.2-paseo-providers.md` (222B)
  - Each file contains ONLY "# Title\n\n**Status:** Shipped. Archived.\n" — zero documentation value
  - Git history preserves the knowledge; CHANGELOG.md + tags are the authoritative record

  **Must NOT do**:
  - Do NOT delete any folder-based archived entries (they have real content)
  - Do NOT delete `boocode_batch10.md` or handoff files (they're valuable)

  **Recommended Agent Profile**:
  - **Category**: `quick`
  - **Skills**: `[]`
  - **Justification**: Trivial file deletion — no domain skills needed

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 1 (with Tasks 2-5)
  - **Blocks**: Tasks 6-11 (Waves 2-3), F1-F4
  - **Blocked By**: None

  **References**:
  - `openspec/changes/archived/` — target directory
  - `openspec/README.md` — schema definition
  - `~/.gitconfig` — no special config needed

  **Acceptance Criteria**:
  - [ ] `test ! -f openspec/changes/archived/v1.13.12-skills-audit.md` → success for all 11 files
  - [ ] `ls openspec/changes/archived/*.md` shows only allowed files (boocode_batch10.md, handoff_*)

  **QA Scenarios**:
  ```
  Scenario: Verify stubs deleted
    Tool: Bash
    Preconditions: Clean working tree
    Steps:
      1. For each stub file, run: test ! -f openspec/changes/archived/{filename}
      2. Assert: all 11 commands return exit code 0 (file does not exist)
      3. List remaining .md files: ls openspec/changes/archived/*.md
      4. Assert: only boocode_batch10.md and handoff_*.md files remain
    Expected Result: 11 stubs absent, 3 valuable files present
    Evidence: .omo/evidence/task-1-stubs-deleted.txt

  Scenario: Valuable files preserved
    Tool: Bash
    Preconditions: Stubs deleted
    Steps:
      1. test -f openspec/changes/archived/boocode_batch10.md
      2. test -f openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
      3. test -f openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
    Expected Result: All 3 return exit code 0
    Evidence: .omo/evidence/task-1-valuables-preserved.txt
  ```
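
  The two scenarios above could be run as one script. This is a minimal sketch: the function names `delete_stubs` and `verify_cleanup` are hypothetical, and plain `rm` is an assumption (use `git rm` instead if the stubs are tracked).

  ```shell
  # Sketch only: file names are copied from the Task 1 list above.
  STUBS="v1.13.12-skills-audit.md v1.13.15-codecontext-synth.md \
  v1.13.17-cross-repo-reads.md v1.13.18-codecontext-file-path.md \
  v1.13.20-drop-legacy-cols.md v1.14-outer-loop.md v1.14.1-mcp-poc.md \
  v1.14.x-html-artifact-panes.md v1.15-mcp-multi.md v2.0-boocoder.md \
  v2.2-paseo-providers.md"
  KEEP="boocode_batch10.md handoff_v1.13.10_per_tool_cost.md handoff_v1.13.8_prefix_verify.md"

  # $1 = archived directory, e.g. openspec/changes/archived
  delete_stubs() {
    for ds_f in $STUBS; do
      rm -f -- "$1/$ds_f"
    done
  }

  # Returns 0 only if all 11 stubs are gone and the 3 valuable files remain.
  verify_cleanup() {
    vc_status=0
    for vc_f in $STUBS; do
      if [ -e "$1/$vc_f" ]; then
        echo "FAIL: stub still present: $vc_f"
        vc_status=1
      fi
    done
    for vc_f in $KEEP; do
      if [ ! -f "$1/$vc_f" ]; then
        echo "FAIL: valuable file missing: $vc_f"
        vc_status=1
      fi
    done
    return "$vc_status"
  }
  ```

  Run from the repo root, `delete_stubs openspec/changes/archived` followed by `verify_cleanup openspec/changes/archived` covers both scenarios in one pass.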

  **Evidence to Capture**:
  - `task-1-stubs-deleted.txt` — confirmation each stub is gone
  - `task-1-valuables-preserved.txt` — confirmation valuable files remain

  **Commit**: YES
  - Message: `chore(openspec): delete 11 stub archive files with zero documentation value`
  - Files: openspec/changes/archived/v1.13.12-skills-audit.md, ...

- [ ] 2. Move 5 misplaced 2026-06-07 proposals from archived/ to changes/

  **What to do**:
  - Relocate these 5 folders out of `openspec/changes/archived/2026-06-07-*`: move (1) into `openspec/changes/` as-is and merge (2-5) into existing batches:
    1. `archived/2026-06-07-boocontext/` → `changes/boocontext/` (partially shipped in v2.8.0)
    2. `archived/2026-06-07-eval-sandbox-agent-runtime/` → merge into `changes/import-llm-evaluator/` and `changes/import-pregel-engine/` (overlapping scope)
    3. `archived/2026-06-07-hybrid-workflow-engine/` → merge into `changes/orchestrator-flow-advanced/`
    4. `archived/2026-06-07-memory-context-engineering/` → merge into `changes/memory-context/`
    5. `archived/2026-06-07-port-audit-parlant-patterns/` → merge into `changes/add-behavioral-engine/` and `changes/audit-harness-integration/`
  - For merges (2-5): append relevant content from the 2026-06-07 proposal into the existing batch's proposal.md, tasks.md, design.md. The 2026-06-07 versions are "grand vision" — extract the concrete specs relevant to the narrower active batch.
  - For `boocontext/` (1): move as-is since it's a new slug with no direct collision.

  **Must NOT do**:
  - Do NOT delete the content of the 2026-06-07 folders — merge, don't discard
  - Do NOT create duplicate batch slugs
  - Do NOT overwrite existing proposal content — append/extend

  **Recommended Agent Profile**:
  - **Category**: `writing`
  - **Skills**: `[]`
  - **Justification**: File organization + content merging — technical writing task

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 1 (with Tasks 1, 3-5)
  - **Blocks**: Tasks 6-11 (Waves 2-3), F1-F4
  - **Blocked By**: None

  **References**:
  - `openspec/changes/archived/2026-06-07-*/` — source folders
  - `openspec/changes/import-llm-evaluator/` — target for eval overlap
  - `openspec/changes/import-pregel-engine/` — target for graph overlap
  - `openspec/changes/orchestrator-flow-advanced/` — target for workflow overlap
  - `openspec/changes/memory-context/` — target for memory overlap
  - `openspec/changes/add-behavioral-engine/` — target for port patterns
  - `openspec/changes/audit-harness-integration/` — target for audit patterns

  **Acceptance Criteria**:
  - [ ] `openspec/changes/boocontext/` exists with proposal.md + tasks.md + design.md + specs/
  - [ ] `openspec/changes/import-llm-evaluator/` proposal.md now references eval-sandbox content
  - [ ] `openspec/changes/import-pregel-engine/` proposal.md now references graph engine content
  - [ ] `openspec/changes/orchestrator-flow-advanced/` proposal.md now references hybrid workflow
  - [ ] `openspec/changes/memory-context/` proposal.md now references context engineering
  - [ ] `openspec/changes/add-behavioral-engine/` and `audit-harness-integration/` now reference port patterns
  - [ ] `test ! -d openspec/changes/archived/2026-06-07-eval-sandbox-agent-runtime/` for each moved folder

  **QA Scenarios**:
  ```
  Scenario: boocontext moved
    Tool: Bash
    Preconditions: Files moved
    Steps:
      1. test -f openspec/changes/boocontext/proposal.md
      2. test -f openspec/changes/boocontext/tasks.md
      3. test ! -f openspec/changes/archived/2026-06-07-boocontext/proposal.md
    Expected Result: Files exist in new location, not in old
    Evidence: .omo/evidence/task-2-boocontext-moved.txt
  ```
  ```
  Scenario: Merged proposals updated
    Tool: Bash
    Preconditions: Files merged
    Steps:
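      1. For each target batch, grep its own proposal.md for the matching phrase, e.g. grep -qi "eval-sandbox" openspec/changes/import-llm-evaluator/proposal.md (likewise "graph engine", "hybrid workflow", "context engineering", "port patterns"); a single grep across openspec/changes/*/proposal.md would pass as soon as any one file matched
      2. Assert: every per-file grep returns exit code 0, i.e. each merged batch's proposal.md references the 2026-06-07 source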
    Expected Result: grep finds references in the right target files
    Evidence: .omo/evidence/task-2-merges-verified.txt
  ```
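
  A per-target version of the merge check could look like the sketch below. The phrase-to-batch pairing is taken from the acceptance criteria above; the helper names `check_one` and `check_merges` are hypothetical, and the phrases are assumed to appear literally in the merged text.

  ```shell
  # $1 = changes dir, $2 = batch slug, $3 = phrase expected in its proposal.md
  check_one() {
    if grep -qi -- "$3" "$1/$2/proposal.md" 2>/dev/null; then
      echo "ok: $2"
    else
      echo "FAIL: $2/proposal.md does not mention '$3'"
      return 1
    fi
  }

  # $1 = changes dir, e.g. openspec/changes
  # Unlike one grep over all proposals, this fails if ANY target lacks its phrase.
  check_merges() {
    cm_status=0
    check_one "$1" import-llm-evaluator "eval-sandbox" || cm_status=1
    check_one "$1" import-pregel-engine "graph engine" || cm_status=1
    check_one "$1" orchestrator-flow-advanced "hybrid workflow" || cm_status=1
    check_one "$1" memory-context "context engineering" || cm_status=1
    check_one "$1" add-behavioral-engine "port patterns" || cm_status=1
    check_one "$1" audit-harness-integration "port patterns" || cm_status=1
    return "$cm_status"
  }
  ```

  Its output can be redirected straight into `.omo/evidence/task-2-merges-verified.txt`.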

  **Evidence to Capture**:
  - `task-2-boocontext-moved.txt`
  - `task-2-merges-verified.txt`

  **Commit**: YES (groups with Task 1)
  - Message: `chore(openspec): move 5 misplaced proposals from archived/ → changes/, merge overlapping content`
  - Files: openspec/changes/boocontext/*, openspec/changes/*/proposal.md, openspec/changes/*/tasks.md

- [ ] 3. Add .openspec.yaml to 6 missing batches

  **What to do**:
  - Create `.openspec.yaml` in each of these 6 active batches:
    - `enhanced-file-panel/`
    - `llama-cache-and-spec/`
    - `memory-v2-hybrid-search/`
    - `omo-paseo-bridge/`
    - `orchestrator-flow-advanced/`
    - `results-page/`
  - Each file must contain:
    ```yaml
    schema: spec-driven
    created: 2026-06-07
    ```

  **Must NOT do**:
  - Do NOT modify existing proposal.md or tasks.md content
  - Do NOT add .openspec.yaml to batches that already have one

  **Recommended Agent Profile**:
  - **Category**: `quick`
  - **Skills**: `[]`
  - **Justification**: Trivial boilerplate file creation

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 1 (with Tasks 1, 2, 4, 5)
  - **Blocks**: Tasks 6-11 (Waves 2-3), F1-F4
  - **Blocked By**: None

  **References**:
  - `openspec/changes/add-3tier-memory/.openspec.yaml` — template

  **Acceptance Criteria**:
  - [ ] All 6 created files contain `schema: spec-driven`
  - [ ] `find openspec/changes/ -name ".openspec.yaml" | wc -l` counts all expected files

  **QA Scenarios**:
  ```
  Scenario: All .openspec.yaml files present
    Tool: Bash
    Preconditions: Files created
    Steps:
      1. For each batch: test -f openspec/changes/{batch}/.openspec.yaml
      2. For each: grep -q "schema: spec-driven" openspec/changes/{batch}/.openspec.yaml
    Expected Result: All 6 files exist with correct content
    Evidence: .omo/evidence/task-3-openspec-yaml-added.txt
  ```
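
  Creating the six files without touching any batch that already has one could be scripted as below. A minimal sketch: `add_openspec_yaml` is a hypothetical helper name, and the file body is the two-line template given above.

  ```shell
  # Batch slugs copied from the Task 3 list above.
  BATCHES="enhanced-file-panel llama-cache-and-spec memory-v2-hybrid-search \
  omo-paseo-bridge orchestrator-flow-advanced results-page"

  # $1 = changes dir, e.g. openspec/changes
  add_openspec_yaml() {
    for ay_b in $BATCHES; do
      ay_target="$1/$ay_b/.openspec.yaml"
      if [ -e "$ay_target" ]; then
        # Guardrail: never overwrite an existing .openspec.yaml
        echo "skip (exists): $ay_target"
      elif [ -d "$1/$ay_b" ]; then
        printf 'schema: spec-driven\ncreated: 2026-06-07\n' > "$ay_target"
        echo "created: $ay_target"
      else
        echo "FAIL: batch folder missing: $ay_b"
        return 1
      fi
    done
  }
  ```

  The skip branch makes the script safe to re-run after a partial failure.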

  **Evidence to Capture**:
  - `task-3-openspec-yaml-added.txt`

  **Commit**: YES (groups with Task 1)
  - Message: `chore(openspec): add .openspec.yaml to 6 missing batch folders`
  - Files: openspec/changes/enhanced-file-panel/.openspec.yaml, ...

- [ ] 4. Populate openspec/config.yaml with project context

  **What to do**:
  - Replace the near-empty `openspec/config.yaml` (currently only the `schema: spec-driven` line, 20 bytes) with a populated version:
    ```yaml
    schema: spec-driven

    context: |
      Tech stack: TypeScript, React 18, Vite, Tailwind v4, shadcn/ui, Fastify, PostgreSQL 16, pnpm workspaces
      Apps: BooChat (read-only chat), BooCoder (write tools + agent dispatch), BooTerm (PTY terminals), Orchestrator (multi-agent conductor)
      Infrastructure: Docker Compose, Tailscale (100.114.205.53), Authelia auth, llama-swap inference
      Monorepo: apps/server, apps/web, apps/booterm, apps/coder, packages/contracts
      Commits: conventional commits, strict TypeScript, NodeNext module resolution
      Testing: vitest (server + coder), Playwright (web E2E), no root tsconfig

    rules:
      proposal:
        - Every proposal must have a "Why" section explaining the motivation
        - Every proposal must have a "What Changes" section enumerating deliverables
        - Include "Must Have" / "Must NOT Have" guardrails
        - Reference shipped git tags when applicable
      tasks:
        - Tasks must be ordered by dependency, not priority
        - Each task is one atomic change (file, config, or command)
        - Parallel tasks go in the same wave
    ```

  **Must NOT do**:
  - Do NOT delete the `schema: spec-driven` line

  **Recommended Agent Profile**:
  - **Category**: `writing`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 1 (with Tasks 1-3, 5)
  - **Blocks**: Tasks 6-11 (Waves 2-3), F1-F4
  - **Blocked By**: None

  **References**:
  - `openspec/config.yaml` — current (empty) file
  - `/home/samkintop/opt/boocode/CLAUDE.md` — source for context info

  **Acceptance Criteria**:
  - [ ] `grep -q "context:" openspec/config.yaml` → success
  - [ ] `grep -q "rules:" openspec/config.yaml` → success
  - [ ] config.yaml is larger than 500 bytes (was 20 bytes)

  **QA Scenarios**:
  ```
  Scenario: config.yaml populated
    Tool: Bash
    Preconditions: File written
    Steps:
      1. wc -c openspec/config.yaml → assert > 500 bytes
      2. grep -q "context:" openspec/config.yaml
      3. grep -q "rules:" openspec/config.yaml
      4. grep -q "schema: spec-driven" openspec/config.yaml
    Expected Result: All assertions pass
    Evidence: .omo/evidence/task-4-config-populated.txt
  ```

  **Evidence to Capture**:
  - `task-4-config-populated.txt`

  **Commit**: YES (groups with Task 1)
  - Message: `chore(openspec): populate config.yaml with project context and rules`
  - Files: openspec/config.yaml

- [ ] 5. Add shipped-status metadata to 10 archived folder entries

  **What to do**:
  - Add frontmatter or status line to each archived folder's proposal.md documenting the shipped version:
    - `agent-status-normalize/` → `v2.7.6`
    - `claude-sdk-sessionstore/` → `v2.7.5`
    - `contracts-ssot/` → `v2.7.13`
    - `license-debt-mit/` → `v2.7.0`
    - `mistake-tracker-file-ledger/` → `v2.7.4`
    - `orchestrator/` → `v2.7.17`
    - `sampling-streamjson-tokens/` → `v2.7.3`
    - `v2-3-provider-lifecycle/` → `v2.5.4`–`v2.5.13`
    - `v2-6-persistent-agent-sessions/` → `v2.6.4`–`v2.6.8`
    - `write-edit-robustness/` → `v2.7.1`
  - Add a line directly after the `## Why` section heading, e.g. `` **Shipped in:** `v2.7.6-agent-status-normalize` `` (using each batch's own tag or tag range)

  **Must NOT do**:
  - Do NOT change the body of the proposal beyond the shipped annotation
  - Do NOT add shipped annotations to the 2026-06-07 batches (they're not shipped)

  **Recommended Agent Profile**:
  - **Category**: `quick`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 1 (with Tasks 1-4)
  - **Blocks**: Tasks 6-11 (Waves 2-3), F1-F4
  - **Blocked By**: None

  **References**:
  - Git tags: `v2.7.0-mit`, `v2.7.1-write-edit-robustness`, etc.

  **Acceptance Criteria**:
  - [ ] All 10 archived batch proposals contain "Shipped in:" referencing a git tag
  - [ ] `grep -r "Shipped in:" openspec/changes/archived/*/proposal.md | wc -l` = 10

  **QA Scenarios**:
  ```
  Scenario: All archived batches annotated
    Tool: Bash
    Preconditions: Files edited
    Steps:
      1. grep -rl "Shipped in:" openspec/changes/archived/*/proposal.md | wc -l
      2. Assert: exactly 10 files contain "Shipped in:"
    Expected Result: 10 files annotated
    Evidence: .omo/evidence/task-5-shipped-annotations.txt
  ```
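
  The annotation itself could be applied with a small idempotent helper. A sketch, assuming each proposal has a `## Why` heading on its own line; `annotate_shipped` is a hypothetical name, and the tag (or tag range) is passed in by the caller because only some full tag names are listed here.

  ```shell
  # $1 = path to an archived proposal.md, $2 = git tag (or range) it shipped in.
  # Inserts the annotation once, right after the "## Why" heading.
  annotate_shipped() {
    if grep -q 'Shipped in:' "$1"; then
      return 0  # already annotated, leave the file alone
    fi
    awk -v tag="$2" '
      { print }
      /^## Why[ \t]*$/ && !done {
        print ""
        print "**Shipped in:** `" tag "`"
        done = 1
      }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    # Fails (non-zero) if no "## Why" heading was found, so nothing was added.
    grep -q 'Shipped in:' "$1"
  }
  ```

  For example, `annotate_shipped openspec/changes/archived/agent-status-normalize/proposal.md v2.7.6-agent-status-normalize`. Only the annotation line and one blank line are added, so the rest of the proposal body stays untouched.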

  **Evidence to Capture**:
  - `task-5-shipped-annotations.txt`

  **Commit**: YES (groups with Task 1)
  - Message: `chore(openspec): add shipped-in version annotations to 10 archived batch proposals`
  - Files: openspec/changes/archived/*/proposal.md

---

## TODOs (Wave 2)

- [ ] 6. llama-cache-and-spec — Enable KV cache quantization + ngram speculative decoding

  **What to do**:
  - Read `apps/server/src/services/inference/llama-args-validator.ts` and `sidecar/validator.go` first to understand the current blocklist and shadow list
  - Edit `apps/server/src/services/inference/providers/llama.ts` (or the llama args validator `llama-args-validator.ts`) so that `--cache-type-k q4_0` and `--spec-type ngram-mod` are on the allowlist rather than the shadow list
  - Change the base llama-server args to include:
    - `--cache-type-k q4_0` (4-bit K cache: q4_0 stores about 4.5 bits per value versus 16 for f16, so the K cache shrinks roughly 3.5×; the V cache is unaffected unless `--cache-type-v` is also set, so total KV memory drops by well under 4×)
    - `--spec-type ngram-mod` (ngram speculative decoding, up to 2-3× tok/s on code)
  - Verify the sidecar validator (`sidecar/validator.go`) also lets these flags through
  - Update the sidecar Dockerfile or config if needed

  **Must NOT do**:
  - Do NOT change any other llama-server args
  - Do NOT use any other cache quantization type (such as Q8_0 or Q3_K); only Q4_0 is in scope
  - Do NOT add a separate draft model (ngram is self-contained)

  **Recommended Agent Profile**:
  - **Category**: `unspecified-high`
  - **Skills**: `[]`
  - **Justification**: Requires understanding llama.cpp arg validation across two codebases

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 2 (with Tasks 7-8)
  - **Blocks**: F1-F4
  - **Blocked By**: Tasks 1-5 (Wave 1)

  **References**:
  - `apps/server/src/services/inference/llama-args-validator.ts` — current arg blocklist/allowlist
  - `sidecar/validator.go` — sidecar validation (if exists)
  - `docker-compose.yml` or sidecar Dockerfile — restart config
  - `openspec/changes/llama-cache-and-spec/proposal.md` — full spec

  **Acceptance Criteria**:
  - [ ] `--cache-type-k q4_0` present in llama-server args after restart
  - [ ] `--spec-type ngram-mod` present in llama-server args after restart
  - [ ] llama-server starts without errors
  - [ ] Inference still works (send test message)

  **QA Scenarios**:
  ```
  Scenario: KV cache quantization enabled
    Tool: Bash
    Preconditions: Server restarted after changes
    Steps:
      1. ps aux | grep llama-server | grep -o "cache-type-k q4_0"
      2. Assert: output matches "q4_0"
    Expected Result: KV cache quantization is active
    Evidence: .omo/evidence/task-6-kv-cache-enabled.txt
  ```
  ```
  Scenario: Speculative decoding enabled
    Tool: Bash
    Preconditions: Server restarted
    Steps:
      1. ps aux | grep llama-server | grep -o "spec-type ngram-mod"
      2. Assert: output matches "ngram-mod"
    Expected Result: Ngram speculative decoding is active
    Evidence: .omo/evidence/task-6-ngram-enabled.txt
  ```
  ```
  Scenario: Inference still works
    Tool: Bash (curl)
    Preconditions: Server running with new flags
    Steps:
      1. curl -s -o /dev/null -w "%{http_code}" http://100.114.205.53:9500/api/health
      2. Assert: HTTP 200
    Expected Result: Server is healthy and serving
    Evidence: .omo/evidence/task-6-health-check.txt
  ```
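
  The two flag scenarios above could share one matcher that looks for a flag and its value as adjacent, whole arguments. This is a sketch: `has_flag` and `check_llama_flags` are hypothetical names, and the process is assumed to be visible to `ps` on the host under the name `llama-server`.

  ```shell
  # $1 = full command line, $2 = flag, $3 = expected value.
  # Succeeds only if "flag value" appears as two adjacent, whole arguments.
  has_flag() {
    case " $1 " in
      *" $2 $3 "*) return 0 ;;
      *) return 1 ;;
    esac
  }

  # Looks up the running llama-server and checks both new flags.
  check_llama_flags() {
    # [l] keeps grep from matching its own command line
    clf_args=$(ps -eo args= | grep '[l]lama-server' | head -n 1)
    if [ -z "$clf_args" ]; then
      echo "FAIL: llama-server is not running"
      return 1
    fi
    has_flag "$clf_args" --cache-type-k q4_0 || { echo "FAIL: --cache-type-k q4_0 missing"; return 1; }
    has_flag "$clf_args" --spec-type ngram-mod || { echo "FAIL: --spec-type ngram-mod missing"; return 1; }
    echo "ok: both flags present"
  }
  ```

  Matching whole arguments avoids a false pass on a value that merely starts with `q4_0`.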

  **Evidence to Capture**:
  - `task-6-kv-cache-enabled.txt` — grep output showing the flag
  - `task-6-ngram-enabled.txt` — grep output showing the flag
  - `task-6-health-check.txt` — health check confirmation

  **Commit**: YES
  - Message: `perf(llama): enable KV cache quantization (q4_0) + ngram speculative decoding`
  - Files: apps/server/src/services/inference/llama-args-validator.ts, sidecar/validator.go (if needed)

- [ ] 7. pty-enhancements — PTY exit notifications + session metadata

  **What to do**:
  - Add `notifyOnExit` support to the PTY session manager (likely in `apps/booterm/`)
  - When a PTY process exits AND `notifyOnExit` was set:
    - Emit an event/message to the agent channel with: session ID, title, exit code, total output lines, last line of output
  - Add session metadata fields: agent ID that spawned it, task ID, optional title
  - Add `pty_list` endpoint that returns metadata for all sessions
  - Wire `X-Agent-Flags` header support for agent identification
  - Read `apps/booterm/` to understand the current PTY architecture

  **Must NOT do**:
  - Do NOT change the existing pty_spawn interface (add notifyOnExit as optional param)
  - Do NOT implement sandbox or circuit breaker (out of scope for this wave)
  - Do NOT add new database tables (metadata lives in-memory or in existing session store)

  **Recommended Agent Profile**:
  - **Category**: `unspecified-high`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 2 (with Tasks 6, 8)
  - **Blocks**: F1-F4
  - **Blocked By**: Tasks 1-5 (Wave 1)

  **References**:
  - `apps/booterm/src/` — PTY session management code
  - `apps/coder/src/services/` — agent dispatch that spawns PTYs
  - `openspec/changes/pty-enhancements/proposal.md` — full spec
  - `apps/server/src/services/inference/` — inference pipeline that may need to handle notifications

  **Acceptance Criteria**:
  - [ ] `notifyOnExit` optional parameter on pty_spawn works
  - [ ] On process exit with notifyOnExit=true, agent receives notification
  - [ ] `pty_list` returns session metadata
  - [ ] `X-Agent-Flags` header is recognized

  **QA Scenarios**:
  ```
  Scenario: notifyOnExit triggers notification
    Tool: Bash + tmux
    Preconditions: booterm running
    Steps:
      1. Start a short PTY with notifyOnExit=true: sleep 1
      2. Wait 2 seconds for completion
      3. Check notification was delivered
    Expected Result: Exit notification received with session ID, title, exit code, total output lines, and last line of output
    Evidence: .omo/evidence/task-7-notify-on-exit.txt
  ```
  ```
  Scenario: pty_list shows metadata
    Tool: Bash (curl)
    Preconditions: PTY sessions exist
    Steps:
      1. curl http://localhost:9501/api/pty/list 2>/dev/null
      2. Assert: response contains session metadata fields
    Expected Result: Metadata returned for each session
    Evidence: .omo/evidence/task-7-pty-list.txt
  ```

  **Evidence to Capture**:
  - `task-7-notify-on-exit.txt` — notification evidence
  - `task-7-pty-list.txt` — pty_list response

  **Commit**: YES
  - Message: `feat(booterm): PTY exit notifications + session metadata + X-Agent-Flags`
  - Files: apps/booterm/src/*.ts, apps/coder/src/services/*.ts

- [ ] 8. token-analyzer-ui — Backend API endpoints for token analytics

  **What to do**:
  - Add read-only API endpoints to serve aggregate token data:
    - `GET /api/coder/token-analytics/sessions` — per-session token usage (input, output, cost)
    - `GET /api/coder/token-analytics/tools` — per-tool cost breakdown (from tool_cost_stats view)
    - `GET /api/coder/token-analytics/trends` — token usage over time
  - Reuse existing data sources:
    - `agent_sessions.input_tokens`, `agent_sessions.output_tokens`, `agent_sessions.cost`
    - `tool_cost_stats` view (per-tool 100-call rolling window)
    - `tasks.token_breakdown` JSONB column
  - Implement in `apps/coder/src/routes/` (follow existing route patterns)
  - Add proper error handling, pagination for large result sets, and date filtering

  **Must NOT do**:
  - Do NOT create new database tables or migrations
  - Do NOT add token tracking logic (data is already accumulated)
  - Do NOT add real-time streaming (data is historical aggregate)

  **Recommended Agent Profile**:
  - **Category**: `unspecified-high`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 2 (with Tasks 6-7)
  - **Blocks**: Task 10 (frontend depends on backend), F1-F4
  - **Blocked By**: Tasks 1-5 (Wave 1)

  **References**:
  - `apps/coder/src/routes/` — existing route patterns
  - `apps/server/src/schema.sql` — `tool_cost_stats` view definition
  - `apps/coder/CLAUDE.md` — coder conventions, route registration
  - `packages/contracts/` — shared types for response schemas
  - `openspec/changes/token-analyzer-ui/proposal.md` — full spec

  **Acceptance Criteria**:
  - [ ] `GET /api/coder/token-analytics/sessions?project_id=X` returns 200 with token data
  - [ ] `GET /api/coder/token-analytics/tools?project_id=X` returns 200 with tool breakdown
  - [ ] `GET /api/coder/token-analytics/trends?project_id=X` returns 200 with trend data
  - [ ] All endpoints respect `project_id` filtering
  - [ ] Empty data returns valid empty arrays (not errors)

  **QA Scenarios**:
  ```
  Scenario: Sessions endpoint works
    Tool: Bash (curl)
    Preconditions: Server running, project exists
    Steps:
      1. curl -s -w "\n%{http_code}\n" "http://localhost:3000/api/coder/token-analytics/sessions?project_id=1"
      2. Assert: the final line of output (the status code) is 200
      3. Assert: the preceding output is valid JSON with expected fields
    Expected Result: Session token data returned
    Evidence: .omo/evidence/task-8-sessions-endpoint.txt
  ```
  ```
  Scenario: Empty data returns valid response
    Tool: Bash (curl)
    Preconditions: Server running
    Steps:
      1. curl -s -w "\n%{http_code}\n" "http://localhost:3000/api/coder/token-analytics/sessions?project_id=999"
      2. Assert: the final line of output (the status code) is 200
      3. Assert: the preceding output contains an empty array (not an error)
    Expected Result: Graceful empty state
    Evidence: .omo/evidence/task-8-empty-data.txt
  ```
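
  Because plain `curl -s` prints only the body, the status assertion needs the code captured separately. A minimal sketch follows: `is_ok_json` and `check_endpoint` are hypothetical helpers, and the JSON test is a crude first-character check, not a parser.

  ```shell
  # $1 = HTTP status, $2 = response body.
  # Crude shape check: status must be 200 and the body must start like JSON.
  is_ok_json() {
    if [ "$1" != "200" ]; then
      echo "FAIL: HTTP $1"
      return 1
    fi
    case "$2" in
      "["*|"{"*) return 0 ;;
      *) echo "FAIL: body does not look like JSON"; return 1 ;;
    esac
  }

  # $1 = URL. Writes the body to a temp file and reads the status from -w.
  check_endpoint() {
    ce_body=$(mktemp)
    ce_status=$(curl -s -o "$ce_body" -w '%{http_code}' "$1")
    is_ok_json "$ce_status" "$(cat "$ce_body")"
    ce_rc=$?
    rm -f "$ce_body"
    return "$ce_rc"
  }
  ```

  For example, `check_endpoint "http://localhost:3000/api/coder/token-analytics/sessions?project_id=999"` covers the empty-data scenario's status assertion. Checking for specific fields would still need a real JSON tool.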

  **Evidence to Capture**:
  - `task-8-sessions-endpoint.txt` — successful API response
  - `task-8-empty-data.txt` — graceful empty handling

  **Commit**: YES
  - Message: `feat(coder): add token-analytics API endpoints for session/tool/trend data`
  - Files: apps/coder/src/routes/token-analytics.ts, apps/coder/src/services/token-analytics.ts

---

## TODOs (Wave 3)

- [ ] 9. results-page — /results route for orchestrator runs + arena battles

  **What to do**:
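  - Add a sidebar nav button with the `ScrollText` icon (lucide-react), **above** the Token Analytics button (that button is added by Task 10 in the same wave; if it is not present yet, place Results directly above Settings)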
  - Create new `/results` route page with two tabs:
    - "Analysis Runs" — list orchestrator flow runs (research, code-review, investigate, etc.)
    - "Arena Battles" — list battle history
  - Each tab shows: status dot, name/type, band/battle-type, model, timing, error indicator
  - Completed runs show "View Report" link; completed battles show "View Analysis"
  - Uses existing API endpoints (no backend changes needed):
    - `GET /api/coder/runs?project_id=X`
    - `GET /api/coder/battles?project_id=X`
  - Requires `project_id` context — load from sidebar on mount, or show project selector
  - Follow existing route patterns in web (React Router routes, lazy loading)

  **Must NOT do**:
  - Do NOT create new API endpoints
  - Do NOT modify existing API contracts
  - Do NOT add pagination beyond what the API already provides
  - Do NOT add real-time updates (static list, refreshed on mount)

  **Recommended Agent Profile**:
  - **Category**: `visual-engineering`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 3 (with Tasks 10-11)
  - **Blocks**: F1-F4
  - **Blocked By**: Tasks 1-5 (Wave 1)

  **References**:
  - `apps/web/src/routes/` — existing route patterns (analytics, settings)
  - `apps/web/src/components/sidebar/` — nav button patterns
  - `apps/web/src/api/` — existing API client
  - `openspec/changes/results-page/proposal.md` — full spec
  - `apps/coder/src/routes/runs.ts` — runs endpoint
  - `apps/coder/src/routes/battles.ts` — battles endpoint

  **Acceptance Criteria**:
  - [ ] Sidebar shows "Results" button with ScrollText icon above Token Analytics
  - [ ] Clicking navigates to `/results`
  - [ ] "Analysis Runs" tab loads and displays orchestrator flow history
  - [ ] "Arena Battles" tab loads and displays battle history
  - [ ] Completed runs show "View Report" link
  - [ ] Empty state shown when no data
  - [ ] Error state shown on API failure

  **QA Scenarios**:
  ```
  Scenario: Nav button renders
    Tool: Playwright
    Preconditions: Web app loaded
    Steps:
      1. Navigate to /
      2. Look for sidebar nav button with text "Results"
      3. Assert: button exists and links to /results
    Expected Result: Results nav button present
    Evidence: .omo/evidence/task-9-nav-button.png
  ```
  ```
  Scenario: Results page loads
    Tool: Playwright
    Preconditions: Web app loaded, project exists
    Steps:
      1. Navigate to /results
      2. Wait for "Analysis Runs" tab to appear
      3. Assert: tab shows list of runs or empty state
    Expected Result: Page loads and shows the runs list or the empty state
    Evidence: .omo/evidence/task-9-results-page.png
  ```

  **Evidence to Capture**:
  - `task-9-nav-button.png` — screenshot of sidebar with Results button
  - `task-9-results-page.png` — screenshot of /results page with data

  **Commit**: YES
  - Message: `feat(web): add /results page for orchestrator runs and arena battle history`
  - Files: apps/web/src/routes/results.tsx, apps/web/src/components/sidebar/*.tsx

- [ ] 10. token-analyzer-ui — /analytics dashboard route

  **What to do**:
  - Add a "Token Analytics" sidebar nav button with an appropriate icon, **above the Settings button**
  - Create new `/analytics` route page showing token usage dashboard:
    - Aggregate token usage across sessions (total input/output tokens)
    - Per-tool cost breakdown (bar chart or table)
    - Per-session token history (list or mini chart)
    - Per-provider cost comparison
  - Reuse existing data from the backend endpoints created in Task 8
  - Follow the same route/nav patterns as results-page
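
  The per-session, per-tool, and per-provider breakdowns above are the same grouping applied to different keys. A minimal sketch, assuming a flat record shape: `TokenUsageRecord` and its field names are hypothetical, and the real Task 8 response may look different.

  ```typescript
  // Hypothetical record shape; adapt to whatever the Task 8 endpoints return.
  interface TokenUsageRecord {
    sessionId: string;
    tool: string;
    provider: string;
    inputTokens: number;
    outputTokens: number;
    costUsd: number;
  }

  interface UsageTotals {
    inputTokens: number;
    outputTokens: number;
    costUsd: number;
  }

  // Groups records by one dimension and sums tokens and cost per group.
  function sumUsageBy(
    records: TokenUsageRecord[],
    key: "sessionId" | "tool" | "provider",
  ): Map<string, UsageTotals> {
    const totals = new Map<string, UsageTotals>();
    for (const record of records) {
      const current =
        totals.get(record[key]) ?? { inputTokens: 0, outputTokens: 0, costUsd: 0 };
      current.inputTokens += record.inputTokens;
      current.outputTokens += record.outputTokens;
      current.costUsd += record.costUsd;
      totals.set(record[key], current);
    }
    return totals;
  }
  ```

  Calling `sumUsageBy(records, "tool")` would feed the per-tool table, and the same call with `"provider"` the provider comparison, so no extra charting or data library is needed for the aggregation itself.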

  **Must NOT do**:
  - Do NOT add new charting libraries (use what's already available)
  - Do NOT implement real-time updates

  **Recommended Agent Profile**:
  - **Category**: `visual-engineering`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 3 (with Tasks 9, 11)
  - **Blocks**: F1-F4
  - **Blocked By**: Tasks 1-5 (Wave 1), Task 8 (backend endpoints)

  **References**:
  - Same as Task 9 + Task 8 endpoints
  - `openspec/changes/token-analyzer-ui/proposal.md` — full spec
  - `apps/web/src/components/` — existing chart/list components

  **Acceptance Criteria**:
  - [ ] Sidebar shows "Token Analytics" button above Settings
  - [ ] `/analytics` loads and shows token dashboard
  - [ ] Per-session, per-tool, per-provider breakdowns visible
  - [ ] Empty state shown when no data

  **QA Scenarios**:
  ```
  Scenario: Token Analytics nav button renders
    Tool: Playwright
    Preconditions: Web app loaded
    Steps:
      1. Navigate to /
      2. Look for "Token Analytics" button in sidebar
      3. Assert: button exists above Settings
    Expected Result: Nav button present
    Evidence: .omo/evidence/task-10-nav-button.png
  ```
  ```
  Scenario: Analytics dashboard loads
    Tool: Playwright
    Preconditions: Web app loaded, token usage data exists
    Steps:
      1. Navigate to /analytics
      2. Wait for dashboard content to render
      3. Assert: token usage data is visible
    Expected Result: Dashboard shows data
    Evidence: .omo/evidence/task-10-analytics-dashboard.png
  ```

  **Evidence to Capture**:
  - `task-10-nav-button.png`
  - `task-10-analytics-dashboard.png`

  **Commit**: YES
  - Message: `feat(web): add /analytics route for token usage dashboard`
  - Files: apps/web/src/routes/analytics.tsx, apps/web/src/components/sidebar/*.tsx

- [ ] 11. enhanced-file-panel — Side-by-side diff, hide whitespace, wrap lines, expand/collapse all

  **What to do**:
  - Add side-by-side diff toggle to the Git diff tab in the file panel
  - Add "Hide whitespace" checkbox that filters whitespace-only changes
  - Add "Wrap long lines" toggle for diff display
  - Add "Expand All" / "Collapse All" buttons for file-level diffs
  - Read `apps/web/src/components/` first to locate the existing file panel and diff rendering components
  - Implement the changes there, following the existing file panel patterns
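
  One way to derive the side-by-side view from an existing unified diff on the client is to pair each run of deletions with the additions that follow it and pad the shorter side. The sketch below assumes the diff is already parsed into tagged lines; `DiffLine`, `SideBySideRow`, and `toSideBySideRows` are hypothetical names, not existing components.

  ```typescript
  // Hypothetical parsed-diff types; the real diff components may differ.
  type DiffLineKind = "context" | "del" | "add";
  interface DiffLine { kind: DiffLineKind; text: string; }
  interface SideBySideRow { left: string | null; right: string | null; }

  // Converts unified-diff lines into two-column rows.
  function toSideBySideRows(lines: DiffLine[]): SideBySideRow[] {
    const rows: SideBySideRow[] = [];
    let dels: string[] = [];
    let adds: string[] = [];

    // Emits the pending change block, pairing deletions and additions by index.
    const flush = (): void => {
      const count = Math.max(dels.length, adds.length);
      for (let i = 0; i < count; i++) {
        rows.push({ left: dels[i] ?? null, right: adds[i] ?? null });
      }
      dels = [];
      adds = [];
    };

    for (const line of lines) {
      if (line.kind === "context") {
        flush();
        rows.push({ left: line.text, right: line.text });
      } else if (line.kind === "del") {
        // A deletion after pending additions starts a new change block.
        if (adds.length > 0) flush();
        dels.push(line.text);
      } else {
        adds.push(line.text);
      }
    }
    flush();
    return rows;
  }
  ```

  A `null` cell marks a blank placeholder in that column, which keeps both sides vertically aligned.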

  **Must NOT do**:
  - Do NOT implement inline diff comments (deferred)
  - Do NOT implement in-browser file editing (deferred)
  - Do NOT change the backend diff generation logic
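
  Because the backend diff generation stays untouched, the "Hide whitespace" filter has to run on the client. A minimal sketch, assuming each hunk exposes its removed and added lines (`ChangedLines` and `isWhitespaceOnlyChange` are hypothetical names): a hunk counts as whitespace-only when both sides are identical once all whitespace, including line breaks, is stripped.

  ```typescript
  // Hypothetical hunk shape; adapt to the existing diff data structures.
  interface ChangedLines {
    removed: string[];
    added: string[];
  }

  // Removes every whitespace character, including newlines and tabs.
  const stripWhitespace = (text: string): string => text.replace(/\s+/g, "");

  // True when the removed and added text differ only in whitespace.
  function isWhitespaceOnlyChange(change: ChangedLines): boolean {
    const before = stripWhitespace(change.removed.join("\n"));
    const after = stripWhitespace(change.added.join("\n"));
    return before === after;
  }
  ```

  Note the design choice: stripping line breaks means a pure re-wrap of the same tokens is also hidden. If that is too aggressive, compare line by line instead.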

  **Recommended Agent Profile**:
  - **Category**: `visual-engineering`
  - **Skills**: `[]`

  **Parallelization**:
  - **Can Run In Parallel**: YES
  - **Parallel Group**: Wave 3 (with Tasks 9-10)
  - **Blocks**: F1-F4
  - **Blocked By**: Tasks 1-5 (Wave 1)

  **References**:
  - `apps/web/src/components/` — existing file panel and diff components
  - `apps/web/src/hooks/` — hooks for diff state management
  - `openspec/changes/enhanced-file-panel/proposal.md` — full spec
  - `apps/server/src/routes/projects.ts` — git diff backend route

  **Acceptance Criteria**:
  - [ ] Side-by-side diff toggles correctly
  - [ ] Hide whitespace checkbox filters whitespace changes
  - [ ] Wrap long lines toggle works
  - [ ] Expand/Collapse All buttons toggle all files
  - [ ] All changes are frontend-only (no new API calls)

  **QA Scenarios**:
  ```
  Scenario: Side-by-side diff renders
    Tool: Playwright
    Preconditions: Repo with uncommitted changes
    Steps:
      1. Open file panel
      2. Click Git tab
      3. Toggle side-by-side view
      4. Assert: diff renders in two columns
    Expected Result: Side-by-side diff visible
    Evidence: .omo/evidence/task-11-side-by-side.png
  ```
  ```
  Scenario: Hide whitespace works
    Tool: Playwright
    Preconditions: Diff has whitespace changes
    Steps:
      1. Open diff with whitespace changes
      2. Check "Hide whitespace"
      3. Assert: whitespace-only hunks are hidden
    Expected Result: Whitespace-only changes filtered
    Evidence: .omo/evidence/task-11-hide-whitespace.png
  ```
  ```
  Scenario: Expand/Collapse All toggles
    Tool: Playwright
    Preconditions: Multiple files changed
    Steps:
      1. Click "Collapse All"
      2. Assert: all files collapsed to summary
      3. Click "Expand All"
      4. Assert: all files expanded
    Expected Result: Bulk toggle works
    Evidence: .omo/evidence/task-11-expand-collapse.png
  ```

  **Evidence to Capture**:
  - `task-11-side-by-side.png`
  - `task-11-hide-whitespace.png`
  - `task-11-expand-collapse.png`

  **Commit**: YES
  - Message: `feat(web): enhanced file panel — side-by-side diff, hide whitespace, wrap lines, expand/collapse all`
  - Files: apps/web/src/components/*.tsx, apps/web/src/hooks/*.ts

---

## Final Verification Wave

- [ ] F1. **Plan Compliance Audit** — `oracle`
  Read the plan end-to-end. For each "Must Have": verify implementation exists (read file, curl endpoint, run command). For each "Must NOT Have": search codebase for forbidden patterns — reject with file:line if found. Check evidence files exist in .omo/evidence/. Compare deliverables against plan.
  Output: `Must Have [N/N] | Must NOT Have [N/N] | Tasks [N/N] | VERDICT: APPROVE/REJECT`

- [ ] F2. **Code Quality Review** — `unspecified-high`
  Run `tsc --noEmit` for each changed app, then `bun test`. Review all changed files for `as any`/`@ts-ignore`, empty catch blocks, `console.log` in production code, commented-out code, and unused imports.
  Output: `Build [PASS/FAIL] | Lint [PASS/FAIL] | Tests [N pass/N fail] | Files [N clean/N issues] | VERDICT`
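
  Part of this review can be pre-screened mechanically. The sketch below is a rough line-based scan for three of the patterns listed above; it is an illustration, not a replacement for the reviewer or a real linter. `QUALITY_RULES` and `findQualityIssues` are hypothetical names, and the empty-catch rule only matches blocks written on a single line.

  ```typescript
  interface QualityFinding {
    line: number; // 1-based line number in the scanned source
    rule: string;
  }

  // Non-global regexes, so repeated .test() calls carry no state.
  const QUALITY_RULES: Array<{ rule: string; pattern: RegExp }> = [
    { rule: "as any", pattern: /\bas any\b/ },
    { rule: "@ts-ignore", pattern: /@ts-ignore/ },
    { rule: "console.log", pattern: /\bconsole\.log\(/ },
    { rule: "empty catch", pattern: /catch\s*(\([^)]*\))?\s*\{\s*\}/ },
  ];

  // Returns one finding per (line, rule) match, in source order.
  function findQualityIssues(source: string): QualityFinding[] {
    const findings: QualityFinding[] = [];
    source.split("\n").forEach((text, index) => {
      for (const { rule, pattern } of QUALITY_RULES) {
        if (pattern.test(text)) findings.push({ line: index + 1, rule });
      }
    });
    return findings;
  }
  ```

  Matches inside strings or comments will produce false positives, so findings should be treated as candidates to inspect by file and line.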

- [ ] F3. **Real Manual QA** — `unspecified-high`
  Start from clean state. Execute EVERY QA scenario from EVERY task — follow exact steps, capture evidence. Test cross-task integration (features working together, not isolation). Test edge cases: empty state, invalid input, missing project_id. Save to `.omo/evidence/final-qa/`.
  Output: `Scenarios [N/N pass] | Integration [N/N] | Edge Cases [N tested] | VERDICT`

- [ ] F4. **Scope Fidelity Check** — `deep`
  For each task: read "What to do", read actual diff. Verify 1:1 — everything in scope was built (no missing), nothing beyond scope was built (no creep). Check "Must NOT do" compliance.
  Output: `Tasks [N/N compliant] | Contamination [CLEAN/N issues] | Unaccounted [CLEAN/N files] | VERDICT`

---

## Commit Strategy

- **1-5** (grouped): `chore(openspec): cleanup openspec folder structure — delete stubs, move proposals, add metadata, populate config`
- **6**: `perf(llama): enable KV cache quantization (q4_0) + ngram speculative decoding`
- **7**: `feat(booterm): PTY exit notifications + session metadata + X-Agent-Flags`
- **8**: `feat(coder): add token-analytics API endpoints`
- **9**: `feat(web): add /results page for orchestrator runs and arena battle history`
- **10**: `feat(web): add /analytics route for token usage dashboard`
- **11**: `feat(web): enhanced file panel — side-by-side diff, hide whitespace, wrap lines, expand/collapse all`

---

## Success Criteria

### Verification Commands
```bash
# OpenSpec cleanup
test ! -f openspec/changes/archived/v1.13.12-skills-audit.md
test -d openspec/changes/boocontext/
test -f openspec/changes/enhanced-file-panel/.openspec.yaml
grep -q "context:" openspec/config.yaml

# llama-cache-and-spec
ps aux | grep llama-server | grep -o "cache-type-k q4_0"
ps aux | grep llama-server | grep -o "spec-type ngram-mod"

# PTY enhancements
curl -s http://localhost:9501/api/pty/list | jq '.'

# Results page
curl -s "http://localhost:3000/api/coder/runs?project_id=1" | jq '.'

# Token analytics
curl -s "http://localhost:3000/api/coder/token-analytics/sessions?project_id=1" | jq '.'

# Enhanced file panel
# (visual verification via Playwright)
```

### Final Checklist
- [ ] 11 stub files deleted from archived/
- [ ] 5 misplaced proposals moved/merged into changes/
- [ ] 6 .openspec.yaml files added
- [ ] config.yaml populated with context + rules
- [ ] 10 archived proposals annotated with shipped versions
- [ ] llama-server running with KV cache quantization (q4_0) + ngram speculative decoding
- [ ] PTY exit notifications working
- [ ] `/results` page renders and loads data
- [ ] `/analytics` page renders and loads data
- [ ] Side-by-side diff, hide whitespace, wrap lines, expand/collapse all working
- [ ] All type checks pass
- [ ] All QA scenarios pass