boocode/.omo/plans/openspec-cleanup.md
indifferentketchup 02063072ab chore: add ion package, codesight wiki, work plans, ascli config
New @boocode/ion package (v0.0.1) for inference optimization network.
.codesight/ wiki artifacts for codebase documentation.
.omo/ work plans for openspec cleanup and enhanced file panel.
2026-06-07 22:16:45 +00:00


Openspec Cleanup & High-Value Batch Implementation

TL;DR

Quick Summary: Clean up the openspec/ folder structure (delete 11 stub files, move 5 misplaced proposals, add missing .openspec.yaml files), then implement 5 high-value batches: llama-cache-and-spec, pty-enhancements, results-page, token-analyzer-ui, and enhanced-file-panel.

Deliverables:

  • Clean openspec folder: stubs removed, archived/ accurate, all batches schema-compliant
  • llama-server KV cache quantization + ngram speculative decoding enabled
  • PTY exit notifications and session metadata
  • /results page for orchestrator runs and arena battles (new route)
  • /analytics page for token usage dashboard (new route)
  • Enhanced file panel: side-by-side diff, hide whitespace, wrap lines, expand/collapse all

Estimated Effort: Medium-Large
Parallel Execution: YES (3 waves + final verification)
Critical Path: Cleanup → Backend impls → Frontend impls → Integration


Context

Original Request

Analyze openspec/ folder for structural issues, cross-reference against git tags, and create a work plan for implementing the high-value openspec batch proposals.

Interview Summary

Key Discussions:

  • openspec/changes/ has 22 active batches (all uncommitted, all unshipped) plus archived/ with 29 entries
  • 11 stub files in archived/ are pure noise (49-66 bytes each, "Status: Shipped. Archived." only)
  • 5 misplaced 2026-06-07 proposals were dumped in archived/ — they're active design docs, not shipped batches
  • 6 active batches missing .openspec.yaml; openspec/config.yaml is empty
  • Active proposals overlap: multiple batches cover evaluation, memory, and workflow engine territory

Research Findings:

  • Git tag cross-reference confirms all folder-based archived entries match shipped tags
  • 3 stub files reference wrong tags (v1.13.12→v1.13.14, v1.14.x→v1.13.19, etc.)
  • All 22 active batches have zero git references — pure filesystem artifacts
  • No active batch has shipped yet — zero can be archived

Metis Review

Identified gaps:

  1. Deduplication needed: 2026-06-07 proposals overlap with active changes/ — merging must happen before cleanup is complete
  2. Prioritization needed: 22 batches can't all ship at once — need clear tiers
  3. User sign-off needed: Which Tier 1-2 batches to include in this plan vs defer

Work Objectives

Core Objective

Restore openspec structural integrity and ship the 5 highest-value, lowest-effort batch proposals.

Concrete Deliverables

  • Clean openspec: stubs deleted (11 files ~573 bytes), misplaced proposals moved (5 folders), .openspec.yaml files added (6 batches), config.yaml populated
  • llama-cache-and-spec: KV cache quantization (Q4_0) + ngram speculative decoding enabled
  • pty-enhancements: PTY exit notifications, session metadata, X-Agent-Flags
  • results-page: /results route with Analysis Runs + Arena Battles tabs
  • token-analyzer-ui: /analytics route with token usage dashboard
  • enhanced-file-panel: side-by-side diff toggle, hide whitespace, wrap long lines, expand/collapse all

Must Have

  • All 11 stub files removed from archived/
  • 5 misplaced 2026-06-07 proposals moved from archived/ into changes/ (or merged into existing batches)
  • .openspec.yaml added to all 6 missing batches
  • openspec/config.yaml gets a context: block and rules: block
  • llama-server restarts with new flags (verify via ps aux | grep llama)
  • /results page loads without 404 and shows real data from existing API endpoints
  • /analytics page loads and shows token aggregates
  • Side-by-side diff renders correctly for files with wide lines

Must NOT Have (Guardrails)

  • NO breaking changes to existing routes or API contracts
  • NO new database tables or migrations (all data sources already exist)
  • NO external API dependencies (no cloud embedding models)
  • NO behavioral engine or Pregel state machine work (deferred to future batch)
  • NO touching the conductor flow runner or orchestrator pipeline
  • NO CSS framework changes (stay on Tailwind v4 / shadcn/ui)
  • NO backend changes unless explicitly required by the batch scope

Spec Framework Integration

  • Detected Framework: OpenSpec (folder structure only — no CLI)
  • Config File: openspec/config.yaml
  • Active Specs: 22 batch folders in openspec/changes/
  • Available Commands: Manual folder/file operations (no OpenSpec CLI)

Verification Strategy

ZERO HUMAN INTERVENTION — ALL verification is agent-executed.

Test Decision

  • Infrastructure exists: YES (vitest in apps/server, apps/coder)
  • Automated tests: Tests-after (no TDD — these are config/frontend changes)
  • Framework: vitest for backend, Playwright for frontend verification

QA Policy

Every task includes agent-executed QA scenarios. Evidence saved to .omo/evidence/.

  • Frontend: Playwright — navigate, assert DOM elements, screenshot
  • Backend: Bash (curl) — send requests, assert status + response
  • Config/Restart: Bash — check processes, verify new flags
  • File operations: Bash — verify files exist/deleted with test -f / test ! -f

Execution Strategy

Wave 1 (Structural Cleanup — quick, MAX PARALLEL):
├── Task 1: Delete 11 stub files from archived/ [quick]
├── Task 2: Move 5 misplaced 2026-06-07 proposals → changes/ [quick]
├── Task 3: Add .openspec.yaml to 6 missing batches [quick]
├── Task 4: Populate openspec/config.yaml with project context [quick]
├── Task 5: Add shipped status metadata to archived/ entries [writing]

Wave 2 (Backend — moderate, MAX PARALLEL):
├── Task 6: llama-cache-and-spec — KV cache + ngram flags [quick]
├── Task 7: pty-enhancements — exit notifications + session metadata [unspecified-high]
├── Task 8: token-analyzer-ui — backend API endpoints [unspecified-high]

Wave 3 (Frontend — moderate, MAX PARALLEL):
├── Task 9: results-page — /results route [visual-engineering]
├── Task 10: token-analyzer-ui — /analytics route [visual-engineering]
├── Task 11: enhanced-file-panel — diff modes + UI [visual-engineering]

Wave FINAL (Verification — 4 parallel reviews):
├── Task F1: Plan compliance audit [oracle]
├── Task F2: Code quality + type check [unspecified-high]
├── Task F3: Real QA — execute every scenario [unspecified-high + playwright]
└── Task F4: Scope fidelity check [deep]

Critical Path: Cleanup → Backend → Frontend → Integration
Parallel Speedup: ~60% faster than sequential
Max Concurrent: 4 (Wave 2 & 3)

TODOs

  • 1. Delete 11 stub files from archived/

    What to do:

    • Remove these 11 files from openspec/changes/archived/:
      • v1.13.12-skills-audit.md (57B, wrong tag ref)
      • v1.13.15-codecontext-synth.md (62B)
      • v1.13.17-cross-repo-reads.md (61B)
      • v1.13.18-codecontext-file-path.md (66B)
      • v1.13.20-drop-legacy-cols.md (61B)
      • v1.14-outer-loop.md (52B)
      • v1.14.1-mcp-poc.md (51B)
      • v1.14.x-html-artifact-panes.md (63B, wrong tag ref)
      • v1.15-mcp-multi.md (51B)
      • v2.0-boocoder.md (49B)
      • v2.2-paseo-providers.md (222B)
    • Each file contains ONLY "# Title\n\nStatus: Shipped. Archived.\n" — zero documentation value
    • Git history preserves the knowledge; CHANGELOG.md + tags are the authoritative record

    Must NOT do:

    • Do NOT delete any folder-based archived entries (they have real content)
    • Do NOT delete boocode_batch10.md or handoff files (they're valuable)
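
A minimal sketch of the deletion and its guardrail check, assuming a POSIX shell. The function names and the STUBS variable are illustrative and not part of the repository; the file names are the 11 listed above.

```shell
# Sketch for Task 1. The archive path is an argument so the functions can be
# tried against a scratch copy before touching the real folder.

STUBS="v1.13.12-skills-audit.md
v1.13.15-codecontext-synth.md
v1.13.17-cross-repo-reads.md
v1.13.18-codecontext-file-path.md
v1.13.20-drop-legacy-cols.md
v1.14-outer-loop.md
v1.14.1-mcp-poc.md
v1.14.x-html-artifact-panes.md
v1.15-mcp-multi.md
v2.0-boocoder.md
v2.2-paseo-providers.md"

# Remove only the named stub files; folders and other files are never touched.
delete_stubs() {
  stub_dir="${1:-openspec/changes/archived}"
  for stub in $STUBS; do
    rm -f -- "$stub_dir/$stub"
  done
}

# Succeeds only if every stub is gone and each remaining top-level .md file
# is boocode_batch10.md or a handoff_* file.
verify_stubs_deleted() {
  stub_dir="${1:-openspec/changes/archived}"
  for stub in $STUBS; do
    if [ -e "$stub_dir/$stub" ]; then
      echo "still present: $stub" >&2
      return 1
    fi
  done
  for md_path in "$stub_dir"/*.md; do
    [ -e "$md_path" ] || continue   # glob matched nothing
    case "$(basename "$md_path")" in
      boocode_batch10.md|handoff_*.md) ;;
      *) echo "unexpected file: $md_path" >&2; return 1 ;;
    esac
  done
  return 0
}
```

Calling delete_stubs followed by verify_stubs_deleted with no arguments would act on openspec/changes/archived relative to the current directory.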

    Recommended Agent Profile:

    • Category: quick
    • Skills: []
    • Justification: Trivial file deletion — no domain skills needed

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Tasks 2-5)
    • Blocks: F1-F4
    • Blocked By: None

    References:

    • openspec/changes/archived/ — target directory
    • openspec/README.md — schema definition
    • ~/.gitconfig — no special config needed

    Acceptance Criteria:

    • test ! -f openspec/changes/archived/v1.13.12-skills-audit.md → success for all 11 files
    • ls openspec/changes/archived/*.md shows only allowed files (boocode_batch10.md, handoff_*)

    QA Scenarios:

    Scenario: Verify stubs deleted
      Tool: Bash
      Preconditions: Clean working tree
      Steps:
        1. For each stub file, run: test ! -f openspec/changes/archived/{filename}
        2. Assert: all 11 commands return exit code 0 (file does not exist)
        3. List remaining .md files: ls openspec/changes/archived/*.md
        4. Assert: only boocode_batch10.md and handoff_*.md files remain
      Expected Result: 11 stubs absent, 3 valuable files present
      Evidence: .omo/evidence/task-1-stubs-deleted.txt
    
    Scenario: Valuable files preserved
      Tool: Bash
      Preconditions: Stubs deleted
      Steps:
        1. test -f openspec/changes/archived/boocode_batch10.md
        2. test -f openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
        3. test -f openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
      Expected Result: All 3 return exit code 0
      Evidence: .omo/evidence/task-1-valuables-preserved.txt
    

    Evidence to Capture:

    • task-1-stubs-deleted.txt — confirmation each stub is gone
    • task-1-valuables-preserved.txt — confirmation valuable files remain

    Commit: YES

    • Message: chore(openspec): delete 11 stub archive files with zero documentation value
    • Files: openspec/changes/archived/v1.13.12-skills-audit.md, ...
  • 2. Move 5 misplaced 2026-06-07 proposals from archived/ to changes/

    What to do:

    • Move these 5 folders from openspec/changes/archived/2026-06-07-* to openspec/changes/*:
      1. archived/2026-06-07-boocontext/ → changes/boocontext/ (partially shipped in v2.8.0)
      2. archived/2026-06-07-eval-sandbox-agent-runtime/ → merge into changes/import-llm-evaluator/ and changes/import-pregel-engine/ (overlapping scope)
      3. archived/2026-06-07-hybrid-workflow-engine/ → merge into changes/orchestrator-flow-advanced/
      4. archived/2026-06-07-memory-context-engineering/ → merge into changes/memory-context/
      5. archived/2026-06-07-port-audit-parlant-patterns/ → merge into changes/add-behavioral-engine/ and changes/audit-harness-integration/
    • For merges (2-5): append relevant content from the 2026-06-07 proposal into the existing batch's proposal.md, tasks.md, design.md. The 2026-06-07 versions are "grand vision" — extract the concrete specs relevant to the narrower active batch.
    • For boocontext/ (1): move as-is since it's a new slug with no direct collision.

    Must NOT do:

    • Do NOT delete the content of the 2026-06-07 folders — merge, don't discard
    • Do NOT create duplicate batch slugs
    • Do NOT overwrite existing proposal content — append/extend

    Recommended Agent Profile:

    • Category: writing
    • Skills: []
    • Justification: File organization + content merging — technical writing task

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Tasks 1, 3-5)
    • Blocks: F1-F4
    • Blocked By: None

    References:

    • openspec/changes/archived/2026-06-07-*/ — source folders
    • openspec/changes/import-llm-evaluator/ — target for eval overlap
    • openspec/changes/import-pregel-engine/ — target for graph overlap
    • openspec/changes/orchestrator-flow-advanced/ — target for workflow overlap
    • openspec/changes/memory-context/ — target for memory overlap
    • openspec/changes/add-behavioral-engine/ — target for port patterns
    • openspec/changes/audit-harness-integration/ — target for audit patterns

    Acceptance Criteria:

    • openspec/changes/boocontext/ exists with proposal.md + tasks.md + design.md + specs/
    • openspec/changes/import-llm-evaluator/ proposal.md now references eval-sandbox content
    • openspec/changes/import-pregel-engine/ proposal.md now references graph engine content
    • openspec/changes/orchestrator-flow-advanced/ proposal.md now references hybrid workflow
    • openspec/changes/memory-context/ proposal.md now references context engineering
    • openspec/changes/add-behavioral-engine/ and audit-harness-integration/ now reference port patterns
    • test ! -d openspec/changes/archived/2026-06-07-eval-sandbox-agent-runtime/ for each moved folder
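
The per-batch reference checks above could be run in one pass. This is a sketch under stated assumptions: check_merges is a hypothetical helper, and the search phrases are taken from the acceptance criteria wording rather than from the actual proposal text, so they may need adjusting.

```shell
# Sketch for Task 2: confirm each merge target's proposal.md mentions the
# 2026-06-07 material it absorbed. Matching is case-insensitive.
check_merges() {
  changes_dir="${1:-openspec/changes}"
  merge_status=0
  while IFS='|' read -r batch phrase; do
    if ! grep -qi -- "$phrase" "$changes_dir/$batch/proposal.md" 2>/dev/null; then
      echo "missing reference to '$phrase' in $batch/proposal.md" >&2
      merge_status=1
    fi
  done <<'EOF'
import-llm-evaluator|eval-sandbox
import-pregel-engine|graph engine
orchestrator-flow-advanced|hybrid workflow
memory-context|context engineering
add-behavioral-engine|port patterns
audit-harness-integration|port patterns
EOF
  return "$merge_status"
}
```

Unlike a single grep -q across all proposals, this reports every target that lacks its reference instead of succeeding as soon as any one file matches.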

    QA Scenarios:

    Scenario: boocontext moved
      Tool: Bash
      Preconditions: Files moved
      Steps:
        1. test -f openspec/changes/boocontext/proposal.md
        2. test -f openspec/changes/boocontext/tasks.md
        3. test ! -f openspec/changes/archived/2026-06-07-boocontext/proposal.md
      Expected Result: Files exist in new location, not in old
      Evidence: .omo/evidence/task-2-boocontext-moved.txt
    
    Scenario: Merged proposals updated
      Tool: Bash
      Preconditions: Files merged
      Steps:
        1. grep -lE "eval-sandbox|graph engine|hybrid workflow|context engineering|port patterns" openspec/changes/*/proposal.md
        2. Assert: each merged batch's proposal.md references the 2026-06-07 source
      Expected Result: grep finds references in the right target files
      Evidence: .omo/evidence/task-2-merges-verified.txt
    

    Evidence to Capture:

    • task-2-boocontext-moved.txt
    • task-2-merges-verified.txt

    Commit: YES (groups with Task 1)

    • Message: chore(openspec): move 5 misplaced proposals from archived/ → changes/, merge overlapping content
    • Files: openspec/changes/boocontext/, openspec/changes/*/proposal.md, openspec/changes/*/tasks.md
  • 3. Add .openspec.yaml to 6 missing batches

    What to do:

    • Create .openspec.yaml in each of these 6 active batches:
      • enhanced-file-panel/
      • llama-cache-and-spec/
      • memory-v2-hybrid-search/
      • omo-paseo-bridge/
      • orchestrator-flow-advanced/
      • results-page/
    • Each file must contain:
      schema: spec-driven
      created: 2026-06-07
      

    Must NOT do:

    • Do NOT modify existing proposal.md or tasks.md content
    • Do NOT add .openspec.yaml to batches that already have one
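
One way to create the six files while honoring both guardrails is sketched below. add_openspec_yaml and BATCHES are illustrative names; the file body is the two-line template shown above.

```shell
# Sketch for Task 3: write the boilerplate .openspec.yaml into each listed
# batch, skipping any batch that already has one.
BATCHES="enhanced-file-panel llama-cache-and-spec memory-v2-hybrid-search omo-paseo-bridge orchestrator-flow-advanced results-page"

add_openspec_yaml() {
  changes_dir="${1:-openspec/changes}"
  for batch in $BATCHES; do
    if [ ! -d "$changes_dir/$batch" ]; then
      echo "missing batch folder: $batch" >&2
      return 1
    fi
    marker="$changes_dir/$batch/.openspec.yaml"
    if [ -e "$marker" ]; then
      echo "skip, already present: $marker"
      continue
    fi
    printf 'schema: spec-driven\ncreated: 2026-06-07\n' > "$marker"
    echo "created: $marker"
  done
}
```

The function only ever writes .openspec.yaml, so proposal.md and tasks.md stay untouched, and an existing marker file is never overwritten.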

    Recommended Agent Profile:

    • Category: quick
    • Skills: []
    • Justification: Trivial boilerplate file creation

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Tasks 1, 2, 4, 5)
    • Blocks: F1-F4
    • Blocked By: None

    References:

    • openspec/changes/add-3tier-memory/.openspec.yaml — template

    Acceptance Criteria:

    • All 6 created files contain schema: spec-driven
    • find openspec/changes/ -name ".openspec.yaml" | wc -l counts all expected files

    QA Scenarios:

    Scenario: All .openspec.yaml files present
      Tool: Bash
      Preconditions: Files created
      Steps:
        1. For each batch: test -f openspec/changes/{batch}/.openspec.yaml
        2. For each: grep -q "schema: spec-driven" openspec/changes/{batch}/.openspec.yaml
      Expected Result: All 6 files exist with correct content
      Evidence: .omo/evidence/task-3-openspec-yaml-added.txt
    

    Evidence to Capture:

    • task-3-openspec-yaml-added.txt

    Commit: YES (groups with Task 1)

    • Message: chore(openspec): add .openspec.yaml to 6 missing batch folders
    • Files: openspec/changes/enhanced-file-panel/.openspec.yaml, ...
  • 4. Populate openspec/config.yaml with project context

    What to do:

    • Replace the empty openspec/config.yaml with a populated version:
      schema: spec-driven
      
      context: |
        Tech stack: TypeScript, React 18, Vite, Tailwind v4, shadcn/ui, Fastify, PostgreSQL 16, pnpm workspaces
        Apps: BooChat (read-only chat), BooCoder (write tools + agent dispatch), BooTerm (PTY terminals), Orchestrator (multi-agent conductor)
        Infrastructure: Docker Compose, Tailscale (100.114.205.53), Authelia auth, llama-swap inference
        Monorepo: apps/server, apps/web, apps/booterm, apps/coder, packages/contracts
        Commits: conventional commits, strict TypeScript, NodeNext module resolution
        Testing: vitest (server + coder), Playwright (web E2E), no root tsconfig
      
      rules:
        proposal:
          - Every proposal must have a "Why" section explaining the motivation
          - Every proposal must have a "What Changes" section enumerating deliverables
          - Include "Must Have" / "Must NOT Have" guardrails
          - Reference shipped git tags when applicable
        tasks:
          - Tasks must be ordered by dependency, not priority
          - Each task is one atomic change (file, config, or command)
          - Parallel tasks go in the same wave
      

    Must NOT do:

    • Do NOT delete the schema: spec-driven line

    Recommended Agent Profile:

    • Category: writing
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Tasks 1-3, 5)
    • Blocks: F1-F4
    • Blocked By: None

    References:

    • openspec/config.yaml — current (empty) file
    • /home/samkintop/opt/boocode/CLAUDE.md — source for context info

    Acceptance Criteria:

    • grep -q "context:" openspec/config.yaml → success
    • grep -q "rules:" openspec/config.yaml → success
    • config.yaml has more than 500 bytes (was 20 bytes)
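
These checks could be combined into one helper, sketched here with the hypothetical name check_config. It uses the 500-byte threshold from the QA scenario and assumes the three keys sit at the top level of the file, as in the populated version shown earlier.

```shell
# Sketch for Task 4: size and key checks for openspec/config.yaml.
check_config() {
  cfg="${1:-openspec/config.yaml}"
  if [ ! -f "$cfg" ]; then
    echo "missing: $cfg" >&2
    return 1
  fi
  # Redirecting into wc keeps the file name out of the output, so the
  # result can be compared numerically.
  size="$(wc -c < "$cfg" | tr -d ' ')"
  if [ "$size" -le 500 ]; then
    echo "config too small: $size bytes" >&2
    return 1
  fi
  for key in 'schema: spec-driven' 'context:' 'rules:'; do
    if ! grep -q "^$key" "$cfg"; then
      echo "missing top-level key: $key" >&2
      return 1
    fi
  done
  return 0
}
```

Note that a plain wc -c openspec/config.yaml prints the file name after the count, which is why the sketch reads the file through a redirect before comparing.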

    QA Scenarios:

    Scenario: config.yaml populated
      Tool: Bash
      Preconditions: File written
      Steps:
        1. wc -c openspec/config.yaml → assert > 500 bytes
        2. grep -q "context:" openspec/config.yaml
        3. grep -q "rules:" openspec/config.yaml
        4. grep -q "schema: spec-driven" openspec/config.yaml
      Expected Result: All assertions pass
      Evidence: .omo/evidence/task-4-config-populated.txt
    

    Evidence to Capture:

    • task-4-config-populated.txt

    Commit: YES (groups with Task 1)

    • Message: chore(openspec): populate config.yaml with project context and rules
    • Files: openspec/config.yaml
  • 5. Add shipped-status metadata to 10 archived folder entries

    What to do:

    • Add frontmatter or status line to each archived folder's proposal.md documenting the shipped version:
      • agent-status-normalize/ → v2.7.6
      • claude-sdk-sessionstore/ → v2.7.5
      • contracts-ssot/ → v2.7.13
      • license-debt-mit/ → v2.7.0
      • mistake-tracker-file-ledger/ → v2.7.4
      • orchestrator/ → v2.7.17
      • sampling-streamjson-tokens/ → v2.7.3
      • v2-3-provider-lifecycle/ → v2.5.4 to v2.5.13
      • v2-6-persistent-agent-sessions/ → v2.6.4 to v2.6.8
      • write-edit-robustness/ → v2.7.1
    • Add a line after the ## Why section heading: **Shipped in:** `v2.7.6-agent-status-normalize` (or equivalent)
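
The annotation step could be scripted roughly as follows. annotate_shipped is a hypothetical helper, and the sketch assumes each proposal has a heading line that reads exactly "## Why"; the tag passed in must be looked up from the real git tags.

```shell
# Sketch for Task 5: insert a "Shipped in" line right after the "## Why"
# heading. Safe to re-run: an already annotated file is left as-is.
annotate_shipped() {
  proposal="$1"
  tag="$2"
  if grep -q 'Shipped in:' "$proposal" 2>/dev/null; then
    return 0
  fi
  if ! grep -q '^## Why *$' "$proposal" 2>/dev/null; then
    echo "no '## Why' heading in $proposal" >&2
    return 1
  fi
  annotated="$(mktemp)" || return 1
  awk -v tag="$tag" '
    { print }
    !inserted && /^## Why *$/ {
      print ""
      print "**Shipped in:** `" tag "`"
      inserted = 1
    }
  ' "$proposal" > "$annotated"
  # Write back through cat so the original file mode is preserved.
  cat "$annotated" > "$proposal"
  rm -f "$annotated"
}
```

For example, annotate_shipped openspec/changes/archived/agent-status-normalize/proposal.md v2.7.6-agent-status-normalize would add the line once and do nothing on later runs.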

    Must NOT do:

    • Do NOT change the body of the proposal beyond the shipped annotation
    • Do NOT add shipped annotations to the 2026-06-07 batches (they're not shipped)

    Recommended Agent Profile:

    • Category: quick
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Tasks 1-4)
    • Blocks: F1-F4
    • Blocked By: None

    References:

    • Git tags: v2.7.0-mit, v2.7.1-write-edit-robustness, etc.

    Acceptance Criteria:

    • All 10 archived batch proposals contain "Shipped in:" referencing a git tag
    • grep -r "Shipped in:" openspec/changes/archived/*/proposal.md | wc -l = 10

    QA Scenarios:

    Scenario: All archived batches annotated
      Tool: Bash
      Preconditions: Files edited
      Steps:
        1. grep -rl "Shipped in:" openspec/changes/archived/*/proposal.md | wc -l
        2. Assert: exactly 10 files contain "Shipped in:"
      Expected Result: 10 files annotated
      Evidence: .omo/evidence/task-5-shipped-annotations.txt
    

    Evidence to Capture:

    • task-5-shipped-annotations.txt

    Commit: YES (groups with Task 1)

    • Message: chore(openspec): add shipped-in version annotations to 10 archived batch proposals
    • Files: openspec/changes/archived/*/proposal.md

TODOs (Wave 2)

  • 6. llama-cache-and-spec — Enable KV cache quantization + ngram speculative decoding

    What to do:

    • Edit apps/server/src/services/inference/providers/llama.ts (or the llama args validator llama-args-validator.ts) to allow --cache-type-k q4_0 and --spec-type ngram-mod through the shadowing lists
    • Change the base llama-server args to include:
      • --cache-type-k q4_0 (4-bit K cache, roughly 3.5× smaller than the default f16 K cache; the V cache is unaffected unless --cache-type-v is also set)
      • --spec-type ngram-mod (ngram speculative decoding, 2-3× tok/s on code)
    • Verify the sidecar validator (sidecar/validator.go) also allows these flags through
    • Read apps/server/src/services/inference/llama-args-validator.ts and sidecar/validator.go to understand the current blocklist
    • Add the two flags to the allowlist instead of the shadow list
    • Update the sidecar Dockerfile or config if needed

    Must NOT do:

    • Do NOT change any other llama-server args
    • Do NOT enable KV cache quantization for Q8_0 or Q3_K (only Q4_0)
    • Do NOT add a separate draft model (ngram is self-contained)

    Recommended Agent Profile:

    • Category: unspecified-high
    • Skills: []
    • Justification: Requires understanding llama.cpp arg validation across two codebases

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Tasks 7-8)
    • Blocks: F1-F4
    • Blocked By: Tasks 1-5 (Wave 1)

    References:

    • apps/server/src/services/inference/llama-args-validator.ts — current arg blocklist/allowlist
    • sidecar/validator.go — sidecar validation (if exists)
    • docker-compose.yml or sidecar Dockerfile — restart config
    • openspec/changes/llama-cache-and-spec/proposal.md — full spec

    Acceptance Criteria:

    • --cache-type-k q4_0 present in llama-server args after restart
    • --spec-type ngram-mod present in llama-server args after restart
    • llama-server starts without errors
    • Inference still works (send test message)
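
The two flag criteria can be checked against a captured command line. The helpers below are a sketch with hypothetical names (has_flag, check_llama_flags); they assume each flag and its value appear as separate space-delimited words, as in the ps-based QA steps.

```shell
# Sketch for Task 6: confirm both new flags appear in a llama-server
# command line with the expected values.

# $1 = full command line, $2 = flag, $3 = expected value
has_flag() {
  case " $1 " in
    *" $2 $3 "*) return 0 ;;
    *) return 1 ;;
  esac
}

check_llama_flags() {
  has_flag "$1" --cache-type-k q4_0 && has_flag "$1" --spec-type ngram-mod
}

# Usage against the live process (the [l] keeps grep from matching itself):
#   check_llama_flags "$(ps -eo args | grep '[l]lama-server')"
```

Matching the flag together with its value catches the case where the flag is present but set to a different cache type, which a search for the flag name alone would miss.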

    QA Scenarios:

    Scenario: KV cache quantization enabled
      Tool: Bash
      Preconditions: Server restarted after changes
      Steps:
        1. ps aux | grep llama-server | grep -o "cache-type-k q4_0"
        2. Assert: output matches "q4_0"
      Expected Result: KV cache quantization is active
      Evidence: .omo/evidence/task-6-kv-cache-enabled.txt
    
    Scenario: Speculative decoding enabled
      Tool: Bash
      Preconditions: Server restarted
      Steps:
        1. ps aux | grep llama-server | grep -o "spec-type ngram-mod"
        2. Assert: output matches "ngram-mod"
      Expected Result: Ngram speculative decoding is active
      Evidence: .omo/evidence/task-6-ngram-enabled.txt
    
    Scenario: Inference still works
      Tool: Bash (curl)
      Preconditions: Server running with new flags
      Steps:
        1. curl -s -o /dev/null -w "%{http_code}" http://100.114.205.53:9500/api/health
        2. Assert: HTTP 200
      Expected Result: Server is healthy and serving
      Evidence: .omo/evidence/task-6-health-check.txt
    

    Evidence to Capture:

    • task-6-kv-cache-enabled.txt — grep output showing the flag
    • task-6-ngram-enabled.txt — grep output showing the flag
    • task-6-health-check.txt — health check confirmation

    Commit: YES

    • Message: perf(llama): enable KV cache quantization (q4_0) + ngram speculative decoding
    • Files: apps/server/src/services/inference/llama-args-validator.ts, sidecar/validator.go (if needed)
  • 7. pty-enhancements — PTY exit notifications + session metadata

    What to do:

    • Add notifyOnExit support to the PTY session manager (likely in apps/booterm/)
    • When a PTY process exits AND notifyOnExit was set:
      • Emit an event/message to the agent channel with: session ID, title, exit code, total output lines, last line of output
    • Add session metadata fields: agent ID that spawned it, task ID, optional title
    • Add pty_list endpoint that returns metadata for all sessions
    • Wire X-Agent-Flags header support for agent identification
    • Read apps/booterm/ to understand the current PTY architecture
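
To make the notification contents concrete, the sketch below builds the same fields for a plain subprocess rather than a real PTY. It is not the booterm implementation: run_with_exit_notice, json_escape, and the JSON field names are all assumptions chosen for illustration.

```shell
# Sketch for Task 7: run a command, then emit an exit notice carrying the
# session ID, title, exit code, total output lines, and last output line.

# Escape backslashes and double quotes for use inside a JSON string.
# Control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

run_with_exit_notice() {
  session_id="$1"
  title="$2"
  shift 2
  capture="$(mktemp)" || return 1
  exit_code=0
  "$@" > "$capture" 2>&1 || exit_code=$?
  output_lines="$(wc -l < "$capture" | tr -d ' ')"
  last_line="$(tail -n 1 "$capture")"
  rm -f "$capture"
  printf '{"sessionId":"%s","title":"%s","exitCode":%d,"outputLines":%d,"lastLine":"%s"}\n' \
    "$(json_escape "$session_id")" "$(json_escape "$title")" \
    "$exit_code" "$output_lines" "$(json_escape "$last_line")"
}
```

For example, run_with_exit_notice s1 demo sh -c 'echo one; echo two; exit 3' prints {"sessionId":"s1","title":"demo","exitCode":3,"outputLines":2,"lastLine":"two"}. In the real session manager the same payload would be sent to the agent channel only when notifyOnExit was set at spawn time.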

    Must NOT do:

    • Do NOT change the existing pty_spawn interface (add notifyOnExit as optional param)
    • Do NOT implement sandbox or circuit breaker (out of scope for this wave)
    • Do NOT add new database tables (metadata lives in-memory or in existing session store)

    Recommended Agent Profile:

    • Category: unspecified-high
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Tasks 6, 8)
    • Blocks: F1-F4
    • Blocked By: Tasks 1-5 (Wave 1)

    References:

    • apps/booterm/src/ — PTY session management code
    • apps/coder/src/services/ — agent dispatch that spawns PTYs
    • openspec/changes/pty-enhancements/proposal.md — full spec
    • apps/server/src/services/inference/ — inference pipeline that may need to handle notifications

    Acceptance Criteria:

    • notifyOnExit optional parameter on pty_spawn works
    • On process exit with notifyOnExit=true, agent receives notification
    • pty_list returns session metadata
    • X-Agent-Flags header is recognized

    QA Scenarios:

    Scenario: notifyOnExit triggers notification
      Tool: Bash + tmux
      Preconditions: booterm running
      Steps:
        1. Start a short PTY with notifyOnExit=true: sleep 1
        2. Wait 2 seconds for completion
        3. Check notification was delivered
      Expected Result: Exit notification received with title, exit code, last line
      Evidence: .omo/evidence/task-7-notify-on-exit.txt
    
    Scenario: pty_list shows metadata
      Tool: Bash (curl)
      Preconditions: PTY sessions exist
      Steps:
        1. curl http://localhost:9501/api/pty/list 2>/dev/null
        2. Assert: response contains session metadata fields
      Expected Result: Metadata returned for each session
      Evidence: .omo/evidence/task-7-pty-list.txt
    

    Evidence to Capture:

    • task-7-notify-on-exit.txt — notification evidence
    • task-7-pty-list.txt — pty_list response

    Commit: YES

    • Message: feat(booterm): PTY exit notifications + session metadata + X-Agent-Flags
    • Files: apps/booterm/src/*.ts, apps/coder/src/services/*.ts
  • 8. token-analyzer-ui — Backend API endpoints for token analytics

    What to do:

    • Add read-only API endpoints to serve aggregate token data:
      • GET /api/coder/token-analytics/sessions — per-session token usage (input, output, cost)
      • GET /api/coder/token-analytics/tools — per-tool cost breakdown (from tool_cost_stats view)
      • GET /api/coder/token-analytics/trends — token usage over time
    • Reuse existing data sources:
      • agent_sessions.input_tokens, agent_sessions.output_tokens, agent_sessions.cost
      • tool_cost_stats view (per-tool 100-call rolling window)
      • tasks.token_breakdown JSONB column
    • Implement in apps/coder/src/routes/ (follow existing route patterns)
    • Add proper error handling, pagination for large result sets, and date filtering

    Must NOT do:

    • Do NOT create new database tables or migrations
    • Do NOT add token tracking logic (data is already accumulated)
    • Do NOT add real-time streaming (data is historical aggregate)

    Recommended Agent Profile:

    • Category: unspecified-high
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Tasks 6-7)
    • Blocks: Task 10 (frontend depends on backend)
    • Blocked By: Tasks 1-5 (Wave 1)

    References:

    • apps/coder/src/routes/ — existing route patterns
    • apps/server/src/schema.sql: tool_cost_stats view definition
    • apps/coder/CLAUDE.md — coder conventions, route registration
    • packages/contracts/ — shared types for response schemas
    • openspec/changes/token-analyzer-ui/proposal.md — full spec

    Acceptance Criteria:

    • GET /api/coder/token-analytics/sessions?project_id=X returns 200 with token data
    • GET /api/coder/token-analytics/tools?project_id=X returns 200 with tool breakdown
    • GET /api/coder/token-analytics/trends?project_id=X returns 200 with trend data
    • All endpoints respect project_id filtering
    • Empty data returns valid empty arrays (not errors)

    QA Scenarios:

    Scenario: Sessions endpoint works
      Tool: Bash (curl)
      Preconditions: Server running, project exists
      Steps:
        1. curl -s "http://localhost:3000/api/coder/token-analytics/sessions?project_id=1"
        2. Assert: HTTP 200
        3. Assert: response is valid JSON with expected fields
      Expected Result: Session token data returned
      Evidence: .omo/evidence/task-8-sessions-endpoint.txt
    
    Scenario: Empty data returns valid response
      Tool: Bash (curl)
      Preconditions: Server running
      Steps:
        1. curl -s "http://localhost:3000/api/coder/token-analytics/sessions?project_id=999"
        2. Assert: HTTP 200
        3. Assert: response contains empty array (not error)
      Expected Result: Graceful empty state
      Evidence: .omo/evidence/task-8-empty-data.txt
    

    Evidence to Capture:

    • task-8-sessions-endpoint.txt — successful API response
    • task-8-empty-data.txt — graceful empty handling

    Commit: YES

    • Message: feat(coder): add token-analytics API endpoints for session/tool/trend data
    • Files: apps/coder/src/routes/token-analytics.ts, apps/coder/src/services/token-analytics.ts

TODOs (Wave 3)

  • 9. results-page — /results route for orchestrator runs + arena battles

    What to do:

    • Add sidebar nav button with ScrollText icon (lucide-react), above the Token Analytics button
    • Create new /results route page with two tabs:
      • "Analysis Runs" — list orchestrator flow runs (research, code-review, investigate, etc.)
      • "Arena Battles" — list battle history
    • Each tab shows: status dot, name/type, band/battle-type, model, timing, error indicator
    • Completed runs show "View Report" link; completed battles show "View Analysis"
    • Uses existing API endpoints (no backend changes needed):
      • GET /api/coder/runs?project_id=X
      • GET /api/coder/battles?project_id=X
    • Requires project_id context — load from sidebar on mount, or show project selector
    • Follow existing route patterns in web (React Router routes, lazy loading)

    Must NOT do:

    • Do NOT create new API endpoints
    • Do NOT modify existing API contracts
    • Do NOT add pagination beyond what the API already provides
    • Do NOT add real-time updates (static list, refreshed on mount)

    Recommended Agent Profile:

    • Category: visual-engineering
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 3 (with Tasks 10-11)
    • Blocks: F1-F4
    • Blocked By: Tasks 1-5 (Wave 1)

    References:

    • apps/web/src/routes/ — existing route patterns (analytics, settings)
    • apps/web/src/components/sidebar/ — nav button patterns
    • apps/web/src/api/ — existing API client
    • openspec/changes/results-page/proposal.md — full spec
    • apps/coder/src/routes/runs.ts — runs endpoint
    • apps/coder/src/routes/battles.ts — battles endpoint

    Acceptance Criteria:

    • Sidebar shows "Results" button with ScrollText icon above Token Analytics
    • Clicking navigates to /results
    • "Analysis Runs" tab loads and displays orchestrator flow history
    • "Arena Battles" tab loads and displays battle history
    • Completed runs show "View Report" link
    • Empty state shown when no data
    • Error state shown on API failure

    QA Scenarios:

    Scenario: Nav button renders
      Tool: Playwright
      Preconditions: Web app loaded
      Steps:
        1. Navigate to /
        2. Look for sidebar nav button with text "Results"
        3. Assert: button exists and links to /results
      Expected Result: Results nav button present
      Evidence: .omo/evidence/task-9-nav-button.png
    
    Scenario: Results page loads
      Tool: Playwright
      Preconditions: Web app loaded, project exists
      Steps:
        1. Navigate to /results
        2. Wait for "Analysis Runs" tab to appear
        3. Assert: tab shows list of runs or empty state
      Expected Result: Page loads with data
      Evidence: .omo/evidence/task-9-results-page.png
    

    Evidence to Capture:

    • task-9-nav-button.png — screenshot of sidebar with Results button
    • task-9-results-page.png — screenshot of /results page with data

    Commit: YES

    • Message: feat(web): add /results page for orchestrator runs and arena battle history
    • Files: apps/web/src/routes/results.tsx, apps/web/src/components/sidebar/*.tsx
  • 10. token-analyzer-ui — /analytics dashboard route

    What to do:

    • Add a "Token Analytics" sidebar nav button with an appropriate icon, above the Settings button
    • Create new /analytics route page showing token usage dashboard:
      • Aggregate token usage across sessions (total input/output tokens)
      • Per-tool cost breakdown (bar chart or table)
      • Per-session token history (list or mini chart)
      • Per-provider cost comparison
    • Reuse existing data from the backend endpoints created in Task 8
    • Follow the same route/nav patterns as results-page
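
The aggregate and per-dimension breakdowns listed above could be computed client-side along these lines. This is a sketch only: `TokenUsageRecord` is a hypothetical record shape, since the Task 8 endpoint payloads are not specified in this section, and it sums tokens rather than cost because no price table is defined here.

```typescript
// Hypothetical record shape; the real Task 8 payload may differ.
interface TokenUsageRecord {
  sessionId: string;
  tool: string;
  provider: string;
  inputTokens: number;
  outputTokens: number;
}

interface TokenTotals {
  inputTokens: number;
  outputTokens: number;
}

// Sum input and output tokens across all records (the aggregate tile).
function totalTokens(records: readonly TokenUsageRecord[]): TokenTotals {
  let inputTokens = 0;
  let outputTokens = 0;
  for (const r of records) {
    inputTokens += r.inputTokens;
    outputTokens += r.outputTokens;
  }
  return { inputTokens, outputTokens };
}

// Group totals by one dimension: per session, per tool, or per provider.
function groupTotals(
  records: readonly TokenUsageRecord[],
  key: "sessionId" | "tool" | "provider",
): Map<string, TokenTotals> {
  const groups = new Map<string, TokenTotals>();
  for (const r of records) {
    const current = groups.get(r[key]) ?? { inputTokens: 0, outputTokens: 0 };
    groups.set(r[key], {
      inputTokens: current.inputTokens + r.inputTokens,
      outputTokens: current.outputTokens + r.outputTokens,
    });
  }
  return groups;
}
```

An empty input yields zero totals and an empty map, which gives the dashboard a simple condition for its empty state.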

    Must NOT do:

    • Do NOT add new charting libraries (use what's already available)
    • Do NOT implement real-time updates

    Recommended Agent Profile:

    • Category: visual-engineering
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 3 (with Tasks 9, 11)
    • Blocks: F1-F4
    • Blocked By: Tasks 1-5 (Wave 1), Task 8 (backend endpoints)

    References:

    • Same as Task 9 + Task 8 endpoints
    • openspec/changes/token-analyzer-ui/proposal.md — full spec
    • apps/web/src/components/ — existing chart/list components

    Acceptance Criteria:

    • Sidebar shows "Token Analytics" button above Settings
    • /analytics loads and shows token dashboard
    • Per-session, per-tool, per-provider breakdowns visible
    • Empty state shown when no data

    QA Scenarios:

    Scenario: Token Analytics nav button renders
      Tool: Playwright
      Preconditions: Web app loaded
      Steps:
        1. Navigate to /
        2. Look for "Token Analytics" button in sidebar
        3. Assert: button exists above Settings
      Expected Result: Nav button present
      Evidence: .omo/evidence/task-10-nav-button.png
    
    Scenario: Analytics dashboard loads
      Tool: Playwright
      Preconditions: Web app loaded
      Steps:
        1. Navigate to /analytics
        2. Wait for dashboard content to render
        3. Assert: token usage data is visible
      Expected Result: Dashboard shows data
      Evidence: .omo/evidence/task-10-analytics-dashboard.png
    

    Evidence to Capture:

    • task-10-nav-button.png
    • task-10-analytics-dashboard.png

    Commit: YES

    • Message: feat(web): add /analytics route for token usage dashboard
    • Files: apps/web/src/routes/analytics.tsx, apps/web/src/components/sidebar/*.tsx
  • 11. enhanced-file-panel — Side-by-side diff, hide whitespace, wrap lines, expand/collapse all

    What to do:

    • Add side-by-side diff toggle to the Git diff tab in the file panel
    • Add "Hide whitespace" checkbox that filters whitespace-only changes
    • Add "Wrap long lines" toggle for diff display
    • Add "Expand All" / "Collapse All" buttons for file-level diffs
    • Implement in apps/web/src/components/ following existing file panel patterns
    • Read apps/web/src/components/ to find the existing diff rendering components
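
Because the backend diff generation must stay unchanged, "Hide whitespace" has to be a frontend filter over already-parsed hunks. One possible approach, assuming a hypothetical `DiffHunk` shape (the existing diff components may model hunks differently):

```typescript
// Hypothetical hunk shape: removed and added line contents without +/- markers.
interface DiffHunk {
  removed: string[];
  added: string[];
}

// Drop all whitespace, then drop lines that end up empty, so that
// indentation changes and added or removed blank lines compare as equal.
function significantLines(lines: readonly string[]): string[] {
  return lines
    .map((line) => line.replace(/\s+/g, ""))
    .filter((line) => line.length > 0);
}

// A hunk is whitespace-only when both sides match after normalization.
function isWhitespaceOnly(hunk: DiffHunk): boolean {
  const before = significantLines(hunk.removed);
  const after = significantLines(hunk.added);
  return (
    before.length === after.length &&
    before.every((line, i) => line === after[i])
  );
}

// Apply the checkbox: when enabled, hide whitespace-only hunks.
function visibleHunks(hunks: readonly DiffHunk[], hideWhitespace: boolean): DiffHunk[] {
  return hideWhitespace ? hunks.filter((h) => !isWhitespaceOnly(h)) : [...hunks];
}
```

This is an approximation: it also ignores whitespace inside string literals, so the proposal should confirm whether that is acceptable.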

    Must NOT do:

    • Do NOT implement inline diff comments (deferred)
    • Do NOT implement in-browser file editing (deferred)
    • Do NOT change the backend diff generation logic

    Recommended Agent Profile:

    • Category: visual-engineering
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 3 (with Tasks 9-10)
    • Blocks: F1-F4
    • Blocked By: Tasks 1-5 (Wave 1)

    References:

    • apps/web/src/components/ — existing file panel and diff components
    • apps/web/src/hooks/ — hooks for diff state management
    • openspec/changes/enhanced-file-panel/proposal.md — full spec
    • apps/server/src/routes/projects.ts — git diff backend route

    Acceptance Criteria:

    • Side-by-side diff toggles correctly
    • Hide whitespace checkbox filters whitespace changes
    • Wrap long lines toggle works
    • Expand/Collapse All buttons toggle all files
    • All changes are frontend-only (no new API calls)

    QA Scenarios:

    Scenario: Side-by-side diff renders
      Tool: Playwright
      Preconditions: Repo with uncommitted changes
      Steps:
        1. Open file panel
        2. Click Git tab
        3. Toggle side-by-side view
        4. Assert: diff renders in two columns
      Expected Result: Side-by-side diff visible
      Evidence: .omo/evidence/task-11-side-by-side.png
    
    Scenario: Hide whitespace works
      Tool: Playwright
      Preconditions: Diff has whitespace changes
      Steps:
        1. Open diff with whitespace changes
        2. Check "Hide whitespace"
        3. Assert: only-whitespace hunks hidden
      Expected Result: Whitespace-only changes filtered
      Evidence: .omo/evidence/task-11-hide-whitespace.png
    
    Scenario: Expand/Collapse All toggles
      Tool: Playwright
      Preconditions: Multiple files changed
      Steps:
        1. Click "Collapse All"
        2. Assert: all files collapsed to summary
        3. Click "Expand All"
        4. Assert: all files expanded
      Expected Result: Bulk toggle works
      Evidence: .omo/evidence/task-11-expand-collapse.png
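
The bulk toggle exercised in the last scenario can be modelled as a set of collapsed file paths. A small sketch, assuming hypothetical helper names (`collapseAll`, `expandAll`, `toggleFile`) that are not part of the existing file panel:

```typescript
// State is the set of file paths that are currently collapsed.
// Each helper returns a new set so it can be used directly as React state.

// "Collapse All": every changed file becomes collapsed.
function collapseAll(paths: readonly string[]): Set<string> {
  return new Set<string>(paths);
}

// "Expand All": nothing is collapsed.
function expandAll(): Set<string> {
  return new Set<string>();
}

// Per-file header click: flip one file without touching the others.
function toggleFile(collapsed: ReadonlySet<string>, path: string): Set<string> {
  const next = new Set<string>(collapsed);
  if (!next.delete(path)) {
    next.add(path);
  }
  return next;
}
```

Tracking collapsed paths rather than expanded ones keeps the default (empty set) equal to "everything expanded", so newly changed files appear open.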
    

    Evidence to Capture:

    • task-11-side-by-side.png
    • task-11-hide-whitespace.png
    • task-11-expand-collapse.png

    Commit: YES

    • Message: feat(web): enhanced file panel — side-by-side diff, hide whitespace, wrap lines, expand/collapse all
    • Files: apps/web/src/components/*.tsx, apps/web/src/hooks/*.ts

Final Verification Wave

  • F1. Plan Compliance Audit (oracle): Read the plan end-to-end. For each "Must Have": verify implementation exists (read file, curl endpoint, run command). For each "Must NOT Have": search the codebase for forbidden patterns and reject with file:line if found. Check evidence files exist in .omo/evidence/. Compare deliverables against plan. Output: Must Have [N/N] | Must NOT Have [N/N] | Tasks [N/N] | VERDICT: APPROVE/REJECT

  • F2. Code Quality Review (unspecified-high): Run tsc --noEmit for any changed apps + bun test. Review all changed files for: as any/@ts-ignore, empty catches, console.log in prod, commented-out code, unused imports. Output: Build [PASS/FAIL] | Lint [PASS/FAIL] | Tests [N pass/N fail] | Files [N clean/N issues] | VERDICT

  • F3. Real Manual QA (unspecified-high): Start from clean state. Execute EVERY QA scenario from EVERY task: follow exact steps and capture evidence. Test cross-task integration (features working together, not in isolation). Test edge cases: empty state, invalid input, missing project_id. Save to .omo/evidence/final-qa/. Output: Scenarios [N/N pass] | Integration [N/N] | Edge Cases [N tested] | VERDICT

  • F4. Scope Fidelity Check (deep): For each task: read "What to do", read actual diff. Verify 1:1: everything in scope was built (nothing missing), nothing beyond scope was built (no creep). Check "Must NOT do" compliance. Output: Tasks [N/N compliant] | Contamination [CLEAN/N issues] | Unaccounted [CLEAN/N files] | VERDICT
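
Part of the F2 review (as any, @ts-ignore, console.log, empty catches) is mechanical and could be pre-screened with a line scanner before the human or agent pass. The sketch below is illustrative only: the rule list and the `scanSource` helper are assumptions, it only catches empty catch blocks written on a single line, and it does not replace tsc or a real linter.

```typescript
interface Finding {
  line: number; // 1-based line number, for file:line reporting
  rule: string;
}

// Patterns deliberately omit the "g" flag so RegExp.test stays stateless.
const RULES: { rule: string; pattern: RegExp }[] = [
  { rule: "as any", pattern: /\bas any\b/ },
  { rule: "@ts-ignore", pattern: /@ts-ignore/ },
  { rule: "console.log", pattern: /\bconsole\.log\(/ },
  { rule: "empty catch", pattern: /catch\s*(\([^)]*\))?\s*\{\s*\}/ },
];

// Scan one file's text and report every rule hit with its line number.
function scanSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { rule, pattern } of RULES) {
      if (pattern.test(text)) {
        findings.push({ line: index + 1, rule });
      }
    }
  });
  return findings;
}
```

A text scan like this will also flag matches inside comments or strings, so each hit still needs a manual look before it counts against the verdict.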


Commit Strategy

  • 1-5 (grouped): chore(openspec): cleanup openspec folder structure — delete stubs, move proposals, add metadata, populate config
  • 6: perf(llama): enable KV cache quantization (q4_0) + ngram speculative decoding
  • 7: feat(booterm): PTY exit notifications + session metadata + X-Agent-Flags
  • 8: feat(coder): add token-analytics API endpoints
  • 9: feat(web): add /results page for orchestrator runs + arena battles
  • 10: feat(web): add /analytics token usage dashboard
  • 11: feat(web): enhanced file panel — side-by-side diff, hide whitespace, wrap lines, expand/collapse

Success Criteria

Verification Commands

# OpenSpec cleanup
test ! -f openspec/changes/archived/v1.13.12-skills-audit.md
test -d openspec/changes/boocontext/
test -f openspec/changes/enhanced-file-panel/.openspec.yaml
grep -q "context:" openspec/config.yaml

# llama-cache-and-spec
ps aux | grep llama-server | grep -o "cache-type-k q4_0"
ps aux | grep llama-server | grep -o "spec-type ngram-mod"

# PTY enhancements
curl -s http://localhost:9501/api/pty/list | jq '.'

# Results page
curl -s "http://localhost:3000/api/coder/runs?project_id=1" | jq '.'

# Token analytics
curl -s "http://localhost:3000/api/coder/token-analytics/sessions?project_id=1" | jq '.'

# Enhanced file panel
# (visual verification via Playwright)

Final Checklist

  • 11 stub files deleted from archived/
  • 5 misplaced proposals moved/merged into changes/
  • 6 .openspec.yaml files added
  • config.yaml populated with context + rules
  • 10 archived proposals annotated with shipped versions
  • llama-server running with KV cache Q4_0 + ngram
  • PTY exit notifications working
  • /results page renders and loads data
  • /analytics page renders and loads data
  • Side-by-side diff, hide whitespace, wrap lines, expand/collapse all working
  • All type checks pass
  • All QA scenarios pass