chore(openspec): drop 9 superseded proposals + 11 stub archive files

Drop 9 batch proposals that are superseded by the boocode-lift-analysis (boocontext-audit, conductor upgrades, self-healing/verify-gate skills): add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform, conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul, agent-reliability. Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only) that provide zero documentation value over the existing CHANGELOG.md + git tags.
2026-06-07 22:15:38 +00:00
parent 0d6e9a2413
commit c935687725
119 changed files with 4897 additions and 45 deletions
--- a/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/design.md
+++ b/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/design.md
@@ -0,0 +1,76 @@
+## Context
+
+boocode currently has no persistent session management for its agents (the persona agents in data/AGENTS.md). When a session is interrupted, there's no recoverable audit trail, no way to detect repeated mistakes, and no mechanism to enforce learned behavioral guidelines across sessions.
+
+audit-harness provides: hooks (PostToolUse buffer→Stop flush→UserPromptSubmit injection), skills (/start→/end→/recover→/report-daily), and a Python core (AuditContext) with unified index schema.
+
+Parlant provides: GuidelineDocumentStore (versioned, tag/label filtered), JourneyStore (graph-based SOPs), and JourneyGuidelineProjection (node→guideline auto-conversion).
+
+This design ports the high-value subset of both into boocode as agent-facing skills and a TypeScript core library.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Define `.boo/runs/` directory convention with auto-creation and `.gitignore`
+- Port /start, /end, /recover, /report-daily as boocode skills (markdown)
+- Port user_correction record format and detection
+- Port GuidelineDocumentStore from Parlant as TypeScript service
+- Port Journey → guideline auto-projection (node→guideline conversion)
+- Implement `find_guideline()` lookup by content match
+- All features opt-in, zero breaking changes
+
+**Non-Goals:**
+- AuditContext full Python class port (environment snapshots, anomaly lambdas)
+- Hooks implementation (PostToolUse/Stop/UserPromptSubmit) — separate batch
+- Parlant's vector DB / embedder infrastructure
+- Parlant's relationship resolver (ARQ)
+- Web UI for guideline management — CLI/skill-only
+
+## Decisions
+
+### Decision 1: Skill-based commands over CLI tools
+
+**Choice**: Implement /start, /end, /recover, /report-daily as skill markdown files in `data/skills/boocode/`, following the existing `committing-changes` pattern.
+**Rationale**: boocode agents already load skills from this path. Adding a new skill is zero code change to the agent runtime — just a new markdown file with YAML frontmatter. CLI tools would require new API routes, dispatch logic, and frontend work.
+**Alternatives considered**: Fastify API routes (rejected — too heavy for agent-facing commands), shell scripts (rejected — platform-specific).
+
+### Decision 2: JSONL buffer + index.json
+
+**Choice**: Port audit-harness's file layout exactly: `audit_buffer.jsonl` for live writes, `audit_pending.jsonl` for agent-authored AUDIT blocks, per-session `audit_trail.jsonl` for flushed records, `index.json` for cross-session metadata.
+**Rationale**: audit-harness has already exercised this layout in production. JSONL is grep-able, append-only, and needs no DB connection.
+**Alternatives considered**: Postgres (rejected — agents don't all have DB access), SQLite (rejected — adds a native dep).
+
+### Decision 3: Timestamp-based session IDs
+
+**Choice**: `adhoc_YYYYMMDD_HHMM` format for session IDs, matching audit-harness pattern.
+**Rationale**: Human-readable and sortable. The format has minute resolution, so two sessions started within the same minute would share an ID.
+
+### Decision 4: File-based GuidelineStore
+
+**Choice**: Port GuidelineDocumentStore's abstract interface (create/list/read/update/delete/find) but use filesystem JSON storage instead of Parlant's DocumentDatabase.
+**Rationale**: boocode doesn't have Parlant's document DB abstraction. A JSON-file store is simpler and sufficient for single-user operation. The interface stays the same, so a future Postgres backend can be swapped in.
+**Alternatives considered**: Postgres backend (rejected — adds coupling), in-memory only (rejected — no persistence).
+
+### Decision 5: Journey → guideline projection as pure function
+
+**Choice**: Port `JourneyGuidelineProjection` as a pure function (not a class). Takes a Journey + its nodes/edges, returns Guideline[].
+**Rationale**: The projection logic (DFS traversal, node→guideline conversion, edge metadata grafting) is deterministic and has no side effects. A pure function is simpler to test and compose.
+**Alternatives considered**: Class with JourneyStore dependency (rejected — unnecessary indirection for our use case).
+
+## Risks / Trade-offs
+
+- **[Risk]** Skills grow stale if agent runtime doesn't load them → **Mitigation**: Test with existing agent by loading skill explicitly.
+- **[Risk]** JSONL file contention from multiple agents → **Mitigation**: Single-user homelab. Acceptable.
+- **[Risk]** GuidelineStore JSON files grow unbounded → **Mitigation**: TBD — add compaction/archival in future batch.
+- **[Trade-off]** File storage is simple but doesn't scale to multi-user → Acceptable for single-user.
+
+## Migration / Rollout
+
+1. Create openspec spec files (proposal/design/tasks/specs)
+2. Create `.boo/runs/` directory structure (service)
+3. Create 4 skill files in `data/skills/boocode/`
+4. Create core AuditContext TypeScript service
+5. Create GuidelineStore + Journey service
+6. Create user_correction utilities
+7. Update data/AGENTS.md with new agents
+8. Test with skill invocation
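
The session ID format from Decision 3 can be sketched as a small formatter. This is a minimal illustration, not the shipped implementation: the name `formatSessionId` is hypothetical, and the use of UTC is an assumption (the design does not state a timezone).

```typescript
// Hypothetical helper (not from the source): formats a Date as
// adhoc_YYYYMMDD_HHMM. UTC is assumed; the design does not name a timezone.
function formatSessionId(now: Date): string {
  // Zero-pad a number to two digits.
  const two = (n: number): string => ("0" + n).slice(-2);
  const date = `${now.getUTCFullYear()}${two(now.getUTCMonth() + 1)}${two(now.getUTCDate())}`;
  const time = `${two(now.getUTCHours())}${two(now.getUTCMinutes())}`;
  return `adhoc_${date}_${time}`;
}

// Two timestamps five seconds apart fall in the same minute.
const first = formatSessionId(new Date(Date.UTC(2026, 2, 20, 14, 0, 0)));
const second = formatSessionId(new Date(Date.UTC(2026, 2, 20, 14, 0, 5)));
console.log(first);            // adhoc_20260320_1400
console.log(first === second); // true
```

Because seconds are not part of the ID, IDs sort lexicographically by start minute but are only unique per minute.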
--- a/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/proposal.md
+++ b/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/proposal.md
@@ -0,0 +1,23 @@
+## Why
+
+The audit-harness (hooks + skills + AuditContext) and Parlant (GuidelineStore + Journey engine) provide two proven patterns for agent session management. audit-harness solves context-window loss through persistent audit trails, graded recovery, and structured commands (/start → /end → /recover → /report-daily). Parlant solves behavioral consistency through a versioned guideline document store with tag/label-based retrieval, journey-based SOPs, and backtrack detection.
+
+Porting these patterns into boocode's agent ecosystem gives every agent working in this repo persistent session management, cross-session user correction awareness, and behavioral guideline enforcement — without building any of it from scratch.
+
+## What Changes
+
+### New Capabilities
+
+- **Data Directory Convention**: `.boo/runs/` directory with buffer files, session dirs, `.current_session` handshake, unified `index.json`. `AUDIT_DOT_DIR` env var for platform override.
+
+- **Session Lifecycle Commands**: `/start` creates named audit sessions with auto-recovery (L0+L2). `/end` flushes buffers, runs integrity checks, and generates `session_summary.md`. `/recover` performs graded context loading (L0–L3). `/report-daily` aggregates all sessions into a 7-section report; `/report-daily review` also runs a morning self-review.
+
+- **User Correction Tracking**: Structured `user_correction` records with `original_claim`/`correction`/`principle_extracted`/`persisted_to`. Auto-detected on `/end`. Correction-as-precedent enforcement when agent actions contradict prior corrections.
+
+- **Behavioral Guidelines Store**: Versioned GuidelineDocumentStore ported from Parlant with condition+action+description content model, tag/label filtering, and content-based `find_guideline()`. Journey → guideline auto-projection (SOP nodes → guidelines with follow-up edges). Journey backtrack detection batch.
+
+### Dependencies
+
+- Existing audit-harness patterns (audit-context.py, hooks, skills) serve as the reference implementation.
+- Parlant's GuidelineStore (guidelines.py) and JourneyStore (journeys.py) serve as the reference implementation.
+- No new external services. File-based JSONL storage (audit-harness pattern).
--- a/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/specs/behavioral-guidelines/spec.md
+++ b/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/specs/behavioral-guidelines/spec.md
@@ -0,0 +1,80 @@
+# Behavioral Guidelines Store — Spec
+
+## Guideline Entity
+
+```typescript
+interface GuidelineContent {
+  condition: string;     // When...
+  action: string | null; // Then...
+  description: string | null;
+}
+
+interface Guideline {
+  id: string;
+  creationUtc: string;
+  content: GuidelineContent;
+  enabled: boolean;
+  tags: string[];
+  labels: string[];
+  metadata: Record<string, unknown>;
+  criticality: "low" | "medium" | "high";
+  title: string | null;
+  priority: number;
+}
+```
+
+## GuidelineDocumentStore
+
+File-based JSON store at `.boo/guidelines/`. Versioned with migration support.
+
+Methods:
+- `createGuideline(condition, action?, description?, ...) → Guideline`
+- `listGuidelines(tags?, labels?) → Guideline[]`
+- `readGuideline(id) → Guideline`
+- `updateGuideline(id, params) → Guideline`
+- `deleteGuideline(id) → void`
+- `findGuideline(content: {condition, action?}) → Guideline`
+
+Version migration chain (port from Parlant v0.1.0 → v0.11.0):
+- v0.1.0 → v0.2.0: add enabled field
+- v0.2.0 → v0.3.0: remove guideline_set (migration script only)
+- v0.3.0 → v0.4.0: add optional action, description, metadata
+- v0.4.0 → v0.5.0: description as optional
+- v0.5.0 → v0.6.0: add criticality (default "medium")
+- v0.6.0 → v0.7.0: add composition_mode (optional)
+- v0.7.0 → v0.8.0: add track (default true)
+- v0.8.0 → v0.9.0: add labels (default empty)
+- v0.9.0 → v0.10.0: add priority (default 0)
+- v0.10.0 → v0.11.0: add title (default null)
+
+## Tag & Label Filtering
+
+- `listGuidelines({tags: ["tag1"]})` → guidelines with ANY of the specified tags
+- `listGuidelines({labels: ["label1"]})` → guidelines with ALL specified labels (subset match)
+- Combined: both filters apply (intersection)
+
+## Journey → Guideline Projection
+
+Port of Parlant's `JourneyGuidelineProjection.project_journey_to_guidelines()`:
+
+- DFS traversal of Journey nodes from root
+- Each (edge, node) pair → one Guideline
+- Edge condition becomes guideline condition
+- Node action becomes guideline action
+- Edge/node metadata merged into guideline metadata with journey_node key
+- follow_ups list populated with downstream guideline IDs
+- A visited set prevents infinite loops on cyclic journeys
+
+## Journey Backtrack Detection
+
+```typescript
+interface BacktrackCheck {
+  journeyId: string;
+  currentNodeId: string;
+  previousNodeId: string;
+  isBacktrack: boolean;
+  recommendation: string | null;
+}
+```
+
+Scans the edge list for source→target relationships. If the agent's current step has an edge back to a previously visited node (and that node is not in a forward path from current), it's flagged as a backtrack regression.
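
The tag and label filtering rules in this spec (ANY tag, ALL labels, both combined by intersection) can be sketched as a pure filter. This is an illustration only: `TaggedGuideline`, `ListFilter`, and `filterGuidelines` are hypothetical names, and treating an omitted filter as "no constraint" is an assumption the spec does not spell out.

```typescript
// Hypothetical minimal shape: only the fields the filter needs.
interface TaggedGuideline {
  id: string;
  tags: string[];
  labels: string[];
}

// Hypothetical filter argument mirroring listGuidelines({tags, labels}).
interface ListFilter {
  tags?: string[];
  labels?: string[];
}

function filterGuidelines(all: TaggedGuideline[], filter: ListFilter = {}): TaggedGuideline[] {
  const { tags, labels } = filter;
  return all.filter((g) => {
    // Tags: the guideline must carry at least ONE requested tag.
    // Assumption: an omitted tags filter places no constraint.
    const tagOk = tags === undefined || tags.some((t) => g.tags.indexOf(t) !== -1);
    // Labels: the guideline must carry EVERY requested label (subset match).
    const labelOk = labels === undefined || labels.every((l) => g.labels.indexOf(l) !== -1);
    // Combined: both filters must pass (intersection).
    return tagOk && labelOk;
  });
}
```

Note that under this sketch an explicitly empty `tags: []` matches nothing, while an empty `labels: []` matches everything; a real store may prefer to treat both as "no filter".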
--- a/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/specs/session-lifecycle-commands/spec.md
+++ b/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/specs/session-lifecycle-commands/spec.md
@@ -0,0 +1,88 @@
+# Session Lifecycle Commands — Spec
+
+## Overview
+
+Four agent-invocable commands that manage audit session lifecycle. Each command is a skill markdown file loaded by the agent on invocation.
+
+## /start
+
+```
+/start "task description"
+```
+
+Creates a named audit session:
+
+1. Generate `session_id = adhoc_YYYYMMDD_HHMM`
+2. `mkdir -p .boo/runs/{session_id}`
+3. Write `session.json`:
+   ```json
+   {
+     "session_id": "adhoc_20260320_1400",
+     "task": "task description",
+     "start_time": "2026-03-20T14:00:00Z",
+     "status": "in_progress",
+     "expected_record_types": ["data", "change", "conversation"]
+   }
+   ```
+4. Write `.boo/runs/.current_session` containing session_id (handshake for hooks)
+5. Run context recovery:
+   - L0: read `index.json` → last 5 entries
+   - L2: scan recent audit_trail.jsonl for `user_correction` records
+6. Output recovery summary: recent activity, corrections, priorities
+7. Check for unfinished sessions: scan for `status: "in_progress"` sessions, prompt user
+
+## /end
+
+```
+/end
+```
+
+Ends the current audit session:
+
+1. Read `.current_session` → get session_id
+2. Collect remaining buffer data from `audit_buffer.jsonl` + `audit_pending.jsonl`
+3. Append to `audit_trail.jsonl`
+4. Clear buffer files
+5. Extract `user_correction` records from audit_trail
+6. Run integrity checks:
+   - Has records? (>0 audit_trail lines)
+   - All files covered? (changes in audit_trail match modified files)
+   - Corrections persisted? (persisted_to is non-empty)
+7. Generate `session_summary.md`
+8. Update `session.json` status=completed, end_time
+9. Clear `.current_session`
+
+## /recover
+
+```
+/recover              # L0+L1+L2
+/recover full         # L3 (full audit_trail)
+/recover {session_id} # load specific session
+```
+
+Graded context loading:
+
+- L0 (~200 tokens): index.json → last 5 entries (id, task, status)
+- L1 (~500 tokens): .current_session + session.json + last 3 audit_trail entries
+- L2 (~1000 tokens): scan all audit_trails for user_correction records + conclusions + daily report §4+§6
+- L3 (~3000 tokens): full audit_trail.jsonl + audit_pending.jsonl
+
+## /report-daily
+
+```
+/report-daily              # today
+/report-daily 20260319     # specific date
+/report-daily review       # + morning self-review
+```
+
+7-section report:
+
+1. Task overview (from index.json)
+2. Operation stats (tool counts)
+3. Change records (file modifications)
+4. User feedback & corrections
+5. Anomaly alerts
+6. Backlog tracking
+7. Integrity summary
+
+`review` variant: adds morning self-review with trend analysis and recommended priorities.
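
The three integrity checks listed under `/end` can be sketched as a pure function over already-parsed trail records. This is a sketch under stated assumptions: the `file` field on change records, the `TrailRecord` and `IntegrityReport` shapes, and the function name `checkIntegrity` are all hypothetical, since the spec does not define how a change record names the file it touched.

```typescript
// Assumed record shape: only the fields the checks need.
interface TrailRecord {
  record_type: string;
  action_type?: string;
  file?: string;            // hypothetical: path touched by a "change" record
  persisted_to?: string[];
}

// Hypothetical result shape, one field per check in the /end spec.
interface IntegrityReport {
  hasRecords: boolean;            // "Has records?"
  uncoveredFiles: string[];       // "All files covered?" (empty means yes)
  unpersistedCorrections: number; // "Corrections persisted?" (0 means yes)
}

function checkIntegrity(trail: TrailRecord[], modifiedFiles: string[]): IntegrityReport {
  // Collect every file path that some change record accounts for.
  const coveredPaths: string[] = [];
  for (const r of trail) {
    if (r.record_type === "change" && r.file !== undefined) {
      coveredPaths.push(r.file);
    }
  }
  // Modified files with no matching change record.
  const uncoveredFiles = modifiedFiles.filter((f) => coveredPaths.indexOf(f) === -1);
  // Corrections whose persisted_to is missing or empty.
  const unpersistedCorrections = trail.filter(
    (r) =>
      r.action_type === "user_correction" &&
      (r.persisted_to === undefined || r.persisted_to.length === 0)
  ).length;
  return { hasRecords: trail.length > 0, uncoveredFiles, unpersistedCorrections };
}
```

Returning the offending files and a count, rather than three booleans, would let `session_summary.md` say what is missing instead of only that something is.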
--- a/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/specs/user-correction-tracking/spec.md
+++ b/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/specs/user-correction-tracking/spec.md
@@ -0,0 +1,42 @@
+# User Correction Tracking — Spec
+
+## Record Schema
+
+```typescript
+interface UserCorrectionRecord {
+  record_type: "conversation";
+  action_type: "user_correction";
+  priority: "critical_for_recovery";
+  timestamp: string;        // ISO 8601
+  original_claim: string;   // what the agent said that was wrong
+  correction: string;       // what the user corrected it to
+  principle_extracted: string;  // general principle derived from this correction
+  persisted_to: string[];   // files where this correction was documented
+}
+```
+
+## Storage
+
+User correction records are stored inline in `audit_trail.jsonl` as regular entries. They are extracted during `/end` and surfaced during `/recover` L2 loading.
+
+## Detection
+
+During `/end`, scan the session's `audit_trail.jsonl` for entries matching:
+- `action_type === "user_correction"`
+
+Also scan `audit_pending.jsonl` for any pending correction records not yet flushed.
+
+## persisted_to Field
+
+When a correction is written to CLAUDE.md, coding standards, or other documentation, the file paths are recorded in `persisted_to[]`. This is populated manually by the agent when it persists the correction.
+
+## Correction-as-Precedent
+
+When an agent considers an action that contradicts a known `user_correction` record, the action is flagged with a warning. The agent should:
+
+1. Identify the contradiction (which rule is being violated)
+2. Surface the relevant correction record (with timestamp and original context)
+3. Propose an alternative that respects the correction
+4. If the contradiction is intentional, document why as a new correction
+
+Detection logic: before each significant action, the agent scans loaded user_correction records from the current recovery context and checks if the proposed action matches any known `original_claim` pattern.
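
The detection step in this spec (scanning JSONL for `action_type === "user_correction"`) can be sketched as follows. The function name `extractCorrections` is hypothetical, and silently skipping blank or malformed lines is an assumption; the spec does not say how invalid lines should be handled.

```typescript
// Record schema as given in this spec.
interface UserCorrectionRecord {
  record_type: "conversation";
  action_type: "user_correction";
  priority: "critical_for_recovery";
  timestamp: string;
  original_claim: string;
  correction: string;
  principle_extracted: string;
  persisted_to: string[];
}

// Hypothetical helper: takes the raw text of audit_trail.jsonl (or
// audit_pending.jsonl) and returns the correction records it contains.
function extractCorrections(jsonl: string): UserCorrectionRecord[] {
  const found: UserCorrectionRecord[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // assumption: malformed lines are skipped, not fatal
    }
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { action_type?: unknown }).action_type === "user_correction"
    ) {
      // Matching on action_type only, as the spec describes; the other
      // fields are not validated in this sketch.
      found.push(parsed as UserCorrectionRecord);
    }
  }
  return found;
}
```

Running the same function over both the session trail and the pending file, then concatenating the results, would cover the two sources the Detection section names.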
--- a/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/tasks.md
+++ b/openspec/changes/archived/2026-06-07-port-audit-parlant-patterns/tasks.md
@@ -0,0 +1,39 @@
+# port-audit-parlant-patterns — Implementation Complete
+
+## boocontext (TypeScript) — src/audit/
+- [x] 1. Data Dir: `dotDir()`, `findRunsDir()`, `ensureRunsDir()` with .gitignore + AUDIT_DOT_DIR
+- [x] 2. Core Types: `RecordEntry`, `CompactRecord`, `Manifest`, `UserCorrectionRecord`, `SessionJson`, `SessionSummary`
+- [x] 3. Hash Utilities: `hashFile()`, `hashBytes()`, `hashDir()` via Node crypto SHA256
+- [x] 4. Anomaly: `AlertRule`, `Anomaly`, `checkAnomalies()` with default rules
+- [x] 5. AuditContext: `createBatchContext()` -> `record()` -> `recordCompact()` -> `finalize()` -> `save()` (writes manifest, trail, compact, anomalies, checksums, index)
+- [x] 6. AmbientContext: `AsyncLocalStorage` wrapper — `runWithAmbient()`, `getAmbientSession()`, `requireAmbientSession()`
+- [x] 7. Guideline Model: `GuidelineContent`, `Guideline`, `GuidelineStore`, `InMemoryGuidelineStore` with CRUD + tag/label filters
+- [x] 8. Guideline Matching: `MatchingContext`, `MatchingBatch` (Observational, Actionable, PreviouslyApplied, Disambiguation, ResponseAnalysis, LowCriticality), `GenericGuidelineMatchingStrategy`, retry policy
+- [x] 9. ARQ Generation: `SchematicGenerator`, typed output schemas per batch, `GenerationInfo` tracking, `createExecutionPlan()` with batch-parallel
+- [x] 10. Relationship Model: `RelationshipKind` (DEPENDS_ON, PRIORITIZES, ENTAILS, TAG_ALL, TAG_PRIORITIZES), `FileRelationshipStore`
+- [x] 11. Relational Resolver: 4-step iteration loop (deps -> prioritization -> priority -> entailment), `MAX_ITERATIONS=100`, `ResolutionKind` output
+- [x] 12. Graded Recovery: `recoverL0()`–`recoverL4()`, `scanUserCorrections()`, `formatRecoveryReport()` with source attribution
+- [x] 13. User Corrections: `detectCorrections()`, `addPersistedTarget()`, `findRelatedCorrections()`, `checkContradiction()`
+- [x] 14. Index: `readIndex()`, `writeIndex()` with atomic `.tmp` + `renameSync`
+- [x] 15. MCP Tools: `boocontext_audit_index` + `boocontext_audit_recover` registered in mcp-server.ts
+- [x] 16. Typecheck: `npx tsc --noEmit` passes clean
+
+## codecontext (Go) — internal/audit/ + internal/mcp/
+- [x] 1. Record Types: `RecordEntry`, `CompactRecord`, `RecordStep`/`RecordAction` enums (pre-existing)
+- [x] 2. Index: `UpdateIndexEntry()` with idempotent upsert, `IndexEntry` schema, atomic `.tmp` + `os.Rename()` (pre-existing)
+- [x] 3. Hashchain: `HashFile()`, `HashBytes()`, `HashDir()`, `VerifyHashchain()` with `HashchainVerificationError` (pre-existing)
+- [x] 4. Directory: `DotDir()`, `RunsDir()`, `EnsureRunsDir()` with .gitignore + `AUDIT_DOT_DIR` (pre-existing)
+- [x] 5. Anomaly: `AlertRule`, `Anomaly`, `Manifest` types + `CheckAnomalies()` with condition evaluation (pre-existing stub, now evaluates total_records/error_rate/hash conditions)
+- [x] 6. GenerateChecksums: per-file SHA256 manifest (pre-existing)
+- [x] 7. Session Lifecycle: `SessionLifecycleManager` with `StartSession(task)`, `EndSession()`, `CurrentSession()` — creates adhoc session, writes .current_session, updates index
+- [x] 8. Trail Management: `TrailManager` with `AppendToBuffer()`, `PendingAppend()`, `AppendToTrail()`, `ReadTrail()`, `FlushBuffer()` — auto-generates session if none active
+- [x] 9. MCP Audit Tools: `codecontext_audit_start`, `codecontext_audit_end`, `codecontext_audit_status` in `internal/mcp/audit_tools.go`
+- [x] 10. MCP Middleware Hooks: `recordAuditBuffer()` in server struct, buffer after tool calls, flush on "ready"
+- [x] 11. Build: `go build ./...` passes clean
+
+## boocode (Node.js) — apps/coder/src/services/
+- [x] 1. Session Service (`audit-session.ts`): `startSession()` with L0+L2 recovery, `endSession()` with integrity checks + session_summary.md, `recoverSession()` L0-L3 graded loading, `generateDailyReport()` 7-section report
+- [x] 2. Correction Service (`correction-service.ts`): `recordCorrection()`, `scanForCorrections()`, `checkContradiction()`, `markPersisted()` — JSON store at `.boo/corrections/`
+- [x] 3. Guideline Service (`guideline-service.ts`): `createGuideline()`, `listGuidelines()` with tag/label filters, version migration chain (v0.1.0->v0.11.0), `projectJourneyToGuidelines()` DFS, `checkBacktrack()` — JSON store at `.boo/guidelines/`
+- [x] 4. Skill commands: `command-start/SKILL.md`, `command-end/SKILL.md`, `command-recover/SKILL.md`, `command-report-daily/SKILL.md`
+- [x] 5. Typecheck: `pnpm -C apps/coder typecheck` passes clean