chore(openspec): drop 9 superseded proposals + 11 stub archive files

Drop 9 batch proposals that are superseded by the boocode-lift-analysis
(boocontext-audit, conductor upgrades, self-healing/verify-gate skills):
add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform,
conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul,
agent-reliability.

Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only)
that provide zero documentation value over the existing CHANGELOG.md + git tags.
2026-06-07 22:15:38 +00:00
parent 0d6e9a2413
commit c935687725
119 changed files with 4897 additions and 45 deletions

@@ -0,0 +1,76 @@
## Context
boocode currently has no persistent session management for its agents (the persona agents in data/AGENTS.md). When a session is interrupted, there's no recoverable audit trail, no way to detect repeated mistakes, and no mechanism to enforce learned behavioral guidelines across sessions.
audit-harness provides: hooks (PostToolUse buffer→Stop flush→UserPromptSubmit injection), skills (/start→/end→/recover→/report-daily), and a Python core (AuditContext) with a unified index schema.
Parlant provides: GuidelineDocumentStore (versioned, tag/label filtered), JourneyStore (graph-based SOPs), and JourneyGuidelineProjection (node→guideline auto-conversion).
This design ports the high-value subset of both into boocode as agent-facing skills and a TypeScript core library.
## Goals / Non-Goals
**Goals:**
- Define `.boo/runs/` directory convention with auto-creation and `.gitignore`
- Port /start, /end, /recover, /report-daily as boocode skills (markdown)
- Port user_correction record format and detection
- Port GuidelineDocumentStore from Parlant as TypeScript service
- Port Journey → guideline auto-projection (node→guideline conversion)
- Implement content-based guideline lookup (`find_guideline()`)
- All features opt-in, zero breaking changes
**Non-Goals:**
- AuditContext full Python class port (environment snapshots, anomaly lambdas)
- Hooks implementation (PostToolUse/Stop/UserPromptSubmit) — separate batch
- Parlant's vector DB / embedder infrastructure
- Parlant's relationship resolver (ARQ)
- Web UI for guideline management — CLI/skill-only
## Decisions
### Decision 1: Skill-based commands over CLI tools
**Choice**: Implement /start, /end, /recover, /report-daily as skill markdown files in `data/skills/boocode/`, following the existing `committing-changes` pattern.
**Rationale**: boocode agents already load skills from this path. Adding a new skill requires no code change to the agent runtime, only a new markdown file with YAML frontmatter. CLI tools would require new API routes, dispatch logic, and frontend work.
**Alternatives considered**: Fastify API routes (rejected — too heavy for agent-facing commands), shell scripts (rejected — platform-specific).
### Decision 2: JSONL buffer + index.json
**Choice**: Port audit-harness's file layout exactly: `audit_buffer.jsonl` for live writes, `audit_pending.jsonl` for agent-authored AUDIT blocks, per-session `audit_trail.jsonl` for flushed records, `index.json` for cross-session metadata.
**Rationale**: audit-harness already runs this layout in production. JSONL is grep-able and append-only, and it needs no DB connection.
**Alternatives considered**: Postgres (rejected — agents don't all have DB access), SQLite (rejected — adds a native dep).
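A minimal sketch of the append-only JSONL convention, assuming Node's `fs` module. `appendRecord` and `readRecords` are hypothetical helper names, not existing boocode functions. Each record is serialized onto one line, and the file is only ever appended to.

```typescript
import { appendFileSync, existsSync, readFileSync } from "node:fs";

// Hypothetical helpers (not the boocode API): one JSON object per line,
// appended and never rewritten in place.
export function appendRecord(file: string, record: Record<string, unknown>): void {
  // JSON.stringify escapes embedded newlines, so each record stays on one line.
  appendFileSync(file, JSON.stringify(record) + "\n", "utf8");
}

export function readRecords(file: string): Record<string, unknown>[] {
  if (!existsSync(file)) return []; // no file yet means no records
  return readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.trim().length > 0) // drop the trailing empty line
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}
```

`readRecords` throws if a line fails to parse, for example a write that was cut off mid-line. A reader that needs to tolerate this could skip unparseable lines instead.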
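### Decision 3: Timestamp-based session IDs
**Choice**: `adhoc_YYYYMMDD_HHMM` format for session IDs, matching the audit-harness pattern.
**Rationale**: Human-readable and sortable. The format has minute granularity, so IDs are unique only if at most one session starts per minute. That constraint is acceptable for single-user operation.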
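A sketch of the ID format that assumes UTC, since the design does not name a timezone. `formatSessionId` and `uniqueSessionId` are hypothetical. The `_2`, `_3` suffix in `uniqueSessionId` is one illustrative way to avoid reusing an ID when two sessions start in the same minute. It is not part of the audit-harness convention.

```typescript
// Hypothetical helper: formats a Date as adhoc_YYYYMMDD_HHMM.
// UTC is an assumption; the design does not name a timezone.
export function formatSessionId(now: Date = new Date()): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const date = `${now.getUTCFullYear()}${pad(now.getUTCMonth() + 1)}${pad(now.getUTCDate())}`;
  const time = `${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}`;
  return `adhoc_${date}_${time}`;
}

// Hypothetical collision guard: IDs have minute granularity, so a second
// session in the same minute gets a numeric suffix instead of reusing the ID.
export function uniqueSessionId(existing: ReadonlySet<string>, now: Date = new Date()): string {
  const base = formatSessionId(now);
  if (!existing.has(base)) return base;
  let n = 2;
  while (existing.has(`${base}_${n}`)) n++;
  return `${base}_${n}`;
}
```

Suffixed IDs such as `adhoc_20260320_1400_2` still sort before `adhoc_20260320_1401`, so plain string ordering stays chronological.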
### Decision 4: File-based GuidelineStore
**Choice**: Port GuidelineDocumentStore's abstract interface (create/list/read/update/delete/find) but use filesystem JSON storage instead of Parlant's DocumentDatabase.
**Rationale**: boocode doesn't have Parlant's document DB abstraction. A JSON-file store is simpler and sufficient for single-user operation. The interface stays the same, so a future Postgres backend can be swapped in.
**Alternatives considered**: Postgres backend (rejected — adds coupling), in-memory only (rejected — no persistence).
### Decision 5: Journey → guideline projection as pure function
**Choice**: Port `JourneyGuidelineProjection` as a pure function (not a class). Takes a Journey + its nodes/edges, returns Guideline[].
**Rationale**: The projection logic (DFS traversal, node→guideline conversion, edge metadata grafting) is deterministic and has no side effects. A pure function is simpler to test and compose.
**Alternatives considered**: Class with JourneyStore dependency (rejected — unnecessary indirection for our use case).
## Risks / Trade-offs
- **[Risk]** Skills grow stale if agent runtime doesn't load them → **Mitigation**: Test with existing agent by loading skill explicitly.
- **[Risk]** JSONL file contention from multiple agents → **Mitigation**: Single-user homelab. Acceptable.
- **[Risk]** GuidelineStore JSON files grow unbounded → **Mitigation**: TBD — add compaction/archival in future batch.
- **[Trade-off]** File storage is simple but doesn't scale to multi-user → Acceptable for single-user.
## Migration / Rollout
1. Create openspec spec files (proposal/design/tasks/specs)
2. Create `.boo/runs/` directory structure (service)
3. Create 4 skill files in `data/skills/boocode/`
4. Create core AuditContext TypeScript service
5. Create GuidelineStore + Journey service
6. Create user_correction utilities
7. Update data/AGENTS.md with new agents
8. Test with skill invocation

@@ -0,0 +1,23 @@
## Why
The audit-harness (hooks + skills + AuditContext) and Parlant (GuidelineStore + Journey engine) provide two proven patterns for agent session management. audit-harness solves context-window loss through persistent audit trails, graded recovery, and structured commands (/start → /end → /recover → /report-daily). Parlant solves behavioral consistency through a versioned guideline document store with tag/label-based retrieval, journey-based SOPs, and backtrack detection.
Porting these patterns into boocode's agent ecosystem gives every agent working in this repo persistent session management, cross-session user correction awareness, and behavioral guideline enforcement — without building any of it from scratch.
## What Changes
### New Capabilities
- **Data Directory Convention**: `.boo/runs/` directory with buffer files, session dirs, `.current_session` handshake, unified `index.json`. `AUDIT_DOT_DIR` env var for platform override.
- **Session Lifecycle Commands**: `/start` creates named audit sessions with auto-recovery (L0+L2). `/end` flushes buffers, runs integrity checks, and generates `session_summary.md`. `/recover` performs graded context loading (L0-L3). `/report-daily` aggregates all sessions into a 7-section report; `/report-daily review` also runs a morning self-review.
- **User Correction Tracking**: Structured `user_correction` records with `original_claim`/`correction`/`principle_extracted`/`persisted_to`. Auto-detected on `/end`. Correction-as-precedent enforcement when agent actions contradict prior corrections.
- **Behavioral Guidelines Store**: Versioned GuidelineDocumentStore ported from Parlant, with a condition+action+description content model, tag/label filtering, and content-based `find_guideline()`. Journey → guideline auto-projection (SOP nodes → guidelines with follow-up edges). Journey backtrack detection.
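A sketch of the directory convention. It assumes that `AUDIT_DOT_DIR` holds a directory name or path that replaces `.boo` (the exact semantics are not specified here) and that the `.gitignore` lives inside `runs/`. `dotDir` and `ensureRunsDir` reuse names from the boocontext task list, but the signatures shown here are illustrative.

```typescript
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { join, resolve } from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical resolver. Assumes AUDIT_DOT_DIR is a directory name or path
// (e.g. ".claude"); relative values are resolved against the project root.
export function dotDir(projectRoot: string, env: Env = process.env): string {
  const override = env["AUDIT_DOT_DIR"]?.trim();
  return resolve(projectRoot, override ? override : ".boo");
}

// Creates <dotDir>/runs on first use and drops a .gitignore into it so that
// session artifacts are never committed. Safe to call repeatedly.
export function ensureRunsDir(projectRoot: string, env: Env = process.env): string {
  const runs = join(dotDir(projectRoot, env), "runs");
  mkdirSync(runs, { recursive: true });
  const gitignore = join(runs, ".gitignore");
  // "*" ignores everything in runs/, including this .gitignore itself.
  if (!existsSync(gitignore)) writeFileSync(gitignore, "*\n", "utf8");
  return runs;
}
```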
### Dependencies
- Existing audit-harness patterns (audit-context.py, hooks, skills) serve as a reference implementation.
- Parlant's GuidelineStore (guidelines.py) and JourneyStore (journeys.py) serve as a reference implementation.
- No new external services. File-based JSONL storage (audit-harness pattern).

@@ -0,0 +1,80 @@
# Behavioral Guidelines Store — Spec
## Guideline Entity
```typescript
interface GuidelineContent {
  condition: string; // When...
  action: string | null; // Then...
  description: string | null;
}
interface Guideline {
  id: string;
  creationUtc: string;
  content: GuidelineContent;
  enabled: boolean;
  tags: string[];
  labels: string[];
  metadata: Record<string, unknown>;
  criticality: "low" | "medium" | "high";
  title: string | null;
  priority: number;
}
```
## GuidelineDocumentStore
File-based JSON store at `.boo/guidelines/`. Versioned with migration support.
Methods:
- `createGuideline(condition, action?, description?, ...) → Guideline`
- `listGuidelines(tags?, labels?) → Guideline[]`
- `readGuideline(id) → Guideline`
- `updateGuideline(id, params) → Guideline`
- `deleteGuideline(id) → void`
- `findGuideline(content: {condition, action?}) → Guideline`
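A minimal sketch of the file-based store. It assumes a single `guidelines.json` document per directory and exact-match lookup for `findGuideline()`. `FileGuidelineStore` is a hypothetical name. Tag/label filtering and version migration are left out to keep the example short.

```typescript
import { existsSync, mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { randomUUID } from "node:crypto";

interface GuidelineContent { condition: string; action: string | null; description: string | null }
interface Guideline {
  id: string;
  creationUtc: string;
  content: GuidelineContent;
  enabled: boolean;
  tags: string[];
  labels: string[];
  metadata: Record<string, unknown>;
  criticality: "low" | "medium" | "high";
  title: string | null;
  priority: number;
}
// Assumed on-disk shape: one versioned JSON document holding every guideline.
interface StoreDocument { version: string; guidelines: Guideline[] }

export class FileGuidelineStore {
  private readonly file: string;

  constructor(dir: string) {
    mkdirSync(dir, { recursive: true });
    this.file = join(dir, "guidelines.json");
  }

  private load(): StoreDocument {
    if (!existsSync(this.file)) return { version: "0.11.0", guidelines: [] };
    return JSON.parse(readFileSync(this.file, "utf8")) as StoreDocument;
  }

  private save(doc: StoreDocument): void {
    const tmp = `${this.file}.tmp`; // write-then-rename: never leave a half-written file
    writeFileSync(tmp, JSON.stringify(doc, null, 2), "utf8");
    renameSync(tmp, this.file);
  }

  createGuideline(condition: string, action: string | null = null,
                  description: string | null = null, tags: string[] = []): Guideline {
    const doc = this.load();
    const guideline: Guideline = {
      id: randomUUID(), creationUtc: new Date().toISOString(),
      content: { condition, action, description },
      enabled: true, tags, labels: [], metadata: {},
      criticality: "medium", title: null, priority: 0, // defaults from the migration chain
    };
    doc.guidelines.push(guideline);
    this.save(doc);
    return guideline;
  }

  listGuidelines(): Guideline[] {
    return this.load().guidelines; // tag/label filtering omitted in this sketch
  }

  readGuideline(id: string): Guideline {
    const found = this.load().guidelines.find((g) => g.id === id);
    if (!found) throw new Error(`Guideline not found: ${id}`);
    return found;
  }

  updateGuideline(id: string, params: Partial<Omit<Guideline, "id" | "creationUtc">>): Guideline {
    const doc = this.load();
    const index = doc.guidelines.findIndex((g) => g.id === id);
    if (index < 0) throw new Error(`Guideline not found: ${id}`);
    const updated: Guideline = { ...(doc.guidelines[index] as Guideline), ...params };
    doc.guidelines[index] = updated;
    this.save(doc);
    return updated;
  }

  deleteGuideline(id: string): void {
    const doc = this.load();
    const remaining = doc.guidelines.filter((g) => g.id !== id);
    if (remaining.length === doc.guidelines.length) throw new Error(`Guideline not found: ${id}`);
    this.save({ ...doc, guidelines: remaining });
  }

  // Content lookup: exact match on condition, and on action when one is given.
  findGuideline(content: { condition: string; action?: string | null }): Guideline {
    const found = this.load().guidelines.find((g) =>
      g.content.condition === content.condition &&
      (content.action === undefined || g.content.action === content.action));
    if (!found) throw new Error(`No guideline matches condition: ${content.condition}`);
    return found;
  }
}
```

Every method reloads the file. This keeps the sketch simple and is tolerable at single-user scale, and the write-then-rename in `save` ensures a crash cannot leave a truncated `guidelines.json`.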
Version migration chain (port from Parlant v0.1.0 → v0.11.0):
- v0.1.0 → v0.2.0: add enabled field
- v0.2.0 → v0.3.0: remove guideline_set (migration script only)
- v0.3.0 → v0.4.0: add optional action, description, metadata
- v0.4.0 → v0.5.0: description as optional
- v0.5.0 → v0.6.0: add criticality (default "medium")
- v0.6.0 → v0.7.0: add composition_mode (optional)
- v0.7.0 → v0.8.0: add track (default true)
- v0.8.0 → v0.9.0: add labels (default empty)
- v0.9.0 → v0.10.0: add priority (default 0)
- v0.10.0 → v0.11.0: add title (default null)
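A sketch of how the chain could be applied step by step, assuming stored guidelines are raw snake_case JSON objects. Field names follow the list above. `migrate` and `MIGRATIONS` are hypothetical. The defaults for `action`/`description` (null) and `metadata` ({}) are assumptions, since the list only says these fields are optional.

```typescript
// Hypothetical stored shape: raw snake_case documents.
type RawGuideline = Record<string, unknown>;

interface Migration {
  from: string;
  to: string;
  apply: (doc: RawGuideline) => RawGuideline;
}

// One step per line of the chain. Spreading `doc` last keeps any value already present.
const MIGRATIONS: Migration[] = [
  { from: "0.1.0", to: "0.2.0", apply: (doc) => ({ enabled: true, ...doc }) },
  {
    from: "0.2.0", to: "0.3.0",
    apply: (doc) => {
      const copy = { ...doc };
      delete copy["guideline_set"]; // guideline_set is dropped in v0.3.0
      return copy;
    },
  },
  { from: "0.3.0", to: "0.4.0", apply: (doc) => ({ action: null, description: null, metadata: {}, ...doc }) },
  { from: "0.4.0", to: "0.5.0", apply: (doc) => ({ ...doc, description: doc["description"] ?? null }) },
  { from: "0.5.0", to: "0.6.0", apply: (doc) => ({ criticality: "medium", ...doc }) },
  { from: "0.6.0", to: "0.7.0", apply: (doc) => doc }, // composition_mode is optional: nothing to backfill
  { from: "0.7.0", to: "0.8.0", apply: (doc) => ({ track: true, ...doc }) },
  { from: "0.8.0", to: "0.9.0", apply: (doc) => ({ labels: [], ...doc }) },
  { from: "0.9.0", to: "0.10.0", apply: (doc) => ({ priority: 0, ...doc }) },
  { from: "0.10.0", to: "0.11.0", apply: (doc) => ({ title: null, ...doc }) },
];

export const LATEST_VERSION = "0.11.0";

// Applies steps in order until the document reaches the latest version.
export function migrate(doc: RawGuideline, version: string): RawGuideline {
  let current = doc;
  let v = version;
  while (v !== LATEST_VERSION) {
    const step = MIGRATIONS.find((m) => m.from === v);
    if (!step) throw new Error(`No migration path from version ${v}`);
    current = step.apply(current);
    v = step.to;
  }
  return current;
}
```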
## Tag & Label Filtering
- `listGuidelines({tags: ["tag1"]})` → guidelines with ANY of the specified tags
- `listGuidelines({labels: ["label1"]})` → guidelines with ALL specified labels (subset match)
- Combined: both filters apply (intersection)
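The filtering rules above can be written as a small generic function. `filterGuidelines` is a hypothetical name. Treating an empty `tags` array as "no tag filter" is an assumption, because the spec does not cover that case.

```typescript
interface Filterable {
  tags: string[];
  labels: string[];
}

// Tags: ANY of the requested tags must be present.
// Labels: ALL requested labels must be present (subset match).
// Both given: an item must pass both checks (intersection).
export function filterGuidelines<T extends Filterable>(
  items: T[],
  opts: { tags?: string[]; labels?: string[] } = {},
): T[] {
  return items.filter((g) => {
    const tagOk = !opts.tags || opts.tags.length === 0 || opts.tags.some((t) => g.tags.includes(t));
    const labelOk = !opts.labels || opts.labels.every((l) => g.labels.includes(l));
    return tagOk && labelOk;
  });
}
```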
## Journey → Guideline Projection
Port of Parlant's `JourneyGuidelineProjection.project_journey_to_guidelines()`:
- DFS traversal of Journey nodes from root
- Each (edge, node) pair → one Guideline
- Edge condition becomes guideline condition
- Node action becomes guideline action
- Edge and node metadata merged into guideline metadata under a `journey_node` key
- follow_ups list populated with downstream guideline IDs
- A visited set prevents infinite loops when the journey graph contains cycles
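The following sketch implements the projection as a pure function. It uses assumed shapes for journeys, nodes and edges; Parlant's own types differ. The function walks the graph depth-first with an explicit stack and emits one guideline per edge into a node. It fills `follow_ups` with the IDs of guidelines projected from the target node's outgoing edges. Deriving guideline IDs as `journeyId:edgeId` is a hypothetical choice that allows `follow_ups` to be computed without a second pass.

```typescript
// Assumed shapes for this sketch; Parlant's own Journey types differ.
interface JourneyNode { id: string; action: string | null; metadata: Record<string, unknown> }
interface JourneyEdge {
  id: string;
  source: string;
  target: string;
  condition: string | null;
  metadata: Record<string, unknown>;
}
interface Journey { id: string; rootId: string; nodes: JourneyNode[]; edges: JourneyEdge[] }
interface ProjectedGuideline {
  id: string;
  content: { condition: string; action: string | null; description: string | null };
  metadata: Record<string, unknown>;
}

export function projectJourneyToGuidelines(journey: Journey): ProjectedGuideline[] {
  const nodes = new Map(journey.nodes.map((n) => [n.id, n] as const));
  const outgoing = new Map<string, JourneyEdge[]>();
  for (const edge of journey.edges) {
    outgoing.set(edge.source, [...(outgoing.get(edge.source) ?? []), edge]);
  }
  // Deterministic IDs let follow_ups be filled in during the same pass.
  const guidelineId = (edge: JourneyEdge): string => `${journey.id}:${edge.id}`;

  const result: ProjectedGuideline[] = [];
  const visited = new Set<string>();
  const stack: string[] = [journey.rootId]; // explicit stack, so traversal is depth-first
  while (stack.length > 0) {
    const nodeId = stack.pop() as string;
    if (visited.has(nodeId)) continue; // each node is expanded once, so cycles terminate
    visited.add(nodeId);
    for (const edge of outgoing.get(nodeId) ?? []) {
      const target = nodes.get(edge.target);
      if (!target) continue; // skip edges that point at unknown nodes
      result.push({
        id: guidelineId(edge),
        // An unconditioned edge becomes an empty condition (assumption).
        content: { condition: edge.condition ?? "", action: target.action, description: null },
        metadata: {
          ...edge.metadata,
          ...target.metadata,
          journey_node: {
            journey_id: journey.id,
            node_id: target.id,
            edge_id: edge.id,
            follow_ups: (outgoing.get(target.id) ?? []).map(guidelineId),
          },
        },
      });
      stack.push(target.id);
    }
  }
  return result;
}
```

In this sketch the root node yields no guideline of its own; it only contributes through its outgoing edges.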
## Journey Backtrack Detection
```typescript
interface BacktrackCheck {
  journeyId: string;
  currentNodeId: string;
  previousNodeId: string;
  isBacktrack: boolean;
  recommendation: string | null;
}
```
Scans the edge list for source→target relationships. If the agent's current step has an edge back to a previously visited node (and that node is not in a forward path from current), it's flagged as a backtrack regression.
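The phrase "not in a forward path" admits more than one reading, and the sketch below adopts one of them. A move from `previousNodeId` to `currentNodeId` counts as a backtrack when the current node was already visited and its BFS distance from the root is no greater than that of the previous node. The function signature and the recommendation text are hypothetical.

```typescript
interface BacktrackCheck {
  journeyId: string;
  currentNodeId: string;
  previousNodeId: string;
  isBacktrack: boolean;
  recommendation: string | null;
}

interface JourneyEdge { source: string; target: string }

// Hop distance from the root to every reachable node (breadth-first).
function depthsFromRoot(rootId: string, edges: JourneyEdge[]): Map<string, number> {
  const depth = new Map<string, number>([[rootId, 0]]);
  const queue: string[] = [rootId];
  for (let head = 0; head < queue.length; head++) {
    const node = queue[head] as string;
    const d = depth.get(node) as number;
    for (const edge of edges) {
      if (edge.source === node && !depth.has(edge.target)) {
        depth.set(edge.target, d + 1);
        queue.push(edge.target);
      }
    }
  }
  return depth;
}

// Flags previous -> current as a backtrack when current was already visited
// and sits no deeper than previous, i.e. the step is not forward progress.
export function checkBacktrack(
  journeyId: string, rootId: string, edges: JourneyEdge[],
  visited: ReadonlySet<string>, previousNodeId: string, currentNodeId: string,
): BacktrackCheck {
  const depth = depthsFromRoot(rootId, edges);
  const prev = depth.get(previousNodeId);
  const cur = depth.get(currentNodeId);
  const isBacktrack = visited.has(currentNodeId) && prev !== undefined && cur !== undefined && cur <= prev;
  return {
    journeyId, currentNodeId, previousNodeId, isBacktrack,
    recommendation: isBacktrack
      ? `Step ${previousNodeId} -> ${currentNodeId} returns to an earlier node; confirm the regression is intended.`
      : null,
  };
}
```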

@@ -0,0 +1,88 @@
# Session Lifecycle Commands — Spec
## Overview
Four agent-invocable commands that manage audit session lifecycle. Each command is a skill markdown file loaded by the agent on invocation.
## /start
```
/start "task description"
```
Creates a named audit session:
1. Generate `session_id = adhoc_YYYYMMDD_HHMM`
2. `mkdir -p .boo/runs/{session_id}`
3. Write `session.json`:
```json
{
  "session_id": "adhoc_20260320_1400",
  "task": "task description",
  "start_time": "2026-03-20T14:00:00Z",
  "status": "in_progress",
  "expected_record_types": ["data", "change", "conversation"]
}
```
4. Write `.boo/runs/.current_session` containing session_id (handshake for hooks)
5. Run context recovery:
- L0: read `index.json` → last 5 entries
- L2: scan recent audit_trail.jsonl for `user_correction` records
6. Output recovery summary: recent activity, corrections, priorities
7. Check for unfinished sessions: scan for `status: "in_progress"` sessions, prompt user
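A minimal sketch of steps 1-4 using Node's `fs`. It assumes UTC timestamps and that `runsDir` already points at `.boo/runs`. Context recovery (step 5 onward) is omitted. `startSession` and `readCurrentSession` are illustrative names, not the API of `audit-session.ts`.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

interface SessionJson {
  session_id: string;
  task: string;
  start_time: string;
  status: "in_progress" | "completed";
  expected_record_types: string[];
}

// Hypothetical sketch of /start steps 1-4 only; recovery and the
// unfinished-session check are omitted.
export function startSession(runsDir: string, task: string, now: Date = new Date()): SessionJson {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const sessionId = // step 1
    `adhoc_${now.getUTCFullYear()}${pad(now.getUTCMonth() + 1)}${pad(now.getUTCDate())}` +
    `_${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}`;
  const sessionDir = join(runsDir, sessionId);
  mkdirSync(sessionDir, { recursive: true }); // step 2
  const session: SessionJson = {
    session_id: sessionId,
    task,
    start_time: now.toISOString().replace(/\.\d{3}Z$/, "Z"), // drop milliseconds
    status: "in_progress",
    expected_record_types: ["data", "change", "conversation"],
  };
  writeFileSync(join(sessionDir, "session.json"), JSON.stringify(session, null, 2) + "\n", "utf8"); // step 3
  writeFileSync(join(runsDir, ".current_session"), sessionId, "utf8"); // step 4: hook handshake
  return session;
}

// Reads the handshake file; a missing or empty file means no active session.
export function readCurrentSession(runsDir: string): string | null {
  const marker = join(runsDir, ".current_session");
  if (!existsSync(marker)) return null;
  const id = readFileSync(marker, "utf8").trim();
  return id.length > 0 ? id : null;
}
```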
## /end
```
/end
```
Ends the current audit session:
1. Read `.current_session` → get session_id
2. Collect remaining buffer data from `audit_buffer.jsonl` + `audit_pending.jsonl`
3. Append to `audit_trail.jsonl`
4. Clear buffer files
5. Extract `user_correction` records from audit_trail
6. Run integrity checks:
- Has records? (>0 audit_trail lines)
- All files covered? (changes in audit_trail match modified files)
- Corrections persisted? (persisted_to is non-empty)
7. Generate `session_summary.md`
8. Update `session.json`: set `status` to `completed` and record `end_time`
9. Clear `.current_session`
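A sketch of the flush and close-out path. It covers steps 1-4, the first integrity check from step 6, and steps 8-9, and it assumes both buffer files sit directly under `.boo/runs/`. Correction extraction, the remaining integrity checks and `session_summary.md` are omitted. `endSession` is a hypothetical name.

```typescript
import { appendFileSync, existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

function readLines(file: string): string[] {
  if (!existsSync(file)) return [];
  return readFileSync(file, "utf8").split("\n").filter((line) => line.trim() !== "");
}

// Hypothetical sketch of /end steps 1-4, the first step-6 check, and steps 8-9.
// Assumes the shared buffers live directly under runsDir.
export function endSession(
  runsDir: string, now: Date = new Date(),
): { sessionId: string; flushed: number; hasRecords: boolean } {
  // Step 1: resolve the active session from the handshake file.
  const marker = join(runsDir, ".current_session");
  const sessionId = existsSync(marker) ? readFileSync(marker, "utf8").trim() : "";
  if (sessionId === "") throw new Error("No active session: .current_session is missing or empty");

  const sessionDir = join(runsDir, sessionId);
  mkdirSync(sessionDir, { recursive: true }); // normally created by /start; recreated defensively
  const trail = join(sessionDir, "audit_trail.jsonl");

  // Steps 2-4: move buffered lines into the session trail, then truncate the buffers.
  const buffers = [join(runsDir, "audit_buffer.jsonl"), join(runsDir, "audit_pending.jsonl")];
  const lines = buffers.flatMap(readLines);
  if (lines.length > 0) appendFileSync(trail, lines.join("\n") + "\n", "utf8");
  for (const buffer of buffers) if (existsSync(buffer)) writeFileSync(buffer, "", "utf8");

  // Step 6, first check only: the session must have produced at least one record.
  const hasRecords = readLines(trail).length > 0;

  // Step 8: mark the session completed.
  const sessionFile = join(sessionDir, "session.json");
  const session = JSON.parse(readFileSync(sessionFile, "utf8")) as Record<string, unknown>;
  session["status"] = "completed";
  session["end_time"] = now.toISOString();
  writeFileSync(sessionFile, JSON.stringify(session, null, 2) + "\n", "utf8");

  // Step 9: truncate the handshake so hooks stop attributing records to this session.
  writeFileSync(marker, "", "utf8");
  return { sessionId, flushed: lines.length, hasRecords };
}
```

Because the buffers are truncated after the append, a crash between the two steps would re-flush the same lines on the next `/end`. Deduplicating on a record ID would close that gap if it matters.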
## /recover
```
/recover # L0+L1+L2
/recover full # L3 (full audit_trail)
/recover {session_id} # load specific session
```
Graded context loading:
- L0 (~200 tokens): index.json → last 5 entries (id, task, status)
- L1 (~500 tokens): .current_session + session.json + last 3 audit_trail entries
- L2 (~1000 tokens): scan all audit_trails for user_correction records + conclusions + daily report §4+§6
- L3 (~3000 tokens): full audit_trail.jsonl + audit_pending.jsonl
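A sketch of the L0 level. It assumes each `index.json` entry carries `session_id`, `task` and `status`; the full schema is not given here. `summarizeL0` and `withinBudget` are hypothetical. The estimate of 4 characters per token is a rough rule of thumb, not a real tokenizer.

```typescript
// Assumed index entry shape; the real index.json schema may carry more fields.
interface IndexEntry {
  session_id: string;
  task: string;
  status: string;
}

// L0 sketch: the most recent `limit` sessions, one short line each.
// adhoc_YYYYMMDD_HHMM IDs sort chronologically as plain strings.
export function summarizeL0(entries: IndexEntry[], limit = 5): string {
  return [...entries]
    .sort((a, b) => (a.session_id < b.session_id ? -1 : a.session_id > b.session_id ? 1 : 0))
    .slice(-limit)
    .map((e) => `${e.session_id} [${e.status}] ${e.task}`)
    .join("\n");
}

// Rough budget check using ~4 characters per token, a heuristic rather than a tokenizer.
export function withinBudget(text: string, maxTokens: number): boolean {
  return Math.ceil(text.length / 4) <= maxTokens;
}
```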
## /report-daily
```
/report-daily # today
/report-daily 20260319 # specific date
/report-daily review # + morning self-review
```
7-section report:
1. Task overview (from index.json)
2. Operation stats (tool counts)
3. Change records (file modifications)
4. User feedback & corrections
5. Anomaly alerts
6. Backlog tracking
7. Integrity summary
`review` variant: adds morning self-review with trend analysis and recommended priorities.

@@ -0,0 +1,42 @@
# User Correction Tracking — Spec
## Record Schema
```typescript
interface UserCorrectionRecord {
  record_type: "conversation";
  action_type: "user_correction";
  priority: "critical_for_recovery";
  timestamp: string;            // ISO 8601
  original_claim: string;       // what the agent said that was wrong
  correction: string;           // what the user corrected it to
  principle_extracted: string;  // general principle derived from this correction
  persisted_to: string[];       // files where this correction was documented
}
```
## Storage
User correction records are stored inline in `audit_trail.jsonl` as regular entries. They are extracted during `/end` and surfaced during `/recover` L2 loading.
## Detection
During `/end`, scan the session's `audit_trail.jsonl` for entries matching:
- `action_type === "user_correction"`
Also scan `audit_pending.jsonl` for any pending correction records not yet flushed.
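A sketch of the detection scan over the raw JSONL text of both files, assuming records follow the schema above. `extractCorrections` is a hypothetical name. It checks only `action_type` and does not validate the remaining fields.

```typescript
interface UserCorrectionRecord {
  record_type: "conversation";
  action_type: "user_correction";
  priority: "critical_for_recovery";
  timestamp: string;
  original_claim: string;
  correction: string;
  principle_extracted: string;
  persisted_to: string[];
}

// Scans the raw text of audit_trail.jsonl and audit_pending.jsonl (in that
// order) and keeps entries whose action_type is "user_correction".
// Lines that fail to parse are skipped instead of aborting the scan.
export function extractCorrections(...jsonlTexts: string[]): UserCorrectionRecord[] {
  const found: UserCorrectionRecord[] = [];
  for (const text of jsonlTexts) {
    for (const line of text.split("\n")) {
      if (line.trim() === "") continue;
      let entry: unknown;
      try {
        entry = JSON.parse(line);
      } catch {
        continue;
      }
      if (typeof entry === "object" && entry !== null &&
          (entry as { action_type?: unknown }).action_type === "user_correction") {
        found.push(entry as UserCorrectionRecord);
      }
    }
  }
  return found;
}
```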
## persisted_to Field
When a correction is written to CLAUDE.md, coding standards, or other documentation, the file paths are recorded in `persisted_to[]`. This is populated manually by the agent when it persists the correction.
## Correction-as-Precedent
When an agent considers an action that contradicts a known `user_correction` record, it is flagged with a warning. The agent should:
1. Identify the contradiction (which rule is being violated)
2. Surface the relevant correction record (with timestamp and original context)
3. Propose an alternative that respects the correction
4. If the contradiction is intentional, document why as a new correction
Detection logic: before each significant action, the agent scans loaded user_correction records from the current recovery context and checks if the proposed action matches any known `original_claim` pattern.
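A deliberately naive sketch of the contradiction check. It matches each `original_claim` against the proposed action as a case- and whitespace-insensitive substring. `findContradictions` is hypothetical, and plain substring matching will miss paraphrased claims; it stands in for whatever matching the agent actually applies.

```typescript
interface CorrectionLike {
  timestamp: string;
  original_claim: string;
  correction: string;
  principle_extracted: string;
}

interface ContradictionWarning {
  record: CorrectionLike;
  message: string;
}

// Lowercase and collapse whitespace so formatting differences do not hide a match.
const normalize = (s: string): string => s.toLowerCase().replace(/\s+/g, " ").trim();

// Flags every correction whose original_claim appears verbatim (after
// normalization) in the proposed action text.
export function findContradictions(proposedAction: string, corrections: CorrectionLike[]): ContradictionWarning[] {
  const action = normalize(proposedAction);
  return corrections
    .filter((c) => {
      const claim = normalize(c.original_claim);
      return claim.length > 0 && action.includes(claim);
    })
    .map((c) => ({
      record: c,
      message: `Contradicts correction from ${c.timestamp}: "${c.correction}" (principle: ${c.principle_extracted})`,
    }));
}
```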

@@ -0,0 +1,39 @@
# port-audit-parlant-patterns — Implementation Complete
## boocontext (TypeScript) — src/audit/
- [x] 1. Data Dir: `dotDir()`, `findRunsDir()`, `ensureRunsDir()` with .gitignore + AUDIT_DOT_DIR
- [x] 2. Core Types: `RecordEntry`, `CompactRecord`, `Manifest`, `UserCorrectionRecord`, `SessionJson`, `SessionSummary`
- [x] 3. Hash Utilities: `hashFile()`, `hashBytes()`, `hashDir()` via Node crypto SHA256
- [x] 4. Anomaly: `AlertRule`, `Anomaly`, `checkAnomalies()` with default rules
- [x] 5. AuditContext: `createBatchContext()` -> `record()` -> `recordCompact()` -> `finalize()` -> `save()` (writes manifest, trail, compact, anomalies, checksums, index)
- [x] 6. AmbientContext: `AsyncLocalStorage` wrapper — `runWithAmbient()`, `getAmbientSession()`, `requireAmbientSession()`
- [x] 7. Guideline Model: `GuidelineContent`, `Guideline`, `GuidelineStore`, `InMemoryGuidelineStore` with CRUD + tag/label filters
- [x] 8. Guideline Matching: `MatchingContext`, `MatchingBatch` (Observational, Actionable, PreviouslyApplied, Disambiguation, ResponseAnalysis, LowCriticality), `GenericGuidelineMatchingStrategy`, retry policy
- [x] 9. ARQ Generation: `SchematicGenerator`, typed output schemas per batch, `GenerationInfo` tracking, `createExecutionPlan()` with batch-parallel
- [x] 10. Relationship Model: `RelationshipKind` (DEPENDS_ON, PRIORITIZES, ENTAILS, TAG_ALL, TAG_PRIORITIZES), `FileRelationshipStore`
- [x] 11. Relational Resolver: 4-step iteration loop (deps -> prioritization -> priority -> entailment), `MAX_ITERATIONS=100`, `ResolutionKind` output
- [x] 12. Graded Recovery: `recoverL0()` through `recoverL4()`, `scanUserCorrections()`, `formatRecoveryReport()` with source attribution
- [x] 13. User Corrections: `detectCorrections()`, `addPersistedTarget()`, `findRelatedCorrections()`, `checkContradiction()`
- [x] 14. Index: `readIndex()`, `writeIndex()` with atomic `.tmp` + `renameSync`
- [x] 15. MCP Tools: `boocontext_audit_index` + `boocontext_audit_recover` registered in mcp-server.ts
- [x] 16. Typecheck: `npx tsc --noEmit` passes clean
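A TypeScript sketch of the atomic index write from task 14, using Node's `fs`, plus an idempotent upsert like the one described on the Go side. The array-of-entries shape for `index.json` and the function signatures are assumptions.

```typescript
import { existsSync, readFileSync, renameSync, writeFileSync } from "node:fs";

// Assumed entry shape: keyed by session_id, other fields free-form.
interface IndexEntry {
  session_id: string;
  [field: string]: unknown;
}

export function readIndex(file: string): IndexEntry[] {
  if (!existsSync(file)) return [];
  return JSON.parse(readFileSync(file, "utf8")) as IndexEntry[];
}

// Write to a sibling .tmp file, then rename over the target. On POSIX
// filesystems the rename replaces the file in a single step, so readers see
// either the old index or the new one, never a partial write.
export function writeIndex(file: string, entries: IndexEntry[]): void {
  const tmp = `${file}.tmp`;
  writeFileSync(tmp, JSON.stringify(entries, null, 2) + "\n", "utf8");
  renameSync(tmp, file);
}

// Idempotent upsert: re-applying the same entry leaves the index unchanged.
export function upsertEntry(entries: IndexEntry[], entry: IndexEntry): IndexEntry[] {
  const i = entries.findIndex((e) => e.session_id === entry.session_id);
  if (i < 0) return [...entries, entry];
  const next = [...entries];
  next[i] = { ...entries[i], ...entry };
  return next;
}
```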
## codecontext (Go) — internal/audit/ + internal/mcp/
- [x] 1. Record Types: `RecordEntry`, `CompactRecord`, `RecordStep`/`RecordAction` enums (pre-existing)
- [x] 2. Index: `UpdateIndexEntry()` with idempotent upsert, `IndexEntry` schema, atomic `.tmp` + `os.Rename()` (pre-existing)
- [x] 3. Hashchain: `HashFile()`, `HashBytes()`, `HashDir()`, `VerifyHashchain()` with `HashchainVerificationError` (pre-existing)
- [x] 4. Directory: `DotDir()`, `RunsDir()`, `EnsureRunsDir()` with .gitignore + `AUDIT_DOT_DIR` (pre-existing)
- [x] 5. Anomaly: `AlertRule`, `Anomaly`, `Manifest` types + `CheckAnomalies()` with condition evaluation (pre-existing stub, now evaluates total_records/error_rate/hash conditions)
- [x] 6. GenerateChecksums: per-file SHA256 manifest (pre-existing)
- [x] 7. Session Lifecycle: `SessionLifecycleManager` with `StartSession(task)`, `EndSession()`, `CurrentSession()` — creates adhoc session, writes .current_session, updates index
- [x] 8. Trail Management: `TrailManager` with `AppendToBuffer()`, `PendingAppend()`, `AppendToTrail()`, `ReadTrail()`, `FlushBuffer()` — auto-generates session if none active
- [x] 9. MCP Audit Tools: `codecontext_audit_start`, `codecontext_audit_end`, `codecontext_audit_status` in `internal/mcp/audit_tools.go`
- [x] 10. MCP Middleware Hooks: `recordAuditBuffer()` in server struct, buffer after tool calls, flush on "ready"
- [x] 11. Build: `go build ./...` passes clean
## boocode (Node.js) — apps/coder/src/services/
- [x] 1. Session Service (`audit-session.ts`): `startSession()` with L0+L2 recovery, `endSession()` with integrity checks + session_summary.md, `recoverSession()` L0-L3 graded loading, `generateDailyReport()` 7-section report
- [x] 2. Correction Service (`correction-service.ts`): `recordCorrection()`, `scanForCorrections()`, `checkContradiction()`, `markPersisted()` — JSON store at `.boo/corrections/`
- [x] 3. Guideline Service (`guideline-service.ts`): `createGuideline()`, `listGuidelines()` with tag/label filters, version migration chain (v0.1.0->v0.11.0), `projectJourneyToGuidelines()` DFS, `checkBacktrack()` — JSON store at `.boo/guidelines/`
- [x] 4. Skill commands: `command-start/SKILL.md`, `command-end/SKILL.md`, `command-recover/SKILL.md`, `command-report-daily/SKILL.md`
- [x] 5. Typecheck: `pnpm -C apps/coder typecheck` passes clean