chore(openspec): drop 9 superseded proposals + 11 stub archive files

Drop 9 batch proposals that are superseded by the boocode-lift-analysis (boocontext-audit, conductor upgrades, self-healing/verify-gate skills): add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform, conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul, agent-reliability. Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only) that provide zero documentation value over the existing CHANGELOG.md + git tags.
2026-06-07 22:15:38 +00:00
parent 0d6e9a2413
commit c935687725
119 changed files with 4897 additions and 45 deletions
--- a/openspec/changes/add-behavioral-engine/.openspec.yaml
+++ b/openspec/changes/add-behavioral-engine/.openspec.yaml
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-06-07
--- a/openspec/changes/add-behavioral-engine/design.md
+++ b/openspec/changes/add-behavioral-engine/design.md
@@ -0,0 +1,32 @@
+## Context
+
+BooCode has no structured behavioral enforcement. Agent behavior is guided by system prompts and CLAUDE.md — advisory, not enforceable. The `boocontext-audit` package (already TypeScript, already in /opt/forks) provides a complete behavioral compliance engine: Guideline model, 6-batch matcher, relational resolver, audit trail, and graded recovery.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Import boocontext-audit's Guideline model (condition/action rules with criticality)
+- Import multi-batch matcher (Observational, Actionable, PreviouslyApplied, Disambiguation, ResponseAnalysis, LowCriticality)
+- Import RelationalResolver (DEPENDS_ON, PRIORITIZES, ENTAILS, TAG_ALL, TAG_PRIORITIZES)
+- Import audit middleware (PostToolUse, Stop, UserPromptSubmit hooks)
+- Import graded context recovery (L0-L4)
+- Wire guideline evaluation into agent's inference loop
+
+**Non-Goals:**
+- Journey DAG integration (future scope)
+- MCP middleware integration (focus on in-process hooks)
+
+## Decisions
+
+- **Direct import from local fork**: boocontext-audit is at `/opt/forks/boocontext-audit/`. Use workspace dependency or npm link.
+- **Guideline storage**: InMemoryGuidelineStore for development, FileRelationshipStore for production.
+- **Batch execution**: Run observational + actionable batches in parallel, then disambiguation, then response analysis.
+- **SchematicGenerator**: Abstract LLM caller. Configure the model per batch (a cheap model for matching, a more expensive one for disambiguation).
+- **Audit hooks**: Wire PostToolUse → appendToBuffer(), Stop → flushBuffer(), UserPromptSubmit → injectSessionContext().
+- **Recovery**: Load L0 (index) by default. L2 (user corrections) on /recover. L3 (full) on /recover full.
+
+## Risks / Trade-offs
+
+- **LLM overhead**: Each batch is an LLM call. 6 batches × N guidelines could be expensive. Mitigation: batch size limits, parallel execution.
+- **Cold start**: No guidelines exist initially. Users must define them. Ship with 5-10 built-in safety guidelines.
+- **boocontext-audit maturity**: v0.1.0. Review code quality before direct import.
--- a/openspec/changes/add-behavioral-engine/proposal.md
+++ b/openspec/changes/add-behavioral-engine/proposal.md
@@ -0,0 +1,22 @@
+## Why
+
+BooCode has no structured way to enforce agent behavior rules. The `boocontext-audit` package (already TypeScript, zero external deps) provides a complete behavioral compliance engine ported from Parlant: Guideline condition/action model, multi-batch LLM matcher, relational resolver, audit middleware, and graded context recovery. Adding this gives BooCode structured rule enforcement far beyond simple CLAUDE.md guidelines.
+
+## What Changes
+
+- Import boocontext-audit as a dependency in apps/coder/
+- Add Guideline model: natural language condition/action rules with criticality
+- Add multi-batch matcher: observational, actionable, previously-applied, disambiguation, response-analysis, and low-criticality batches
+- Add RelationalResolver: DEPENDS_ON, PRIORITIZES, ENTAILS, TAG_ALL, TAG_PRIORITIZES relationship resolution
+- Add audit middleware: PostToolUse/Stop/UserPromptSubmit hooks with JSONL buffer
+- Add graded context recovery: L0-L4 recovery levels
+- Wire guideline evaluation into agent's inference loop
+
+## Capabilities
+
+### New Capabilities
+- `guideline-model`: Natural language condition/action rules with criticality and priority
+- `multi-batch-matcher`: 6-batch LLM evaluation for context-relevant rule matching
+- `relational-resolver`: Dependency/priority/entailment resolution with iterative convergence
+- `audit-middleware`: PostToolUse/Stop/UserPromptSubmit hooks with JSONL trail
+- `graded-recovery`: L0-L4 context recovery for session continuity
--- a/openspec/changes/add-behavioral-engine/specs/audit-middleware/spec.md
+++ b/openspec/changes/add-behavioral-engine/specs/audit-middleware/spec.md
@@ -0,0 +1,21 @@
+## ADDED Requirements
+
+### Requirement: PostToolUse audit logging
+- **WHEN** a tool is used
+- **THEN** the tool name, input summary, and timestamp are appended to the JSONL audit buffer
+
+### Requirement: Stop hook flush
+- **WHEN** a response completes
+- **THEN** the audit buffer is flushed to the session audit trail and the index is updated
+
+### Requirement: UserPromptSubmit context injection
+- **WHEN** a user message is submitted
+- **THEN** session context (session ID, record count, critical alerts) is injected into the prompt
+
+### Requirement: Anomaly detection
+- **WHEN** audit records are checked against alert rules
+- **THEN** anomalies at CRITICAL level are injected into the context
+
+#### Scenario: Full audit trail
+- **WHEN** an agent runs 10 tool calls across 3 turns
+- **THEN** the audit trail contains 10 JSONL records, a session summary, and an updated index
--- a/openspec/changes/add-behavioral-engine/specs/graded-recovery/spec.md
+++ b/openspec/changes/add-behavioral-engine/specs/graded-recovery/spec.md
@@ -0,0 +1,25 @@
+## ADDED Requirements
+
+### Requirement: L0 recovery (index summary)
+- **WHEN** /recover is called without arguments
+- **THEN** the last 5 index entries are loaded (~200 tokens)
+
+### Requirement: L1 recovery (session state)
+- **WHEN** /recover L1 is called
+- **THEN** current session.json + last 3 audit trail entries are loaded (~500 tokens)
+
+### Requirement: L2 recovery (user corrections)
+- **WHEN** /recover L2 is called
+- **THEN** ALL user_correction records across all sessions are loaded (~1000 tokens)
+
+### Requirement: L3 recovery (full context)
+- **WHEN** /recover L3 is called
+- **THEN** full audit trail + all pending records are loaded (~3000 tokens)
+
+### Requirement: Priority loading
+- **WHEN** recovering context
+- **THEN** user_correction records are loaded first (highest priority)
+
+#### Scenario: Session crash recovery
+- **WHEN** an agent session crashes and restarts with /recover
+- **THEN** the agent gets the index summary, last session state, and all user corrections
--- a/openspec/changes/add-behavioral-engine/specs/guideline-model/spec.md
+++ b/openspec/changes/add-behavioral-engine/specs/guideline-model/spec.md
@@ -0,0 +1,17 @@
+## ADDED Requirements
+
+### Requirement: Guideline creation
+- **WHEN** creating a guideline with condition, action, and criticality
+- **THEN** it is stored with unique ID and metadata
+
+### Requirement: Guideline evaluation
+- **WHEN** an agent action triggers guideline evaluation
+- **THEN** matching guidelines are activated with score and rationale
+
+### Requirement: Criticality levels
+- **WHEN** evaluating guidelines
+- **THEN** guidelines are filtered by criticality (low/medium/high/critical), with higher criticality taking precedence
+
+#### Scenario: Security policy enforcement
+- **WHEN** an agent attempts to edit a file matching a security guideline condition
+- **THEN** the guideline matcher returns the relevant rule with CRITICAL severity
--- a/openspec/changes/add-behavioral-engine/specs/multi-batch-matcher/spec.md
+++ b/openspec/changes/add-behavioral-engine/specs/multi-batch-matcher/spec.md
@@ -0,0 +1,17 @@
+## ADDED Requirements
+
+### Requirement: Six batch types
+- **WHEN** guidelines are evaluated
+- **THEN** they are processed through: Observational, Actionable, PreviouslyApplied, Disambiguation, ResponseAnalysis, and LowCriticality batches
+
+### Requirement: Parallel batch execution
+- **WHEN** independent batches are ready
+- **THEN** they execute in parallel (observational + actionable run concurrently)
+
+### Requirement: Structured LLM output per batch
+- **WHEN** a batch calls the LLM
+- **THEN** it uses a structured schema specific to the batch type (e.g., applies: boolean for actionable, was_followed: boolean for response analysis)
+
+#### Scenario: Multi-rule evaluation
+- **WHEN** an agent action matches 3 guidelines across different criticalities
+- **THEN** the matcher returns all applicable matches with scores, with CRITICAL matches flagged
--- a/openspec/changes/add-behavioral-engine/specs/relational-resolver/spec.md
+++ b/openspec/changes/add-behavioral-engine/specs/relational-resolver/spec.md
@@ -0,0 +1,21 @@
+## ADDED Requirements
+
+### Requirement: DEPENDS_ON resolution
+- **WHEN** guideline A depends on guideline B
+- **THEN** A stays activated only if B is also activated; otherwise A is deactivated
+
+### Requirement: PRIORITIZES resolution
+- **WHEN** guideline A is prioritized over guideline B
+- **THEN** B is filtered out if both match
+
+### Requirement: ENTAILS resolution
+- **WHEN** guideline A entails guideline B
+- **THEN** B is automatically activated when A is activated
+
+### Requirement: Iterative convergence
+- **WHEN** resolving relationships
+- **THEN** the resolver iterates until a pass produces no further changes, stopping after at most 100 iterations
+
+#### Scenario: Conflicting guideline resolution
+- **WHEN** a HIGH priority guideline matches and a LOW priority guideline also matches
+- **THEN** the LOW priority guideline is filtered out via numerical priority resolution
--- a/openspec/changes/add-behavioral-engine/tasks.md
+++ b/openspec/changes/add-behavioral-engine/tasks.md
@@ -0,0 +1,56 @@
+## 1. Import boocontext-audit as dependency
+
+- [ ] 1.1 Add boocontext-audit as workspace dependency
+- [ ] 1.2 Verify Guideline, GuidelineStore, SchematicGenerator exports
+
+## 2. Implement Guideline model
+
+- [ ] 2.1 Create GuidelineManager wrapping GuidelineStore
+- [ ] 2.2 Add CRUD operations for guidelines (create, read, update, delete, list)
+- [ ] 2.3 Add InMemoryGuidelineStore and FileRelationshipStore backends
+- [ ] 2.4 Add criticality filtering and priority sorting
+
+## 3. Implement multi-batch matcher
+
+- [ ] 3.1 Create MatcherService wrapping GenericGuidelineMatchingStrategy
+- [ ] 3.2 Add Observational, Actionable, PreviouslyApplied, Disambiguation, ResponseAnalysis, LowCriticality batch types
+- [ ] 3.3 Add parallel batch execution for independent batches
+- [ ] 3.4 Add SchematicGenerator abstraction for LLM batch calls
+
+## 4. Implement RelationalResolver
+
+- [ ] 4.1 Create ResolverService wrapping RelationalResolver
+- [ ] 4.2 Implement DEPENDS_ON, PRIORITIZES, ENTAILS, TAG_ALL, TAG_PRIORITIZES resolution
+- [ ] 4.3 Add iterative convergence loop (max 100 iterations)
+- [ ] 4.4 Add resolution logging
+
+## 5. Implement audit middleware
+
+- [ ] 5.1 Create AuditService with PostToolUse middleware (JSONL buffer append)
+- [ ] 5.2 Add Stop middleware (buffer flush to session trail)
+- [ ] 5.3 Add UserPromptSubmit middleware (session context injection + CRITICAL alerts)
+- [ ] 5.4 Wire audit middleware into agent's inference lifecycle
+
+## 6. Implement graded context recovery
+
+- [ ] 6.1 Create RecoveryService with L0-L4 recovery methods
+- [ ] 6.2 Implement L0: read last 5 index entries
+- [ ] 6.3 Implement L1: session.json + last 3 audit trail entries
+- [ ] 6.4 Implement L2: all user_correction records
+- [ ] 6.5 Implement L3: full audit trail
+- [ ] 6.6 Add priority loading (user corrections first)
+
+## 7. Wire into agent inference loop
+
+- [ ] 7.1 Run guideline evaluation before each agent turn
+- [ ] 7.2 Inject active guidelines into system prompt
+- [ ] 7.3 Record guideline matches in turn metadata
+- [ ] 7.4 Add guideline management commands (add-guideline, list-guidelines, remove-guideline)
+
+## 8. Test and verify
+
+- [ ] 8.1 Test guideline creation and storage
+- [ ] 8.2 Test multi-batch matching with sample guidelines
+- [ ] 8.3 Test relational resolution with dependencies
+- [ ] 8.4 Test audit middleware tool logging
+- [ ] 8.5 Test graded recovery at all levels
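
Reviewer note on task 4.3 and the relational-resolver spec: the iterative convergence loop might look roughly like the sketch below. This is only an illustration under stated assumptions, not boocontext-audit's actual API. The names `Relation`, `Resolution`, and `resolve` are hypothetical, only ENTAILS and PRIORITIZES are modelled, and DEPENDS_ON plus the tag-based relations are left out. Each pass first pulls in entailed guidelines, then drops the lower-priority side of any PRIORITIZES pair, and the loop stops when a pass changes nothing or the iteration cap is hit.

```typescript
// Hypothetical types for illustration; not taken from boocontext-audit.
type RelationKind = "ENTAILS" | "PRIORITIZES";

interface Relation {
  kind: RelationKind;
  source: string; // guideline ID
  target: string; // guideline ID
}

interface Resolution {
  active: string[];    // resolved guideline IDs, sorted
  iterations: number;  // passes executed
  converged: boolean;  // false if the cap was hit first
}

function resolve(
  matched: string[],
  relations: Relation[],
  maxIterations = 100,
): Resolution {
  let active = new Set<string>(matched);

  for (let i = 1; i <= maxIterations; i++) {
    const next = new Set<string>(active);

    // ENTAILS: an active source pulls its target in.
    for (const r of relations) {
      if (r.kind === "ENTAILS" && next.has(r.source)) {
        next.add(r.target);
      }
    }

    // PRIORITIZES: when both ends are active, the target is filtered out.
    // Collect first, then delete, so one pass sees a consistent snapshot.
    const suppressed = relations
      .filter((r) => r.kind === "PRIORITIZES" && next.has(r.source) && next.has(r.target))
      .map((r) => r.target);
    suppressed.forEach((id) => next.delete(id));

    const changed =
      next.size !== active.size ||
      Array.from(next).some((id) => !active.has(id));
    active = next;

    if (!changed) {
      return { active: Array.from(active).sort(), iterations: i, converged: true };
    }
  }

  return { active: Array.from(active).sort(), iterations: maxIterations, converged: false };
}

// Example: "a" entails "b", "b" entails "d", and "a" takes priority over "c".
// The ENTAILS relations are listed out of order so more than one pass is needed.
const result = resolve(["a", "c"], [
  { kind: "ENTAILS", source: "b", target: "d" },
  { kind: "ENTAILS", source: "a", target: "b" },
  { kind: "PRIORITIZES", source: "a", target: "c" },
]);
console.log(result);
```

Returning `converged` alongside the result is one way to surface the case where the 100-iteration cap is reached, which the resolution logging in task 4.4 could then record.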