chore(openspec): drop 9 superseded proposals + 11 stub archive files
Drop 9 batch proposals that are superseded by the boocode-lift-analysis (boocontext-audit, conductor upgrades, self-healing/verify-gate skills): add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform, conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul, agent-reliability. Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only) that provide zero documentation value over the existing CHANGELOG.md + git tags.
2
openspec/changes/add-behavioral-engine/.openspec.yaml
Normal file
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-06-07
32
openspec/changes/add-behavioral-engine/design.md
Normal file
@@ -0,0 +1,32 @@
## Context

BooCode has no structured behavioral enforcement. Agent behavior is guided by system prompts and CLAUDE.md, which are advisory rather than enforceable. The `boocontext-audit` package (already TypeScript, already in /opt/forks) provides a complete behavioral compliance engine: Guideline model, 6-batch matcher, relational resolver, audit trail, and graded recovery.

## Goals / Non-Goals

**Goals:**
- Import boocontext-audit's Guideline model (condition/action rules with criticality)
- Import multi-batch matcher (Observational, Actionable, PreviouslyApplied, Disambiguation, ResponseAnalysis, LowCriticality)
- Import RelationalResolver (DEPENDS_ON, PRIORITIZES, ENTAILS, TAG_ALL, TAG_PRIORITIZES)
- Import audit middleware (PostToolUse, Stop, UserPromptSubmit hooks)
- Import graded context recovery (L0-L4)
- Wire guideline evaluation into agent's inference loop

**Non-Goals:**
- Journey DAG integration (future scope)
- MCP middleware integration (focus on in-process hooks)

## Decisions

- **Direct import from local fork**: boocontext-audit is at `/opt/forks/boocontext-audit/`. Use workspace dependency or npm link.
- **Guideline storage**: InMemoryGuidelineStore for development, FileRelationshipStore for production.
- **Batch execution**: Run observational + actionable batches in parallel, then disambiguation, then response analysis.
- **SchematicGenerator**: Abstract LLM caller. Configure per-batch model (use cheap model for matching, expensive for disambiguation).
- **Audit hooks**: Wire PostToolUse → appendToBuffer(), Stop → flushBuffer(), UserPromptSubmit → injectSessionContext().
- **Recovery**: Load L0 (index) by default. L2 (user corrections) on /recover. L3 (full) on /recover full.
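
The batch ordering and per-batch model choice in the Decisions above could be captured as plain data. This is a minimal sketch, not boocontext-audit's API: `STAGES`, `modelFor`, `describePlan`, and the `cheap`/`expensive` tiers are hypothetical names. It assumes response analysis runs on the cheap tier, since the decision only names matching and disambiguation, and it leaves PreviouslyApplied and LowCriticality out of the stage plan because the decision does not place them.

```typescript
// The six batch types named in the Goals list.
type BatchType =
  | "observational"
  | "actionable"
  | "previouslyApplied"
  | "disambiguation"
  | "responseAnalysis"
  | "lowCriticality";

// Hypothetical model tiers; real model identifiers would come from configuration.
type ModelTier = "cheap" | "expensive";

interface PlannedBatch {
  stage: number;
  batch: BatchType;
  model: ModelTier;
}

// Stages run in order; batches inside one stage are independent and may run in parallel.
const STAGES: BatchType[][] = [
  ["observational", "actionable"],
  ["disambiguation"],
  ["responseAnalysis"],
];

// Assumption: only disambiguation needs the expensive tier.
function modelFor(batch: BatchType): ModelTier {
  return batch === "disambiguation" ? "expensive" : "cheap";
}

// Flatten the stage plan into rows that a dispatcher or logger could consume.
function describePlan(): PlannedBatch[] {
  const plan: PlannedBatch[] = [];
  STAGES.forEach((batches, stage) => {
    for (const batch of batches) {
      plan.push({ stage, batch, model: modelFor(batch) });
    }
  });
  return plan;
}

const executionPlan = describePlan();
```

Keeping the plan as data means a dispatcher can run every batch that shares a `stage` value concurrently without hard-coding the order.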

## Risks / Trade-offs

- **LLM overhead**: Each batch is an LLM call. 6 batches × N guidelines could be expensive. Mitigation: batch size limits, parallel execution.
- **Cold start**: No guidelines exist initially. Users must define them. Ship with 5-10 built-in safety guidelines.
- **boocontext-audit maturity**: v0.1.0. Review code quality before direct import.
22
openspec/changes/add-behavioral-engine/proposal.md
Normal file
@@ -0,0 +1,22 @@
## Why

BooCode has no structured way to enforce agent behavior rules. The `boocontext-audit` package (already TypeScript, zero external deps) provides a complete behavioral compliance engine ported from Parlant: Guideline condition/action model, multi-batch LLM matcher, relational resolver, audit middleware, and graded context recovery. Adding this gives BooCode structured rule enforcement far beyond simple CLAUDE.md guidelines.

## What Changes

- Import boocontext-audit as a dependency in apps/coder/
- Add Guideline model: natural language condition/action rules with criticality
- Add multi-batch matcher: observational, actionable, previously-applied, disambiguation, response analysis, and low-criticality batches
- Add RelationalResolver: DEPENDS_ON, PRIORITIZES, ENTAILS, TAG_ALL, TAG_PRIORITIZES relationship resolution
- Add audit middleware: PostToolUse/Stop/UserPromptSubmit hooks with JSONL buffer
- Add graded context recovery: L0-L4 recovery levels
- Wire guideline evaluation into agent's inference loop

## Capabilities

### New Capabilities
- `guideline-model`: Natural language condition/action rules with criticality and priority
- `multi-batch-matcher`: 6-batch LLM evaluation for context-relevant rule matching
- `relational-resolver`: Dependency/priority/entailment resolution with iterative convergence
- `audit-middleware`: PostToolUse/Stop/UserPromptSubmit hooks with JSONL trail
- `graded-recovery`: L0-L4 context recovery for session continuity
@@ -0,0 +1,21 @@
## ADDED Requirements

### Requirement: PostToolUse audit logging
- **WHEN** a tool is used
- **THEN** the tool name, input summary, and timestamp are appended to the JSONL audit buffer

### Requirement: Stop hook flush
- **WHEN** a response completes
- **THEN** the audit buffer is flushed to the session audit trail and index is updated

### Requirement: UserPromptSubmit context injection
- **WHEN** a user message is submitted
- **THEN** session context (session ID, record count, critical alerts) is injected into the prompt

### Requirement: Anomaly detection
- **WHEN** audit records are checked against alert rules
- **THEN** anomalies at CRITICAL level are injected into the context

#### Scenario: Full audit trail
- **WHEN** an agent runs 10 tool calls across 3 turns
- **THEN** the audit trail contains 10 JSONL records, a session summary, and an updated index
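
The three hook requirements above can be sketched as an in-memory buffer. This is an illustration under stated assumptions rather than boocontext-audit's implementation: the `AuditSession` class, its field names, and the bracketed context header are hypothetical, and a real version would persist the JSONL trail and index to disk. The usage at the bottom replays the "Full audit trail" scenario with 10 tool calls spread over 3 turns.

```typescript
interface AuditRecord {
  tool: string;
  inputSummary: string;
  timestamp: string; // ISO 8601
}

interface IndexEntry {
  sessionId: string;
  recordCount: number;
}

class AuditSession {
  readonly sessionId: string;
  private buffer: string[] = []; // pending JSONL lines (PostToolUse)
  private trail: string[] = []; // flushed session audit trail (Stop)
  private index: IndexEntry | null = null;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  // PostToolUse: append one JSON line per tool call.
  appendToBuffer(tool: string, inputSummary: string, now: Date = new Date()): void {
    const record: AuditRecord = { tool, inputSummary, timestamp: now.toISOString() };
    this.buffer.push(JSON.stringify(record));
  }

  // Stop: move buffered lines into the trail and refresh the index entry.
  flushBuffer(): number {
    const flushed = this.buffer.length;
    this.trail = this.trail.concat(this.buffer);
    this.buffer = [];
    this.index = { sessionId: this.sessionId, recordCount: this.trail.length };
    return flushed;
  }

  // UserPromptSubmit: prefix the prompt with session ID, record count, and critical alerts.
  injectSessionContext(prompt: string, criticalAlerts: string[] = []): string {
    const parts = [`[session ${this.sessionId}]`, `[records ${this.trail.length}]`];
    for (const alert of criticalAlerts) {
      parts.push(`[CRITICAL ${alert}]`);
    }
    return `${parts.join(" ")}\n${prompt}`;
  }

  trailLines(): string[] {
    return this.trail.slice();
  }

  indexEntry(): IndexEntry | null {
    return this.index;
  }
}

// Scenario: 10 tool calls across 3 turns.
const auditSession = new AuditSession("session-1");
for (const callsInTurn of [4, 3, 3]) {
  for (let i = 0; i < callsInTurn; i++) {
    auditSession.appendToBuffer("read_file", `call ${i}`);
  }
  auditSession.flushBuffer();
}
```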
@@ -0,0 +1,25 @@
## ADDED Requirements

### Requirement: L0 recovery (index summary)
- **WHEN** /recover is called without arguments
- **THEN** the last 5 index entries are loaded (~200 tokens)

### Requirement: L1 recovery (session state)
- **WHEN** /recover L1 is called
- **THEN** current session.json + last 3 audit trail entries are loaded (~500 tokens)

### Requirement: L2 recovery (user corrections)
- **WHEN** /recover L2 is called
- **THEN** ALL user_correction records across all sessions are loaded (~1000 tokens)

### Requirement: L3 recovery (full context)
- **WHEN** /recover L3 is called
- **THEN** full audit trail + all pending records are loaded (~3000 tokens)

### Requirement: Priority loading
- **WHEN** recovering context
- **THEN** user_correction records are loaded first (highest priority)

#### Scenario: Session crash recovery
- **WHEN** an agent session crashes and restarts with /recover
- **THEN** the agent gets the index summary, last session state, and all user corrections
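
One way to read the level requirements above is as a single loader keyed by level. The sketch below is hypothetical: `RecoverySources`, `recover`, and `correctionsFirst` are illustrative names, the inputs are in-memory arrays instead of files, and each level is treated as standalone (not cumulative), which is an assumption. L4 is omitted because no requirement defines it.

```typescript
type RecoveryLevel = "L0" | "L1" | "L2" | "L3";

interface AuditEntry {
  kind: "tool_use" | "user_correction";
  text: string;
}

// Hypothetical in-memory stand-ins for the index, session.json, and audit files.
interface RecoverySources {
  indexEntries: string[];
  sessionState: string;
  auditTrail: AuditEntry[]; // current session
  allSessionRecords: AuditEntry[]; // every session
  pending: AuditEntry[];
}

// Priority loading: user_correction records come first, other order preserved.
function correctionsFirst(entries: AuditEntry[]): AuditEntry[] {
  const isCorrection = (e: AuditEntry): boolean => e.kind === "user_correction";
  return entries.filter(isCorrection).concat(entries.filter((e) => !isCorrection(e)));
}

function recover(src: RecoverySources, level: RecoveryLevel = "L0"): string[] {
  const texts = (entries: AuditEntry[]): string[] => entries.map((e) => e.text);
  if (level === "L0") {
    return src.indexEntries.slice(-5); // last 5 index entries
  }
  if (level === "L1") {
    return [src.sessionState].concat(texts(correctionsFirst(src.auditTrail.slice(-3))));
  }
  if (level === "L2") {
    return texts(src.allSessionRecords.filter((e) => e.kind === "user_correction"));
  }
  // L3: full audit trail plus all pending records.
  return texts(correctionsFirst(src.auditTrail.concat(src.pending)));
}

const recoverySources: RecoverySources = {
  indexEntries: ["i1", "i2", "i3", "i4", "i5", "i6", "i7"],
  sessionState: '{"session":"s1"}',
  auditTrail: [
    { kind: "tool_use", text: "read a.ts" },
    { kind: "user_correction", text: "use tabs" },
    { kind: "tool_use", text: "edit a.ts" },
  ],
  allSessionRecords: [
    { kind: "user_correction", text: "use tabs" },
    { kind: "tool_use", text: "read a.ts" },
    { kind: "user_correction", text: "never force-push" },
  ],
  pending: [{ kind: "tool_use", text: "run tests" }],
};
```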
@@ -0,0 +1,17 @@
## ADDED Requirements

### Requirement: Guideline creation
- **WHEN** creating a guideline with condition, action, and criticality
- **THEN** it is stored with unique ID and metadata

### Requirement: Guideline evaluation
- **WHEN** an agent action triggers guideline evaluation
- **THEN** matching guidelines are activated with score and rationale

### Requirement: Criticality levels
- **WHEN** evaluating guidelines
- **THEN** guidelines are filtered by criticality (low/medium/high/critical) with higher-criticality taking precedence

#### Scenario: Security policy enforcement
- **WHEN** an agent attempts to edit a file matching a security guideline condition
- **THEN** the guideline matcher returns the relevant rule with CRITICAL severity
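
The creation and criticality requirements above suggest a shape like the following. It is a rough sketch, not the package's Guideline model: `SketchGuidelineStore`, the counter-based IDs, and the `createdAt` metadata field are assumptions made for illustration, and LLM-based matching is out of scope here.

```typescript
type Criticality = "low" | "medium" | "high" | "critical";

interface Guideline {
  id: string;
  condition: string; // natural-language condition
  action: string; // natural-language action
  criticality: Criticality;
  createdAt: string; // hypothetical metadata field
}

const CRITICALITY_RANK: Record<Criticality, number> = {
  low: 0,
  medium: 1,
  high: 2,
  critical: 3,
};

class SketchGuidelineStore {
  private items: Guideline[] = [];
  private nextId = 1;

  // Store a guideline with a unique ID and creation metadata.
  create(condition: string, action: string, criticality: Criticality): Guideline {
    const guideline: Guideline = {
      id: `g-${this.nextId++}`,
      condition,
      action,
      criticality,
      createdAt: new Date().toISOString(),
    };
    this.items.push(guideline);
    return guideline;
  }

  // Filter by minimum criticality; higher criticality sorts first.
  list(minimum: Criticality = "low"): Guideline[] {
    return this.items
      .filter((g) => CRITICALITY_RANK[g.criticality] >= CRITICALITY_RANK[minimum])
      .sort((a, b) => CRITICALITY_RANK[b.criticality] - CRITICALITY_RANK[a.criticality]);
  }
}

const guidelineStore = new SketchGuidelineStore();
guidelineStore.create("the user asks for a summary", "keep it under five lines", "low");
guidelineStore.create("the agent edits a file under secrets/", "stop and ask for confirmation", "critical");
guidelineStore.create("the agent runs a shell command", "explain the command first", "high");
```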
@@ -0,0 +1,17 @@
## ADDED Requirements

### Requirement: Six batch types
- **WHEN** guidelines are evaluated
- **THEN** they are processed through: Observational, Actionable, PreviouslyApplied, Disambiguation, ResponseAnalysis, and LowCriticality batches

### Requirement: Parallel batch execution
- **WHEN** independent batches are ready
- **THEN** they execute in parallel (observational + actionable run concurrently)

### Requirement: Structured LLM output per batch
- **WHEN** a batch calls the LLM
- **THEN** it uses a structured schema specific to the batch type (e.g., applies: boolean for actionable, was_followed: boolean for response analysis)

#### Scenario: Multi-rule evaluation
- **WHEN** an agent action matches 3 guidelines across different criticalities
- **THEN** the matcher returns all applicable matches with scores, with CRITICAL matches flagged
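
To make the "structured schema specific to the batch type" requirement concrete, here is a small validation sketch. Only `applies` and `was_followed` come from the requirement text; `parseVerdict`, `VERDICT_FIELD`, and the `guideline_id` and `rationale` keys are hypothetical, and the schemas for the other four batch types are not specified here, so they are left out.

```typescript
// Only the two batch types whose fields the requirement names.
type SchemaBatch = "actionable" | "responseAnalysis";

// Boolean field each batch's LLM output must carry.
const VERDICT_FIELD: Record<SchemaBatch, string> = {
  actionable: "applies",
  responseAnalysis: "was_followed",
};

interface BatchVerdict {
  batch: SchemaBatch;
  guidelineId: string;
  verdict: boolean;
  rationale: string;
}

// Parse raw LLM JSON and reject output that does not match the batch's schema.
function parseVerdict(batch: SchemaBatch, raw: string): BatchVerdict {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("expected a JSON object");
  }
  const fields = parsed as Record<string, unknown>;
  const fieldName = VERDICT_FIELD[batch];
  const verdict = fields[fieldName];
  const guidelineId = fields["guideline_id"]; // hypothetical key
  const rationale = fields["rationale"]; // hypothetical key
  if (typeof verdict !== "boolean") {
    throw new Error(`missing boolean "${fieldName}" for ${batch} batch`);
  }
  if (typeof guidelineId !== "string" || typeof rationale !== "string") {
    throw new Error("missing guideline_id or rationale");
  }
  return { batch, guidelineId, verdict, rationale };
}

const actionableVerdict = parseVerdict(
  "actionable",
  '{"guideline_id":"g-2","applies":true,"rationale":"path is under secrets/"}',
);
```

Validating per batch type keeps a response-analysis answer from being silently accepted as an actionable verdict.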
@@ -0,0 +1,21 @@
## ADDED Requirements

### Requirement: DEPENDS_ON resolution
- **WHEN** guideline A depends on guideline B
- **THEN** B is activated if A is activated

### Requirement: PRIORITIZES resolution
- **WHEN** guideline A prioritizes over guideline B
- **THEN** B is filtered out if both match

### Requirement: ENTAILS resolution
- **WHEN** guideline A entails guideline B
- **THEN** B is automatically activated when A is activated

### Requirement: Iterative convergence
- **WHEN** resolving relationships
- **THEN** the resolver iterates (at most 100 iterations) until it reaches a stable state in which no further changes occur

#### Scenario: Conflicting guideline resolution
- **WHEN** a HIGH priority guideline matches and a LOW priority guideline also matches
- **THEN** the LOW priority guideline is filtered out via numerical priority resolution
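
The resolution rules above can be sketched as a fixed-point loop followed by a filter. This is an illustrative reading, not the RelationalResolver from boocontext-audit: `Relation` and `resolve` are hypothetical names, TAG_ALL, TAG_PRIORITIZES, and numerical priorities are omitted, and running activation to convergence before applying PRIORITIZES once is an assumption chosen so the loop cannot oscillate.

```typescript
type RelationKind = "DEPENDS_ON" | "PRIORITIZES" | "ENTAILS";

interface Relation {
  kind: RelationKind;
  source: string; // guideline A
  target: string; // guideline B
}

function resolve(matched: string[], relations: Relation[], maxIterations = 100): string[] {
  const active = matched.slice();
  const isActive = (id: string): boolean => active.indexOf(id) !== -1;

  // Phase 1: DEPENDS_ON and ENTAILS both activate B when A is active.
  // Repeat until a pass makes no change or the iteration cap is hit.
  for (let i = 0; i < maxIterations; i++) {
    let changed = false;
    for (const rel of relations) {
      const activates = rel.kind === "DEPENDS_ON" || rel.kind === "ENTAILS";
      if (activates && isActive(rel.source) && !isActive(rel.target)) {
        active.push(rel.target);
        changed = true;
      }
    }
    if (!changed) {
      break;
    }
  }

  // Phase 2: PRIORITIZES filters out B whenever A is also active.
  return active.filter(
    (id) =>
      !relations.some(
        (rel) => rel.kind === "PRIORITIZES" && rel.target === id && isActive(rel.source),
      ),
  );
}

const resolvedIds = resolve(
  ["A", "D"],
  [
    { kind: "DEPENDS_ON", source: "A", target: "B" },
    { kind: "ENTAILS", source: "B", target: "C" },
    { kind: "PRIORITIZES", source: "A", target: "D" },
  ],
);
```

Here A pulls in B, B pulls in C, and D is dropped because A takes priority over it.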
56
openspec/changes/add-behavioral-engine/tasks.md
Normal file
@@ -0,0 +1,56 @@
## 1. Import boocontext-audit as dependency

- [ ] 1.1 Add boocontext-audit as workspace dependency
- [ ] 1.2 Verify Guideline, GuidelineStore, SchematicGenerator exports

## 2. Implement Guideline model

- [ ] 2.1 Create GuidelineManager wrapping GuidelineStore
- [ ] 2.2 Add CRUD operations for guidelines (create, read, update, delete, list)
- [ ] 2.3 Add InMemoryGuidelineStore and FileRelationshipStore backends
- [ ] 2.4 Add criticality filtering and priority sorting

## 3. Implement multi-batch matcher

- [ ] 3.1 Create MatcherService wrapping GenericGuidelineMatchingStrategy
- [ ] 3.2 Add Observational, Actionable, PreviouslyApplied, Disambiguation, ResponseAnalysis, LowCriticality batch types
- [ ] 3.3 Add parallel batch execution for independent batches
- [ ] 3.4 Add SchematicGenerator abstraction for LLM batch calls

## 4. Implement RelationalResolver

- [ ] 4.1 Create ResolverService wrapping RelationalResolver
- [ ] 4.2 Implement DEPENDS_ON, PRIORITIZES, ENTAILS, TAG_ALL, TAG_PRIORITIZES resolution
- [ ] 4.3 Add iterative convergence loop (max 100 iterations)
- [ ] 4.4 Add resolution logging

## 5. Implement audit middleware

- [ ] 5.1 Create AuditService with PostToolUse middleware (JSONL buffer append)
- [ ] 5.2 Add Stop middleware (buffer flush to session trail)
- [ ] 5.3 Add UserPromptSubmit middleware (session context injection + CRITICAL alerts)
- [ ] 5.4 Wire audit middleware into agent's inference lifecycle

## 6. Implement graded context recovery

- [ ] 6.1 Create RecoveryService with L0-L4 recovery methods
- [ ] 6.2 Implement L0: read last 5 index entries
- [ ] 6.3 Implement L1: session.json + last 3 audit trail entries
- [ ] 6.4 Implement L2: all user_correction records
- [ ] 6.5 Implement L3: full audit trail
- [ ] 6.6 Add priority loading (user corrections first)

## 7. Wire into agent inference loop

- [ ] 7.1 Run guideline evaluation before each agent turn
- [ ] 7.2 Inject active guidelines into system prompt
- [ ] 7.3 Record guideline matches in turn metadata
- [ ] 7.4 Add guideline management commands (add-guideline, list-guidelines, remove-guideline)

## 8. Test and verify

- [ ] 8.1 Test guideline creation and storage
- [ ] 8.2 Test multi-batch matching with sample guidelines
- [ ] 8.3 Test relational resolution with dependencies
- [ ] 8.4 Test audit middleware tool logging
- [ ] 8.5 Test graded recovery at all levels