chore(openspec): drop 9 superseded proposals + 11 stub archive files
Drop 9 batch proposals that are superseded by the boocode-lift-analysis (boocontext-audit, conductor upgrades, self-healing/verify-gate skills): add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform, conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul, agent-reliability. Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only) that provide zero documentation value over the existing CHANGELOG.md + git tags.
@@ -0,0 +1,76 @@
|
||||
## Context
|
||||
|
||||
boocode currently has no persistent session management for its agents (the persona agents in data/AGENTS.md). When a session is interrupted, there's no recoverable audit trail, no way to detect repeated mistakes, and no mechanism to enforce learned behavioral guidelines across sessions.
|
||||
|
||||
audit-harness provides: hooks (PostToolUse buffer→Stop flush→UserPromptSubmit injection), skills (/start→/end→/recover→/report-daily), and a Python core (AuditContext) with unified index schema.
|
||||
|
||||
Parlant provides: GuidelineDocumentStore (versioned, tag/label filtered), JourneyStore (graph-based SOPs), and JourneyGuidelineProjection (node→guideline auto-conversion).
|
||||
|
||||
This design ports the high-value subset of both into boocode as agent-facing skills and a TypeScript core library.
|
||||
|
||||
## Goals / Non-Goals
|
||||
|
||||
**Goals:**
|
||||
- Define `.boo/runs/` directory convention with auto-creation and `.gitignore`
|
||||
- Port /start, /end, /recover, /report-daily as boocode skills (markdown)
|
||||
- Port user_correction record format and detection
|
||||
- Port GuidelineDocumentStore from Parlant as TypeScript service
|
||||
- Port Journey → guideline auto-projection (node→guideline conversion)
|
||||
- Implement guideline lookup by content match (`find_guideline()` in Parlant)
|
||||
- All features opt-in, zero breaking changes
|
||||
|
||||
**Non-Goals:**
|
||||
- AuditContext full Python class port (environment snapshots, anomaly lambdas)
|
||||
- Hooks implementation (PostToolUse/Stop/UserPromptSubmit) — separate batch
|
||||
- Parlant's vector DB / embedder infrastructure
|
||||
- Parlant's relationship resolver (ARQ)
|
||||
- Web UI for guideline management — CLI/skill-only
|
||||
|
||||
## Decisions
|
||||
|
||||
### Decision 1: Skill-based commands over CLI tools
|
||||
|
||||
**Choice**: Implement /start, /end, /recover, /report-daily as skill markdown files in `data/skills/boocode/`, following the existing `committing-changes` pattern.
|
||||
**Rationale**: boocode agents already load skills from this path. Adding a new skill is zero code change to the agent runtime — just a new markdown file with YAML frontmatter. CLI tools would require new API routes, dispatch logic, and frontend work.
|
||||
**Alternatives considered**: Fastify API routes (rejected — too heavy for agent-facing commands), shell scripts (rejected — platform-specific).
|
||||
|
||||
### Decision 2: JSONL buffer + index.json
|
||||
|
||||
**Choice**: Port audit-harness's file layout exactly: `audit_buffer.jsonl` for live writes, `audit_pending.jsonl` for agent-authored AUDIT blocks, per-session `audit_trail.jsonl` for flushed records, `index.json` for cross-session metadata.
|
||||
**Rationale**: audit-harness has production mileage with this layout. JSONL is grep-able, append-only, and needs no DB connection.
|
||||
**Alternatives considered**: Postgres (rejected — agents don't all have DB access), SQLite (rejected — adds a native dep).
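
The buffer-to-trail flush described in this decision can be sketched as pure string handling. This is a minimal illustration, assuming one JSON object per line; the names `parseJsonl`, `toJsonl`, and `flush` are hypothetical and not taken from audit-harness, and file I/O is left out on purpose.

```typescript
// Parse JSONL text into records, skipping blank lines.
function parseJsonl(text: string): Array<Record<string, unknown>> {
  const out: Array<Record<string, unknown>> = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.length === 0) continue;
    out.push(JSON.parse(trimmed) as Record<string, unknown>);
  }
  return out;
}

// Serialize records back to JSONL, one object per line.
function toJsonl(records: Array<Record<string, unknown>>): string {
  return records.map((r) => JSON.stringify(r) + "\n").join("");
}

// Hypothetical flush: move buffer + pending records onto the end of the
// session trail and return empty buffers. Assumes the existing trail text
// is either empty or already ends with a newline.
function flush(
  trail: string,
  buffer: string,
  pending: string
): { trail: string; buffer: string; pending: string } {
  const moved = parseJsonl(buffer).concat(parseJsonl(pending));
  return { trail: trail + toJsonl(moved), buffer: "", pending: "" };
}
```

Because the flush only appends, the trail stays grep-able and a partially written buffer never rewrites earlier records.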
|
||||
|
||||
### Decision 3: Timestamp-based session IDs
|
||||
|
||||
**Choice**: `adhoc_YYYYMMDD_HHMM` format for session IDs, matching audit-harness pattern.
|
||||
**Rationale**: Human-readable and sortable. The format has minute resolution, so two sessions started within the same minute would share an ID.
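
A minimal sketch of generating an ID in this format. It assumes UTC fields (consistent with the `session.json` example, where `adhoc_20260320_1400` pairs with `2026-03-20T14:00:00Z`). The helper names and the numeric-suffix fallback are hypothetical additions, not part of the audit-harness format.

```typescript
// Zero-pad a number to two digits.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Format a Date as adhoc_YYYYMMDD_HHMM using UTC fields (assumption).
function makeSessionId(now: Date): string {
  const date =
    String(now.getUTCFullYear()) + pad2(now.getUTCMonth() + 1) + pad2(now.getUTCDate());
  const time = pad2(now.getUTCHours()) + pad2(now.getUTCMinutes());
  return "adhoc_" + date + "_" + time;
}

// Hypothetical fallback: if the ID is already taken (two sessions started in
// the same minute), append _2, _3, ... until a free ID is found.
function uniqueSessionId(now: Date, existing: string[]): string {
  const base = makeSessionId(now);
  if (existing.indexOf(base) === -1) return base;
  let n = 2;
  while (existing.indexOf(base + "_" + n) !== -1) n++;
  return base + "_" + n;
}
```

Zero-padded fields keep the IDs lexicographically sortable, so a plain directory listing of `.boo/runs/` is already in chronological order.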
|
||||
|
||||
### Decision 4: File-based GuidelineStore
|
||||
|
||||
**Choice**: Port GuidelineDocumentStore's abstract interface (create/list/read/update/delete/find) but use filesystem JSON storage instead of Parlant's DocumentDatabase.
|
||||
**Rationale**: boocode doesn't have Parlant's document DB abstraction. A JSON-file store is simpler and sufficient for single-user operation. The interface stays the same, so a future Postgres backend can be swapped in.
|
||||
**Alternatives considered**: Postgres backend (rejected — adds coupling), in-memory only (rejected — no persistence).
|
||||
|
||||
### Decision 5: Journey → guideline projection as pure function
|
||||
|
||||
**Choice**: Port `JourneyGuidelineProjection` as a pure function (not a class). Takes a Journey + its nodes/edges, returns Guideline[].
|
||||
**Rationale**: The projection logic (DFS traversal, node→guideline conversion, edge metadata grafting) is deterministic and has no side effects. A pure function is simpler to test and compose.
|
||||
**Alternatives considered**: Class with JourneyStore dependency (rejected — unnecessary indirection for our use case).
|
||||
|
||||
## Risks / Trade-offs
|
||||
|
||||
- **[Risk]** Skills grow stale if agent runtime doesn't load them → **Mitigation**: Test with existing agent by loading skill explicitly.
|
||||
- **[Risk]** JSONL file contention from multiple agents → **Mitigation**: Single-user homelab. Acceptable.
|
||||
- **[Risk]** GuidelineStore JSON files grow unbounded → **Mitigation**: TBD — add compaction/archival in future batch.
|
||||
- **[Trade-off]** File storage is simple but doesn't scale to multi-user → Acceptable for single-user.
|
||||
|
||||
## Migration / Rollout
|
||||
|
||||
1. Create openspec spec files (proposal/design/tasks/specs)
|
||||
2. Create `.boo/runs/` directory structure (service)
|
||||
3. Create 4 skill files in `data/skills/boocode/`
|
||||
4. Create core AuditContext TypeScript service
|
||||
5. Create GuidelineStore + Journey service
|
||||
6. Create user_correction utilities
|
||||
7. Update data/AGENTS.md with new agents
|
||||
8. Test with skill invocation
|
||||
@@ -0,0 +1,23 @@
|
||||
## Why
|
||||
|
||||
The audit-harness (hooks + skills + AuditContext) and Parlant (GuidelineStore + Journey engine) provide two proven patterns for agent session management. audit-harness solves context-window loss through persistent audit trails, graded recovery, and structured commands (/start → /end → /recover → /report-daily). Parlant solves behavioral consistency through a versioned guideline document store with tag/label-based retrieval, journey-based SOPs, and backtrack detection.
|
||||
|
||||
Porting these patterns into boocode's agent ecosystem gives every agent working in this repo persistent session management, cross-session user correction awareness, and behavioral guideline enforcement — without building any of it from scratch.
|
||||
|
||||
## What Changes
|
||||
|
||||
### New Capabilities
|
||||
|
||||
- **Data Directory Convention**: `.boo/runs/` directory with buffer files, session dirs, `.current_session` handshake, unified `index.json`. `AUDIT_DOT_DIR` env var for platform override.
|
||||
|
||||
- **Session Lifecycle Commands**: `/start` creates named audit sessions with auto-recovery (L0+L2). `/end` flushes buffers, runs integrity checks, and generates `session_summary.md`. `/recover` performs graded context loading (L0–L3). `/report-daily` aggregates all sessions into a 7-section report; `/report-daily review` also runs a morning self-review.
|
||||
|
||||
- **User Correction Tracking**: Structured `user_correction` records with `original_claim`/`correction`/`principle_extracted`/`persisted_to`. Auto-detected on `/end`. Correction-as-precedent enforcement when agent actions contradict prior corrections.
|
||||
|
||||
- **Behavioral Guidelines Store**: Versioned GuidelineDocumentStore ported from Parlant with condition+action+description content model, tag/label filtering, and content-based `find_guideline()`. Journey → guideline auto-projection (SOP nodes → guidelines with follow-up edges). Journey backtrack detection batch.
|
||||
|
||||
### Dependencies
|
||||
|
||||
- Existing audit-harness patterns (audit-context.py, hooks, skills) serve as the reference implementation.
|
||||
- Parlant's GuidelineStore (guidelines.py) and JourneyStore (journeys.py) serve as the reference implementation.
|
||||
- No new external services. File-based JSONL storage (audit-harness pattern).
|
||||
@@ -0,0 +1,80 @@
|
||||
# Behavioral Guidelines Store — Spec
|
||||
|
||||
## Guideline Entity
|
||||
|
||||
```typescript
interface GuidelineContent {
  condition: string; // When...
  action: string | null; // Then...
  description: string | null;
}

interface Guideline {
  id: string;
  creationUtc: string;
  content: GuidelineContent;
  enabled: boolean;
  tags: string[];
  labels: string[];
  metadata: Record<string, unknown>;
  criticality: "low" | "medium" | "high";
  title: string | null;
  priority: number;
}
```
|
||||
|
||||
## GuidelineDocumentStore
|
||||
|
||||
File-based JSON store at `.boo/guidelines/`. Versioned with migration support.
|
||||
|
||||
Methods:
|
||||
- `createGuideline(condition, action?, description?, ...) → Guideline`
|
||||
- `listGuidelines(tags?, labels?) → Guideline[]`
|
||||
- `readGuideline(id) → Guideline`
|
||||
- `updateGuideline(id, params) → Guideline`
|
||||
- `deleteGuideline(id) → void`
|
||||
- `findGuideline(content: {condition, action?}) → Guideline`
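
To make the method list above concrete, here is a minimal in-memory sketch of part of the interface. It is illustrative only: the class name `SketchGuidelineStore`, the trimmed-down `Guideline` shape, the ID scheme, and the choice to throw on a missing guideline are all assumptions, and the real store persists JSON under `.boo/guidelines/`.

```typescript
interface GuidelineContent {
  condition: string;
  action: string | null;
  description: string | null;
}

// Reduced Guideline shape for the sketch; the spec's entity has more fields.
interface Guideline {
  id: string;
  creationUtc: string;
  content: GuidelineContent;
  enabled: boolean;
  tags: string[];
  labels: string[];
}

class SketchGuidelineStore {
  private items: Guideline[] = [];
  private nextId = 1;

  createGuideline(
    condition: string,
    action: string | null = null,
    description: string | null = null
  ): Guideline {
    const g: Guideline = {
      id: "g" + this.nextId++, // hypothetical ID scheme
      creationUtc: new Date().toISOString(),
      content: { condition, action, description },
      enabled: true,
      tags: [],
      labels: [],
    };
    this.items.push(g);
    return g;
  }

  readGuideline(id: string): Guideline {
    for (const g of this.items) if (g.id === id) return g;
    throw new Error("guideline not found: " + id);
  }

  deleteGuideline(id: string): void {
    this.readGuideline(id); // throws if the ID is unknown
    this.items = this.items.filter((g) => g.id !== id);
  }

  // Exact match on condition, and on action only when one is supplied.
  findGuideline(content: { condition: string; action?: string | null }): Guideline {
    for (const g of this.items) {
      const sameCondition = g.content.condition === content.condition;
      const sameAction =
        content.action === undefined || g.content.action === content.action;
      if (sameCondition && sameAction) return g;
    }
    throw new Error("no guideline matches the given content");
  }
}
```

Keeping lookup behind `findGuideline()` means a different backend could later replace the array scan without changing callers.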
|
||||
|
||||
Version migration chain (port from Parlant v0.1.0 → v0.11.0):
|
||||
- v0.1.0 → v0.2.0: add enabled field
|
||||
- v0.2.0 → v0.3.0: remove guideline_set (migration script only)
|
||||
- v0.3.0 → v0.4.0: add optional action, description, metadata
|
||||
- v0.4.0 → v0.5.0: description as optional
|
||||
- v0.5.0 → v0.6.0: add criticality (default "medium")
|
||||
- v0.6.0 → v0.7.0: add composition_mode (optional)
|
||||
- v0.7.0 → v0.8.0: add track (default true)
|
||||
- v0.8.0 → v0.9.0: add labels (default empty)
|
||||
- v0.9.0 → v0.10.0: add priority (default 0)
|
||||
- v0.10.0 → v0.11.0: add title (default null)
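
A migration chain like the one above can be sketched as an ordered table of steps applied until the document reaches the target version. Only the last three steps are shown. The `VersionedDoc` shape, the `MIGRATIONS` table, and `migrate()` are hypothetical names for illustration, and the defaults are the ones listed above.

```typescript
interface VersionedDoc {
  version: string;
  [key: string]: unknown;
}

interface Migration {
  from: string;
  to: string;
  apply: (doc: VersionedDoc) => VersionedDoc;
}

// Each step only adds a default; values already present on the document win
// because the spread comes after the default.
const MIGRATIONS: Migration[] = [
  { from: "0.8.0", to: "0.9.0", apply: (d) => ({ labels: [], ...d }) },
  { from: "0.9.0", to: "0.10.0", apply: (d) => ({ priority: 0, ...d }) },
  { from: "0.10.0", to: "0.11.0", apply: (d) => ({ title: null, ...d }) },
];

// Walk the chain one step at a time until the target version is reached.
function migrate(doc: VersionedDoc, target: string): VersionedDoc {
  let current = doc;
  let steps = 0;
  while (current.version !== target) {
    const step = MIGRATIONS.filter((m) => m.from === current.version)[0];
    if (!step) throw new Error("no migration from version " + current.version);
    current = { ...step.apply(current), version: step.to };
    // Guard against a misconfigured table that loops forever.
    if (++steps > MIGRATIONS.length) {
      throw new Error("migration chain did not reach " + target);
    }
  }
  return current;
}
```

Running every document through `migrate()` on load lets the rest of the store assume the latest schema.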
|
||||
|
||||
## Tag & Label Filtering
|
||||
|
||||
- `listGuidelines({tags: ["tag1"]})` → guidelines with ANY of the specified tags
|
||||
- `listGuidelines({labels: ["label1"]})` → guidelines with ALL specified labels (subset match)
|
||||
- Combined: both filters apply (intersection)
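
The three rules above (ANY tag, ALL labels, intersection when both are given) can be sketched as a single predicate. The `Tagged` shape and function names are hypothetical; treating an empty filter array the same as an omitted filter is also an assumption.

```typescript
interface Tagged {
  id: string;
  tags: string[];
  labels: string[];
}

interface ListFilter {
  tags?: string[];
  labels?: string[];
}

function matchesFilter(g: Tagged, filter: ListFilter): boolean {
  const tags = filter.tags;
  const labels = filter.labels;
  // Tags: at least one requested tag must be present (ANY).
  const tagOk =
    !tags || tags.length === 0 || tags.some((t) => g.tags.indexOf(t) !== -1);
  // Labels: every requested label must be present (ALL, subset match).
  const labelOk =
    !labels || labels.length === 0 || labels.every((l) => g.labels.indexOf(l) !== -1);
  // Combined: both conditions must hold (intersection).
  return tagOk && labelOk;
}

function filterGuidelines<T extends Tagged>(all: T[], filter: ListFilter = {}): T[] {
  return all.filter((g) => matchesFilter(g, filter));
}
```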
|
||||
|
||||
## Journey → Guideline Projection
|
||||
|
||||
Port of Parlant's `JourneyGuidelineProjection.project_journey_to_guidelines()`:
|
||||
|
||||
- DFS traversal of Journey nodes from root
|
||||
- Each (edge, node) pair → one Guideline
|
||||
- Edge condition becomes guideline condition
|
||||
- Node action becomes guideline action
|
||||
- Edge/node metadata merged into guideline metadata with journey_node key
|
||||
- follow_ups list populated with downstream guideline IDs
|
||||
- Visited set avoids infinite loops on cyclic journeys
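
The projection rules above can be sketched as a depth-first walk over the edge list. This is an interpretation, not Parlant's implementation: the `JourneyNode`/`JourneyEdge` shapes, the guideline ID scheme, tracking visited state per edge (so each edge yields exactly one guideline), and storing follow-ups under the `journey_node` metadata key are all assumptions made for illustration.

```typescript
interface JourneyNode { id: string; action: string | null; }
interface JourneyEdge { id: string; source: string; target: string; condition: string | null; }

interface ProjectedGuideline {
  id: string;
  condition: string;
  action: string | null;
  metadata: { journey_node: { nodeId: string; edgeId: string; followUps: string[] } };
}

// Hypothetical ID scheme: one guideline per (edge, target node) pair.
function guidelineId(edge: JourneyEdge): string {
  return "journey_node:" + edge.target + ":" + edge.id;
}

function projectJourney(
  rootId: string,
  nodes: JourneyNode[],
  edges: JourneyEdge[]
): ProjectedGuideline[] {
  const nodeById: Record<string, JourneyNode> = {};
  for (const n of nodes) nodeById[n.id] = n;
  const visitedEdges: Record<string, boolean> = {};
  const out: ProjectedGuideline[] = [];

  function visit(nodeId: string): void {
    for (const edge of edges) {
      if (edge.source !== nodeId || visitedEdges[edge.id] === true) continue;
      visitedEdges[edge.id] = true; // visited set: cycles terminate here
      if (!Object.prototype.hasOwnProperty.call(nodeById, edge.target)) continue;
      const target = nodeById[edge.target];
      // Follow-ups are the guidelines produced by the target's outgoing edges.
      const followUps = edges.filter((e) => e.source === target.id).map((e) => guidelineId(e));
      out.push({
        id: guidelineId(edge),
        condition: edge.condition ?? "", // edge condition becomes guideline condition
        action: target.action,           // node action becomes guideline action
        metadata: { journey_node: { nodeId: target.id, edgeId: edge.id, followUps } },
      });
      visit(target.id); // depth-first descent
    }
  }

  visit(rootId);
  return out;
}
```

The function takes plain data and returns a new array, so it can be tested without any store or filesystem.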
|
||||
|
||||
## Journey Backtrack Detection
|
||||
|
||||
```typescript
interface BacktrackCheck {
  journeyId: string;
  currentNodeId: string;
  previousNodeId: string;
  isBacktrack: boolean;
  recommendation: string | null;
}
```
|
||||
|
||||
Scans the edge list for source→target relationships. If the agent's current step has an edge back to a previously visited node (and that node is not in a forward path from current), it's flagged as a backtrack regression.
|
||||
@@ -0,0 +1,88 @@
|
||||
# Session Lifecycle Commands — Spec
|
||||
|
||||
## Overview
|
||||
|
||||
Four agent-invocable commands that manage audit session lifecycle. Each command is a skill markdown file loaded by the agent on invocation.
|
||||
|
||||
## /start
|
||||
|
||||
```
/start "task description"
```
|
||||
|
||||
Creates a named audit session:
|
||||
|
||||
1. Generate `session_id = adhoc_YYYYMMDD_HHMM`
|
||||
2. `mkdir -p .boo/runs/{session_id}`
|
||||
3. Write `session.json`:
|
||||
```json
{
  "session_id": "adhoc_20260320_1400",
  "task": "task description",
  "start_time": "2026-03-20T14:00:00Z",
  "status": "in_progress",
  "expected_record_types": ["data", "change", "conversation"]
}
```
|
||||
4. Write `.boo/runs/.current_session` containing session_id (handshake for hooks)
|
||||
5. Run context recovery:
|
||||
- L0: read `index.json` → last 5 entries
|
||||
- L2: scan recent audit_trail.jsonl for `user_correction` records
|
||||
6. Output recovery summary: recent activity, corrections, priorities
|
||||
7. Check for unfinished sessions: scan for `status: "in_progress"` sessions, prompt user
|
||||
|
||||
## /end
|
||||
|
||||
```
/end
```
|
||||
|
||||
Ends the current audit session:
|
||||
|
||||
1. Read `.current_session` → get session_id
|
||||
2. Collect remaining buffer data from `audit_buffer.jsonl` + `audit_pending.jsonl`
|
||||
3. Append to `audit_trail.jsonl`
|
||||
4. Clear buffer files
|
||||
5. Extract `user_correction` records from audit_trail
|
||||
6. Run integrity checks:
|
||||
- Has records? (>0 audit_trail lines)
|
||||
- All files covered? (changes in audit_trail match modified files)
|
||||
- Corrections persisted? (persisted_to is non-empty)
|
||||
7. Generate `session_summary.md`
|
||||
8. Update `session.json` status=completed, end_time
|
||||
9. Clear `.current_session`
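
The three integrity checks in step 6 can be sketched as one function over the parsed trail. The `TrailEntry` shape is an assumption: in particular, the `file` field on `change` records is hypothetical, since the spec does not name the field that carries the modified path.

```typescript
interface TrailEntry {
  record_type: string;
  action_type?: string;
  file?: string; // hypothetical: path touched by a "change" record
  persisted_to?: string[];
}

interface IntegrityReport {
  hasRecords: boolean;            // >0 audit_trail lines
  uncoveredFiles: string[];       // modified files with no change record
  unpersistedCorrections: number; // corrections with empty persisted_to
  ok: boolean;
}

function checkIntegrity(trail: TrailEntry[], modifiedFiles: string[]): IntegrityReport {
  const recorded: Record<string, boolean> = {};
  let unpersisted = 0;
  for (const entry of trail) {
    if (entry.record_type === "change" && entry.file !== undefined) {
      recorded[entry.file] = true;
    }
    if (entry.action_type === "user_correction") {
      if (!entry.persisted_to || entry.persisted_to.length === 0) unpersisted++;
    }
  }
  const uncoveredFiles = modifiedFiles.filter((f) => recorded[f] !== true);
  const hasRecords = trail.length > 0;
  return {
    hasRecords,
    uncoveredFiles,
    unpersistedCorrections: unpersisted,
    ok: hasRecords && uncoveredFiles.length === 0 && unpersisted === 0,
  };
}
```

Returning a report rather than throwing lets `/end` still write `session_summary.md` when a check fails and list the gaps there.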
|
||||
|
||||
## /recover
|
||||
|
||||
```
/recover                 # L0+L1+L2
/recover full            # L3 (full audit_trail)
/recover {session_id}    # load specific session
```
|
||||
|
||||
Graded context loading:
|
||||
|
||||
- L0 (~200t): index.json → last 5 entries (id, task, status)
|
||||
- L1 (~500t): .current_session + session.json + last 3 audit_trail entries
|
||||
- L2 (~1000t): scan all audit_trails for user_correction records + conclusions + daily report §4+§6
|
||||
- L3 (~3000t): full audit_trail.jsonl + audit_pending.jsonl
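
As an illustration of the cheapest level, L0 can be sketched as a slice over the index. This assumes `index.json` holds an array of entries in chronological order and that each entry exposes `id`, `task`, and `status`; the function name and the exact entry shape are hypothetical.

```typescript
interface IndexEntry {
  id: string;
  task: string;
  status: string;
  [key: string]: unknown; // other index fields are ignored at L0
}

interface L0Entry { id: string; task: string; status: string; }

// Return the last `limit` entries, reduced to the three L0 fields so the
// recovery summary stays within its small token budget.
function lastIndexEntries(index: IndexEntry[], limit: number = 5): L0Entry[] {
  if (limit <= 0) return [];
  return index.slice(-limit).map((e) => ({ id: e.id, task: e.task, status: e.status }));
}
```

Dropping every other field at this level is what keeps L0 cheap; the higher levels load progressively more of the trail.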
|
||||
|
||||
## /report-daily
|
||||
|
||||
```
/report-daily            # today
/report-daily 20260319   # specific date
/report-daily review     # + morning self-review
```
|
||||
|
||||
7-section report:
|
||||
|
||||
1. Task overview (from index.json)
|
||||
2. Operation stats (tool counts)
|
||||
3. Change records (file modifications)
|
||||
4. User feedback & corrections
|
||||
5. Anomaly alerts
|
||||
6. Backlog tracking
|
||||
7. Integrity summary
|
||||
|
||||
`review` variant: adds morning self-review with trend analysis and recommended priorities.
|
||||
@@ -0,0 +1,42 @@
|
||||
# User Correction Tracking — Spec
|
||||
|
||||
## Record Schema
|
||||
|
||||
```typescript
interface UserCorrectionRecord {
  record_type: "conversation";
  action_type: "user_correction";
  priority: "critical_for_recovery";
  timestamp: string; // ISO 8601
  original_claim: string; // what the agent said that was wrong
  correction: string; // what the user corrected it to
  principle_extracted: string; // general principle derived from this correction
  persisted_to: string[]; // files where this correction was documented
}
```
|
||||
|
||||
## Storage
|
||||
|
||||
User correction records are stored inline in `audit_trail.jsonl` as regular entries. They are extracted during `/end` and surfaced during `/recover` L2 loading.
|
||||
|
||||
## Detection
|
||||
|
||||
During `/end`, scan the session's `audit_trail.jsonl` for entries matching:
|
||||
- `action_type === "user_correction"`
|
||||
|
||||
Also scan `audit_pending.jsonl` for any pending correction records not yet flushed.
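
A minimal sketch of this scan, operating on the text of the two files rather than on the filesystem. The function name `extractCorrections` is hypothetical, and silently skipping malformed lines is an assumption; a real implementation might prefer to report them.

```typescript
interface UserCorrectionRecord {
  record_type: "conversation";
  action_type: "user_correction";
  priority: "critical_for_recovery";
  timestamp: string;
  original_claim: string;
  correction: string;
  principle_extracted: string;
  persisted_to: string[];
}

// Scan one or more JSONL texts (e.g. audit_trail.jsonl, audit_pending.jsonl)
// and keep only entries whose action_type is "user_correction".
function extractCorrections(...sources: string[]): UserCorrectionRecord[] {
  const found: UserCorrectionRecord[] = [];
  for (const text of sources) {
    for (const line of text.split("\n")) {
      if (line.trim() === "") continue;
      let parsed: unknown;
      try {
        parsed = JSON.parse(line);
      } catch (e) {
        continue; // assumption: malformed lines are skipped, not fatal
      }
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        (parsed as { action_type?: unknown }).action_type === "user_correction"
      ) {
        found.push(parsed as UserCorrectionRecord);
      }
    }
  }
  return found;
}
```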
|
||||
|
||||
## persisted_to Field
|
||||
|
||||
When a correction is written to CLAUDE.md, coding standards, or other documentation, the file paths are recorded in `persisted_to[]`. This is populated manually by the agent when it persists the correction.
|
||||
|
||||
## Correction-as-Precedent
|
||||
|
||||
When an agent considers an action that contradicts a known `user_correction` record, the action is flagged with a warning. The agent should:
|
||||
|
||||
1. Identify the contradiction (which rule is being violated)
|
||||
2. Surface the relevant correction record (with timestamp and original context)
|
||||
3. Propose an alternative that respects the correction
|
||||
4. If the contradiction is intentional, document why as a new correction
|
||||
|
||||
Detection logic: before each significant action, the agent scans loaded user_correction records from the current recovery context and checks if the proposed action matches any known `original_claim` pattern.
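
The spec does not say how a proposed action is matched against an `original_claim` pattern. As a deliberately naive stand-in, the sketch below uses a case-insensitive substring test; in practice the agent's own judgment would likely do this matching. The function name and the reduced record shape are hypothetical.

```typescript
interface CorrectionLite {
  timestamp: string;
  original_claim: string;
  correction: string;
}

// Return every correction whose original_claim text appears inside the
// proposed action (case-insensitive). Empty claims never match.
function findContradictions(
  proposedAction: string,
  corrections: CorrectionLite[]
): CorrectionLite[] {
  const haystack = proposedAction.toLowerCase();
  return corrections.filter((c) => {
    const needle = c.original_claim.trim().toLowerCase();
    return needle.length > 0 && haystack.indexOf(needle) !== -1;
  });
}
```

A non-empty result corresponds to steps 1 and 2 above: the returned records identify the contradiction and carry the timestamp and correction text to surface to the user.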
|
||||
@@ -0,0 +1,39 @@
|
||||
# port-audit-parlant-patterns — Implementation Complete
|
||||
|
||||
## boocontext (TypeScript) — src/audit/
|
||||
- [x] 1. Data Dir: `dotDir()`, `findRunsDir()`, `ensureRunsDir()` with .gitignore + AUDIT_DOT_DIR
|
||||
- [x] 2. Core Types: `RecordEntry`, `CompactRecord`, `Manifest`, `UserCorrectionRecord`, `SessionJson`, `SessionSummary`
|
||||
- [x] 3. Hash Utilities: `hashFile()`, `hashBytes()`, `hashDir()` via Node crypto SHA256
|
||||
- [x] 4. Anomaly: `AlertRule`, `Anomaly`, `checkAnomalies()` with default rules
|
||||
- [x] 5. AuditContext: `createBatchContext()` -> `record()` -> `recordCompact()` -> `finalize()` -> `save()` (writes manifest, trail, compact, anomalies, checksums, index)
|
||||
- [x] 6. AmbientContext: `AsyncLocalStorage` wrapper — `runWithAmbient()`, `getAmbientSession()`, `requireAmbientSession()`
|
||||
- [x] 7. Guideline Model: `GuidelineContent`, `Guideline`, `GuidelineStore`, `InMemoryGuidelineStore` with CRUD + tag/label filters
|
||||
- [x] 8. Guideline Matching: `MatchingContext`, `MatchingBatch` (Observational, Actionable, PreviouslyApplied, Disambiguation, ResponseAnalysis, LowCriticality), `GenericGuidelineMatchingStrategy`, retry policy
|
||||
- [x] 9. ARQ Generation: `SchematicGenerator`, typed output schemas per batch, `GenerationInfo` tracking, `createExecutionPlan()` with batch-parallel
|
||||
- [x] 10. Relationship Model: `RelationshipKind` (DEPENDS_ON, PRIORITIZES, ENTAILS, TAG_ALL, TAG_PRIORITIZES), `FileRelationshipStore`
|
||||
- [x] 11. Relational Resolver: 4-step iteration loop (deps -> prioritization -> priority -> entailment), `MAX_ITERATIONS=100`, `ResolutionKind` output
|
||||
- [x] 12. Graded Recovery: `recoverL0()`–`recoverL4()`, `scanUserCorrections()`, `formatRecoveryReport()` with source attribution
|
||||
- [x] 13. User Corrections: `detectCorrections()`, `addPersistedTarget()`, `findRelatedCorrections()`, `checkContradiction()`
|
||||
- [x] 14. Index: `readIndex()`, `writeIndex()` with atomic `.tmp` + `renameSync`
|
||||
- [x] 15. MCP Tools: `boocontext_audit_index` + `boocontext_audit_recover` registered in mcp-server.ts
|
||||
- [x] 16. Typecheck: `npx tsc --noEmit` passes clean
|
||||
|
||||
## codecontext (Go) — internal/audit/ + internal/mcp/
|
||||
- [x] 1. Record Types: `RecordEntry`, `CompactRecord`, `RecordStep`/`RecordAction` enums (pre-existing)
|
||||
- [x] 2. Index: `UpdateIndexEntry()` with idempotent upsert, `IndexEntry` schema, atomic `.tmp` + `os.Rename()` (pre-existing)
|
||||
- [x] 3. Hashchain: `HashFile()`, `HashBytes()`, `HashDir()`, `VerifyHashchain()` with `HashchainVerificationError` (pre-existing)
|
||||
- [x] 4. Directory: `DotDir()`, `RunsDir()`, `EnsureRunsDir()` with .gitignore + `AUDIT_DOT_DIR` (pre-existing)
|
||||
- [x] 5. Anomaly: `AlertRule`, `Anomaly`, `Manifest` types + `CheckAnomalies()` with condition evaluation (pre-existing stub, now evaluates total_records/error_rate/hash conditions)
|
||||
- [x] 6. GenerateChecksums: per-file SHA256 manifest (pre-existing)
|
||||
- [x] 7. Session Lifecycle: `SessionLifecycleManager` with `StartSession(task)`, `EndSession()`, `CurrentSession()` — creates adhoc session, writes .current_session, updates index
|
||||
- [x] 8. Trail Management: `TrailManager` with `AppendToBuffer()`, `PendingAppend()`, `AppendToTrail()`, `ReadTrail()`, `FlushBuffer()` — auto-generates session if none active
|
||||
- [x] 9. MCP Audit Tools: `codecontext_audit_start`, `codecontext_audit_end`, `codecontext_audit_status` in `internal/mcp/audit_tools.go`
|
||||
- [x] 10. MCP Middleware Hooks: `recordAuditBuffer()` in server struct, buffer after tool calls, flush on "ready"
|
||||
- [x] 11. Build: `go build ./...` passes clean
|
||||
|
||||
## boocode (Node.js) — apps/coder/src/services/
|
||||
- [x] 1. Session Service (`audit-session.ts`): `startSession()` with L0+L2 recovery, `endSession()` with integrity checks + session_summary.md, `recoverSession()` L0-L3 graded loading, `generateDailyReport()` 7-section report
|
||||
- [x] 2. Correction Service (`correction-service.ts`): `recordCorrection()`, `scanForCorrections()`, `checkContradiction()`, `markPersisted()` — JSON store at `.boo/corrections/`
|
||||
- [x] 3. Guideline Service (`guideline-service.ts`): `createGuideline()`, `listGuidelines()` with tag/label filters, version migration chain (v0.1.0->v0.11.0), `projectJourneyToGuidelines()` DFS, `checkBacktrack()` — JSON store at `.boo/guidelines/`
|
||||
- [x] 4. Skill commands: `command-start/SKILL.md`, `command-end/SKILL.md`, `command-recover/SKILL.md`, `command-report-daily/SKILL.md`
|
||||
- [x] 5. Typecheck: `pnpm -C apps/coder typecheck` passes clean
|
||||