feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
## Why

Per-agent llama-server tuning is currently limited to the sampler fields that flow through `providerOptions.openaiCompatible` in the request body (`top_k`, `min_p`, `dry_*`, etc.). Flags that affect server startup configuration, such as KV cache quantization (`--cache-type-k`), context size (`-c`), flash attention (`--flash-attn`), and GPU layer count (`-ngl`), cannot be overridden per agent without spawning a separate sidecar process with different `BASE_ARGS`.

The llama-sidecar already parses an `X-Agent-Flags: --top-k 20 --cache-type-k q8_0` header and applies those flags when routing to a sidecar process. BooCode only needs to emit this header from agent config.
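As a rough illustration of the emitting side, the sketch below turns an agent's raw flag string into the `X-Agent-Flags` header value shown above. The `AgentLike` shape and the `agentFlagHeaders` helper are hypothetical names for this example, not existing code in `apps/server`, and the whitespace normalization is an assumption about what the sidecar accepts.

```typescript
// Hypothetical minimal shape; the real Agent type in apps/server has more fields.
interface AgentLike {
  name: string;
  // Raw llama CLI args, e.g. "--cache-type-k q8_0 -c 16384".
  llama_flags?: string;
}

const AGENT_FLAGS_HEADER = "X-Agent-Flags";

// Returns the extra request headers for an agent, or an empty object when the
// agent sets no flags. The flags are not validated here; that stays with the sidecar.
function agentFlagHeaders(agent: AgentLike): Record<string, string> {
  const raw = agent.llama_flags ?? "";
  // Collapse all whitespace runs (including newlines) into single spaces so the
  // value is a single-line header.
  const value = raw
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .join(" ");
  return value === "" ? {} : { [AGENT_FLAGS_HEADER]: value };
}
```

The result can be spread into whatever header object the inference request already builds, so agents without flags send no `X-Agent-Flags` header at all.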
## What Changes

- Add a `llama_flags` field to the Agent type (raw llama CLI args string)
- Parse `llama_flags` from AGENTS.md frontmatter
- Build and emit the `X-Agent-Flags` header on inference requests routed to the sidecar
- Leave deny/shadow flag validation to the sidecar
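The first two items above could look roughly like the following. This is a sketch under stated assumptions: it treats the frontmatter as a single flat block of `key: value` lines, and `readFrontmatter` and `agentFromFrontmatter` are hypothetical helpers. The actual parser in `agents.ts` may be structured quite differently.

```typescript
// Hypothetical subset of the Agent type; llama_flags is the field proposed here.
interface Agent {
  name: string;
  llama_flags?: string;
}

// Extracts top-level `key: value` pairs from a leading `---` frontmatter block.
// Deliberately naive: no nesting, lists, or multi-line scalars.
function readFrontmatter(markdown: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  const body = match?.[1];
  if (body === undefined) return fields;
  for (const line of body.split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep <= 0) continue; // skip lines without a key
    const key = line.slice(0, sep).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(sep + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    fields[key] = value;
  }
  return fields;
}

// Builds an Agent from frontmatter, leaving llama_flags unset when absent or empty.
function agentFromFrontmatter(markdown: string): Agent {
  const fm = readFrontmatter(markdown);
  const agent: Agent = { name: fm["name"] ?? "unnamed" };
  const flags = fm["llama_flags"];
  if (flags !== undefined && flags !== "") agent.llama_flags = flags;
  return agent;
}
```

Keeping `llama_flags` as an opaque string matches the proposal's choice not to model individual llama-server flags as typed fields.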
## Scope

`apps/server` only. The sidecar (`/opt/forks/llama-sidecar`) already supports `X-Agent-Flags`, so no out-of-repo changes are needed.
## Non-goals

- No new typed fields for individual llama-server flags (use `llama_flags` for raw args)
- No changes to the sampler body path (`top_k`, `min_p`, etc. continue via `providerOptions.openaiCompatible`)
- No changes to compaction or task-model direct-fetch paths (they don't need per-agent flags)