boocode/openspec/changes/x-agent-flags/proposal.md

## Why

Per-agent llama-server tuning is currently limited to the sampler fields that flow through `providerOptions.openaiCompatible` in the request body (`top_k`, `min_p`, `dry_*`, etc.). Flags that set server startup configuration, such as KV cache quantization (`--cache-type-k`), context size (`-c`), flash attention (`--flash-attn`), and GPU layer count (`-ngl`), cannot be overridden per agent without spawning a separate sidecar process with different `BASE_ARGS`.

The llama-sidecar already parses an `X-Agent-Flags` header (e.g. `X-Agent-Flags: --top-k 20 --cache-type-k q8_0`) and applies those flags when routing the request to a sidecar process. BooCode only needs to emit this header from agent config.
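A minimal sketch of emitting the header from agent config, assuming the agent object carries an optional `llama_flags` string. `AgentLike` and `agentFlagsHeaders` are hypothetical names for illustration, not existing BooCode APIs. Whitespace is collapsed because HTTP header values cannot contain line breaks, which a multi-line YAML value could otherwise introduce.

```typescript
// Hypothetical subset of BooCode's Agent type: only the field this change adds.
interface AgentLike {
  name: string;
  llama_flags?: string; // raw llama CLI args, e.g. "--top-k 20 --cache-type-k q8_0"
}

// Returns the extra headers for a request routed to the sidecar.
// Emits X-Agent-Flags only when the agent actually configures flags,
// so agents without llama_flags send exactly what they sent before.
function agentFlagsHeaders(agent: AgentLike): Record<string, string> {
  // Trim, then collapse any run of whitespace (including newlines) to one space,
  // since header values must not contain CR/LF.
  const flags = agent.llama_flags?.trim().replace(/\s+/g, " ");
  return flags ? { "X-Agent-Flags": flags } : {};
}
```

The returned object can be merged into the request headers; an empty object leaves requests for unconfigured agents unchanged.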

## What Changes

- Add a `llama_flags` field to the Agent type (raw llama CLI args string)
- Parse `llama_flags` from AGENTS.md frontmatter
- Build and emit `X-Agent-Flags` header on inference requests routed to the sidecar
- Deny/shadow flag validation remains the sidecar's responsibility; BooCode passes `llama_flags` through without validating individual flags
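The frontmatter step could look like the following, assuming BooCode's existing parser already yields the AGENTS.md frontmatter as a plain object. `Agent` here is a trimmed-down stand-in for the real type and `applyLlamaFlags` is a hypothetical helper; rejecting non-string values is a choice made by this sketch, not something the proposal specifies.

```typescript
// Trimmed-down stand-in for BooCode's Agent type (assumption, not the real definition).
interface Agent {
  name: string;
  llama_flags?: string;
}

// Example frontmatter this would consume (assumed shape):
//   llama_flags: "--cache-type-k q8_0 -c 32768"
//
// Copies llama_flags onto the agent if present. The value is kept as a raw
// string; flag-level validation is left to the sidecar.
function applyLlamaFlags(agent: Agent, frontmatter: Record<string, unknown>): Agent {
  const raw = frontmatter["llama_flags"];
  if (raw === undefined || raw === null) return agent; // key absent: no change
  if (typeof raw !== "string") {
    // e.g. a YAML number or list; fail loudly rather than guess how to stringify it
    throw new Error(`agent ${agent.name}: llama_flags must be a string, got ${typeof raw}`);
  }
  const trimmed = raw.trim();
  return trimmed ? { ...agent, llama_flags: trimmed } : agent;
}
```

Returning a new object instead of mutating keeps the parsed agent config immutable, which makes the helper easy to call from whatever loader already builds `Agent` values.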

## Scope

`apps/server` only. The sidecar (`/opt/forks/llama-sidecar`) already supports `X-Agent-Flags`, so no out-of-repo changes are needed.

## Non-goals

- No new typed fields for individual llama-server flags (use `llama_flags` for raw args)
- No changes to the sampler body path (`top_k`, `min_p`, etc. continue to flow via `providerOptions.openaiCompatible`)
- No changes to compaction or task-model direct-fetch paths (they don't need per-agent flags)