## Why

Per-agent llama-server tuning is currently limited to the sampler fields that flow through `providerOptions.openaiCompatible` in the request body (`top_k`, `min_p`, `dry_*`, etc.). Flags that affect server startup configuration, such as KV cache quantization (`--cache-type-k`), context size (`-c`), flash attention (`--flash-attn`), and GPU layer count (`-ngl`), cannot be overridden per agent without spawning a separate sidecar process with different `BASE_ARGS`.

The llama-sidecar already parses an `X-Agent-Flags` header (for example, `X-Agent-Flags: --top-k 20 --cache-type-k q8_0`) and applies those flags when routing to a sidecar process. BooCode only needs to emit this header from agent config.

## What Changes

- Add a `llama_flags` field to the Agent type (raw llama CLI args string)
- Parse `llama_flags` from AGENTS.md frontmatter
- Build and emit the `X-Agent-Flags` header on inference requests routed to the sidecar
- Leave deny/shadow flag validation to the sidecar; BooCode passes the flags through unvalidated

## Scope

`apps/server` only. The sidecar (`/opt/forks/llama-sidecar`) already supports `X-Agent-Flags`, so no out-of-repo changes are needed.

## Non-goals

- No new typed fields for individual llama-server flags (use `llama_flags` for raw args)
- No changes to the sampler body path (`top_k`, `min_p`, etc. continue via `providerOptions.openaiCompatible`)
- No changes to compaction or task-model direct-fetch paths (they don't need per-agent flags)
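
As an illustration of the "build and emit `X-Agent-Flags`" step listed under What Changes, the following is a minimal sketch, not the actual BooCode implementation. The `Agent` shape (beyond `llama_flags`), the `buildAgentFlagsHeader` helper, and the whitespace and printable-ASCII checks are all assumptions made for this example. Consistent with the proposal, it does not validate individual flags and leaves deny/shadow handling to the sidecar.

```typescript
// Hypothetical Agent shape: only `llama_flags` comes from the proposal,
// the `name` field is assumed for illustration.
interface Agent {
  name: string;
  llama_flags?: string; // raw llama CLI args, e.g. "--cache-type-k q8_0"
}

const AGENT_FLAGS_HEADER = "X-Agent-Flags";

// Hypothetical helper: returns the extra headers to merge into a
// sidecar-bound inference request, or {} when the agent sets no flags.
function buildAgentFlagsHeader(agent: Agent): Record<string, string> {
  const raw = agent.llama_flags;
  if (typeof raw !== "string") return {};

  // Collapse all whitespace runs (including newlines from multi-line
  // frontmatter values) so the result is a single-line header value.
  const normalized = raw.trim().split(/\s+/).filter(Boolean).join(" ");
  if (normalized === "") return {};

  // Assumed safety check: allow printable ASCII only, so the value is
  // always a well-formed header. Flag semantics are NOT checked here.
  if (!/^[\x20-\x7E]+$/.test(normalized)) {
    throw new Error(
      `llama_flags for agent "${agent.name}" contains non-printable or non-ASCII characters`
    );
  }

  return { [AGENT_FLAGS_HEADER]: normalized };
}

// Usage: merge into the headers of a request routed to the sidecar.
const exampleHeaders: Record<string, string> = {
  "Content-Type": "application/json",
  ...buildAgentFlagsHeader({
    name: "coder",
    llama_flags: "--top-k 20 --cache-type-k q8_0",
  }),
};
```

Returning an empty object when `llama_flags` is absent or blank keeps agents without the field on the existing request path, so the header only appears when an agent opts in.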