boocode/openspec/changes/x-agent-flags/proposal.md
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00


Why

Per-agent llama-server tuning is currently limited to the sampler fields that flow through providerOptions.openaiCompatible in the request body (top_k, min_p, dry_*, etc.). Flags that affect server startup configuration, such as KV cache quantization (--cache-type-k), context size (-c), flash attention (--flash-attn), and GPU layer count (-ngl), cannot be overridden per agent without spawning a separate sidecar process with different BASE_ARGS.

The llama-sidecar already parses an X-Agent-Flags header (for example, X-Agent-Flags: --top-k 20 --cache-type-k q8_0) and applies those flags when routing to a sidecar process. BooCode only needs to emit this header from agent config.
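A minimal sketch of what emitting the header on the BooCode side might look like. The AgentLike interface and the buildAgentFlagHeaders helper are hypothetical names used for illustration; only the llama_flags field and the X-Agent-Flags header name come from this proposal. Collapsing whitespace to single spaces is an assumption, made here so that a multi-line value cannot introduce line breaks into the header.

```typescript
// Hypothetical agent shape: only `llama_flags` is named in this proposal.
interface AgentLike {
  name: string;
  llama_flags?: string; // raw llama CLI args, e.g. "--cache-type-k q8_0 -c 16384"
}

// Hypothetical helper: returns the extra headers for a sidecar-routed request.
// Flags are passed through as-is; deny/shadow validation is left to the sidecar.
function buildAgentFlagHeaders(agent: AgentLike): Record<string, string> {
  // Assumption: collapse all whitespace runs (including newlines) to one space
  // so the value is always a single-line header.
  const flags = (agent.llama_flags ?? "").replace(/\s+/g, " ").trim();
  return flags.length > 0 ? { "X-Agent-Flags": flags } : {};
}
```

An agent without llama_flags yields an empty object, so the result can be spread into existing request headers without changing behavior for agents that do not opt in.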

What Changes

  • Add a llama_flags field to the Agent type (raw llama CLI args string)
  • Parse llama_flags from AGENTS.md frontmatter
  • Build and emit X-Agent-Flags header on inference requests routed to the sidecar
  • Leave deny/shadow flag validation to the sidecar
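The frontmatter step above could be sketched roughly as follows. This assumes AGENTS.md frontmatter is a `---`-delimited block of simple `key: value` lines, which this proposal does not specify; readLlamaFlags is a hypothetical helper, and a real implementation would more likely reuse whatever frontmatter parser apps/server already has.

```typescript
// Hypothetical helper: extract the raw `llama_flags` string from a document
// that starts with a `---` delimited frontmatter block.
// Assumption: the value sits on one `key: value` line, optionally quoted.
function readLlamaFlags(markdown: string): string | undefined {
  const block = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!block) return undefined; // no frontmatter at all

  for (const line of (block[1] ?? "").split(/\r?\n/)) {
    const kv = /^llama_flags:\s*(.*)$/.exec(line);
    if (!kv) continue;
    // Strip one pair of matching surrounding quotes, if present.
    const value = (kv[1] ?? "").trim().replace(/^(["'])(.*)\1$/, "$2");
    return value.length > 0 ? value : undefined;
  }
  return undefined; // frontmatter present, but no llama_flags key
}
```

Keeping the value as a single raw string matches the proposal's choice of one llama_flags field rather than typed per-flag fields.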

Scope

apps/server only. The sidecar (/opt/forks/llama-sidecar) already supports X-Agent-Flags, so no out-of-repo changes are needed.

Non-goals

  • No new typed fields for individual llama-server flags (use llama_flags for raw args)
  • No changes to the sampler body path (top_k, min_p, etc. continue via providerOptions.openaiCompatible)
  • No changes to compaction or task-model direct-fetch paths (they don't need per-agent flags)