boocode/openspec/changes/x-agent-flags/design.md
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00


Overview

Add a nullable llama_flags string field to the Agent type. On each inference request, if the agent has llama_flags set, emit an X-Agent-Flags HTTP header carrying the raw llama CLI args. The llama-sidecar parses this header and routes the request to the sidecar process whose (model, flags) combination matches.

Header injection point

AI SDK v6 streamText() accepts a headers option (Record<string, string | undefined>) via CallSettings. The @ai-sdk/openai-compatible provider merges these with static headers via combineHeaders() at request time. This is the cleanest injection point -- no modification to the cached provider or fetch wrapper needed.

File: apps/server/src/services/inference/stream-phase-adapter.ts

// In streamCompletion(), add headers to the streamText() call:
const agentFlagsHeader = buildAgentFlagsHeader(agent);
const result = streamText({
  model: upstreamModel(ctx.config, model, agent ?? null, 'boochat'),
  messages: aiMessages,
  // ...existing options...
  headers: agentFlagsHeader
    ? { 'X-Agent-Flags': agentFlagsHeader }
    : undefined,
});
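The merge that makes this injection point work can be modeled in a few lines. The sketch below is a simplified stand-in, not the @ai-sdk/provider-utils source. It assumes that later header records override earlier ones and that entries whose value is undefined are dropped before fetch(). mergeHeaders() and the static Authorization value are hypothetical.

```typescript
// Simplified model of per-request headers merging with a provider's static
// headers. Assumptions: later records win, and undefined values are dropped
// before the request is sent. This is not the SDK's actual implementation.
type HeaderRecord = Record<string, string | undefined>;

function mergeHeaders(...records: Array<HeaderRecord | undefined>): Record<string, string> {
  const merged: HeaderRecord = {};
  for (const record of records) {
    if (record) Object.assign(merged, record);
  }
  // Drop entries whose value is undefined so they never reach the wire.
  const result: Record<string, string> = {};
  for (const [key, value] of Object.entries(merged)) {
    if (value !== undefined) result[key] = value;
  }
  return result;
}

// Hypothetical static headers configured once on the cached provider.
const staticHeaders: HeaderRecord = { Authorization: 'Bearer example-token' };

// Agent with flags: the header is added for this request only.
const withFlags = mergeHeaders(staticHeaders, { 'X-Agent-Flags': '-c 32768' });

// Agent without flags: streamText() receives headers: undefined, so nothing is added.
const withoutFlags = mergeHeaders(staticHeaders, undefined);

console.log(withFlags);
console.log(withoutFlags);
```

Under these assumptions, passing headers: undefined for agents without flags behaves the same as not setting the option at all, and the cached provider's static headers are never mutated.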

Builder function

New pure helper buildAgentFlagsHeader(agent: Agent | null): string | undefined in stream-phase-adapter.ts:

export function buildAgentFlagsHeader(agent: Agent | null): string | undefined {
  if (!agent?.llama_flags) return undefined;
  const trimmed = agent.llama_flags.trim();
  return trimmed.length > 0 ? trimmed : undefined;
}

The function is trivial because the sidecar does all validation (denylist, shadow flags). BooCode only trims the string and otherwise passes it through unchanged.
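To make the edge cases concrete, the sketch below runs the builder against a reduced Agent type that keeps only llama_flags plus a hypothetical name field for readable output. It shows that null, empty, and whitespace-only values all suppress the header, while surrounding whitespace is trimmed from real values. The flag strings are ordinary llama-server options chosen as examples.

```typescript
// Reduced stand-in for the Agent interface: only the field the builder reads,
// plus a hypothetical name for readable output.
interface Agent {
  name: string;
  llama_flags: string | null;
}

// Same logic as the builder above.
function buildAgentFlagsHeader(agent: Agent | null): string | undefined {
  if (!agent?.llama_flags) return undefined; // null agent, null or "" flags
  const trimmed = agent.llama_flags.trim();
  return trimmed.length > 0 ? trimmed : undefined; // whitespace-only -> no header
}

const cases: Array<[Agent | null, string | undefined]> = [
  [null, undefined],                                              // no agent at all
  [{ name: 'plain', llama_flags: null }, undefined],              // default: no header
  [{ name: 'empty', llama_flags: '' }, undefined],                // empty string
  [{ name: 'blank', llama_flags: '   ' }, undefined],             // whitespace-only
  [{ name: 'long-ctx', llama_flags: '  -c 32768 ' }, '-c 32768'], // trimmed
];

for (const [agent] of cases) {
  console.log(agent?.name ?? '(null agent)', '->', buildAgentFlagsHeader(agent));
}
```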

Agent type change

File: apps/server/src/types/api.ts

Add to the Agent interface:

llama_flags: string | null;  // raw llama CLI args sent as X-Agent-Flags header

null (the default) means no header is emitted. buildAgentFlagsHeader() treats an empty or whitespace-only string the same way.

Frontmatter parsing (V1 fix)

File: apps/server/src/services/agents.ts

The parseFrontmatter() function has an explicit if/else-if chain for known keys. Unknown keys are silently ignored (line 258: // Unknown keys silently ignored). An explicit branch MUST be added:

} else if (key === 'llama_flags') {
  data.llama_flags = stripQuotes(valueRaw);
}

Add to ParsedFrontmatter:

llama_flags?: string;
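A reduced sketch of the parsing branch follows. It assumes a `key: value` line format and a stripQuotes() helper that removes one pair of matching surrounding quotes; the helper body, the loop, and the model key used for contrast are assumptions written for illustration, not copies of agents.ts.

```typescript
interface ParsedFrontmatter {
  model?: string; // hypothetical existing key, for contrast
  llama_flags?: string;
}

// Assumed behavior: trim, then remove one pair of matching surrounding quotes.
function stripQuotes(value: string): string {
  const v = value.trim();
  if (v.length >= 2 && (v[0] === '"' || v[0] === "'") && v[v.length - 1] === v[0]) {
    return v.slice(1, -1);
  }
  return v;
}

function parseFrontmatter(block: string): ParsedFrontmatter {
  const data: ParsedFrontmatter = {};
  for (const line of block.split('\n')) {
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    // Split on the first colon only, so values may themselves contain ':'.
    const valueRaw = line.slice(idx + 1);
    if (key === 'model') {
      data.model = stripQuotes(valueRaw);
    } else if (key === 'llama_flags') {
      data.llama_flags = stripQuotes(valueRaw);
    }
    // Unknown keys silently ignored: without its own branch, llama_flags would be lost here.
  }
  return data;
}

const fm = parseFrontmatter([
  'model: qwen3-30b', // hypothetical model name
  'llama_flags: "-c 32768 --cache-type-k q8_0 -ngl 99"',
  'unknown_key: dropped',
].join('\n'));
console.log(fm);
```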

Agent return-object wiring (V2 fix)

File: apps/server/src/services/agents.ts

parseAgentSection() explicitly constructs every field of the returned agent object. An explicit line must be added:

llama_flags: typeof fm.llama_flags === 'string' ? fm.llama_flags : null,
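The wiring line can be checked in isolation. The sketch below wraps it in a hypothetical wireLlamaFlags() function over reduced types; only the typeof expression comes from this design.

```typescript
interface ParsedFrontmatter {
  llama_flags?: string;
}

interface AgentFlagsField {
  llama_flags: string | null;
}

// Hypothetical wrapper around the wiring line: an absent key becomes null.
function wireLlamaFlags(fm: ParsedFrontmatter): AgentFlagsField {
  return { llama_flags: typeof fm.llama_flags === 'string' ? fm.llama_flags : null };
}

const absent = wireLlamaFlags({});
const empty = wireLlamaFlags({ llama_flags: '' });
const set = wireLlamaFlags({ llama_flags: '-c 32768' });
console.log(absent, empty, set);
```

Note that an explicit empty value passes the typeof check and stays an empty string rather than null; buildAgentFlagsHeader() is what turns it into "no header".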

Sentinel summaries (V3 fix)

File: apps/server/src/services/inference/sentinel-summaries.ts

runWrapUpSummary() calls streamCompletion() at lines 96-113 but omits the 8th parameter, agent, so wrap-up summaries never receive agent flags. Two options:

Option A (recommended): Add agent to the call so sentinel summaries also get agent flags. This is consistent -- the summary uses the same model as the conversation.

Option B: Document that sentinel summaries intentionally don't use agent flags (e.g., "summaries use FAST_MODEL, a separate slot"). This requires verifying that compaction/summaries actually use FAST_MODEL.

The plan recommends Option A for consistency: pass agent as the argument after signal in the streamCompletion() call.
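Why the omission compiles silently can be shown with a reduced stub. Assumption: agent is an optional trailing parameter (the existing call type-checks without it). The hypothetical streamCompletionStub() keeps only signal and agent and replaces the other six parameters with a single prompt argument; it returns the headers it would send instead of streaming.

```typescript
interface Agent {
  llama_flags: string | null;
}

function buildAgentFlagsHeader(agent: Agent | null): string | undefined {
  if (!agent?.llama_flags) return undefined;
  const trimmed = agent.llama_flags.trim();
  return trimmed.length > 0 ? trimmed : undefined;
}

// Hypothetical reduced stub: agent is optional, so omitting it is not a type error.
function streamCompletionStub(
  prompt: string,
  signal?: AbortSignal,
  agent?: Agent | null,
): Record<string, string> {
  void prompt; // unused in the stub
  void signal; // unused in the stub
  const flags = buildAgentFlagsHeader(agent ?? null);
  return flags ? { 'X-Agent-Flags': flags } : {};
}

const agent: Agent = { llama_flags: '-c 32768' };
const signal = new AbortController().signal;

const before = streamCompletionStub('summarize', signal);       // current call: agent omitted
const after = streamCompletionStub('summarize', signal, agent); // Option A
console.log(before, after);
```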

Provider scope (JD-003 note)

The streamText({ headers }) approach sends the header to ALL providers (DeepSeek, gateway, llama-swap). This is acceptable because:

  • DeepSeek API ignores unknown headers (standard HTTP behavior)
  • The gateway forwards headers to the chosen backend
  • Only the sidecar parses X-Agent-Flags

If this becomes an issue, provider-aware filtering can be added later by checking isDeepSeekModel(model) before emitting the header.
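If filtering is ever needed, it could look like the sketch below. isDeepSeekModel() is named in this design, but its body here is a hypothetical prefix check, and buildRequestHeaders() is an illustrative name, not an existing function.

```typescript
interface Agent {
  llama_flags: string | null;
}

function buildAgentFlagsHeader(agent: Agent | null): string | undefined {
  if (!agent?.llama_flags) return undefined;
  const trimmed = agent.llama_flags.trim();
  return trimmed.length > 0 ? trimmed : undefined;
}

// Hypothetical stand-in: the real isDeepSeekModel() may match models differently.
function isDeepSeekModel(model: string): boolean {
  return model.toLowerCase().startsWith('deepseek');
}

// Emit the header only for models that can reach the sidecar.
function buildRequestHeaders(model: string, agent: Agent | null): Record<string, string> | undefined {
  if (isDeepSeekModel(model)) return undefined;
  const flags = buildAgentFlagsHeader(agent);
  return flags ? { 'X-Agent-Flags': flags } : undefined;
}

const agent: Agent = { llama_flags: '-c 32768' };
console.log(buildRequestHeaders('deepseek-chat', agent));
console.log(buildRequestHeaders('qwen3-30b', agent)); // hypothetical local model name
```

The result can be passed straight to streamText({ headers }), so the injection point itself stays unchanged.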

Why not extend the fetch wrapper

The existing getSwapProvider() fetch wrapper (provider.ts:23-33) is cached per baseURL. Agent flags are per-agent, not per-provider. Extending the wrapper would either:

  • Create N cached providers per baseURL (one per unique flags combination) -- wasteful
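  • Use a mutable closure variable -- unsafe under concurrent requests, since overlapping async calls can read each other's flags even on a single-threaded event loop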

The streamText({ headers }) approach is the AI SDK's intended per-request header mechanism and avoids both problems.
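The closure problem needs no threads, only interleaved async requests. The sketch below simulates two overlapping requests deterministically by splitting each into a "configure" step and a "send" step; the send step stands in for the cached fetch wrapper reading the variable after an await, when another request may already have run. All names are hypothetical.

```typescript
// Rejected design: one shared variable read by a cached fetch wrapper.
let currentFlags: string | undefined;

function configureViaClosure(flags: string): void {
  currentFlags = flags; // request sets the shared variable, then awaits
}

function sendViaClosure(): string | undefined {
  return currentFlags; // wrapper reads whatever the variable holds at send time
}

// Chosen design: each request carries its own headers object.
function sendWithHeaders(headers: Record<string, string>): string | undefined {
  return headers['X-Agent-Flags'];
}

// Event-loop interleaving: A configures, B configures, then both sends run.
configureViaClosure('-c 8192');    // request A
configureViaClosure('-c 32768');   // request B
const closureA = sendViaClosure(); // A sends B's flags
const closureB = sendViaClosure();

const headersA = sendWithHeaders({ 'X-Agent-Flags': '-c 8192' });
const headersB = sendWithHeaders({ 'X-Agent-Flags': '-c 32768' });

console.log({ closureA, closureB, headersA, headersB });
```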

Why not forward existing sampler fields as X-Agent-Flags

The existing sampler fields (top_k, min_p, etc.) already flow through providerOptions.openaiCompatible in the request body, and llama-server applies them per request. X-Agent-Flags is for startup args that can't be changed per request (context size, cache quantization, GPU layers). Forwarding sampler fields as X-Agent-Flags would be redundant and, because the sidecar routes by (model, flags), would force a separate process spawn for each sampler variation with no benefit.
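The spawn-overhead argument can be made concrete with a toy process key. Assumption: the sidecar keys processes by (model, flags), as the data-flow summary describes; this sketch uses a plain string instead of a hash and normalizes flags by collapsing whitespace. processKey() and the model name are hypothetical; --top-k is llama-server's CLI spelling of the sampler option.

```typescript
// Toy process key: model plus whitespace-normalized flags.
function processKey(model: string, flags: string): string {
  return `${model}|${flags.trim().split(/\s+/).join(' ')}`;
}

const model = 'qwen3-30b'; // hypothetical model name
const startupFlags = '-c 32768 -ngl 99';

// Sampler values travel in the request body, so they never affect the key.
const bodyA = { top_k: 20, min_p: 0.05 };
const bodyB = { top_k: 40, min_p: 0.1 };
const keyA = processKey(model, startupFlags);
const keyB = processKey(model, startupFlags);

// If samplers were appended to X-Agent-Flags, each variation would need its own process.
const keyA2 = processKey(model, `${startupFlags} --top-k ${bodyA.top_k}`);
const keyB2 = processKey(model, `${startupFlags} --top-k ${bodyB.top_k}`);

console.log(keyA === keyB);   // one process serves both requests
console.log(keyA2 === keyB2); // two processes would be spawned
```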

Compaction scope

Compaction (compaction.ts) uses resolveModelEndpoint() for direct fetch() calls and does not go through streamCompletion(). It does not need agent flags because:

  1. Compaction uses FAST_MODEL (a cheaper model per CLAUDE.md), which is a separate model slot with its own startup flags
  2. Compaction is a background maintenance task, not a user-facing agent interaction

Data flow

Agent.llama_flags (from AGENTS.md)
  -> buildAgentFlagsHeader(agent)
  -> streamText({ headers: { 'X-Agent-Flags': '...' } })
  -> @ai-sdk/openai-compatible combineHeaders()
  -> fetch() request to llama-swap/sidecar
  -> sidecar parseFlags() + ValidateExtraArgs()
  -> sidecar routes to process with matching (model, flags) hash
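An end-to-end toy of this flow, from the agent field to a routing key, is sketched below. parseFlags() and validateExtraArgs() are hypothetical TypeScript stand-ins for the sidecar's parseFlags() and ValidateExtraArgs(): the tokenizer is a plain whitespace split with no quote handling, and the denylist contents are invented for the example, since this design does not specify the sidecar's actual rules.

```typescript
interface Agent {
  llama_flags: string | null;
}

// BooCode side: same builder as above.
function buildAgentFlagsHeader(agent: Agent | null): string | undefined {
  if (!agent?.llama_flags) return undefined;
  const trimmed = agent.llama_flags.trim();
  return trimmed.length > 0 ? trimmed : undefined;
}

// Hypothetical denylist: flags a per-agent header should never change.
const DENYLIST = new Set(['--port', '--host', '-m', '--model']);

// Sidecar side (toy): whitespace tokenizer, no quote handling.
function parseFlags(header: string): string[] {
  return header.trim().split(/\s+/).filter((t) => t.length > 0);
}

function validateExtraArgs(args: string[]): string[] {
  const rejected = args.filter((a) => {
    const name = a.includes('=') ? a.slice(0, a.indexOf('=')) : a; // handle --flag=value
    return DENYLIST.has(name);
  });
  if (rejected.length > 0) throw new Error(`denied flags: ${rejected.join(', ')}`);
  return args;
}

// Stand-in for the (model, flags) hash: a plain string key.
function routeKey(model: string, agent: Agent | null): string {
  const header = buildAgentFlagsHeader(agent);
  const args = header ? validateExtraArgs(parseFlags(header)) : [];
  return `${model}|${args.join(' ')}`;
}

console.log(routeKey('qwen3-30b', { llama_flags: '  -c 32768   -ngl 99 ' }));
console.log(routeKey('qwen3-30b', null));
```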