feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
## Overview

Add a `llama_flags` string field to the Agent type. On each inference request, if the agent has `llama_flags` set, emit an `X-Agent-Flags` HTTP header with the raw CLI args. The llama-sidecar parses this header and applies the flags when routing to a sidecar process.

## Header injection point

AI SDK v6 `streamText()` accepts a `headers` option (`Record<string, string | undefined>`) via `CallSettings`. The `@ai-sdk/openai-compatible` provider merges these with static headers via `combineHeaders()` at request time. This is the cleanest injection point -- no modification to the cached provider or fetch wrapper is needed.

File: `apps/server/src/services/inference/stream-phase-adapter.ts`

```typescript
// In streamCompletion(), add headers to the streamText() call.
// `agent ?? null` matches the helper's `Agent | null` parameter type.
const agentFlagsHeader = buildAgentFlagsHeader(agent ?? null);
const result = streamText({
  model: upstreamModel(ctx.config, model, agent ?? null, 'boochat'),
  messages: aiMessages,
  // ...existing options...
  headers: agentFlagsHeader
    ? { 'X-Agent-Flags': agentFlagsHeader }
    : undefined,
});
```

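To make the merge step concrete, here is a minimal sketch of how per-request headers can be layered over a provider's static headers. `mergeHeaders` is a hypothetical stand-in written for this note, not the SDK's `combineHeaders()`; it assumes later sources win and `undefined` values are dropped. The header values are illustrative.

```typescript
// Hypothetical stand-in for the SDK's header merge; not the real combineHeaders().
type HeaderMap = Record<string, string | undefined>;

function mergeHeaders(...sources: Array<HeaderMap | undefined>): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const source of sources) {
    if (!source) continue; // an undefined source contributes nothing
    for (const [name, value] of Object.entries(source)) {
      if (value !== undefined) merged[name] = value; // later sources win
    }
  }
  return merged;
}

// Static headers configured once on the cached provider (illustrative value).
const staticHeaders: HeaderMap = { Authorization: 'Bearer example-key' };

// Per-request headers, shaped like the value passed to streamText({ headers }).
const withFlags = mergeHeaders(staticHeaders, { 'X-Agent-Flags': '--ctx-size 8192' });
const withoutFlags = mergeHeaders(staticHeaders, undefined);
```

Under these assumptions, passing `undefined` for an agent without flags leaves the static headers untouched, which is why the ternary in the `streamText()` call needs no extra branching.
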
## Builder function

New pure helper `buildAgentFlagsHeader(agent: Agent | null): string | undefined` in `stream-phase-adapter.ts`:

```typescript
export function buildAgentFlagsHeader(agent: Agent | null): string | undefined {
  // No agent, or llama_flags is null/empty: emit no header.
  if (!agent?.llama_flags) return undefined;
  const trimmed = agent.llama_flags.trim();
  // A whitespace-only value is treated the same as unset.
  return trimmed.length > 0 ? trimmed : undefined;
}
```

The function is trivial because the sidecar does all validation (denylist, shadow flags). BooCode only trims surrounding whitespace and otherwise passes the string through unchanged.

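The helper's edge cases can be checked with a few calls. This is a sketch: the `Agent` type is reduced to the single field the helper reads (the real interface has more), and the flag values are illustrative.

```typescript
// Reduced Agent shape for illustration; the real interface has more fields.
interface Agent {
  llama_flags: string | null;
}

function buildAgentFlagsHeader(agent: Agent | null): string | undefined {
  if (!agent?.llama_flags) return undefined;
  const trimmed = agent.llama_flags.trim();
  return trimmed.length > 0 ? trimmed : undefined;
}

const noAgent = buildAgentFlagsHeader(null);                                 // undefined
const unset = buildAgentFlagsHeader({ llama_flags: null });                  // undefined
const blank = buildAgentFlagsHeader({ llama_flags: '   ' });                 // undefined
const padded = buildAgentFlagsHeader({ llama_flags: ' --ctx-size 8192 ' });  // '--ctx-size 8192'
```

All three "nothing to send" cases collapse to `undefined`, so the caller only has to test one value before adding the header.
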
## Agent type change

File: `apps/server/src/types/api.ts`

Add to the `Agent` interface:

```typescript
llama_flags: string | null; // raw llama CLI args sent as X-Agent-Flags header
```

`null` (the default) means no header is emitted.

## Frontmatter parsing (V1 fix)

File: `apps/server/src/services/agents.ts`

The `parseFrontmatter()` function has an explicit if/else-if chain for known keys. Unknown keys are silently ignored (line 258: `// Unknown keys silently ignored`). An explicit branch MUST be added:

```typescript
} else if (key === 'llama_flags') {
  data.llama_flags = stripQuotes(valueRaw);
}
```

Add to `ParsedFrontmatter`:

```typescript
llama_flags?: string;
```

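The effect of the new branch can be sketched in isolation. Everything below is a reduced, hypothetical version of the real code: `parseFrontmatterLine` and the `name` key exist only for this example, and `stripQuotes` is assumed to remove one pair of matching surrounding quotes (the real helper may differ).

```typescript
// Reduced stand-in for the real ParsedFrontmatter, which has more keys.
interface ParsedFrontmatter {
  name?: string;
  llama_flags?: string;
}

// Assumed behavior: remove one pair of matching surrounding quotes.
function stripQuotes(value: string): string {
  const v = value.trim();
  const first = v.charAt(0);
  const quoted = v.length >= 2 && (first === '"' || first === "'") && v.endsWith(first);
  return quoted ? v.slice(1, -1) : v;
}

// Hypothetical single-line parser mirroring the if/else-if chain.
function parseFrontmatterLine(line: string, data: ParsedFrontmatter): void {
  const idx = line.indexOf(':');
  if (idx === -1) return;
  const key = line.slice(0, idx).trim();
  const valueRaw = line.slice(idx + 1);
  if (key === 'name') {
    data.name = stripQuotes(valueRaw);
  } else if (key === 'llama_flags') {
    data.llama_flags = stripQuotes(valueRaw);
  }
  // Unknown keys silently ignored
}

const data: ParsedFrontmatter = {};
parseFrontmatterLine('llama_flags: "--ctx-size 8192 --n-gpu-layers 99"', data);
parseFrontmatterLine('unknown_key: value', data);
```

Without the `llama_flags` branch, the first line would fall through to the "silently ignored" path and the flags would never reach the agent object.
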
## Agent return-object wiring (V2 fix)

File: `apps/server/src/services/agents.ts`

`parseAgentSection()` explicitly constructs every field of the returned agent object. An explicit line must be added:

```typescript
llama_flags: typeof fm.llama_flags === 'string' ? fm.llama_flags : null,
```

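The normalization from an optional frontmatter value to a nullable agent field can be shown on its own. This is a sketch only: `toLlamaFlagsField` is a hypothetical helper that isolates the one field, whereas `parseAgentSection()` builds the full object inline.

```typescript
// Reduced shapes for illustration; the real types have more fields.
interface ParsedFrontmatter {
  llama_flags?: string;
}

interface AgentFlagsField {
  llama_flags: string | null;
}

// Hypothetical helper wrapping the single wiring line.
function toLlamaFlagsField(fm: ParsedFrontmatter): AgentFlagsField {
  return {
    llama_flags: typeof fm.llama_flags === 'string' ? fm.llama_flags : null,
  };
}

const absent = toLlamaFlagsField({});
const present = toLlamaFlagsField({ llama_flags: '--ctx-size 8192' });
```

Mapping a missing key to `null` rather than `undefined` keeps the field consistent with the `string | null` declaration on `Agent`.
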
## Sentinel summaries (V3 fix)

File: `apps/server/src/services/inference/sentinel-summaries.ts`

`runWrapUpSummary()` calls `streamCompletion()` at lines 96-113 but omits the 8th `agent` parameter. Two options:

**Option A (recommended):** Add `agent` to the call so sentinel summaries also get agent flags. This is consistent -- the summary uses the same model as the conversation.

**Option B:** Document that sentinel summaries intentionally don't use agent flags (e.g., "summaries use FAST_MODEL, a separate slot"). This requires verifying that compaction/summaries actually use FAST_MODEL.

The plan recommends Option A for consistency. Add `, agent` after `signal` in the `streamCompletion` call.

## Provider scope (JD-003 note)

The `streamText({ headers })` approach sends the header to ALL providers (DeepSeek, gateway, llama-swap). This is acceptable because:

- DeepSeek API ignores unknown headers (standard HTTP behavior)
- The gateway re-forwards headers to the chosen backend
- Only the sidecar parses `X-Agent-Flags`

If this becomes an issue, provider-aware filtering can be added later by checking `isDeepSeekModel(model)` before emitting the header.

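If filtering is ever needed, it could look roughly like the sketch below. `agentFlagsHeaders` is a hypothetical helper, the body of `isDeepSeekModel` is an assumed name-prefix check (the real function may use different rules), and the model names are illustrative.

```typescript
// Hypothetical implementation; the real isDeepSeekModel() may use different rules.
function isDeepSeekModel(model: string): boolean {
  return model.toLowerCase().startsWith('deepseek');
}

// Returns per-request headers, or undefined when nothing should be sent.
function agentFlagsHeaders(
  model: string,
  flagsHeader: string | undefined,
): Record<string, string> | undefined {
  if (!flagsHeader) return undefined;
  if (isDeepSeekModel(model)) return undefined; // hosted API: skip the header
  return { 'X-Agent-Flags': flagsHeader };
}

const hosted = agentFlagsHeaders('deepseek-chat', '--ctx-size 8192');
const local = agentFlagsHeaders('qwen3-8b', '--ctx-size 8192');
```

The result can be passed straight to `streamText({ headers })`, so the filter stays outside the cached provider.
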
## Why not extend the fetch wrapper

The existing `getSwapProvider()` fetch wrapper (`provider.ts:23-33`) is cached per baseURL. Agent flags are per-agent, not per-provider. Extending the wrapper would either:

- Create N cached providers per baseURL (one per unique flags combination) -- wasteful
- Use a mutable closure variable -- unsafe under concurrency, because overlapping requests for different agents would overwrite each other's flags

The `streamText({ headers })` approach is the AI SDK's intended per-request header mechanism and avoids both problems.

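The cache-growth concern can be illustrated with a toy model. This is a sketch under stated assumptions: `Provider`, both maps, and both lookup functions are hypothetical miniatures, not the real `getSwapProvider()` code, and the URL and flags are illustrative.

```typescript
// Hypothetical miniature of a provider cache; names are illustrative only.
interface Provider {
  baseURL: string;
}

const byBaseURL = new Map<string, Provider>();
const byBaseURLAndFlags = new Map<string, Provider>();

// Current shape: one cached provider per baseURL.
function getProvider(baseURL: string): Provider {
  let provider = byBaseURL.get(baseURL);
  if (!provider) {
    provider = { baseURL };
    byBaseURL.set(baseURL, provider);
  }
  return provider;
}

// Rejected shape: the cache key also includes the flags string.
function getProviderPerFlags(baseURL: string, flags: string): Provider {
  const key = `${baseURL}\n${flags}`;
  let provider = byBaseURLAndFlags.get(key);
  if (!provider) {
    provider = { baseURL };
    byBaseURLAndFlags.set(key, provider);
  }
  return provider;
}

const baseURL = 'http://localhost:8080/v1';
const agentFlags = ['--ctx-size 8192', '--ctx-size 32768', '--cache-type-k q8_0'];
for (const flags of agentFlags) {
  getProvider(baseURL);                // flags travel per request instead
  getProviderPerFlags(baseURL, flags); // one cached provider per flags string
}
```

With three distinct flag strings, the per-baseURL cache holds one entry while the flags-keyed cache holds three, and it keeps growing with every new combination.
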
## Why not forward existing sampler fields as X-Agent-Flags

The existing sampler fields (top_k, min_p, etc.) already flow through `providerOptions.openaiCompatible` in the request body. The llama-server processes these dynamically. `X-Agent-Flags` is for startup args that can't be changed per request (context size, cache quantization, GPU layers). Forwarding sampler fields as `X-Agent-Flags` would be redundant and would add process-spawn overhead for no benefit.

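The split between the two channels can be sketched as follows. `splitAgentSettings` and `RequestParts` are hypothetical names for this example, and the `Agent` shape is reduced to two sampler fields plus `llama_flags`; the real request-building code may be organized differently.

```typescript
// Reduced, hypothetical Agent shape: two sampler fields plus llama_flags.
interface Agent {
  top_k: number | null;
  min_p: number | null;
  llama_flags: string | null;
}

interface RequestParts {
  providerOptions: { openaiCompatible: Record<string, number> };
  headers: Record<string, string> | undefined;
}

function splitAgentSettings(agent: Agent): RequestParts {
  // Sampler fields go in the request body and apply per request.
  const sampler: Record<string, number> = {};
  if (agent.top_k !== null) sampler['top_k'] = agent.top_k;
  if (agent.min_p !== null) sampler['min_p'] = agent.min_p;

  // Startup args go in the header and select a sidecar process.
  const flags = agent.llama_flags?.trim();
  return {
    providerOptions: { openaiCompatible: sampler },
    headers: flags ? { 'X-Agent-Flags': flags } : undefined,
  };
}

const parts = splitAgentSettings({ top_k: 40, min_p: 0.05, llama_flags: '--ctx-size 8192' });
```

Keeping the sampler values out of the header means changing `top_k` between requests never changes which process the sidecar selects.
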
## Compaction scope

Compaction (`compaction.ts`) uses `resolveModelEndpoint()` for direct `fetch()` calls and does not go through `streamCompletion()`. It does not need agent flags because:

1. Compaction uses `FAST_MODEL` (a cheaper model per CLAUDE.md), which is a separate model slot with its own startup flags
2. Compaction is a background maintenance task, not a user-facing agent interaction

## Data flow

```
Agent.llama_flags (from AGENTS.md)
  -> buildAgentFlagsHeader(agent)
  -> streamText({ headers: { 'X-Agent-Flags': '...' } })
  -> @ai-sdk/openai-compatible combineHeaders()
  -> fetch() request to llama-swap/sidecar
  -> sidecar parseFlags() + ValidateExtraArgs()
  -> sidecar routes to process with matching (model, flags) hash
```