# llama-sidecar

Per-agent llama-server process pool daemon. Runs on sam-desktop alongside llama-swap. Spawns or reuses llama-server processes keyed on a hash of (modelID, flags).

## License

AGPL-3.0-only. The validator package (`internal/validator/`) is ported from [Unsloth Studio](https://github.com/unslothai/unsloth/blob/main/studio/backend/core/inference/llama_server_args.py) (AGPL-3.0). BooCode's TypeScript port (`apps/server/src/services/inference/llama-args-validator.ts`) is the sibling; update both when upstream changes.

## Build

```bash
# Linux (development)
make build

# Windows AMD64 (production target, cross-compiled from Linux)
make build-windows

# Copy to sam-desktop
# scp bin/llama-sidecar.exe sam-desktop:C:\llama-sidecar\
```

## Configuration

All via environment variables (no CLI flags):

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `LLAMA_SERVER_BIN` | yes | (none) | Path to llama-server.exe |
| `MODEL_DIR_MAP_FILE` | yes | (none) | JSON file mapping model IDs to GGUF paths |
| `LLAMA_SIDECAR_BIND` | no | `127.0.0.1:8402` | Listen address |
| `PORT_RANGE` | no | `8500-8599` | Port range for sidecar processes |
| `MAX_SIDECARS` | no | `2` | Max concurrent sidecar processes |
| `LOG_LEVEL` | no | `info` | Log level (debug, info, warn, error) |
| `BASE_ARGS` | no | `["-ngl","999","-c","32768","--flash-attn","on","--no-mmap"]` | JSON array of base llama-server args |
| `HEALTH_TIMEOUT_SECONDS` | no | `60` | Max wait for sidecar health check |
| `HEALTH_INTERVAL_SECONDS` | no | `30` | Background health check interval |

## API

### `GET /health`

Returns daemon status.

### `GET /sidecars`

Returns list of active sidecar processes.

### `DELETE /sidecars/{hash}`

Kill and remove a sidecar process.

### `POST /v1/chat/completions`

OpenAI-compatible proxy. Routes to a sidecar process based on model + flags.

Headers:

- `X-Agent-Flags: --top-k 20 --cache-type-k q8_0` (optional)
- `X-Model-Id: qwen3.6-35b-a3b-mxfp4` (optional, overrides body.model)

## Test

```bash
make test              # unit tests
make test-integration  # requires real llama-server + GGUF
make lint              # vet + gofmt
```

## NSSM Service

Pre-configured on sam-desktop as `llama-sidecar`. Start/stop via:

```
C:\Tools\nssm\nssm.exe start llama-sidecar
C:\Tools\nssm\nssm.exe stop llama-sidecar
C:\Tools\nssm\nssm.exe status llama-sidecar
```
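
## Example client request

As a minimal sketch of how a client might call the `POST /v1/chat/completions` proxy with the optional `X-Agent-Flags` header: the helper `newChatRequest`, the prompt text, and the base URL (taken from the default `LLAMA_SIDECAR_BIND`) are illustrative assumptions and not part of llama-sidecar. The body follows the standard OpenAI chat completions request shape. The program only builds and prints the request, so it runs without a live daemon.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest mirror the OpenAI chat completions request body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest is a hypothetical helper: it builds a request for the
// sidecar proxy and attaches X-Agent-Flags only when flags are supplied.
func newChatRequest(baseURL, modelID, agentFlags, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    modelID,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if agentFlags != "" {
		req.Header.Set("X-Agent-Flags", agentFlags)
	}
	return req, nil
}

func main() {
	// Assumes the daemon listens on the default LLAMA_SIDECAR_BIND address.
	req, err := newChatRequest(
		"http://127.0.0.1:8402",
		"qwen3.6-35b-a3b-mxfp4",
		"--top-k 20 --cache-type-k q8_0",
		"Hello",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println("X-Agent-Flags:", req.Header.Get("X-Agent-Flags"))
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```

Here the model is selected through `body.model`; setting `X-Model-Id` on the same request would override it, as described in the header list.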