• v0.1.0 fe7f36ae98

    llama-sidecar v0.1.0: daemon + benchmarks + eval suite

    indifferentketchup released this 2026-05-28 01:55:13 +00:00

    Go daemon (cmd/llama-sidecar): per-agent llama-server process pool with
    LRU eviction, OpenAI-compatible proxy, flag validation (Unsloth port),
    and deterministic hash-keyed sidecar reuse. Windows service support via
    schtasks/NSSM with DETACHED_PROCESS, stdout pipe drain, and child process
    lifetime decoupled from the request context.

    Bug fixes (3b.1–3b.5): -c flag dropped by StripShadowingFlags; UTF-8 BOM
    in JSON config; -fa → --flash-attn on by default; child process exiting
    after one request (fixed with stdin → devnull, stdout pipe,
    CREATE_NO_WINDOW → DETACHED_PROCESS, context.Background for child
    lifetime, and a background reaper goroutine).

    bench/: MTP on/off throughput sweep across 8 GGUFs, automated on
    sam-desktop via SSH + schtasks. Each GGUF runs with its production flags
    from the llama-swap config, with --ctx-size overridden to 32768.

    eval/: accuracy benchmarks (MMLU 100q, GSM8K 50q, HumanEval 164q) plus
    an A/B model comparison (14 agent-typed prompts × 8 models). All scripts
    are resumable at the individual-question level.

    94 Go tests, race detector clean.

    Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com
