llama-sidecar/eval/run_all.sh at fe7f36ae98c6c04c1bdf761146f41a58e2391d4b - llama-sidecar - Gitgaard

indifferentketchup/llama-sidecar

indifferentketchup fe7f36ae98 llama-sidecar v0.1.0: daemon + benchmarks + eval suite

Go daemon (cmd/llama-sidecar): per-agent llama-server process pool with
LRU eviction, OpenAI-compatible proxy, flag validation (Unsloth port),
deterministic hash-keyed sidecar reuse. Windows service support via
schtasks/NSSM with DETACHED_PROCESS, stdout pipe drain, and request-ctx
decoupled child lifetime.

Bug fixes (3b.1–3b.5): -c flag drop from StripShadowingFlags, UTF-8 BOM
in JSON config, -fa → --flash-attn on default, child process exit after
one request (stdin devnull, stdout pipe, CREATE_NO_WINDOW → DETACHED,
context.Background for child lifetime, background reaper goroutine).

bench/: MTP on/off throughput sweep across 8 GGUFs via SSH+schtasks
automation to sam-desktop. Per-GGUF production flags from llama-swap
config with --ctx-size 32768 override.

eval/: accuracy benchmarks (MMLU 100q, GSM8K 50q, HumanEval 164) +
A/B model comparison (14 agent-typed prompts × 8 models). All scripts
resumable at individual question level.

94 Go tests, race detector clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-28 01:55:13 +00:00

21 lines

333 B

Bash

Executable File

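#!/usr/bin/env bash
# Run the full eval sweep, then generate the summary report.
set -euo pipefail

EVAL_DIR="$(cd "$(dirname "$0")" && pwd)"
# Note: despite its name, VENV holds the path to the venv's Python interpreter.
VENV="${EVAL_DIR}/.venv/bin/python3"
cd "$EVAL_DIR"

echo "Starting eval sweep at $(date)"
echo "Using venv: ${VENV}"
echo ""
# pipefail makes a run_all.py failure abort here, even though tee succeeds.
"$VENV" run_all.py 2>&1 | tee eval.log
echo ""
echo "Generating summary..."
"$VENV" analyze.py
echo ""
echo "Done at $(date)"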
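
The script assumes the virtualenv interpreter already exists; if it does not, the sweep fails with a bare "No such file or directory". A minimal sketch of a preflight check follows. The helper name `require_interpreter` and the hint text are hypothetical and not part of the original script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (an assumption, not in the original run_all.sh):
# succeed only if the given path is a regular, executable file;
# otherwise print an error and a hint to stderr and return non-zero.
require_interpreter() {
  local interp="$1"
  if [[ ! -f "$interp" || ! -x "$interp" ]]; then
    echo "error: interpreter not found or not executable: ${interp}" >&2
    echo "hint: create the venv first, e.g. python3 -m venv .venv" >&2
    return 1
  fi
}
```

One way to use it would be to call `require_interpreter "$VENV"` right after `VENV` is set; under `set -e` a failed check then stops the script before any eval output is written to eval.log.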