Go daemon (cmd/llama-sidecar): per-agent llama-server process pool with LRU eviction, OpenAI-compatible proxy, flag validation (Unsloth port), deterministic hash-keyed sidecar reuse.

Windows service support via schtasks/NSSM with DETACHED_PROCESS, stdout pipe drain, and request-ctx decoupled child lifetime.

Bug fixes (3b.1–3b.5): -c flag drop from StripShadowingFlags, UTF-8 BOM in JSON config, -fa → --flash-attn on default, child process exit after one request (stdin devnull, stdout pipe, CREATE_NO_WINDOW → DETACHED, context.Background for child lifetime, background reaper goroutine).

bench/: MTP on/off throughput sweep across 8 GGUFs via SSH+schtasks automation to sam-desktop. Per-GGUF production flags from llama-swap config with --ctx-size 32768 override.

eval/: accuracy benchmarks (MMLU 100q, GSM8K 50q, HumanEval 164) + A/B model comparison (14 agent-typed prompts × 8 models). All scripts resumable at individual question level.

94 Go tests, race detector clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
21 lines
333 B
Bash
Executable File
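#!/usr/bin/env bash
# Run the full eval sweep, then generate the summary.
set -euo pipefail

# Resolve this script's directory so it can be invoked from anywhere.
EVAL_DIR="$(cd "$(dirname "$0")" && pwd)"
# Python interpreter inside the project's virtualenv.
VENV="${EVAL_DIR}/.venv/bin/python3"

cd "$EVAL_DIR"

echo "Starting eval sweep at $(date)"
echo "Using venv: ${VENV}"
echo ""

# Stream output to the terminal and to eval.log; with pipefail, a failing
# run_all.py aborts the script even though tee itself succeeds.
"$VENV" run_all.py 2>&1 | tee eval.log

echo ""
echo "Generating summary..."
"$VENV" analyze.py

echo ""
echo "Done at $(date)"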