llama-sidecar v0.1.0: daemon + benchmarks + eval suite

Go daemon (cmd/llama-sidecar): per-agent llama-server process pool with LRU eviction, OpenAI-compatible proxy, flag validation (Unsloth port), deterministic hash-keyed sidecar reuse. Windows service support via schtasks/NSSM with DETACHED_PROCESS, stdout pipe drain, and request-ctx decoupled child lifetime.

Bug fixes (3b.1–3b.5): -c flag drop from StripShadowingFlags, UTF-8 BOM in JSON config, -fa → --flash-attn on default, child process exit after one request (stdin devnull, stdout pipe, CREATE_NO_WINDOW → DETACHED, context.Background for child lifetime, background reaper goroutine).

bench/: MTP on/off throughput sweep across 8 GGUFs via SSH+schtasks automation to sam-desktop. Per-GGUF production flags from llama-swap config with --ctx-size 32768 override.

eval/: accuracy benchmarks (MMLU 100q, GSM8K 50q, HumanEval 164) + A/B model comparison (14 agent-typed prompts × 8 models).

All scripts resumable at individual question level. 94 Go tests, race detector clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-28 01:55:13 +00:00
parent babbb4f39b
commit fe7f36ae98
39 changed files with 4228 additions and 0 deletions
--- a/internal/pool/ports.go
+++ b/internal/pool/ports.go
@@ -0,0 +1,28 @@
+package pool
+
+import "fmt"
+
+type PortAllocator struct {
+	ports chan int
+}
+
+func NewPortAllocator(lo, hi int) *PortAllocator {
+	ch := make(chan int, hi-lo+1)
+	for p := lo; p <= hi; p++ {
+		ch <- p
+	}
+	return &PortAllocator{ports: ch}
+}
+
+func (pa *PortAllocator) Allocate() (int, error) {
+	select {
+	case p := <-pa.ports:
+		return p, nil
+	default:
+		return 0, fmt.Errorf("port allocator exhausted")
+	}
+}
+
+func (pa *PortAllocator) Release(port int) {
+	pa.ports <- port
+}
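
Note on the allocator above: Allocate never blocks (it returns an error once the pool is empty), but Release sends on a buffered channel and would block if a port were returned more often than it was handed out. The sketch below is only an illustration of that behavior, not part of the commit. It copies the type from the diff and adds TryRelease, a hypothetical non-blocking variant; the port range 9000-9001 is likewise an assumption chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// PortAllocator mirrors the type added in internal/pool/ports.go:
// a buffered channel pre-filled with every port in [lo, hi].
type PortAllocator struct {
	ports chan int
}

func NewPortAllocator(lo, hi int) *PortAllocator {
	ch := make(chan int, hi-lo+1)
	for p := lo; p <= hi; p++ {
		ch <- p
	}
	return &PortAllocator{ports: ch}
}

// Allocate hands out the next free port without blocking.
func (pa *PortAllocator) Allocate() (int, error) {
	select {
	case p := <-pa.ports:
		return p, nil
	default:
		return 0, fmt.Errorf("port allocator exhausted")
	}
}

// TryRelease is a hypothetical helper (not in the commit). It returns the
// port to the pool, but reports an error instead of blocking when the
// channel is already full, which would indicate a double release.
func (pa *PortAllocator) TryRelease(port int) error {
	select {
	case pa.ports <- port:
		return nil
	default:
		return errors.New("port pool already full")
	}
}

func main() {
	pa := NewPortAllocator(9000, 9001) // assumed example range

	a, _ := pa.Allocate()
	b, _ := pa.Allocate()
	_, err := pa.Allocate() // pool is empty at this point
	fmt.Println(a, b, err)

	fmt.Println(pa.TryRelease(a)) // returns the port to the pool
	fmt.Println(pa.TryRelease(b))
	fmt.Println(pa.TryRelease(a)) // double release is reported, not blocked on
}
```

Because the channel is FIFO, ports come back out in the order they were released, so a recently freed port is reused last. Whether a double release should be an error or silently ignored is a design choice left to the caller of the pool.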