llama-cache-and-spec — tasks
Files to change
Three files across two repos (only config.go needed edits; the two validators were reviewed and left unchanged):
- /opt/forks/llama-sidecar/internal/config/config.go
- /opt/boocode/apps/server/src/services/inference/llama-args-validator.ts
- /opt/forks/llama-sidecar/internal/validator/validator.go
Tasks
1. Update sidecar default base args
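Edited /opt/forks/llama-sidecar/internal/config/config.go. defaultBaseArgs() now includes:
- --cache-type-k q4_0: quantizes the K half of the KV cache. q4_0 uses ~4.5 bits per value vs 16 for the default f16, so the K cache shrinks ~3.5×; the V cache keeps its default type, so total KV cache savings are smaller than that.
- --cache-reuse 256: reuse cached KV across turns (minimum reusable chunk of 256 tokens) → prompt caching
- --slot-save-path /tmp/llama-slots: disk-persistent KV cache
- --cache-idle-slots: auto-save idle slots to disk
- --spec-type ngram-mod --spec-ngram-mod-thsh 2: n-gram speculative decoding (expected ~2× tok/s, workload-dependent)
- --ctx-checkpoints 32: context overflow protection
- --sleep-idle-seconds 600: reclaim GPU memory after 10 minutes idle
- --metrics: Prometheus /metrics endpoint

Build verified: go build ./... exits 0.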
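As a rough check on the expected VRAM savings, here is a Go sketch of the usual KV cache size estimate: layers × KV heads × head dim × context length elements, once for K and once for V. The model shape is hypothetical, the q4_0 size assumes llama.cpp's 18-byte block per 32 values (fp16 scale plus 16 bytes of 4-bit values), and padding and per-slot overhead are ignored.

```go
package main

import "fmt"

// ModelShape holds hypothetical model dimensions; substitute the deployed model's values.
type ModelShape struct {
	Layers  int // transformer layers
	KVHeads int // key/value heads (fewer than query heads under GQA)
	HeadDim int // dimension per head
	CtxLen  int // context length (n_ctx)
}

// bytesPerElem gives storage cost per cached value.
// q4_0 packs 32 values into an 18-byte block: 18/32 = 0.5625 bytes each.
var bytesPerElem = map[string]float64{
	"f16":  2.0,
	"q4_0": 18.0 / 32.0,
}

// kvCacheBytes estimates K and V cache sizes in bytes for the given cache types.
func kvCacheBytes(m ModelShape, kType, vType string) (k, v float64, err error) {
	kb, ok := bytesPerElem[kType]
	if !ok {
		return 0, 0, fmt.Errorf("unknown K cache type %q", kType)
	}
	vb, ok := bytesPerElem[vType]
	if !ok {
		return 0, 0, fmt.Errorf("unknown V cache type %q", vType)
	}
	// One K (and one V) vector of KVHeads*HeadDim values per layer per token.
	elems := float64(m.Layers) * float64(m.KVHeads) * float64(m.HeadDim) * float64(m.CtxLen)
	return elems * kb, elems * vb, nil
}
```

With the hypothetical shape {32 layers, 8 KV heads, head dim 128, 32768 context}, f16 K and V are 2 GiB each. Quantizing only K to q4_0 brings K to 0.5625 GiB, so the total drops from 4 GiB to ~2.56 GiB (~1.56×). Cutting the whole cache by ~3.5× would also require quantizing V (--cache-type-v).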
2. No change needed — shadow lists are correct
The shadow lists in llama-args-validator.ts already prevent agents from overriding cache/spec/template flags. Adding the flags to defaultBaseArgs while keeping the shadow lists is the correct architecture: the flags are enabled by default, and agents can't override them.
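The TypeScript validator itself isn't reproduced here. To illustrate the shadow-list pattern (in Go, like the sidecar), the sketch below rejects agent-supplied args that name a flag owned by defaultBaseArgs. shadowedFlags and validateAgentArgs are hypothetical names, and the list contents are an assumption based on task 1.

```go
package main

import (
	"fmt"
	"strings"
)

// shadowedFlags is a hypothetical shadow list: flags owned by defaultBaseArgs
// that agents may not set. The authoritative lists live in
// llama-args-validator.ts and internal/validator/validator.go.
var shadowedFlags = map[string]bool{
	"--cache-type-k":        true,
	"-ctk":                  true, // llama.cpp short alias; aliases need shadowing too
	"--cache-reuse":         true,
	"--slot-save-path":      true,
	"--spec-type":           true,
	"--spec-ngram-mod-thsh": true,
	"--chat-template":       true,
}

// flagName strips an inline value from "--flag=value" so that form is rejected too.
func flagName(arg string) string {
	if i := strings.IndexByte(arg, '='); i >= 0 {
		return arg[:i]
	}
	return arg
}

// validateAgentArgs returns an error if any agent-supplied arg names a shadowed flag.
// Tokens that don't start with "-" are treated as values and skipped.
func validateAgentArgs(args []string) error {
	for _, a := range args {
		if !strings.HasPrefix(a, "-") {
			continue
		}
		if name := flagName(a); shadowedFlags[name] {
			return fmt.Errorf("%s is set by defaultBaseArgs and cannot be overridden", name)
		}
	}
	return nil
}
```

Rejecting, rather than silently dropping, makes an override attempt visible to the caller. Whether the real validators reject or strip such flags is not stated here.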
3. No change needed — same reasoning as task 2
The sidecar validator.go shadow lists serve the same purpose. Both code paths are consistent.
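One way to keep the two code paths from drifting is a check that every flag in defaultBaseArgs is either shadowed or explicitly exempt. This is a hypothetical sketch: defaultBaseArgs here is a copy of task 1's list, and unshadowed and the exempt set are illustrative names, not existing code.

```go
package main

import (
	"sort"
	"strings"
)

// defaultBaseArgs mirrors the flag list from task 1 (hypothetical copy; the
// real function lives in internal/config/config.go).
func defaultBaseArgs() []string {
	return []string{
		"--cache-type-k", "q4_0",
		"--cache-reuse", "256",
		"--slot-save-path", "/tmp/llama-slots",
		"--cache-idle-slots",
		"--spec-type", "ngram-mod",
		"--spec-ngram-mod-thsh", "2",
		"--ctx-checkpoints", "32",
		"--sleep-idle-seconds", "600",
		"--metrics",
	}
}

// unshadowed returns, sorted, the "--" flags in defaults that are neither in
// shadow nor in exempt (flags deliberately left overridable).
func unshadowed(defaults []string, shadow, exempt map[string]bool) []string {
	var missing []string
	for _, a := range defaults {
		if !strings.HasPrefix(a, "--") {
			continue // skip values such as "q4_0" or "256"
		}
		if !shadow[a] && !exempt[a] {
			missing = append(missing, a)
		}
	}
	sort.Strings(missing)
	return missing
}
```

Running the same check against both the TypeScript and Go lists (for example from a shared fixture) would turn "both code paths are consistent" into something CI enforces.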
4. Deploy + verify
- Rebuild sidecar binary: go build -o ... ./... → ✅ done
- Restart docker compose: needs manual deploy
- Verify the /metrics endpoint returns data
- Verify nvidia-smi shows reduced VRAM (expected: a ~3.5× smaller K cache; with V left at f16, total KV cache savings are closer to ~1.5× when K and V are the same size)