boocode/openspec/changes/llama-cache-and-spec/tasks.md
indifferentketchup c935687725 chore(openspec): drop 9 superseded proposals + 11 stub archive files
Drop 9 batch proposals that are superseded by the boocode-lift-analysis
(boocontext-audit, conductor upgrades, self-healing/verify-gate skills):
add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform,
conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul,
agent-reliability.

Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only)
that provide zero documentation value over the existing CHANGELOG.md + git tags.
2026-06-07 22:15:38 +00:00

# llama-cache-and-spec — tasks
## Files to change
Three files across two repos:
- `/opt/forks/llama-sidecar/internal/config/config.go`
- `/opt/boocode/apps/server/src/services/inference/llama-args-validator.ts`
- `/opt/forks/llama-sidecar/internal/validator/validator.go`
## Tasks
- [x] 1. Update sidecar default base args
  `/opt/forks/llama-sidecar/internal/config/config.go` edited.
  `defaultBaseArgs()` now includes:
  - `--cache-type-k q4_0`: quantizes the K cache to q4_0, about 3.5× smaller than f16 for the K half (the V cache keeps its default type unless `--cache-type-v` is also set)
  - `--cache-reuse 256`: reuses the KV cache across turns (prompt caching)
  - `--slot-save-path /tmp/llama-slots`: disk-persistent KV cache
  - `--cache-idle-slots`: auto-saves idle slots to disk
  - `--spec-type ngram-mod --spec-ngram-mod-thsh 2`: n-gram speculative decoding, expected up to ~2× tok/s
  - `--ctx-checkpoints 32`: context overflow protection
  - `--sleep-idle-seconds 600`: reclaims GPU memory when idle
  - `--metrics`: exposes the Prometheus `/metrics` endpoint

  Build verified: `go build ./...` exits 0.
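As a rough illustration of task 1, the flag set could be expressed as a plain string slice. This is a minimal sketch: the flags and values come from the list above, but the function body, flag order, and the `main` wrapper are assumptions, not the actual contents of `config.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultBaseArgs is a hypothetical reconstruction of the function named in
// task 1. The real implementation in config.go may order or group the flags
// differently, or include other defaults not listed in this task file.
func defaultBaseArgs() []string {
	return []string{
		"--cache-type-k", "q4_0", // quantize the K cache
		"--cache-reuse", "256", // reuse KV cache across turns
		"--slot-save-path", "/tmp/llama-slots", // where slots are persisted
		"--cache-idle-slots", // save idle slots to disk
		"--spec-type", "ngram-mod", // speculative decoding mode
		"--spec-ngram-mod-thsh", "2",
		"--ctx-checkpoints", "32",
		"--sleep-idle-seconds", "600", // release GPU memory after 10 minutes idle
		"--metrics", // enable the /metrics endpoint
	}
}

func main() {
	// Print the args as they would appear on a command line.
	fmt.Println(strings.Join(defaultBaseArgs(), " "))
}
```

Keeping each flag and its value as separate slice elements avoids shell-quoting issues when the slice is passed straight to `exec.Command`.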
- [x] 2. No change needed in `llama-args-validator.ts`: shadow lists are correct
  The shadow lists in `llama-args-validator.ts` already prevent agents
  from overriding cache/spec/template flags. Adding the flags to
  `defaultBaseArgs` while keeping the shadow lists is the correct architecture:
  the flags are enabled by default, and agents can't override them.
- [x] 3. No change needed in `validator.go`: same reasoning as task 2
  The shadow lists in the sidecar's `validator.go` serve the same purpose,
  so both code paths stay consistent.
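The shadow-list idea in tasks 2 and 3 can be sketched as a set lookup over agent-supplied arguments. Everything here is illustrative: `shadowed`, `validateAgentArgs`, and the list contents are hypothetical names and values, and the real validators may also cover short aliases or template flags.

```go
package main

import (
	"fmt"
	"strings"
)

// shadowed is a hypothetical shadow list: flags owned by the sidecar
// defaults that agents must not pass themselves.
var shadowed = map[string]bool{
	"--cache-type-k":   true,
	"--cache-reuse":    true,
	"--slot-save-path": true,
	"--spec-type":      true,
	"--metrics":        true,
}

// validateAgentArgs returns an error naming the first shadowed flag found
// in args. It handles both "--flag value" and "--flag=value" forms.
func validateAgentArgs(args []string) error {
	for _, a := range args {
		if !strings.HasPrefix(a, "--") {
			continue // a value, not a flag
		}
		name, _, _ := strings.Cut(a, "=")
		if shadowed[name] {
			return fmt.Errorf("flag %s is managed by the sidecar and cannot be overridden", name)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateAgentArgs([]string{"--temp", "0.7"}))
	fmt.Println(validateAgentArgs([]string{"--cache-type-k=f16"}))
}
```

Rejecting the request outright, rather than silently dropping the flag, keeps the agent's view of its configuration honest. Whether the actual validators reject or strip is not stated in this task file.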
- [ ] 4. Deploy + verify
  - Rebuild sidecar binary: `go build -o ... ./...` (✅ done)
  - Restart docker compose (needs manual deploy)
  - Verify `/metrics` endpoint returns data
  - Verify `nvidia-smi` shows reduced VRAM (expected: smaller KV cache, with the K half about 3.5× smaller than f16)
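To know what to look for in `nvidia-smi`, it helps to estimate the cache size before and after. The sketch below assumes a hypothetical model shape (32 layers, 8 KV heads, head dimension 128, 32768-token context); substitute the deployed model's values. It relies on two facts: f16 uses 2 bytes per element, and q4_0 packs 32 elements into an 18-byte block.

```go
package main

import "fmt"

const (
	f16BytesPerElem = 2.0
	q40BytesPerElem = 18.0 / 32.0 // one 18-byte block holds 32 values
)

// cacheBytes returns the size of one half (K or V) of the KV cache.
func cacheBytes(layers, ctx, kvHeads, headDim int, bytesPerElem float64) float64 {
	return float64(layers) * float64(ctx) * float64(kvHeads) * float64(headDim) * bytesPerElem
}

func main() {
	// Hypothetical model shape, for illustration only.
	const layers, ctx, kvHeads, headDim = 32, 32768, 8, 128
	const mib = 1024.0 * 1024.0

	kF16 := cacheBytes(layers, ctx, kvHeads, headDim, f16BytesPerElem)
	kQ40 := cacheBytes(layers, ctx, kvHeads, headDim, q40BytesPerElem)
	vF16 := cacheBytes(layers, ctx, kvHeads, headDim, f16BytesPerElem)

	fmt.Printf("K cache f16:  %.0f MiB\n", kF16/mib)
	fmt.Printf("K cache q4_0: %.0f MiB\n", kQ40/mib)
	fmt.Printf("K ratio:      %.2fx\n", kF16/kQ40)
	fmt.Printf("K+V total:    %.0f MiB -> %.0f MiB\n", (kF16+vF16)/mib, (kQ40+vF16)/mib)
}
```

For this shape the K half drops from 2048 MiB to 576 MiB (3.56×), while the combined K+V cache goes from 4096 MiB to 2624 MiB because the V half is still f16. `nvidia-smi` also counts model weights and compute buffers, so the visible drop in total VRAM will be a smaller fraction than the K-cache ratio alone.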