chore: snapshot main sync

2026-06-17 20:08:31 +00:00
parent b18de2a331
commit 8bd32537cf
354 changed files with 10208 additions and 9230 deletions
--- a/data/skills/booskills/boo-router/SKILL.md
+++ b/data/skills/booskills/boo-router/SKILL.md
@@ -0,0 +1,91 @@
+---
+name: boo-router
+description: >
+  Resolves the single best provider string for a Paseo dispatch from the active
+  orchestration preset's candidate pool, using the deterministic model-router
+  script (grade and role fit, effective cost, quota, locality, plus the
+  never-subagent guardrail). Use when a role maps to an array of candidates and
+  you must pick ONE provider before create_agent, when the operator says "route
+  this", "which model for <role>", "pick the model", or just before fanning out
+  subagents. Do NOT use to dispatch a skill to a subagent; that is paseo-boo. Do
+  NOT use to decompose a goal into a skill pipeline; that is boo-meta.
+metadata:
+  version: "1.1"
+---
+
+# Boo-Router
+
+Resolves one provider for one dispatch. This skill is a thin protocol around the deterministic router script; it holds no model knowledge of its own. The registry (`~/.paseo/model-tiers.json`) and the active preset (`~/.paseo/orchestration-preferences.json`) are the only sources of truth; the script reads both.
+
+## Size
+
+Not sized. One deterministic call per resolution, no fan-out, no agents dispatched.
+
+## Process
+
+1. Gather the request. Required: `role` (one of `impl`, `ui`, `audit`, `research`, `planning`) and a short `task` description. Optional: `difficulty` (`simple`, `standard`, `hard`), `priority` (`cost-efficiency`, `speed`, `quality`, `balanced` - default `balanced`), `context-tokens` (approx input size), `requires` (comma-separated hard modality needs, e.g. `vision,computer-use`), `fanout` (parallel agent count), `resident-local` (the local model currently loaded in llama-swap).
+   - Invocation shorthand: `boo-router <preset> <priority>` (e.g. `boo-router workhorse cost-efficiency`) means `--preset ~/.paseo/presets/<preset>.json --priority <priority>`; role and task still come from the operator's request.
+   - Priority profiles tune the deterministic scorer (they nudge, they do not override role fit): `cost-efficiency` weights effective cost + quota heavily and leans reasoning lower; `speed` rewards the per-model speed signal (TTFT-oriented) and leans reasoning lower; `quality` rewards higher grade and leans reasoning higher; `balanced` is neutral. Legacy `--budget` (cost_sensitive/balanced/quality) still maps onto these.
+2. Run the router (deterministic, no LLM):
+   ```
+   node ~/.agents/skills/boo-router/scripts/router.mjs --role <role> --task "<task>" \
+     [--priority <p>] [--difficulty <d>] [--context-tokens <n>] [--requires <list>] \
+     [--fanout <n>] [--resident-local <id>] [--reserve <id>] [--no-ledger] [--preset <path>] --json
+   ```
+   It defaults to the active preset and registry; pass `--preset`/`--model-tiers` only to override.
+   - Load awareness: the router reconciles a shared cross-process ledger (`~/.paseo/router-load.jsonl`) so concurrent fan-out dispatches spread across providers instead of all picking the same top score. Pass `--reserve <id>` on a real dispatch to record the pick as in-flight (the dispatcher, paseo-boo, then calls `--release <id>` at closure); omit it for a preview. `--no-ledger` routes statelessly. The penalties are soft: in-flight crowding vs a per-source `concurrency_soft` cap, remaining 5h quota, and host saturation for local models. They nudge, they never eliminate a candidate.
+3. Read `result.provider` from the JSON. Pass EXACTLY that string to `create_agent`'s `provider` field.
+4. Apply `result.reasoning` `{ effort, apply }` to the dispatched model. OpenCode uses one unified option, `reasoningEffort` (verified in the opencode binary: it emits `reasoning_effort` for OpenAI/DeepSeek/MiniMax and maps to an Anthropic thinking budget):
+   - `effort` is a concrete value (`high`, `max`, `medium`, `none`, etc.) -> set `options.reasoningEffort = <effort>` on the model (OpenCode per-model config options, or the model option at create_agent).
+   - `effort: "auto"` -> set nothing; leave the model default. OpenCode rejects `reasoningEffort` on non-reasoning models, so never force it.
+   - DeepSeek `effort: "max"` needs a large context window (>=384K) and a generous output cap, and thinking mode ignores `temperature`.
+   - Via Paseo, the cleanest path is `create_agent settings.thinkingOptionId = <effort>` (Paseo maps it per backend); the `options.reasoningEffort` form is the standalone path.
+5. Apply `result.permissions` `{ backend, mode, settings }`. The default `mode` is `bypass` (fully unattended, per operator policy). Pass `settings` to `create_agent`: opencode -> `{ modeId: "build", features: { auto_accept: true } }`; claude/claude-ib -> `{ modeId: "bypassPermissions" }`; codex -> `{ modeId: "full-access" }`; reasonix -> `{ modeId: "yolo" }`. For the standalone CLI path use `cliBypass` (e.g. claude `--dangerously-skip-permissions`, codex `--dangerously-bypass-approvals-and-sandbox`). To downgrade from yolo, read the backend's `safe`/`readonly` entry from the registry `permissions` block instead.
+6. Keep `result.fallbacks` (the remaining survivors, in score order) for failover. If the dispatched model fails, retry the next provider in the chain ONLY for a transient error (registry `fallback.transient`: 408/409/425/429/5xx, RateLimit/Timeout/Connection/Overloaded/ContextOverflow); fail fast on permanent errors (registry `fallback.permanent`: 400/401/403/404/422, auth/validation/not-found). Re-resolve reasoning + permissions for the fallback model (its backend and supported levels differ). The router has already clamped `effort` to the model's supported levels and stepped it down under context pressure, so use the value as given.
+7. Relay `result.rationale` as the why. For the full per-candidate trace plus the reasoning and permission notes, re-run with `--explain` instead of `--json`.
+8. If the script cannot run (no `node`, file missing), use the manual fallback: read the active preset and registry yourself and apply the same order: eliminate `neverSubagent` and non-`routable` candidates, then any whose `modalities` miss a `requires` need or whose `ctx_max` is below `context-tokens`; rank survivors by `attributes.roles[role]`, then effective cost (output-weighted, `_over_256k` band when context crosses 256K), then quota, then locality; default to the first array element when nothing distinguishes them. For reasoning, read `reasoning[<model>].by_difficulty[difficulty]` (or `.default`) from the registry and set `options.reasoningEffort` the same way as step 4; skip it when the entry is `{ "effort": "auto" }`.
+
+## What NOT to do
+
+- Do not pass an array, the `scores` list, or any object to `create_agent`; pass the single `provider` string only.
+- Do not route a subscription-high model (`gpt-5`, `gpt-5.5`, `opus`, `fable`) as a subagent; the router eliminates them by the `neverSubagent` guardrail, never re-add one by hand.
+- Do not invent or remember provider strings; the active preset and registry are the only sources.
+- Do not call an LLM to judge task fit; routing is deterministic by design.
+- Do not dispatch the agent yourself (that is paseo-boo) or decompose a multi-skill goal (that is boo-meta).
+
+## Gotchas
+
+- The active preset is `~/.paseo/orchestration-preferences.json` (the file `paseo-preset` copies a named preset onto), not a file under `presets/`. Switch grade pools with `paseo-preset grade-L|grade-C|grade-B|grade-A|grade-S`.
+- Provider strings carry the provider prefix: `opencode/opencode-go/<model>` (cloud gateway), `opencode/deepseek/<model>` (DeepSeek direct API, used for `deepseek-v4-flash` and `deepseek-v4-pro`), `opencode/==qwen==/<model>` (local, llama-swap), `claude/<model>`, `codex/<model>`. The `agents` map values omit the `opencode/` prefix; the router handles both. The router keys attributes/pricing/quota by the last path segment, so the provider namespace can change without touching the registry.
+- Local models are served through the OpenCode provider's `==qwen==` namespace; only one is resident in llama-swap at a time, so pass `--resident-local <id>` to earn the no-swap bonus and avoid thrashing.
+- A role may be a pinned string (not an array) in the preset; the router returns it as-is with no scoring. That is expected, not a failure.
+- The MiniMax M3 promo is priced via the registry `effective_*` fields, not router code; if a pick looks too M3-favorable, check whether the promo ended and the fields were removed.
+- Provider priority: the registry `provider_priority` block adds a per-SOURCE bonus so equivalent models route to the preferred provider. Current order (2026-06): digitalocean (free GitHub Student credits, spend first) > reasonix > openrouter > opencode-zen (free) > local (sam-desktop) > local-edge > opencode-go (DEPRIORITIZED, usage low, last-resort fallback) > subscription. Source is classified from the provider string. The `credits-first` preset is the cross-provider default that exploits this; switch with `paseo-preset credits-first`.
+- Cross-provider pools: a role pool may list the same logical model via several providers (e.g. deepseek-v4-pro via oc-digitalocean, reasonix, oc-openrouter, opencode-go). The router picks the highest-priority source and returns the rest as the `fallbacks` chain, so a dead provider fails over to the next.
+- New providers all extend opencode in Paseo, so `oc-digitalocean`/`oc-openrouter`/`oc-sam-desktop`/`oc-embedding` resolve to the opencode permission posture (build + auto_accept); reasonix stays yolo. DigitalOcean's flash id is literally `deepseek-4-flash` (missing the v).
+<!-- standing-rules:core:start -->
+- **No commit**: never commit, push, or stage changes; never `git add -A`. Prove any edits with `git diff --stat`.
+- **No em dashes**: never use em dashes (U+2014) in output or files you write.
+<!-- standing-rules:core:end -->
+
+## Output format
+
+```
+Routed: <role> -> <provider>
+reasoningEffort: <effort> (or "auto" = left at model default)
+Permissions: <backend> <mode> -> <settings to pass to create_agent> (default mode = bypass/yolo)
+Fallbacks: <provider2> -> <provider3> (ordered failover chain, transient errors only)
+Preset: <active preset name>
+Why: <rationale line from result.rationale>
+```
+
+Every report ends with:
+## Claims I did not verify
+- <anything taken on the script's word without re-running --explain>
+
+## Failure modes
+
+- Role has no entry in the active preset: the router errors `Preset has no provider entry for role`; report it and stop.
+- All candidates eliminated: the router errors with each candidate's disqualifying reason; relay them and stop. The usual cause is a pool with no model meeting a hard modality or context need; suggest a different preset.
+- `node` missing or script absent: use the manual fallback in Process step 5; say you used it.
+- Registry or preset unparseable: report the failing path and the parse error; never guess a provider.
--- a/data/skills/booskills/boo-router/scripts/router.mjs
+++ b/data/skills/booskills/boo-router/scripts/router.mjs
@@ -0,0 +1,547 @@
+#!/usr/bin/env node
+import fs from "node:fs";
+import os from "node:os";
+import path from "node:path";
+import { execFileSync } from "node:child_process";
+import * as ledger from "./load-ledger.mjs";
+
+const DEFAULT_MODEL_TIERS = "~/.paseo/model-tiers.json";
+const DEFAULT_PRESET = "~/.paseo/orchestration-preferences.json";
+const ROLES = new Set(["impl", "ui", "audit", "research", "planning"]);
+const DIFFICULTIES = new Set(["simple", "standard", "hard"]);
+const BUDGETS = new Set(["cost_sensitive", "balanced", "quality"]);
+const LOCAL_MODEL_MARKER = "==qwen==/";
+
+// Quality grade -> numeric. S>A>B>C; L (local) ranks with C on the quality axis.
+const GRADE_VALUE = { S: 4, A: 3, B: 2, C: 1, L: 1 };
+// Minimum grade value a task of each difficulty wants. Below floor = under-spec penalty.
+const DIFFICULTY_FLOOR = { simple: 1, standard: 2, hard: 3 };
+
+// Routing priority profiles. costWeight penalizes effective cost; speedWeight
+// rewards the per-model speed signal; qualityBonus rewards higher grade; effortBias
+// steps the recommended reasoningEffort up (+1) or down (-1) within the model's levels.
+const PRIORITIES = {
+  balanced: { costWeight: 6, speedWeight: 0, qualityBonus: 0, effortBias: 0 },
+  "cost-efficiency": { costWeight: 14, speedWeight: 0, qualityBonus: 0, effortBias: -1 },
+  speed: { costWeight: 4, speedWeight: 60, qualityBonus: 0, effortBias: -1 },
+  quality: { costWeight: 2, speedWeight: 0, qualityBonus: 15, effortBias: 1 },
+};
+// Back-compat: the legacy --budget values map onto priorities.
+const BUDGET_TO_PRIORITY = { cost_sensitive: "cost-efficiency", balanced: "balanced", quality: "quality" };
+
+function expandHome(filePath) {
+  if (!filePath) return filePath;
+  if (filePath === "~") return os.homedir();
+  if (filePath.startsWith("~/")) return path.join(os.homedir(), filePath.slice(2));
+  return filePath;
+}
+
+function readJson(filePath) {
+  return JSON.parse(fs.readFileSync(expandHome(filePath), "utf8"));
+}
+
+function parseArgs(argv) {
+  const args = {
+    budget: "balanced",
+    contextTokens: 0,
+    difficulty: "standard",
+    fanout: 1,
+    modelTiersPath: DEFAULT_MODEL_TIERS,
+    presetPath: DEFAULT_PRESET,
+    requires: [],
+    residentLocal: "",
+    task: "",
+  };
+
+  for (let index = 0; index < argv.length; index += 1) {
+    const arg = argv[index];
+    const next = () => argv[++index];
+
+    if (arg === "--dry-run-samples") args.dryRunSamples = true;
+    else if (arg === "--json") args.json = true;
+    else if (arg === "--explain") args.explain = true;
+    else if (arg === "--role") args.role = next();
+    else if (arg === "--task") args.task = next() || "";
+    else if (arg === "--difficulty") args.difficulty = next() || "standard";
+    else if (arg === "--budget") args.budget = next() || "balanced";
+    else if (arg === "--priority") args.priority = next();
+    else if (arg === "--context-tokens") args.contextTokens = Number(next() || 0);
+    else if (arg === "--fanout") args.fanout = Number(next() || 1);
+    else if (arg === "--requires") args.requires = String(next() || "").split(",").map((s) => s.trim()).filter(Boolean);
+    else if (arg === "--resident-local") args.residentLocal = next() || "";
+    else if (arg === "--preset") args.presetPath = next();
+    else if (arg === "--model-tiers") args.modelTiersPath = next();
+    else if (arg === "--reserve") args.reserve = next() || "";
+    else if (arg === "--release") args.release = next() || "";
+    else if (arg === "--tokens") args.tokens = Number(next() || 0);
+    else if (arg === "--no-ledger") args.noLedger = true;
+    else if (arg === "--load-snapshot") args.loadSnapshot = true;
+    else if (arg === "--live-status") args.liveStatus = next() || "";
+    else if (arg === "--help" || arg === "-h") args.help = true;
+    else throw new Error(`Unknown argument: ${arg}`);
+  }
+
+  return args;
+}
+
+function normalizeModelId(provider) {
+  return String(provider).replace(/^opencode\//, "");
+}
+
+function modelKey(provider) {
+  const parts = normalizeModelId(provider).split("/");
+  return parts[parts.length - 1];
+}
+
+function isLocalProvider(provider) {
+  return normalizeModelId(provider).startsWith(LOCAL_MODEL_MARKER);
+}
+
+// Models flagged neverSubagent in any tier object must never be routed (the
+// router only ever selects subagents). This is the guardrail the README promised.
+function neverSubagentSet(registry) {
+  const set = new Set();
+  for (const value of Object.values(registry)) {
+    if (value && typeof value === "object" && !Array.isArray(value) && value.neverSubagent && Array.isArray(value.models)) {
+      for (const id of value.models) set.add(modelKey(id));
+    }
+  }
+  return set;
+}
+
+function tierName(modelId, tiers) {
+  const normalized = normalizeModelId(modelId);
+  for (const [name, value] of Object.entries(tiers)) {
+    if (Array.isArray(value) && value.includes(normalized)) return name;
+    if (value && Array.isArray(value.models) && value.models.includes(normalized)) return name;
+  }
+  return "unknown";
+}
+
+// Pick the pricing band that matches the request context size. Qwen Plus models
+// carry an _over_256k band that roughly triples cost past 256K tokens.
+function pricingForContext(key, registry, contextTokens) {
+  const base = registry.pricing?.[key] || {};
+  if (contextTokens > 256000 && base._over_256k) return { ...base, ...base._over_256k, _band: "over_256k" };
+  return base;
+}
+
+function effectivePricing(pricing = {}) {
+  return {
+    input: Number(pricing.effective_input ?? pricing.input ?? 0),
+    output: Number(pricing.effective_output ?? pricing.output ?? 0),
+    cachedRead: Number(pricing.effective_cached_read ?? pricing.cached_read ?? 0),
+  };
+}
+
+// Output dominates real spend; weight it 3:1 over input.
+function blendedCost(pricing) {
+  const e = effectivePricing(pricing);
+  return e.output * 0.75 + e.input * 0.25;
+}
+
+// Hard filters. Returns null if the candidate survives, else a disqualifying reason.
+function disqualify(key, attrs, request, neverSub) {
+  if (neverSub.has(key)) return "neverSubagent guardrail (S-tier never routed as subagent)";
+  if (!attrs) return null; // unknown model: do not eliminate, it just scores low
+  if (attrs.routable === false) return `not routable (${attrs.routable_note || "availability/license hold"})`;
+  const mods = attrs.modalities || [];
+  for (const need of request.requires) {
+    if (!mods.includes(need)) return `missing required modality: ${need}`;
+  }
+  if (request.contextTokens > 0 && attrs.ctx_max && request.contextTokens > attrs.ctx_max) {
+    return `context ${request.contextTokens} exceeds ctx_max ${attrs.ctx_max}`;
+  }
+  return null;
+}
+
+// Soft quality and role-fit score.
+function fitScore(attrs, request) {
+  const reasons = [];
+  let score = 0;
+  const add = (points, reason) => { score += points; if (points) reasons.push(`${points > 0 ? "+" : ""}${points.toFixed(0)} ${reason}`); };
+
+  if (!attrs) {
+    add(50, "unknown model (neutral baseline)");
+    return { score, reasons, grade: "?" };
+  }
+
+  // Role affinity is the backbone.
+  const affinity = attrs.roles?.[request.role] ?? 0.5;
+  add(affinity * 100, `role ${request.role} affinity ${affinity}`);
+
+  // Trait overlap with the task/role/requires text.
+  const haystack = `${request.task} ${request.role} ${request.requires.join(" ")}`.toLowerCase();
+  const hits = (attrs.traits || []).filter((t) => haystack.includes(String(t).toLowerCase()));
+  if (hits.length) add(Math.min(hits.length, 4) * 8, `traits: ${hits.slice(0, 4).join(", ")}`);
+
+  // Difficulty headroom: penalize under-spec, do not reward overkill (economics handles that).
+  const gradeVal = GRADE_VALUE[attrs.grade] ?? 1;
+  const floor = DIFFICULTY_FLOOR[request.difficulty] ?? 2;
+  if (gradeVal < floor) add(-(floor - gradeVal) * 40, `under-spec for ${request.difficulty} (grade ${attrs.grade})`);
+
+  // Context sweet-spot: fits the ceiling but degrades past the sweet spot.
+  if (request.contextTokens > 0 && attrs.ctx_sweet_spot && request.contextTokens > attrs.ctx_sweet_spot) {
+    add(-30, `past ctx sweet-spot ${attrs.ctx_sweet_spot}`);
+  }
+
+  // Quality priority: reward higher grade.
+  const priority = PRIORITIES[request.priority] || PRIORITIES.balanced;
+  if (priority.qualityBonus) add(gradeVal * priority.qualityBonus, `quality priority (grade ${attrs.grade})`);
+
+  return { score, reasons, grade: attrs.grade };
+}
+
+// Classify a provider string into a cost/priority source for provider_priority.
+// Order matters: check the cloud gateways before the generic opencode markers.
+function sourceOf(provider) {
+  const s = String(provider);
+  if (s.includes("digitalocean")) return "digitalocean";
+  if (s.includes("openrouter")) return "openrouter";
+  if (s.startsWith("reasonix/")) return "reasonix";
+  if (s.includes("==edge-")) return "local-edge";
+  if (s.includes("==")) return "local";
+  if (s.includes("opencode-go/")) return "opencode-go";
+  if (s.startsWith("claude/") || s.startsWith("codex/")) return "subscription";
+  if (s.includes("opencode/opencode/") || s.endsWith("-free")) return "opencode-zen";
+  return "other";
+}
+
+// Economics tiebreak: provider priority, cost, quota, live load, locality, residency.
+function economics(provider, key, request, registry, presetIndex, loadCtx) {
+  const reasons = [];
+  let score = 0;
+  const add = (points, reason) => { score += points; if (points) reasons.push(`${points > 0 ? "+" : ""}${points.toFixed(1)} ${reason}`); };
+
+  const priority = PRIORITIES[request.priority] || PRIORITIES.balanced;
+
+  // Provider priority: route equivalent models to the preferred source (free DO
+  // credits first, then cheap cloud, then local; opencode-go deprioritized).
+  const src = sourceOf(provider);
+  const pp = registry.provider_priority?.[src];
+  if (typeof pp === "number" && pp) add(pp, `provider ${src}`);
+
+  const pricing = pricingForContext(key, registry, request.contextTokens);
+  const cost = blendedCost(pricing);
+  add(-cost * priority.costWeight, `blended cost $${cost.toFixed(3)}/M${pricing._band ? ` (${pricing._band})` : ""}`);
+
+  // Speed priority: reward the per-model responsiveness signal (TTFT-oriented).
+  const speed = registry.speed?.[key];
+  if (priority.speedWeight && typeof speed === "number") add(speed * priority.speedWeight, `speed ${speed}`);
+
+  const quota = Number(registry.quotas_per_5h?.[key] ?? (isLocalProvider(provider) ? 200 : 0));
+  add(Math.min(quota, 30000) / 1000, `quota ${quota}/5h`);
+
+  const local = isLocalProvider(provider);
+  if (local && request.fanout > 1) {
+    add(-80, `local penalized for fan-out x${request.fanout}`);
+  } else if (local) {
+    add(request.priority === "cost-efficiency" ? 20 : 5, "local zero-dollar serial option");
+    if (request.residentLocal && key === modelKey(request.residentLocal)) {
+      add(25, "already resident in llama-swap (no model swap)");
+    }
+  }
+
+  add(-presetIndex * 0.01, "preset order");
+
+  // Live load: soft penalties for in-flight crowding, quota exhaustion, and (local
+  // only) host saturation. Reconciled from the shared cross-process ledger.
+  if (loadCtx) {
+    const adj = ledger.loadAdjustment({
+      src,
+      isLocal: local,
+      inflight: loadCtx.bySrc?.[src]?.inflight,
+      usage: loadCtx.byKey?.[key]?.usage,
+      quota,
+      host: loadCtx.host,
+      tuning: loadCtx.tuning,
+    });
+    score += adj.score;
+    reasons.push(...adj.reasons);
+  }
+
+  return { score, cost, quota, isLocal: local, band: pricing._band || "base", reasons };
+}
+
+function scoreCandidate(provider, request, registry, neverSub, presetIndex, loadCtx) {
+  const key = modelKey(provider);
+  const normalized = normalizeModelId(provider);
+  const attrs = registry.attributes?.[key];
+  const dq = disqualify(key, attrs, request, neverSub);
+
+  if (dq) {
+    return { provider, modelId: normalized, key, tier: tierName(normalized, registry), eliminated: true, reason: dq, score: -Infinity, reasons: [`ELIMINATED: ${dq}`] };
+  }
+
+  const fit = fitScore(attrs, request);
+  const econ = economics(provider, key, request, registry, presetIndex, loadCtx);
+
+  return {
+    provider,
+    modelId: normalized,
+    key,
+    tier: tierName(normalized, registry),
+    grade: fit.grade,
+    eliminated: false,
+    score: fit.score + econ.score,
+    fitScore: Math.round(fit.score),
+    econScore: Number(econ.score.toFixed(1)),
+    effectiveCostPerMTok: econ.cost,
+    quotaPer5h: econ.quota,
+    isLocal: econ.isLocal,
+    band: econ.band,
+    reasons: [...fit.reasons, ...econ.reasons],
+  };
+}
+
+// Merge the registry's optional `load` block over the code defaults (concurrency
+// caps merge per-source rather than replacing the map wholesale).
+function mergeTuning(registry) {
+  return {
+    ...ledger.LOAD_DEFAULTS,
+    ...(registry.load || {}),
+    concurrency_soft: { ...ledger.LOAD_DEFAULTS.concurrency_soft, ...(registry.load?.concurrency_soft || {}) },
+  };
+}
+
+// Resolve the live-load context once per routing call: merged tuning, the
+// reconciled cross-process ledger snapshot, and host pressure. Returns null when
+// load awareness is disabled, so the scorer falls back to stateless behavior.
+function buildLoadContext(request, registry) {
+  const tuning = mergeTuning(registry);
+  if (request.noLedger || tuning.enabled === false) return null;
+  const snap = ledger.snapshot(Date.now(), { windowSec: tuning.window_sec, ttlSec: tuning.reservation_ttl_sec });
+  return { byKey: snap.byKey, bySrc: snap.bySrc, host: ledger.hostLoad(), tuning };
+}
+
+// Emit the raw load snapshot as JSON for the control UI's dashboard. Self-contained
+// so the UI can shell out the same way it runs a routing decision.
+function printLoadSnapshot(args) {
+  const registry = readJson(args.modelTiersPath);
+  const tuning = mergeTuning(registry);
+  const snap = ledger.snapshot(Date.now(), { windowSec: tuning.window_sec, ttlSec: tuning.reservation_ttl_sec });
+  console.log(JSON.stringify({ now: Date.now(), host: ledger.hostLoad(), byKey: snap.byKey, bySrc: snap.bySrc, tuning }));
+}
+
+function chooseProvider(request, preset, registry) {
+  if (!ROLES.has(request.role)) throw new Error(`Role must be one of: ${[...ROLES].join(", ")}`);
+  if (!DIFFICULTIES.has(request.difficulty)) throw new Error(`Difficulty must be one of: ${[...DIFFICULTIES].join(", ")}`);
+  if (!BUDGETS.has(request.budget)) throw new Error(`Budget must be one of: ${[...BUDGETS].join(", ")}`);
+  if (!PRIORITIES[request.priority]) throw new Error(`Priority must be one of: ${Object.keys(PRIORITIES).join(", ")}`);
+
+  const roleValue = preset.providers?.[request.role];
+  if (!roleValue) throw new Error(`Preset has no provider entry for role: ${request.role}`);
+
+  if (typeof roleValue === "string") {
+    return { provider: roleValue, modelId: normalizeModelId(roleValue), rationale: "Preset role is pinned to a single provider.", reasoning: resolveReasoning(roleValue, registry, request.difficulty, request.contextTokens, request.priority), permissions: resolvePermissions(roleValue, registry), fallbacks: [], scores: [] };
+  }
+  if (!Array.isArray(roleValue)) throw new Error(`Provider entry for ${request.role} must be a string or array`);
+
+  const neverSub = neverSubagentSet(registry);
+  const loadCtx = buildLoadContext(request, registry);
+  const scored = roleValue.map((provider, index) => scoreCandidate(provider, request, registry, neverSub, index, loadCtx));
+  const survivors = scored.filter((c) => !c.eliminated).sort((a, b) => b.score - a.score);
+
+  if (!survivors.length) {
+    const why = scored.map((c) => `${c.key} (${c.reason})`).join("; ");
+    throw new Error(`All candidates eliminated for role ${request.role}: ${why}`);
+  }
+
+  return { provider: survivors[0].provider, modelId: survivors[0].modelId, rationale: buildRationale(survivors[0], request), reasoning: resolveReasoning(survivors[0].provider, registry, request.difficulty, request.contextTokens, request.priority), permissions: resolvePermissions(survivors[0].provider, registry), fallbacks: survivors.slice(1).map((s) => s.provider), scores: scored };
+}
+
+// Resolve the recommended OpenCode reasoningEffort for the chosen model at this
+// difficulty. "auto" means do not set reasoningEffort (leave the model default).
+// Clamps to the model's supported levels and steps effort down under context
+// pressure (reasoning consumes output headroom that a near-full window lacks).
+function resolveReasoning(provider, registry, difficulty, contextTokens = 0, priorityName = "balanced") {
+  const key = modelKey(provider);
+  const r = registry.reasoning?.[key];
+  if (!r) return { effort: "auto", apply: "no reasoning profile in registry; leave model default" };
+  if (r.effort === "auto") return { effort: "auto", apply: "model self-manages; do not set reasoningEffort" };
+
+  const levels = Array.isArray(r.levels) ? r.levels : null;
+  let effort = r.by_difficulty?.[difficulty] ?? r.default;
+  const notes = [];
+
+  // Priority bias: speed/cost-efficiency lean reasoning down, quality leans it up.
+  const bias = (PRIORITIES[priorityName] || PRIORITIES.balanced).effortBias;
+  if (levels && bias) {
+    const i = levels.indexOf(effort);
+    const j = Math.max(0, Math.min(levels.length - 1, i + bias));
+    if (j !== i) { effort = levels[j]; notes.push(`${bias > 0 ? "raised" : "lowered"} for ${priorityName} priority`); }
+  }
+
+  // Clamp to the model's supported set (e.g. OpenAI never accepts "max").
+  if (levels && !levels.includes(effort)) {
+    effort = levels.includes(r.default) ? r.default : levels[levels.length - 1];
+    notes.push(`clamped to supported level ${effort}`);
+  }
+  // Context-pressure downgrade: past 70% of the model's ctx_max, step down one
+  // level so reasoning leaves room for output (levels are ordered low->high).
+  const ctxMax = registry.attributes?.[key]?.ctx_max;
+  if (levels && contextTokens > 0 && ctxMax && contextTokens > ctxMax * 0.7) {
+    const i = levels.indexOf(effort);
+    if (i > 0) { effort = levels[i - 1]; notes.push(`stepped down for context pressure (>${Math.round(ctxMax * 0.7)})`); }
+  }
+
+  const apply = (registry.reasoning?._apply || "") + (notes.length ? ` [${notes.join("; ")}]` : "");
+  return { effort, apply };
+}
+
+// Resolve the permission posture for the chosen provider's backend. Defaults to
+// bypass/yolo (registry permissions._default); settings go to create_agent.
+function resolvePermissions(provider, registry) {
+  let backend = String(provider).split("/")[0];
+  if (backend.startsWith("oc-")) backend = "opencode"; // oc-digitalocean/oc-openrouter/oc-sam-desktop/oc-embedding extend opencode
+  const mode = registry.permissions?._default || "bypass";
+  const p = registry.permissions?.[backend];
+  if (!p) return { backend, mode, settings: null, note: "no permission profile for this backend; pass nothing" };
+  return { backend, mode, settings: p[mode] ?? null, cliBypass: p.cli_bypass || null };
+}
+
+function buildRationale(winner, request) {
+  const parts = [
+    `${winner.key} (grade ${winner.grade}) won for ${request.role}`,
+    `fit ${winner.fitScore}`,
+    `econ ${winner.econScore}`,
+    `effective cost $${winner.effectiveCostPerMTok.toFixed(3)}/M`,
+    `quota ${winner.quotaPer5h}/5h`,
+  ];
+  if (winner.reasons.length) parts.push(winner.reasons.slice(0, 3).join(", "));
+  return parts.join("; ");
+}
+
+function printHuman(result, request, presetPath, explain) {
+  console.log(`role: ${request.role}  difficulty: ${request.difficulty}  priority: ${request.priority}  fanout: ${request.fanout}`);
+  console.log(`preset: ${presetPath}`);
+  console.log(`pick: ${result.provider}`);
+  console.log(`rationale: ${result.rationale}`);
+  if (result.reasoning) {
+    console.log(`reasoningEffort: ${result.reasoning.effort}`);
+    if (explain && result.reasoning.apply) console.log(`  apply: ${result.reasoning.apply}`);
+  }
+  if (result.permissions) {
+    console.log(`permissions: ${result.permissions.backend} ${result.permissions.mode} -> ${JSON.stringify(result.permissions.settings)}`);
+    if (explain && result.permissions.cliBypass) console.log(`  cli: ${result.permissions.cliBypass}`);
+  }
+  if (result.fallbacks && result.fallbacks.length) {
+    console.log(`fallbacks: ${result.fallbacks.join(" -> ")}`);
+  }
+  if (result.scores.length) {
+    console.log("candidates:");
+    for (const c of result.scores) {
+      if (c.eliminated) {
+        console.log(`  - ${c.provider} ELIMINATED: ${c.reason}`);
+        continue;
+      }
+      console.log(`  - ${c.provider} score=${c.score.toFixed(2)} grade=${c.grade} fit=${c.fitScore} econ=${c.econScore} cost=${c.effectiveCostPerMTok.toFixed(3)} quota=${c.quotaPer5h} band=${c.band}`);
+      if (explain) for (const r of c.reasons) console.log(`      ${r}`);
+    }
+  }
+}
+
+function usage() {
+  console.log(`Usage:
+  node model-router/router.mjs --role <role> --task <text> [options]
+
+Options:
+  --preset <path>          Active preset JSON path. Default: ${DEFAULT_PRESET}
+  --model-tiers <path>     Model registry path. Default: ${DEFAULT_MODEL_TIERS}
+  --difficulty <value>     simple, standard, or hard. Default: standard
+  --budget <value>         cost_sensitive, balanced, or quality (legacy alias for --priority)
+  --priority <value>       cost-efficiency, speed, quality, or balanced. Default: balanced
+  --context-tokens <n>     Approximate input context size. Default: 0
+  --requires <list>        Comma-separated hard modality needs, e.g. vision,computer-use
+  --fanout <n>             Parallel agent count for this dispatch. Default: 1
+  --resident-local <id>    Local model currently loaded in llama-swap (residency bonus)
+  --reserve <id>           Record this pick as in-flight under <id> (real dispatch)
+  --release <id>           Mark dispatch <id> complete; no routing performed
+  --tokens <n>             Optional token spend recorded with --reserve/--release
+  --no-ledger              Ignore the shared load ledger (stateless routing)
+  --live-status <url>      Query <url> for running models to auto-populate --resident-local
+  --json                   Print JSON
+  --explain                Print full per-candidate scoring trace
+  --dry-run-samples        Run sample selections
+`);
+}
+
+// Query GET /api/providers and return the first loaded local model name.
+// Uses execFileSync+curl with args as array (no shell, no injection risk).
+function resolveLiveStatus(statusUrl) {
+  try {
+    const raw = execFileSync("curl", ["-s", "--max-time", "3", statusUrl], { timeout: 4000, encoding: "utf8" });
+    const data = JSON.parse(raw);
+    for (const p of data.providers ?? []) {
+      if (!p.ok || !p.models?.length) continue;
+      const m = p.models[0];
+      return typeof m === "string" ? m : (m.id || m.model || "");
+    }
+    return "";
+  } catch {
+    return "";
+  }
+}
+
+function runOne(args) {
+  const registry = readJson(args.modelTiersPath);
+  const preset = readJson(args.presetPath);
+
+  let residentLocal = args.residentLocal;
+  if (!residentLocal && args.liveStatus) {
+    residentLocal = resolveLiveStatus(args.liveStatus);
+  }
+
+  const request = {
+    role: args.role,
+    task: args.task,
+    difficulty: args.difficulty,
+    contextTokens: args.contextTokens,
+    budget: args.budget,
+    priority: args.priority || BUDGET_TO_PRIORITY[args.budget] || "balanced",
+    fanout: args.fanout,
+    requires: args.requires,
+    residentLocal,
+    noLedger: args.noLedger,
+  };
+  const result = chooseProvider(request, preset, registry);
+
+  // Record the pick so sibling fan-out calls see it as in-flight. Only with an
+  // explicit --reserve id (a real dispatch); previews and samples never write.
+  if (args.reserve && !args.noLedger) {
+    ledger.reserve({ id: args.reserve, key: modelKey(result.provider), src: sourceOf(result.provider), at: Date.now(), tokens: args.tokens });
+  }
+
+  if (args.json) console.log(JSON.stringify({ request, result }, null, 2));
+  else printHuman(result, request, args.presetPath, args.explain);
+}
+
+function runSamples(args) {
+  const samples = [
+    { ...args, presetPath: "~/.paseo/presets/workhorse-mid.json", role: "ui", task: "review a screenshot-heavy frontend flow with visual design risks", requires: ["image"], contextTokens: 120000, difficulty: "standard", fanout: 1, explain: true },
+    { ...args, presetPath: "~/.paseo/presets/workhorse-mid.json", role: "impl", task: "apply a mechanical OpenSpec implementation across files", contextTokens: 80000, difficulty: "simple", fanout: 3, explain: true },
+    { ...args, presetPath: "~/.paseo/presets/workhorse-mid.json", role: "research", task: "browse a 1M-token repo and synthesize findings", contextTokens: 400000, difficulty: "hard", fanout: 1, explain: true },
+  ];
+  for (const sample of samples) {
+    runOne(sample);
+    console.log("");
+  }
+}
+
+function main() {
+  const args = parseArgs(process.argv.slice(2));
+  if (args.help) return usage();
+  if (args.loadSnapshot) return printLoadSnapshot(args);
+  // Release-only: mark a prior dispatch complete so it stops counting as in-flight.
+  if (args.release) return ledger.release(args.release, { at: Date.now(), tokens: args.tokens });
+  if (args.dryRunSamples) return runSamples(args);
+  if (!args.role) throw new Error("--role is required unless --dry-run-samples is used");
+  runOne(args);
+}
+
+try {
+  main();
+} catch (error) {
+  console.error(`router error: ${error.message}`);
+  process.exit(1);
+}