feat: DeepSeek API integration + Whale lift (hooks, tool repair, MCP permissions, token tracking)

DeepSeek API:
- @ai-sdk/deepseek provider replaces openai-compatible for deepseek-* models
- Token tracking: cache_hit/reasoning tokens flow API → DB → WS frames → UI
- Thinking effort levels (off/low/medium/high/xhigh/max) via AGENTS.md frontmatter
- V4 models: deepseek-v4-flash, deepseek-v4-pro
- Wired for both chat and coder panes

Whale lifts:
- Tool input repair (schema-based type coercion, markdown link unwrapping)
- Hooks system (6 lifecycle events, shell exec, JSON stdin/stdout contract)
- Per-MCP-server permissions (allow/ask/deny)
- Token tracking UI (cache N, think N in message stats line)

Infra:
- New DB columns: messages.cache_tokens, messages.reasoning_tokens
- New WS frame fields: cache_tokens, reasoning_tokens on message_complete
- Coder provider snapshot merges DeepSeek models alongside llama-swap
2026-06-08 01:24:23 +00:00
parent c11e26090f
commit 203cfd2fa8
29 changed files with 916 additions and 42 deletions
--- a/apps/server/src/services/model-context.ts
+++ b/apps/server/src/services/model-context.ts
@@ -37,7 +37,18 @@ export function configureModelContext(opts: { llamaSwapUrl: string }): void {
  llamaSwapUrl = opts.llamaSwapUrl;
 }

+// vDeepSeek: DeepSeek models don't have a /upstream/<model>/props endpoint.
+// Return a reasonable default context so compaction estimates work.
+const DEEPSEEK_DEFAULT_N_CTX = 131_072;
+const DEEPSEEK_MODEL_PREFIX = 'deepseek-';
+
 export async function getModelContext(model: string): Promise<ModelContext | null> {
+  // vDeepSeek: DeepSeek models have no /upstream/<model>/props. Use a static
+  // default so compaction doesn't fall to the buffer-only path with tiny limits.
+  if (model.startsWith(DEEPSEEK_MODEL_PREFIX)) {
+    return { n_ctx: DEEPSEEK_DEFAULT_N_CTX };
+  }
+
  // 1. Positive cache hit — no TTL check, model n_ctx is invariant.
  const pos = positiveCache.get(model);
  if (pos) return pos;
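
Note on the hunk above: the prefix check short-circuits before any cache or network lookup, so deepseek-* models never reach the /upstream/<model>/props request. A minimal standalone sketch of that ordering follows. It assumes ModelContext needs only an n_ctx field; staticContextFor, getModelContextSketch, and the lookup parameter are hypothetical names used for illustration and are not part of this commit.

```typescript
// Sketch only: isolates the prefix short-circuit from the hunk above.
// Assumption: ModelContext needs only n_ctx (the real type may carry more).
interface ModelContext {
  n_ctx: number;
}

const DEEPSEEK_DEFAULT_N_CTX = 131_072;
const DEEPSEEK_MODEL_PREFIX = 'deepseek-';

// Hypothetical helper: returns the static context for models that have no
// props endpoint, or null when the caller should fall through to a lookup.
function staticContextFor(model: string): ModelContext | null {
  return model.startsWith(DEEPSEEK_MODEL_PREFIX)
    ? { n_ctx: DEEPSEEK_DEFAULT_N_CTX }
    : null;
}

// Hypothetical stand-in for the cache + llama-swap lookup path.
type Lookup = (model: string) => Promise<ModelContext | null>;

// Static default first; only non-DeepSeek models reach the lookup.
async function getModelContextSketch(
  model: string,
  lookup: Lookup,
): Promise<ModelContext | null> {
  return staticContextFor(model) ?? lookup(model);
}
```

Keeping the static check in a pure function makes the fallback testable without a running llama-swap instance; whether the real code should be split this way is a judgment call for the author.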