# Codecontext + TypeScript: recon and plan

**Date:** 2026-05-22
**Author:** read-only recon, evidence-first

## Part A: Current codecontext usage in BooCode

### A1. Server-side synthesis pipeline

BooCode runs a **forced second-inference synthesis pass** after a model emits any of three codecontext tool calls. The list is hard-coded:

`/opt/boocode/apps/server/src/services/synthesisPipeline.ts:34-38`

```ts
export const SYNTHESIS_TOOLS: ReadonlySet<string> = new Set([
  'get_codebase_overview',
  'get_framework_analysis',
  'get_semantic_neighborhoods',
]);
```

The pipeline is triggered from the tool-phase, not by the model: `/opt/boocode/apps/server/src/services/inference/tool-phase.ts:200-279`. After tool-phase records the tool_call/tool_result rows, it:

1. picks the first synth-eligible entry;
2. expands the inline-truncated head via tmpfs (`readTruncation`);
3. pulls top-N referenced files plus project docs (BOOCHAT.md, AGENTS.md, CONTEXT.md, `*roadmap*.md`);
4. token-budgets to 32k chars/4 (`synthesisPipeline.ts:45-46`);
5. streams a second model inference with a 90 s timeout (`synthesisPipeline.ts:50`);
6. either emits a `kind='synthesis'` message-part or falls through to the recursive turn on failure (`synthesisPipeline.ts:250-272`).

The pipeline is **invoked once per turn that contains a SYNTHESIS_TOOLS call**: at most one synthesis pass per turn (the loop picks the first synth-eligible entry, `tool-phase.ts:256`).

The codecontext tools themselves are HTTP wrappers over the sidecar: `/opt/boocode/codecontext/shim.go:412-419` registers eight POST routes (`/v1/get_codebase_overview` … `/v1/get_framework_analysis`). The shim serialises calls under `callMu` and forwards JSON-RPC to a single `codecontext mcp` child (`shim.go:194`, `shim.go:328-333`).
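The trigger rule described in A1 (first synth-eligible entry wins, at most one pass per turn) can be sketched as follows. This is a minimal illustration, not the code in `tool-phase.ts`: `ToolCallEntry`, `pickSynthCandidate`, and `estimateTokens` are hypothetical names, and reading "32k chars/4" as "32,000 characters at roughly 4 characters per token" is an assumption.

```typescript
// Hypothetical row shape; the real types in tool-phase.ts are not shown in this recon.
interface ToolCallEntry {
  name: string;
  result: string;
}

const SYNTHESIS_TOOLS: ReadonlySet<string> = new Set([
  'get_codebase_overview',
  'get_framework_analysis',
  'get_semantic_neighborhoods',
]);

// Assumption: the budget is expressed in characters, with ~4 chars per token.
const CHARS_PER_TOKEN = 4;

/** Returns the first synth-eligible entry, or undefined when the turn has none. */
function pickSynthCandidate(
  entries: readonly ToolCallEntry[],
): ToolCallEntry | undefined {
  return entries.find((entry) => SYNTHESIS_TOOLS.has(entry.name));
}

/** Rough token estimate for a block of text under the chars/4 assumption. */
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// A turn with two eligible calls: only the first one triggers synthesis.
const turn: ToolCallEntry[] = [
  { name: 'grep', result: '...' },
  { name: 'get_framework_analysis', result: '...' },
  { name: 'get_codebase_overview', result: '...' },
];
const candidate = pickSynthCandidate(turn);
```

Under these assumptions a 32,000-character head corresponds to about 8,000 tokens, and a turn without any of the three tools yields no candidate, so no synthesis pass runs.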
The child binary is built from `github.com/nmakod/codecontext` tag `v3.2.1` (`/opt/boocode/codecontext/Dockerfile:18-22`), NOT from the local fork at `/opt/forks/codecontext` (which is `github.com/nuthan-ms/codecontext`, fork go.mod: `/opt/forks/codecontext/go.mod:1`). The container reports `codecontext version dev` (recon: `docker exec boocode_codecontext codecontext --version` returned `codecontext version dev / Build Date: unknown / Git Commit: unknown`).

Wrapper boundaries:

- `/opt/boocode/apps/server/src/services/codecontext_client.ts:68-70`: hard timeout `REQUEST_TIMEOUT_MS = 30_000`, inline truncation `TRUNCATION_LIMIT = 32_000`.
- Same file, lines 80-95: realpath project + target_dir, reject any target_dir that escapes the project root. The eight wrappers never pass `target_dir` (`callCodecontext` injects it server-side, line 99).
- Lines 130-141 surface the upstream "content is empty" parser bug (issue #37) with an actionable hint pointing at `.codecontextignore`.

### A2. Agent-exposed tool surface

Source of truth: `/opt/boocode/data/AGENTS.md` (six agents) plus the `DEFAULT_TOOLS` fallback in `/opt/boocode/apps/server/src/services/agents.ts:19-20` (every tool in `ALL_TOOLS`).

Per-agent codecontext exposure (cited from `/opt/boocode/data/AGENTS.md:6,41,62,100,138,179`):

| Agent | Codecontext tools exposed |
|---|---|
| Code Reviewer (line 3) | get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, search_symbols, watch_changes |
| Debugger (line 38) | same eight |
| Refactorer (line 59) | same eight |
| Architect (line 97) | same eight |
| Security Auditor (line 135) | same eight |
| Prompt Builder (line 176) | **none**: `tools: [view_file, list_dir, grep, find_files]` |

Every project-less or no-agent chat falls back to `DEFAULT_TOOLS` = `ALL_TOOLS` (all 21 tools including the eight codecontext ones) (`agents.ts:19-20,196`).
The `BOOCODE_TOOLS` env var can narrow further via `resolveToolTier()` (`tools.ts:712-732`): `core` (4 tools, no codecontext) / `standard` (16, all eight codecontext) / `all` (21). `STANDARD_TOOL_NAMES` includes all eight codecontext tools (`tools.ts:719-732`).

The eight codecontext tool registrations live in `tools.ts:653-660` and are all marked read-only in `READ_ONLY_TOOL_NAMES` (`tools.ts:689-696`).

### A3. Actual usage (DB)

Tool-call frequency from `message_parts` (all-time; the DB only holds rows dated 2026-05-22, i.e. today; see "Claims I did not verify" for the retention question):

Query: `SELECT payload->>'name', COUNT(*) FROM message_parts WHERE kind='tool_call' GROUP BY 1 ORDER BY 2 DESC`

| Tool | Calls | Chats |
|---|---:|---:|
| view_file | 129 | - |
| grep | 81 | - |
| list_dir | 78 | - |
| find_files | 25 | - |
| **get_codebase_overview** | **24** | 23 |
| **search_symbols** | **8** | 5 |
| ask_user_input | 5 | 3 |
| `foo` (typo/invalid) | 4 | 2 |
| view_truncated_output | 4 | 2 |
| git_status | 3 | 2 |
| **get_file_analysis** | **3** | 1 |
| **get_framework_analysis** | **1** | 1 |
| `([^` (typo/invalid) | 1 | 1 |

Codecontext-tool calls observed: **only 4 of 8** ever invoked (`get_codebase_overview`, `search_symbols`, `get_file_analysis`, `get_framework_analysis`). **Never called** (in the recorded window): `get_dependencies`, `get_symbol_info`, `get_semantic_neighborhoods`, `watch_changes`.

Per-call args sample (`mp.created_at` desc, last 12 calls; recon-verified by query against message_parts):

- `get_codebase_overview` invoked ~9 times in a row with `{"include_stats":true}`: repeated overview fetches within minutes.
- `search_symbols` examples: `{"limit":20,"query":"Kind"}`, `{"limit":20,"query":"SymbolKind"}`, `{"limit":20,"query":"Kind","framework_type":"typescript"}`.
- `get_file_analysis` invoked 3 times in one chat with `file_path` = `apps/server/src/services/inference.ts`, `apps/server/src/services/inference/parts.ts`, `apps/server/src/services/system-prompt.ts`; **all three failed** with "File not found in graph" (see C3).

### A4. Hang and drift correlation

**Cohort analysis** (query against `messages` joined to chats that ever used any codecontext tool):

| Cohort | status | rows |
|---|---|---:|
| no_codecontext | complete | 24 |
| no_codecontext | cancelled | 1 |
| used_codecontext | complete | 191 |
| used_codecontext | streaming | 2 |
| used_codecontext | **failed** | **2** |

Two failed assistant messages, both in chats that used codecontext. Both have empty `content`, which is characteristic of a synth pass that aborted before any deltas streamed (see `synthesisPipeline.ts:278-303`, `markSynthFailed`). DB query:

```
SELECT id, status, created_at, LEFT(content,200)
FROM messages
WHERE role='assistant' AND status IN ('failed','streaming')
```

returned two `failed` rows with empty content at 2026-05-22 18:43:39 and 2026-05-22 19:59:56.

The 18:43 failure correlates with the codecontext sidecar log line `2026/05/22 18:44:10.842554 get_framework_analysis target_dir=/opt/boocode duration_ms=30002 status=rpc_error`: a 30 s timeout (`codecontext_client.ts:70`) under a `get_framework_analysis` call (`synthesisPipeline.ts:34-38` would have triggered synthesis on success; the failure path skipped synthesis and surfaced the error).

**Drift / format leakage:** the query `SELECT * FROM messages WHERE role='assistant' AND (content LIKE '%` as a *topic*, not actually emitting a tool call as text. **One real drift case** at 2026-05-22 19:05:03: content begins "I need to investigate the codecontext fork to write this design document. Let me start by reading the key files.\n\n…", an Anthropic-format leak. This message is in a chat that did use codecontext, but the drift evidence is too thin (n=1) to claim a correlation.
## Part B: TypeScript parsing gap

### B1. TS-targeted workload

Per-language breakdown of codecontext calls that target a specific file or framework (DB query):

| Language hint | Calls |
|---|---:|
| no file_path (overview/framework/symbol search) | 33 |
| ts/tsx | 3 |
| (no other extension observed) | - |

The three TS-targeted calls were all `get_file_analysis` in a single chat: `inference.ts`, `inference/parts.ts`, `system-prompt.ts`. **All three failed** with `File not found in graph` (see C3, relative path mishandling). One `search_symbols` call carried `framework_type=typescript` (Q="Kind").

So **TS is the actual workload** for narrow codecontext use; the rest is whole-repo overview/framework analysis with no specific language filter.

### B2. Symbol recovery quality

I called the live container against three load-bearing BooCode TS files and compared the symbol list against a manual grep of top-level declarations.

**File 1: `/opt/boocode/apps/server/src/types/api.ts` (371 lines)**

Manual count (grep `^(export )?(interface|type|const) `):

- interfaces: 36
- top-level types: 15
- top-level consts: 5
- total significant: 56

Codecontext output (live HTTP call to `http://codecontext:8080/v1/get_file_analysis`):

```json
{
  "result": "# File Analysis: ...\n**Lines:** 372\n**Symbols:** 10\n\n## Symbols\n\n- **PROJECT_STATUSES** () - Line 2\n- **PROJECT_STATUSES** () - Line 2\n- **CHAT_STATUSES** () - Line 91\n..."
}
```

Total reported: 10 symbols, all five `*_STATUSES` consts duplicated (line 2 appears twice, etc.). After regex-extracting names:

- Unique symbols reported by codecontext: 8 (5 `*_STATUSES` consts + 3 header strings `Language:`/`Lines:`/`Symbols:`)
- Interfaces / types found: **0 of 51**.
- Symbol-recovery rate: **5/56 = ~9%** (only the const arrays the JS grammar understands).
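The comparison behind these numbers (manual grep of top-level declarations versus the names codecontext reports) is easy to script. The sketch below is one possible way to do it, under stated assumptions: `topLevelDecls` and `recoveryRate` are hypothetical helper names, the regex mirrors the grep pattern above with a name capture added, and the sample input is illustrative rather than the real `api.ts`.

```typescript
// Mirrors the manual grep `^(export )?(interface|type|const) `, plus a name capture.
const TOP_LEVEL_DECL = /^(?:export )?(interface|type|const) ([A-Za-z_$][\w$]*)/;

/** Names of top-level interface/type/const declarations in a TS source string. */
function topLevelDecls(source: string): string[] {
  const names: string[] = [];
  for (const line of source.split('\n')) {
    const match = TOP_LEVEL_DECL.exec(line);
    if (match && match[2]) names.push(match[2]);
  }
  return names;
}

/** Fraction of expected names that appear in the reported symbol list. */
function recoveryRate(
  expected: readonly string[],
  reported: readonly string[],
): number {
  if (expected.length === 0) return 1;
  const seen = new Set(reported);
  const hits = expected.filter((name) => seen.has(name)).length;
  return hits / expected.length;
}

// Illustrative input only (not the real api.ts).
const sample = [
  "export const PROJECT_STATUSES = ['open', 'archived'] as const;",
  'export interface Project {',
  '  id: string;',
  '}',
  'export type SessionStatus = string;',
].join('\n');

const expectedNames = topLevelDecls(sample);
// Duplicated const, no interface or type: the pattern B2 observed.
const rate = recoveryRate(expectedNames, ['PROJECT_STATUSES', 'PROJECT_STATUSES']);
```

Deduplicating the reported names through a `Set` keeps repeated entries (as with the doubled `*_STATUSES` consts) from inflating the rate.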
Specific misses checked against the actual file (grep -nE on `/opt/boocode/apps/server/src/types/api.ts`):

- Line 5 `export interface Project`: MISSED
- Line 26 `export type SessionStatus`: MISSED
- Line 28 `export interface Session`: MISSED
- Line 47 `export type WorkspacePaneKind`: MISSED
- All 36 interface declarations and 15 type aliases: MISSED.

**File 2: `/opt/boocode/apps/server/src/services/tools.ts` (763 lines)**

Manual count: 47 top-level decls (grep `^(export )?(interface|type|enum|namespace|const|function|class|async function) `).

Codecontext output: **112 symbols** reported (but many are noise: local function-scope variables, the literal token `"unknown"` from type cast positions, even raw labels like `out:`). Python-extracted from result: 71 unique names. Cross-checked against 20 significant TS exports the file declares:

- Found: `ListDirInput`, `READ_ONLY_TOOL_NAMES`, `CORE_TOOL_NAMES`, `STANDARD_TOOL_NAMES` (4 / 20)
- **MISSED: `ToolDef`, `ViewFileInput`, `viewFile`, `listDir`, `grep`, `findFiles`, `viewTruncatedOutput`, `gitStatus`, `skillFind`, `skillUse`, `skillResource`, `askUserInput`, `ALL_TOOLS`, `TOOLS_BY_NAME`, `resolveToolTier`, `toolJsonSchemas`**. Every exported `ToolDef<…>` named constant is missed because the JS grammar can't parse the TS type annotation `: ToolDef<…>` that precedes the `=` and bails out of recognising the const at top-level.
- Symbol-recovery rate (significant): **4/20 = 20%**.

**File 3: `/opt/boocode/apps/server/src/services/inference/stream-phase.ts` (482 lines)**

Manual count: 5 top-level decls (2 are `export async function`, 1 interface, 1 type, 1 const).

Codecontext output: 53 symbols extracted, but the first 20 are header strings (`Language:`, `Lines:`, `Symbols:`), imports (`api.js`, `model-context.js`, …), local function names from inside bodies (`toolNameById`, `out:`, `hasTools`), and string literals (`parts:`).
Neither `streamCompletion` nor `executeStreamPhase` (the two `export async function` declarations at lines 145, 346) appears in the symbol list explicitly.

**Aggregate:** across the three files, codecontext recovers type/interface/enum symbols at effectively **0%**, and function/const symbols at roughly **20%**. The 9596-symbol whole-repo overview is heavily noise-padded. Generic type parameters and decorators were not checked individually because they're a strict subset of the already-broken case.

### B3. Fork status

**`docs/ts-bindings-design.md` does NOT exist.** Verified by `ls /opt/forks/codecontext/docs/ts-bindings-design.md` → `No such file or directory`. The `/opt/forks/codecontext/docs/` tree has 23 markdown files; none mention TypeScript bindings work (greps under `/opt/forks/codecontext/docs/` for `TypescriptLanguage|tree-sitter-tsx` returned nothing beyond a CodeContext example in `HLD.md:831` and config mentions in `ARCHITECTURE.md:297`).

**go.mod dependencies (`/opt/forks/codecontext/go.mod:5-18`):**

- `github.com/tree-sitter/tree-sitter-javascript v0.23.1` (present)
- `github.com/tree-sitter/tree-sitter-typescript`: **NOT present**.

**TS-as-JS fallback in `internal/parser/manager.go:72-79`:**

```go
// TypeScript - use JavaScript grammar as fallback until TypeScript bindings are fixed
// Both JS and TS have similar syntax and this provides basic parsing capability
tsLang := sitter.NewLanguage(javascript.Language())
m.languages["typescript"] = tsLang
tsParser := sitter.NewParser()
tsParser.SetLanguage(tsLang)
m.parsers["typescript"] = tsParser
```

The comment claims this provides "basic parsing capability". B2 shows that interface/type recovery is effectively zero: the JS grammar does not recognise `interface`, `type`, generic params, decorators, or even TS-typed const declarations.
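For orientation, the constructs at stake look like the following. This is an illustrative fixture with hypothetical names (nothing here is copied from `api.ts` or `tools.ts`); it only shows the declaration shapes that B2 found missing or mangled under the JS grammar.

```typescript
// Illustrative fixture only; all names are hypothetical.

// Plain const array: the one shape B2 saw recovered under the JS grammar.
export const STATUSES = ['open', 'closed'] as const;

// Type alias: not part of JavaScript syntax.
export type Status = (typeof STATUSES)[number];

// Interface declaration: not part of JavaScript syntax.
export interface Item {
  id: string;
  status: Status;
}

// Enum declaration: not part of JavaScript syntax.
export enum Kind {
  File = 'file',
  Dir = 'dir',
}

// Typed const: the annotation before `=` is the shape B2 reports as missed
// for the exported `ToolDef<…>` constants in tools.ts.
export const DEFAULT_ITEM: Readonly<Item> = { id: 'x', status: 'open' };
```

A file of this shape would also serve as a small regression fixture: a TS-aware parser should report all five names, while the current fallback would be expected to report at most the plain const.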
**Downstream code IS prepared for TS-specific nodes.** In `internal/parser/manager.go:746-765` `nodeToSymbolJS` already has cases for `interface_declaration` and `type_alias_declaration`:

```go
case "interface_declaration", "interface":
    return &types.Symbol{Type: types.SymbolTypeInterface, ...}
case "type_alias_declaration", "type_declaration":
    return &types.Symbol{Type: types.SymbolTypeType, ...}
```

These cases are dead code with the JS grammar: they only fire when the parser is the TypeScript grammar. The fork already has the symbol extraction wiring; it's just missing the grammar.

**`SymbolType` is open (string), not an iota** (`/opt/forks/codecontext/pkg/types/graph.go:14`):

```go
type SymbolType string
```

with constants like `SymbolTypeInterface`, `SymbolTypeType`, `SymbolTypeNamespace` already declared (`graph.go:16-48`). No code changes needed there to add TS-aware symbol types.

**Upstream `tree-sitter-typescript` Go bindings exist.** Context7 docs for `/tree-sitter/tree-sitter-typescript` show the Go package `github.com/tree-sitter/tree-sitter-typescript` exporting `LanguageTypescript()` and `LanguageTSX()`:

```go
typescript := sitter.NewLanguage(tree_sitter_typescript.LanguageTypescript())
tsx := sitter.NewLanguage(tree_sitter_typescript.LanguageTSX())
```

(Context7 query `/tree-sitter/tree-sitter-typescript`, "Go bindings package name and how to import…", returned a working sample.)

**The fork (`/opt/forks/codecontext`) is not what runs in production.** The deployed image is built from `github.com/nmakod/codecontext` tag v3.2.1 (`/opt/boocode/codecontext/Dockerfile:18-22`). The fork is a separate working tree at `/opt/forks/codecontext` on `github.com/nuthan-ms/codecontext` (`/opt/forks/codecontext/go.mod:1`). Any TS-grammar work landing in either repo requires a Dockerfile update to point at the right source.
**Fork HEAD:** `ba6b94c 2025-09-01 12:43:09 +0530 Merge pull request #29 from nmakod/release-please--branches--main`, newer than the deployed v3.2.1 tag but on the same upstream lineage.

### B4. Existing TS-aware alternatives

Searches in `/opt/boocode`:

- `grep -rln 'ts-morph|@typescript/vfs|createCompilerHost' /opt/boocode/apps` → **no matches** in source (only types).
- Only the `typescript` package is depended on (`/opt/boocode/package.json`, `/opt/boocode/apps/booterm/package.json`, `/opt/boocode/apps/server/package.json`, `/opt/boocode/apps/web/package.json`; each declares `"typescript": "^5.5.0"`). That's the tsc compiler, used for building, not for runtime symbol extraction.
- No tool in `/opt/boocode/apps/server/src` parses TS at runtime for any reason other than what codecontext provides.

So BooCode has **no existing fallback** for TS symbol data: if codecontext can't extract it, nobody else does.

## Part C: Optimization opportunities

### C1. Tool surface review

Cross-referencing the agent whitelist (A2) with actual usage (A3):

| Tool | Exposed to 5 agents? | Calls observed | Recommendation |
|---|---|---:|---|
| get_codebase_overview | yes | 24 | **Keep**: load-bearing, synth-triggering |
| search_symbols | yes | 8 | **Keep**: only viable TS query path |
| get_file_analysis | yes | 3 | **Keep** but fix relative-path bug (C3) |
| get_framework_analysis | yes | 1 | Low-use; **keep** for synth signalling |
| get_dependencies | yes | **0** | **Demote**: unused; candidate for removal |
| get_symbol_info | yes | **0** | **Demote**: unused; candidate for removal |
| get_semantic_neighborhoods | yes | **0** | **Demote**: unused; candidate for removal |
| watch_changes | yes | **0** | **Remove** from agent whitelist; also drop it from synthesis if it is currently included there |

`watch_changes` in particular is a state-changing async tool with no sensible LLM consumer (the model can't await fsnotify events).
It should not be in the 5 agents' whitelists; the synthesis pipeline only triggers on 3 specific tools (`synthesisPipeline.ts:34-38`), so removing `watch_changes` from agent whitelists does not affect the pipeline.

`get_dependencies`, `get_symbol_info`, `get_semantic_neighborhoods` are credible tools but the model never reaches for them, likely a description/discoverability issue. Either improve their tool descriptions (the `.description` strings registered in `tools/codecontext/*.ts`) or remove them from agent whitelists.

### C2. Latency and token cost

Latencies parsed from the codecontext sidecar access log (`docker logs boocode_codecontext --since 24h | grep duration_ms=`):

- Total calls observed: 40 in 24h
- Total time: 610,404 ms
- Avg: **15,260 ms per call**
- Min: 1,379 ms
- p50: 9,417 ms
- p90: 27,611 ms
- Max: 30,002 ms (= the 30 s rpc_error timeout)

Sampled MCP-server log lines confirm overview rebuilds cost 2–8 s on /opt/boocode (`6575 files, 115601 symbols, 1186758 chars markdown` in 8.22 s). The shim's per-tool log shows the analysis dominates; markdown serialization is sub-second.

**Synthesis pipeline expansion** (from `docker logs boocode`):

Five completed synthesis passes today, sample sizes:

- `originalChars` (truncated head shipped to synth): **32,078** in every case (= the wrapper's 32 kB cap).
- `fullChars` (full overview after re-expansion from tmpfs): 83,406 / 83,408 / 83,410 / 97,283 / 97,464.

In other words, every overview is over the wrapper cap and synthesis always pays a tmpfs round-trip to recover the full content for reference-file extraction. The full content is *not* shipped to the synth model (the truncated head is, see `synthesisPipeline.ts:141`), so the token-budget contract holds, but the synth still has to wait on the file I/O.
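The latency summary earlier in this section (average, p50, p90) can be reproduced from the raw log with a few lines of code. The following is a minimal sketch, not the script used for this recon: `parseDurations`, `percentile`, and `mean` are hypothetical helpers, the nearest-rank percentile definition is an assumption (the recon does not say which definition was used), and only the first sample line is a real log line from A4.

```typescript
// Matches the `duration_ms=<n>` field seen in the sidecar access log.
const DURATION_RE = /\bduration_ms=(\d+)\b/;

/** Extracts every duration_ms value from a block of log text. */
function parseDurations(logText: string): number[] {
  const out: number[] = [];
  for (const line of logText.split('\n')) {
    const match = DURATION_RE.exec(line);
    if (match && match[1]) out.push(Number(match[1]));
  }
  return out;
}

/** Nearest-rank percentile (assumed definition), with p in the range (0, 100]. */
function percentile(values: readonly number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.min(sorted.length, Math.max(1, Math.ceil((p / 100) * sorted.length)));
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error('no samples');
  return value;
}

/** Arithmetic mean of the samples. */
function mean(values: readonly number[]): number {
  if (values.length === 0) throw new Error('no samples');
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const sampleLog = [
  // Real line quoted in A4.
  '2026/05/22 18:44:10.842554 get_framework_analysis target_dir=/opt/boocode duration_ms=30002 status=rpc_error',
  // Placeholder lines: only the duration field matters here.
  '... duration_ms=9417 ...',
  '... duration_ms=1379 ...',
].join('\n');

const durations = parseDurations(sampleLog);
const p50 = percentile(durations, 50);
const p90 = percentile(durations, 90);
const avg = mean(durations);
```

With only 40 samples per day, the choice of percentile definition can move p90 noticeably, so any comparison across days should fix one definition.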
One synthesis timeout in the day (`synthesis pass timed out; falling through to recursive turn`, chatId a74bfecb…, toolName get_codebase_overview, 90 s after expansion completed; the synth inference itself was too slow). The retry inside the same chat then completed in 31 s with `files: 0` (no referenced files extracted), suggesting the timeout repeated until reference extraction was empty.

I have no cache-hit statistics to report: the shim does not log cache hits. The codecontext binary itself logs `Refreshing analysis for codebase overview…` on every call (`[MCP] Refreshing analysis…` appears for each `get_codebase_overview` in the sidecar log), so the analysis is rebuilt per call.

### C3. Failure modes

Sidecar errors in the last 7 days (`docker logs boocode_codecontext --since 168h | grep -E "status=tool_error|content is empty|panic"`):

1. **`content is empty` parser bug**: 2026-05-22 17:37:41 and 17:43:41, both against `/opt/homelabhealth`, on `frontend/node_modules/hono/dist/adapter/aws-lambda/types.js`. The wrapper's `.codecontextignore` template installation (`codecontext_client.ts:30-52`) didn't help because the file is under `node_modules` which is supposedly in the template. Suggests either the template hadn't been copied yet or the template's ignore list doesn't cover the path. Each failed call cost ~25 s.

2. **Relative-path failures**: 2026-05-22 17:56:51 through 17:57:07 (three back-to-back), all `get_file_analysis`:

   ```
   [MCP] ERROR: File not found in graph: apps/server/src/services/inference.ts (available files: 6575)
   ```

   The wrapper resolves `target_dir` to an absolute realpath (`codecontext_client.ts:80-99`) but `file_path` is forwarded unchanged. The codecontext binary's file index is keyed on absolute paths (the 115,876-symbol overview reports absolute paths). The model passed `apps/server/src/services/inference.ts` and the binary couldn't find it. Each failure cost 8–24 s.

3. **30 s rpc_error timeout**: 2026-05-22 18:44:10 (get_framework_analysis) and 19:38:06 (search_symbols vs /opt/forks/codecontext). The shim's per-call context timeout is 60 s (`shim.go:325`) but the wrapper aborts at 30 s (`codecontext_client.ts:70`), so the client gives up before the shim does; the call still runs to completion on the codecontext side, wasting CPU.

4. **Panic in `searchSymbols`**: concurrent map iteration crash in `internal/mcp/server.go:1305` (`getFilePathForSymbol`) under `matchesFramework`, captured in `docker logs boocode_codecontext --since 24h`:

   ```
   internal/runtime/maps.fatal(...)
   github.com/nuthan-ms/codecontext/internal/mcp.(*CodeContextMCPServer).getFilePathForSymbol(...)
       /build/codecontext/internal/mcp/server.go:1305
   ```

   This is an upstream bug in v3.2.1: concurrent map access without a lock. The shim's `callMu` serialises *its* calls but the codecontext binary itself appears to have internal concurrency that hits this.

**Pattern:** the 2 failed assistant messages in A4 align with the 30 s rpc_error timeout (18:44:10) and one other failure window. Failed turns leave empty `content` because synthesis aborts before any deltas, so the model never sees the codecontext error.

## Part D: Plan

### D1. Tool surface decisions

**Title:** Trim agent codecontext exposure to the four tools that earn their keep; demote the rest until evidence justifies them.

**Why:** A3 shows 4 of 8 codecontext tools have zero observed calls, and `watch_changes` (a fsnotify-coupled tool) has no LLM consumer. The synthesis pipeline only auto-triggers on three tools (`synthesisPipeline.ts:34-38`), so removing tools from agent whitelists does not affect the server-side synth path.

**Scope:** edit `/opt/boocode/data/AGENTS.md` lines 6, 41, 62, 100, 138 (Code Reviewer, Debugger, Refactorer, Architect, Security Auditor) to drop `get_dependencies`, `get_symbol_info`, `get_semantic_neighborhoods`, `watch_changes` from each `tools:` array. Roughly 5 line edits.
**Risk:** if there's a legitimate workflow not yet captured in 24 h of DB data, dropping these tools removes that affordance. Mitigation: keep them registered in `tools.ts` (the server-side wrappers stay) so the synth pipeline can still call them if `SYNTHESIS_TOOLS` expands later, and so the `BOOCODE_TOOLS=standard` tier continues to expose them via the tier filter. Tests: `agents.test.ts`, `tools.test.ts`, any agent-roundtrip tests.

**Effort:** 30 min.

**Sequence:** standalone. Unblocks D3 (smaller tool list = smaller system prompt = better prompt-cache stability per `tools.ts:629-632`).

### D2. TypeScript support path

**Title:** Narrow the TS fork scope to "interfaces, types, enums, top-level typed consts"; defer generics and decorators.

**Why:** Evidence from B1 (3 TS-targeted calls, all `get_file_analysis`, plus 1 `search_symbols framework_type=typescript`) shows TS is in the workload but at low volume. Evidence from B2 shows symbol recovery is **~0% for interfaces/types and ~20% for typed consts**. That gap is what actually breaks model behaviour: when the model calls `get_file_analysis` on a TS file (it did so three times today, per B1), a result like the one B2 recorded for `api.ts` contains 10 noise symbols and no `interface Project`, `interface Session`, `type SessionStatus`. The narrow scope (declarations only; skip generics, JSX, decorators) covers ~90% of the recovered-symbol gap and is achievable with one new dependency and one parser-init change.

**Scope:**

1. `/opt/forks/codecontext/go.mod`: add `github.com/tree-sitter/tree-sitter-typescript v0.23.x` to the `require` block.
2. `/opt/forks/codecontext/internal/parser/manager.go:72-79`: replace the JS-fallback init with

   ```go
   // In the import block:
   typescript "github.com/tree-sitter/tree-sitter-typescript/bindings/go"
   ...
   // In the parser init, replacing the JS-grammar fallback:
   tsLang := sitter.NewLanguage(typescript.LanguageTypescript())
   m.languages["typescript"] = tsLang
   tsxLang := sitter.NewLanguage(typescript.LanguageTSX())
   m.languages["tsx"] = tsxLang
   ```

   Plus parser registrations.
   `nodeToSymbolJS` already handles `interface_declaration` and `type_alias_declaration` (lines 746-765), so no extraction code changes are needed for the narrow scope.
3. `/opt/forks/codecontext/internal/parser/manager.go:357-395` `detectLanguage` (skim verified to live around line 357): ensure `.tsx` maps to `"tsx"` not `"typescript"`. Likely already correct; verify.
4. Tests in `internal/parser/`: add TS-grammar fixtures (a small `.ts` file with interface, type, enum) to assert recovery.
5. Update `/opt/boocode/codecontext/Dockerfile:18-22` to clone from the fork instead of `github.com/nmakod/codecontext` v3.2.1 once the TS-grammar branch lands. **Or** PR the change upstream first if `nmakod/codecontext` is open to it.
6. Drop the fork's own `tree-sitter-javascript` dependency? No: the `tree-sitter-typescript` Go binding is separate and the JS grammar is still needed for `.js`/`.jsx` files.

Rough LoC: ~20 lines in manager.go, +1 line go.mod, +1 import, +1 language-detect entry; ~50 lines of tests; ~5 lines in Dockerfile.

**Risk:** TS grammar parses superset syntax; some TS files may now hit `ERROR` nodes the JS grammar happily accepted. Mitigate by keeping the JS grammar registered for `.js`/`.jsx` and not changing JS handling. Regression risk lives in the codecontext-binary CI (JS+TS combined corpus); verify their existing tests still pass. Tests to add: a fixture file containing each B2 missed symbol and a manager_test that asserts the symbols are recovered.

**Effort:** Phase A (grammar swap + tests + Dockerfile pin): 90 min once a build-and-test loop is set up in the fork.

**Sequence:** Blocked on a decision about whether to PR upstream (`nmakod/codecontext`) or fork-and-deploy (`nuthan-ms/codecontext`). Unblocks D3 (cleaner TS results = smaller noise in synthesis output = smaller token cost).

**Decision:** **Narrow**, not "drop" and not "full TS support".
Drop is wrong because TS *is* the workload (A2 + B1 show every agent and the codebase under analysis are TS-heavy). Full Phase 3-4 TS support (generics, decorators, full type queries) is overkill for current usage; interface/type/enum recovery captures the model's actual need.

### D3. Synthesis pipeline optimizations

**Title:** Reduce per-turn codecontext latency and cache the overview.

**Why:** C2 shows an average of ~15.3 s per codecontext call and an overview that rebuilds on every call. Synthesis always pays the 30 s wrapper timeout when the codecontext binary panics (C3 case 4) or hangs.

**Three sub-items:**

D3a. **Cache the overview at the shim layer.** The shim already serialises calls under `callMu` (`shim.go:74-77`). Add a per-`target_dir` overview cache keyed on a directory-mtime hash, TTL ~60 s. Sub-second cache hits for repeated `get_codebase_overview` calls (today shows ~9 in a single chat over a few minutes).

- File: `/opt/boocode/codecontext/shim.go`
- LoC: ~80
- Effort: 90 min
- Risk: invalidation. Use a cheap invalidator (mtime of target_dir + a hash of the file count via `os.ReadDir`). When in doubt, bypass the cache.

D3b. **Align wrapper and shim timeouts.** Wrapper 30 s (`codecontext_client.ts:70`), shim ctx 60 s (`shim.go:325`). The mismatch wastes CPU when the wrapper gives up but the shim keeps running. Either drop the shim ctx to 30 s, or raise the wrapper to 60 s (depending on which budget is right). Recommended: align both to 45 s, abort upstream on wrapper cancel.

- LoC: 2 lines
- Effort: 30 min

D3c. **Fix the relative-path bug in `get_file_analysis`.** The wrapper resolves `target_dir` but not `file_path`. Three failures in one chat today wasted 48 s of CPU. Fix:

- File: `/opt/boocode/apps/server/src/services/tools/codecontext/get_file_analysis.ts` (and possibly the shared client at `codecontext_client.ts`).
- Have the wrapper resolve `file_path` against the realpath'd project root before forwarding, mirroring `target_dir`.
  Error out if the resolved path doesn't start with the project root.
- LoC: ~20
- Effort: 60 min
- Risk: low. The model loses no affordance; absolute and relative both work.
- Tests: `codecontext_client.test.ts`.

**Sequence:** D3c is independent and high-ROI. D3a depends on nothing. D3b is independent. Recommended order: D3c → D3b → D3a.

### D4. Removal candidates

1. **`watch_changes` agent exposure** (A3 + A2). Server-side handler stays for completeness; it should not appear in agent `tools:` arrays. Edit `/opt/boocode/data/AGENTS.md` lines 6, 41, 62, 100, 138.
2. **The dead "csharp" comment-out block** in `/opt/forks/codecontext/internal/parser/manager.go:146-152`: delete-on-touch when D2 lands; not part of D2's core scope.
3. **The 3 zero-use codecontext tool exposures**: `get_dependencies`, `get_symbol_info`, `get_semantic_neighborhoods`. Same surgical edits as item 1. Consider keeping `get_dependencies` on the Refactorer because the agent description explicitly invokes "Use get_dependencies to map call sites" (`AGENTS.md:92-93`); if the model isn't using it despite the system-prompt nudge, the description in `tools/codecontext/get_dependencies.ts` likely needs the same verb-forward rewrite.

## Claims I did not verify

- **DB retention horizon.** All `message_parts` rows are dated 2026-05-22. That could mean (a) the DB was wiped today, (b) the schema/path moved today, or (c) the project is brand-new and 24 h is genuinely the full history. The CLAUDE.md project context references "v1.13.15-codecontext-synth" which is recent. To verify: `docker exec boocode_db psql -U boocode -d boocode -c "SELECT MIN(created_at), MAX(created_at), COUNT(*) FROM messages;"` then cross-check against the BooCode roadmap's release dates. A3's all-time query may simply have no older data to find.
- **Whether `nmakod/codecontext` v3.2.1 hosts the same `nodeToSymbolJS` switch I read in the fork.** The fork at `/opt/forks/codecontext` is `nuthan-ms/codecontext` per go.mod.
  The deployed v3.2.1 is `nmakod/codecontext`. The Dockerfile comment (`/opt/boocode/codecontext/Dockerfile:13-16`) says the module path differs but "the tagged v3.2.1 source tree is the same either way." To verify, clone `https://github.com/nmakod/codecontext` at tag v3.2.1 and diff `internal/parser/manager.go` against the fork, which is outside this recon's read-only scope.
- **Whether `tree-sitter-typescript v0.23.x` Go bindings actually build under the fork's `go 1.24.5` + Tree-sitter `v0.25.0` combination.** Context7 docs confirm the *API exists*. Confirm by `go get github.com/tree-sitter/tree-sitter-typescript@latest` followed by `go build ./...` in a scratch worktree.
- **Whether the codecontext panic in `searchSymbols` is reproducible on `/opt/boocode` or only on `/opt/forks/codecontext`** (the panic was captured against target_dir `/opt/forks/codecontext`). Reproduce via `docker exec boocode_codecontext wget -qO - --post-data='{"target_dir":"/opt/boocode","query":"foo","limit":10}' --header='Content-Type: application/json' http://localhost:8080/v1/search_symbols`.
- **Cache hit rate of codecontext analysis (per call vs reused).** The MCP-server log line `Refreshing analysis for codebase overview…` suggests rebuild-every-call, but I did not confirm by reading the codecontext source, only the deployed binary's log output. To verify, read `/opt/forks/codecontext/internal/mcp/server.go` around the `Refreshing analysis…` log lines.
- **Drift correlation strength.** N=1 confirmed drift case is too small to call a correlation with codecontext use. To raise the signal: extend retention, re-query after a week of synthetic load with and without codecontext tools.
- **Whether the synth pipeline's `truncated head only` ships fewer tokens than a full inlined codecontext result would.** Today's budget contract assumes yes (`synthesisPipeline.ts:138-145`, in a comment beginning "Truncated head only").
  To verify: instrument the per-pass `promptTokens` and compare against a one-off pass with the full content.
- **The Architect/Code-Reviewer agents' system-prompt copy versus actual tool usage.** AGENTS.md text claims agents will "Use get_dependencies to map call sites" (line 92) and "Use get_semantic_neighborhoods to find related components" (line 132), but A3 shows neither is called. To verify whether the model is ignoring the prompt or whether these agents simply aren't being invoked, query `SELECT s.name, COUNT(*) FROM sessions s JOIN chats c ON c.session_id=s.id JOIN messages m ON m.chat_id=c.id WHERE m.role='assistant' GROUP BY 1 ORDER BY 2 DESC;` and compare named agents to chat counts.
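Returning to D3c, the proposed `file_path` handling can be sketched as below. This is a minimal illustration under stated assumptions, not BooCode's wrapper code: `resolveFilePath` is a hypothetical helper, the project root is assumed to be already realpath'd (as the wrapper does for `target_dir`), and the check is purely lexical, so a real implementation would likely also need to realpath the resolved file to handle symlinks.

```typescript
import { posix as path } from 'node:path';

/**
 * Resolves a model-supplied file_path against an already-realpath'd project
 * root and rejects anything that escapes it. Hypothetical helper.
 */
function resolveFilePath(projectRoot: string, filePath: string): string {
  // Relative inputs are joined onto the root; absolute inputs are kept as-is.
  const resolved = path.resolve(projectRoot, filePath);
  // A relative path that climbs out of the root starts with a ".." segment.
  const rel = path.relative(projectRoot, resolved);
  if (rel === '..' || rel.startsWith('../') || path.isAbsolute(rel)) {
    throw new Error(`file_path escapes project root: ${filePath}`);
  }
  return resolved;
}

// The relative path from the C3 failure now maps to an absolute path.
const resolvedExample = resolveFilePath(
  '/opt/boocode',
  'apps/server/src/services/inference.ts',
);
// resolvedExample === '/opt/boocode/apps/server/src/services/inference.ts'
```

Comparing via `path.relative` rather than a plain `startsWith` on the root avoids accepting sibling directories that merely share a prefix (for example `/opt/boocode-old`).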