Codecontext + TypeScript: recon and plan

Date: 2026-05-22
Author: read-only recon, evidence-first

Part A — Current codecontext usage in BooCode

A1. Server-side synthesis pipeline

BooCode runs a forced second-inference synthesis pass after a model emits any of three codecontext tool calls. The list is hard-coded:

/opt/boocode/apps/server/src/services/synthesisPipeline.ts:34-38

export const SYNTHESIS_TOOLS: ReadonlySet<string> = new Set([
  'get_codebase_overview',
  'get_framework_analysis',
  'get_semantic_neighborhoods',
]);

The pipeline is triggered from the tool-phase, not by the model: /opt/boocode/apps/server/src/services/inference/tool-phase.ts:200-279. After tool-phase records the tool_call/tool_result rows it picks the first synth-eligible entry, expands the inline-truncated head via tmpfs (readTruncation), pulls top-N referenced files + project docs (BOOCHAT.md, AGENTS.md, CONTEXT.md, roadmap.md), token-budgets to 32k chars/4 (synthesisPipeline.ts:45-46), streams a second model inference with a 90s timeout (synthesisPipeline.ts:50), and either emits a kind='synthesis' message-part or falls through to the recursive turn on failure (synthesisPipeline.ts:250-272).

The pipeline is invoked once per turn that contains a SYNTHESIS_TOOLS call — at most one synthesis pass per turn (the loop picks the first synth-eligible entry, tool-phase.ts:256).
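
The selection rule described above can be sketched as follows. This is a minimal illustration, not the code in tool-phase.ts: the RecordedToolCall shape and the pickSynthesisCandidate name are hypothetical, and only the SYNTHESIS_TOOLS contents come from the source.

```typescript
// Hypothetical shape of a recorded tool call; the real rows in tool-phase.ts may differ.
interface RecordedToolCall {
  id: string;
  name: string;
}

// Same three names as the SYNTHESIS_TOOLS list quoted above.
const SYNTHESIS_TOOLS: ReadonlySet<string> = new Set([
  'get_codebase_overview',
  'get_framework_analysis',
  'get_semantic_neighborhoods',
]);

// Return the first synth-eligible call in emission order, or undefined.
// Selecting a single entry is what caps synthesis at one pass per turn.
function pickSynthesisCandidate(
  calls: readonly RecordedToolCall[],
): RecordedToolCall | undefined {
  return calls.find((call) => SYNTHESIS_TOOLS.has(call.name));
}

const turn: RecordedToolCall[] = [
  { id: 'c1', name: 'grep' },
  { id: 'c2', name: 'get_framework_analysis' },
  { id: 'c3', name: 'get_codebase_overview' },
];
console.log(pickSynthesisCandidate(turn)?.id); // c2
```

Even though the turn contains two eligible calls, only c2 would trigger a synthesis pass under this rule.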

The codecontext tools themselves are HTTP wrappers over the sidecar: /opt/boocode/codecontext/shim.go:412-419 registers eight POST routes, one per tool (/v1/get_codebase_overview, /v1/get_framework_analysis, and so on). The shim serialises calls under callMu and forwards JSON-RPC to a single codecontext mcp child (shim.go:194, shim.go:328-333). The child binary is built from github.com/nmakod/codecontext tag v3.2.1 (/opt/boocode/codecontext/Dockerfile:18-22), NOT from the local fork at /opt/forks/codecontext (which is github.com/nuthan-ms/codecontext, fork go.mod: /opt/forks/codecontext/go.mod:1). The container reports codecontext version dev (recon: docker exec boocode_codecontext codecontext --version returned codecontext version dev / Build Date: unknown / Git Commit: unknown).

Wrapper boundaries:

  • /opt/boocode/apps/server/src/services/codecontext_client.ts:68-70 hard timeout REQUEST_TIMEOUT_MS = 30_000, inline truncation TRUNCATION_LIMIT = 32_000.
  • Same file lines 80-95: realpath project + target_dir, reject any target_dir that escapes the project root. The eight wrappers never pass target_dir (callCodecontext injects it server-side, line 99).
  • Lines 130-141 surface the upstream "content is empty" parser bug (issue #37) with an actionable hint pointing at .codecontextignore.

A2. Agent-exposed tool surface

Source of truth: /opt/boocode/data/AGENTS.md (six agents) plus the DEFAULT_TOOLS fallback in /opt/boocode/apps/server/src/services/agents.ts:19-20 (every tool in ALL_TOOLS).

Per-agent codecontext exposure (cited from /opt/boocode/data/AGENTS.md:6,41,62,100,138,179):

| Agent | Codecontext tools exposed |
| --- | --- |
| Code Reviewer (line 3) | get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, search_symbols, watch_changes |
| Debugger (line 38) | same eight |
| Refactorer (line 59) | same eight |
| Architect (line 97) | same eight |
| Security Auditor (line 135) | same eight |
| Prompt Builder (line 176) | none; tools: [view_file, list_dir, grep, find_files] |

Every project-less or no-agent chat falls back to DEFAULT_TOOLS = ALL_TOOLS (all 21 tools including the eight codecontext ones) (agents.ts:19-20,196). The BOOCODE_TOOLS env var can narrow further via resolveToolTier() (tools.ts:712-732): core (4 tools, no codecontext) / standard (16, all eight codecontext) / all (21). STANDARD_TOOL_NAMES includes all eight codecontext tools (tools.ts:719-732).

The eight codecontext tool registrations live in tools.ts:653-660 and are all marked read-only in READ_ONLY_TOOL_NAMES (tools.ts:689-696).

A3. Actual usage (DB)

Tool-call frequency from message_parts (all-time; DB only has data back to 2026-05-22 today — see "Claims I did not verify" for the retention question):

Query: SELECT payload->>'name', COUNT(*) FROM message_parts WHERE kind='tool_call' GROUP BY 1 ORDER BY 2 DESC

| Tool | Calls | Chats |
| --- | --- | --- |
| view_file | 129 | |
| grep | 81 | |
| list_dir | 78 | |
| find_files | 25 | |
| get_codebase_overview | 24 | 23 |
| search_symbols | 8 | 5 |
| ask_user_input | 5 | 3 |
| foo (typo/invalid) | 4 | 2 |
| view_truncated_output | 4 | 2 |
| git_status | 3 | 2 |
| get_file_analysis | 3 | 1 |
| get_framework_analysis | 1 | 1 |
| ([^ (typo/invalid) | 1 | 1 |

Codecontext-tool calls observed: only 4 of 8 were ever invoked (get_codebase_overview, search_symbols, get_file_analysis, get_framework_analysis).

Never called (in the recorded window): get_dependencies, get_symbol_info, get_semantic_neighborhoods, watch_changes.
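
The split between called and never-called tools is a set difference between the exposed list (A2) and the observed counts (table above). A minimal sketch, with the partitionByUsage helper being a hypothetical name introduced here for illustration:

```typescript
// The eight codecontext tools exposed to agents (from the A2 table).
const CODECONTEXT_TOOLS: readonly string[] = [
  'get_codebase_overview',
  'get_dependencies',
  'get_file_analysis',
  'get_framework_analysis',
  'get_semantic_neighborhoods',
  'get_symbol_info',
  'search_symbols',
  'watch_changes',
];

// Observed call counts for codecontext tools, copied from the A3 table.
const observedCalls = new Map<string, number>([
  ['get_codebase_overview', 24],
  ['search_symbols', 8],
  ['get_file_analysis', 3],
  ['get_framework_analysis', 1],
]);

// Partition the exposed tools by whether any call was recorded.
function partitionByUsage(
  tools: readonly string[],
  calls: ReadonlyMap<string, number>,
): { called: string[]; neverCalled: string[] } {
  const called = tools.filter((tool) => (calls.get(tool) ?? 0) > 0);
  const neverCalled = tools.filter((tool) => (calls.get(tool) ?? 0) === 0);
  return { called, neverCalled };
}

const usage = partitionByUsage(CODECONTEXT_TOOLS, observedCalls);
console.log(usage.called.length, usage.neverCalled.length); // 4 4
```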

Per-call args sample (mp.created_at desc, last 12 calls; recon-verified by query against message_parts):

  • get_codebase_overview invoked ~9 times in a row with {"include_stats":true} — repeated overview fetches within minutes.
  • search_symbols examples: {"limit":20,"query":"Kind"}, {"limit":20,"query":"SymbolKind"}, {"limit":20,"query":"Kind","framework_type":"typescript"}.
  • get_file_analysis invoked 3 times in one chat with file_path = apps/server/src/services/inference.ts, apps/server/src/services/inference/parts.ts, apps/server/src/services/system-prompt.ts; all three failed with "File not found in graph" (see C3).

A4. Hang and drift correlation

Cohort analysis (query against messages joined to chats that ever used any codecontext tool):

| Cohort | Status | Rows |
| --- | --- | --- |
| no_codecontext | complete | 24 |
| no_codecontext | cancelled | 1 |
| used_codecontext | complete | 191 |
| used_codecontext | streaming | 2 |
| used_codecontext | failed | 2 |

Two failed assistant messages, both in chats that used codecontext. Both have empty content — characteristic of a synth pass that aborted before any deltas streamed (see synthesisPipeline.ts:278-303, markSynthFailed). DB query:

SELECT id, status, created_at, LEFT(content,200)
FROM messages WHERE role='assistant' AND status IN ('failed','streaming')

returned two failed rows with empty content at 2026-05-22 18:43:39 and 2026-05-22 19:59:56. The 18:43 failure correlates with the codecontext sidecar log line 2026/05/22 18:44:10.842554 get_framework_analysis target_dir=/opt/boocode duration_ms=30002 status=rpc_error — a 30 s timeout (codecontext_client.ts:70) under a get_framework_analysis call (synthesisPipeline.ts:34-38 would have triggered synthesis on success — failure path skipped synthesis and surfaced the error).

Drift / format leakage: the query SELECT * FROM messages WHERE role='assistant' AND (content LIKE '%<invoke%' OR content LIKE '%<tool_call%') returned 8 rows; manual review showed 7 are recon/discussion content where the model is quoting <invoke> as a topic, not actually emitting a tool call as text. One real drift case at 2026-05-22 19:05:03 — content begins "I need to investigate the codecontext fork to write this design document. Let me start by reading the key files.\n\n<invoke name="read_file">…" — an Anthropic-format leak. This message is in a chat that did use codecontext, but the drift evidence is too thin (n=1) to claim a correlation.

Part B — TypeScript parsing gap

B1. TS-targeted workload

Per-language breakdown of codecontext calls that target a specific file or framework (DB query):

| Language hint | Calls |
| --- | --- |
| no file_path (overview/framework/symbol search) | 33 |
| ts/tsx | 3 |

(no other extension observed)

The three TS-targeted calls were all get_file_analysis in a single chat: inference.ts, inference/parts.ts, system-prompt.ts. All three failed with File not found in graph (see C3 — relative path mishandling). One search_symbols call carried framework_type=typescript (Q="Kind").

So TS is the actual workload for narrow codecontext use; the rest is whole-repo overview/framework analysis with no specific language filter.

B2. Symbol recovery quality

I called the live container against three load-bearing BooCode TS files and compared the symbol list against a manual grep of top-level declarations.

File 1: /opt/boocode/apps/server/src/types/api.ts (371 lines)

Manual count (grep ^(export )?(interface|type|const) ):

  • interfaces: 36
  • top-level types: 15
  • top-level consts: 5
  • total significant: 56

Codecontext output (live HTTP call to http://codecontext:8080/v1/get_file_analysis):

{
  "result": "# File Analysis: ...\n**Lines:** 372\n**Symbols:** 10\n\n## Symbols\n\n- **PROJECT_STATUSES** () - Line 2\n- **PROJECT_STATUSES** () - Line 2\n- **CHAT_STATUSES** () - Line 91\n..."
}

Total reported: 10 symbols, all five *_STATUSES consts duplicated (line 2 appears twice, etc.). After regex-extracting names:

  • Unique symbols reported by codecontext: 8 (5 *_STATUSES consts + 3 header strings Language:/Lines:/Symbols:)
  • Interfaces / types found: 0 of 51.
  • Symbol-recovery rate: 5/56 = ~9% (only the const arrays the JS grammar understands).
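
A recovery rate like the one above can be computed mechanically from the tool's markdown result. The sketch below assumes the "- **NAME** () - Line N" bullet shape seen in the sample output; extractSymbolNames and recoveryRate are hypothetical helper names, and anchoring the regex on the bullet prefix keeps header strings such as **Lines:** out of the set.

```typescript
// Collect unique symbol names from bullets shaped like "- **NAME** () - Line N".
function extractSymbolNames(markdown: string): Set<string> {
  const names = new Set<string>();
  const bullet = /^- \*\*(.+?)\*\*/gm;
  let match: RegExpExecArray | null;
  while ((match = bullet.exec(markdown)) !== null) {
    const name = match[1];
    if (name) names.add(name);
  }
  return names;
}

// Fraction of expected declarations that appear in the reported set.
function recoveryRate(reported: ReadonlySet<string>, expected: readonly string[]): number {
  if (expected.length === 0) return 0;
  const hits = expected.filter((name) => reported.has(name)).length;
  return hits / expected.length;
}

const sample = [
  '# File Analysis: api.ts',
  '**Lines:** 372',
  '## Symbols',
  '- **PROJECT_STATUSES** () - Line 2',
  '- **PROJECT_STATUSES** () - Line 2',
  '- **CHAT_STATUSES** () - Line 91',
].join('\n');

const reported = extractSymbolNames(sample);
console.log(reported.size); // 2 (duplicates collapse)
console.log(recoveryRate(reported, ['PROJECT_STATUSES', 'CHAT_STATUSES', 'Project', 'Session'])); // 0.5
```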

Specific misses checked against the actual file (grep -nE on /opt/boocode/apps/server/src/types/api.ts):

  • Line 5 export interface Project — MISSED
  • Line 26 export type SessionStatus — MISSED
  • Line 28 export interface Session — MISSED
  • Line 47 export type WorkspacePaneKind — MISSED
  • All 36 interface declarations and 15 type aliases — MISSED.

File 2: /opt/boocode/apps/server/src/services/tools.ts (763 lines)

Manual count: 47 top-level decls (grep ^(export )?(interface|type|enum|namespace|const|function|class|async function) ).

Codecontext output: 112 symbols reported (but many are noise: local function-scope variables, the literal token "unknown" from type cast positions, even raw labels like out:).

Python-extracted from result: 71 unique names. Cross-checked against 20 significant TS exports the file declares:

  • Found: ListDirInput, READ_ONLY_TOOL_NAMES, CORE_TOOL_NAMES, STANDARD_TOOL_NAMES (4 / 20)
  • MISSED: ToolDef, ViewFileInput, viewFile, listDir, grep, findFiles, viewTruncatedOutput, gitStatus, skillFind, skillUse, skillResource, askUserInput, ALL_TOOLS, TOOLS_BY_NAME, resolveToolTier, toolJsonSchemas — every exported ToolDef<…> named constant is missed because the JS grammar can't parse the TS type annotation : ToolDef<…> that precedes the = and bails out of recognising the const at top-level.
  • Symbol-recovery rate (significant): 4/20 = 20%.

File 3: /opt/boocode/apps/server/src/services/inference/stream-phase.ts (482 lines)

Manual count: 5 top-level decls (2 are export async function, 1 interface, 1 type, 1 const).

Codecontext output: 53 symbols extracted, but the first 20 are header strings (Language:, Lines:, Symbols:), imports (api.js, model-context.js, …), local function names from inside bodies (toolNameById, out:, hasTools), and string literals (parts:). Neither streamCompletion nor executeStreamPhase (the two export async function declarations at lines 145, 346) appear in the symbol list explicitly.

Aggregate: across the three files, codecontext recovers type/interface/enum symbols at effectively 0%, and function/const symbols at roughly 20%. The 9596-symbol whole-repo overview is heavily noise-padded. Generic type parameters and decorators were not checked individually because they're a strict subset of the already-broken case.

B3. Fork status

docs/ts-bindings-design.md does NOT exist. Verified by ls /opt/forks/codecontext/docs/ts-bindings-design.md, which returned No such file or directory. The /opt/forks/codecontext/docs/ tree has 23 markdown files; none mention TypeScript bindings work (greps under /opt/forks/codecontext/docs/ for TypescriptLanguage|tree-sitter-tsx returned nothing beyond a CodeContext example in HLD.md:831 and config mentions in ARCHITECTURE.md:297).

go.mod dependencies (/opt/forks/codecontext/go.mod:5-18):

  • github.com/tree-sitter/tree-sitter-javascript v0.23.1 (present)
  • github.com/tree-sitter/tree-sitter-typescript: NOT present.

TS-as-JS fallback in internal/parser/manager.go:72-79:

// TypeScript - use JavaScript grammar as fallback until TypeScript bindings are fixed
// Both JS and TS have similar syntax and this provides basic parsing capability
tsLang := sitter.NewLanguage(javascript.Language())
m.languages["typescript"] = tsLang

tsParser := sitter.NewParser()
tsParser.SetLanguage(tsLang)
m.parsers["typescript"] = tsParser

The comment claims this provides "basic parsing capability". B2 shows that interface/type recovery is effectively zero — the JS grammar does not recognise interface, type, generic params, decorators, or even TS-typed const declarations.

Downstream code IS prepared for TS-specific nodes. In internal/parser/manager.go:746-765 nodeToSymbolJS already has cases for interface_declaration and type_alias_declaration:

case "interface_declaration", "interface":
    return &types.Symbol{Type: types.SymbolTypeInterface, ...}
case "type_alias_declaration", "type_declaration":
    return &types.Symbol{Type: types.SymbolTypeType, ...}

These cases are dead code with the JS grammar — they only fire when the parser is the TypeScript grammar. The fork already has the symbol extraction wiring; it's just missing the grammar.

SymbolType is open (string), not an iota: /opt/forks/codecontext/pkg/types/graph.go:14:

type SymbolType string

with constants like SymbolTypeInterface, SymbolTypeType, SymbolTypeNamespace already declared (graph.go:16-48). No code changes needed there to add TS-aware symbol types.

Upstream tree-sitter-typescript Go bindings exist. Context7 docs for /tree-sitter/tree-sitter-typescript show the Go package github.com/tree-sitter/tree-sitter-typescript exporting LanguageTypescript() and LanguageTSX():

typescript := sitter.NewLanguage(tree_sitter_typescript.LanguageTypescript())
tsx := sitter.NewLanguage(tree_sitter_typescript.LanguageTSX())

(Context7 query /tree-sitter/tree-sitter-typescript, "Go bindings package name and how to import…", returned a working sample.)

The fork (/opt/forks/codecontext) is not what runs in production. The deployed image is built from github.com/nmakod/codecontext tag v3.2.1 (/opt/boocode/codecontext/Dockerfile:18-22). The fork is a separate working tree at /opt/forks/codecontext on github.com/nuthan-ms/codecontext (/opt/forks/codecontext/go.mod:1). Any TS-grammar work landing in either repo requires a Dockerfile update to point at the right source.

Fork HEAD: ba6b94c 2025-09-01 12:43:09 +0530 Merge pull request #29 from nmakod/release-please--branches--main — newer than the deployed v3.2.1 tag but on the same upstream lineage.

B4. Existing TS-aware alternatives

Searches in /opt/boocode:

  • grep -rln 'ts-morph|@typescript/vfs|createCompilerHost' /opt/boocode/apps: no matches in source (only types).
  • Only the typescript package is depended on (/opt/boocode/package.json, /opt/boocode/apps/booterm/package.json, /opt/boocode/apps/server/package.json, /opt/boocode/apps/web/package.json — each declares "typescript": "^5.5.0"). That's the tsc compiler, used for building, not for runtime symbol extraction.
  • No tool in /opt/boocode/apps/server/src parses TS at runtime for any reason other than what codecontext provides.

So BooCode has no existing fallback for TS symbol data: if codecontext can't extract it, nobody else does.

Part C — Optimization opportunities

C1. Tool surface review

Cross-referencing the agent whitelist (A2) with actual usage (A3):

| Tool | Exposed to 5 agents? | Calls observed | Recommendation |
| --- | --- | --- | --- |
| get_codebase_overview | yes | 24 | Keep: load-bearing, synth-triggering |
| search_symbols | yes | 8 | Keep: only viable TS query path |
| get_file_analysis | yes | 3 | Keep but fix relative-path bug (C3) |
| get_framework_analysis | yes | 1 | Low-use; keep for synth signalling |
| get_dependencies | yes | 0 | Demote: unused, considered for removal |
| get_symbol_info | yes | 0 | Demote: unused, considered for removal |
| get_semantic_neighborhoods | yes | 0 | Demote: unused, considered for removal |
| watch_changes | yes | 0 | Remove from agent whitelist; also pull out of synthesis if currently kept |

watch_changes in particular is a state-changing async tool with no sensible LLM consumer (the model can't await fsnotify events). It should not be in the 5 agents' whitelists; the synthesis pipeline only calls 3 specific tools (synthesisPipeline.ts:34-38) so removing watch_changes from agent whitelists does not affect the pipeline.

get_dependencies, get_symbol_info, get_semantic_neighborhoods are credible tools but the model never reaches for them — likely a descriptions/discoverability issue. Either improve their tool descriptions (the .description strings registered in tools/codecontext/*.ts) or remove them from agent whitelists.

C2. Latency and token cost

Latencies parsed from the codecontext sidecar access log (docker logs boocode_codecontext --since 24h | grep duration_ms=):

  • Total calls observed: 40 in 24h
  • Total time: 610,404 ms
  • Avg: 15,260 ms per call
  • Min: 1,379 ms
  • p50: 9,417 ms
  • p90: 27,611 ms
  • Max: 30,002 ms (= the 30 s rpc_error timeout)
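
Figures of this kind can be reproduced from the raw log text. The sketch below is one way to do it, assuming each log line carries a duration_ms=<integer> field as in the sidecar line quoted in A4. The nearest-rank percentile method is an assumption; the recon does not say which method produced p50 and p90. All function names are hypothetical.

```typescript
interface LatencySummary {
  count: number;
  totalMs: number;
  avgMs: number;
  minMs: number;
  p50Ms: number;
  p90Ms: number;
  maxMs: number;
}

// Pull every duration_ms=<n> value out of the log text.
function parseDurations(logText: string): number[] {
  const durations: number[] = [];
  const field = /duration_ms=(\d+)/g;
  let match: RegExpExecArray | null;
  while ((match = field.exec(logText)) !== null) {
    durations.push(Number(match[1]));
  }
  return durations;
}

// Nearest-rank percentile over an ascending array (assumed method).
function percentile(sortedAsc: readonly number[], p: number): number {
  if (sortedAsc.length === 0) throw new Error('no samples');
  const rank = Math.min(sortedAsc.length, Math.max(1, Math.ceil((p * sortedAsc.length) / 100)));
  const value = sortedAsc[rank - 1];
  if (value === undefined) throw new Error('rank out of range');
  return value;
}

function summarize(durations: readonly number[]): LatencySummary {
  const sorted = [...durations].sort((a, b) => a - b);
  const totalMs = sorted.reduce((sum, d) => sum + d, 0);
  return {
    count: sorted.length,
    totalMs,
    avgMs: totalMs / sorted.length,
    minMs: percentile(sorted, 0),
    p50Ms: percentile(sorted, 50),
    p90Ms: percentile(sorted, 90),
    maxMs: percentile(sorted, 100),
  };
}
```

As a cross-check on the figures above, 610,404 ms over 40 calls is 15,260.1 ms per call, which matches the reported average.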

Sampled MCP-server log lines confirm overview rebuilds cost 2-8 s on /opt/boocode (for example: 6575 files, 115601 symbols, 1186758 chars of markdown in 8.22 s). The shim's per-tool log shows the analysis dominates; markdown serialization is sub-second.

Synthesis pipeline expansion (from docker logs boocode):

Five completed synthesis passes today, sample sizes:

  • originalChars (truncated head shipped to synth): 32,078 in every case (= the wrapper's 32 kB cap).
  • fullChars (full overview after re-expansion from tmpfs): 83,406 / 83,408 / 83,410 / 97,283 / 97,464.

In other words, every overview is over the wrapper cap and synthesis always pays a tmpfs round-trip to recover the full content for reference-file extraction. The full content is not shipped to the synth model (the truncated head is — synthesisPipeline.ts:141), so the token-budget contract holds, but the synth still has to wait on the file I/O.

One synthesis timeout in the day (synthesis pass timed out; falling through to recursive turn, chatId a74bfecb…, toolName get_codebase_overview, 90 s after expansion completed — the synth inference itself was too slow). The retry inside the same chat then completed in 31 s with files: 0 (no referenced files extracted), suggesting the timeout repeated until reference extraction was empty.

I have no cache-hit statistics to report — the shim does not log cache hits. The codecontext binary itself logs Refreshing analysis for codebase overview… on every call ([MCP] Refreshing analysis… appears for each get_codebase_overview in the sidecar log), so the analysis is rebuilt per call.

C3. Failure modes

Sidecar errors in the last 7 days (docker logs boocode_codecontext --since 168h | grep -E "status=tool_error|content is empty|panic"):

  1. content is empty parser bug — 2026-05-22 17:37:41 and 17:43:41, both against /opt/homelabhealth, on frontend/node_modules/hono/dist/adapter/aws-lambda/types.js. The wrapper's .codecontextignore template installation (codecontext_client.ts:30-52) didn't help because the file is under node_modules which is supposedly in the template. Suggests either the template hadn't been copied yet or the template's ignore list doesn't cover the path. Each failed call cost ~25 s.
  2. Relative-path failures — 2026-05-22 17:56:51 through 17:57:07 (three back-to-back), all get_file_analysis:
    [MCP] ERROR: File not found in graph: apps/server/src/services/inference.ts (available files: 6575)
    
    The wrapper resolves target_dir to an absolute realpath (codecontext_client.ts:80-99) but file_path is forwarded unchanged. The codecontext binary's file index is keyed on absolute paths (the 115,876-symbol overview reports absolute paths). The model passed apps/server/src/services/inference.ts and the binary couldn't find it. Each failure cost 8-24 s.
  3. 30 s rpc_error timeout — 2026-05-22 18:44:10 (get_framework_analysis) and 19:38:06 (search_symbols vs /opt/forks/codecontext). The shim's per-call context timeout is 60 s (shim.go:325) but the wrapper aborts at 30 s (codecontext_client.ts:70), so the client gives up before the shim does — the call still runs to completion on the codecontext side, wasting CPU.
  4. Panic in searchSymbols — concurrent map iteration crash in internal/mcp/server.go:1305 (getFilePathForSymbol) under matchesFramework, captured in docker logs boocode_codecontext --since 24h:
    internal/runtime/maps.fatal(...)
    github.com/nuthan-ms/codecontext/internal/mcp.(*CodeContextMCPServer).getFilePathForSymbol(...)
        /build/codecontext/internal/mcp/server.go:1305
    
    This is an upstream bug in v3.2.1 — concurrent map access without a lock. The shim's callMu serialises its calls but the codecontext binary itself appears to have internal concurrency that hits this.

Pattern: the 2 failed assistant messages in A4 align with the 30 s rpc_error timeout (18:44:10) and one other failure window. Failed turns leave empty content because synthesis aborts before any deltas — the model never sees the codecontext error.

Part D — Plan

D1. Tool surface decisions

Title: Trim agent codecontext exposure to the four tools that earn their keep; demote the rest until evidence justifies them.

Why: A3 shows 4 of 8 codecontext tools have zero observed calls, and watch_changes (a fsnotify-coupled tool) has no LLM consumer. The synthesis pipeline only auto-triggers on three tools (synthesisPipeline.ts:34-38), so removing tools from agent whitelists does not affect the server-side synth path.

Scope: edit /opt/boocode/data/AGENTS.md lines 6, 41, 62, 100, 138 (Code Reviewer, Debugger, Refactorer, Architect, Security Auditor) to drop get_dependencies, get_symbol_info, get_semantic_neighborhoods, watch_changes from each tools: array. Roughly 5 line edits.

Risk: if there's a legitimate workflow not yet captured in 24 h of DB data, dropping these tools removes that affordance. Mitigation: keep them registered in tools.ts (the server-side wrappers stay) so the synth pipeline can still call them if SYNTHESIS_TOOLS expands later, and so the BOOCODE_TOOLS=standard tier continues to expose them via the tier filter. Tests: agents.test.ts, tools.test.ts, any agent-roundtrip tests.

Effort: 30 min.

Sequence: standalone. Unblocks D3 (smaller tool list = smaller system prompt = better prompt-cache stability per tools.ts:629-632).

D2. TypeScript support path

Title: Narrow the TS fork scope to "interfaces, types, enums, top-level typed consts"; defer generics and decorators.

Why: Evidence from B1 (3 TS-targeted calls — all get_file_analysis — and 1 search_symbols framework_type=typescript) shows TS is in the workload but at low volume. Evidence from B2 shows symbol recovery is ~0% for interfaces/types and ~20% for typed consts. That gap is what actually breaks model behaviour: when the model asks get_file_analysis for api.ts (which IS what happened today) it gets 10 noise symbols and no interface Project, interface Session, type SessionStatus. The narrow scope (declarations only; skip generics, JSX, decorators) covers ~90% of the recovered-symbol gap and is achievable with one new dependency and one parser-init change.

Scope:

  1. /opt/forks/codecontext/go.mod: add github.com/tree-sitter/tree-sitter-typescript v0.23.x to the require block.
  2. /opt/forks/codecontext/internal/parser/manager.go:72-79: replace the JS-fallback init with
    typescript "github.com/tree-sitter/tree-sitter-typescript/bindings/go"
    ...
    tsLang := sitter.NewLanguage(typescript.LanguageTypescript())
    m.languages["typescript"] = tsLang
    tsxLang := sitter.NewLanguage(typescript.LanguageTSX())
    m.languages["tsx"] = tsxLang
    
    Plus parser registrations. nodeToSymbolJS already handles interface_declaration and type_alias_declaration (lines 746-765) — no extraction code changes needed for the narrow scope.
  3. /opt/forks/codecontext/internal/parser/manager.go:357-395 detectLanguage (skim verified to live around line 357): ensure .tsx maps to "tsx" not "typescript". Likely already correct — verify.
  4. Tests in internal/parser/ — add TS-grammar fixtures (a small .ts file with interface, type, enum) to assert recovery.
  5. Update /opt/boocode/codecontext/Dockerfile:18-22 to clone from the fork instead of github.com/nmakod/codecontext v3.2.1 once the TS-grammar branch lands. Or PR the change upstream first if nmakod/codecontext is open to it.
  6. Drop the fork's own tree-sitter-javascript dependency? No — tree-sitter-typescript Go binding is separate and the JS grammar is still needed for .js/.jsx files.

Rough LoC: ~20 lines in manager.go, +1 line go.mod, +1 import, +1 language-detect entry; ~50 lines of tests; ~5 lines in Dockerfile.

Risk: TS grammar parses superset syntax; some TS files may now hit ERROR nodes the JS grammar happily accepted. Mitigate by keeping the JS grammar registered for .js/.jsx and not changing JS handling. Regression risk lives in the codecontext-binary CI (JS+TS combined corpus) — verify their existing tests still pass. Tests to add: a fixture file containing each B2 missed symbol and a manager_test that asserts the symbols are recovered.
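
A fixture for the tests described above might look like the following. It is a sketch only: the file name (for example testdata/ts_declarations.ts), every field, and the EXPECTED_SYMBOLS list are invented for illustration. It covers one instance of each declaration kind that B2 found missing, including a typed const of the ToolDef<…> shape.

```typescript
// Hypothetical fixture; names mirror the B2 misses, bodies are invented.
export const PROJECT_STATUSES = ['active', 'archived'] as const;

// type alias: a type_alias_declaration node in the TS grammar
export type SessionStatus = 'open' | 'closed';

// interface: an interface_declaration node in the TS grammar
export interface Project {
  id: string;
  status: (typeof PROJECT_STATUSES)[number];
}

export interface Session {
  id: string;
  projectId: string;
  status: SessionStatus;
}

// enum declaration
export enum PaneKind {
  Chat = 'chat',
  Terminal = 'terminal',
}

// Generic interface plus a typed const, the pattern behind the tools.ts misses.
export interface ToolDef<I> {
  name: string;
  run(input: I): string;
}

export const viewFile: ToolDef<{ path: string }> = {
  name: 'view_file',
  run: (input) => `would read ${input.path}`,
};

// Names a parser test could assert on after the grammar swap.
export const EXPECTED_SYMBOLS: readonly string[] = [
  'PROJECT_STATUSES',
  'SessionStatus',
  'Project',
  'Session',
  'PaneKind',
  'ToolDef',
  'viewFile',
];
```

A manager_test could parse this file and assert that each name in EXPECTED_SYMBOLS appears in the extracted symbol list with the expected symbol type.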

Effort: Phase A (grammar swap + tests + Dockerfile pin): 90 min once a build-and-test loop is set up in the fork.

Sequence: Blocked on a decision about whether to PR upstream (nmakod/codecontext) or fork-and-deploy (nuthan-ms/codecontext). Unblocks D3 (cleaner TS results = smaller noise in synthesis output = smaller token cost).

Decision: Narrow, not "drop" and not "full TS support". Drop is wrong because TS is the workload (A2 + B1 show every agent and the codebase under analysis are TS-heavy). Full Phase 3-4 TS support (generics, decorators, full type queries) is overkill for current usage — interface/type/enum recovery captures the model's actual need.

D3. Synthesis pipeline optimizations

Title: Reduce per-turn codecontext latency and cache the overview.

Why: C2 shows an average of 15.3 s per codecontext call and an overview that rebuilds on every call. Synthesis always pays the 30 s wrapper timeout when the codecontext binary panics (C3 case 4) or hangs.

Three sub-items:

D3a. Cache the overview at the shim layer. The shim already serialises calls under callMu (shim.go:74-77). Add a per-target_dir overview cache keyed on a directory-mtime hash, TTL ~60s. This should give sub-second cache hits for repeated get_codebase_overview calls (today shows ~9 in a single chat over a few minutes).

  • File: /opt/boocode/codecontext/shim.go
  • LoC: ~80
  • Effort: 90 min
  • Risk: invalidation. Use the fastest cheap invalidator (mtime of target_dir + a hash of the file count via os.ReadDir). On any doubt, bypass cache.
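
The cache logic in D3a can be sketched as below. The shim itself is Go; this is written in TypeScript only to keep the sketches in this document in one language, and it shows the invalidation rule rather than a drop-in implementation. OverviewCache, fingerprintOf, and the injected clock are all hypothetical.

```typescript
interface CacheEntry {
  fingerprint: string;
  storedAtMs: number;
  value: string;
}

// Hypothetical fingerprint: directory mtime plus top-level entry count.
function fingerprintOf(mtimeMs: number, entryCount: number): string {
  return `${mtimeMs}:${entryCount}`;
}

class OverviewCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 60_000) {
    this.ttlMs = ttlMs;
  }

  // Return the cached overview only if it is fresh and the directory looks unchanged.
  // On any doubt (expired or fingerprint mismatch) the entry is dropped.
  get(targetDir: string, fingerprint: string, nowMs: number): string | undefined {
    const entry = this.entries.get(targetDir);
    if (entry === undefined) return undefined;
    const expired = nowMs - entry.storedAtMs >= this.ttlMs;
    if (expired || entry.fingerprint !== fingerprint) {
      this.entries.delete(targetDir);
      return undefined;
    }
    return entry.value;
  }

  set(targetDir: string, fingerprint: string, value: string, nowMs: number): void {
    this.entries.set(targetDir, { fingerprint, storedAtMs: nowMs, value });
  }
}
```

Passing the clock in as nowMs keeps the TTL behaviour testable without real waits; a Go version would likely take a similar injected time source.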

D3b. Align wrapper and shim timeouts. Wrapper 30 s (codecontext_client.ts:70), shim ctx 60 s (shim.go:325). The mismatch wastes CPU when the wrapper gives up but the shim keeps running. Either drop the shim ctx to 30 s, or raise the wrapper to 60 s (depending on which budget is right). Recommended: align both to 45 s, abort upstream on wrapper cancel.

  • LoC: 2 lines
  • Effort: 30 min

D3c. Fix the relative-path bug in get_file_analysis. The wrapper resolves target_dir but not file_path. Three failures in one chat today wasted 48 s of CPU. Fix:

  • File: /opt/boocode/apps/server/src/services/tools/codecontext/get_file_analysis.ts (and possibly the shared client at codecontext_client.ts).
  • Have the wrapper resolve file_path against the realpath'd project root before forwarding, mirroring target_dir. Error out if the resolved path doesn't start with the project root.
  • LoC: ~20
  • Effort: 60 min
  • Risk: low — the model loses no affordance; absolute and relative both work.
  • Tests: codecontext_client.test.ts.
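
The file_path fix could take roughly the following shape. This is a sketch under assumptions: resolveFilePath is a hypothetical helper, projectRoot is taken to be already realpath'd by the caller, and path.posix is used so the behaviour is the same on any host. It compares via path.relative rather than a plain startsWith, because a prefix check would wrongly accept a sibling such as /opt/boocode-other.

```typescript
import { posix as path } from 'node:path';

// Resolve file_path against the project root and reject anything that escapes it.
// Assumes projectRoot is an absolute path with symlinks already resolved.
function resolveFilePath(projectRoot: string, filePath: string): string {
  const resolved = path.resolve(projectRoot, filePath);
  const rel = path.relative(projectRoot, resolved);
  const escapes = rel === '..' || rel.startsWith('../') || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`file_path escapes project root: ${filePath}`);
  }
  return resolved;
}

console.log(resolveFilePath('/opt/boocode', 'apps/server/src/services/inference.ts'));
// /opt/boocode/apps/server/src/services/inference.ts
```

Absolute paths inside the root pass through unchanged, so the model keeps both affordances.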

Sequence: D3c is independent and high-ROI. D3a depends on nothing. D3b is independent. Recommended order: D3c → D3b → D3a.

D4. Removal candidates

  1. watch_changes agent exposure (A3 + A2). Server-side handler stays for completeness; it should not appear in agent tools: arrays. Edit /opt/boocode/data/AGENTS.md lines 6, 41, 62, 100, 138.
  2. The dead "csharp" comment-out block in /opt/forks/codecontext/internal/parser/manager.go:146-152 — delete-on-touch when D2 lands; not part of D2's core scope.
  3. The 3 zero-use codecontext tool exposures: get_dependencies, get_symbol_info, get_semantic_neighborhoods. Same surgical edits as item 1. Consider keeping get_dependencies on the Refactorer because the agent description explicitly invokes "Use get_dependencies to map call sites" (AGENTS.md:92-93); if the model isn't using it despite the system-prompt nudge, the description in tools/codecontext/get_dependencies.ts likely needs the same verb-forward rewrite.

Claims I did not verify

  • DB retention horizon. All message_parts rows are dated 2026-05-22. That could mean (a) the DB was wiped today, (b) the schema/path moved today, or (c) the project is brand-new and 24 h is genuinely the full history. The CLAUDE.md project context references "v1.13.15-codecontext-synth" which is recent. To verify: docker exec boocode_db psql -U boocode -d boocode -c "SELECT MIN(created_at), MAX(created_at), COUNT(*) FROM messages;" then cross-check against the BooCode roadmap's release dates. A3's all-time query may simply have no older data to find.
  • Whether nmakod/codecontext v3.2.1 hosts the same nodeToSymbolJS switch I read in the fork. The fork at /opt/forks/codecontext is nuthan-ms/codecontext per go.mod. The deployed v3.2.1 is nmakod/codecontext. The Dockerfile comment (/opt/boocode/codecontext/Dockerfile:13-16) says the module path differs but "the tagged v3.2.1 source tree is the same either way." To verify, clone https://github.com/nmakod/codecontext at tag v3.2.1 and diff internal/parser/manager.go against the fork — outside this recon's read-only scope.
  • Whether tree-sitter-typescript v0.23.x Go bindings actually build under the fork's go 1.24.5 + Tree-sitter v0.25.0 combination. Context7 docs confirm the API exists. Confirm by go get github.com/tree-sitter/tree-sitter-typescript@latest followed by go build ./... in a scratch worktree.
  • Whether the codecontext panic in searchSymbols is reproducible on /opt/boocode or only on /opt/forks/codecontext (the panic was captured against target_dir /opt/forks/codecontext). Reproduce via docker exec boocode_codecontext wget -qO - --post-data='{"target_dir":"/opt/boocode","query":"foo","limit":10}' --header='Content-Type: application/json' http://localhost:8080/v1/search_symbols.
  • Cache hit rate of codecontext analysis (per call vs reused). The MCP-server log line Refreshing analysis for codebase overview… suggests rebuild-every-call, but I did not confirm by reading the codecontext source — only the deployed binary's log output. To verify, read /opt/forks/codecontext/internal/mcp/server.go around the Refreshing analysis… log lines.
  • Drift correlation strength. N=1 confirmed drift case is too small to call a correlation with codecontext use. To raise the signal: extend retention, re-query after a week of synthetic load with and without codecontext tools.
  • Whether the synth pipeline's truncated head only ships fewer tokens than a full inlined codecontext result would. Today's budget contract assumes yes (synthesisPipeline.ts:138-145 comment "Truncated head only — full content was used for reference extraction above"). To verify: instrument the per-pass promptTokens and compare against a one-off pass with the full content.
  • The Refactorer/Architect agents' system-prompt copy versus actual tool usage. AGENTS.md text claims agents will "Use get_dependencies to map call sites" (line 92, in the Refactorer block) and "Use get_semantic_neighborhoods to find related components" (line 132, in the Architect block), but A3 shows neither is called. To verify whether the model is ignoring the prompt or whether these agents simply aren't being invoked, query SELECT s.name, COUNT(*) FROM sessions s JOIN chats c ON c.session_id=s.id JOIN messages m ON m.chat_id=c.id WHERE m.role='assistant' GROUP BY 1 ORDER BY 2 DESC; and compare named agents to chat counts.