# Codecontext + TypeScript: recon and plan
**Date:** 2026-05-22
**Author:** read-only recon, evidence-first
## Part A — Current codecontext usage in BooCode
### A1. Server-side synthesis pipeline
BooCode runs a **forced second-inference synthesis pass** after a model
emits any of three codecontext tool calls. The list is hard-coded:
`/opt/boocode/apps/server/src/services/synthesisPipeline.ts:34-38`
```ts
export const SYNTHESIS_TOOLS: ReadonlySet<string> = new Set([
'get_codebase_overview',
'get_framework_analysis',
'get_semantic_neighborhoods',
]);
```
The pipeline is triggered from the tool-phase, not by the model:
`/opt/boocode/apps/server/src/services/inference/tool-phase.ts:200-279`.
After tool-phase records the tool_call/tool_result rows it picks the first
synth-eligible entry, expands the inline-truncated head via tmpfs
(`readTruncation`), pulls top-N referenced files + project docs
(BOOCHAT.md, AGENTS.md, CONTEXT.md, *roadmap*.md), token-budgets to
32k chars/4 (`synthesisPipeline.ts:45-46`), streams a second model
inference with a 90s timeout (`synthesisPipeline.ts:50`), and either
emits a `kind='synthesis'` message-part or falls through to the
recursive turn on failure (`synthesisPipeline.ts:250-272`).
The pipeline is **invoked once per turn that contains a SYNTHESIS_TOOLS
call** — at most one synthesis pass per turn (the loop picks the first
synth-eligible entry, `tool-phase.ts:256`).
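The selection rule above (first synth-eligible entry in call order, so at most one pass per turn) can be sketched as follows. This is a minimal illustration, not the code in `tool-phase.ts`. The `ToolCallRecord` shape and the `pickSynthesisCandidate` name are hypothetical, and only the tool list is copied from `synthesisPipeline.ts:34-38`. The sketch does not model whether an errored tool result still counts as eligible.

```typescript
// Copied from synthesisPipeline.ts:34-38 (quoted above).
export const SYNTHESIS_TOOLS: ReadonlySet<string> = new Set([
  'get_codebase_overview',
  'get_framework_analysis',
  'get_semantic_neighborhoods',
]);

// Hypothetical shape of one recorded tool_call/tool_result pair.
export interface ToolCallRecord {
  callId: string;
  name: string;
}

// Returns the first synth-eligible record in call order, or undefined.
// Later eligible records in the same turn are ignored, which is what
// caps synthesis at one pass per turn.
export function pickSynthesisCandidate(
  records: readonly ToolCallRecord[],
): ToolCallRecord | undefined {
  return records.find((r) => SYNTHESIS_TOOLS.has(r.name));
}
```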
The codecontext tools themselves are HTTP wrappers over the sidecar:
`/opt/boocode/codecontext/shim.go:412-419` registers eight POST routes
(including `/v1/get_codebase_overview` and `/v1/get_framework_analysis`). The shim
serialises calls under `callMu` and forwards JSON-RPC to a single
`codecontext mcp` child (`shim.go:194`, `shim.go:328-333`). The child
binary is built from `github.com/nmakod/codecontext` tag `v3.2.1`
(`/opt/boocode/codecontext/Dockerfile:18-22`), NOT from the local fork at
`/opt/forks/codecontext` (which is `github.com/nuthan-ms/codecontext`,
fork go.mod: `/opt/forks/codecontext/go.mod:1`). Container reports
`codecontext version dev` (recon: `docker exec boocode_codecontext
codecontext --version` returned `codecontext version dev / Build Date:
unknown / Git Commit: unknown`).
Wrapper boundaries:
- `/opt/boocode/apps/server/src/services/codecontext_client.ts:68-70`
hard timeout `REQUEST_TIMEOUT_MS = 30_000`, inline truncation
`TRUNCATION_LIMIT = 32_000`.
- Same file lines 80-95: realpath project + target_dir, reject any
target_dir that escapes the project root. The eight wrappers never
pass `target_dir` (`callCodecontext` injects it server-side, line 99).
- Lines 130-141 surface the upstream "content is empty" parser bug
(issue #37) with an actionable hint pointing at `.codecontextignore`.
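A minimal sketch of the inline-truncation boundary, assuming the wrapper keeps the first `TRUNCATION_LIMIT` characters inline and leaves the full text for later re-expansion (A1: synthesis reads it back from tmpfs via `readTruncation`). `truncateInline` and `InlineResult` are hypothetical names. C2 reports a 32,078-char head rather than exactly 32,000, so the real wrapper may add framing that this sketch omits.

```typescript
// Value from codecontext_client.ts:68-70.
export const TRUNCATION_LIMIT = 32_000;

export interface InlineResult {
  head: string;       // text inlined into the tool_result
  truncated: boolean; // true when the input exceeded the limit
  fullChars: number;  // length of the untruncated text
}

// Keeps at most `limit` UTF-16 code units of `text` inline.
export function truncateInline(text: string, limit: number = TRUNCATION_LIMIT): InlineResult {
  if (text.length <= limit) {
    return { head: text, truncated: false, fullChars: text.length };
  }
  return { head: text.slice(0, limit), truncated: true, fullChars: text.length };
}
```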
### A2. Agent-exposed tool surface
Source of truth: `/opt/boocode/data/AGENTS.md` (six agents) plus the
`DEFAULT_TOOLS` fallback in
`/opt/boocode/apps/server/src/services/agents.ts:19-20` (every tool in
`ALL_TOOLS`).
Per-agent codecontext exposure (cited from
`/opt/boocode/data/AGENTS.md:6,41,62,100,138,179`):
| Agent | Codecontext tools exposed |
|---|---|
| Code Reviewer (line 3) | get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, search_symbols, watch_changes |
| Debugger (line 38) | same eight |
| Refactorer (line 59) | same eight |
| Architect (line 97) | same eight |
| Security Auditor (line 135) | same eight |
| Prompt Builder (line 176) | **none** (`tools: [view_file, list_dir, grep, find_files]`) |
Every project-less or no-agent chat falls back to `DEFAULT_TOOLS` =
`ALL_TOOLS` (all 21 tools including the eight codecontext ones)
(`agents.ts:19-20,196`). The `BOOCODE_TOOLS` env var can narrow further
via `resolveToolTier()` (`tools.ts:712-732`): `core` (4 tools, no
codecontext) / `standard` (16, all eight codecontext) / `all` (21).
`STANDARD_TOOL_NAMES` includes all eight codecontext tools
(`tools.ts:719-732`).
The eight codecontext tool registrations live in `tools.ts:653-660` and
are all marked read-only in `READ_ONLY_TOOL_NAMES` (`tools.ts:689-696`).
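How the `DEFAULT_TOOLS` fallback and the tier filter combine can be sketched as below. This assumes, based only on the wording "can narrow further", that the tier is applied as an intersection on top of the agent's list (or on top of `ALL_TOOLS` for project-less and no-agent chats). `effectiveTools` is a hypothetical helper, not the code in `agents.ts` or `tools.ts`.

```typescript
// Hypothetical helper: the composition order is an assumption.
export function effectiveTools(
  agentTools: readonly string[] | undefined,
  allTools: readonly string[],
  tierTools: readonly string[] | undefined,
): string[] {
  // Project-less / no-agent chats fall back to DEFAULT_TOOLS (= ALL_TOOLS).
  const base = agentTools ?? allTools;
  // No BOOCODE_TOOLS tier set: nothing to narrow.
  if (tierTools === undefined) return [...base];
  // Keep only tools that the tier also allows, preserving base order.
  const allowed = new Set(tierTools);
  return base.filter((t) => allowed.has(t));
}
```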
### A3. Actual usage (DB)
Tool-call frequency from `message_parts` (all-time; DB only has data
back to 2026-05-22 today — see "Claims I did not verify" for the
retention question):
Query: `SELECT payload->>'name', COUNT(*) FROM message_parts WHERE
kind='tool_call' GROUP BY 1 ORDER BY 2 DESC`
| Tool | Calls | Chats |
|---|---:|---:|
| view_file | 129 | — |
| grep | 81 | — |
| list_dir | 78 | — |
| find_files | 25 | — |
| **get_codebase_overview** | **24** | 23 |
| **search_symbols** | **8** | 5 |
| ask_user_input | 5 | 3 |
| `foo` (typo/invalid) | 4 | 2 |
| view_truncated_output | 4 | 2 |
| git_status | 3 | 2 |
| **get_file_analysis** | **3** | 1 |
| **get_framework_analysis** | **1** | 1 |
| `([^` (typo/invalid) | 1 | 1 |
Codecontext-tool calls observed: **only 4 of 8** ever invoked
(`get_codebase_overview`, `search_symbols`, `get_file_analysis`,
`get_framework_analysis`).
**Never called** (in the recorded window): `get_dependencies`,
`get_symbol_info`, `get_semantic_neighborhoods`, `watch_changes`.
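The "never called" list is a set difference between the eight exposed tools (A2) and the tools with a non-zero count in the table above. A small sketch with a hypothetical `neverCalled` helper, fed with the A3 counts:

```typescript
// The eight codecontext tools, in the order used by the A2 table.
export const CODECONTEXT_TOOLS = [
  'get_codebase_overview', 'get_dependencies', 'get_file_analysis', 'get_framework_analysis',
  'get_semantic_neighborhoods', 'get_symbol_info', 'search_symbols', 'watch_changes',
] as const;

// Exposed tools with no recorded call; a missing key counts as zero calls.
export function neverCalled(
  exposed: readonly string[],
  callCounts: Readonly<Record<string, number>>,
): string[] {
  return exposed.filter((tool) => (callCounts[tool] ?? 0) === 0);
}

// Codecontext rows from the A3 table.
const a3Counts: Record<string, number> = {
  get_codebase_overview: 24, search_symbols: 8, get_file_analysis: 3, get_framework_analysis: 1,
};
console.log(neverCalled(CODECONTEXT_TOOLS, a3Counts).join(', '));
// → get_dependencies, get_semantic_neighborhoods, get_symbol_info, watch_changes
```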
Per-call args sample (`mp.created_at` desc, last 12 calls;
recon-verified by query against message_parts):
- `get_codebase_overview` invoked ~9 times in a row with
`{"include_stats":true}` — repeated overview fetches within minutes.
- `search_symbols` examples: `{"limit":20,"query":"Kind"}`,
`{"limit":20,"query":"SymbolKind"}`,
`{"limit":20,"query":"Kind","framework_type":"typescript"}`.
- `get_file_analysis` invoked 3 times in one chat with
`file_path` = `apps/server/src/services/inference.ts`,
`apps/server/src/services/inference/parts.ts`,
`apps/server/src/services/system-prompt.ts`; **all three failed**
with "File not found in graph" (see C3).
### A4. Hang and drift correlation
**Cohort analysis** (query against `messages` joined to chats that
ever used any codecontext tool):
| Cohort | status | rows |
|---|---|---:|
| no_codecontext | complete | 24 |
| no_codecontext | cancelled | 1 |
| used_codecontext | complete | 191 |
| used_codecontext | streaming | 2 |
| used_codecontext | **failed** | **2** |
Two failed assistant messages, both in chats that used codecontext.
Both have empty `content` — characteristic of a synth pass that aborted
before any deltas streamed (see `synthesisPipeline.ts:278-303`,
`markSynthFailed`). DB query:
```sql
SELECT id, status, created_at, LEFT(content,200)
FROM messages WHERE role='assistant' AND status IN ('failed','streaming')
```
returned two `failed` rows with empty content at 2026-05-22 18:43:39 and
2026-05-22 19:59:56. The 18:43 failure correlates with the codecontext
sidecar log line `2026/05/22 18:44:10.842554 get_framework_analysis
target_dir=/opt/boocode duration_ms=30002 status=rpc_error` — a 30 s
timeout (`codecontext_client.ts:70`) under a `get_framework_analysis`
call (`synthesisPipeline.ts:34-38` would have triggered synthesis on
success — failure path skipped synthesis and surfaced the error).
**Drift / format leakage:** the query
`SELECT * FROM messages WHERE role='assistant' AND (content LIKE
'%<invoke%' OR content LIKE '%<tool_call%')` returned 8 rows; manual
review showed 7 are recon/discussion content where the model is
quoting `<invoke>` as a *topic*, not actually emitting a tool call as
text. **One real drift case** at 2026-05-22 19:05:03 — content begins
"I need to investigate the codecontext fork to write this design
document. Let me start by reading the key files.\n\n<invoke
name=\"read_file\">…" — an Anthropic-format leak. This message is in a
chat that did use codecontext, but the drift evidence is too thin
(n=1) to claim a correlation.
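Separating the seven quoted mentions from the one real leak could be automated by ignoring markup inside code spans and fences and flagging only a tag that opens a line, as in the leaked message above. The sketch below is a hypothetical heuristic: the `looksLikeToolCallLeak` name, the regexes, and the line-start rule are all assumptions, and it has not been run against the eight rows. A prose line that happens to start with an unquoted `<invoke>` would still be a false positive.

```typescript
// Removes fenced code blocks and inline code spans before scanning.
function stripCode(markdown: string): string {
  return markdown
    .replace(/`{3}[\s\S]*?`{3}/g, '') // fenced blocks
    .replace(/`[^`\n]*`/g, '');       // inline spans
}

// Flags content where a tool-call tag starts a line outside code,
// e.g. "...key files.\n\n<invoke name=\"read_file\">".
export function looksLikeToolCallLeak(content: string): boolean {
  return /^\s*<(invoke|tool_call)\b/m.test(stripCode(content));
}
```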
## Part B — TypeScript parsing gap
### B1. TS-targeted workload
Per-language breakdown of codecontext calls that target a specific
file or framework (DB query):
| Language hint | Calls |
|---|---:|
| no file_path (overview/framework/symbol search) | 33 |
| ts/tsx | 3 |
| (no other extension observed) | — |
The three TS-targeted calls were all `get_file_analysis` in a single
chat: `inference.ts`, `inference/parts.ts`, `system-prompt.ts`. **All
three failed** with `File not found in graph` (see C3 — relative path
mishandling). One `search_symbols` call carried
`framework_type=typescript` (query `"Kind"`).
So **TS is the actual workload** for narrow codecontext use; the rest
is whole-repo overview/framework analysis with no specific language
filter.
### B2. Symbol recovery quality
I called the live container against three load-bearing BooCode TS files
and compared the symbol list against a manual grep of top-level
declarations.
**File 1: `/opt/boocode/apps/server/src/types/api.ts` (371 lines)**
Manual count (grep `^(export )?(interface|type|const) `):
- interfaces: 36
- top-level types: 15
- top-level consts: 5
- total significant: 56
Codecontext output (live HTTP call to
`http://codecontext:8080/v1/get_file_analysis`):
```json
{
"result": "# File Analysis: ...\n**Lines:** 372\n**Symbols:** 10\n\n## Symbols\n\n- **PROJECT_STATUSES** () - Line 2\n- **PROJECT_STATUSES** () - Line 2\n- **CHAT_STATUSES** () - Line 91\n..."
}
```
Total reported: 10 symbols, all five `*_STATUSES` consts duplicated
(line 2 appears twice, etc.). After regex-extracting names:
- Unique symbols reported by codecontext: 8 (5 `*_STATUSES` consts + 3
header strings `Language:`/`Lines:`/`Symbols:`)
- Interfaces / types found: **0 of 51**.
- Symbol-recovery rate: **5/56 = ~9%** (only the const arrays the JS
grammar understands).
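The manual count and recovery rate above can be reproduced with a short script. This sketch uses the same line-anchored pattern as the grep, with a name capture added. `topLevelDecls` and `recoveryRate` are hypothetical helpers, and, like the grep, the pattern misses indented or unusually formatted declarations.

```typescript
export interface Decl {
  kind: string; // 'interface' | 'type' | 'const'
  name: string;
}

// Same shape as `grep ^(export )?(interface|type|const) `, plus a name capture.
export function topLevelDecls(source: string): Decl[] {
  // Fresh regex per call so the global flag's lastIndex never leaks between calls.
  const re = /^(?:export )?(interface|type|const) ([A-Za-z_$][\w$]*)/gm;
  const decls: Decl[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    decls.push({ kind: m[1], name: m[2] });
  }
  return decls;
}

// Fraction of expected names that appear at least once in `reported`.
// Duplicates (e.g. the doubled *_STATUSES consts) count once.
export function recoveryRate(expected: readonly string[], reported: readonly string[]): number {
  if (expected.length === 0) return 1;
  const got = new Set(reported);
  const hits = expected.filter((name) => got.has(name)).length;
  return hits / expected.length;
}
```

With 56 expected names and 5 of them reported, `recoveryRate` gives 5/56 ≈ 0.089, matching the ~9% figure above.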
Specific misses checked against the actual file
(grep -nE on `/opt/boocode/apps/server/src/types/api.ts`):
- Line 5 `export interface Project` — MISSED
- Line 26 `export type SessionStatus` — MISSED
- Line 28 `export interface Session` — MISSED
- Line 47 `export type WorkspacePaneKind` — MISSED
- All 36 interface declarations and 15 type aliases — MISSED.
**File 2: `/opt/boocode/apps/server/src/services/tools.ts` (763 lines)**
Manual count: 47 top-level decls
(grep `^(export )?(interface|type|enum|namespace|const|function|class|async function) `).
Codecontext output: **112 symbols** reported (but many are noise:
local function-scope variables, the literal token `"unknown"` from
type cast positions, even raw labels like `out:`).
Python-extracted from result: 71 unique names. Cross-checked against
20 significant TS exports the file declares:
- Found: `ListDirInput`, `READ_ONLY_TOOL_NAMES`, `CORE_TOOL_NAMES`,
`STANDARD_TOOL_NAMES` (4 / 20)
- **MISSED: `ToolDef`, `ViewFileInput`, `viewFile`, `listDir`, `grep`,
`findFiles`, `viewTruncatedOutput`, `gitStatus`, `skillFind`,
`skillUse`, `skillResource`, `askUserInput`, `ALL_TOOLS`,
`TOOLS_BY_NAME`, `resolveToolTier`, `toolJsonSchemas`** — every
exported `ToolDef<…>` named constant is missed because the JS
grammar can't parse the TS type annotation `: ToolDef<…>` that
precedes the `=` and bails out of recognising the const at
top-level.
- Symbol-recovery rate (significant): **4/20 = 20%**.
**File 3: `/opt/boocode/apps/server/src/services/inference/stream-phase.ts` (482 lines)**
Manual count: 5 top-level decls (2 are `export async function`,
1 interface, 1 type, 1 const).
Codecontext output: 53 symbols extracted, but the first 20 are header
strings (`Language:`, `Lines:`, `Symbols:`), imports (`api.js`,
`model-context.js`, …), local function names from inside bodies
(`toolNameById`, `out:`, `hasTools`), and string literals
(`parts:`). Neither `streamCompletion` nor `executeStreamPhase` (the
two `export async function` declarations at lines 145, 346) appear in
the symbol list explicitly.
**Aggregate:** across the three files, codecontext recovers
type/interface/enum symbols at effectively **0%**, and function/const
symbols at roughly **20%**. The 9596-symbol whole-repo overview is
heavily noise-padded. Generic type parameters and decorators were not
checked individually because they're a strict subset of the
already-broken case.
### B3. Fork status
**`docs/ts-bindings-design.md` does NOT exist.** Verified by
`ls /opt/forks/codecontext/docs/ts-bindings-design.md` → `No such file
or directory`. The `/opt/forks/codecontext/docs/` tree has 23 markdown
files; none mention TypeScript bindings work (greps under
`/opt/forks/codecontext/docs/` for `TypescriptLanguage|tree-sitter-tsx`
returned nothing beyond a CodeContext example in `HLD.md:831` and
config mentions in `ARCHITECTURE.md:297`).
**go.mod dependencies (`/opt/forks/codecontext/go.mod:5-18`):**
- `github.com/tree-sitter/tree-sitter-javascript v0.23.1` (present)
- `github.com/tree-sitter/tree-sitter-typescript` (**NOT present**).
**TS-as-JS fallback in `internal/parser/manager.go:72-79`:**
```go
// TypeScript - use JavaScript grammar as fallback until TypeScript bindings are fixed
// Both JS and TS have similar syntax and this provides basic parsing capability
tsLang := sitter.NewLanguage(javascript.Language())
m.languages["typescript"] = tsLang
tsParser := sitter.NewParser()
tsParser.SetLanguage(tsLang)
m.parsers["typescript"] = tsParser
```
The comment claims this provides "basic parsing capability". B2 shows
that interface/type recovery is effectively zero — the JS grammar does
not recognise `interface`, `type`, generic params, decorators, or even
TS-typed const declarations.
**Downstream code IS prepared for TS-specific nodes.** In
`internal/parser/manager.go:746-765` `nodeToSymbolJS` already has
cases for `interface_declaration` and `type_alias_declaration`:
```go
case "interface_declaration", "interface":
return &types.Symbol{Type: types.SymbolTypeInterface, ...}
case "type_alias_declaration", "type_declaration":
return &types.Symbol{Type: types.SymbolTypeType, ...}
```
These cases are dead code with the JS grammar — they only fire when
the parser is the TypeScript grammar. The fork already has the symbol
extraction wiring; it's just missing the grammar.
**`SymbolType` is open (string), not an iota** —
`/opt/forks/codecontext/pkg/types/graph.go:14`:
```go
type SymbolType string
```
with constants like `SymbolTypeInterface`, `SymbolTypeType`,
`SymbolTypeNamespace` already declared (`graph.go:16-48`). No code
changes needed there to add TS-aware symbol types.
**Upstream `tree-sitter-typescript` Go bindings exist.** Context7 docs
for `/tree-sitter/tree-sitter-typescript` show the Go package
`github.com/tree-sitter/tree-sitter-typescript` exporting
`LanguageTypescript()` and `LanguageTSX()`:
```go
typescript := sitter.NewLanguage(tree_sitter_typescript.LanguageTypescript())
tsx := sitter.NewLanguage(tree_sitter_typescript.LanguageTSX())
```
(Context7 query `/tree-sitter/tree-sitter-typescript`,
"Go bindings package name and how to import…", returned a working
sample.)
**The fork (`/opt/forks/codecontext`) is not what runs in production.**
The deployed image is built from `github.com/nmakod/codecontext` tag
v3.2.1 (`/opt/boocode/codecontext/Dockerfile:18-22`). The fork is a
separate working tree at `/opt/forks/codecontext` on
`github.com/nuthan-ms/codecontext` (`/opt/forks/codecontext/go.mod:1`).
Any TS-grammar work landing in either repo requires a Dockerfile
update to point at the right source.
**Fork HEAD:** `ba6b94c 2025-09-01 12:43:09 +0530 Merge pull request
#29 from nmakod/release-please--branches--main` — newer than the
deployed v3.2.1 tag but on the same upstream lineage.
### B4. Existing TS-aware alternatives
Searches in `/opt/boocode`:
- `grep -rln 'ts-morph|@typescript/vfs|createCompilerHost'
/opt/boocode/apps` → **no matches** in source (only types).
- Only the `typescript` package is depended on
(`/opt/boocode/package.json`, `/opt/boocode/apps/booterm/package.json`,
`/opt/boocode/apps/server/package.json`,
`/opt/boocode/apps/web/package.json` — each declares
`"typescript": "^5.5.0"`). That's the tsc compiler, used for
building, not for runtime symbol extraction.
- No tool in `/opt/boocode/apps/server/src` parses TS at runtime for
any reason other than what codecontext provides.
So BooCode has **no existing fallback** for TS symbol data: if
codecontext can't extract it, nobody else does.
## Part C — Optimization opportunities
### C1. Tool surface review
Cross-referencing the agent whitelist (A2) with actual usage (A3):
| Tool | Exposed to 5 agents? | Calls observed | Recommendation |
|---|---|---:|---|
| get_codebase_overview | yes | 24 | **Keep** — load-bearing, synth-triggering |
| search_symbols | yes | 8 | **Keep** — only viable TS query path |
| get_file_analysis | yes | 3 | **Keep** but fix relative-path bug (C3) |
| get_framework_analysis | yes | 1 | Low-use; **keep** for synth signalling |
| get_dependencies | yes | **0** | **Demote** — unused, considered for removal |
| get_symbol_info | yes | **0** | **Demote** — unused, considered for removal |
| get_semantic_neighborhoods | yes | **0** | **Demote** — unused, considered for removal |
| watch_changes | yes | **0** | **Remove** from agent whitelists (not in `SYNTHESIS_TOOLS`, so synthesis is unaffected) |
`watch_changes` in particular is a state-changing async tool with no
sensible LLM consumer (the model can't await fsnotify events). It
should not be in the 5 agents' whitelists; the synthesis pipeline only
calls 3 specific tools (`synthesisPipeline.ts:34-38`) so removing
`watch_changes` from agent whitelists does not affect the pipeline.
`get_dependencies`, `get_symbol_info`, `get_semantic_neighborhoods`
are credible tools but the model never reaches for them — likely a
descriptions/discoverability issue. Either improve their tool
descriptions (the `.description` strings registered in
`tools/codecontext/*.ts`) or remove them from agent whitelists.
### C2. Latency and token cost
Latencies parsed from the codecontext sidecar access log
(`docker logs boocode_codecontext --since 24h | grep duration_ms=`):
- Total calls observed: 40 in 24h
- Total time: 610,404 ms
- Avg: **15,260 ms per call**
- Min: 1,379 ms
- p50: 9,417 ms
- p90: 27,611 ms
- Max: 30,002 ms (= the 30 s rpc_error timeout)
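These figures come from the `duration_ms=` values in the sidecar log. A sketch of the aggregation, assuming one `duration_ms=<integer>` token per log line and nearest-rank percentiles; the recon does not say which percentile method it used, so p50/p90 computed this way may differ slightly from the numbers quoted. `parseDurations` and `latencyStats` are hypothetical names.

```typescript
export interface LatencyStats {
  count: number;
  totalMs: number;
  avgMs: number;
  minMs: number;
  p50Ms: number;
  p90Ms: number;
  maxMs: number;
}

// Pulls every duration_ms=<int> value out of raw sidecar log lines.
export function parseDurations(lines: readonly string[]): number[] {
  const out: number[] = [];
  for (const line of lines) {
    const m = /duration_ms=(\d+)/.exec(line);
    if (m) out.push(Number(m[1]));
  }
  return out;
}

// Nearest-rank percentile on an ascending-sorted, non-empty array.
function percentile(sorted: readonly number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

export function latencyStats(durations: readonly number[]): LatencyStats | undefined {
  if (durations.length === 0) return undefined;
  const sorted = [...durations].sort((a, b) => a - b);
  const totalMs = sorted.reduce((sum, d) => sum + d, 0);
  return {
    count: sorted.length,
    totalMs,
    avgMs: totalMs / sorted.length,
    minMs: sorted[0],
    p50Ms: percentile(sorted, 50),
    p90Ms: percentile(sorted, 90),
    maxMs: sorted[sorted.length - 1],
  };
}
```

Usage: split the output of `docker logs boocode_codecontext --since 24h` on newlines and pass it to `parseDurations`; lines without a `duration_ms=` token are skipped.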
Sampled MCP-server log lines confirm overview rebuilds cost 28 s on
/opt/boocode (`6575 files, 115601 symbols, 1186758 chars markdown`
in 8.22 s). The shim's per-tool log shows the analysis dominates;
markdown serialization is sub-second.
**Synthesis pipeline expansion** (from `docker logs boocode`):
Five completed synthesis passes today, sample sizes:
- `originalChars` (truncated head shipped to synth): **32,078** in
every case (= the wrapper's 32 kB cap).
- `fullChars` (full overview after re-expansion from tmpfs): 83,406 /
83,408 / 83,410 / 97,283 / 97,464.
In other words, every overview is over the wrapper cap and synthesis
always pays a tmpfs round-trip to recover the full content for
reference-file extraction. The full content is *not* shipped to the
synth model (the truncated head is — `synthesisPipeline.ts:141`), so
the token-budget contract holds, but the synth still has to wait on
the file I/O.
One synthesis timeout in the day (`synthesis pass timed out; falling
through to recursive turn`, chatId a74bfecb…, toolName
get_codebase_overview, 90 s after expansion completed — the synth
inference itself was too slow). The retry inside the same chat then
completed in 31 s with `files: 0` (no referenced files extracted),
suggesting the timeout repeated until reference extraction was
empty.
I have no cache-hit statistics to report — the shim does not log
cache hits. The codecontext binary itself logs `Refreshing analysis
for codebase overview…` on every call (`[MCP] Refreshing analysis…`
appears for each `get_codebase_overview` in the sidecar log), so the
analysis is rebuilt per call.
### C3. Failure modes
Sidecar errors in the last 7 days
(`docker logs boocode_codecontext --since 168h | grep -E
"status=tool_error|content is empty|panic"`):
1. **`content is empty` parser bug** — 2026-05-22 17:37:41 and
17:43:41, both against `/opt/homelabhealth`, on
`frontend/node_modules/hono/dist/adapter/aws-lambda/types.js`.
The wrapper's `.codecontextignore` template installation
(`codecontext_client.ts:30-52`) didn't help, even though the file is
under `node_modules`, which the template supposedly covers. This suggests
either the template hadn't been copied yet or the template's
ignore list doesn't cover the path. Each failed call cost ~25 s.
2. **Relative-path failures** — 2026-05-22 17:56:51 through 17:57:07
(three back-to-back), all `get_file_analysis`:
```
[MCP] ERROR: File not found in graph: apps/server/src/services/inference.ts (available files: 6575)
```
The wrapper resolves `target_dir` to an absolute realpath
(`codecontext_client.ts:80-99`) but `file_path` is forwarded
unchanged. The codecontext binary's file index is keyed on
absolute paths (the 115,876-symbol overview reports absolute
paths). The model passed `apps/server/src/services/inference.ts`
and the binary couldn't find it. Each failure cost 8–24 s.
3. **30 s rpc_error timeout** — 2026-05-22 18:44:10
(get_framework_analysis) and 19:38:06 (search_symbols vs
/opt/forks/codecontext). The shim's per-call context timeout is
60 s (`shim.go:325`) but the wrapper aborts at 30 s
(`codecontext_client.ts:70`), so the client gives up before the
shim does — the call still runs to completion on the codecontext
side, wasting CPU.
4. **Panic in `searchSymbols`** — concurrent map iteration crash in
`internal/mcp/server.go:1305` (`getFilePathForSymbol`) under
`matchesFramework`, captured in
`docker logs boocode_codecontext --since 24h`:
```
internal/runtime/maps.fatal(...)
github.com/nuthan-ms/codecontext/internal/mcp.(*CodeContextMCPServer).getFilePathForSymbol(...)
/build/codecontext/internal/mcp/server.go:1305
```
This is an upstream bug in v3.2.1 — concurrent map access without
a lock. The shim's `callMu` serialises *its* calls but the
codecontext binary itself appears to have internal concurrency
that hits this.
**Pattern:** the 2 failed assistant messages in A4 align with the 30 s
rpc_error timeout (18:44:10) and one other failure window. Failed
turns leave empty `content` because synthesis aborts before any
deltas — the model never sees the codecontext error.
## Part D — Plan
### D1. Tool surface decisions
**Title:** Trim agent codecontext exposure to the four tools that earn
their keep; demote the rest until evidence justifies them.
**Why:** A3 shows 4 of 8 codecontext tools have zero observed calls,
and `watch_changes` (a fsnotify-coupled tool) has no LLM consumer.
The synthesis pipeline only auto-triggers on three tools
(`synthesisPipeline.ts:34-38`), so removing tools from agent
whitelists does not affect the server-side synth path.
**Scope:** edit `/opt/boocode/data/AGENTS.md` lines 6, 41, 62, 100,
138 (Code Reviewer, Debugger, Refactorer, Architect, Security
Auditor) to drop `get_dependencies`, `get_symbol_info`,
`get_semantic_neighborhoods`, `watch_changes` from each `tools:`
array. Roughly 5 line edits.
**Risk:** if there's a legitimate workflow not yet captured in 24 h
of DB data, dropping these tools removes that affordance. Mitigation:
keep them registered in `tools.ts` (the server-side wrappers stay) so
the synth pipeline can still call them if `SYNTHESIS_TOOLS` expands
later, and so the `BOOCODE_TOOLS=standard` tier continues to expose
them via the tier filter. Tests: `agents.test.ts`, `tools.test.ts`,
any agent-roundtrip tests.
**Effort:** 30 min.
**Sequence:** standalone. Unblocks D3 (smaller tool list = smaller
system prompt = better prompt-cache stability per `tools.ts:629-632`).
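The per-agent edit is mechanical. As a sketch (hypothetical `trimAgentTools` helper that works on an already-parsed `tools:` list; it does not parse or rewrite `AGENTS.md`), the rule including the optional Refactorer exception from D4 item 3 is:

```typescript
// Tools D1 proposes to drop from the five agents' `tools:` arrays.
export const DEMOTED_TOOLS: ReadonlySet<string> = new Set([
  'get_dependencies',
  'get_symbol_info',
  'get_semantic_neighborhoods',
  'watch_changes',
]);

// Returns a new list without demoted tools, except those the caller keeps
// explicitly (e.g. get_dependencies on the Refactorer).
export function trimAgentTools(tools: readonly string[], keep: readonly string[] = []): string[] {
  const kept = new Set(keep);
  return tools.filter((t) => !DEMOTED_TOOLS.has(t) || kept.has(t));
}
```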
### D2. TypeScript support path
**Title:** Narrow the TS fork scope to "interfaces, types, enums, top-level
typed consts", deferring generics and decorators.
**Why:** Evidence from B1 (3 TS-targeted calls — all
`get_file_analysis` — and 1 `search_symbols framework_type=typescript`)
shows TS is in the workload but at low volume. Evidence from B2
shows symbol recovery is **~0% for interfaces/types and ~20% for
typed consts**. That gap is what actually breaks model behaviour:
when `get_file_analysis` is asked about a file like `api.ts` (B2
reproduced this call) it returns 10 symbols, five consts listed twice each, and no `interface Project`,
`interface Session`, `type SessionStatus`. The narrow scope
(declarations only; skip generics, JSX, decorators) covers ~90% of
the recovered-symbol gap and is achievable with one new dependency
and one parser-init change.
**Scope:**
1. `/opt/forks/codecontext/go.mod`: add
`github.com/tree-sitter/tree-sitter-typescript v0.23.x` to the
`require` block.
2. `/opt/forks/codecontext/internal/parser/manager.go:72-79`:
replace the JS-fallback init with
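```go
import typescript "github.com/tree-sitter/tree-sitter-typescript/bindings/go"
...
// Real TS/TSX grammars replace the JS fallback; the JS grammar stays registered for .js/.jsx.
tsLang := sitter.NewLanguage(typescript.LanguageTypescript())
m.languages["typescript"] = tsLang
tsParser := sitter.NewParser()
tsParser.SetLanguage(tsLang)
m.parsers["typescript"] = tsParser
tsxLang := sitter.NewLanguage(typescript.LanguageTSX())
m.languages["tsx"] = tsxLang
tsxParser := sitter.NewParser()
tsxParser.SetLanguage(tsxLang)
m.parsers["tsx"] = tsxParser
```
`nodeToSymbolJS` already handles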
`interface_declaration` and `type_alias_declaration` (lines
746-765) — no extraction code changes needed for the narrow scope.
3. `/opt/forks/codecontext/internal/parser/manager.go:357-395`
`detectLanguage` (skim verified to live around line 357): ensure
`.tsx` maps to `"tsx"` not `"typescript"`. Likely already correct
— verify.
4. Tests in `internal/parser/` — add TS-grammar fixtures (a small
`.ts` file with interface, type, enum) to assert recovery.
5. Update `/opt/boocode/codecontext/Dockerfile:18-22` to clone from
the fork instead of `github.com/nmakod/codecontext` v3.2.1 once
the TS-grammar branch lands. **Or** PR the change upstream first
if `nmakod/codecontext` is open to it.
6. Drop the fork's own `tree-sitter-javascript` dependency? No —
`tree-sitter-typescript` Go binding is separate and the JS
grammar is still needed for `.js`/`.jsx` files.
Rough LoC: ~20 lines in manager.go, +1 line go.mod, +1 import, +1
language-detect entry; ~50 lines of tests; ~5 lines in Dockerfile.
**Risk:** TS grammar parses superset syntax; some TS files may now
hit `ERROR` nodes the JS grammar happily accepted. Mitigate by
keeping the JS grammar registered for `.js`/`.jsx` and not changing
JS handling. Regression risk lives in the codecontext-binary CI
(JS+TS combined corpus) — verify their existing tests still pass.
Tests to add: a fixture file containing each B2 missed symbol and a
manager_test that asserts the symbols are recovered.
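A fixture for that test could mirror the declaration shapes B2 found missing. The sketch below is hypothetical: only the declaration names and forms come from B2, and every field and body is an invented stand-in. It leaves out `enum`, which type-stripping TypeScript runtimes cannot execute; an enum case can be added to the Go-side fixture directly. A `manager_test` would then assert that `Project`, `SessionStatus`, `Session`, `WorkspacePaneKind`, `ToolDef`, `ViewFileInput`, and `viewFile` are reported.

```typescript
// Hypothetical fixture: names and declaration forms follow B2's missed
// symbols; all fields and bodies are invented stand-ins.

export interface Project {
  id: string;
  name: string;
}

export type SessionStatus = 'idle' | 'streaming' | 'complete';

export interface Session {
  id: string;
  projectId: Project['id'];
  status: SessionStatus;
}

export type WorkspacePaneKind = 'chat' | 'terminal';

// Generic interface used as a const annotation: the `: ToolDef<...>`
// form that B2 reports the JS grammar failing on.
export interface ToolDef<Input> {
  name: string;
  run: (input: Input) => string;
}

export interface ViewFileInput {
  path: string;
}

export const viewFile: ToolDef<ViewFileInput> = {
  name: 'view_file',
  run: (input) => `would read ${input.path}`,
};
```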
**Effort:** Phase A (grammar swap + tests + Dockerfile pin): 90 min
once a build-and-test loop is set up in the fork.
**Sequence:** Blocked on a decision about whether to PR upstream
(`nmakod/codecontext`) or fork-and-deploy (`nuthan-ms/codecontext`).
Unblocks D3 (cleaner TS results = smaller noise in synthesis output
= smaller token cost).
**Decision:** **Narrow**, not "drop" and not "full TS support". Drop
is wrong because TS *is* the workload (A2 + B1 show every agent and
the codebase under analysis are TS-heavy). Full Phase 3-4 TS support
(generics, decorators, full type queries) is overkill for current
usage — interface/type/enum recovery captures the model's actual
need.
### D3. Synthesis pipeline optimizations
**Title:** Reduce per-turn codecontext latency and cache the overview.
**Why:** C2 shows avg 15.2 s per codecontext call and an overview
that rebuilds on every call. Synthesis always pays the 30 s wrapper
timeout when the codecontext binary panics (C3 case 4) or hangs.
**Three sub-items:**
D3a. **Cache the overview at the shim layer.** The shim already
serialises calls under `callMu` (`shim.go:74-77`). Add a
per-`target_dir` overview cache keyed on a directory-mtime hash, TTL ~60 s.
Sub-second cache hits for repeated `get_codebase_overview` calls
(today shows ~9 in a single chat over a few minutes).
- File: `/opt/boocode/codecontext/shim.go`
- LoC: ~80
- Effort: 90 min
- Risk: invalidation. Use the cheapest invalidator available (the mtime of
`target_dir` plus its entry count from `os.ReadDir`). On any
doubt, bypass cache.
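The shim itself is Go. The sketch below is TypeScript only to keep one language across these examples, and it shows the intended policy (per-`target_dir` key, TTL, fingerprint check, bypass on doubt) rather than shim code. `OverviewCache` and `DirFingerprint` are hypothetical. One caveat: a directory's mtime changes when entries in that directory are added, removed, or renamed. It does not change when an existing file's contents change or when something changes in a subdirectory, so the TTL is what bounds staleness for ordinary edits.

```typescript
// Cheap fingerprint per D3a: target_dir mtime plus its entry count.
export interface DirFingerprint {
  mtimeMs: number;
  entryCount: number;
}

interface CacheEntry<T> {
  fingerprint: DirFingerprint;
  value: T;
  storedAtMs: number;
}

// Per-target_dir cache. A null fingerprint means the directory could not
// be stat'ed: always bypass, never store.
export class OverviewCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = 60_000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(targetDir: string, fp: DirFingerprint | null): T | undefined {
    if (fp === null) return undefined;
    const entry = this.entries.get(targetDir);
    if (entry === undefined) return undefined;
    const expired = this.now() - entry.storedAtMs > this.ttlMs;
    const changed =
      entry.fingerprint.mtimeMs !== fp.mtimeMs || entry.fingerprint.entryCount !== fp.entryCount;
    if (expired || changed) {
      this.entries.delete(targetDir); // stale: drop it and force a rebuild
      return undefined;
    }
    return entry.value;
  }

  set(targetDir: string, fp: DirFingerprint | null, value: T): void {
    if (fp === null) return;
    this.entries.set(targetDir, { fingerprint: fp, value, storedAtMs: this.now() });
  }
}
```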
D3b. **Align wrapper and shim timeouts.** Wrapper 30 s
(`codecontext_client.ts:70`), shim ctx 60 s (`shim.go:325`). The
mismatch wastes CPU when the wrapper gives up but the shim keeps
running. Either drop the shim ctx to 30 s, or raise the wrapper
to 60 s (depending on which budget is right). Recommended: align
both to 45 s, abort upstream on wrapper cancel.
- LoC: 2 lines
- Effort: 30 min
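On the wrapper side, "abort upstream on wrapper cancel" can be a single `AbortSignal` that fires on the aligned deadline or when the caller (for example the chat turn) is cancelled, passed to the HTTP request. A sketch assuming the 45 s budget; `deadlineSignal` and `CODECONTEXT_TIMEOUT_MS` are hypothetical names. An aborted request closes the connection, and Go's `net/http` cancels the server request's `Context()` when the client disconnects. Whether the shim's per-call context derives from that, and whether the `codecontext mcp` child actually stops work, is not verified here.

```typescript
export const CODECONTEXT_TIMEOUT_MS = 45_000; // D3b's proposed aligned budget

export interface Deadline {
  signal: AbortSignal;
  clear: () => void; // call in `finally` so no timer outlives the request
}

// Aborts after `ms`, or as soon as `parent` aborts, whichever comes first.
export function deadlineSignal(ms: number, parent?: AbortSignal): Deadline {
  const controller = new AbortController();
  const timer = setTimeout(() => {
    controller.abort(new Error(`codecontext call exceeded ${ms} ms`));
  }, ms);
  const onParentAbort = (): void => controller.abort(parent?.reason);
  const clear = (): void => {
    clearTimeout(timer);
    parent?.removeEventListener('abort', onParentAbort);
  };
  // Whatever caused the abort, the timer and parent listener are no longer needed.
  controller.signal.addEventListener('abort', clear, { once: true });
  if (parent?.aborted) onParentAbort();
  else parent?.addEventListener('abort', onParentAbort, { once: true });
  return { signal: controller.signal, clear };
}
```

Usage: `const d = deadlineSignal(CODECONTEXT_TIMEOUT_MS, turnSignal); try { await fetch(url, { method: 'POST', body, signal: d.signal }); } finally { d.clear(); }`, where `turnSignal` stands for whatever cancellation signal the inference turn already has.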
D3c. **Fix the relative-path bug in `get_file_analysis`.** The
wrapper resolves `target_dir` but not `file_path`. Three failures
in one chat today wasted 48 s of CPU. Fix:
- File: `/opt/boocode/apps/server/src/services/tools/codecontext/get_file_analysis.ts`
(and possibly the shared client at `codecontext_client.ts`).
- Have the wrapper resolve `file_path` against the realpath'd
project root before forwarding, mirroring `target_dir`. Error out
if the resolved path doesn't start with the project root.
- LoC: ~20
- Effort: 60 min
- Risk: low — the model loses no affordance; absolute and relative
both work.
- Tests: `codecontext_client.test.ts`.
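A sketch of the resolution step, assuming POSIX paths; `resolveInsideRoot` is a hypothetical name. It is purely lexical: unlike the existing `target_dir` handling it does not call realpath, so a symlink inside the project that points outside would still pass. The real fix should realpath the joined path (as A1 describes for `target_dir`) before the prefix check. Checking against `` `${root}/` `` rather than `root` keeps `/opt/boocode-other` from matching `/opt/boocode`.

```typescript
// Lexically normalizes an absolute POSIX path: collapses "", "." and "..".
function normalizeAbsolute(p: string): string {
  const parts: string[] = [];
  for (const seg of p.split('/')) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') parts.pop();
    else parts.push(seg);
  }
  return '/' + parts.join('/');
}

// Resolves a model-supplied file_path (relative or absolute) against the
// project root and rejects anything that lands outside it.
export function resolveInsideRoot(projectRoot: string, filePath: string): string {
  if (!projectRoot.startsWith('/')) throw new Error('project root must be absolute');
  const root = normalizeAbsolute(projectRoot);
  const candidate = normalizeAbsolute(filePath.startsWith('/') ? filePath : `${root}/${filePath}`);
  const inside = root === '/' || candidate === root || candidate.startsWith(`${root}/`);
  if (!inside) throw new Error(`file_path escapes project root: ${filePath}`);
  return candidate;
}
```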
**Sequence:** D3c is independent and high-ROI. D3a depends on
nothing. D3b is independent. Recommended order: D3c → D3b → D3a.
### D4. Removal candidates
1. **`watch_changes` agent exposure** (A3 + A2). Server-side handler
stays for completeness; it should not appear in agent
`tools:` arrays. Edit `/opt/boocode/data/AGENTS.md` lines 6, 41,
62, 100, 138.
2. **The dead "csharp" comment-out block** in
`/opt/forks/codecontext/internal/parser/manager.go:146-152` —
delete-on-touch when D2 lands; not part of D2's core scope.
3. **The 3 zero-use codecontext tool exposures** —
`get_dependencies`, `get_symbol_info`, `get_semantic_neighborhoods`.
Same surgical edits as item 1. Consider keeping
`get_dependencies` on the Refactorer because the agent
description explicitly invokes "Use get_dependencies to map call
sites" (`AGENTS.md:92-93`); if the model isn't using it despite
the system-prompt nudge, the description in
`tools/codecontext/get_dependencies.ts` likely needs the same
verb-forward rewrite.
## Claims I did not verify
- **DB retention horizon.** All `message_parts` rows are dated
2026-05-22. That could mean (a) the DB was wiped today, (b) the
schema/path moved today, or (c) the project is brand-new and 24 h
is genuinely the full history. The CLAUDE.md project context
references "v1.13.15-codecontext-synth" which is recent. To verify:
`docker exec boocode_db psql -U boocode -d boocode -c "SELECT
MIN(created_at), MAX(created_at), COUNT(*) FROM messages;"` then
cross-check against the BooCode roadmap's release dates. A3's all-time
query may simply have no older data to find.
- **Whether `nmakod/codecontext` v3.2.1 hosts the same
`nodeToSymbolJS` switch I read in the fork.** The fork at
`/opt/forks/codecontext` is `nuthan-ms/codecontext` per
go.mod. The deployed v3.2.1 is `nmakod/codecontext`. The Dockerfile
comment (`/opt/boocode/codecontext/Dockerfile:13-16`) says the
module path differs but "the tagged v3.2.1 source tree is the same
either way." To verify, clone
`https://github.com/nmakod/codecontext` at tag v3.2.1 and diff
`internal/parser/manager.go` against the fork — outside this
recon's read-only scope.
- **Whether `tree-sitter-typescript v0.23.x` Go bindings actually
build under the fork's `go 1.24.5` + Tree-sitter `v0.25.0`
combination.** Context7 docs confirm the *API exists*. Confirm by
`go get github.com/tree-sitter/tree-sitter-typescript@latest`
followed by `go build ./...` in a scratch worktree.
- **Whether the codecontext panic in `searchSymbols` is reproducible
on `/opt/boocode` or only on `/opt/forks/codecontext`** (the panic
was captured against target_dir `/opt/forks/codecontext`). Reproduce
via `docker exec boocode_codecontext wget -qO -
--post-data='{"target_dir":"/opt/boocode","query":"foo","limit":10}'
--header='Content-Type: application/json'
http://localhost:8080/v1/search_symbols`.
- **Cache hit rate of codecontext analysis (per call vs reused).**
The MCP-server log line `Refreshing analysis for codebase
overview…` suggests rebuild-every-call, but I did not confirm by
reading the codecontext source — only the deployed binary's log
output. To verify, read
`/opt/forks/codecontext/internal/mcp/server.go` around the
`Refreshing analysis…` log lines.
- **Drift correlation strength.** N=1 confirmed drift case is too
small to call a correlation with codecontext use. To raise the
signal: extend retention, re-query after a week of synthetic
load with and without codecontext tools.
- **Whether the synth pipeline's `truncated head only` ships fewer
tokens than a full inlined codecontext result would.** Today's
budget contract assumes yes (`synthesisPipeline.ts:138-145`
comment "Truncated head only — full content was used for
reference extraction above"). To verify: instrument the
per-pass `promptTokens` and compare against a one-off pass with
the full content.
- **The Refactorer/Architect agents' system-prompt copy versus
actual tool usage.** AGENTS.md text claims agents will "Use
get_dependencies to map call sites" (line 92) and "Use
get_semantic_neighborhoods to find related components"
(line 132), but A3 shows neither is called. To verify whether the
model is ignoring the prompt or whether these agents simply
aren't being invoked, query
`SELECT s.name, COUNT(*) FROM sessions s JOIN chats c ON
c.session_id=s.id JOIN messages m ON m.chat_id=c.id WHERE
m.role='assistant' GROUP BY 1 ORDER BY 2 DESC;` and compare
named agents to chat counts.