# Redactor Utility Implementation Plan

> Forward-looking. No code is written by this document.
> Branch: `redactor` (off master `aec835e`). Backup tag: `backup/pre-redactor`.
> Spec: `docs/superpowers/specs/2026-04-30-redactor-design.md`.

**Goal:** Land the `RedactorInterface` plus a concrete `ProjectZomboidRedactor` implementation so iblogs (and any other downstream consumer) can scrub Project Zomboid log content of Steam IDs, player names, and world coordinates with a single call. The Redactor is a render-time filter on raw string content; raw stays canonical at the storage layer.

**Architecture:** Standalone string-in/string-out utility under a new top-level `src/Util/` directory, with per-game implementations under `src/Util//`. Each implementation owns the lexical regex anchors for its game's PII shapes. Three independent toggles per implementation (`redactSteamIds`, `redactPlayerNames`, `redactCoordinates`); defaults all on; "all toggles off" yields verbatim passthrough.

**Tech stack:** PHP 8.4+, PHPUnit 12, Composer (`indifferentketchup/codex` v0.1.0+). All command invocations are wrapped in the `composer:latest` Docker image per `CLAUDE.md`.

---

## Design questions: resolved

### a. Render-time vs ingest-time

**Decision: render-time. This confirms the spec's lean.**

Raw log content is canonical. Redaction is a view filter that consumers apply when they want to display, export, or analyse a redacted projection. iblogs's storage layer holds the unredacted upload (subject to iblogs's own upload-time `Filter` chain for IPs/access-tokens, which is a different layer of defence); the codex Redactor runs on the way *out* of storage, not on the way in.

**Why:** the alternative (ingest-time, where storage holds redacted content) is destructive: once stored, the original cannot be recovered for legitimate operator use. Render-time leaves the original in place and lets each render path opt in. iblogs gets a per-session toggle without needing to keep two copies of every paste.
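
As a rough illustration of the render-time decision, the sketch below keeps the stored string untouched and applies redaction only when a render path asks for it. The `renderPaste()` helper, the sample line, and the closure standing in for `RedactorInterface::redact()` are all hypothetical; the lookaround form of the Steam ID regex is one assumed reading of the plan's "word-boundary classes", not the spec's actual pattern.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical render helper: storage hands back the raw paste, and
 * redaction is applied only on the way out, when the caller opts in.
 *
 * @param callable(string): string $redact stand-in for RedactorInterface::redact()
 */
function renderPaste(string $raw, bool $redacted, callable $redact): string
{
    return $redacted ? $redact($raw) : $raw;
}

// Stand-in redactor covering Steam IDs only (assumed regex shape).
// A regex failure throws rather than silently returning unredacted content.
$redact = static fn (string $content): string => preg_replace(
    '/(?<!\d)76561198\d{9}(?!\d)/u',
    '76561198000000000',
    $content
) ?? throw new RuntimeException('Steam ID redaction failed');

// Invented sample line, not taken from a real PZ log.
$stored = 'user 76561198000000003 connected';

echo renderPaste($stored, false, $redact), PHP_EOL; // raw view, unchanged
echo renderPaste($stored, true, $redact), PHP_EOL;  // redacted view
```

Because `$stored` is never mutated, the same stored paste can serve both the raw and the redacted view, which is the property the ingest-time alternative gives up.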
**Implication for iblogs schema:** iblogs stores raw content; the redaction toggle in the iblogs UI invokes `ProjectZomboidRedactor::redact()` at render time (server-side) or at fetch time (API consumers' choice). No schema migration required for the redaction feature.

### b. Redactor as standalone class vs Printer decorator

**Decision: standalone utility (option iii from the question).**

The Redactor is a `string → string` function. It does not know about `Insight`, `Printer`, or any other codex type. Three options were considered:

- **(i) Printer wrapper.** Cleanly composable but ties the Redactor to the Printer abstraction. Doesn't help iblogs's most common case: redacting raw log content for display in a non-Printer rendering path (HTML page rendered server-side, raw download served to API client).
- **(ii) Pre-Printer pass on Insights.** Heavy. Insights are typed objects with structured fields; redacting them means per-Insight code that knows which fields are PII-bearing. Against the YAGNI line for v1.
- **(iii) Standalone string utility.** Simple, generic, works on any string input: raw log content, JSON-serialised analysis output, rendered Printer output piped through. Doesn't know about Insights.

The spec describes (iii). v1 ships (iii) only. If a Printer-wrapper convenience is later wanted, it can be added as a thin adapter that calls the standalone Redactor on the Printer's output; it doesn't require restructuring the core.

### c. PII field taxonomy for PZ

**Decision: regex-based with lexical context anchors. No structured-field detection in v1.**

PZ-specific PII categories observed in the in-tree fixtures and the `.scratch/pz/Logs/` reference corpus:

| Field | Detection | Rationale |
|---|---|---|
| Steam ID | regex with `76561198\d{9}` prefix anchor and word-boundary classes | Steam's `76561198` SteamID64 universe prefix lets us cleanly distinguish from other long numbers (timestamps, build numbers). |
| Player name | regex with multi-context lexical anchors (after-Steam-ID-quoted, ChatMessage author, `Combat:`/`Safety:` subsystem) | Names are arbitrary strings and cannot be detected without context. The contexts are well-defined by the parser-side pattern classes. |
| World coordinate triple | regex with bracket / paren / `at`-clause anchors | Generic `\d+,\d+,\d+` would over-redact server metadata (`f:0, t:NNNN, st:48,648,157,584`). Lexical context disambiguates. |

**Not redacted in v1:**

- **IP addresses.** PZ logs do not normally include IPs in any of the eleven file types observed. iblogs's upload-side `IPv4Filter` / `IPv6Filter` (ported from upstream mclogs) covers the rare case where a mod might log them.
- **Server-side usernames distinct from player names.** PZ uses Steam display name as the player identity; there's no separate auth username layer. Mclogs's `UsernameFilter` is Minecraft-specific and isn't mirrored here.
- **BurdJournals scientific-notation Steam IDs** (`7.65611…E16`). Spec open-question 2 explicitly defers this to v2; the `[BurdJournals]` tag already disambiguates them as mod-internal.

**Hybrid (regex + structured-field) deferred.** A v2 enhancement could redact specific Insight fields at JSON-serialisation time (e.g. `ConnectionFailureProblem::$steamId` → placeholder when serialised). Useful only if iblogs starts shipping the structured analysis JSON to redacted views, a real but currently hypothetical need.

### d. Replacement strategy

**Decision: per-category placeholder strings matching the synthetic-fixture conventions.
Configurable replacement style is YAGNI for v1.**

Per the spec:

| Category | Replacement |
|---|---|
| Steam ID | `76561198000000000` (zeroed placeholder, still a syntactically valid Steam ID) |
| Player name | `` |
| Coordinates | `0,0,0` (with shape preserved per anchor: bracketed, parenthesised, or `at` clause) |

Why these specifically and not `[REDACTED]` / `[STEAM_ID]` / hashed:

- The placeholders **match the existing synthetic test fixtures** (`76561198000000001`–`76561198000000004` collapse to `76561198000000000`; player names `Player1`/`Player2`/`AdminUser` collapse to ``). Tests can verify "redacted output looks like a synthetic fixture."
- Shape preservation means downstream consumers can still parse the redacted output with the same Pattern classes: a redacted log is still a syntactically valid PZ log, it just contains no identities.
- Type-tagged replacements (`[STEAM_ID]`) break shape preservation: a Pattern looking for `\d{17}` would fail. Worth offering as a config option if a consumer specifically wants type-visibility, but v1 ships placeholder-only.
- Hashing breaks shape preservation similarly and adds determinism / collision concerns.

If a consumer later needs `[STEAM_ID]`-style output, a `setReplacementStyle('typed' | 'placeholder' | 'redacted')` setter can be added without breaking the v1 API. v1 ships placeholder-only.

### e. Game-agnostic vs PZ-specific layout

**Decision: thin generic interface in `src/Util/` plus PZ-specific implementation in `src/Util/ProjectZomboid/`.**

```
src/Util/
├── RedactorInterface.php             (1 method: redact(string): string)
└── ProjectZomboid/
    └── ProjectZomboidRedactor.php    (toggles + regex passes)
```

**YAGNI tradeoff stated:** the interface has one method and currently one implementation. Strictly, YAGNI says collapse to just `ProjectZomboidRedactor` and skip the interface.
The interface earns its keep because **iblogs's call sites will type-hint against `RedactorInterface`**, not the concrete class; that is the architectural payoff. Consumer code stays loosely coupled; when Minecraft or another game ships a redactor, iblogs swaps the implementation by changing one DI binding rather than touching call sites. The cost is two files instead of one. Acceptable given the dependency-inversion benefit.

The directory layout (`src/Util//`) mirrors the components-outer-with-game-suffix convention used everywhere else in the tree (Analyser, Analysis, Detective, Log, Parser, Pattern).

**Note on the new `src/Util/` directory.** Codex currently has no `src/Util/` (the Phase A scaffolding established Analyser / Analysis / Detective / Log / Parser / Pattern / Printer; Phase B.3 added Analyser/ProjectZomboid content but not Util). The Redactor introduces this new top-level directory. This is an additive change: no existing code is modified.

### f. Test strategy

**Decision: hybrid, with small dedicated synthetic fixtures under `test/src/Util/Redactor/` for direct unit tests, plus an integration test that runs the Redactor over an existing PZ fixture and asserts idempotence.**

**Dedicated unit fixtures** (small string constants in test classes, not separate files): per spec test plan #1–#5. Each test class owns its input/expected pairs. Keeps unit tests self-contained and fast.

**Integration test** that re-uses an existing PZ fixture (e.g. `test/src/Games/ProjectZomboid/fixtures/admin-minimal.txt`). Two assertions:

- The Redactor's output is a syntactically valid log (still parses cleanly through the corresponding `ProjectZomboidAdminLog`).
- Idempotence: `redact(redact($x)) === redact($x)`. Existing fixture content is already placeholder-shaped, so the redactor should leave it byte-for-byte identical OR apply the canonical normalisation once and then no-op.

**False-positive avoidance.** The synthetic fixtures use `76561198000000001` etc. as placeholder Steam IDs.
The Redactor's Steam ID regex matches the `76561198\d{9}` prefix and replaces with `76561198000000000`, so `76561198000000001` becomes `76561198000000000` (a normalisation, not a corruption). Tests verify this normalisation is correct and that legitimate-non-PII data (e.g. server metadata triples like `f:0, t:1776297642406, st:48,648,157,584`) is **not** touched.

---

## Tasks

Tasks are intended for the `redactor` branch. Each is a single logical commit. Test-running between commits uses the standard Docker invocation. Work proceeds only after Step 0 sign-off (this plan reviewed).

### Task 0: Plan doc commit

- [ ] **Step 0.1.** Already done out-of-band: `git checkout -b redactor` off master `aec835e`; `git tag backup/pre-redactor` at branch tip; this plan written.
- [ ] **Step 0.2.** Commit this plan: `docs: add Redactor implementation plan` on branch `redactor`. Push branch to origin for review.

### Task 1: Scaffold (interface + skeleton class with toggles)

- [ ] **Step 1.1.** Create `src/Util/RedactorInterface.php`. Single method: `public function redact(string $content): string;` PHPDoc describing the contract: stateless from the caller's perspective; configuration happens via implementation-specific setters before `redact()`.
- [ ] **Step 1.2.** Create `src/Util/ProjectZomboid/ProjectZomboidRedactor.php` that implements the interface. Class structure: three private bool properties (`$redactSteamIds`, `$redactPlayerNames`, `$redactCoordinates`) all defaulting to `true`; three fluent setters (`redactSteamIds(bool): static`, etc.); `redact(string): string` body that returns input unchanged when all toggles are off (for now; regex passes are added in subsequent tasks).
- [ ] **Step 1.3.** Run `composer test` and expect 195 tests still green (no Redactor tests yet).
- [ ] **Step 1.4.** Commit: `feat: scaffold RedactorInterface and ProjectZomboidRedactor with toggles`.
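
The Task 1 scaffold could look roughly like the sketch below. It follows the property and setter names given in Steps 1.1 and 1.2, but namespaces and autoloading are omitted so it runs as a single file, and the `$enabled` parameter name is an assumption. No regex pass is implemented here, matching the scaffold stage.

```php
<?php

declare(strict_types=1);

// Sketch only: namespaces are left out so this runs as one file.

interface RedactorInterface
{
    /**
     * Return a redacted copy of $content. Stateless from the caller's
     * perspective; configuration happens through implementation-specific
     * setters before this call.
     */
    public function redact(string $content): string;
}

class ProjectZomboidRedactor implements RedactorInterface
{
    // All three toggles default to on.
    private bool $redactSteamIds = true;
    private bool $redactPlayerNames = true;
    private bool $redactCoordinates = true;

    public function redactSteamIds(bool $enabled): static
    {
        $this->redactSteamIds = $enabled;
        return $this;
    }

    public function redactPlayerNames(bool $enabled): static
    {
        $this->redactPlayerNames = $enabled;
        return $this;
    }

    public function redactCoordinates(bool $enabled): static
    {
        $this->redactCoordinates = $enabled;
        return $this;
    }

    public function redact(string $content): string
    {
        // All toggles off: verbatim passthrough.
        if (!$this->redactSteamIds && !$this->redactPlayerNames && !$this->redactCoordinates) {
            return $content;
        }

        // Placeholder for the later passes (Steam IDs, then player names,
        // then coordinates). At the scaffold stage nothing is rewritten yet.
        return $content;
    }
}

// Fluent configuration as a call site might use it.
$redactor = (new ProjectZomboidRedactor())
    ->redactPlayerNames(false)
    ->redactCoordinates(false);
```

Returning `static` from each setter keeps the fluent chain usable by subclasses, and call sites can type-hint `RedactorInterface` without knowing which game's implementation they hold.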
### Task 2: Steam ID redaction pass

- [ ] **Step 2.1.** Add `STEAM_ID_REGEX` and `STEAM_ID_REPLACEMENT` constants on `ProjectZomboidRedactor`. Regex uses the `76561198\d{9}` prefix anchor with word-boundary classes (per spec). The `/u` flag is added to all regexes for Unicode safety even though Steam IDs themselves are ASCII.
- [ ] **Step 2.2.** Implement the Steam ID branch of `redact()`: when `$redactSteamIds` is true, run `preg_replace` against the input.
- [ ] **Step 2.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorSteamIdTest.php`. Tests: redaction of various distinct synthetic Steam IDs collapses all to `76561198000000000`; non-Steam-ID 17-digit numbers (e.g. timestamps) are not touched; toggle-off leaves Steam IDs intact.
- [ ] **Step 2.4.** Run `composer test`. Expect new tests pass; old 195 unaffected.
- [ ] **Step 2.5.** Commit: `feat: add Steam ID redaction pass`.

### Task 3: Player name redaction pass

- [ ] **Step 3.1.** Add three regex constants on `ProjectZomboidRedactor` for the three player-name lexical contexts: `PLAYER_AFTER_STEAMID_REGEX`, `PLAYER_IN_CHATMESSAGE_REGEX`, `PLAYER_IN_PVP_SUBSYSTEM_REGEX`. Replacement is `` for all. **Order constraint:** the after-Steam-ID context anchors on the post-redaction Steam ID `76561198000000000`, so the player-name pass must run *after* the Steam ID pass. Document this in a class-level docblock.
- [ ] **Step 3.2.** Implement the player-name branch of `redact()`: three sequential `preg_replace` calls when `$redactPlayerNames` is true.
- [ ] **Step 3.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorPlayerNameTest.php`. Tests: each of the three contexts redacts correctly when paired with its anchor; a bare quoted string (e.g. `"foo"` not preceded by a Steam ID) is **not** touched; toggle-off leaves names intact; the after-Steam-ID context works correctly when the Steam ID has already been redacted to the zeroed placeholder.
- [ ] **Step 3.4.** Run `composer test`.
Expect new tests pass.
- [ ] **Step 3.5.** Commit: `feat: add player name redaction pass`.

### Task 4: Coordinates redaction pass

- [ ] **Step 4.1.** Add three regex constants on `ProjectZomboidRedactor` for the three coordinate contexts: `COORDS_AT_CLAUSE_REGEX`, `COORDS_BRACKETED_REGEX`, `COORDS_PARENTHESISED_REGEX`. Replacements preserve shape (`0,0,0` inside whatever bracket/paren wrapper).
- [ ] **Step 4.2.** Implement the coords branch of `redact()`: three sequential `preg_replace_callback` (or `preg_replace`) calls when `$redactCoordinates` is true.
- [ ] **Step 4.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorCoordinatesTest.php`. Tests: each of the three contexts redacts correctly; **negative test**: server metadata `f:0, t:1776297642406, st:48,648,157,584` is not touched; basement Z-coordinates (`-1`) are handled; toggle-off leaves coords intact.
- [ ] **Step 4.4.** Run `composer test`. Expect new tests pass.
- [ ] **Step 4.5.** Commit: `feat: add coordinates redaction pass`.

### Task 5: Combined / toggle / idempotence tests

- [ ] **Step 5.1.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorCombinedTest.php`. Tests cover: combined input with all three PII categories present produces fully-scrubbed output when all toggles on; each toggle off in isolation produces partial scrubbing matching the toggle's category; all toggles off returns input byte-for-byte identical (`===` equality).
- [ ] **Step 5.2.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorIdempotenceTest.php`. Tests: `redact(redact($x)) === redact($x)` for several input shapes including all three PII categories.
- [ ] **Step 5.3.** Run `composer test`. Expect new tests pass.
- [ ] **Step 5.4.** Commit: `test: add Redactor combined and idempotence coverage`.

### Task 6: Existing-fixture integration tests

- [ ] **Step 6.1.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorIntegrationTest.php`.
Loads each existing PZ fixture (`admin-minimal.txt`, `chat-minimal.txt`, etc.) via `PathLogFile`, calls `redact()` on the content, and asserts: (a) the redacted content still parses cleanly through the corresponding `ProjectZomboidLog`'s parser without throwing; (b) the synthetic Steam IDs `76561198000000001`–`76561198000000004` all collapse to `76561198000000000`; (c) the synthetic player names (`Player1`, `Player2`, `AdminUser`, `PlayerSuspect`) all collapse to ``.
- [ ] **Step 6.2.** Run `composer test`. Expect all integration assertions pass without modifying any existing test or fixture.
- [ ] **Step 6.3.** Commit: `test: add Redactor integration coverage against existing PZ fixtures`.

### Task 7: Documentation updates

- [ ] **Step 7.1.** Update `CLAUDE.md`: add a one-line `src/Util/` mention to the framework architecture section; one-line note in the ProjectZomboid specifics section pointing at `ProjectZomboidRedactor` for downstream PII scrubbing; update the "Scaffolded games" line to mention that `ProjectZomboid` now also has a Redactor implementation under `src/Util/ProjectZomboid/`.
- [ ] **Step 7.2.** Update `README.md`: add a short usage block showing `(new ProjectZomboidRedactor())->redact($logContent)` as a render-time scrub option, alongside the existing worked example.
- [ ] **Step 7.3.** Update `CHANGELOG.md`: move Redactor out of the **Deferred** section under `[0.1.0]`, OR add a new `[Unreleased]` section if the v0.1.0 line should remain accurate as-shipped. Decision: **add `[Unreleased]`**, because v0.1.0 was tagged without the Redactor and the changelog should reflect the historical truth.
- [ ] **Step 7.4.** Run `composer test` once more for safety; confirm 195+(redactor tests) green.
- [ ] **Step 7.5.** Commit: `docs: document Redactor utility in CLAUDE.md, README, CHANGELOG`.

### Task 8: Final verification

- [ ] **Step 8.1.** Run `composer test`. All tests green.
- [ ] **Step 8.2.** Re-run `vendor/bin/phpunit --display-deprecations --display-warnings --display-notices --display-errors`. Expect zero output beyond the standard pass summary.
- [ ] **Step 8.3.** Sanity-check the branch with `git log --oneline master..redactor`. Should be the plan-doc commit plus 7 implementation commits = 8 commits total.
- [ ] **Step 8.4.** Push final state: `git push origin redactor`. **Do NOT merge to master.** User reviews diff and approves merge separately.

---

## Open questions / spec gaps

The spec is generally tight. Items worth flagging while implementing:

1. **`/u` flag for Unicode safety.** Spec doesn't specify regex flags. PZ player names can contain non-ASCII characters (Steam display names are Unicode-permissive). The implementation will use `/u` on all regexes to avoid mangling multi-byte sequences. This will be documented in the class docblock.
2. **Replacement order.** Spec says "Redaction order matters: SIDs first, names second" because the after-Steam-ID player-name regex anchors on the redacted Steam ID. The implementation will enforce this order in `redact()` (Steam ID pass first, then names, then coords). The class docblock will document the ordering invariant.
3. **HTML / JSON-encoded input.** Spec assumes plain log text. If a consumer feeds HTML-escaped content (e.g. `&quot;` instead of `"`), the player-name regex won't match. Document as a v2 concern: callers feed plain text in, render afterwards. v1 does not implement HTML/JSON-aware mode.
4. **Future PII categories.** v1 ships exactly the three toggles per spec. New categories (emails, IPs from mods, etc.) extend the toggle set in a future release; v1 does not pre-build extension points beyond what the interface already provides.
5. **`src/Util/` is a new top-level directory** in this codebase. The Redactor is the first occupant. Future utilities (e.g. a tokenizing variant per spec open-question 1) would also live here.
No existing-code modification is needed; the new directory is purely additive.
6. **The empty `src/Printer//.gitkeep` situation.** Phase A scaffolding chose not to create `Printer//` directories at all (only Analyser/Detective/Log/Parser/Pattern got per-game subdirs). The Redactor's home in `src/Util//` mirrors that: `src/Util/` is created with PZ as its first occupant; no stub `Hytale/`/`Minecraft/`/`SevenDaysToDie/` placeholders are scaffolded. When other games' redactors land, they create their own subdirectories at that point.

No spec contradictions found. No existing-code modifications required (additive-only design).

---

## Branch / commit invariants

- All commits land on the `redactor` branch.
- Master is not touched until the user explicitly approves merge after reviewing the diff.
- Conventional commit prefixes: `docs:`, `feat:`, `test:`, `refactor:`. (No `fix:` expected; this is greenfield work.)
- One logical concept per commit. Task 1 is a scaffold-only commit (no Redactor tests yet); Tasks 2, 3, and 4 each ship implementation plus per-pass tests in one commit; Tasks 5, 6, and 7 are pure-test or pure-docs commits.
- Backup tag `backup/pre-redactor` at `aec835e` lets us discard the branch and recover if the implementation goes sideways.
- Branch can be pushed to origin freely for visibility / review checkpoints.

## Pointers

- Spec: `docs/superpowers/specs/2026-04-30-redactor-design.md`.
- Synthetic fixtures the integration test will reuse: `test/src/Games/ProjectZomboid/fixtures/*.txt`.
- Existing per-game layout precedent: `src/Analyser/ProjectZomboid/`, `src/Pattern/ProjectZomboid/`, `src/Log/ProjectZomboid/`.
- Workflow conventions and pitfalls: `CLAUDE.md`.