ik-codex/docs/superpowers/plans/2026-05-01-redactor.md
indifferentketchup 409de16003 docs: add Redactor implementation plan
Forward-looking plan on the redactor branch covering all eight design
questions called out in the careful-protocol kickoff: render-time
filter (raw is canonical), standalone string utility (not a Printer
decorator), regex-based detection with lexical anchors per PII
category, per-category placeholder replacement matching synthetic
fixture conventions, thin generic interface plus per-game
implementation under src/Util/ProjectZomboid/, hybrid fixture strategy
(unit-level synthetic plus integration against existing PZ fixtures).

Branch off master aec835e. backup/pre-redactor tag pins start.
No code is written by this commit. Implementation pass kicks off
separately after plan review.
2026-05-01 14:28:44 +00:00


Redactor Utility Implementation Plan

Forward-looking. No code is written by this document. Branch: redactor (off master aec835e). Backup tag: backup/pre-redactor. Spec: docs/superpowers/specs/2026-04-30-redactor-design.md.

Goal: Land the RedactorInterface plus a concrete ProjectZomboidRedactor implementation so iblogs (and any other downstream consumer) can scrub Project Zomboid log content of Steam IDs, player names, and world coordinates with a single call. The Redactor is a render-time filter on raw string content; raw stays canonical at the storage layer.

Architecture: Standalone string-in/string-out utility under a new top-level src/Util/ directory, with per-game implementations under src/Util/<Game>/. Each implementation owns the lexical regex anchors for its game's PII shapes. Three independent toggles per implementation (redactSteamIds, redactPlayerNames, redactCoordinates); defaults all on; "all toggles off" yields verbatim passthrough.

Tech stack: PHP 8.4+, PHPUnit 12, Composer (indifferentketchup/codex v0.1.0+). All commands run inside the composer:latest Docker image, per CLAUDE.md.


Design questions — resolved

a. Render-time vs ingest-time

Decision: render-time, confirming the spec's lean.

Raw log content is canonical. Redaction is a view filter that consumers apply when they want to display, export, or analyse a redacted projection. iblogs's storage layer holds the unredacted upload (subject to iblogs's own upload-time Filter chain for IPs/access-tokens, which is a different layer of defence); the codex Redactor runs on the way out of storage, not on the way in.

Why: the alternative (ingest-time, where storage holds redacted content) is destructive — once stored, the original cannot be recovered for legitimate operator use. Render-time leaves the original in place and lets each render path opt in. iblogs gets a per-session toggle without needing to keep two copies of every paste.

Implication for iblogs schema: iblogs stores raw content; the redaction toggle in the iblogs UI invokes ProjectZomboidRedactor::redact() at render time (server-side) or at fetch time (API consumers' choice). No schema migration required for the redaction feature.
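Here is a minimal sketch of that render-time flow. It assumes the redact(string $content): string shape that Task 1 gives RedactorInterface. renderPaste() and SteamIdOnlyRedactor are hypothetical stand-ins, not iblogs or codex code. The sketch only shows that the stored string is read on the way out and never written back.

```php
<?php

declare(strict_types=1);

// Assumed interface shape (Task 1 of this plan).
interface RedactorInterface
{
    public function redact(string $content): string;
}

// Hypothetical toy implementation that only zeroes Steam IDs.
final class SteamIdOnlyRedactor implements RedactorInterface
{
    public function redact(string $content): string
    {
        return preg_replace('/\b76561198\d{9}\b/u', '76561198000000000', $content)
            ?? throw new RuntimeException(preg_last_error_msg());
    }
}

// Hypothetical render path: $stored is the canonical raw upload. It is only read,
// so the unredacted original always survives in storage.
function renderPaste(string $stored, bool $redactToggle, RedactorInterface $redactor): string
{
    return $redactToggle ? $redactor->redact($stored) : $stored;
}

$stored = '76561198123456789 "Player1" connected';
$redactedView = renderPaste($stored, true, new SteamIdOnlyRedactor());
$rawView = renderPaste($stored, false, new SteamIdOnlyRedactor());
```

The per-session toggle is just the boolean argument. No second copy of the paste exists at any point.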

b. Redactor as standalone class vs Printer decorator

Decision: standalone utility (option iii from the question).

The Redactor is a string → string function. It does not know about Insight, Printer, or any other codex type. Three options were considered:

  • (i) Printer wrapper. Cleanly composable but ties the Redactor to the Printer abstraction. Doesn't help iblogs's most common case: redacting raw log content for display in a non-Printer rendering path (HTML page rendered server-side, raw download served to API client).
  • (ii) Pre-Printer pass on Insights. Heavy. Insights are typed objects with structured fields; redacting them means per-Insight code that knows which fields are PII-bearing. Against the YAGNI line for v1.
  • (iii) Standalone string utility. Simple, generic, works on any string input — raw log content, JSON-serialised analysis output, rendered Printer output piped through. Doesn't know about Insights.

The spec describes (iii). v1 ships (iii) only. If a Printer-wrapper convenience is later wanted, it can be added as a thin adapter that calls the standalone Redactor on the Printer's output; it doesn't require restructuring the core.
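This sketches that optional adapter. PrinterLike is a hypothetical stand-in that reduces a Printer to something returning a string, because codex's actual Printer API is not described here. RedactingPrinter and the toy implementations are also illustrative only.

```php
<?php

declare(strict_types=1);

interface RedactorInterface
{
    public function redact(string $content): string;
}

// Hypothetical stand-in for codex's Printer; the real interface may differ.
interface PrinterLike
{
    public function render(): string;
}

// Thin adapter: delegate rendering, then run the standalone Redactor on the output.
final class RedactingPrinter implements PrinterLike
{
    public function __construct(
        private readonly PrinterLike $inner,
        private readonly RedactorInterface $redactor,
    ) {
    }

    public function render(): string
    {
        return $this->redactor->redact($this->inner->render());
    }
}

// Toy implementations, for illustration only.
$printer = new class implements PrinterLike {
    public function render(): string
    {
        return 'Joined: 76561198123456789';
    }
};
$redactor = new class implements RedactorInterface {
    public function redact(string $content): string
    {
        return preg_replace('/\b76561198\d{9}\b/u', '76561198000000000', $content)
            ?? throw new RuntimeException(preg_last_error_msg());
    }
};

$output = (new RedactingPrinter($printer, $redactor))->render();
```

The adapter only composes two existing pieces, so the core Redactor stays unaware of Printers. That independence is the reason option (iii) was chosen.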

c. PII field taxonomy for PZ

Decision: regex-based with lexical context anchors. No structured-field detection in v1.

PZ-specific PII categories observed in the in-tree fixtures and the .scratch/pz/Logs/ reference corpus:

| Field | Detection | Rationale |
|---|---|---|
| Steam ID | Regex with a 76561198\d{9} prefix anchor and word-boundary classes | The 76561198 prefix shared by most individual-account SteamID64 values cleanly distinguishes Steam IDs from other long numbers (timestamps, build numbers). Older accounts whose IDs begin 76561197 fall outside this anchor. |
| Player name | Regex with multi-context lexical anchors (quoted after a Steam ID, ChatMessage author, Combat:/Safety: subsystem) | Names are arbitrary strings and cannot be detected without context. The contexts are well defined by the parser-side Pattern classes. |
| World coordinate triple | Regex with bracket / paren / at-clause anchors | A generic \d+,\d+,\d+ would over-redact server metadata (f:0, t:NNNN, st:48,648,157,584). Lexical context disambiguates. |
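The coordinate over-redaction risk can be seen with the metadata string quoted in the table. The bracketed form [x,y,z] stands in for one of the anchored contexts. It is an assumed shape, not a confirmed PZ log format.

```php
<?php

$metadata = 'f:0, t:1776297642406, st:48,648,157,584';
$coordsLine = 'spawned [10934,9546,0]';

// Generic triple with no lexical context.
$generic = '/-?\d+,-?\d+,-?\d+/u';
// Anchored triple: only counts when wrapped in square brackets (one assumed context).
$anchored = '/\[-?\d+,-?\d+,-?\d+\]/u';

$genericHitsMetadata = preg_match($generic, $metadata, $genericMatch) === 1;
$anchoredHitsMetadata = preg_match($anchored, $metadata) === 1;
$anchoredHitsCoords = preg_match($anchored, $coordsLine) === 1;
```

The generic pattern latches onto the first three numbers of the st: field ("48,648,157"), which is the over-redaction the table warns about. The anchored pattern ignores the metadata and still catches the bracketed triple.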

Not redacted in v1:

  • IP addresses. PZ logs do not normally include IPs in any of the eleven file types observed. iblogs's upload-side IPv4Filter / IPv6Filter (ported from upstream mclogs) covers the rare case where a mod might log them.
  • Server-side usernames distinct from player names. PZ uses Steam display name as the player identity; there's no separate auth username layer. Mclogs's UsernameFilter is Minecraft-specific and isn't mirrored here.
  • BurdJournals scientific-notation Steam IDs (7.65611…E16). Spec open-question 2 explicitly defers this to v2; the [BurdJournals] tag already disambiguates them as mod-internal.

Hybrid (regex + structured-field) deferred. A v2 enhancement could redact specific Insight fields at JSON-serialisation time (e.g. ConnectionFailureProblem::$steamId → placeholder when serialised). Useful only if iblogs starts shipping the structured analysis JSON to redacted views — a real but currently hypothetical need.

d. Replacement strategy

Decision: per-category placeholder strings matching the synthetic-fixture conventions. Configurable replacement style is YAGNI for v1.

Per the spec:

| Category | Replacement |
|---|---|
| Steam ID | `76561198000000000` (zeroed placeholder, still a syntactically valid Steam ID) |
| Player name | `<player>` |
| Coordinates | `0,0,0` (shape preserved per anchor: bracketed, parenthesised, or at-clause) |

Why these specifically and not [REDACTED] / [STEAM_ID] / hashed:

  • The placeholders match the existing synthetic test fixtures (76561198000000001 through 76561198000000004 collapse to 76561198000000000; player names Player1/Player2/AdminUser collapse to <player>). Tests can verify that "redacted output looks like a synthetic fixture."
  • Shape preservation means downstream consumers can still parse the redacted output with the same Pattern classes — a redacted log is still a syntactically valid PZ log, it just contains no identities.
  • Type-tagged replacements ([STEAM_ID]) break shape preservation: a Pattern looking for \d{17} would fail. Worth offering as a config option if a consumer specifically wants type-visibility, but v1 ships placeholder-only.
  • Hashing breaks shape preservation similarly and adds determinism / collision concerns.

If a consumer later needs [STEAM_ID]-style output, a setReplacementStyle('typed' | 'placeholder' | 'redacted') setter can be added without breaking the v1 API.
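The shape-preservation argument can be sketched as follows. $consumerPattern is a hypothetical downstream matcher for 17-digit SteamID64s. The zeroed placeholder still satisfies it, while a type-tagged [STEAM_ID] replacement does not.

```php
<?php

// Per-category placeholders from the replacement table.
const PLACEHOLDERS = [
    'steamId' => '76561198000000000',
    'playerName' => '<player>',
    'coordinates' => '0,0,0',
];

// Hypothetical downstream Pattern expecting a 17-digit SteamID64.
$consumerPattern = '/\b76561198\d{9}\b/u';

$placeholderStyle = PLACEHOLDERS['steamId'] . ' "' . PLACEHOLDERS['playerName'] . '" connected';
$typedStyle = '[STEAM_ID] "[PLAYER]" connected';

$placeholderStillParses = preg_match($consumerPattern, $placeholderStyle) === 1;
$typedStillParses = preg_match($consumerPattern, $typedStyle) === 1;
```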

e. Game-agnostic vs PZ-specific layout

Decision: thin generic interface in src/Util/ plus PZ-specific implementation in src/Util/ProjectZomboid/.

src/Util/
├── RedactorInterface.php          (1 method: redact(string): string)
└── ProjectZomboid/
    └── ProjectZomboidRedactor.php (toggles + regex passes)

YAGNI tradeoff stated: the interface has one method and currently one implementation. Strictly, YAGNI says collapse to just ProjectZomboidRedactor and skip the interface. The interface earns its keep because iblogs's call sites will type-hint against RedactorInterface, not the concrete class — that's the architectural payoff. Consumer code stays loosely coupled; when Minecraft or another game ships a redactor, iblogs swaps the implementation by changing one DI binding rather than touching call sites.

The cost is two files instead of one. Acceptable given the dependency-inversion benefit. The directory layout (src/Util/<Game>/) mirrors the components-outer-with-game-suffix convention used everywhere else in the tree (Analyser, Analysis, Detective, Log, Parser, Pattern).

Note on the new src/Util/ directory. Codex currently has no src/Util/ (the Phase A scaffolding established Analyser / Analysis / Detective / Log / Parser / Pattern / Printer; Phase B.3 added Analyser/ProjectZomboid content but not Util). The Redactor introduces this new top-level directory. This is an additive change: no existing code is modified.

f. Test strategy

Decision: hybrid. Small dedicated synthetic inputs drive the direct unit tests and are kept as string constants in the test classes under test/tests/Util/Redactor/. An integration test also runs the Redactor over an existing PZ fixture and asserts that the output still parses and is idempotent.

Dedicated unit fixtures (small string constants in test classes, not separate files) cover spec test plan items #1 to #5. Each test class owns its input/expected pairs, which keeps the unit tests self-contained and fast.

Integration test that re-uses an existing PZ fixture (e.g. test/src/Games/ProjectZomboid/fixtures/admin-minimal.txt). Two assertions:

  • The Redactor's output is a syntactically valid log (still parses cleanly through the corresponding ProjectZomboidAdminLog).
  • Idempotence: redact(redact($x)) === redact($x). Existing fixture content is already placeholder-shaped, so the redactor should leave it byte-for-byte identical OR apply the canonical normalisation once and then no-op.

False-positive avoidance. The synthetic fixtures use 76561198000000001 etc. as placeholder Steam IDs. The Redactor's Steam ID regex matches the 76561198\d{9} prefix and replaces the match with 76561198000000000, so 76561198000000001 becomes 76561198000000000 (a normalisation, not a corruption). Tests verify that this normalisation is correct and that legitimate non-PII data (e.g. server metadata triples like f:0, t:1776297642406, st:48,648,157,584) is not touched.


Tasks

Tasks are intended for the redactor branch. Each is a single logical commit. Test runs between commits use the standard Docker invocation. Work proceeds only after Task 0 sign-off (review of this plan).

Task 0 — Plan doc commit

  • Step 0.1. Already done out-of-band: git checkout -b redactor off master aec835e; git tag backup/pre-redactor at branch tip; this plan written.
  • Step 0.2. Commit this plan on branch redactor with the message docs: add Redactor implementation plan. Push the branch to origin for review.

Task 1 — Scaffold (interface + skeleton class with toggles)

  • Step 1.1. Create src/Util/RedactorInterface.php with a single method, public function redact(string $content): string. Its PHPDoc describes the contract: stateless from the caller's perspective, with configuration done through implementation-specific setters before redact() is called.
  • Step 1.2. Create src/Util/ProjectZomboid/ProjectZomboidRedactor.php implementing the interface. Class structure: three private bool properties ($redactSteamIds, $redactPlayerNames, $redactCoordinates), all defaulting to true; three fluent setters (redactSteamIds(bool): static, etc.); and a redact(string): string body that returns the input unchanged for now. The regex passes are added in Tasks 2 to 4.
  • Step 1.3. Run composer test — expect 195 tests still green (no Redactor tests yet).
  • Step 1.4. Commit: feat: scaffold RedactorInterface and ProjectZomboidRedactor with toggles.
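Steps 1.1 and 1.2 could produce something like the sketch below. Namespaces are omitted because the package namespace is not stated in this plan. The property and setter names follow Step 1.2; PHP allows a method and a property to share a name.

```php
<?php

declare(strict_types=1);

/**
 * Contract: stateless from the caller's perspective; implementations are
 * configured through their own setters before redact() is called.
 */
interface RedactorInterface
{
    public function redact(string $content): string;
}

final class ProjectZomboidRedactor implements RedactorInterface
{
    private bool $redactSteamIds = true;
    private bool $redactPlayerNames = true;
    private bool $redactCoordinates = true;

    public function redactSteamIds(bool $enabled): static
    {
        $this->redactSteamIds = $enabled;
        return $this;
    }

    public function redactPlayerNames(bool $enabled): static
    {
        $this->redactPlayerNames = $enabled;
        return $this;
    }

    public function redactCoordinates(bool $enabled): static
    {
        $this->redactCoordinates = $enabled;
        return $this;
    }

    public function redact(string $content): string
    {
        // Scaffold only: no regex passes yet, so every configuration
        // returns the input verbatim.
        return $content;
    }
}

$redactor = (new ProjectZomboidRedactor())->redactCoordinates(false);
```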

Task 2 — Steam ID redaction pass

  • Step 2.1. Add STEAM_ID_REGEX and STEAM_ID_REPLACEMENT constants on ProjectZomboidRedactor. Regex uses the 76561198\d{9} prefix anchor with word-boundary classes (per spec). The /u flag is added to all regexes for Unicode safety even though Steam IDs themselves are ASCII.
  • Step 2.2. Implement the Steam ID branch of redact(): when $redactSteamIds is true, run preg_replace against the input.
  • Step 2.3. Create test/tests/Util/Redactor/ProjectZomboidRedactorSteamIdTest.php. Tests: several distinct synthetic Steam IDs all collapse to 76561198000000000; other long numbers that are not Steam IDs (e.g. millisecond timestamps such as 1776297642406) are not touched; toggle-off leaves Steam IDs intact.
  • Step 2.4. Run composer test. Expect the new tests to pass and the existing 195 to be unaffected.
  • Step 2.5. Commit: feat: add Steam ID redaction pass.
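The following is a trimmed sketch of the Steam ID branch, showing only the Steam ID toggle. The regex follows Step 2.1. The replaceOrFail() helper is a hypothetical addition: preg_replace() returns null on failure, which /u triggers on malformed UTF-8. The helper throws in that case, so a failed pass never silently returns unredacted text.

```php
<?php

declare(strict_types=1);

final class ProjectZomboidRedactor
{
    // Step 2.1: 76561198 prefix plus nine digits, with \b on both sides so a
    // longer digit run that merely contains the prefix is not matched.
    public const STEAM_ID_REGEX = '/\b76561198\d{9}\b/u';
    public const STEAM_ID_REPLACEMENT = '76561198000000000';

    private bool $redactSteamIds = true;

    public function redactSteamIds(bool $enabled): static
    {
        $this->redactSteamIds = $enabled;
        return $this;
    }

    public function redact(string $content): string
    {
        if ($this->redactSteamIds) {
            $content = self::replaceOrFail(self::STEAM_ID_REGEX, self::STEAM_ID_REPLACEMENT, $content);
        }
        return $content;
    }

    // Hypothetical helper: throw instead of returning text that was never redacted.
    private static function replaceOrFail(string $regex, string $replacement, string $subject): string
    {
        return preg_replace($regex, $replacement, $subject)
            ?? throw new RuntimeException('Steam ID pass failed: ' . preg_last_error_msg());
    }
}

$input = "76561198000000001 joined\n76561198123456789 left\nt:1776297642406 id=12376561198000000001";
$redacted = (new ProjectZomboidRedactor())->redact($input);
$untouched = (new ProjectZomboidRedactor())->redactSteamIds(false)->redact($input);
```

The id=12376561198000000001 case is the false-positive guard. The prefix occurs inside a longer digit run, so the leading \b fails and the number is left alone.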

Task 3 — Player name redaction pass

  • Step 3.1. Add three regex constants on ProjectZomboidRedactor for the three player-name lexical contexts: PLAYER_AFTER_STEAMID_REGEX, PLAYER_IN_CHATMESSAGE_REGEX, PLAYER_IN_PVP_SUBSYSTEM_REGEX. Replacement is <player> for all. Order constraint: the after-Steam-ID context anchors on the post-redaction Steam ID 76561198000000000, so the player-name pass must run after the Steam ID pass. Document this in a class-level docblock.
  • Step 3.2. Implement the player-name branch of redact(): three sequential preg_replace calls when $redactPlayerNames is true.
  • Step 3.3. Create test/tests/Util/Redactor/ProjectZomboidRedactorPlayerNameTest.php. Tests: each of the three contexts redacts correctly when paired with its anchor; a bare quoted string (e.g. "foo" not preceded by a Steam ID) is not touched; toggle-off leaves names intact; the after-Steam-ID context works correctly when the Steam ID has already been redacted to the zeroed placeholder.
  • Step 3.4. Run composer test. Expect the new tests to pass.
  • Step 3.5. Commit: feat: add player name redaction pass.
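This trimmed sketch shows the Steam ID and player-name branches together. The three log-line shapes in the comments are assumptions for illustration; the real anchors should come from the parser-side Pattern classes. The sketch deliberately differs from Step 3.1 in one respect: its after-Steam-ID regex anchors on any 76561198-prefixed ID, not only the zeroed placeholder. Names are therefore still scrubbed when redactSteamIds(false) is set, which Task 5's toggle-in-isolation tests appear to require. Whether the real implementation should do the same is worth confirming against the spec. The Combat:/Safety: regex here only handles the first quoted name after the tag.

```php
<?php

declare(strict_types=1);

final class ProjectZomboidRedactor
{
    public const STEAM_ID_REGEX = '/\b76561198\d{9}\b/u';
    public const STEAM_ID_REPLACEMENT = '76561198000000000';

    // Assumed line shapes (hypothetical; confirm against the Pattern classes):
    //   76561198000000001 "Player1" ...              quoted name after a Steam ID
    //   ChatMessage{chat=..., author='Player2', ...}  chat author (\x27 is ')
    //   Combat: "Player1" ...  /  Safety: "Player1" ...
    public const PLAYER_AFTER_STEAMID_REGEX = '/(\b76561198\d{9} ")[^"\r\n]+(")/u';
    public const PLAYER_IN_CHATMESSAGE_REGEX = '/(ChatMessage\{[^}\r\n]*?\bauthor=\x27)[^\x27\r\n]+(\x27)/u';
    public const PLAYER_IN_PVP_SUBSYSTEM_REGEX = '/(\b(?:Combat|Safety): ")[^"\r\n]+(")/u';
    public const PLAYER_REPLACEMENT = '$1<player>$2';

    private bool $redactSteamIds = true;
    private bool $redactPlayerNames = true;

    public function redactSteamIds(bool $enabled): static
    {
        $this->redactSteamIds = $enabled;
        return $this;
    }

    public function redactPlayerNames(bool $enabled): static
    {
        $this->redactPlayerNames = $enabled;
        return $this;
    }

    public function redact(string $content): string
    {
        // Runs in the plan's order: Steam IDs first, then player names.
        if ($this->redactSteamIds) {
            $content = self::replace([self::STEAM_ID_REGEX], self::STEAM_ID_REPLACEMENT, $content);
        }
        if ($this->redactPlayerNames) {
            $content = self::replace([
                self::PLAYER_AFTER_STEAMID_REGEX,
                self::PLAYER_IN_CHATMESSAGE_REGEX,
                self::PLAYER_IN_PVP_SUBSYSTEM_REGEX,
            ], self::PLAYER_REPLACEMENT, $content);
        }
        return $content;
    }

    /** @param list<string> $regexes */
    private static function replace(array $regexes, string $replacement, string $subject): string
    {
        // Fail closed: a null result (e.g. malformed UTF-8 under /u) must not leak raw text.
        return preg_replace($regexes, $replacement, $subject)
            ?? throw new RuntimeException('Redaction failed: ' . preg_last_error_msg());
    }
}

$input = implode("\n", [
    '76561198000000001 "Player1" fully connected',
    "ChatMessage{chat=General, author='Player2', text='hi all'}",
    'Combat: "AdminUser" attacked',
    '"foo" is just a quoted string',
]);

$full = (new ProjectZomboidRedactor())->redact($input);
$namesOnly = (new ProjectZomboidRedactor())->redactSteamIds(false)->redact($input);
```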

Task 4 — Coordinates redaction pass

  • Step 4.1. Add three regex constants on ProjectZomboidRedactor for the three coordinate contexts: COORDS_AT_CLAUSE_REGEX, COORDS_BRACKETED_REGEX, COORDS_PARENTHESISED_REGEX. Replacements preserve shape (0,0,0 inside whatever bracket/paren wrapper).
  • Step 4.2. Implement the coords branch of redact(): three sequential preg_replace_callback (or preg_replace) calls when $redactCoordinates is true.
  • Step 4.3. Create test/tests/Util/Redactor/ProjectZomboidRedactorCoordinatesTest.php. Tests: each of the three contexts redacts correctly; negative test — server metadata f:0, t:1776297642406, st:48,648,157,584 is not touched; basement Z-coordinates (-1) are handled; toggle-off leaves coords intact.
  • Step 4.4. Run composer test. Expect the new tests to pass.
  • Step 4.5. Commit: feat: add coordinates redaction pass.
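This trimmed sketch covers the coordinates branch. It assumes three shapes (at X,Y,Z, [X,Y,Z], and (X,Y,Z)) with integer components and an optional minus sign for basement levels. The (?![\d,]) lookahead on the at-clause stops a longer comma-separated run from being read as a triple.

```php
<?php

declare(strict_types=1);

final class ProjectZomboidRedactor
{
    // Assumed coordinate shapes (hypothetical): "at X,Y,Z", "[X,Y,Z]", "(X,Y,Z)".
    public const COORDS_AT_CLAUSE_REGEX = '/\bat -?\d+,-?\d+,-?\d+(?![\d,])/u';
    public const COORDS_BRACKETED_REGEX = '/\[-?\d+,-?\d+,-?\d+\]/u';
    public const COORDS_PARENTHESISED_REGEX = '/\(-?\d+,-?\d+,-?\d+\)/u';

    // Each replacement keeps its wrapper so the line keeps its original shape.
    private const COORD_REPLACEMENTS = [
        self::COORDS_AT_CLAUSE_REGEX => 'at 0,0,0',
        self::COORDS_BRACKETED_REGEX => '[0,0,0]',
        self::COORDS_PARENTHESISED_REGEX => '(0,0,0)',
    ];

    private bool $redactCoordinates = true;

    public function redactCoordinates(bool $enabled): static
    {
        $this->redactCoordinates = $enabled;
        return $this;
    }

    public function redact(string $content): string
    {
        if (!$this->redactCoordinates) {
            return $content;
        }
        return preg_replace(
            array_keys(self::COORD_REPLACEMENTS),
            array_values(self::COORD_REPLACEMENTS),
            $content,
        ) ?? throw new RuntimeException('Coordinate pass failed: ' . preg_last_error_msg());
    }
}

$input = implode("\n", [
    'Zombie killed at 10934,9546,-1',
    '"Player1" [10934,9546,0] moved',
    'vehicle spawned (10934,9546,0)',
    'f:0, t:1776297642406, st:48,648,157,584',
]);
$redacted = (new ProjectZomboidRedactor())->redact($input);
```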

Task 5 — Combined / toggle / idempotence tests

  • Step 5.1. Create test/tests/Util/Redactor/ProjectZomboidRedactorCombinedTest.php. Tests cover: combined input with all three PII categories present produces fully-scrubbed output when all toggles on; each toggle off in isolation produces partial scrubbing matching the toggle's category; all toggles off returns input byte-for-byte identical (=== equality).
  • Step 5.2. Create test/tests/Util/Redactor/ProjectZomboidRedactorIdempotenceTest.php. Tests: redact(redact($x)) === redact($x) for several input shapes including all three PII categories.
  • Step 5.3. Run composer test. Expect the new tests to pass.
  • Step 5.4. Commit: test: add Redactor combined and idempotence coverage.
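This sketch pulls the three passes together and expresses the combined, toggle, and idempotence checks as plain assertions rather than PHPUnit tests. All regexes describe assumed log-line shapes for illustration. The category-keyed PASSES table is a hypothetical layout, not the named constants that Steps 3.1 and 4.1 prescribe.

```php
<?php

declare(strict_types=1);

interface RedactorInterface
{
    public function redact(string $content): string;
}

final class ProjectZomboidRedactor implements RedactorInterface
{
    // Hypothetical layout: regex => replacement, grouped by toggle and applied
    // in order (Steam IDs, then player names, then coordinates).
    private const PASSES = [
        'steamIds' => [
            '/\b76561198\d{9}\b/u' => '76561198000000000',
        ],
        'playerNames' => [
            '/(\b76561198\d{9} ")[^"\r\n]+(")/u' => '$1<player>$2',
            '/(ChatMessage\{[^}\r\n]*?\bauthor=\x27)[^\x27\r\n]+(\x27)/u' => '$1<player>$2',
            '/(\b(?:Combat|Safety): ")[^"\r\n]+(")/u' => '$1<player>$2',
        ],
        'coordinates' => [
            '/\bat -?\d+,-?\d+,-?\d+(?![\d,])/u' => 'at 0,0,0',
            '/\[-?\d+,-?\d+,-?\d+\]/u' => '[0,0,0]',
            '/\(-?\d+,-?\d+,-?\d+\)/u' => '(0,0,0)',
        ],
    ];

    /** @var array<string, bool> */
    private array $enabled = ['steamIds' => true, 'playerNames' => true, 'coordinates' => true];

    public function redactSteamIds(bool $on): static
    {
        $this->enabled['steamIds'] = $on;
        return $this;
    }

    public function redactPlayerNames(bool $on): static
    {
        $this->enabled['playerNames'] = $on;
        return $this;
    }

    public function redactCoordinates(bool $on): static
    {
        $this->enabled['coordinates'] = $on;
        return $this;
    }

    public function redact(string $content): string
    {
        foreach (self::PASSES as $category => $passes) {
            if (!$this->enabled[$category]) {
                continue; // toggle off: this category passes through verbatim
            }
            // Fail closed: a null result must never turn into unredacted output.
            $content = preg_replace(array_keys($passes), array_values($passes), $content)
                ?? throw new RuntimeException('Redaction failed: ' . preg_last_error_msg());
        }
        return $content;
    }
}

$sample = implode("\n", [
    '76561198123456789 "Player1" fully connected',
    "ChatMessage{chat=General, author='Player2', text='hi'}",
    'Combat: "AdminUser" at 10934,9546,-1',
    'f:0, t:1776297642406, st:48,648,157,584',
]);

$once = (new ProjectZomboidRedactor())->redact($sample);
$twice = (new ProjectZomboidRedactor())->redact($once);
$namesOnly = (new ProjectZomboidRedactor())->redactSteamIds(false)->redactCoordinates(false)->redact($sample);
$allOff = (new ProjectZomboidRedactor())
    ->redactSteamIds(false)
    ->redactPlayerNames(false)
    ->redactCoordinates(false)
    ->redact($sample);
```

Idempotence holds here because every replacement is itself a fixed point of its own regex. For example, "at 0,0,0" matches the at-clause pattern and is rewritten to the same text.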

Task 6 — Existing-fixture integration tests

  • Step 6.1. Create test/tests/Util/Redactor/ProjectZomboidRedactorIntegrationTest.php. Loads each existing PZ fixture (admin-minimal.txt, chat-minimal.txt, etc.) via PathLogFile, calls redact() on the content, and asserts: (a) the redacted content still parses cleanly through the corresponding ProjectZomboid<X>Log's parser without throwing; (b) the synthetic Steam IDs 76561198000000001 through 76561198000000004 all collapse to 76561198000000000; (c) the synthetic player names (Player1, Player2, AdminUser, PlayerSuspect) all collapse to <player>.
  • Step 6.2. Run composer test. Expect all integration assertions pass without modifying any existing test or fixture.
  • Step 6.3. Commit: test: add Redactor integration coverage against existing PZ fixtures.

Task 7 — Documentation updates

  • Step 7.1. Update CLAUDE.md: add a one-line src/Util/ mention to the framework architecture section; one-line note in the ProjectZomboid specifics section pointing at ProjectZomboidRedactor for downstream PII scrubbing; update the "Scaffolded games" line to mention that ProjectZomboid now also has a Redactor implementation under src/Util/ProjectZomboid/.
  • Step 7.2. Update README.md: add a short usage block showing (new ProjectZomboidRedactor())->redact($logContent) as a render-time scrub option, alongside the existing worked example.
  • Step 7.3. Update CHANGELOG.md: move Redactor out of the Deferred section under [0.1.0], OR add a new [Unreleased] section if the v0.1.0 line should remain accurate as-shipped. Decision: add [Unreleased] — v0.1.0 was tagged without the Redactor and the changelog should reflect the historical truth.
  • Step 7.4. Run composer test once more for safety; confirm that the original 195 tests plus the new Redactor tests are green.
  • Step 7.5. Commit: docs: document Redactor utility in CLAUDE.md, README, CHANGELOG.

Task 8 — Final verification

  • Step 8.1. Run composer test. All tests green.
  • Step 8.2. Re-run vendor/bin/phpunit --display-deprecations --display-warnings --display-notices --display-errors. Expect zero output beyond the standard pass summary.
  • Step 8.3. Sanity-check the branch with git log --oneline master..redactor. It should show the plan-doc commit plus seven task commits (Tasks 1 to 7), 8 commits total.
  • Step 8.4. Push final state: git push origin redactor. Do NOT merge to master. User reviews diff and approves merge separately.

Open questions / spec gaps

The spec is generally tight. Items worth flagging while implementing:

  1. /u flag for Unicode safety. The spec doesn't specify regex flags. PZ player names can contain non-ASCII characters (Steam display names are Unicode-permissive). The implementation will use /u on all regexes to avoid mangling multi-byte sequences, and the class docblock will document this.
  2. Replacement order. Spec says "Redaction order matters: SIDs first, names second" because the after-Steam-ID player-name regex anchors on the redacted Steam ID. The implementation will enforce this order in redact() (Steam ID pass first, then names, then coords). The class docblock will document the ordering invariant.
  3. HTML / JSON-encoded input. Spec assumes plain log text. If a consumer feeds HTML-escaped content (e.g. &quot; instead of "), the player-name regex won't match. Document as a v2 concern: callers feed plain text in, render afterwards. v1 does not implement HTML/JSON-aware mode.
  4. Future PII categories. v1 ships exactly the three toggles per spec. New categories (emails, IPs from mods, etc.) extend the toggle set in a future release; v1 does not pre-build extension points beyond what the interface already provides.
  5. src/Util/ is a new top-level directory in this codebase. The Redactor is the first occupant. Future utilities (e.g. a tokenizing variant per spec open-question 1) would also live here. No existing-code modification is needed; the new directory is purely additive.
  6. The empty src/Printer/<Game>/.gitkeep situation. Phase A scaffolding chose not to create Printer/<Game>/ directories at all (only Analyser/Detective/Log/Parser/Pattern got per-game subdirs). The Redactor's home in src/Util/<Game>/ mirrors that: src/Util/ is created with PZ as its first occupant, and no stub Hytale/, Minecraft/, or SevenDaysToDie/ placeholders are scaffolded. When other games' redactors land, they create their own subdirectories at that point.
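The /u trade-off in item 1 can be checked directly. Besides keeping multi-byte characters whole, /u has a second effect. On malformed UTF-8 input, PHP's preg_* functions return null and set PREG_BAD_UTF8_ERROR instead of matching bytes. The implementation therefore needs a policy for that case. Throwing (failing closed) avoids emitting unredacted text, but that is a suggestion, not something the spec states.

```php
<?php

$eDiaeresis = "\u{00EB}"; // "ë": two bytes in UTF-8

// Without /u, PCRE matches bytes, so "." cannot consume the two-byte character.
$byteModeMatches = preg_match('/^.$/', $eDiaeresis);
// With /u, the subject is read as UTF-8 and "." matches the whole character.
$utf8ModeMatches = preg_match('/^.$/u', $eDiaeresis);

// With /u, malformed UTF-8 makes preg_replace() return null rather than match.
$malformed = "76561198000000001 \"Zo\xC3\""; // 0xC3 lead byte with no continuation byte
$result = preg_replace('/\b76561198\d{9}\b/u', '76561198000000000', $malformed);
$badUtf8Detected = $result === null && preg_last_error() === PREG_BAD_UTF8_ERROR;
```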

No spec contradictions found. No existing-code modifications required (additive-only design).


Branch / commit invariants

  • All commits land on the redactor branch.
  • Master is not touched until the user explicitly approves merge after reviewing the diff.
  • Conventional commit prefixes: docs:, feat:, test:, refactor:. (No fix: expected — this is greenfield work.)
  • One logical concept per commit. Task 1 is the scaffold; Tasks 2, 3, and 4 each ship implementation plus per-pass tests in one commit; Tasks 5, 6, and 7 are pure-test or pure-docs commits.
  • Backup tag backup/pre-redactor at aec835e lets us discard the branch and recover if the implementation goes sideways.
  • Branch can be pushed to origin freely for visibility / review checkpoints.

Pointers

  • Spec: docs/superpowers/specs/2026-04-30-redactor-design.md.
  • Synthetic fixtures the integration test will reuse: test/src/Games/ProjectZomboid/fixtures/*.txt.
  • Existing per-game layout precedent: src/Analyser/ProjectZomboid/, src/Pattern/ProjectZomboid/, src/Log/ProjectZomboid/.
  • Workflow conventions and pitfalls: CLAUDE.md.