Forward-looking plan on the redactor branch covering all eight design
questions called out in the careful-protocol kickoff: render-time
filter (raw is canonical), standalone string utility (not a Printer
decorator), regex-based detection with lexical anchors per PII
category, per-category placeholder replacement matching synthetic
fixture conventions, thin generic interface plus per-game
implementation under src/Util/ProjectZomboid/, hybrid fixture strategy
(unit-level synthetic plus integration against existing PZ fixtures).
Branch off master aec835e. backup/pre-redactor tag pins start.
No code is written by this commit. Implementation pass kicks off
separately after plan review.
Redactor Utility Implementation Plan
Forward-looking. No code is written by this document. Branch: redactor (off master aec835e). Backup tag: backup/pre-redactor. Spec: docs/superpowers/specs/2026-04-30-redactor-design.md.
Goal: Land the RedactorInterface plus a concrete ProjectZomboidRedactor implementation so iblogs (and any other downstream consumer) can scrub Project Zomboid log content of Steam IDs, player names, and world coordinates with a single call. The Redactor is a render-time filter on raw string content; raw stays canonical at the storage layer.
Architecture: Standalone string-in/string-out utility under a new top-level src/Util/ directory, with per-game implementations under src/Util/<Game>/. Each implementation owns the lexical regex anchors for its game's PII shapes. Three independent toggles per implementation (redactSteamIds, redactPlayerNames, redactCoordinates); defaults all on; "all toggles off" yields verbatim passthrough.
Tech stack: PHP 8.4+, PHPUnit 12, Composer (indifferentketchup/codex v0.1.0+). All command invocations wrap in the composer:latest Docker image per CLAUDE.md.
Design questions — resolved
a. Render-time vs ingest-time
Decision: render-time. This confirms the spec's lean.
Raw log content is canonical. Redaction is a view filter that consumers apply when they want to display, export, or analyse a redacted projection. iblogs's storage layer holds the unredacted upload (subject to iblogs's own upload-time Filter chain for IPs/access-tokens, which is a different layer of defence); the codex Redactor runs on the way out of storage, not on the way in.
Why: the alternative (ingest-time, where storage holds redacted content) is destructive — once stored, the original cannot be recovered for legitimate operator use. Render-time leaves the original in place and lets each render path opt in. iblogs gets a per-session toggle without needing to keep two copies of every paste.
Implication for iblogs schema: iblogs stores raw content; the redaction toggle in the iblogs UI invokes ProjectZomboidRedactor::redact() at render time (server-side) or at fetch time (API consumers' choice). No schema migration required for the redaction feature.
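The render-time flow described above can be sketched as follows. This is a minimal illustration, not iblogs code: renderPaste() and the toy redactor are hypothetical names introduced here, and the interface is a namespace-free stand-in for the planned RedactorInterface.

```php
<?php
declare(strict_types=1);

// Stand-in for the planned src/Util/RedactorInterface (namespace omitted).
interface RedactorInterface
{
    public function redact(string $content): string;
}

/**
 * Hypothetical render helper: storage always hands over the raw paste, and
 * redaction is applied only when the viewer's toggle asks for it.
 */
function renderPaste(string $rawContent, bool $wantsRedacted, RedactorInterface $redactor): string
{
    return $wantsRedacted ? $redactor->redact($rawContent) : $rawContent;
}

// Toy redactor used only to make the sketch runnable; not the PZ implementation.
$toyRedactor = new class implements RedactorInterface {
    public function redact(string $content): string
    {
        return str_replace('Player1', '<player>', $content);
    }
};

$stored = 'Player1 connected';
echo renderPaste($stored, true, $toyRedactor), PHP_EOL;  // <player> connected
echo renderPaste($stored, false, $toyRedactor), PHP_EOL; // Player1 connected
```

The stored string is never rewritten, so both projections come from a single copy of the paste.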
b. Redactor as standalone class vs Printer decorator
Decision: standalone utility (option iii from the question).
The Redactor is a string → string function. It does not know about Insight, Printer, or any other codex type. Three options were considered:
- (i) Printer wrapper. Cleanly composable but ties the Redactor to the Printer abstraction. Doesn't help iblogs's most common case: redacting raw log content for display in a non-Printer rendering path (HTML page rendered server-side, raw download served to API client).
- (ii) Pre-Printer pass on Insights. Heavy. Insights are typed objects with structured fields; redacting them means per-Insight code that knows which fields are PII-bearing. Against the YAGNI line for v1.
- (iii) Standalone string utility. Simple, generic, works on any string input — raw log content, JSON-serialised analysis output, rendered Printer output piped through. Doesn't know about Insights.
The spec describes (iii). v1 ships (iii) only. If a Printer-wrapper convenience is later wanted, it can be added as a thin adapter that calls the standalone Redactor on the Printer's output; it doesn't require restructuring the core.
c. PII field taxonomy for PZ
Decision: regex-based with lexical context anchors. No structured-field detection in v1.
PZ-specific PII categories observed in the in-tree fixtures and the .scratch/pz/Logs/ reference corpus:
| Field | Detection | Rationale |
|---|---|---|
| Steam ID | regex with 76561198\d{9} prefix anchor and word-boundary classes | Steam's 76561198 SteamID64 universe prefix lets us cleanly distinguish from other long numbers (timestamps, build numbers). |
| Player name | regex with multi-context lexical anchors (after-Steam-ID-quoted, ChatMessage author, Combat:/Safety: subsystem) | Names are arbitrary strings, not detectable without context. The contexts are well-defined by the parser-side pattern classes. |
| World coordinate triple | regex with bracket / paren / at-clause anchors | Generic \d+,\d+,\d+ would over-redact server metadata (f:0, t:NNNN, st:48,648,157,584). Lexical context disambiguates. |
Not redacted in v1:
- IP addresses. PZ logs do not normally include IPs in any of the eleven file types observed. iblogs's upload-side IPv4Filter/IPv6Filter (ported from upstream mclogs) covers the rare case where a mod might log them.
- Server-side usernames distinct from player names. PZ uses Steam display name as the player identity; there's no separate auth username layer. Mclogs's UsernameFilter is Minecraft-specific and isn't mirrored here.
- BurdJournals scientific-notation Steam IDs (7.65611…E16). Spec open-question 2 explicitly defers this to v2; the [BurdJournals] tag already disambiguates them as mod-internal.
Hybrid (regex + structured-field) deferred. A v2 enhancement could redact specific Insight fields at JSON-serialisation time (e.g. ConnectionFailureProblem::$steamId → placeholder when serialised). Useful only if iblogs starts shipping the structured analysis JSON to redacted views — a real but currently hypothetical need.
d. Replacement strategy
Decision: per-category placeholder strings matching the synthetic-fixture conventions. Configurable replacement style is YAGNI for v1.
Per the spec:
| Category | Replacement |
|---|---|
| Steam ID | 76561198000000000 (zeroed placeholder, still a syntactically valid Steam ID) |
| Player name | <player> |
| Coordinates | 0,0,0 (with shape preserved per anchor — bracketed, parenthesised, or at clause) |
Why these specifically and not [REDACTED] / [STEAM_ID] / hashed:
- The placeholders match the existing synthetic test fixtures (76561198000000001–76561198000000004 collapse to 76561198000000000; player names Player1/Player2/AdminUser collapse to <player>). Tests can verify "redacted output looks like a synthetic fixture."
- Shape preservation means downstream consumers can still parse the redacted output with the same Pattern classes: a redacted log is still a syntactically valid PZ log, it just contains no identities.
- Type-tagged replacements ([STEAM_ID]) break shape preservation: a Pattern looking for \d{17} would fail. Worth offering as a config option if a consumer specifically wants type-visibility, but v1 ships placeholder-only.
- Hashing breaks shape preservation similarly and adds determinism / collision concerns.
If a consumer later needs [STEAM_ID]-style output, a setReplacementStyle('typed' | 'placeholder' | 'redacted') setter can be added without breaking the v1 API. v1 ships placeholder-only.
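The shape-preservation argument can be checked with a small sketch. The \d{17} pattern below is a hypothetical downstream field matcher, assumed for illustration; it is not taken from the codex Pattern classes.

```php
<?php
declare(strict_types=1);

// Hypothetical downstream pattern that expects a 17-digit Steam ID field.
$steamIdField = '/^\d{17}$/';

$placeholder = '76561198000000000'; // v1 placeholder replacement
$typed       = '[STEAM_ID]';        // type-tagged alternative, not shipped in v1

var_dump(preg_match($steamIdField, $placeholder) === 1); // bool(true)
var_dump(preg_match($steamIdField, $typed) === 1);       // bool(false)
```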
e. Game-agnostic vs PZ-specific layout
Decision: thin generic interface in src/Util/ plus PZ-specific implementation in src/Util/ProjectZomboid/.
src/Util/
├── RedactorInterface.php (1 method: redact(string): string)
└── ProjectZomboid/
    └── ProjectZomboidRedactor.php (toggles + regex passes)
YAGNI tradeoff stated: the interface has one method and currently one implementation. Strictly, YAGNI says collapse to just ProjectZomboidRedactor and skip the interface. The interface earns its keep because iblogs's call sites will type-hint against RedactorInterface, not the concrete class — that's the architectural payoff. Consumer code stays loosely coupled; when Minecraft or another game ships a redactor, iblogs swaps the implementation by changing one DI binding rather than touching call sites.
The cost is two files instead of one. Acceptable given the dependency-inversion benefit. The directory layout (src/Util/<Game>/) mirrors the components-outer-with-game-suffix convention used everywhere else in the tree (Analyser, Analysis, Detective, Log, Parser, Pattern).
Note on the new src/Util/ directory. Codex currently has no src/Util/ (the Phase A scaffolding established Analyser / Analysis / Detective / Log / Parser / Pattern / Printer; Phase B.3 added Analyser/ProjectZomboid content but not Util). The Redactor introduces this new top-level. This is an additive change — no existing code is modified.
f. Test strategy
Decision: hybrid — small dedicated synthetic fixtures under test/src/Util/Redactor/ for direct unit tests, plus an integration test that runs the Redactor over an existing PZ fixture and asserts idempotence.
Dedicated unit fixtures (small string constants in test classes, not separate files): per spec test plan #1–#5. Each test class owns its input/expected pairs. Keeps unit tests self-contained and fast.
Integration test that re-uses an existing PZ fixture (e.g. test/src/Games/ProjectZomboid/fixtures/admin-minimal.txt). Two assertions:
- The Redactor's output is a syntactically valid log (still parses cleanly through the corresponding ProjectZomboidAdminLog).
- Idempotence: redact(redact($x)) === redact($x). Existing fixture content is already placeholder-shaped, so the redactor should leave it byte-for-byte identical OR apply the canonical normalisation once and then no-op.
False-positive avoidance. The synthetic fixtures use 76561198000000001 etc. as placeholder Steam IDs. The Redactor's Steam ID regex matches the 76561198\d{9} prefix and replaces with 76561198000000000, so 76561198000000001 becomes 76561198000000000 (a normalisation, not a corruption). Tests verify this normalisation is correct and that legitimate non-PII data (e.g. server metadata triples like f:0, t:1776297642406, st:48,648,157,584) is not touched.
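The idempotence and normalisation checks described above could look roughly like this. isIdempotent() is a hypothetical helper, and the closure is a stand-in for the Steam ID pass only; its lookaround-based pattern is an assumption, since the spec's exact regex is not reproduced in this plan.

```php
<?php
declare(strict_types=1);

/** Returns true when applying $redact twice equals applying it once. */
function isIdempotent(callable $redact, string $input): bool
{
    $once = $redact($input);
    return $redact($once) === $once;
}

// Stand-in for the Steam ID pass only; the pattern is an assumption.
$steamIdPass = static fn (string $s): string =>
    (string) preg_replace('/(?<!\d)76561198\d{9}(?!\d)/u', '76561198000000000', $s);

$fixtureLine = '76561198000000001 f:0, t:1776297642406, st:48,648,157,584';

// The first pass normalises ...001 to ...000; the server metadata is untouched.
echo $steamIdPass($fixtureLine), PHP_EOL;
var_dump(isIdempotent($steamIdPass, $fixtureLine)); // bool(true)
```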
Tasks
Tasks are intended for the redactor branch. Each is a single logical commit. Test-running between commits uses the standard Docker invocation. Work proceeds only after Step 0 sign-off (this plan reviewed).
Task 0 — Plan doc commit
- Step 0.1. Already done out-of-band: git checkout -b redactor (off master aec835e); git tag backup/pre-redactor at branch tip; this plan written.
- Step 0.2. Commit this plan: docs: add Redactor implementation plan on branch redactor. Push branch to origin for review.
Task 1 — Scaffold (interface + skeleton class with toggles)
- Step 1.1. Create src/Util/RedactorInterface.php. Single method: public function redact(string $content): string; PHPDoc describing the contract: stateless from the caller's perspective; configuration happens via implementation-specific setters before redact().
- Step 1.2. Create src/Util/ProjectZomboid/ProjectZomboidRedactor.php that implements the interface. Class structure: three private bool properties ($redactSteamIds, $redactPlayerNames, $redactCoordinates) all defaulting to true; three fluent setters (redactSteamIds(bool): static, etc.); redact(string): string body that returns input unchanged when all toggles are off (for now; regex passes are added in subsequent tasks).
- Step 1.3. Run composer test. Expect 195 tests still green (no Redactor tests yet).
- Step 1.4. Commit: feat: scaffold RedactorInterface and ProjectZomboidRedactor with toggles.
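A minimal sketch of the Task 1 scaffold, assuming the structure listed in Steps 1.1 and 1.2. Namespaces are omitted so the snippet runs standalone; the real files would carry whatever namespace the codex tree uses.

```php
<?php
declare(strict_types=1);

interface RedactorInterface
{
    /** Returns a redacted copy of $content; configuration happens before this call. */
    public function redact(string $content): string;
}

final class ProjectZomboidRedactor implements RedactorInterface
{
    private bool $redactSteamIds = true;
    private bool $redactPlayerNames = true;
    private bool $redactCoordinates = true;

    public function redactSteamIds(bool $enabled): static
    {
        $this->redactSteamIds = $enabled;
        return $this;
    }

    public function redactPlayerNames(bool $enabled): static
    {
        $this->redactPlayerNames = $enabled;
        return $this;
    }

    public function redactCoordinates(bool $enabled): static
    {
        $this->redactCoordinates = $enabled;
        return $this;
    }

    public function redact(string $content): string
    {
        if (!$this->redactSteamIds && !$this->redactPlayerNames && !$this->redactCoordinates) {
            return $content; // all toggles off: verbatim passthrough
        }

        // Skeleton only: the regex passes from Tasks 2-4 would run here.
        return $content;
    }
}

$redactor = (new ProjectZomboidRedactor())->redactCoordinates(false);
echo $redactor->redact('unchanged for now'), PHP_EOL;
```

PHP keeps properties and methods in separate symbol tables, so a setter can share its name with the property it sets, as the plan's naming implies.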
Task 2 — Steam ID redaction pass
- Step 2.1. Add STEAM_ID_REGEX and STEAM_ID_REPLACEMENT constants on ProjectZomboidRedactor. Regex uses the 76561198\d{9} prefix anchor with word-boundary classes (per spec). The /u flag is added to all regexes for Unicode safety even though Steam IDs themselves are ASCII.
- Step 2.2. Implement the Steam ID branch of redact(): when $redactSteamIds is true, run preg_replace against the input.
- Step 2.3. Create test/tests/Util/Redactor/ProjectZomboidRedactorSteamIdTest.php. Tests: redaction of various distinct synthetic Steam IDs collapses all to 76561198000000000; non-Steam-ID 17-digit numbers (e.g. timestamps) are not touched; toggle-off leaves Steam IDs intact.
- Step 2.4. Run composer test. Expect new tests pass; old 195 unaffected.
- Step 2.5. Commit: feat: add Steam ID redaction pass.
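One way the Steam ID pass might look, written as a free function for brevity. The constant names come from Step 2.1, but the pattern itself is an assumption: digit lookarounds stand in for the spec's "word-boundary classes", which this plan does not spell out. The sketch follows the plan's 76561198 prefix literally, so an ID outside that prefix would not match.

```php
<?php
declare(strict_types=1);

// Assumed values; the spec's exact pattern is not reproduced in this plan.
const STEAM_ID_REGEX = '/(?<!\d)76561198\d{9}(?!\d)/u';
const STEAM_ID_REPLACEMENT = '76561198000000000';

function redactSteamIds(string $content): string
{
    $result = preg_replace(STEAM_ID_REGEX, STEAM_ID_REPLACEMENT, $content);
    if ($result === null) {
        // preg_replace returns null on PCRE errors (e.g. invalid UTF-8 with /u).
        // Failing loudly avoids handing back unredacted content by accident.
        throw new RuntimeException(preg_last_error_msg());
    }
    return $result;
}

echo redactSteamIds('id=76561198012345678 t:1776297642406'), PHP_EOL;
// id=76561198000000000 t:1776297642406
```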
Task 3 — Player name redaction pass
- Step 3.1. Add three regex constants on ProjectZomboidRedactor for the three player-name lexical contexts: PLAYER_AFTER_STEAMID_REGEX, PLAYER_IN_CHATMESSAGE_REGEX, PLAYER_IN_PVP_SUBSYSTEM_REGEX. Replacement is <player> for all. Order constraint: the after-Steam-ID context anchors on the post-redaction Steam ID 76561198000000000, so the player-name pass must run after the Steam ID pass. Document this in a class-level docblock.
- Step 3.2. Implement the player-name branch of redact(): three sequential preg_replace calls when $redactPlayerNames is true.
- Step 3.3. Create test/tests/Util/Redactor/ProjectZomboidRedactorPlayerNameTest.php. Tests: each of the three contexts redacts correctly when paired with its anchor; a bare quoted string (e.g. "foo" not preceded by a Steam ID) is not touched; toggle-off leaves names intact; the after-Steam-ID context works correctly when the Steam ID has already been redacted to the zeroed placeholder.
- Step 3.4. Run composer test. Expect new tests pass.
- Step 3.5. Commit: feat: add player name redaction pass.
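The order constraint in Step 3.1 can be illustrated with the after-Steam-ID context alone. Both patterns and the log-line shape (a Steam ID followed by a quoted name) are assumptions made for this sketch; the real anchors live in the spec and the parser-side Pattern classes.

```php
<?php
declare(strict_types=1);

// Assumed patterns; the line shape `<steamid> "<name>"` is illustrative only.
const STEAM_ID_REGEX = '/(?<!\d)76561198\d{9}(?!\d)/u';
const STEAM_ID_REPLACEMENT = '76561198000000000';
const PLAYER_AFTER_STEAMID_REGEX = '/(76561198000000000 ")[^"\r\n]+(")/u';

function steamIdPass(string $s): string
{
    return preg_replace(STEAM_ID_REGEX, STEAM_ID_REPLACEMENT, $s)
        ?? throw new RuntimeException(preg_last_error_msg());
}

function playerNamePass(string $s): string
{
    // Anchors on the already-redacted Steam ID, so it must run second.
    return preg_replace(PLAYER_AFTER_STEAMID_REGEX, '$1<player>$2', $s)
        ?? throw new RuntimeException(preg_last_error_msg());
}

$line = '76561198012345678 "Player1" said "hello"';

echo playerNamePass(steamIdPass($line)), PHP_EOL;
// 76561198000000000 "<player>" said "hello"

echo steamIdPass(playerNamePass($line)), PHP_EOL;
// 76561198000000000 "Player1" said "hello"   (wrong order: the name survives)
```

The second call shows why the ordering invariant belongs in the class docblock: with the passes swapped, the name regex never sees its anchor.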
Task 4 — Coordinates redaction pass
- Step 4.1. Add three regex constants on ProjectZomboidRedactor for the three coordinate contexts: COORDS_AT_CLAUSE_REGEX, COORDS_BRACKETED_REGEX, COORDS_PARENTHESISED_REGEX. Replacements preserve shape (0,0,0 inside whatever bracket/paren wrapper).
- Step 4.2. Implement the coords branch of redact(): three sequential preg_replace_callback (or preg_replace) calls when $redactCoordinates is true.
- Step 4.3. Create test/tests/Util/Redactor/ProjectZomboidRedactorCoordinatesTest.php. Tests: each of the three contexts redacts correctly; negative test: server metadata f:0, t:1776297642406, st:48,648,157,584 is not touched; basement Z-coordinates (-1) are handled; toggle-off leaves coords intact.
- Step 4.4. Run composer test. Expect new tests pass.
- Step 4.5. Commit: feat: add coordinates redaction pass.
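A sketch of the three coordinate contexts under assumed line shapes ("at x,y,z", "[x,y,z]", "(x,y,z)"). The constant names follow Step 4.1; the patterns are guesses at what the anchors might look like, not the spec's regexes.

```php
<?php
declare(strict_types=1);

// Assumed patterns and line shapes; the spec's real anchors may differ.
const COORDS_AT_CLAUSE_REGEX     = '/(\bat )-?\d+,-?\d+,-?\d+(?![\d,])/u';
const COORDS_BRACKETED_REGEX     = '/\[-?\d+,-?\d+,-?\d+\]/u';
const COORDS_PARENTHESISED_REGEX = '/\(-?\d+,-?\d+,-?\d+\)/u';

function redactCoordinates(string $content): string
{
    // preg_replace applies the pattern/replacement pairs in array order.
    $result = preg_replace(
        [COORDS_AT_CLAUSE_REGEX, COORDS_BRACKETED_REGEX, COORDS_PARENTHESISED_REGEX],
        ['${1}0,0,0', '[0,0,0]', '(0,0,0)'],
        $content
    );
    if ($result === null) {
        throw new RuntimeException(preg_last_error_msg());
    }
    return $result;
}

echo redactCoordinates('hit at 10894,9431,0 near [10894,9431,0] and (10894,9431,-1)'), PHP_EOL;
// hit at 0,0,0 near [0,0,0] and (0,0,0)

echo redactCoordinates('f:0, t:1776297642406, st:48,648,157,584'), PHP_EOL;
// f:0, t:1776297642406, st:48,648,157,584   (no anchor, so untouched)
```

The optional minus sign covers basement Z values such as -1, and the trailing lookahead on the at-clause keeps four-part comma runs from being partially rewritten.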
Task 5 — Combined / toggle / idempotence tests
- Step 5.1. Create test/tests/Util/Redactor/ProjectZomboidRedactorCombinedTest.php. Tests cover: combined input with all three PII categories present produces fully-scrubbed output when all toggles on; each toggle off in isolation produces partial scrubbing matching the toggle's category; all toggles off returns input byte-for-byte identical (=== equality).
- Step 5.2. Create test/tests/Util/Redactor/ProjectZomboidRedactorIdempotenceTest.php. Tests: redact(redact($x)) === redact($x) for several input shapes including all three PII categories.
- Step 5.3. Run composer test. Expect new tests pass.
- Step 5.4. Commit: test: add Redactor combined and idempotence coverage.
Task 6 — Existing-fixture integration tests
- Step 6.1. Create test/tests/Util/Redactor/ProjectZomboidRedactorIntegrationTest.php. Loads each existing PZ fixture (admin-minimal.txt, chat-minimal.txt, etc.) via PathLogFile, calls redact() on the content, and asserts: (a) the redacted content still parses cleanly through the corresponding ProjectZomboid<X>Log's parser without throwing; (b) the synthetic Steam IDs 76561198000000001–76561198000000004 all collapse to 76561198000000000; (c) the synthetic player names (Player1, Player2, AdminUser, PlayerSuspect) all collapse to <player>.
- Step 6.2. Run composer test. Expect all integration assertions pass without modifying any existing test or fixture.
- Step 6.3. Commit: test: add Redactor integration coverage against existing PZ fixtures.
Task 7 — Documentation updates
- Step 7.1. Update CLAUDE.md: add a one-line src/Util/ mention to the framework architecture section; one-line note in the ProjectZomboid specifics section pointing at ProjectZomboidRedactor for downstream PII scrubbing; update the "Scaffolded games" line to mention that ProjectZomboid now also has a Redactor implementation under src/Util/ProjectZomboid/.
- Step 7.2. Update README.md: add a short usage block showing (new ProjectZomboidRedactor())->redact($logContent) as a render-time scrub option, alongside the existing worked example.
- Step 7.3. Update CHANGELOG.md: move Redactor out of the Deferred section under [0.1.0], OR add a new [Unreleased] section if the v0.1.0 line should remain accurate as-shipped. Decision: add [Unreleased], since v0.1.0 was tagged without the Redactor and the changelog should reflect the historical truth.
- Step 7.4. Run composer test once more for safety; confirm 195+(redactor tests) green.
- Step 7.5. Commit: docs: document Redactor utility in CLAUDE.md, README, CHANGELOG.
Task 8 — Final verification
- Step 8.1. Run composer test. All tests green.
- Step 8.2. Re-run vendor/bin/phpunit --display-deprecations --display-warnings --display-notices --display-errors. Expect zero output beyond the standard pass summary.
- Step 8.3. Sanity-check the branch with git log --oneline master..redactor. Should be the plan-doc commit plus 7 implementation commits = 8 commits total.
- Step 8.4. Push final state: git push origin redactor. Do NOT merge to master. User reviews diff and approves merge separately.
Open questions / spec gaps
The spec is generally tight. Items worth flagging while implementing:
- /u flag for Unicode safety. Spec doesn't specify regex flags. PZ player names can contain non-ASCII characters (Steam display names are Unicode-permissive). The implementation will use /u on all regexes to avoid mangling multi-byte sequences. Documenting in the class docblock.
- Replacement order. Spec says "Redaction order matters: SIDs first, names second" because the after-Steam-ID player-name regex anchors on the redacted Steam ID. The implementation will enforce this order in redact() (Steam ID pass first, then names, then coords). The class docblock will document the ordering invariant.
- HTML / JSON-encoded input. Spec assumes plain log text. If a consumer feeds HTML-escaped content (e.g. &quot; instead of "), the player-name regex won't match. Document as a v2 concern: callers feed plain text in, render afterwards. v1 does not implement HTML/JSON-aware mode.
- Future PII categories. v1 ships exactly the three toggles per spec. New categories (emails, IPs from mods, etc.) extend the toggle set in a future release; v1 does not pre-build extension points beyond what the interface already provides.
- src/Util/ is a new top-level directory in this codebase. The Redactor is the first occupant. Future utilities (e.g. a tokenizing variant per spec open-question 1) would also live here. No existing-code modification is needed; the new directory is purely additive.
- The empty src/Printer/<Game>/.gitkeep situation. Phase A scaffolding chose not to create Printer/<Game>/ directories at all (only Analyser/Detective/Log/Parser/Pattern got per-game subdirs). The Redactor's home in src/Util/<Game>/ mirrors that: src/Util/ is created with PZ as its first occupant; no stub Hytale/, Minecraft/, or SevenDaysToDie/ placeholders are scaffolded. When other games' redactors land, they create their own subdirectories at that point.
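The /u concern from the first item above can be demonstrated in a few lines. This is a generic PCRE illustration with a made-up two-character name, not one of the Redactor's patterns.

```php
<?php
declare(strict_types=1);

$name = "\u{00D6}l"; // "Öl": the first character is two bytes in UTF-8 (0xC3 0x96)

$byteMode    = preg_replace('/^./', '*', $name);  // "." consumes a single byte
$unicodeMode = preg_replace('/^./u', '*', $name); // "." consumes a whole code point

// preg_match('//u', ...) returns 1 only when the subject is valid UTF-8.
var_dump(preg_match('//u', $byteMode) === 1);    // bool(false): a stray 0x96 byte is left behind
var_dump(preg_match('//u', $unicodeMode) === 1); // bool(true)
```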
No spec contradictions found. No existing-code modifications required (additive-only design).
Branch / commit invariants
- All commits land on the redactor branch.
- Master is not touched until the user explicitly approves merge after reviewing the diff.
- Conventional commit prefixes: docs:, feat:, test:, refactor:. (No fix: expected; this is greenfield work.)
- One logical concept per commit. Task 1 ships the scaffold only; Tasks 2, 3, 4 each ship implementation + per-pass tests in one commit; Tasks 5, 6, 7 are pure-test or pure-docs commits.
- Backup tag backup/pre-redactor at aec835e lets us discard the branch and recover if the implementation goes sideways.
- Branch can be pushed to origin freely for visibility / review checkpoints.
Pointers
- Spec: docs/superpowers/specs/2026-04-30-redactor-design.md.
- Synthetic fixtures the integration test will reuse: test/src/Games/ProjectZomboid/fixtures/*.txt.
- Existing per-game layout precedent: src/Analyser/ProjectZomboid/, src/Pattern/ProjectZomboid/, src/Log/ProjectZomboid/.
- Workflow conventions and pitfalls: CLAUDE.md.