diff --git a/CHANGELOG.md b/CHANGELOG.md index 38ceea6..c0d3824 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,16 @@ All notable changes to `indifferentketchup/codex` are documented here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [Unreleased] + +### Added + +- `RedactorInterface` (`src/Util/RedactorInterface.php`) and `ProjectZomboidRedactor` (`src/Util/ProjectZomboid/ProjectZomboidRedactor.php`) — render-time PII filter that scrubs Steam IDs, player names, and world coordinates from Project Zomboid log content. Three independent toggles default to on. Designed as a string-in/string-out utility so consumers can apply it at any rendering or export step. Documented v1 limitations: in PvP combat lines, only the attacker's name and coords are redacted; victim's name and coords (after `hit`) are deferred to v2. In admin lines, `teleported X to ` coordinates are not redacted in v1. + +### Changed + +- New top-level `src/Util/` directory introduced. The Redactor is its first occupant; future utilities (e.g. tokenising redactor variants) land here. + ## [0.1.0] — 2026-05-01 First public release. Codex is a generic PHP log parsing and analysis framework with full Project Zomboid server-log support across eight analysers. The Composer package name is `indifferentketchup/codex` (the repository directory and Gitea slug are `ik-codex`; the package name is not). @@ -32,7 +42,6 @@ First public release. Codex is a generic PHP log parsing and analysis framework ### Deferred -- **Codex `Redactor` utility** — design captured in `docs/superpowers/specs/2026-04-30-redactor-design.md`. Not implemented in v0.1.0. iblogs (the downstream consumer) handles upload-time PII filtering for this release; codex itself ships no PII helper. 
The deferred spec exists so iblogs's privacy story has a referenced design to point at and so a future implementation pass has a clear contract to start from. - **Other game implementations** — `Minecraft`, `Hytale`, and `SevenDaysToDie` are detective-stub-only. Each has a TODO `Detective` extending base `Detective`; their per-component subdirectories under `Analyser`, `Log`, `Parser`, and `Pattern` contain only `.gitkeep` placeholders. Real implementations land if and when fixtures and demand exist. - **Packagist publication** — v0.1.0 is consumable via Composer's `vcs` repository entry pointing at the Gitea remote. Pushing to Packagist is a separate decision and is not in scope for this release. diff --git a/CLAUDE.md b/CLAUDE.md index bad7a06..5404b64 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -49,6 +49,7 @@ Analysis of Insight[] - **`PatternParser`** is regex-driven. Lines that don't match the LINE regex append to the previous `Entry` — this is the mechanism that handles multi-line records like Java stack traces under an ERROR header. - **`PatternAnalyser`** walks entries, runs each registered insight class's static `getPatterns()` against entry text via `preg_match_all`, and emits coalesced insights (equal insights bump a counter instead of duplicating). - **Custom `Analyser` subclasses** are the right move when analysis needs cross-entry state — pairing events, sliding-window thresholds, comparing consecutive snapshots. `PatternAnalyser` operates per-entry only and can't express those. Phase B.3 (`ConnectionFailureAnalyser`, `ItemDuplicationAnalyser`, `SkillProgressionAnomalyAnalyser`) shows the shape: extend `Analyser`, override `analyse()`, walk `$this->log` once, aggregate, then emit coalesced `Problem`/`Information` insights at the end. Tunable thresholds belong as `public const` constants on the subclass with the rationale in a docblock. 
+- **`RedactorInterface`** is a render-time PII filter — string-in/string-out, configured per game, implemented at `src/Util//Redactor.php`. Consumers call `redact(string $content): string` on a concrete instance before rendering or exporting log content. - Detectors available out of the box: `SinglePatternDetector`, `WeightedSinglePatternDetector`, `LinePatternDetector` (returns match ratio), `MultiPatternDetector` (AND), and the path-based `FilenameDetector` (uses `LogFileInterface::getPath()`, returns `false` when no path is available). ## Game subtrees @@ -58,10 +59,13 @@ Layout is **components-outer with game suffix**, not games-outer: ``` src///... e.g. src/Log/ProjectZomboid/ProjectZomboidServerLog.php src/Pattern//Pattern.php (regex string constants; not a framework abstraction) +src/Util//... e.g. src/Util/ProjectZomboid/ProjectZomboidRedactor.php test/tests/Games//... test/src/Games//fixtures/-minimal.txt (synthetic fixtures only) ``` +`src/Util/` is the sixth top-level component directory, introduced post-v0.1.0-tag. Its first occupant is the Redactor; future game-agnostic utilities (tokenising redactor variants, etc.) land here too. + Scaffolded games: `Minecraft`, `Hytale`, `SevenDaysToDie` (stubs only — empty `.gitkeep`s plus a TODO `Detective` extending base `Detective`). `ProjectZomboid` is fully implemented: 11 log subclasses, 11 pattern classes, detective wired with all 11, synthetic fixtures, dispatch tests, plus the analyser surface — 11 `PatternAnalyser`-driven Insight classes under `src/Analysis/ProjectZomboid/` and 3 custom `Analyser` subclasses under `src/Analyser/ProjectZomboid/` for cross-entry / threshold logic. `src/Pattern/` is **not a framework abstraction** — patterns are plain `string` class constants. Each `Pattern` typically holds a `LINE` constant for the parser plus named-group extractor constants (`FIELDS`, `COMBAT`, `MOD_LOAD`, etc.) for analysers. 
@@ -74,6 +78,7 @@ Scaffolded games: `Minecraft`, `Hytale`, `SevenDaysToDie` (stubs only — empty - A custom `Analyser` subclass (cross-entry logic): `UserLog → ConnectionFailureAnalyser`, `ItemLog → ItemDuplicationAnalyser`, `PerkLog → SkillProgressionAnomalyAnalyser`. - A configured `PatternAnalyser` (per-entry pattern matching): `ServerLog`, `PvpLog`, `AdminLog` register their respective Insight classes. - An empty `PatternAnalyser` for logs with no analysers yet: `ChatLog`, `ClientActionLog`, `CmdLog`, `MapLog`, `BurdJournalsLog`. These are wiring stubs awaiting future analysis work. +- **`ProjectZomboidRedactor`** at `src/Util/ProjectZomboid/ProjectZomboidRedactor.php` — concrete `RedactorInterface` implementation. Downstream consumers call `redact(string): string` to scrub Steam IDs (zeroed placeholder), player names (``), and world coordinates (`0,0,0`) from log content. Three independent toggle methods default to on: `redactSteamIds(bool)`, `redactPlayerNames(bool)`, `redactCoordinates(bool)`. Pass order (Steam ID → player name → coords) is mandatory and enforced internally — see Pitfall 5. ### Standard test template for a Log subclass @@ -85,6 +90,7 @@ At minimum: (1) entry count after `parse()` matches the synthetic fixture's line 2. **PHPUnit 12 requires the `#[DataProvider('methodName')]` attribute.** The legacy `@dataProvider` annotation silently passes zero args and fails with `ArgumentCountError`. 3. **`Level::fromString()` defaults to `Level::INFO` for unknown tokens.** Project Zomboid log levels map: `LOG`/`INFO` → INFO; `WARN` → WARNING; `ERROR` → ERROR. 4. **`PatternParser` matches array** must declare a match-type for **every** capture group in the regex (`TIME`, `LEVEL`, or `PREFIX`); otherwise the parser throws on the unmapped index. Use non-capturing groups `(?:...)` for fields you want to skip. +5. 
**`ProjectZomboidRedactor` pass order is mandatory.** `PLAYER_AFTER_STEAMID_REGEX` anchors on the already-redacted Steam ID placeholder — it will not match raw Steam IDs. Do NOT swap the Steam ID and player-name passes, and do NOT stub out the Steam ID pass while leaving the player-name pass enabled. ## Workflow conventions diff --git a/README.md b/README.md index dc3b537..f6a823f 100644 --- a/README.md +++ b/README.md @@ -59,6 +59,21 @@ Project Zomboid Debug Server Log If the log content arrives without a filesystem path (clipboard paste, web upload, stream), use `StringLogFile` or `StreamLogFile` instead of `PathLogFile`. The detective falls back to content signatures when the filename hint is absent. +## Redaction + +Before rendering or exporting log content, pass it through `ProjectZomboidRedactor` to strip PII: + +```php +use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor; + +$redactor = new ProjectZomboidRedactor(); +$safe = $redactor->redact($logContent); +``` + +This scrubs three categories in a fixed pass order: Steam IDs are replaced with a zeroed placeholder, player names with ``, and world coordinates with `0,0,0`. All three passes are on by default; opt out per category with `redactSteamIds(bool)`, `redactPlayerNames(bool)`, or `redactCoordinates(bool)`. + +Documented v1 limitations: in PvP combat lines, only the attacker's name and coords are redacted — the victim's name and coords (appearing after `hit`) are deferred to v2. In admin lines, `teleported X to ` coordinates are not redacted in v1. + ## Architecture ``` diff --git a/docs/superpowers/plans/2026-05-01-redactor.md b/docs/superpowers/plans/2026-05-01-redactor.md new file mode 100644 index 0000000..404a6e5 --- /dev/null +++ b/docs/superpowers/plans/2026-05-01-redactor.md @@ -0,0 +1,211 @@ +# Redactor Utility Implementation Plan + +> Forward-looking. No code is written by this document. +> Branch: `redactor` (off master `aec835e`). Backup tag: `backup/pre-redactor`. 
+> Spec: `docs/superpowers/specs/2026-04-30-redactor-design.md`. + +**Goal:** Land the `RedactorInterface` plus a concrete `ProjectZomboidRedactor` implementation so iblogs (and any other downstream consumer) can scrub Project Zomboid log content of Steam IDs, player names, and world coordinates with a single call. The Redactor is a render-time filter on raw string content; raw stays canonical at the storage layer. + +**Architecture:** Standalone string-in/string-out utility under a new top-level `src/Util/` directory, with per-game implementations under `src/Util//`. Each implementation owns the lexical regex anchors for its game's PII shapes. Three independent toggles per implementation (`redactSteamIds`, `redactPlayerNames`, `redactCoordinates`); defaults all on; "all toggles off" yields verbatim passthrough. + +**Tech stack:** PHP 8.4+, PHPUnit 12, Composer (`indifferentketchup/codex` v0.1.0+). All command invocations wrap in the `composer:latest` Docker image per `CLAUDE.md`. + +--- + +## Design questions — resolved + +### a. Render-time vs ingest-time + +**Decision: render-time. Confirm spec's lean.** + +Raw log content is canonical. Redaction is a view filter that consumers apply when they want to display, export, or analyse a redacted projection. iblogs's storage layer holds the unredacted upload (subject to iblogs's own upload-time `Filter` chain for IPs/access-tokens, which is a different layer of defence); the codex Redactor runs on the way *out* of storage, not on the way in. + +**Why:** the alternative (ingest-time, where storage holds redacted content) is destructive — once stored, the original cannot be recovered for legitimate operator use. Render-time leaves the original in place and lets each render path opt in. iblogs gets a per-session toggle without needing to keep two copies of every paste. 
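The render-time decision above can be sketched as a small render path. This is a minimal illustration, not iblogs code: `SketchRedactorInterface`, `SketchSteamIdRedactor`, `renderPaste()`, the sample log line, and the Steam ID regex are all assumptions made for the example. Only the `redact(string $content): string` signature and the zeroed placeholder come from the plan.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the real interface; only the method signature is from the plan.
interface SketchRedactorInterface
{
    public function redact(string $content): string;
}

// Toy redactor: zeroes SteamID64-shaped numbers and nothing else.
// The pattern is an assumed shape (76561198 + 9 digits, not embedded in a longer digit run).
final class SketchSteamIdRedactor implements SketchRedactorInterface
{
    public function redact(string $content): string
    {
        return preg_replace('/(?<!\d)76561198\d{9}(?!\d)/u', '76561198000000000', $content) ?? $content;
    }
}

// Hypothetical render path: storage keeps the raw content, and each render opts in to redaction.
function renderPaste(string $rawFromStorage, bool $redactForViewer, SketchRedactorInterface $redactor): string
{
    return $redactForViewer ? $redactor->redact($rawFromStorage) : $rawFromStorage;
}

// Made-up log line used only for this sketch.
$raw = '[01-05-26 12:00:00.000] 76561198012345678 "Alice" connected.';
$redactor = new SketchSteamIdRedactor();

echo renderPaste($raw, true, $redactor), PHP_EOL;  // redacted view
echo renderPaste($raw, false, $redactor), PHP_EOL; // operator view, raw content untouched
```

Because the stored string is never mutated, the same paste can serve both views without a second copy, which is the property the render-time choice is meant to preserve.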
+ +**Implication for iblogs schema:** iblogs stores raw content; the redaction toggle in the iblogs UI invokes `ProjectZomboidRedactor::redact()` at render time (server-side) or at fetch time (API consumers' choice). No schema migration required for the redaction feature. + +### b. Redactor as standalone class vs Printer decorator + +**Decision: standalone utility (option iii from the question).** + +The Redactor is a `string → string` function. It does not know about `Insight`, `Printer`, or any other codex type. Three options were considered: + +- **(i) Printer wrapper.** Cleanly composable but ties the Redactor to the Printer abstraction. Doesn't help iblogs's most common case: redacting raw log content for display in a non-Printer rendering path (HTML page rendered server-side, raw download served to API client). +- **(ii) Pre-Printer pass on Insights.** Heavy. Insights are typed objects with structured fields; redacting them means per-Insight code that knows which fields are PII-bearing. Against the YAGNI line for v1. +- **(iii) Standalone string utility.** Simple, generic, works on any string input — raw log content, JSON-serialised analysis output, rendered Printer output piped through. Doesn't know about Insights. + +The spec describes (iii). v1 ships (iii) only. If a Printer-wrapper convenience is later wanted, it can be added as a thin adapter that calls the standalone Redactor on the Printer's output; it doesn't require restructuring the core. + +### c. PII field taxonomy for PZ + +**Decision: regex-based with lexical context anchors. No structured-field detection in v1.** + +PZ-specific PII categories observed in the in-tree fixtures and the `.scratch/pz/Logs/` reference corpus: + +| Field | Detection | Rationale | +|---|---|---| +| Steam ID | regex with `76561198\d{9}` prefix anchor and word-boundary classes | Steam's `76561198` SteamID64 universe prefix lets us cleanly distinguish from other long numbers (timestamps, build numbers). 
| +| Player name | regex with multi-context lexical anchors (after-Steam-ID-quoted, ChatMessage author, `Combat:`/`Safety:` subsystem) | Names are arbitrary strings — not detectable without context. The contexts are well-defined by the parser-side pattern classes. | +| World coordinate triple | regex with bracket / paren / `at`-clause anchors | Generic `\d+,\d+,\d+` would over-redact server metadata (`f:0, t:NNNN, st:48,648,157,584`). Lexical context disambiguates. | + +**Not redacted in v1:** + +- **IP addresses.** PZ logs do not normally include IPs in any of the eleven file types observed. iblogs's upload-side `IPv4Filter` / `IPv6Filter` (ported from upstream mclogs) covers the rare case where a mod might log them. +- **Server-side usernames distinct from player names.** PZ uses Steam display name as the player identity; there's no separate auth username layer. Mclogs's `UsernameFilter` is Minecraft-specific and isn't mirrored here. +- **BurdJournals scientific-notation Steam IDs** (`7.65611…E16`). Spec open-question 2 explicitly defers this to v2; the `[BurdJournals]` tag already disambiguates them as mod-internal. + +**Hybrid (regex + structured-field) deferred.** A v2 enhancement could redact specific Insight fields at JSON-serialisation time (e.g. `ConnectionFailureProblem::$steamId` → placeholder when serialised). Useful only if iblogs starts shipping the structured analysis JSON to redacted views — a real but currently hypothetical need. + +### d. Replacement strategy + +**Decision: per-category placeholder strings matching the synthetic-fixture conventions. 
Configurable replacement style is YAGNI for v1.** + +Per the spec: + +| Category | Replacement | +|---|---| +| Steam ID | `76561198000000000` (zeroed placeholder, still a syntactically valid Steam ID) | +| Player name | `` | +| Coordinates | `0,0,0` (with shape preserved per anchor — bracketed, parenthesised, or `at` clause) | + +Why these specifically and not `[REDACTED]` / `[STEAM_ID]` / hashed: + +- The placeholders **match the existing synthetic test fixtures** (`76561198000000001`–`76561198000000004` collapse to `76561198000000000`; player names `Player1`/`Player2`/`AdminUser` collapse to ``). Tests can verify "redacted output looks like a synthetic fixture." +- Shape preservation means downstream consumers can still parse the redacted output with the same Pattern classes — a redacted log is still a syntactically valid PZ log, it just contains no identities. +- Type-tagged replacements (`[STEAM_ID]`) break shape preservation: a Pattern looking for `\d{17}` would fail. Worth offering as a config option if a consumer specifically wants type-visibility, but v1 ships placeholder-only. +- Hashing breaks shape preservation similarly and adds determinism / collision concerns. + +If a consumer later needs `[STEAM_ID]`-style output, a `setReplacementStyle('typed' | 'placeholder' | 'redacted')` setter can be added without breaking the v1 API. v1 ships placeholder-only. + +### e. Game-agnostic vs PZ-specific layout + +**Decision: thin generic interface in `src/Util/` plus PZ-specific implementation in `src/Util/ProjectZomboid/`.** + +``` +src/Util/ +├── RedactorInterface.php (1 method: redact(string): string) +└── ProjectZomboid/ + └── ProjectZomboidRedactor.php (toggles + regex passes) +``` + +**YAGNI tradeoff stated:** the interface has one method and currently one implementation. Strictly, YAGNI says collapse to just `ProjectZomboidRedactor` and skip the interface. 
The interface earns its keep because **iblogs's call sites will type-hint against `RedactorInterface`**, not the concrete class — that's the architectural payoff. Consumer code stays loosely coupled; when Minecraft or another game ships a redactor, iblogs swaps the implementation by changing one DI binding rather than touching call sites. + +The cost is two files instead of one. Acceptable given the dependency-inversion benefit. The directory layout (`src/Util//`) mirrors the components-outer-with-game-suffix convention used everywhere else in the tree (Analyser, Analysis, Detective, Log, Parser, Pattern). + +**Note on the new `src/Util/` directory.** Codex currently has no `src/Util/` (the Phase A scaffolding established Analyser / Analysis / Detective / Log / Parser / Pattern / Printer; Phase B.3 added Analyser/ProjectZomboid content but not Util). The Redactor introduces this new top-level. This is an additive change — no existing code is modified. + +### f. Test strategy + +**Decision: hybrid — small dedicated synthetic fixtures under `test/src/Util/Redactor/` for direct unit tests, plus an integration test that runs the Redactor over an existing PZ fixture and asserts idempotence.** + +**Dedicated unit fixtures** (small string constants in test classes, not separate files): per spec test plan #1–#5. Each test class owns its input/expected pairs. Keeps unit tests self-contained and fast. + +**Integration test** that re-uses an existing PZ fixture (e.g. `test/src/Games/ProjectZomboid/fixtures/admin-minimal.txt`). Two assertions: + +- The Redactor's output is a syntactically valid log (still parses cleanly through the corresponding `ProjectZomboidAdminLog`). +- Idempotence: `redact(redact($x)) === redact($x)`. Existing fixture content is already placeholder-shaped, so the redactor should leave it byte-for-byte identical OR apply the canonical normalisation once and then no-op. + +**False-positive avoidance.** The synthetic fixtures use `76561198000000001` etc. 
as placeholder Steam IDs. The Redactor's Steam ID regex matches the `76561198\d{9}` prefix and replaces with `76561198000000000` — so `76561198000000001` becomes `76561198000000000` (a normalisation, not a corruption). Tests verify this normalisation is correct and that legitimate-non-PII data (e.g. server metadata triples like `f:0, t:1776297642406, st:48,648,157,584`) is **not** touched. + +--- + +## Tasks + +Tasks are intended for the `redactor` branch. Each is a single logical commit. Test-running between commits uses the standard Docker invocation. Work proceeds only after Step 0 sign-off (this plan reviewed). + +### Task 0 — Plan doc commit + +- [ ] **Step 0.1.** Already done out-of-band: `git checkout -b redactor` off master `aec835e`; `git tag backup/pre-redactor` at branch tip; this plan written. +- [ ] **Step 0.2.** Commit this plan: `docs: add Redactor implementation plan` on branch `redactor`. Push branch to origin for review. + +### Task 1 — Scaffold (interface + skeleton class with toggles) + +- [ ] **Step 1.1.** Create `src/Util/RedactorInterface.php`. Single method: `public function redact(string $content): string;` PHPDoc describing the contract: stateless from the caller's perspective; configuration happens via implementation-specific setters before `redact()`. +- [ ] **Step 1.2.** Create `src/Util/ProjectZomboid/ProjectZomboidRedactor.php` that implements the interface. Class structure: three private bool properties (`$redactSteamIds`, `$redactPlayerNames`, `$redactCoordinates`) all defaulting to `true`; three fluent setters (`redactSteamIds(bool): static`, etc.); `redact(string): string` body that returns input unchanged when all toggles are off (for now — regex passes added in subsequent tasks). +- [ ] **Step 1.3.** Run `composer test` — expect 195 tests still green (no Redactor tests yet). +- [ ] **Step 1.4.** Commit: `feat: scaffold RedactorInterface and ProjectZomboidRedactor with toggles`. 
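The scaffold described in Task 1 might look roughly like the sketch below. The class and interface names (`RedactorSketchInterface`, `ScaffoldRedactorSketch`) are deliberately different from the real ones, and `enabledPasses()` is a hypothetical helper added here only so the toggle state can be inspected. The three default-on booleans, the fluent setters returning `static`, and the passthrough `redact()` body follow the steps above.

```php
<?php
declare(strict_types=1);

// Hypothetical name; mirrors the single-method contract from Step 1.1.
interface RedactorSketchInterface
{
    /** Return $content with the enabled PII categories replaced. */
    public function redact(string $content): string;
}

// Skeleton only: toggles exist, regex passes do not (they arrive in later tasks).
class ScaffoldRedactorSketch implements RedactorSketchInterface
{
    private bool $redactSteamIds = true;
    private bool $redactPlayerNames = true;
    private bool $redactCoordinates = true;

    public function redactSteamIds(bool $on): static
    {
        $this->redactSteamIds = $on;
        return $this;
    }

    public function redactPlayerNames(bool $on): static
    {
        $this->redactPlayerNames = $on;
        return $this;
    }

    public function redactCoordinates(bool $on): static
    {
        $this->redactCoordinates = $on;
        return $this;
    }

    public function redact(string $content): string
    {
        // No passes are implemented at the scaffold stage, so every
        // toggle combination (including all-off) returns the input unchanged.
        return $content;
    }

    /**
     * Sketch-only helper (not part of the plan): names of the enabled passes.
     *
     * @return list<string>
     */
    public function enabledPasses(): array
    {
        return array_keys(array_filter([
            'steamIds' => $this->redactSteamIds,
            'playerNames' => $this->redactPlayerNames,
            'coordinates' => $this->redactCoordinates,
        ]));
    }
}

$redactor = (new ScaffoldRedactorSketch())
    ->redactPlayerNames(false)
    ->redactCoordinates(false);

print_r($redactor->enabledPasses());
```

Returning `static` from each setter is what allows the chained configuration in the usage lines; a consumer that only type-hints the interface never sees the setters and just calls `redact()`.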
+ +### Task 2 — Steam ID redaction pass + +- [ ] **Step 2.1.** Add `STEAM_ID_REGEX` and `STEAM_ID_REPLACEMENT` constants on `ProjectZomboidRedactor`. Regex uses the `76561198\d{9}` prefix anchor with word-boundary classes (per spec). The `/u` flag is added to all regexes for Unicode safety even though Steam IDs themselves are ASCII. +- [ ] **Step 2.2.** Implement the Steam ID branch of `redact()`: when `$redactSteamIds` is true, run `preg_replace` against the input. +- [ ] **Step 2.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorSteamIdTest.php`. Tests: redaction of various distinct synthetic Steam IDs collapses all to `76561198000000000`; non-Steam-ID 17-digit numbers (e.g. timestamps) are not touched; toggle-off leaves Steam IDs intact. +- [ ] **Step 2.4.** Run `composer test`. Expect new tests pass; old 195 unaffected. +- [ ] **Step 2.5.** Commit: `feat: add Steam ID redaction pass`. + +### Task 3 — Player name redaction pass + +- [ ] **Step 3.1.** Add three regex constants on `ProjectZomboidRedactor` for the three player-name lexical contexts: `PLAYER_AFTER_STEAMID_REGEX`, `PLAYER_IN_CHATMESSAGE_REGEX`, `PLAYER_IN_PVP_SUBSYSTEM_REGEX`. Replacement is `` for all. **Order constraint:** the after-Steam-ID context anchors on the post-redaction Steam ID `76561198000000000`, so the player-name pass must run *after* the Steam ID pass. Document this in a class-level docblock. +- [ ] **Step 3.2.** Implement the player-name branch of `redact()`: three sequential `preg_replace` calls when `$redactPlayerNames` is true. +- [ ] **Step 3.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorPlayerNameTest.php`. Tests: each of the three contexts redacts correctly when paired with its anchor; a bare quoted string (e.g. `"foo"` not preceded by a Steam ID) is **not** touched; toggle-off leaves names intact; the after-Steam-ID context works correctly when the Steam ID has already been redacted to the zeroed placeholder. +- [ ] **Step 3.4.** Run `composer test`. 
Expect new tests pass. +- [ ] **Step 3.5.** Commit: `feat: add player name redaction pass`. + +### Task 4 — Coordinates redaction pass + +- [ ] **Step 4.1.** Add three regex constants on `ProjectZomboidRedactor` for the three coordinate contexts: `COORDS_AT_CLAUSE_REGEX`, `COORDS_BRACKETED_REGEX`, `COORDS_PARENTHESISED_REGEX`. Replacements preserve shape (`0,0,0` inside whatever bracket/paren wrapper). +- [ ] **Step 4.2.** Implement the coords branch of `redact()`: three sequential `preg_replace_callback` (or `preg_replace`) calls when `$redactCoordinates` is true. +- [ ] **Step 4.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorCoordinatesTest.php`. Tests: each of the three contexts redacts correctly; **negative test** — server metadata `f:0, t:1776297642406, st:48,648,157,584` is not touched; basement Z-coordinates (`-1`) are handled; toggle-off leaves coords intact. +- [ ] **Step 4.4.** Run `composer test`. Expect new tests pass. +- [ ] **Step 4.5.** Commit: `feat: add coordinates redaction pass`. + +### Task 5 — Combined / toggle / idempotence tests + +- [ ] **Step 5.1.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorCombinedTest.php`. Tests cover: combined input with all three PII categories present produces fully-scrubbed output when all toggles on; each toggle off in isolation produces partial scrubbing matching the toggle's category; all toggles off returns input byte-for-byte identical (`===` equality). +- [ ] **Step 5.2.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorIdempotenceTest.php`. Tests: `redact(redact($x)) === redact($x)` for several input shapes including all three PII categories. +- [ ] **Step 5.3.** Run `composer test`. Expect new tests pass. +- [ ] **Step 5.4.** Commit: `test: add Redactor combined and idempotence coverage`. + +### Task 6 — Existing-fixture integration tests + +- [ ] **Step 6.1.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorIntegrationTest.php`. 
Loads each existing PZ fixture (`admin-minimal.txt`, `chat-minimal.txt`, etc.) via `PathLogFile`, calls `redact()` on the content, and asserts: (a) the redacted content still parses cleanly through the corresponding `ProjectZomboidLog`'s parser without throwing; (b) the synthetic Steam IDs `76561198000000001`–`76561198000000004` all collapse to `76561198000000000`; (c) the synthetic player names (`Player1`, `Player2`, `AdminUser`, `PlayerSuspect`) all collapse to ``. +- [ ] **Step 6.2.** Run `composer test`. Expect all integration assertions pass without modifying any existing test or fixture. +- [ ] **Step 6.3.** Commit: `test: add Redactor integration coverage against existing PZ fixtures`. + +### Task 7 — Documentation updates + +- [ ] **Step 7.1.** Update `CLAUDE.md`: add a one-line `src/Util/` mention to the framework architecture section; one-line note in the ProjectZomboid specifics section pointing at `ProjectZomboidRedactor` for downstream PII scrubbing; update the "Scaffolded games" line to mention that `ProjectZomboid` now also has a Redactor implementation under `src/Util/ProjectZomboid/`. +- [ ] **Step 7.2.** Update `README.md`: add a short usage block showing `(new ProjectZomboidRedactor())->redact($logContent)` as a render-time scrub option, alongside the existing worked example. +- [ ] **Step 7.3.** Update `CHANGELOG.md`: move Redactor out of the **Deferred** section under `[0.1.0]`, OR add a new `[Unreleased]` section if the v0.1.0 line should remain accurate as-shipped. Decision: **add `[Unreleased]`** — v0.1.0 was tagged without the Redactor and the changelog should reflect the historical truth. +- [ ] **Step 7.4.** Run `composer test` once more for safety; confirm 195+(redactor tests) green. +- [ ] **Step 7.5.** Commit: `docs: document Redactor utility in CLAUDE.md, README, CHANGELOG`. + +### Task 8 — Final verification + +- [ ] **Step 8.1.** Run `composer test`. All tests green. 
+- [ ] **Step 8.2.** Re-run `vendor/bin/phpunit --display-deprecations --display-warnings --display-notices --display-errors`. Expect zero output beyond the standard pass summary. +- [ ] **Step 8.3.** Sanity-check the branch with `git log --oneline master..redactor`. Should be the plan-doc commit plus 7 implementation commits = 8 commits total. +- [ ] **Step 8.4.** Push final state: `git push origin redactor`. **Do NOT merge to master.** User reviews diff and approves merge separately. + +--- + +## Open questions / spec gaps + +The spec is generally tight. Items worth flagging while implementing: + +1. **`/u` flag for Unicode safety.** Spec doesn't specify regex flags. PZ player names can contain non-ASCII characters (Steam display names are Unicode-permissive). The implementation will use `/u` on all regexes to avoid mangling multi-byte sequences. Documenting in the class docblock. +2. **Replacement order.** Spec says "Redaction order matters: SIDs first, names second" because the after-Steam-ID player-name regex anchors on the redacted Steam ID. The implementation will enforce this order in `redact()` (Steam ID pass first, then names, then coords). The class docblock will document the ordering invariant. +3. **HTML / JSON-encoded input.** Spec assumes plain log text. If a consumer feeds HTML-escaped content (e.g. `"` instead of `"`), the player-name regex won't match. Document as a v2 concern: callers feed plain text in, render afterwards. v1 does not implement HTML/JSON-aware mode. +4. **Future PII categories.** v1 ships exactly the three toggles per spec. New categories (emails, IPs from mods, etc.) extend the toggle set in a future release; v1 does not pre-build extension points beyond what the interface already provides. +5. **`src/Util/` is a new top-level directory** in this codebase. The Redactor is the first occupant. Future utilities (e.g. a tokenizing variant per spec open-question 1) would also live here. 
No existing-code modification is needed; the new directory is purely additive. +6. **The empty `src/Printer//.gitkeep` situation.** Phase A scaffolding chose not to create `Printer//` directories at all (only Analyser/Detective/Log/Parser/Pattern got per-game subdirs). The Redactor's home in `src/Util//` mirrors that — `src/Util/` is created with PZ as its first occupant; no stub `Hytale/`/`Minecraft/`/`SevenDaysToDie/` placeholders are scaffolded. When other games' redactors land, they create their own subdirectories at that point. + +No spec contradictions found. No existing-code modifications required (additive-only design). + +--- + +## Branch / commit invariants + +- All commits land on the `redactor` branch. +- Master is not touched until the user explicitly approves merge after reviewing the diff. +- Conventional commit prefixes: `docs:`, `feat:`, `test:`, `refactor:`. (No `fix:` expected — this is greenfield work.) +- One logical concept per commit. Tasks 1, 2, 3, 4 each ship implementation + per-pass tests in one commit; Task 5 / 6 / 7 are pure-test or pure-docs commits. +- Backup tag `backup/pre-redactor` at `aec835e` lets us discard the branch and recover if the implementation goes sideways. +- Branch can be pushed to origin freely for visibility / review checkpoints. + +## Pointers + +- Spec: `docs/superpowers/specs/2026-04-30-redactor-design.md`. +- Synthetic fixtures the integration test will reuse: `test/src/Games/ProjectZomboid/fixtures/*.txt`. +- Existing per-game layout precedent: `src/Analyser/ProjectZomboid/`, `src/Pattern/ProjectZomboid/`, `src/Log/ProjectZomboid/`. +- Workflow conventions and pitfalls: `CLAUDE.md`. 
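The ordering invariant from open question 2 and the idempotence property from Task 5 can be checked with a small standalone sketch. Everything here is illustrative: the two patterns are assumed shapes rather than the shipped constants, `<player>` is a hypothetical name token, and the sample line is invented.

```php
<?php
declare(strict_types=1);

// Assumed patterns for illustration only; the real constants live on ProjectZomboidRedactor.
const SKETCH_STEAM_ID = '/(?<!\d)76561198\d{9}(?!\d)/u';
const SKETCH_STEAM_ID_ZEROED = '76561198000000000';
// Anchors on the zeroed placeholder, so it cannot match a raw Steam ID.
const SKETCH_NAME_AFTER_ZEROED_ID = '/(?<=76561198000000000) "[^"]+"/u';
// Hypothetical replacement token.
const SKETCH_NAME_TOKEN = '<player>';

function steamIdPass(string $s): string
{
    return preg_replace(SKETCH_STEAM_ID, SKETCH_STEAM_ID_ZEROED, $s) ?? $s;
}

function namePass(string $s): string
{
    return preg_replace(SKETCH_NAME_AFTER_ZEROED_ID, ' "' . SKETCH_NAME_TOKEN . '"', $s) ?? $s;
}

// Made-up line in a Steam-ID-then-quoted-name shape.
$line = '76561198012345678 "Alice" teleported.';

// Correct order: Steam ID pass first, then the name pass finds its anchor.
$correct = namePass(steamIdPass($line));

// Swapped order: the name pass sees a raw Steam ID, finds no anchor, and the name survives.
$swapped = steamIdPass(namePass($line));

echo $correct, PHP_EOL; // 76561198000000000 "<player>" teleported.
echo $swapped, PHP_EOL; // 76561198000000000 "Alice" teleported.

// Idempotence: a second run over already-redacted output changes nothing.
var_dump(namePass(steamIdPass($correct)) === $correct); // bool(true)
```

The swapped run shows why disabling or reordering the Steam ID pass silently leaks names in this context: the output still looks redacted at a glance because the ID is zeroed.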
diff --git a/docs/superpowers/specs/2026-04-30-redactor-design.md b/docs/superpowers/specs/2026-04-30-redactor-design.md index 574ef85..e2e7a34 100644 --- a/docs/superpowers/specs/2026-04-30-redactor-design.md +++ b/docs/superpowers/specs/2026-04-30-redactor-design.md @@ -1,7 +1,7 @@ # Codex Redactor utility — design spec > Retroactive: written 2026-05-01. -> **Status: deferred — not implemented.** This is a forward-looking design captured here for backfill symmetry and to inform iblogs's upload-time PII handling. +> **Status: implemented on the `redactor` branch (2026-05-01).** Plan: `docs/superpowers/plans/2026-05-01-redactor.md`. Arrival commit set documented in `CHANGELOG.md` `[Unreleased]`. The "Status: deferred" framing below is preserved for historical context; treat this file as the as-built design contract. ## Summary diff --git a/src/Util/ProjectZomboid/ProjectZomboidRedactor.php b/src/Util/ProjectZomboid/ProjectZomboidRedactor.php new file mode 100644 index 0000000..03b0d54 --- /dev/null +++ b/src/Util/ProjectZomboid/ProjectZomboidRedactor.php @@ -0,0 +1,123 @@ + name -> coordinates is mandatory. + * 3. Coordinates pass — replaces world coordinate triplets with a placeholder + * token. + * + * All regex passes use the /u flag for Unicode safety. + * + * Replacements are not reversible; do not apply to content that must later be + * restored to its original form. + */ +class ProjectZomboidRedactor implements RedactorInterface +{ + /** Regex matching a 17-digit SteamID64 anchored on the 76561198 universe prefix, with lookaround boundaries that reject embedded occurrences. */ + public const string STEAM_ID_REGEX = '/(?'; + + /** Matches a double-quoted player name that immediately follows the redacted Steam ID placeholder (cmd.txt / admin.txt shape); relies on the Steam ID pass having run first. 
*/ + public const string PLAYER_AFTER_STEAMID_REGEX = '/(?<=76561198000000000) "(?[^"]+)"/u'; + + /** Matches the author value inside a ChatMessage{...} envelope, using a fixed-length lookbehind on ", author='" and a lookahead on the closing "'" so only the bare name is replaced. */ + public const string PLAYER_IN_CHATMESSAGE_REGEX = '/(?<=, author=\')(?[^\']+)(?=\')/u'; + + /** Matches the first double-quoted player name following a Combat: or Safety: subsystem token (pvp.txt shape); does NOT redact the second name after "hit" — deferred to v2. */ + public const string PLAYER_IN_PVP_SUBSYSTEM_REGEX = '/(?<=(?:Combat|Safety): )"(?[^"]+)"/u'; + + /** Zeroed-out coordinate triple used as the inner replacement; bracket/paren/`at` wrapper is preserved by the regex lookaround anchors. */ + public const string COORDS_REPLACEMENT = '0,0,0'; + + /** Matches integer or float coordinate triplets that immediately follow the literal ` at ` token (map.txt / item.txt shape); the trailing dot is preserved via lookahead. */ + public const string COORDS_AT_CLAUSE_REGEX = '/(?<= at )(?[\d.]+),(?[\d.]+),(?-?[\d.]+)(?=\.)/u'; + + /** Matches integer coordinate triplets enclosed in square brackets (ClientActionLog.txt / PerkLog.txt / cmd.txt @-context shape); the surrounding brackets are preserved via lookaround. */ + public const string COORDS_BRACKETED_REGEX = '/(?<=\[)(?\d+),(?\d+),(?-?\d+)(?=\])/u'; + + /** Matches integer coordinate triplets enclosed in round parentheses, anchored on a trailing PvP verb to disambiguate from server-metadata triples (pvp.txt Combat:/Safety: shape); only the attacker/first-coord set is redacted per line — the victim coords lack the trailing keyword and are deferred to v2. 
*/ + public const string COORDS_PARENTHESISED_REGEX = '/(?<=\()(?\d+),(?\d+),(?-?\d+)(?=\) (?:hit|restore|store|true|false))/u'; + + private bool $redactSteamIds = true; + private bool $redactPlayerNames = true; + private bool $redactCoordinates = true; + + /** + * Enable or disable the Steam ID redaction pass. + * + * @param bool $on Pass true to enable, false to disable. + * @return static + */ + public function redactSteamIds(bool $on): static + { + $this->redactSteamIds = $on; + return $this; + } + + /** + * Enable or disable the player-name redaction pass. + * + * @param bool $on Pass true to enable, false to disable. + * @return static + */ + public function redactPlayerNames(bool $on): static + { + $this->redactPlayerNames = $on; + return $this; + } + + /** + * Enable or disable the coordinates redaction pass. + * + * @param bool $on Pass true to enable, false to disable. + * @return static + */ + public function redactCoordinates(bool $on): static + { + $this->redactCoordinates = $on; + return $this; + } + + /** + * Redact PII from the given Project Zomboid log content. + * + * Passes are applied in the mandatory order: Steam ID -> player name -> + * coordinates. See class docblock for rationale. + * + * @param string $content Raw log content that may contain PII. + * @return string Content with enabled PII categories replaced by tokens. + */ + public function redact(string $content): string + { + if ($this->redactSteamIds) { + $content = preg_replace(self::STEAM_ID_REGEX, self::STEAM_ID_REPLACEMENT, $content); + } + if ($this->redactPlayerNames) { + $content = preg_replace(self::PLAYER_AFTER_STEAMID_REGEX, ' "' . self::PLAYER_NAME_REPLACEMENT . '"', $content); + $content = preg_replace(self::PLAYER_IN_CHATMESSAGE_REGEX, self::PLAYER_NAME_REPLACEMENT, $content); + $content = preg_replace(self::PLAYER_IN_PVP_SUBSYSTEM_REGEX, '"' . self::PLAYER_NAME_REPLACEMENT . 
'"', $content); + } + if ($this->redactCoordinates) { + $content = preg_replace(self::COORDS_AT_CLAUSE_REGEX, self::COORDS_REPLACEMENT, $content); + $content = preg_replace(self::COORDS_BRACKETED_REGEX, self::COORDS_REPLACEMENT, $content); + $content = preg_replace(self::COORDS_PARENTHESISED_REGEX, self::COORDS_REPLACEMENT, $content); + } + return $content; + } +} diff --git a/src/Util/RedactorInterface.php b/src/Util/RedactorInterface.php new file mode 100644 index 0000000..9e8e8fc --- /dev/null +++ b/src/Util/RedactorInterface.php @@ -0,0 +1,20 @@ +" added Base.Aerosolbomb at 0,0,0.', + '[16-04-26 12:00:01.000] 76561198000000000 "" added IsoObject (fence_01) at 0,0,0.', + "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='', text='hello'}.", + '[16-04-26 17:14:35.128][INFO] Combat: "" (0,0,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.', + '[16-04-26 16:17:49.731][LOG] Safety: "" (0,0,0) restore true.', + '[16-04-26 12:00:02.000] [76561198000000000][ISEnterVehicle][Player2][0,0,0][Van_LectroMax].', + ]); + + $output = (new ProjectZomboidRedactor())->redact($input); + + $this->assertSame($expected, $output, 'With all three toggles on, every Steam ID, player name context, and coord shape must be replaced.'); + } + + public function testSteamIdToggleOffLeavesSteamIdsIntact(): void + { + // All three PII categories present; Steam ID toggle is disabled. + // + // Important nuance: PLAYER_AFTER_STEAMID_REGEX anchors on the redacted placeholder + // 76561198000000000. With redactSteamIds(false) the raw Steam ID survives, so the + // regex does NOT fire for lines in the "after-Steam-ID" shape — those names survive + // too. Names anchored by other contexts (ChatMessage author, Combat:/Safety:) are + // still redacted because those regexes don't depend on the Steam ID pass. 
+ $input = implode("\n", [ + // after-Steam-ID shape: name will NOT be redacted because the Steam ID is raw + '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.', + // ChatMessage author: still redacted (anchor is independent of Steam ID pass) + "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hello'}.", + // Combat: name + attacker coords + '[16-04-26 17:14:35.128][INFO] Combat: "Player2" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Pipe Bomb" damage=1.0.', + ]); + + $expected = implode("\n", [ + // Steam ID intact; "Player1" NOT redacted (anchor regex didn't fire); at-clause coords redacted + '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 0,0,0.', + // ChatMessage author redacted; this line carries no coords + "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='', text='hello'}.", + // Combat: name + attacker coords both redacted; victim name and coords survive (v1 limitation) + '[16-04-26 17:14:35.128][INFO] Combat: "" (0,0,0) hit "Player1" (1006,2005,0) weapon="Pipe Bomb" damage=1.0.', + ]); + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redact($input); + + $this->assertSame( + $expected, + $output, + 'With Steam ID toggle off: raw Steam IDs survive; PLAYER_AFTER_STEAMID_REGEX does not fire (no placeholder to anchor on) so those names also survive; ChatMessage and Combat:/Safety: names are still redacted; coords are still redacted.', + ); + } + + public function testPlayerNameToggleOffLeavesNamesIntact(): void + { + // Steam IDs and coords redact; player names survive verbatim.
+ $input = implode("\n", [ + '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.', + "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player2', text='bye'}.", + '[16-04-26 16:17:49.731][LOG] Safety: "AdminUser" (1050,2050,0) restore true.', + ]); + + $expected = implode("\n", [ + '[16-04-26 12:00:00.000] 76561198000000000 "Player1" added Base.Aerosolbomb at 0,0,0.', + "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player2', text='bye'}.", + '[16-04-26 16:17:49.731][LOG] Safety: "AdminUser" (0,0,0) restore true.', + ]); + + $output = (new ProjectZomboidRedactor()) + ->redactPlayerNames(false) + ->redact($input); + + $this->assertSame($expected, $output, 'With player-name toggle off, all player names must survive; Steam IDs and coords must still be redacted.'); + } + + public function testCoordinatesToggleOffLeavesCoordsIntact(): void + { + // Steam IDs and player names redact; coordinates survive verbatim. 
+ $input = implode("\n", [ + '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.', + '[16-04-26 12:00:01.000] [76561198222222222][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].', + '[16-04-26 17:14:35.128][INFO] Combat: "AdminUser" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Baseball Bat" damage=0.5.', + ]); + + $expected = implode("\n", [ + '[16-04-26 12:00:00.000] 76561198000000000 "" added Base.Aerosolbomb at 1000,2000,0.', + '[16-04-26 12:00:01.000] [76561198000000000][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].', + '[16-04-26 17:14:35.128][INFO] Combat: "" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Baseball Bat" damage=0.5.', + ]); + + $output = (new ProjectZomboidRedactor()) + ->redactCoordinates(false) + ->redact($input); + + $this->assertSame($expected, $output, 'With coordinates toggle off, all coord triplets must survive; Steam IDs and player names must still be redacted.'); + } + + public function testAllTogglesOffReturnsInputByteForByte(): void + { + // Disabling every toggle must produce an output identical to the input — + // the "passthrough" contract: opt-out means truly nothing happens. 
+ $input = implode("\n", [ + '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.', + "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player2', text='hello'}.", + '[16-04-26 17:14:35.128][INFO] Combat: "AdminUser" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.', + '[16-04-26 12:00:01.000] [76561198333333333][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].', + ]); + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redactPlayerNames(false) + ->redactCoordinates(false) + ->redact($input); + + $this->assertSame($input, $output, 'With all three toggles disabled, the output must be byte-for-byte identical to the input.'); + } +} diff --git a/test/tests/Util/Redactor/ProjectZomboidRedactorCoordinatesTest.php b/test/tests/Util/Redactor/ProjectZomboidRedactorCoordinatesTest.php new file mode 100644 index 0000000..6b241a3 --- /dev/null +++ b/test/tests/Util/Redactor/ProjectZomboidRedactorCoordinatesTest.php @@ -0,0 +1,124 @@ +redactSteamIds(false) + ->redactPlayerNames(false) + ->redact($input); + + $this->assertSame($expected, $output, 'Integer coords following " at " must be replaced; leading "at " and trailing "." must be preserved.'); + } + + public function testRedactsAtClauseFloatCoords(): void + { + // map.txt shape: IsoObject form with float coords (x.x,y.y,z.z). 
+ $input = '[16-04-26 12:00:01.000] 76561198000000001 "Player1" added IsoObject (fencing_damaged_01_124) at 1010.0,2010.0,0.0.'; + $expected = '[16-04-26 12:00:01.000] 76561198000000001 "Player1" added IsoObject (fencing_damaged_01_124) at 0,0,0.'; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redactPlayerNames(false) + ->redact($input); + + $this->assertSame($expected, $output, 'Float coords following " at " must be replaced; the IsoObject parenthesised form must be unaffected.'); + } + + public function testRedactsBracketedCoords(): void + { + // ClientActionLog.txt shape: strict 5-field bracketed structure. + // The Steam ID bracket and action/player/param brackets must survive. + $input = '[16-04-26 12:00:02.000] [76561198000000001][ISEnterVehicle][Player1][1000,2000,0][Van_LectroMax].'; + $expected = '[16-04-26 12:00:02.000] [76561198000000001][ISEnterVehicle][Player1][0,0,0][Van_LectroMax].'; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redactPlayerNames(false) + ->redact($input); + + $this->assertSame($expected, $output, 'Coord bracket must become [0,0,0]; Steam ID, action, player name, and param brackets must be unaffected.'); + } + + public function testRedactsBracketedNegativeZ(): void + { + // Basement Z coordinates are negative; the regex must handle the leading minus. + $input = '[16-04-26 12:00:03.000] [76561198000000001][ISEnterVehicle][Player1][1020,2020,-1][Van_LectroMax].'; + $expected = '[16-04-26 12:00:03.000] [76561198000000001][ISEnterVehicle][Player1][0,0,0][Van_LectroMax].'; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redactPlayerNames(false) + ->redact($input); + + $this->assertSame($expected, $output, 'Negative Z (basement level) inside square brackets must be replaced.'); + } + + public function testRedactsParenthesisedCoordsBeforeHit(): void + { + // pvp.txt Combat: shape. The attacker coords are followed by ") hit" and ARE + // redacted. 
The victim coords are followed by ") weapon=" and are NOT redacted + // in v1 — the trailing-keyword anchor is intentionally absent for that position. + $input = '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.'; + $expected = '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (0,0,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.'; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redactPlayerNames(false) + ->redact($input); + + // Attacker coords (before "hit") are redacted; victim coords (before "weapon=") are NOT — deferred to v2. + $this->assertSame($expected, $output, 'Attacker coords before "hit" must be replaced; victim coords without a trailing keyword must survive.'); + } + + public function testRedactsParenthesisedCoordsBeforeSafetyVerb(): void + { + // pvp.txt Safety: shape; coords followed by ") restore true". + $input = '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.'; + $expected = '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (0,0,0) restore true.'; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redactPlayerNames(false) + ->redact($input); + + $this->assertSame($expected, $output, 'Coords followed by ") restore" must be replaced.'); + } + + public function testServerMetadataTriplesAreNotRedacted(): void + { + // DebugLog-server.txt entries contain server-state metadata that superficially + // resembles coordinates but is not: "st:48,648,157,584" is a 4-component token, + // "t:1776297642406" is a millisecond timestamp. Neither pattern lives inside + // brackets, parentheses followed by a PvP verb, or after " at " — so none of + // the three coordinate regexes should fire. 
+ $input = '[16-04-26 00:01:19.080] ERROR: General f:0, t:1776297642406, st:48,648,157,584> Server starting up.'; + + $output = (new ProjectZomboidRedactor())->redact($input); + + $this->assertSame($input, $output, 'Server metadata triples (st:) and millisecond timestamps (t:) must pass through unchanged.'); + } + + public function testToggleOffLeavesCoordsIntact(): void + { + $input = '[16-04-26 12:00:04.000] 76561198000000001 "Player1" added Base.Aerosolbomb at 1000,2000,0.'; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redactPlayerNames(false) + ->redactCoordinates(false) + ->redact($input); + + $this->assertSame($input, $output, 'With the coordinates toggle disabled the original input must be returned unchanged.'); + } +} diff --git a/test/tests/Util/Redactor/ProjectZomboidRedactorIdempotenceTest.php b/test/tests/Util/Redactor/ProjectZomboidRedactorIdempotenceTest.php new file mode 100644 index 0000000..b84e32a --- /dev/null +++ b/test/tests/Util/Redactor/ProjectZomboidRedactorIdempotenceTest.php @@ -0,0 +1,99 @@ + do not accidentally re-match and produce a doubly- + * nested result like "" → something else. 
+ */ +class ProjectZomboidRedactorIdempotenceTest extends TestCase +{ + public function testIdempotenceSteamIdOnly(): void + { + $input = implode("\n", [ + 'Players: 76561198111111111, 76561198222222222, 76561198333333333 connected.', + '[16-04-26 12:00:00.000] [76561198111111111][ISEnterVehicle][Player1][1000,2000,0][Van_LectroMax].', + ]); + + $redactor = new ProjectZomboidRedactor(); + $redacted = $redactor->redact($input); + $redactedAgain = $redactor->redact($redacted); + + $this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to Steam-ID-only input must produce the same result as applying it once.'); + } + + public function testIdempotencePlayerNamesOnly(): void + { + // Input already has the Steam ID placeholder in place (as the Steam ID pass + // would have written it), so PLAYER_AFTER_STEAMID_REGEX can fire. After the + // first pass the quoted name holds the player-name placeholder. On the second + // pass the quoted placeholder can still satisfy [^"]+ after the Steam ID + // anchor, so the regex may match again; it is then replaced with the same + // placeholder, so the output stays identical. The ChatMessage and Safety: + // contexts behave the same way.
+ $input = implode("\n", [ + '76561198000000000 "Player1" ISLogSystem.writeLog @ 1000,2000,0.', + "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hi'}.", + '[16-04-26 16:17:49.731][LOG] Safety: "Player2" (1000,2000,0) restore true.', + ]); + + $redactor = (new ProjectZomboidRedactor())->redactSteamIds(false)->redactCoordinates(false); + $redacted = $redactor->redact($input); + $redactedAgain = $redactor->redact($redacted); + + $this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to player-name-only input must produce the same result as applying it once.'); + } + + public function testIdempotenceCoordsOnly(): void + { + $input = implode("\n", [ + '[16-04-26 12:00:00.000] 76561198000000001 "Player1" added Base.Aerosolbomb at 1000,2000,0.', + '[16-04-26 12:00:01.000] [76561198000000001][ISEnterVehicle][Player1][1020,2020,-1][Van_LectroMax].', + '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.', + '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.', + ]); + + $redactor = (new ProjectZomboidRedactor())->redactSteamIds(false)->redactPlayerNames(false); + $redacted = $redactor->redact($input); + $redactedAgain = $redactor->redact($redacted); + + $this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to coords-only input must produce the same result as applying it once; the placeholder 0,0,0 must not be re-matched.'); + } + + public function testIdempotenceAllCategories(): void + { + // Full input: all three PII categories in multiple lexical contexts. + // After the first redact(), every placeholder is in place. The second + // redact() must make no further changes. 
+ $input = implode("\n", [ + '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.', + '[16-04-26 12:00:01.000] 76561198222222222 "Player2" teleported to 1050,2050,0.', + "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hello'}.", + '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.', + '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.', + '[16-04-26 12:00:02.000] [76561198333333333][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].', + ]); + + $redactor = new ProjectZomboidRedactor(); + $redacted = $redactor->redact($input); + $redactedAgain = $redactor->redact($redacted); + + $this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to input with all PII categories must produce the same result as applying it once; no placeholder must re-match on the second pass.'); + } +} diff --git a/test/tests/Util/Redactor/ProjectZomboidRedactorIntegrationTest.php b/test/tests/Util/Redactor/ProjectZomboidRedactorIntegrationTest.php new file mode 100644 index 0000000..6b94d02 --- /dev/null +++ b/test/tests/Util/Redactor/ProjectZomboidRedactorIntegrationTest.php @@ -0,0 +1,272 @@ +` coords survive because COORDS_AT_CLAUSE_REGEX + * anchors on ` at `, not ` to `. + */ +class ProjectZomboidRedactorIntegrationTest extends TestCase +{ + private static string $fixturesDir = __DIR__ . '/../../../src/Games/ProjectZomboid/fixtures'; + + // --------------------------------------------------------------------------- + // Data providers + // --------------------------------------------------------------------------- + + /** + * Yields [fixturePath] for every PZ fixture file. + */ + public static function fixturePathProvider(): array + { + $dir = self::$fixturesDir; + return [ + 'admin' => [$dir . '/admin-minimal.txt'], + 'burd-journals' => [$dir . 
'/burd-journals-minimal.txt'], + 'chat' => [$dir . '/chat-minimal.txt'], + 'client-action' => [$dir . '/client-action-minimal.txt'], + 'cmd' => [$dir . '/cmd-minimal.txt'], + 'debug-server' => [$dir . '/debug-server-minimal.txt'], + 'item' => [$dir . '/item-minimal.txt'], + 'map' => [$dir . '/map-minimal.txt'], + 'perk' => [$dir . '/perk-minimal.txt'], + 'pvp' => [$dir . '/pvp-minimal.txt'], + 'user' => [$dir . '/user-minimal.txt'], + ]; + } + + /** + * Yields [fixturePath] for the subset of fixtures where every synthetic + * player name (Player1 / Player2 / AdminUser / PlayerSuspect) appears + * exclusively in a context the redactor recognises: + * + * - chat: ChatMessage{author='...'} envelope + * - cmd, item, map, user: 17-digit Steam ID followed by "..." quoted name + * + * Fixtures intentionally excluded: + * + * - admin: names appear in free-text positions (no Steam-ID anchor, + * no quotes, no Combat:/Safety: prefix). Names survive in v1. + * - client-action, + * perk: names appear inside [...] brackets, not "..." quotes. + * PLAYER_AFTER_STEAMID_REGEX requires double-quotes. + * - pvp: attacker name redacts but victim name after `hit "..."` + * survives in v1 (Task 3 limitation). + * - burd-journals, + * debug-server: no synthetic player names present. + */ + public static function fixturesWhereAllNamesAreInCoveredContextsProvider(): array + { + $dir = self::$fixturesDir; + return [ + 'chat' => [$dir . '/chat-minimal.txt'], + 'cmd' => [$dir . '/cmd-minimal.txt'], + 'item' => [$dir . '/item-minimal.txt'], + 'map' => [$dir . '/map-minimal.txt'], + 'user' => [$dir . '/user-minimal.txt'], + ]; + } + + /** + * Yields [fixturePath, logClass] for the fixtures whose log class parses + * them. All 11 fixtures are represented. + */ + public static function fixtureWithLogClassProvider(): array + { + $dir = self::$fixturesDir; + return [ + 'admin' => [$dir . '/admin-minimal.txt', ProjectZomboidAdminLog::class], + 'burd-journals' => [$dir .
'/burd-journals-minimal.txt', ProjectZomboidBurdJournalsLog::class], + 'chat' => [$dir . '/chat-minimal.txt', ProjectZomboidChatLog::class], + 'client-action' => [$dir . '/client-action-minimal.txt', ProjectZomboidClientActionLog::class], + 'cmd' => [$dir . '/cmd-minimal.txt', ProjectZomboidCmdLog::class], + 'debug-server' => [$dir . '/debug-server-minimal.txt', ProjectZomboidServerLog::class], + 'item' => [$dir . '/item-minimal.txt', ProjectZomboidItemLog::class], + 'map' => [$dir . '/map-minimal.txt', ProjectZomboidMapLog::class], + 'perk' => [$dir . '/perk-minimal.txt', ProjectZomboidPerkLog::class], + 'pvp' => [$dir . '/pvp-minimal.txt', ProjectZomboidPvpLog::class], + 'user' => [$dir . '/user-minimal.txt', ProjectZomboidUserLog::class], + ]; + } + + // --------------------------------------------------------------------------- + // Helper + // --------------------------------------------------------------------------- + + private function redact(string $content): string + { + return (new ProjectZomboidRedactor())->redact($content); + } + + // --------------------------------------------------------------------------- + // Test 1 — Steam ID normalisation + // --------------------------------------------------------------------------- + + /** + * After redaction every 17-digit Steam ID that is NOT the zero-placeholder + * must be gone. The zero-placeholder itself (76561198000000000) is the only + * Steam ID that may remain. 
+ */ + #[DataProvider('fixturePathProvider')] + public function testFixtureContainsNoSteamIdsAfterRedaction(string $fixturePath): void + { + $content = (new PathLogFile($fixturePath))->getContent(); + $redacted = $this->redact($content); + + $matches = preg_match_all('/(?assertSame( + 0, + $matches, + sprintf( + 'After redaction, fixture "%s" must contain no non-zero-placeholder Steam IDs, but %d were found.', + basename($fixturePath), + $matches, + ), + ); + } + + // --------------------------------------------------------------------------- + // Test 2 — Structural preservation (re-parse after redaction) + // --------------------------------------------------------------------------- + + /** + * The redacted content, fed back through the corresponding parser, must + * produce exactly the same number of log entries as the original content. + * + * This asserts that the redactor does not corrupt timestamps, delimiters, + * or structural tokens that the parser relies on. + * + * @param string $fixturePath Path to the fixture file. + * @param class-string<\IndifferentKetchup\Codex\Log\Log> $logClass + * Fully-qualified name of the Log subclass that corresponds to this fixture. 
+ */ + #[DataProvider('fixtureWithLogClassProvider')] + public function testFixtureRedactedOutputParsesToSameEntryCount(string $fixturePath, string $logClass): void + { + $content = (new PathLogFile($fixturePath))->getContent(); + + /** @var \IndifferentKetchup\Codex\Log\Log $originalLog */ + $originalLog = (new $logClass())->setLogFile(new PathLogFile($fixturePath)); + $originalLog->parse(); + $originalCount = count($originalLog->getEntries()); + + $redacted = $this->redact($content); + + /** @var \IndifferentKetchup\Codex\Log\Log $redactedLog */ + $redactedLog = (new $logClass())->setLogFile(new StringLogFile($redacted)); + $redactedLog->parse(); + $redactedCount = count($redactedLog->getEntries()); + + $this->assertSame( + $originalCount, + $redactedCount, + sprintf( + 'Parsing the redacted "%s" fixture with %s must yield the same entry count (%d) as parsing the original, but got %d.', + basename($fixturePath), + $logClass, + $originalCount, + $redactedCount, + ), + ); + } + + // --------------------------------------------------------------------------- + // Test 3 — Idempotence + // --------------------------------------------------------------------------- + + /** + * Applying redact() a second time must produce no further changes: + * redact(redact(content)) === redact(content). + * + * This guards against poorly-anchored regexes that would re-match the + * redaction placeholders themselves on a second pass. 
+ */ + #[DataProvider('fixturePathProvider')] + public function testFixtureIsIdempotent(string $fixturePath): void + { + $content = (new PathLogFile($fixturePath))->getContent(); + + $redactor = new ProjectZomboidRedactor(); + $once = $redactor->redact($content); + $twice = $redactor->redact($once); + + $this->assertSame( + $once, + $twice, + sprintf( + 'redact(redact(content)) must equal redact(content) for fixture "%s"; a second pass must be a no-op.', + basename($fixturePath), + ), + ); + } + + // --------------------------------------------------------------------------- + // Test 4 — Player-name collapse in fully-covered fixtures + // --------------------------------------------------------------------------- + + /** + * For fixtures where every synthetic player name appears exclusively in a + * context the redactor recognises, no synthetic name should remain after + * redaction. + * + * This addresses observation #3 from the final code review (the integration + * tests previously asserted Steam-ID elimination + structural preservation + * + idempotence, but did not directly verify name collapse). The unit tests + * in ProjectZomboidRedactorPlayerNameTest cover this property exhaustively + * per-context; this integration test re-verifies it end-to-end against the + * fixtures that ride into iblogs. + */ + #[DataProvider('fixturesWhereAllNamesAreInCoveredContextsProvider')] + public function testFixturePlayerNamesCollapseInCoveredContexts(string $fixturePath): void + { + $content = (new PathLogFile($fixturePath))->getContent(); + $redacted = $this->redact($content); + + foreach (['Player1', 'Player2', 'AdminUser', 'PlayerSuspect'] as $name) { + $this->assertStringNotContainsString( + $name, + $redacted, + sprintf( + 'Fixture "%s": synthetic name %s survived redaction. 
Every name in this fixture should appear only in a covered lexical context.', + basename($fixturePath), + $name, + ), + ); + } + } +} diff --git a/test/tests/Util/Redactor/ProjectZomboidRedactorPlayerNameTest.php b/test/tests/Util/Redactor/ProjectZomboidRedactorPlayerNameTest.php new file mode 100644 index 0000000..d7d639b --- /dev/null +++ b/test/tests/Util/Redactor/ProjectZomboidRedactorPlayerNameTest.php @@ -0,0 +1,93 @@ +" admin.broadcastMessage @ 1020,2020,0.'; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redact($input); + + $this->assertSame($expected, $output, 'Player name following the redacted Steam ID placeholder must be replaced.'); + } + + public function testRedactsChatMessageAuthor(): void + { + // The author field inside ChatMessage{...} must be replaced; the text + // payload ('hello') is not in scope for player-name redaction and must + // survive unchanged. + $input = "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player1', text='hello'}."; + $expected = "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='', text='hello'}."; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redact($input); + + $this->assertSame($expected, $output, 'ChatMessage author must be replaced while the text payload remains unchanged.'); + } + + public function testRedactsCombatNameInPvpLog(): void + { + // Only the FIRST quoted name (after "Combat: ") is redacted in v1. + // The second name (after "hit") is NOT yet redacted — deferred to v2. + // The weapon name ("Tire Iron (Worn)") must also survive unchanged. + $input = '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.'; + // Attacker coords (before "hit") are also replaced by the coordinates pass. + // Victim coords (before "weapon=") lack the trailing keyword and are NOT replaced — deferred to v2. 
+ $expected = '[16-04-26 17:14:35.128][INFO] Combat: "" (0,0,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.'; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redact($input); + + // Player1 (after "Combat: ") is replaced; attacker coords (before "hit") are also replaced. + // Player2 (after "hit") and victim coords (before "weapon=") are NOT replaced in v1 — deferred. + $this->assertSame($expected, $output, 'First Combat: player name and attacker coords must be replaced; second name, victim coords, and weapon must survive.'); + } + + public function testRedactsSafetyNameInPvpLog(): void + { + $input = '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.'; + // Coords (before ") restore") are also replaced by the coordinates pass. + $expected = '[16-04-26 16:17:49.731][LOG] Safety: "" (0,0,0) restore true.'; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redact($input); + + $this->assertSame($expected, $output, 'Player name and coords following the Safety: token must both be replaced.'); + } + + public function testBareQuotedStringWithoutAnchorIsNotTouched(): void + { + // "foo" is not preceded by a redacted Steam ID, not inside ChatMessage{...}, + // and not after Combat:/Safety: — it must pass through unchanged. 
+ $input = 'option changed to "foo" successfully.'; + + $output = (new ProjectZomboidRedactor())->redact($input); + + $this->assertSame($input, $output, 'A quoted string with no matching anchor must not be redacted.'); + } + + public function testToggleOffLeavesNamesIntact(): void + { + $input = '76561198000000000 "Player1" ISLogSystem.writeLog @ 1000,2000,0.'; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redactPlayerNames(false) + ->redact($input); + + $this->assertSame($input, $output, 'With the player-name toggle disabled the original input must be returned unchanged.'); + } +} diff --git a/test/tests/Util/Redactor/ProjectZomboidRedactorSteamIdTest.php b/test/tests/Util/Redactor/ProjectZomboidRedactorSteamIdTest.php new file mode 100644 index 0000000..4c1c8c2 --- /dev/null +++ b/test/tests/Util/Redactor/ProjectZomboidRedactorSteamIdTest.php @@ -0,0 +1,52 @@ +redact($input); + + $this->assertSame($expected, $output, 'All three distinct Steam IDs should be replaced with the zero placeholder.'); + } + + public function testNonSteamIdLongDigitsAreNotTouched(): void + { + // 13-digit Unix-millisecond timestamp (PZ log t: shape) and a 17-digit number + // that does not begin with 76561198 — neither should be altered. + $input = 't:1776297642406 score=12345678901234567'; + + $output = (new ProjectZomboidRedactor())->redact($input); + + $this->assertSame($input, $output, 'Non-SteamID digit sequences must not be modified.'); + } + + public function testEmbeddedSteamIdInsideLongerAlphanumericTokenIsNotTouched(): void + { + // The SteamID64 pattern is embedded inside a longer alphanumeric token; + // the negative lookaround boundaries should prevent a match. 
+ $input = 'token=abc76561198000000001def other=data'; + + $output = (new ProjectZomboidRedactor())->redact($input); + + $this->assertSame($input, $output, 'A Steam ID embedded inside an alphanumeric token must not be redacted.'); + } + + public function testToggleOffLeavesSteamIdsIntact(): void + { + $input = 'Connected: 76561198111111111 and 76561198222222222.'; + + $output = (new ProjectZomboidRedactor()) + ->redactSteamIds(false) + ->redact($input); + + $this->assertSame($input, $output, 'With the Steam ID toggle disabled the original input must be returned unchanged.'); + } +}
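
The boundary behaviour exercised by the Steam ID tests above can be sketched in isolation. This is a minimal standalone illustration, not the project's code: `SKETCH_STEAM_ID_REGEX` and `sketchRedactSteamIds()` are hypothetical names, and the pattern is only an assumption about what "anchored on the 76561198 prefix with lookaround boundaries" might look like, since the real `STEAM_ID_REGEX` constant is not reproduced in this diff. Only the placeholder value `76561198000000000` is taken from the tests.

```php
<?php

// ASSUMPTION: illustrative pattern, not the project's STEAM_ID_REGEX.
// It matches 76561198 followed by exactly nine digits (17 digits total) and
// rejects the match when a letter or digit touches it on either side.
const SKETCH_STEAM_ID_REGEX = '/(?<![A-Za-z0-9])76561198\d{9}(?![A-Za-z0-9])/';

// Placeholder value as it appears in the expected strings of the tests.
const SKETCH_STEAM_ID_PLACEHOLDER = '76561198000000000';

/**
 * Hypothetical helper: replace every standalone SteamID64 with the placeholder.
 */
function sketchRedactSteamIds(string $content): string
{
    $result = preg_replace(SKETCH_STEAM_ID_REGEX, SKETCH_STEAM_ID_PLACEHOLDER, $content);

    // preg_replace() returns null on a PCRE error. Failing loudly avoids
    // handing back unredacted content by accident.
    if ($result === null) {
        throw new RuntimeException('Steam ID redaction failed: ' . preg_last_error_msg());
    }

    return $result;
}

// Standalone IDs are replaced.
echo sketchRedactSteamIds('Connected: 76561198111111111 and 76561198222222222.'), "\n";

// An ID embedded in a longer alphanumeric token is left alone.
echo sketchRedactSteamIds('token=abc76561198000000001def other=data'), "\n";
```

Because the placeholder is itself a 17-digit value with the same prefix, a second pass matches it and rewrites it to the same string, which is one way the idempotence property tested earlier could hold for this category. Throwing on a `null` result is a design choice of this sketch; with a `string` return type, returning `null` unchecked would raise a `TypeError` instead.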