From 081d40c2089dbf950c0ece13f78789fdc4c14573 Mon Sep 17 00:00:00 2001 From: indifferentketchup Date: Fri, 1 May 2026 15:08:49 +0000 Subject: [PATCH] docs: document Redactor utility in CLAUDE.md, README, CHANGELOG MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CLAUDE.md: added RedactorInterface bullet to the architecture list (after Custom Analyser subclasses, before Detectors); added ProjectZomboidRedactor entry under ProjectZomboid specifics; added src/Util/ to the game-subtrees layout code block with a prose note marking it as the sixth component directory introduced post-v0.1.0; added Pitfall 5 on mandatory pass order. README.md: new "Redaction" subsection between Quick start and Architecture — PHP snippet, replacement descriptions, three toggle methods, three documented v1 limitations. CHANGELOG.md: added [Unreleased] section (Added + Changed) above [0.1.0]. Removed the Redactor bullet from [0.1.0]'s Deferred list entirely — the historical record stays accurate (v0.1.0 shipped without it) and [Unreleased] now documents its arrival; a stub mention in Deferred would be redundant. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 11 ++++++++++- CLAUDE.md | 6 ++++++ README.md | 15 +++++++++++++++ 3 files changed, 31 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 38ceea6..c0d3824 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,16 @@ All notable changes to `indifferentketchup/codex` are documented here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [Unreleased] + +### Added + +- `RedactorInterface` (`src/Util/RedactorInterface.php`) and `ProjectZomboidRedactor` (`src/Util/ProjectZomboid/ProjectZomboidRedactor.php`) — render-time PII filter that scrubs Steam IDs, player names, and world coordinates from Project Zomboid log content. Three independent toggles default to on. Designed as a string-in/string-out utility so consumers can apply it at any rendering or export step. Documented v1 limitations: in PvP combat lines, only the attacker's name and coords are redacted; victim's name and coords (after `hit`) are deferred to v2. In admin lines, `teleported X to ` coordinates are not redacted in v1. + +### Changed + +- New top-level `src/Util/` directory introduced. The Redactor is its first occupant; future utilities (e.g. tokenising redactor variants) land here. + ## [0.1.0] — 2026-05-01 First public release. Codex is a generic PHP log parsing and analysis framework with full Project Zomboid server-log support across eight analysers. The Composer package name is `indifferentketchup/codex` (the repository directory and Gitea slug are `ik-codex`; the package name is not). @@ -32,7 +42,6 @@ First public release. Codex is a generic PHP log parsing and analysis framework ### Deferred -- **Codex `Redactor` utility** — design captured in `docs/superpowers/specs/2026-04-30-redactor-design.md`. Not implemented in v0.1.0. iblogs (the downstream consumer) handles upload-time PII filtering for this release; codex itself ships no PII helper. The deferred spec exists so iblogs's privacy story has a referenced design to point at and so a future implementation pass has a clear contract to start from. - **Other game implementations** — `Minecraft`, `Hytale`, and `SevenDaysToDie` are detective-stub-only. Each has a TODO `Detective` extending base `Detective`; their per-component subdirectories under `Analyser`, `Log`, `Parser`, and `Pattern` contain only `.gitkeep` placeholders. Real implementations land if and when fixtures and demand exist. - **Packagist publication** — v0.1.0 is consumable via Composer's `vcs` repository entry pointing at the Gitea remote. Pushing to Packagist is a separate decision and is not in scope for this release. diff --git a/CLAUDE.md b/CLAUDE.md index bad7a06..5404b64 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -49,6 +49,7 @@ Analysis of Insight[] - **`PatternParser`** is regex-driven. Lines that don't match the LINE regex append to the previous `Entry` — this is the mechanism that handles multi-line records like Java stack traces under an ERROR header. - **`PatternAnalyser`** walks entries, runs each registered insight class's static `getPatterns()` against entry text via `preg_match_all`, and emits coalesced insights (equal insights bump a counter instead of duplicating). - **Custom `Analyser` subclasses** are the right move when analysis needs cross-entry state — pairing events, sliding-window thresholds, comparing consecutive snapshots. `PatternAnalyser` operates per-entry only and can't express those. Phase B.3 (`ConnectionFailureAnalyser`, `ItemDuplicationAnalyser`, `SkillProgressionAnomalyAnalyser`) shows the shape: extend `Analyser`, override `analyse()`, walk `$this->log` once, aggregate, then emit coalesced `Problem`/`Information` insights at the end. Tunable thresholds belong as `public const` constants on the subclass with the rationale in a docblock. +- **`RedactorInterface`** is a render-time PII filter — string-in/string-out, configured per game, implemented at `src/Util//Redactor.php`. Consumers call `redact(string $content): string` on a concrete instance before rendering or exporting log content. - Detectors available out of the box: `SinglePatternDetector`, `WeightedSinglePatternDetector`, `LinePatternDetector` (returns match ratio), `MultiPatternDetector` (AND), and the path-based `FilenameDetector` (uses `LogFileInterface::getPath()`, returns `false` when no path is available). ## Game subtrees @@ -58,10 +59,13 @@ Layout is **components-outer with game suffix**, not games-outer: ``` src///... e.g. src/Log/ProjectZomboid/ProjectZomboidServerLog.php src/Pattern//Pattern.php (regex string constants; not a framework abstraction) +src/Util//... e.g. src/Util/ProjectZomboid/ProjectZomboidRedactor.php test/tests/Games//... test/src/Games//fixtures/-minimal.txt (synthetic fixtures only) ``` +`src/Util/` is the sixth top-level component directory, introduced post-v0.1.0-tag. Its first occupant is the Redactor; future game-agnostic utilities (tokenising redactor variants, etc.) land here too. + Scaffolded games: `Minecraft`, `Hytale`, `SevenDaysToDie` (stubs only — empty `.gitkeep`s plus a TODO `Detective` extending base `Detective`). `ProjectZomboid` is fully implemented: 11 log subclasses, 11 pattern classes, detective wired with all 11, synthetic fixtures, dispatch tests, plus the analyser surface — 11 `PatternAnalyser`-driven Insight classes under `src/Analysis/ProjectZomboid/` and 3 custom `Analyser` subclasses under `src/Analyser/ProjectZomboid/` for cross-entry / threshold logic. `src/Pattern/` is **not a framework abstraction** — patterns are plain `string` class constants. Each `Pattern` typically holds a `LINE` constant for the parser plus named-group extractor constants (`FIELDS`, `COMBAT`, `MOD_LOAD`, etc.) for analysers. @@ -74,6 +78,7 @@ Scaffolded games: `Minecraft`, `Hytale`, `SevenDaysToDie` (stubs only — empty - A custom `Analyser` subclass (cross-entry logic): `UserLog → ConnectionFailureAnalyser`, `ItemLog → ItemDuplicationAnalyser`, `PerkLog → SkillProgressionAnomalyAnalyser`. - A configured `PatternAnalyser` (per-entry pattern matching): `ServerLog`, `PvpLog`, `AdminLog` register their respective Insight classes. - An empty `PatternAnalyser` for logs with no analysers yet: `ChatLog`, `ClientActionLog`, `CmdLog`, `MapLog`, `BurdJournalsLog`. These are wiring stubs awaiting future analysis work. +- **`ProjectZomboidRedactor`** at `src/Util/ProjectZomboid/ProjectZomboidRedactor.php` — concrete `RedactorInterface` implementation. Downstream consumers call `redact(string): string` to scrub Steam IDs (zeroed placeholder), player names (``), and world coordinates (`0,0,0`) from log content. Three independent toggle methods default to on: `redactSteamIds(bool)`, `redactPlayerNames(bool)`, `redactCoordinates(bool)`. Pass order (Steam ID → player name → coords) is mandatory and enforced internally — see Pitfall 5. ### Standard test template for a Log subclass @@ -85,6 +90,7 @@ At minimum: (1) entry count after `parse()` matches the synthetic fixture's line 2. **PHPUnit 12 requires the `#[DataProvider('methodName')]` attribute.** The legacy `@dataProvider` annotation silently passes zero args and fails with `ArgumentCountError`. 3. **`Level::fromString()` defaults to `Level::INFO` for unknown tokens.** Project Zomboid log levels map: `LOG`/`INFO` → INFO; `WARN` → WARNING; `ERROR` → ERROR. 4. **`PatternParser` matches array** must declare a match-type for **every** capture group in the regex (`TIME`, `LEVEL`, or `PREFIX`); otherwise the parser throws on the unmapped index. Use non-capturing groups `(?:...)` for fields you want to skip. +5. **`ProjectZomboidRedactor` pass order is mandatory.** `PLAYER_AFTER_STEAMID_REGEX` anchors on the already-redacted Steam ID placeholder — it will not match raw Steam IDs. Do NOT swap the Steam ID and player-name passes, and do NOT stub out the Steam ID pass while leaving the player-name pass enabled. ## Workflow conventions diff --git a/README.md b/README.md index dc3b537..f6a823f 100644 --- a/README.md +++ b/README.md @@ -59,6 +59,21 @@ Project Zomboid Debug Server Log If the log content arrives without a filesystem path (clipboard paste, web upload, stream), use `StringLogFile` or `StreamLogFile` instead of `PathLogFile`. The detective falls back to content signatures when the filename hint is absent. +## Redaction + +Before rendering or exporting log content, pass it through `ProjectZomboidRedactor` to strip PII: + +```php +use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor; + +$redactor = new ProjectZomboidRedactor(); +$safe = $redactor->redact($logContent); +``` + +This scrubs three categories in a fixed pass order: Steam IDs are replaced with a zeroed placeholder, player names with ``, and world coordinates with `0,0,0`. All three passes are on by default; opt out per category with `redactSteamIds(bool)`, `redactPlayerNames(bool)`, or `redactCoordinates(bool)`. + +Documented v1 limitations: in PvP combat lines, only the attacker's name and coords are redacted — the victim's name and coords (appearing after `hit`) are deferred to v2. In admin lines, `teleported X to ` coordinates are not redacted in v1. + ## Architecture ```