Compare commits
37 Commits
1cdc78c54c
...
v0.3.0
| Author | SHA1 | Date | |
|---|---|---|---|
| | 656142dbf8 | | |
| | c63adb06c4 | | |
| | 0d18cfbfc6 | | |
| | 45a5e1a3da | | |
| | 6978175dff | | |
| | 3df6836909 | | |
| | b6949ff0c3 | | |
| | f1d2831d92 | | |
| | bb4ee0d16a | | |
| | 58d0ef187b | | |
| | 9cd898bc9f | | |
| | 87a0562bd6 | | |
| | fdf70a0c06 | | |
| | 2e7bebc911 | | |
| | 4fec3a58f6 | | |
| | 511583035b | | |
| | e1a7785cf4 | | |
| | 2bd4fe6189 | | |
| | 5b4f77a72f | | |
| | 1657be7711 | | |
| | 50194c72b2 | | |
| | 6bf63f1823 | | |
| | 081d40c208 | | |
| | d6831c5851 | | |
| | c2cb64e9a7 | | |
| | 2d1cbccc5d | | |
| | 44b6b99047 | | |
| | 0c8dad9502 | | |
| | 7755d8385c | | |
| | 409de16003 | | |
| | aec835e0eb | | |
| | 6fde2d49ff | | |
| | 52ff8cb3fe | | |
| | 1485507c8f | | |
| | ed920485dc | | |
| | b99d8f3061 | | |
| | 38fa1471ba | | |
7
.gitignore
vendored
@@ -5,3 +5,10 @@ Logs.zip
.scratch/
.claude/
.claude.local.md

# Python bytecode caches from tools/pz-analyzer/.
__pycache__/

# Editor / manual backup files.
*.bak
*.bak-*
84
CHANGELOG.md
Normal file
@@ -0,0 +1,84 @@
# Changelog

All notable changes to `indifferentketchup/codex` are documented here.
The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
## [0.3.0] — 2026-05-04
Adds IP-address redaction to the PZ redactor, a new `ErrorContextAnalyser` for surrounding-context surfacing, the `tools/pz-analyzer/` Python toolset (pre-production Qwen-driven research analyser and production-bound deterministic classifier), and a parser fix for the PZ B42 log shape that was silently breaking level/prefix attribution since The Indie Stone dropped the per-line `t:` field. New public API surface across the redactor and the analyser-side classes makes this a minor bump rather than a patch.

### Added

- **IP redaction in `ProjectZomboidRedactor`** (`src/Util/ProjectZomboid/ProjectZomboidRedactor.php`) — fourth pass that scrubs IPv4 (strict 0-255 octets, optional `:port` suffix) and IPv6 (full, abbreviated, bracketed-with-port, IPv4-mapped) addresses, replacing them with the literal `[REDACTED_IP]`. New public API: `IP_REPLACEMENT`, `IPV4_REGEX`, `IPV6_REGEX` constants and a `redactIpAddresses(bool)` toggle (defaults on, mirroring the existing three category toggles). Pattern-disjoint from the Steam-ID → name → coordinates chain; runs first by convention. Strict regexes plus `filter_var()` validation prevent false positives on PZ timestamps and PHP / Java scope ops. 20 new unit tests across two files (`ProjectZomboidRedactorIpv4Test.php`, `ProjectZomboidRedactorIpv6Test.php`).
- **`ErrorContextAnalyser`** (`src/Analyser/ProjectZomboid/ErrorContextAnalyser.php`) — generic-purpose analyser that walks `Entry[]` once and emits one `ErrorContextProblem` per ERROR / WARNING entry with up to `CONTEXT_BEFORE` (20) entries of leading context and `CONTEXT_AFTER` (20) entries of trailing context. Overlapping windows clip to `lastEmittedIndex + 1` so no Entry appears in two context arrays; emission caps at `HIT_CAP` (500) with a single `ErrorContextTruncatedInformation` appended when reached. Standalone — not auto-registered to any existing Log subclass's `getDefaultAnalyser()`; consumers wire it in explicitly. Companion classes `ErrorContextProblem` and `ErrorContextTruncatedInformation` under `src/Analysis/ProjectZomboid/`. 3 unit tests, 134 assertions.
- **`tools/pz-analyzer/`** — Python toolset adjacent to the library (not part of the Composer package's autoload surface). `pz_redact_all.sh` is a one-shot Docker wrapper that runs the PHP redactor over `.scratch/pz/Logs/` and produces a gitignored `.scratch/pz/Logs.redacted/` directory. `pz_error_analysis.py` is a developer-facing Qwen-backed pre-production analyser that calls a local OpenAI-compatible endpoint to classify residual log shapes the deterministic side hasn't yet captured. `pz_parser.py` + `pz_classify.py` are the production-bound deterministic-only counterpart: pure parser module with mod attribution, file:line extraction, cause-chain unwinding, engine-noise tagging, and a two-level signature scheme (`pattern_id` + `signature`), plus a stdlib-only orchestrator that walks the redacted directory and emits a JSON report. 32 Python unit tests across three files, 16 synthetic fixtures.
- `docs/superpowers/specs/2026-05-04-pz-deterministic-classifier-design.md` — design contract for `pz_parser.py` / `pz_classify.py`. The PHP-side `ErrorContextAnalyser` ships without a separate spec; its design fell out of a brainstorming session inline with the pzmm-pattern-port discussion.
- New synthetic fixture `test/src/Games/ProjectZomboid/fixtures/debug-server-42x-minimal.txt` mirroring the existing B41 fixture in PZ B42 line shape.
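
The context-window clipping described for `ErrorContextAnalyser` above can be pictured with a small Python sketch. This is an illustration under assumptions, not the PHP implementation: `Hit` and `collect_hits` are hypothetical names, entries are reduced to their level strings, and the choice to also clip trailing context at `last_emitted + 1` is one reading of "no Entry appears in two context arrays".

```python
from dataclasses import dataclass

CONTEXT_BEFORE = 20  # values quoted in the changelog entry
CONTEXT_AFTER = 20
HIT_CAP = 500


@dataclass
class Hit:
    index: int   # position of the ERROR / WARNING entry
    before: list  # indices of leading context entries
    after: list   # indices of trailing context entries


def collect_hits(levels, before=CONTEXT_BEFORE, after=CONTEXT_AFTER, cap=HIT_CAP):
    """Return (hits, truncated) for a list of per-entry level names."""
    hits = []
    last_emitted = -1  # highest index already handed out as context
    for i, level in enumerate(levels):
        if level not in ("ERROR", "WARNING"):
            continue
        if len(hits) >= cap:
            return hits, True  # a caller would append one truncation notice
        # Clip both windows so they never reuse an already-emitted index.
        start = max(i - before, last_emitted + 1)
        end = min(i + after, len(levels) - 1)
        leading = list(range(start, i))
        trailing = list(range(max(i + 1, last_emitted + 1), end + 1))
        hits.append(Hit(i, leading, trailing))
        last_emitted = max(last_emitted, end)
    return hits, False
```

With a window of two on each side, two hits four entries apart share no context index: the second hit's leading window starts right after the first hit's trailing window ends.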

### Changed

- **`DebugServerPattern::LINE` regex relaxed** to handle PZ build 42.x. The Indie Stone dropped the per-line `t:` (microsecond) field and tightened the spacing between `f:N`, `t:N`, and `st:N,N,N,N>` markers somewhere on the way to build 42.17. The previous regex required the full `f:\d+,\s+t:\d+,\s+st:` triplet and silently failed on every B42 line. Now `(?:,\s+t:\d+)?` makes the `t:N,` field optional and `,?` makes the inter-field comma optional. Backwards-compatible — every B41 line continues to parse identically. `ProjectZomboidServerLogTest` now runs each parser-shape assertion via `#[DataProvider]` against both fixtures.
- **Pass order in `ProjectZomboidRedactor::redact()`**: the new IP pass runs first, so the chain is now `IP → Steam ID → player name → coordinates`. The mandatory Steam ID → name → coordinates ordering is preserved; placement of the IP pass is by convention since its regexes are pattern-disjoint from the rest.
- **`CLAUDE.md`** documents `iblogs` as the primary downstream consumer with a per-component checklist for cross-repo public API impact; the release-flow cadence; the feature-branch workflow set by the `redactor` and `iblogs-bootstrap` precedents; and the `docs/superpowers/specs|plans/` path convention.
- **`.gitignore`** excludes `__pycache__/` (Python bytecode caches generated under `tools/pz-analyzer/`) and `*.bak` / `*.bak-*` (editor / manual backup files).
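
For the IP pass that now runs first, the combination of a strict pattern and a second validation step can be sketched in Python for the IPv4 case. This is a hedged illustration only: `IPV4_SKETCH` and `redact_ipv4` are hypothetical names, the pattern is not the shipped `IPV4_REGEX`, and `ipaddress.IPv4Address` stands in for PHP's `filter_var()`.

```python
import ipaddress
import re

IP_REPLACEMENT = "[REDACTED_IP]"  # literal named in the changelog entry

# Hypothetical strict pattern: four 0-255 octets, optional :port suffix,
# not glued to neighbouring digits or dots.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_SKETCH = re.compile(
    rf"(?<![\d.])({_OCTET}(?:\.{_OCTET}){{3}})(?::\d{{1,5}})?(?![\d.])"
)


def redact_ipv4(text: str) -> str:
    """Replace validated IPv4 addresses (and any :port) with the placeholder."""

    def _swap(match: re.Match) -> str:
        try:
            ipaddress.IPv4Address(match.group(1))  # second-stage validation
        except ValueError:
            return match.group(0)  # leave anything that fails validation alone
        return IP_REPLACEMENT

    return IPV4_SKETCH.sub(_swap, text)
```

The lookarounds are what keep version strings such as `42.16.3` and clock times such as `14:00:01.123` out of the match set in this sketch.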

### Fixed

- PZ build 42.x server logs now parse with proper level / prefix attribution. Previously, every B42 line failed `DebugServerPattern::LINE` and the resulting ServerLog entries fell through as level `INFO` with no prefix. This silently disabled `ServerExceptionProblem` and `ModMissingProblem` (their regexes anchor on `[timestamp]...` at entry start, which a level-less orphan entry doesn't emit). The anchorless `EngineVersionInformation` continued to fire against the joined entry text, producing the user-visible symptom "one Information badge, empty Problems panel" on B42 logs. The fix restores per-line parsing, re-enables both Problem classes, and makes the error-count badge populate correctly.
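
The regex relaxation behind this fix can be illustrated on just the `f:` / `t:` / `st:` fragment. The Python sketch below is an assumption-heavy illustration: the two sample fragments are hypothetical (real PZ lines carry more fields around them), `\s*` before `st:` is a guess at the tightened spacing, and neither pattern is the actual `DebugServerPattern::LINE`.

```python
import re

# Old shape: f:N, t:N, st:N,N,N,N> with every field required.
OLD_FRAGMENT = re.compile(r"f:\d+,\s+t:\d+,\s+st:\d+,\d+,\d+,\d+>")

# Relaxed shape: the t:N field and the comma before st: are both optional.
NEW_FRAGMENT = re.compile(r"f:\d+(?:,\s+t:\d+)?,?\s*st:\d+,\d+,\d+,\d+>")

# Hypothetical sample fragments for the two builds.
B41_SAMPLE = "f:0, t:1714485600123, st:0,0,0,0>"
B42_SAMPLE = "f:0 st:0,0,0,0>"


def matches(pattern: re.Pattern, fragment: str) -> bool:
    """True when the pattern finds the marker triplet in the fragment."""
    return pattern.search(fragment) is not None
```

In this sketch the old pattern accepts only the B41-style fragment, while the relaxed one accepts both, which is the backwards-compatibility property the changelog entry claims for the real regex.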

### Test counts

- PHP suite: **287 tests, 654 assertions** (up from 260 / 492 at v0.2.0).
- Python suite under `tools/pz-analyzer/`: **32 tests** (stdlib `unittest`, sub-10 ms).
## [0.2.0] — 2026-05-01
Render-time PII redaction utility added on the same calendar day as v0.1.0. Cut as a minor version bump because it adds a new public API surface (`RedactorInterface` plus the per-game implementation), which under semver is a minor change, not a patch. Consumers (notably iblogs) pin to `^0.2.0` to opt into the redactor-aware version.

### Added

- `RedactorInterface` (`src/Util/RedactorInterface.php`) and `ProjectZomboidRedactor` (`src/Util/ProjectZomboid/ProjectZomboidRedactor.php`) — render-time PII filter that scrubs Steam IDs, player names, and world coordinates from Project Zomboid log content. Three independent toggles default to on. Designed as a string-in/string-out utility so consumers can apply it at any rendering or export step. Documented v1 limitations: in PvP combat lines, only the attacker's name and coords are redacted; victim's name and coords (after `hit`) are deferred to v2. In admin lines, `teleported X to <coords>` coordinates are not redacted in v1.
- 65 new test methods across six files under `test/tests/Util/Redactor/` — per-category unit tests, combined / toggle / idempotence matrix, and integration coverage that drives all 11 existing PZ fixtures through the redactor end-to-end. Suite total: 260 tests, 492 assertions.
- `docs/superpowers/specs/2026-04-30-redactor-design.md` flipped from "deferred" to "implemented" status. Plan committed at `docs/superpowers/plans/2026-05-01-redactor.md`.

### Changed

- New top-level `src/Util/` directory introduced. The Redactor is its first occupant; future utilities (e.g. tokenising redactor variants) land here.
## [0.1.0] — 2026-05-01
First public release. Codex is a generic PHP log parsing and analysis framework with full Project Zomboid server-log support across eight analysers. The Composer package name is `indifferentketchup/codex` (the repository directory and Gitea slug are `ik-codex`; the package name is not).

### Added

- **Framework foundation** — generic `Log` / `Entry` / `Line` / `Parser` / `Analyser` / `Detective` / `Insight` pipeline forked from upstream `aternos/codex` and renamed end-to-end to `IndifferentKetchup\Codex\*` in `66a2fcc`. Zero `Aternos\Codex\*` namespace references remain in `src/` or `test/`.
- **`FilenameDetector`** at `IndifferentKetchup\Codex\Detective\FilenameDetector` — path-based detector that uses the new `LogFileInterface::getPath()` accessor to dispatch on a filename hint. Falls back to `false` for path-less log files (`StringLogFile`, `StreamLogFile`).
- **Project Zomboid log subclasses (11)** under `IndifferentKetchup\Codex\Log\ProjectZomboid\*` covering every PZ server-log file type: a multi-line `ProjectZomboidServerLog` for `DebugLog-server.txt`, an abstract `ProjectZomboidEventLog` base for the ten single-line logs, and concrete subclasses for `admin.txt`, `BurdJournals.txt`, `chat.txt`, `ClientActionLog.txt`, `cmd.txt`, `item.txt`, `map.txt`, `PerkLog.txt`, `pvp.txt`, `user.txt`.
- **Pattern classes (11)** under `IndifferentKetchup\Codex\Pattern\ProjectZomboid\*` holding regex string constants. Each `<Type>Pattern` carries a `LINE` regex used by `PatternParser`, plus named-group extractor regexes (`FIELDS`, `COMBAT`, `MOD_LOAD`, etc.) used by analysers.
- **`ProjectZomboidDetective`** at `IndifferentKetchup\Codex\Detective\ProjectZomboid\ProjectZomboidDetective` — pre-registers all 11 log subclasses in its constructor with paired filename-hint plus content-signature detectors.
- **Phase B.1 ServerLog analysers (3)**: `EngineVersionAnalyser` (extracts engine version, build hash, and build date from the server banner), `ModLoadAnalyser` (mod load order plus missing-mod problems with attached `ModMissingSolution`), `ServerExceptionAnalyser` (Java exception type and stack-trace body, coalesced by exception type).
- **Phase B.2 PvP and Admin analysers (2)**: `PvpDamageAnalyser` (filters zombie hits and zero-damage rows at the regex itself), `AdminAuditAnalyser` (verb-pattern dispatch across six admin actions: added item, added xp, granted access, changed option, reloaded options, teleported).
- **Phase B.3 deferred analysers (3)** — first custom `Analyser` subclasses in the tree, addressing logic that vanilla `PatternAnalyser` cannot express: `ConnectionFailureAnalyser` (event pairing across the file), `ItemDuplicationAnalyser` (sliding-window heuristic with `THRESHOLD_COUNT=5`, `THRESHOLD_WINDOW_SECONDS=10`), `SkillProgressionAnomalyAnalyser` (consecutive-snapshot delta with `THRESHOLD_DELTA=3`). All three threshold constants ship with rationale docblocks and are tunable via subclass override.
- **Synthetic test fixtures** under `test/src/Games/ProjectZomboid/fixtures/`, hand-crafted from observed PZ log shapes with placeholder identifiers per the project's privacy rules: Steam IDs `76561198000000001`–`76561198000000004`, names `Player1` / `Player2` / `AdminUser` / `PlayerSuspect`, generic coords. No real-log content reaches the index.
- **End-to-end tests** validating each Log subclass's parser, each analyser's insight emission, and the Detective's dispatch behaviour against the synthetic fixtures. Final count: **195 tests, 412 assertions**.
- **Project documentation**: `CLAUDE.md` with framework architecture, pitfalls, and workflow conventions; `README.md` with worked Project Zomboid example and per-game support table; design specs and as-built plans for Phase B.1 / B.2 / B.3 plus a deferred-status spec for the codex `Redactor` utility, all under `docs/superpowers/`.

### Changed

- **Layout: components-outer with game suffix.** Every game's code lives at `IndifferentKetchup\Codex\<Component>\<Game>\*` for the existing components (`Analyser`, `Analysis`, `Detective`, `Log`, `Parser`, `Pattern`). This is option 1 from the Phase A Step 2 layout decision; option 3 (a flat `IndifferentKetchup\Codex\Games\<Game>\*` tree) was originally proposed and was **not** selected.
- **`LICENSE`** retains the original `Copyright (c) 2019-2026 Aternos GmbH` line per MIT requirements; the LICENSE file is byte-for-byte unchanged from the upstream import.
- **`composer.json`** rewritten in `aae016d`: package name `indifferentketchup/codex`, MIT license, generic-framework description, single author entry, PSR-4 autoload roots set to `IndifferentKetchup\Codex\` and the test-fixture / test-suite namespaces, PHP `>=8.4` require constraint, PHPUnit `^12` dev dependency.
- **`tests.yaml`** uses the modern `$GITHUB_OUTPUT` workflow command instead of the deprecated `::set-output` (commit `60f12bc`). CI matrix runs PHP 8.4 and 8.5.
- **`.gitignore`** excludes `Logs.zip` (real production log fixtures) and `.scratch/` (extracted reference logs), plus `.claude/` and `.claude.local.md` for personal Claude Code artefacts.

### Deferred

- **Other game implementations** — `Minecraft`, `Hytale`, and `SevenDaysToDie` are detective-stub-only. Each has a TODO `<Game>Detective` extending base `Detective`; their per-component subdirectories under `Analyser`, `Log`, `Parser`, and `Pattern` contain only `.gitkeep` placeholders. Real implementations land if and when fixtures and demand exist.
- **Packagist publication** — v0.1.0 is consumable via Composer's `vcs` repository entry pointing at the Gitea remote. Pushing to Packagist is a separate decision and is not in scope for this release.
[0.3.0]: https://git.indifferentketchup.com/indifferentketchup/ik-codex/releases/tag/v0.3.0
[0.2.0]: https://git.indifferentketchup.com/indifferentketchup/ik-codex/releases/tag/v0.2.0
[0.1.0]: https://git.indifferentketchup.com/indifferentketchup/ik-codex/releases/tag/v0.1.0
15
CLAUDE.md
@@ -49,6 +49,7 @@ Analysis of Insight[]
- **`PatternParser`** is regex-driven. Lines that don't match the LINE regex append to the previous `Entry` — this is the mechanism that handles multi-line records like Java stack traces under an ERROR header.
- **`PatternAnalyser`** walks entries, runs each registered insight class's static `getPatterns()` against entry text via `preg_match_all`, and emits coalesced insights (equal insights bump a counter instead of duplicating).
- **Custom `Analyser` subclasses** are the right move when analysis needs cross-entry state — pairing events, sliding-window thresholds, comparing consecutive snapshots. `PatternAnalyser` operates per-entry only and can't express those. Phase B.3 (`ConnectionFailureAnalyser`, `ItemDuplicationAnalyser`, `SkillProgressionAnomalyAnalyser`) shows the shape: extend `Analyser`, override `analyse()`, walk `$this->log` once, aggregate, then emit coalesced `Problem`/`Information` insights at the end. Tunable thresholds belong as `public const` constants on the subclass with the rationale in a docblock.
- **`RedactorInterface`** is a render-time PII filter — string-in/string-out, configured per game, implemented at `src/Util/<Game>/<Game>Redactor.php`. Consumers call `redact(string $content): string` on a concrete instance before rendering or exporting log content.
- Detectors available out of the box: `SinglePatternDetector`, `WeightedSinglePatternDetector`, `LinePatternDetector` (returns match ratio), `MultiPatternDetector` (AND), and the path-based `FilenameDetector` (uses `LogFileInterface::getPath()`, returns `false` when no path is available).
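
The `PatternParser` rule in the list above (a line that does not match the `LINE` regex is appended to the previous `Entry`) can be sketched in a few lines of Python. The `LINE` pattern and the sample lines here are hypothetical, and starting a fresh entry for unmatched lines at the very top of a file is an assumption of this sketch, not documented framework behaviour.

```python
import re

# Hypothetical entry-start pattern: "[HH:MM:SS] LEVEL: message".
LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] (\w+): ")


def group_entries(lines):
    """Group raw lines into entries; continuation lines join the previous entry."""
    entries = []
    for line in lines:
        if LINE.match(line) or not entries:
            entries.append([line])  # a new entry starts here
        else:
            entries[-1].append(line)  # e.g. a stack-trace frame under an ERROR
    return entries
```

A Java stack trace under an ERROR header therefore ends up as one multi-line entry, which is what lets per-entry analysers see the exception type and its frames together.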

## Game subtrees

@@ -58,11 +59,14 @@ Layout is **components-outer with game suffix**, not games-outer:
```
src/<Component>/<Game>/...                          e.g. src/Log/ProjectZomboid/ProjectZomboidServerLog.php
src/Pattern/<Game>/<Type>Pattern.php                (regex string constants; not a framework abstraction)
src/Util/<Game>/...                                 e.g. src/Util/ProjectZomboid/ProjectZomboidRedactor.php
test/tests/Games/<Game>/...
test/src/Games/<Game>/fixtures/<type>-minimal.txt   (synthetic fixtures only)
```

Scaffolded games: `Minecraft`, `Hytale`, `SevenDaysToDie` (stubs only — empty `.gitkeep`s plus a TODO `<Game>Detective` extending base `Detective`). `ProjectZomboid` is fully implemented: 11 log subclasses, 11 pattern classes, detective wired with all 11, synthetic fixtures, dispatch tests, plus the analyser surface — 12 `PatternAnalyser`-driven Insight classes under `src/Analysis/ProjectZomboid/` and 3 custom `Analyser` subclasses under `src/Analyser/ProjectZomboid/` for cross-entry / threshold logic.
`src/Util/` is the sixth top-level component directory, introduced post-v0.1.0-tag. Its first occupant is the Redactor; future game-agnostic utilities (tokenising redactor variants, etc.) land here too.
Scaffolded games: `Minecraft`, `Hytale`, `SevenDaysToDie` (stubs only — empty `.gitkeep`s plus a TODO `<Game>Detective` extending base `Detective`). `ProjectZomboid` is fully implemented: 11 log subclasses, 11 pattern classes, detective wired with all 11, synthetic fixtures, dispatch tests, plus the analyser surface — 11 `PatternAnalyser`-driven Insight classes under `src/Analysis/ProjectZomboid/` and 3 custom `Analyser` subclasses under `src/Analyser/ProjectZomboid/` for cross-entry / threshold logic.
`src/Pattern/` is **not a framework abstraction** — patterns are plain `string` class constants. Each `<Type>Pattern` typically holds a `LINE` constant for the parser plus named-group extractor constants (`FIELDS`, `COMBAT`, `MOD_LOAD`, etc.) for analysers.
@@ -74,23 +78,32 @@ Scaffolded games: `Minecraft`, `Hytale`, `SevenDaysToDie` (stubs only — empty
- A custom `Analyser` subclass (cross-entry logic): `UserLog → ConnectionFailureAnalyser`, `ItemLog → ItemDuplicationAnalyser`, `PerkLog → SkillProgressionAnomalyAnalyser`.
- A configured `PatternAnalyser` (per-entry pattern matching): `ServerLog`, `PvpLog`, `AdminLog` register their respective Insight classes.
- An empty `PatternAnalyser` for logs with no analysers yet: `ChatLog`, `ClientActionLog`, `CmdLog`, `MapLog`, `BurdJournalsLog`. These are wiring stubs awaiting future analysis work.
- **`ProjectZomboidRedactor`** at `src/Util/ProjectZomboid/ProjectZomboidRedactor.php` — concrete `RedactorInterface` implementation. Downstream consumers call `redact(string): string` to scrub Steam IDs (zeroed placeholder), player names (`<player>`), and world coordinates (`0,0,0`) from log content. Three independent toggle methods default to on: `redactSteamIds(bool)`, `redactPlayerNames(bool)`, `redactCoordinates(bool)`. Pass order (Steam ID → player name → coords) is mandatory and enforced internally — see Pitfall 5.

### Standard test template for a Log subclass

At minimum: (1) entry count after `parse()` matches the synthetic fixture's line count, (2) one or more named-group `FIELDS` regexes from the `<Type>Pattern` class extract correctly from a representative line, (3) `Detective` handed the fixture path returns an instance of this Log class. Use `#[DataProvider]` when the same shape repeats per file.

### Downstream consumers

`iblogs` (sibling repo at `/opt/iblogs`, package `indifferentketchup/iblogs`, fork of `aternosorg/mclogs`) is the primary consumer of codex via a Composer `vcs` repository entry pinned to the latest minor tag. Public-API changes in `src/{Detective,Log,Printer,Util}/*.php` and `src/Analysis/*.php` propagate there; when modifying those types, sanity-check the iblogs call sites at `/opt/iblogs/src/{Detective.php,Log.php,Printer/Printer.php,Printer/FormatModification.php,Api/Response/CodexLogResponse.php}` and the stub class at `/opt/iblogs/src/Data/Deobfuscator.php`.

## Pitfalls

1. **`PatternParser` is incompatible with named regex groups.** PHP's `preg_match` returns named groups *plus* their numeric duplicates in the same array; `PatternParser`'s foreach iterates both and throws on the string-key entries. Convention: `LINE` regexes (used by the parser) use **unnamed** groups with field order documented in the Pattern class's docblock. Named groups are fine inside extractor regexes invoked from analysers, since `PatternAnalyser` hands the whole match array to `Insight::setMatches`.
2. **PHPUnit 12 requires the `#[DataProvider('methodName')]` attribute.** The legacy `@dataProvider` annotation silently passes zero args and fails with `ArgumentCountError`.
3. **`Level::fromString()` defaults to `Level::INFO` for unknown tokens.** Project Zomboid log levels map: `LOG`/`INFO` → INFO; `WARN` → WARNING; `ERROR` → ERROR.
4. **`PatternParser` matches array** must declare a match-type for **every** capture group in the regex (`TIME`, `LEVEL`, or `PREFIX`); otherwise the parser throws on the unmapped index. Use non-capturing groups `(?:...)` for fields you want to skip.
5. **`ProjectZomboidRedactor` pass order is mandatory.** `PLAYER_AFTER_STEAMID_REGEX` anchors on the already-redacted Steam ID placeholder — it will not match raw Steam IDs. Do NOT swap the Steam ID and player-name passes, and do NOT stub out the Steam ID pass while leaving the player-name pass enabled.
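
Pitfall 5 is easier to see with a toy version of the two passes. The Python sketch below is illustrative only: the 17-zero placeholder, both regexes, and the sample line shape are assumptions, not the library's constants.

```python
import re

STEAM_ID_PLACEHOLDER = "00000000000000000"  # assumed zeroed 17-digit form
PLAYER_PLACEHOLDER = "<player>"

STEAM_ID = re.compile(r"\b7656119\d{10}\b")
# Anchors on the placeholder, so it can only fire after the Steam ID pass.
PLAYER_AFTER_STEAM_ID = re.compile(rf'({STEAM_ID_PLACEHOLDER} ")[^"]+(")')
_NAME_SWAP = rf"\g<1>{PLAYER_PLACEHOLDER}\g<2>"


def redact(line: str) -> str:
    """Correct order: Steam ID first, then the name that follows it."""
    line = STEAM_ID.sub(STEAM_ID_PLACEHOLDER, line)
    return PLAYER_AFTER_STEAM_ID.sub(_NAME_SWAP, line)


def redact_wrong_order(line: str) -> str:
    """Swapped order: the name pass finds no placeholder and leaks the name."""
    line = PLAYER_AFTER_STEAM_ID.sub(_NAME_SWAP, line)
    return STEAM_ID.sub(STEAM_ID_PLACEHOLDER, line)
```

Run against a line such as `76561198000000001 "Player1" attempting to join`, the correct order yields a fully scrubbed line, while the swapped order hides the Steam ID but leaves `Player1` in place.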

## Workflow conventions

- **One commit per concrete log type** when adding game support: pattern class + log subclass + synthetic fixture + test in a single commit, run `composer test`, then move on. `<Game>Detective::__construct()` wiring goes in its own follow-up commit once all log types are present.
- **Out-of-scope cleanup goes in its own commit.** Tempting workflow/lint fixes (e.g. deprecated CI syntax, comment hygiene) noticed mid-feature should not be folded in — separate commit or follow-up PR.
- **Pre-destructive checkpoint pattern.** Before bulk renames/moves: `git commit --allow-empty -m "pre-X checkpoint"` as a revert anchor. Skip the empty slot if it produces no diff at the end of a plan.
- **Release flow.** Semver: a new public API surface bumps the minor version, not the patch (`v0.1.x → v0.2.x`). Cut: rename `[Unreleased]` to `[X.Y.Z] — YYYY-MM-DD` in `CHANGELOG.md`, add a `[X.Y.Z]:` link reference at the bottom, fresh empty `[Unreleased]` above; lightweight `backup/pre-vX.Y.Z` tag (local only) before annotated `git tag -a vX.Y.Z`; push the annotated tag only.
- **Feature branches.** Substantive feature work lands on a `<feature>-bootstrap`-style branch off master with a `backup/pre-<feature>` lightweight tag at the branch start, merged `--no-ff` after user review. The `redactor` and `iblogs-bootstrap` branches set the precedent.
- **Specs and plans live at** `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md` and `docs/superpowers/plans/YYYY-MM-DD-<topic>.md` per the brainstorming and writing-plans skill conventions.

## Privacy / fixture rules

103
README.md
@@ -1,13 +1,112 @@
# IndifferentKetchup Codex

A generic PHP log parsing and analysis framework. Provides interfaces and base implementations for reading log files, detecting log types, parsing entries into structured form, analysing them for problems and information, and printing results.
Generic PHP log parsing and analysis framework. Reads a log file, detects which log type it is, parses entries (including multi-line records like Java stack traces), runs the type-specific analysers, and returns structured `Information` and `Problem` insights with attached `Solution`s where applicable.
## Installation
Originally a fork of [`aternos/codex`](https://github.com/aternosorg/codex); the framework is intentionally game-agnostic. The reference implementation in this tree is Project Zomboid server logs.

## Install

```
composer require indifferentketchup/codex
```

Requires PHP `>=8.4`. No third-party runtime dependencies.

## Quick start

Given a Project Zomboid `DebugLog-server.txt`:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use IndifferentKetchup\Codex\Detective\ProjectZomboid\ProjectZomboidDetective;
use IndifferentKetchup\Codex\Log\File\PathLogFile;

$detective = new ProjectZomboidDetective();
$detective->setLogFile(new PathLogFile('2026-04-30_14-00_DebugLog-server.txt'));

$log = $detective->detect();
$log->parse();
$analysis = $log->analyse();

echo $log->getTitle(), "\n\n";

foreach ($analysis->getInformation() as $info) {
    echo "[INFO] ", $info->getMessage(), "\n";
}

foreach ($analysis->getProblems() as $problem) {
    echo "[PROBLEM] ", $problem->getMessage(), "\n";
    foreach ($problem->getSolutions() as $solution) {
        echo " -> ", $solution->getMessage(), "\n";
    }
}
```

For a session with mod issues and a server-side exception, output looks roughly like:

```
Project Zomboid Debug Server Log

[INFO] Engine version: 42.16.3 (build <hash>, <build date>)
[INFO] Mod loaded: <mod_id>
[INFO] Mod loaded: <other_mod_id>
[PROBLEM] Required mod "<missing>" not found.
 -> Subscribe to mod "<missing>" or remove its ID from the Mods= line in serverconfig.ini.
[PROBLEM] Exception thrown: java.nio.file.NoSuchFileException
```

If the log content arrives without a filesystem path (clipboard paste, web upload, stream), use `StringLogFile` or `StreamLogFile` instead of `PathLogFile`. The detective falls back to content signatures when the filename hint is absent.

## Redaction

Before rendering or exporting log content, pass it through `ProjectZomboidRedactor` to strip PII:

```php
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;

$redactor = new ProjectZomboidRedactor();
$safe = $redactor->redact($logContent);
```

This scrubs three categories in a fixed pass order: Steam IDs are replaced with a zeroed placeholder, player names with `<player>`, and world coordinates with `0,0,0`. All three passes are on by default; opt out per category with `redactSteamIds(bool)`, `redactPlayerNames(bool)`, or `redactCoordinates(bool)`.
Documented v1 limitations: in PvP combat lines, only the attacker's name and coords are redacted — the victim's name and coords (appearing after `hit`) are deferred to v2. In admin lines, `teleported X to <coords>` coordinates are not redacted in v1.

## Architecture

```
LogFile → Log → parse() → Entry[] of Line[] → analyse() → Analysis of Insight[]
                                                          └── Information | Problem(+Solutions)
```

- **`Detective`** ranks candidate `Log` subclasses by running each candidate's static `getDetectors()` and picking the highest-scoring result. Each game ships its own `<Game>Detective` that pre-registers its log classes.
- **`PatternParser`** is regex-driven; lines that don't match the entry-start regex append to the previous `Entry`, which is how multi-line records (Java stack traces, indented warnings) are kept intact.
- **Analysers** come in two flavours: configured `PatternAnalyser` instances for per-entry pattern matching, and custom subclasses of `Analyser` for cross-entry logic (pairing events, sliding-window thresholds, snapshot comparisons).
- **Insights** are either `Information` (label + value) or `Problem` (with attached `Solution`s). Equal insights coalesce via a counter, so repeated patterns don't produce duplicate output.
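
The `Detective` ranking step from the list above can be approximated in Python. This is a rough sketch under stated assumptions: the candidate names, detector helpers, scores, and content signatures are all hypothetical, and the real framework dispatches on `Log` subclasses rather than strings.

```python
def filename_hint(suffix, score=1.0):
    """Detector that scores on a path suffix; no opinion without a path."""
    def detect(content, path):
        if path is None:
            return False
        return score if path.endswith(suffix) else False
    return detect


def content_signature(needle, score=0.5):
    """Detector that scores when a marker string appears in the content."""
    def detect(content, path):
        return score if needle in content else False
    return detect


# Hypothetical registry: each candidate pairs a filename hint with a signature.
CANDIDATES = {
    "ServerLog": [filename_hint("DebugLog-server.txt"), content_signature("st:")],
    "PvpLog": [filename_hint("pvp.txt"), content_signature("Combat:")],
}


def detect_log_type(content, path=None):
    """Return the name of the highest-scoring candidate, or None."""
    best_name, best_score = None, 0.0
    for name, detectors in CANDIDATES.items():
        for detector in detectors:
            score = detector(content, path)
            if score is not False and score > best_score:
                best_name, best_score = name, score
    return best_name
```

Because the filename detector returns `False` when no path is available, content pasted from a clipboard is still classified by its signature alone.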
Patterns live as plain `string` constants under `src/Pattern/<Game>/` — there is no `PatternInterface`. Each game adds files under `src/<Component>/<Game>/` (components-outer, game-suffixed). Full extension guide and conventions in [`CLAUDE.md`](CLAUDE.md).

## Game support

| Game | State |
|---|---|
| Project Zomboid | Full: 11 log subclasses across all the file types a server emits; analysers covering engine version, mod loading, server exceptions, PvP combat, admin audit, connection failures, item duplication, skill progression anomalies |
| Minecraft | Stub only — `MinecraftDetective` skeleton, no log subclasses yet |
| Hytale | Stub only |
| Seven Days To Die | Stub only |
The framework itself is generic — adding a new game means writing the same shape of files Project Zomboid demonstrates, not modifying anything in `src/{Analyser,Analysis,Detective,Log,Parser,Printer,Pattern}/` outside the new game's subdirectory.

## Developing

`composer test` runs the suite. PHP and Composer are not required on the host — invocations wrap in the official `composer:latest` Docker image (PHP 8.5). See [`CLAUDE.md`](CLAUDE.md) for the wrapped command, file layout, and the workflow conventions used in this repo.

## Source

<https://git.indifferentketchup.com/indifferentketchup/ik-codex>

## License

MIT — see [`LICENSE`](LICENSE).
74
docs/superpowers/plans/2026-04-30-pz-analysers-deferred.md
Normal file
@@ -0,0 +1,74 @@
# ProjectZomboid Phase B.3 Deferred Analysers — As-Built Plan
> Retroactive: written 2026-05-01.
This document is a historical record of how Phase B.3 (the three deferred analysers from the original Step D candidate list) was implemented. The corresponding design spec is `docs/superpowers/specs/2026-04-30-pz-analysers-deferred-design.md`. The work is complete and merged to `master`; checkboxes are pre-checked.
**Goal:** Land three custom `Analyser` subclasses under `src/Analyser/ProjectZomboid/` (the first non-empty contents of that directory), three `Problem` subclasses under `src/Analysis/ProjectZomboid/`, threshold constants documented inline as `public const`, fixture extensions to exercise trigger and non-trigger paths, and e2e tests verifying the analysers' behaviour against the fixtures.
**Architecture:** Custom subclasses of the framework's abstract `Analyser`. Each overrides `analyse()` to walk `$this->log` once, aggregate cross-entry state, and emit coalesced `Problem` insights at the end. This is the first deviation from Phase B.1/B.2's vanilla-`PatternAnalyser` pattern; the reasoning is recorded in the design spec and in `CLAUDE.md`.
|
||||
|
||||
**Tech Stack:** PHP 8.4+, PHPUnit 12, Composer (root package: `indifferentketchup/codex`). PHP/Composer not installed on host — all command invocations wrap in `docker run --rm -v "$(pwd):/app" -w /app -u "$(id -u):$(id -g)" composer:latest …`.
|
||||
|
||||
---
|
||||
|
||||
## Tasks
|
||||
|
||||
### Task 0 — Pre-checkpoint
|
||||
|
||||
- [x] Empty checkpoint commit: `c444e85 pre-phase-B.3 checkpoint`
|
||||
|
||||
### Task 1 — `ConnectionFailureAnalyser` (UserLog)
|
||||
|
||||
Pairing logic: walk the log, count `attempting to join` and `allowed to join` events per Steam ID, emit a `ConnectionFailureProblem` for any Steam ID whose attempt count exceeds its allowed count.
|
||||
|
||||
- [x] Add `src/Analysis/ProjectZomboid/ConnectionFailureProblem.php` (Steam ID, player, unmatched count; `isEqual` coalesces by Steam ID)
|
||||
- [x] Add `src/Analyser/ProjectZomboid/ConnectionFailureAnalyser.php` — first file in this directory; the `.gitkeep` placeholder is removed in this commit
|
||||
- [x] Wire `ProjectZomboidUserLog::getDefaultAnalyser()` to return `new ConnectionFailureAnalyser()` and drop the now-unused `PatternAnalyser` import
|
||||
- [x] Add `test/tests/Games/ProjectZomboid/Analyser/UserLogAnalysisTest.php` — asserts Player1 (`76561198000000001`) is flagged with `unmatchedAttempts == 1` and Player2 (`76561198000000002`) is not flagged
|
||||
- [x] `composer test` green: 188 tests, 392 assertions
|
||||
- [x] Commit: `73e9ca6 Add ConnectionFailureAnalyser`
|
||||
|
||||
Design note inside the analyser docblock: "attempting to join used queue" rows are surfaced as failures in v1 because a long queue wait is indistinguishable from a real failure without timing context. Tunable in v2 if false positives become noisy.
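The pairing logic above can be sketched as one counting pass. This is a minimal illustration rather than the shipped analyser: the function name `findUnmatchedJoinAttempts()` and the `[steamId, eventText]` tuple shape are assumptions standing in for parsed UserLog entries. Matching on the `attempting to join` prefix means queue rows count as attempts, in line with the v1 design note.

```php
<?php
// Hypothetical sketch: each event is a [steamId, eventText] pair standing in
// for a parsed UserLog entry. The real analyser walks $this->log instead.
function findUnmatchedJoinAttempts(array $events): array
{
    $attempts = [];
    $allowed = [];

    // Single pass: tally both event kinds per Steam ID.
    foreach ($events as [$steamId, $event]) {
        if (str_starts_with($event, 'attempting to join')) {
            // Also counts "attempting to join used queue" rows (v1 behaviour).
            $attempts[$steamId] = ($attempts[$steamId] ?? 0) + 1;
        } elseif (str_starts_with($event, 'allowed to join')) {
            $allowed[$steamId] = ($allowed[$steamId] ?? 0) + 1;
        }
    }

    // Flag any Steam ID whose attempts exceed its successful joins.
    $unmatched = [];
    foreach ($attempts as $steamId => $count) {
        $excess = $count - ($allowed[$steamId] ?? 0);
        if ($excess > 0) {
            $unmatched[$steamId] = $excess;
        }
    }

    return $unmatched;
}
```

A Steam ID with two attempts and one allowed join comes back with an unmatched count of 1, while a cleanly paired ID is absent from the result.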
|
||||
|
||||
### Task 2 — `ItemDuplicationAnalyser` (ItemLog)
|
||||
|
||||
Sliding-window heuristic over `(steamid, item)` groups, restricted to positive-delta events. Negative-delta rows (drops/transfers) are filtered out.
|
||||
|
||||
- [x] Add `src/Analysis/ProjectZomboid/ItemDuplicationProblem.php` (Steam ID, player, item, event count; `isEqual` coalesces by `(steamid, item)`)
|
||||
- [x] Add `src/Analyser/ProjectZomboid/ItemDuplicationAnalyser.php` with two threshold constants and rationale docblocks: `THRESHOLD_COUNT = 5`, `THRESHOLD_WINDOW_SECONDS = 10`
|
||||
- [x] Wire `ProjectZomboidItemLog::getDefaultAnalyser()` to return `new ItemDuplicationAnalyser()`; drop unused `PatternAnalyser` import
|
||||
- [x] Extend `test/src/Games/ProjectZomboid/fixtures/item-minimal.txt`: append 6 Bullets9mm events at sub-second timestamps `19:50:00.001`–`.006` for AdminUser (trigger), plus 4 Plank events scattered `20:00:00`–`20:03:00` for Player1 (sub-threshold)
|
||||
- [x] Bump entry-count assertion in `ProjectZomboidItemLogTest::testParsesEachLineAsAnEntry`: 10 → 20
|
||||
- [x] Add `test/tests/Games/ProjectZomboid/Analyser/ItemLogAnalysisTest.php` — asserts one `ItemDuplicationProblem` (AdminUser + Bullets9mm + 6 events), zero for the Plank group, and the threshold constants are positive
|
||||
- [x] `composer test` green: 191 tests, 400 assertions
|
||||
- [x] Commit: `ba3fae8 Add ItemDuplicationAnalyser`
|
||||
|
||||
Implementation note: the analyser uses a two-pointer sliding window per group, which is O(n) per group after the initial sort. `Entry::getTime()` returns integer Unix seconds (sub-second precision dropped); the burst events all collapse to the same Unix-second value so any positive window catches them.
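The two-pointer window described above can be sketched as follows for a single `(steamid, item)` group. The function names are hypothetical, and the `>=` comparison against the count threshold is an assumption: the plan gives the constant values (5 events, 10 seconds) but not the exact comparison operator.

```php
<?php
// Hypothetical sketch of a two-pointer sliding window over one (steamid, item)
// group. $timestamps are integer Unix seconds for positive-delta events only.
function maxEventsInWindow(array $timestamps, int $windowSeconds): int
{
    sort($timestamps); // ascending order, keys reindexed from 0
    $best = 0;
    $left = 0;

    foreach ($timestamps as $right => $time) {
        // Advance the left edge until the window spans at most $windowSeconds.
        while ($time - $timestamps[$left] > $windowSeconds) {
            $left++;
        }
        $best = max($best, $right - $left + 1);
    }

    return $best;
}

// Assumption: the trigger is "at least $thresholdCount events in the window".
// Defaults mirror THRESHOLD_COUNT = 5 and THRESHOLD_WINDOW_SECONDS = 10.
function looksLikeDuplication(
    array $timestamps,
    int $thresholdCount = 5,
    int $windowSeconds = 10
): bool {
    return maxEventsInWindow($timestamps, $windowSeconds) >= $thresholdCount;
}
```

Because each pointer only moves forward, the scan after sorting is linear in the group size. Six events that collapse to the same Unix second fill one window, whereas four events a minute apart never share one.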
|
||||
|
||||
### Task 3 — `SkillProgressionAnomalyAnalyser` (PerkLog)
|
||||
|
||||
Compare consecutive perks-snapshot rows per Steam ID; emit a problem for any single skill that gained more than `THRESHOLD_DELTA` levels between snapshots.
|
||||
|
||||
- [x] Add `src/Analysis/ProjectZomboid/SkillProgressionAnomalyProblem.php` (Steam ID, player, skill, fromLevel, toLevel, delta; `isEqual` coalesces by `(steamid, skill)`)
|
||||
- [x] Add `src/Analyser/ProjectZomboid/SkillProgressionAnomalyAnalyser.php` with `THRESHOLD_DELTA = 3` and a rationale docblock about PZ's slow skill leveling
|
||||
- [x] Wire `ProjectZomboidPerkLog::getDefaultAnalyser()` to return `new SkillProgressionAnomalyAnalyser()`; drop unused `PatternAnalyser` import
|
||||
- [x] Extend `test/src/Games/ProjectZomboid/fixtures/perk-minimal.txt`: append PlayerSuspect (Steam ID `76561198000000004`) with two snapshots — Strength 2→10 (+8 trigger), Fitness 2→8 (+6 trigger), Maintenance 0→3 (+3 boundary, does not trigger because comparison is strict `>`)
|
||||
- [x] Bump entry-count assertion in `ProjectZomboidPerkLogTest::testParsesEachLineAsAnEntry`: 6 → 10
|
||||
- [x] Add `test/tests/Games/ProjectZomboid/Analyser/PerkLogAnalysisTest.php` — asserts exactly two problems for PlayerSuspect (Strength + Fitness, sorted), no problem for Maintenance, no problems for single-snapshot Player1/Player2, and the threshold constant is positive
|
||||
- [x] `composer test` green: 195 tests, 412 assertions
|
||||
- [x] Commit: `0c90e40 Add SkillProgressionAnomalyAnalyser`
|
||||
|
||||
Filtering note: the analyser skips event-token rows (`Login`, `Logout`, `LevelUp`) by checking that the bracketed event field contains a `Skill=N` pair via `PerkPattern::PERK_PAIR`. Only true perks-snapshot rows enter the comparison.
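The snapshot comparison and the event-token filter can be sketched together. Everything named `SKETCH_*` or `sketch`-like here is hypothetical: the regex is a stand-in for `PerkPattern::PERK_PAIR`, whose real definition is not reproduced in this plan, and treating a skill missing from the earlier snapshot as unchanged is an assumption.

```php
<?php
// Hypothetical stand-in for PerkPattern::PERK_PAIR; the real regex may differ.
const SKETCH_PERK_PAIR = '/(\w+)=(\d+)/';
const SKETCH_THRESHOLD_DELTA = 3; // value taken from the plan

// Returns skill => level, or [] for event-token rows such as Login or LevelUp.
function parsePerkSnapshot(string $eventField): array
{
    $levels = [];
    if (preg_match_all(SKETCH_PERK_PAIR, $eventField, $matches, PREG_SET_ORDER)) {
        foreach ($matches as [, $skill, $level]) {
            $levels[$skill] = (int) $level;
        }
    }
    return $levels;
}

// Compares two consecutive snapshots for one Steam ID.
function findSkillJumps(array $previous, array $current): array
{
    $jumps = [];
    foreach ($current as $skill => $level) {
        // Assumption: a skill absent from the earlier snapshot counts as unchanged.
        $delta = $level - ($previous[$skill] ?? $level);
        if ($delta > SKETCH_THRESHOLD_DELTA) { // strict, so exactly +3 does not trigger
            $jumps[$skill] = $delta;
        }
    }
    ksort($jumps); // stable, sorted output for assertions
    return $jumps;
}
```

With the PlayerSuspect fixture values, Strength (+8) and Fitness (+6) are reported and Maintenance (+3) is not, because the comparison is strict.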
|
||||
|
||||
---
|
||||
|
||||
## Done condition (met)
|
||||
|
||||
After Task 3, `composer test` reports **195 tests, 412 assertions, all green** under PHPUnit 12.5.6 / PHP 8.5.5. All eight Step D candidate analysers (Phase B.1's three ServerLog + Phase B.2's two PvP/Admin, which between them register seven insight classes, + Phase B.3's three deferred) are operational across their respective Log subclasses.
|
||||
|
||||
The directory `src/Analyser/ProjectZomboid/` now contains real code for the first time; its `.gitkeep` placeholder was removed in `73e9ca6`.
|
||||
|
||||
## Deviations from the original plan
|
||||
|
||||
None this phase. The 4-commit count and the per-analyser shape both match what was agreed in chat before execution. No silent breakages, no missing closing braces. The only observation worth recording is that the planned commit count included the pre-checkpoint commit, and the actual commit ordering matched the plan exactly.
|
||||
116
docs/superpowers/plans/2026-04-30-pz-analysers-pvp-admin.md
Normal file
@@ -0,0 +1,116 @@
|
||||
# ProjectZomboid Phase B.2 Analysers — As-Built Plan
|
||||
|
||||
> Retroactive: written 2026-05-01.
|
||||
|
||||
This document is a historical record of how Phase B.2 (PvP combat detection + admin verb dispatch) was implemented. The corresponding design spec is `docs/superpowers/specs/2026-04-30-pz-analysers-pvp-admin-design.md`. The work is complete and merged to `master`; checkboxes are pre-checked.
|
||||
|
||||
**Goal:** Land seven new `Information` insight classes (one for PvP combat, six for admin verbs) under `src/Analysis/ProjectZomboid/`, plus seven new pattern constants on `PvpPattern` / `AdminPattern`, then wire `ProjectZomboidPvpLog` and `ProjectZomboidAdminLog` default analysers to register them.
|
||||
|
||||
**Architecture:** Vanilla `PatternAnalyser` configured with the new insight classes. No custom `Analyser` subclasses (deferred to Phase B.3). `Entry::__toString()` joins lines with `\n`, but B.2 logs are single-line per entry so multi-line behaviour doesn't apply here.
|
||||
|
||||
**Tech Stack:** PHP 8.4+, PHPUnit 12, Composer (root package: `indifferentketchup/codex`). PHP/Composer not installed on host — all command invocations wrap in `docker run --rm -v "$(pwd):/app" -w /app -u "$(id -u):$(id -g)" composer:latest …`.
|
||||
|
||||
---
|
||||
|
||||
## Tasks
|
||||
|
||||
### Task 0 — Pre-checkpoint
|
||||
|
||||
- [x] Empty checkpoint commit: `df62da1 pre-phase-B.2 checkpoint`
|
||||
|
||||
### Task 1 — `PvpDamageInformation` + `PvpPattern::COMBAT_REAL`
|
||||
|
||||
- [x] Add `PvpPattern::COMBAT_REAL` constant (combat regex with negative lookahead on weapon and positive-non-zero damage clause)
|
||||
- [x] Add `src/Analysis/ProjectZomboid/PvpDamageInformation.php`
|
||||
- [x] Add `test/tests/Games/ProjectZomboid/Analysis/PvpDamageInformationTest.php` covering pattern shape, match extraction, and three rejection cases (zombie weapon, zero damage, negative damage)
|
||||
- [x] `composer test` green: 167 tests, 343 assertions
|
||||
- [x] Commit: `55f769c Add PvpDamageInformation insight`
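The two regex techniques named in Task 1 (a negative lookahead on the weapon and a clause that only accepts positive, non-zero damage) can be illustrated on an invented line shape. This is only a sketch: the `weapon="…" damage=…` layout, the `zombie`/`vehicle` weapon tokens, and the constant name are assumptions, not the real PvP log format or the real `PvpPattern::COMBAT_REAL`.

```php
<?php
// Hypothetical line shape and regex; the real PvpPattern::COMBAT_REAL and the
// real PvP log layout are not reproduced here.
const SKETCH_COMBAT_REAL =
    '/weapon="(?!(?:zombie|vehicle)")([^"]+)" damage=(?!0+(?:\.0+)?$)(\d+(?:\.\d+)?)$/';

// Returns [weapon, damage] for a real player-versus-player hit, else null.
function matchRealCombat(string $line): ?array
{
    if (preg_match(SKETCH_COMBAT_REAL, $line, $m) !== 1) {
        return null;
    }
    return [$m[1], (float) $m[2]];
}
```

The first lookahead rejects non-player weapon tokens before the capture starts. The second rejects an all-zero number, and a leading minus sign fails because the damage capture only accepts digits.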
|
||||
|
||||
### Task 2 — `AdminAddedItemInformation` + `AdminPattern::ADDED_ITEM_ENTRY`
|
||||
|
||||
- [x] Add `AdminPattern::ADDED_ITEM_ENTRY` constant (entry-anchored variant; the body-only `ADDED_ITEM` from Phase A stays in place)
|
||||
- [x] Add `src/Analysis/ProjectZomboid/AdminAddedItemInformation.php`
|
||||
- [x] Add `test/tests/Games/ProjectZomboid/Analysis/AdminAddedItemInformationTest.php`
|
||||
- [x] Commit: `90c85a0 Add AdminAddedItemInformation insight` — **see Deviations section below**
|
||||
- [x] Forward-fix: `0d85a05 Fix missing closing brace in AdminPattern`
|
||||
- [x] `composer test` green after forward-fix: 170 tests
|
||||
|
||||
### Task 3 — `AdminAddedXpInformation` + `ADDED_XP_ENTRY`
|
||||
|
||||
- [x] Add `AdminPattern::ADDED_XP_ENTRY` constant
|
||||
- [x] Add `src/Analysis/ProjectZomboid/AdminAddedXpInformation.php`
|
||||
- [x] Unit test
|
||||
- [x] `composer test` green: 173 tests
|
||||
- [x] Commit: `a2faa55 Add AdminAddedXpInformation insight`
|
||||
|
||||
### Task 4 — `AdminGrantedAccessInformation` + `GRANTED_ACCESS_ENTRY`
|
||||
|
||||
- [x] Add `AdminPattern::GRANTED_ACCESS_ENTRY` constant
|
||||
- [x] Add `src/Analysis/ProjectZomboid/AdminGrantedAccessInformation.php`
|
||||
- [x] Unit test
|
||||
- [x] `composer test` green: 175 tests
|
||||
- [x] Commit: `caed04d Add AdminGrantedAccessInformation insight`
|
||||
|
||||
### Task 5 — `AdminChangedOptionInformation` + `CHANGED_OPTION_ENTRY`
|
||||
|
||||
- [x] Add `AdminPattern::CHANGED_OPTION_ENTRY` constant
|
||||
- [x] Add `src/Analysis/ProjectZomboid/AdminChangedOptionInformation.php`
|
||||
- [x] Unit test
|
||||
- [x] `composer test` green: 177 tests
|
||||
- [x] Commit: `b7b89ef Add AdminChangedOptionInformation insight`
|
||||
|
||||
### Task 6 — `AdminReloadedOptionsInformation` + `RELOADED_OPTIONS_ENTRY`
|
||||
|
||||
- [x] Add `AdminPattern::RELOADED_OPTIONS_ENTRY` constant
|
||||
- [x] Add `src/Analysis/ProjectZomboid/AdminReloadedOptionsInformation.php`
|
||||
- [x] Unit test
|
||||
- [x] `composer test` green: 179 tests
|
||||
- [x] Commit: `64641fa Add AdminReloadedOptionsInformation insight`
|
||||
|
||||
### Task 7 — `AdminTeleportedInformation` + `TELEPORTED_ENTRY`
|
||||
|
||||
- [x] Add `AdminPattern::TELEPORTED_ENTRY` constant (handles negative Z for basement coordinates)
|
||||
- [x] Add `src/Analysis/ProjectZomboid/AdminTeleportedInformation.php`
|
||||
- [x] Unit test (positive and negative Z cases)
|
||||
- [x] `composer test` green: 182 tests
|
||||
- [x] Commit: `d15fc81 Add AdminTeleportedInformation insight`
|
||||
|
||||
### Task 8 — Wire `ProjectZomboidPvpLog::getDefaultAnalyser()`
|
||||
|
||||
- [x] Replace `return new PatternAnalyser();` with `(new PatternAnalyser())->addPossibleInsightClass(PvpDamageInformation::class)`
|
||||
- [x] Add `test/tests/Games/ProjectZomboid/Analyser/PvpLogAnalysisTest.php` — asserts three real-PvP insights (Bare Hands, Tire Iron, Hunting Knife) and zero zombie/vehicle insights
|
||||
- [x] `composer test` green: 184 tests
|
||||
- [x] Commit: `51eb2de Wire ProjectZomboidPvpLog default analyser`
|
||||
|
||||
### Task 9 — Wire `ProjectZomboidAdminLog::getDefaultAnalyser()`
|
||||
|
||||
- [x] Register all six `Admin<Verb>Information` classes
|
||||
- [x] Add `test/tests/Games/ProjectZomboid/Analyser/AdminLogAnalysisTest.php` — asserts the 2+2+2+2+1+2 distribution and confirms the duplicate ShotgunShells row coalesces with `counter == 2`
|
||||
- [x] `composer test` green: 186 tests
|
||||
- [x] Commit: `c57d646 Wire ProjectZomboidAdminLog default analyser`
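The `counter == 2` assertion relies on equal insights being coalesced rather than appended. A minimal sketch of that behaviour follows, assuming an `isEqual`-style comparison and a per-insight counter. The `Sketch*` classes are hypothetical stand-ins, not the framework's own insight or analysis types.

```php
<?php
// Hypothetical minimal stand-ins for an insight and the collection that
// coalesces equal insights; not the framework's actual classes.
final class SketchAddedItemInsight
{
    public int $counter = 1;

    public function __construct(
        public readonly string $admin,
        public readonly string $item,
    ) {
    }

    public function isEqual(self $other): bool
    {
        return $this->admin === $other->admin && $this->item === $other->item;
    }
}

final class SketchAnalysis
{
    /** @var SketchAddedItemInsight[] */
    public array $insights = [];

    public function add(SketchAddedItemInsight $insight): void
    {
        foreach ($this->insights as $existing) {
            if ($existing->isEqual($insight)) {
                $existing->counter++; // duplicate row: bump instead of appending
                return;
            }
        }
        $this->insights[] = $insight;
    }
}
```

Adding the same admin and item twice leaves one insight with a counter of 2, which is the shape the duplicate ShotgunShells row is expected to produce.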
|
||||
|
||||
---
|
||||
|
||||
## Deviations from the original plan
|
||||
|
||||
### The `90c85a0` brace-fix interlude
|
||||
|
||||
Task 2's commit (`90c85a0 Add AdminAddedItemInformation insight`) shipped broken. While adding the first `_ENTRY` constant to `AdminPattern.php`, the `Edit` tool's `old_string` was `<TELEPORTED line>\n}` and the `new_string` included a docblock plus the new constant but **dropped the closing brace** of the class body. The commit was made before the test result was inspected, so it landed with a `ParseError: Unclosed '{'` and 9 cascading test errors.
|
||||
|
||||
Forward-fix `0d85a05 Fix missing closing brace in AdminPattern` restored the brace as a separate commit (per the `CLAUDE.md` workflow rule: "Always create new commits rather than amending"). The broken intermediate commit remains in history; force-pushing master to clean it would have cost more than the cosmetic gain.
|
||||
|
||||
The remaining five admin commits (Tasks 3–7) used a deliberate practice change: every subsequent `Edit` to `AdminPattern.php` included the closing `}` in both `old_string` and `new_string` so it couldn't be dropped again. No further breakage.
|
||||
|
||||
### Total commit count
|
||||
|
||||
11 commits vs the 10 originally outlined in the spec's planning section. The extra commit is the brace-fix.
|
||||
|
||||
### Test-count divergence note (now resolved)
|
||||
|
||||
When Phase B.1's plan was written I projected a final count of 158 tests for B.1; the actual landed count was 161 (off by 3 — Task 5's contribution wasn't summed in the plan footer). For B.2 the planned and actual per-step counts match exactly. No projection error this phase.
|
||||
|
||||
---
|
||||
|
||||
## Done condition (met)
|
||||
|
||||
After Task 9, `composer test` reports **186 tests, 387 assertions, all green** under PHPUnit 12.5.6 / PHP 8.5.5 (verified via the `composer:latest` Docker image). All five originally-planned analysers from the Step D Phase B scope (B.1's three plus B.2's two) are now operational on their respective Log subclasses.
|
||||
211
docs/superpowers/plans/2026-05-01-redactor.md
Normal file
@@ -0,0 +1,211 @@
|
||||
# Redactor Utility Implementation Plan
|
||||
|
||||
> Forward-looking. No code is written by this document.
|
||||
> Branch: `redactor` (off master `aec835e`). Backup tag: `backup/pre-redactor`.
|
||||
> Spec: `docs/superpowers/specs/2026-04-30-redactor-design.md`.
|
||||
|
||||
**Goal:** Land the `RedactorInterface` plus a concrete `ProjectZomboidRedactor` implementation so iblogs (and any other downstream consumer) can scrub Project Zomboid log content of Steam IDs, player names, and world coordinates with a single call. The Redactor is a render-time filter on raw string content; raw stays canonical at the storage layer.
|
||||
|
||||
**Architecture:** Standalone string-in/string-out utility under a new top-level `src/Util/` directory, with per-game implementations under `src/Util/<Game>/`. Each implementation owns the lexical regex anchors for its game's PII shapes. Three independent toggles per implementation (`redactSteamIds`, `redactPlayerNames`, `redactCoordinates`); defaults all on; "all toggles off" yields verbatim passthrough.
|
||||
|
||||
**Tech stack:** PHP 8.4+, PHPUnit 12, Composer (`indifferentketchup/codex` v0.1.0+). All command invocations wrap in the `composer:latest` Docker image per `CLAUDE.md`.
|
||||
|
||||
---
|
||||
|
||||
## Design questions — resolved
|
||||
|
||||
### a. Render-time vs ingest-time
|
||||
|
||||
**Decision: render-time. This confirms the spec's lean.**
|
||||
|
||||
Raw log content is canonical. Redaction is a view filter that consumers apply when they want to display, export, or analyse a redacted projection. iblogs's storage layer holds the unredacted upload (subject to iblogs's own upload-time `Filter` chain for IPs/access-tokens, which is a different layer of defence); the codex Redactor runs on the way *out* of storage, not on the way in.
|
||||
|
||||
**Why:** the alternative (ingest-time, where storage holds redacted content) is destructive — once stored, the original cannot be recovered for legitimate operator use. Render-time leaves the original in place and lets each render path opt in. iblogs gets a per-session toggle without needing to keep two copies of every paste.
|
||||
|
||||
**Implication for iblogs schema:** iblogs stores raw content; the redaction toggle in the iblogs UI invokes `ProjectZomboidRedactor::redact()` at render time (server-side) or at fetch time (API consumers' choice). No schema migration required for the redaction feature.
|
||||
|
||||
### b. Redactor as standalone class vs Printer decorator
|
||||
|
||||
**Decision: standalone utility (option iii from the question).**
|
||||
|
||||
The Redactor is a `string → string` function. It does not know about `Insight`, `Printer`, or any other codex type. Three options were considered:
|
||||
|
||||
- **(i) Printer wrapper.** Cleanly composable but ties the Redactor to the Printer abstraction. Doesn't help iblogs's most common case: redacting raw log content for display in a non-Printer rendering path (HTML page rendered server-side, raw download served to API client).
|
||||
- **(ii) Pre-Printer pass on Insights.** Heavy. Insights are typed objects with structured fields; redacting them means per-Insight code that knows which fields are PII-bearing. Against the YAGNI line for v1.
|
||||
- **(iii) Standalone string utility.** Simple, generic, works on any string input — raw log content, JSON-serialised analysis output, rendered Printer output piped through. Doesn't know about Insights.
|
||||
|
||||
The spec describes (iii). v1 ships (iii) only. If a Printer-wrapper convenience is later wanted, it can be added as a thin adapter that calls the standalone Redactor on the Printer's output; it doesn't require restructuring the core.
|
||||
|
||||
### c. PII field taxonomy for PZ
|
||||
|
||||
**Decision: regex-based with lexical context anchors. No structured-field detection in v1.**
|
||||
|
||||
PZ-specific PII categories observed in the in-tree fixtures and the `.scratch/pz/Logs/` reference corpus:
|
||||
|
||||
| Field | Detection | Rationale |
|
||||
|---|---|---|
|
||||
| Steam ID | regex with `76561198\d{9}` prefix anchor and word-boundary classes | Steam's `76561198` SteamID64 universe prefix lets us cleanly distinguish from other long numbers (timestamps, build numbers). |
|
||||
| Player name | regex with multi-context lexical anchors (after-Steam-ID-quoted, ChatMessage author, `Combat:`/`Safety:` subsystem) | Names are arbitrary strings — not detectable without context. The contexts are well-defined by the parser-side pattern classes. |
|
||||
| World coordinate triple | regex with bracket / paren / `at`-clause anchors | Generic `\d+,\d+,\d+` would over-redact server metadata (`f:0, t:NNNN, st:48,648,157,584`). Lexical context disambiguates. |
|
||||
|
||||
**Not redacted in v1:**
|
||||
|
||||
- **IP addresses.** PZ logs do not normally include IPs in any of the eleven file types observed. iblogs's upload-side `IPv4Filter` / `IPv6Filter` (ported from upstream mclogs) covers the rare case where a mod might log them.
|
||||
- **Server-side usernames distinct from player names.** PZ uses Steam display name as the player identity; there's no separate auth username layer. Mclogs's `UsernameFilter` is Minecraft-specific and isn't mirrored here.
|
||||
- **BurdJournals scientific-notation Steam IDs** (`7.65611…E16`). Spec open-question 2 explicitly defers this to v2; the `[BurdJournals]` tag already disambiguates them as mod-internal.
|
||||
|
||||
**Hybrid (regex + structured-field) deferred.** A v2 enhancement could redact specific Insight fields at JSON-serialisation time (e.g. `ConnectionFailureProblem::$steamId` → placeholder when serialised). Useful only if iblogs starts shipping the structured analysis JSON to redacted views — a real but currently hypothetical need.
|
||||
|
||||
### d. Replacement strategy
|
||||
|
||||
**Decision: per-category placeholder strings matching the synthetic-fixture conventions. Configurable replacement style is YAGNI for v1.**
|
||||
|
||||
Per the spec:
|
||||
|
||||
| Category | Replacement |
|
||||
|---|---|
|
||||
| Steam ID | `76561198000000000` (zeroed placeholder, still a syntactically valid Steam ID) |
|
||||
| Player name | `<player>` |
|
||||
| Coordinates | `0,0,0` (with shape preserved per anchor — bracketed, parenthesised, or `at` clause) |
|
||||
|
||||
Why these specifically and not `[REDACTED]` / `[STEAM_ID]` / hashed:
|
||||
|
||||
- The placeholders **match the existing synthetic test fixtures** (`76561198000000001`–`76561198000000004` collapse to `76561198000000000`; player names `Player1`/`Player2`/`AdminUser` collapse to `<player>`). Tests can verify "redacted output looks like a synthetic fixture."
|
||||
- Shape preservation means downstream consumers can still parse the redacted output with the same Pattern classes — a redacted log is still a syntactically valid PZ log, it just contains no identities.
|
||||
- Type-tagged replacements (`[STEAM_ID]`) break shape preservation: a Pattern looking for `\d{17}` would fail. Worth offering as a config option if a consumer specifically wants type-visibility, but v1 ships placeholder-only.
|
||||
- Hashing breaks shape preservation similarly and adds determinism / collision concerns.
|
||||
|
||||
If a consumer later needs `[STEAM_ID]`-style output, a `setReplacementStyle('typed' | 'placeholder' | 'redacted')` setter can be added without breaking the v1 API. v1 ships placeholder-only.
|
||||
|
||||
### e. Game-agnostic vs PZ-specific layout
|
||||
|
||||
**Decision: thin generic interface in `src/Util/` plus PZ-specific implementation in `src/Util/ProjectZomboid/`.**
|
||||
|
||||
```
|
||||
src/Util/
|
||||
├── RedactorInterface.php (1 method: redact(string): string)
|
||||
└── ProjectZomboid/
|
||||
└── ProjectZomboidRedactor.php (toggles + regex passes)
|
||||
```
|
||||
|
||||
**YAGNI tradeoff stated:** the interface has one method and currently one implementation. Strictly, YAGNI says collapse to just `ProjectZomboidRedactor` and skip the interface. The interface earns its keep because **iblogs's call sites will type-hint against `RedactorInterface`**, not the concrete class — that's the architectural payoff. Consumer code stays loosely coupled; when Minecraft or another game ships a redactor, iblogs swaps the implementation by changing one DI binding rather than touching call sites.
|
||||
|
||||
The cost is two files instead of one. Acceptable given the dependency-inversion benefit. The directory layout (`src/Util/<Game>/`) mirrors the components-outer-with-game-suffix convention used everywhere else in the tree (Analyser, Analysis, Detective, Log, Parser, Pattern).
|
||||
|
||||
**Note on the new `src/Util/` directory.** Codex currently has no `src/Util/` (the Phase A scaffolding established Analyser / Analysis / Detective / Log / Parser / Pattern / Printer; Phase B.3 added Analyser/ProjectZomboid content but not Util). The Redactor introduces this new top-level. This is an additive change — no existing code is modified.
|
||||
|
||||
### f. Test strategy
|
||||
|
||||
**Decision: hybrid — small dedicated synthetic fixtures under `test/src/Util/Redactor/` for direct unit tests, plus an integration test that runs the Redactor over an existing PZ fixture and asserts idempotence.**
|
||||
|
||||
**Dedicated unit fixtures** (small string constants in test classes, not separate files): per spec test plan #1–#5. Each test class owns its input/expected pairs. Keeps unit tests self-contained and fast.
|
||||
|
||||
**Integration test** that re-uses an existing PZ fixture (e.g. `test/src/Games/ProjectZomboid/fixtures/admin-minimal.txt`). Two assertions:
|
||||
|
||||
- The Redactor's output is a syntactically valid log (still parses cleanly through the corresponding `ProjectZomboidAdminLog`).
|
||||
- Idempotence: `redact(redact($x)) === redact($x)`. Existing fixture content is already placeholder-shaped, so the redactor should leave it byte-for-byte identical OR apply the canonical normalisation once and then no-op.
|
||||
|
||||
**False-positive avoidance.** The synthetic fixtures use `76561198000000001` etc. as placeholder Steam IDs. The Redactor's Steam ID regex matches the `76561198\d{9}` prefix and replaces with `76561198000000000` — so `76561198000000001` becomes `76561198000000000` (a normalisation, not a corruption). Tests verify this normalisation is correct and that legitimate-non-PII data (e.g. server metadata triples like `f:0, t:1776297642406, st:48,648,157,584`) is **not** touched.
|
||||
|
||||
---
|
||||
|
||||
## Tasks
|
||||
|
||||
Tasks are intended for the `redactor` branch. Each is a single logical commit. Test-running between commits uses the standard Docker invocation. Work proceeds only after Step 0 sign-off (this plan reviewed).
|
||||
|
||||
### Task 0 — Plan doc commit
|
||||
|
||||
- [ ] **Step 0.1.** Already done out-of-band: `git checkout -b redactor` off master `aec835e`; `git tag backup/pre-redactor` at branch tip; this plan written.
|
||||
- [ ] **Step 0.2.** Commit this plan: `docs: add Redactor implementation plan` on branch `redactor`. Push branch to origin for review.
|
||||
|
||||
### Task 1 — Scaffold (interface + skeleton class with toggles)
|
||||
|
||||
- [ ] **Step 1.1.** Create `src/Util/RedactorInterface.php`. Single method: `public function redact(string $content): string;` PHPDoc describing the contract: stateless from the caller's perspective; configuration happens via implementation-specific setters before `redact()`.
|
||||
- [ ] **Step 1.2.** Create `src/Util/ProjectZomboid/ProjectZomboidRedactor.php` that implements the interface. Class structure: three private bool properties (`$redactSteamIds`, `$redactPlayerNames`, `$redactCoordinates`) all defaulting to `true`; three fluent setters (`redactSteamIds(bool): static`, etc.); `redact(string): string` body that returns input unchanged when all toggles are off (for now — regex passes added in subsequent tasks).
|
||||
- [ ] **Step 1.3.** Run `composer test` — expect 195 tests still green (no Redactor tests yet).
|
||||
- [ ] **Step 1.4.** Commit: `feat: scaffold RedactorInterface and ProjectZomboidRedactor with toggles`.
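The scaffold described in Task 1 could look roughly like the sketch below. The `Sketch*` names are hypothetical so that nothing here is mistaken for the shipped `RedactorInterface` or `ProjectZomboidRedactor`, and the regex passes are deliberately left out, as they are at this stage of the plan.

```php
<?php
// Hypothetical names (Sketch*) to avoid implying this is the shipped code.
interface SketchRedactorInterface
{
    public function redact(string $content): string;
}

final class SketchRedactor implements SketchRedactorInterface
{
    // All three toggles default to on.
    private bool $redactSteamIds = true;
    private bool $redactPlayerNames = true;
    private bool $redactCoordinates = true;

    public function redactSteamIds(bool $enabled): static
    {
        $this->redactSteamIds = $enabled;
        return $this; // fluent: allows chained configuration
    }

    public function redactPlayerNames(bool $enabled): static
    {
        $this->redactPlayerNames = $enabled;
        return $this;
    }

    public function redactCoordinates(bool $enabled): static
    {
        $this->redactCoordinates = $enabled;
        return $this;
    }

    public function redact(string $content): string
    {
        if (!$this->redactSteamIds && !$this->redactPlayerNames && !$this->redactCoordinates) {
            return $content; // all toggles off: verbatim passthrough
        }
        // Regex passes are added per category in later tasks.
        return $content;
    }
}
```

A consumer that type-hints `SketchRedactorInterface` only sees `redact()`, so the toggles stay an implementation detail configured before the object is handed over.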
|
||||
|
||||
### Task 2 — Steam ID redaction pass
|
||||
|
||||
- [ ] **Step 2.1.** Add `STEAM_ID_REGEX` and `STEAM_ID_REPLACEMENT` constants on `ProjectZomboidRedactor`. Regex uses the `76561198\d{9}` prefix anchor with word-boundary classes (per spec). The `/u` flag is added to all regexes for Unicode safety even though Steam IDs themselves are ASCII.
|
||||
- [ ] **Step 2.2.** Implement the Steam ID branch of `redact()`: when `$redactSteamIds` is true, run `preg_replace` against the input.
|
||||
- [ ] **Step 2.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorSteamIdTest.php`. Tests: redaction of various distinct synthetic Steam IDs collapses all to `76561198000000000`; non-Steam-ID 17-digit numbers (e.g. timestamps) are not touched; toggle-off leaves Steam IDs intact.
|
||||
- [ ] **Step 2.4.** Run `composer test`. Expect new tests pass; old 195 unaffected.
|
||||
- [ ] **Step 2.5.** Commit: `feat: add Steam ID redaction pass`.
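A sketch of the Steam ID pass, assuming digit lookarounds as the "word-boundary classes" the spec refers to. The constant and function names are hypothetical and the final regex may differ. The prefix is taken from the plan as written, so any ID whose leading digits differ from `76561198` would pass through this sketch unchanged.

```php
<?php
// Assumed regex: the plan names the 76561198 prefix and boundary classes but
// does not spell out the final pattern.
const SKETCH_STEAM_ID_REGEX = '/(?<!\d)76561198\d{9}(?!\d)/u';
const SKETCH_STEAM_ID_REPLACEMENT = '76561198000000000';

function sketchRedactSteamIds(string $content): string
{
    // preg_replace() returns null on error; fall back to the input unchanged.
    return preg_replace(SKETCH_STEAM_ID_REGEX, SKETCH_STEAM_ID_REPLACEMENT, $content)
        ?? $content;
}
```

The lookarounds stop the pattern from matching inside a longer digit run, so an 18-digit number that happens to start with the prefix is left alone.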
|
||||
|
||||
### Task 3 — Player name redaction pass
|
||||
|
||||
- [ ] **Step 3.1.** Add three regex constants on `ProjectZomboidRedactor` for the three player-name lexical contexts: `PLAYER_AFTER_STEAMID_REGEX`, `PLAYER_IN_CHATMESSAGE_REGEX`, `PLAYER_IN_PVP_SUBSYSTEM_REGEX`. Replacement is `<player>` for all. **Order constraint:** the after-Steam-ID context anchors on the post-redaction Steam ID `76561198000000000`, so the player-name pass must run *after* the Steam ID pass. Document this in a class-level docblock.
|
||||
- [ ] **Step 3.2.** Implement the player-name branch of `redact()`: three sequential `preg_replace` calls when `$redactPlayerNames` is true.
|
||||
- [ ] **Step 3.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorPlayerNameTest.php`. Tests: each of the three contexts redacts correctly when paired with its anchor; a bare quoted string (e.g. `"foo"` not preceded by a Steam ID) is **not** touched; toggle-off leaves names intact; the after-Steam-ID context works correctly when the Steam ID has already been redacted to the zeroed placeholder.
|
||||
- [ ] **Step 3.4.** Run `composer test`. Expect new tests pass.
|
||||
- [ ] **Step 3.5.** Commit: `feat: add player name redaction pass`.
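The order constraint in Step 3.1 is easiest to see with the two passes side by side. This sketch covers only the after-Steam-ID context. The function names are hypothetical, and keeping the surrounding quotes around `<player>` is an assumption made for shape preservation.

```php
<?php
const SKETCH_ZEROED_STEAM_ID = '76561198000000000';

// Assumed Steam ID pass: collapses any matching ID to the zeroed placeholder.
function sketchRedactIds(string $content): string
{
    return preg_replace('/(?<!\d)76561198\d{9}(?!\d)/u', SKETCH_ZEROED_STEAM_ID, $content)
        ?? $content;
}

// Anchors on the already-zeroed Steam ID, so it must run after sketchRedactIds().
function sketchRedactNamesAfterSteamId(string $content): string
{
    $pattern = '/(' . SKETCH_ZEROED_STEAM_ID . ' )"[^"]*"/u';
    return preg_replace($pattern, '$1"<player>"', $content) ?? $content;
}
```

Running the name pass first finds no zeroed anchor and leaves the name in place, which is why the ordering has to be enforced inside `redact()` rather than left to callers.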
|
||||
|
||||
### Task 4 — Coordinates redaction pass
|
||||
|
||||
- [ ] **Step 4.1.** Add three regex constants on `ProjectZomboidRedactor` for the three coordinate contexts: `COORDS_AT_CLAUSE_REGEX`, `COORDS_BRACKETED_REGEX`, `COORDS_PARENTHESISED_REGEX`. Replacements preserve shape (`0,0,0` inside whatever bracket/paren wrapper).
|
||||
- [ ] **Step 4.2.** Implement the coords branch of `redact()`: three sequential `preg_replace_callback` (or `preg_replace`) calls when `$redactCoordinates` is true.
|
||||
- [ ] **Step 4.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorCoordinatesTest.php`. Tests: each of the three contexts redacts correctly; **negative test** — server metadata `f:0, t:1776297642406, st:48,648,157,584` is not touched; basement Z-coordinates (`-1`) are handled; toggle-off leaves coords intact.
|
||||
- [ ] **Step 4.4.** Run `composer test`. Expect new tests pass.
|
||||
- [ ] **Step 4.5.** Commit: `feat: add coordinates redaction pass`.
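A sketch of the three coordinate contexts, with each replacement keeping its wrapper. The patterns and the function name are assumptions: the plan names the contexts but not the final regexes. The trailing lookahead on the `at` clause is an illustrative guard against matching the first three groups of a longer comma-separated run.

```php
<?php
// Assumed patterns for the three lexical contexts; each keeps its wrapper.
function sketchRedactCoordinates(string $content): string
{
    $triple = '-?\d+,-?\d+,-?\d+'; // negative values cover basement Z levels
    $passes = [
        '/\bat ' . $triple . '(?![,\d])/u' => 'at 0,0,0',
        '/\[' . $triple . '\]/u'           => '[0,0,0]',
        '/\(' . $triple . '\)/u'           => '(0,0,0)',
    ];

    foreach ($passes as $pattern => $replacement) {
        // preg_replace() returns null on error; keep the current content then.
        $content = preg_replace($pattern, $replacement, $content) ?? $content;
    }

    return $content;
}
```

Server metadata such as `st:48,648,157,584` has no bracket, parenthesis, or `at` anchor, so none of the three passes touches it.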
|
||||
|
||||
### Task 5 — Combined / toggle / idempotence tests
|
||||
|
||||
- [ ] **Step 5.1.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorCombinedTest.php`. Tests cover: combined input with all three PII categories present produces fully-scrubbed output when all toggles on; each toggle off in isolation produces partial scrubbing matching the toggle's category; all toggles off returns input byte-for-byte identical (`===` equality).
|
||||
- [ ] **Step 5.2.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorIdempotenceTest.php`. Tests: `redact(redact($x)) === redact($x)` for several input shapes including all three PII categories.
|
||||
- [ ] **Step 5.3.** Run `composer test`. Expect new tests pass.
|
||||
- [ ] **Step 5.4.** Commit: `test: add Redactor combined and idempotence coverage`.
|
||||
|
||||
### Task 6 — Existing-fixture integration tests
|
||||
|
||||
- [ ] **Step 6.1.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorIntegrationTest.php`. Loads each existing PZ fixture (`admin-minimal.txt`, `chat-minimal.txt`, etc.) via `PathLogFile`, calls `redact()` on the content, and asserts: (a) the redacted content still parses cleanly through the corresponding `ProjectZomboid<X>Log`'s parser without throwing; (b) the synthetic Steam IDs `76561198000000001`–`76561198000000004` all collapse to `76561198000000000`; (c) the synthetic player names (`Player1`, `Player2`, `AdminUser`, `PlayerSuspect`) all collapse to `<player>`.
|
||||
- [ ] **Step 6.2.** Run `composer test`. Expect all integration assertions pass without modifying any existing test or fixture.
|
||||
- [ ] **Step 6.3.** Commit: `test: add Redactor integration coverage against existing PZ fixtures`.
|
||||
|
||||
### Task 7 — Documentation updates
|
||||
|
||||
- [ ] **Step 7.1.** Update `CLAUDE.md`: add a one-line `src/Util/` mention to the framework architecture section; one-line note in the ProjectZomboid specifics section pointing at `ProjectZomboidRedactor` for downstream PII scrubbing; update the "Scaffolded games" line to mention that `ProjectZomboid` now also has a Redactor implementation under `src/Util/ProjectZomboid/`.
|
||||
- [ ] **Step 7.2.** Update `README.md`: add a short usage block showing `(new ProjectZomboidRedactor())->redact($logContent)` as a render-time scrub option, alongside the existing worked example.
|
||||
- [ ] **Step 7.3.** Update `CHANGELOG.md`: move Redactor out of the **Deferred** section under `[0.1.0]`, OR add a new `[Unreleased]` section if the v0.1.0 line should remain accurate as-shipped. Decision: **add `[Unreleased]`** — v0.1.0 was tagged without the Redactor and the changelog should reflect the historical truth.
|
||||
- [ ] **Step 7.4.** Run `composer test` once more for safety; confirm the 195 existing tests plus the new redactor tests are green.
|
||||
- [ ] **Step 7.5.** Commit: `docs: document Redactor utility in CLAUDE.md, README, CHANGELOG`.
|
||||
|
||||
### Task 8 — Final verification
|
||||
|
||||
- [ ] **Step 8.1.** Run `composer test`. All tests green.
|
||||
- [ ] **Step 8.2.** Re-run `vendor/bin/phpunit --display-deprecations --display-warnings --display-notices --display-errors`. Expect zero output beyond the standard pass summary.
|
||||
- [ ] **Step 8.3.** Sanity-check the branch with `git log --oneline master..redactor`. Should be the plan-doc commit plus 7 implementation commits = 8 commits total.
|
||||
- [ ] **Step 8.4.** Push final state: `git push origin redactor`. **Do NOT merge to master.** User reviews diff and approves merge separately.
|
||||
|
||||
---
|
||||
|
||||
## Open questions / spec gaps
|
||||
|
||||
The spec is generally tight. Items worth flagging while implementing:
|
||||
|
||||
1. **`/u` flag for Unicode safety.** Spec doesn't specify regex flags. PZ player names can contain non-ASCII characters (Steam display names are Unicode-permissive). The implementation will use `/u` on all regexes to avoid mangling multi-byte sequences. This choice will be documented in the class docblock.
|
||||
2. **Replacement order.** Spec says "Redaction order matters: SIDs first, names second" because the after-Steam-ID player-name regex anchors on the redacted Steam ID. The implementation will enforce this order in `redact()` (Steam ID pass first, then names, then coords). The class docblock will document the ordering invariant.
|
||||
3. **HTML / JSON-encoded input.** Spec assumes plain log text. If a consumer feeds HTML-escaped content (e.g. `&quot;` instead of `"`), the player-name regex won't match. Document as a v2 concern: callers feed plain text in, render afterwards. v1 does not implement HTML/JSON-aware mode.
|
||||
4. **Future PII categories.** v1 ships exactly the three toggles per spec. New categories (emails, IPs from mods, etc.) extend the toggle set in a future release; v1 does not pre-build extension points beyond what the interface already provides.
|
||||
5. **`src/Util/` is a new top-level directory** in this codebase. The Redactor is the first occupant. Future utilities (e.g. a tokenizing variant per spec open-question 1) would also live here. No existing-code modification is needed; the new directory is purely additive.
|
||||
6. **The empty `src/Printer/<Game>/.gitkeep` situation.** Phase A scaffolding chose not to create `Printer/<Game>/` directories at all (only Analyser/Detective/Log/Parser/Pattern got per-game subdirs). The Redactor's home in `src/Util/<Game>/` mirrors that — `src/Util/` is created with PZ as its first occupant; no stub `Hytale/`/`Minecraft/`/`SevenDaysToDie/` placeholders are scaffolded. When other games' redactors land, they create their own subdirectories at that point.
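Item 1's `/u` concern can be illustrated in isolation. The following is a minimal sketch, not taken from the implementation: the player name `Zoé` is a hypothetical example, and the name pattern is only modelled on the spec's after-Steam-ID shape. It shows that without `/u` PCRE treats the subject as bytes, so a single-character pattern fails against a two-byte UTF-8 character.

```php
<?php
declare(strict_types=1);

// "é" is one code point but two bytes in UTF-8.
$char = "\u{00E9}";

// Without /u the dot matches a single byte, so the anchored pattern cannot match.
$byteMode = preg_match('/^.$/', $char);

// With /u the dot matches a whole code point.
$unicodeMode = preg_match('/^.$/u', $char);

// Hypothetical name pass run with /u: the multi-byte name is replaced as a whole.
$line = '76561198000000000 "Zo' . $char . '"';
$redacted = preg_replace('/(76561198000000000) "[^"]+"/u', '$1 "<player>"', $line);

var_dump($byteMode, $unicodeMode, $redacted);
```

A side effect worth knowing: with `/u`, `preg_replace()` returns `null` when the subject is not valid UTF-8, so an implementation may want to decide explicitly how to treat that case.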
|
||||
|
||||
No spec contradictions were found, and no existing code needs modification (the design is additive-only).
|
||||
|
||||
---
|
||||
|
||||
## Branch / commit invariants
|
||||
|
||||
- All commits land on the `redactor` branch.
|
||||
- Master is not touched until the user explicitly approves merge after reviewing the diff.
|
||||
- Conventional commit prefixes: `docs:`, `feat:`, `test:`, `refactor:`. (No `fix:` expected — this is greenfield work.)
|
||||
- One logical concept per commit. Tasks 1, 2, 3, 4 each ship implementation + per-pass tests in one commit; Task 5 / 6 / 7 are pure-test or pure-docs commits.
|
||||
- Backup tag `backup/pre-redactor` at `aec835e` lets us discard the branch and recover if the implementation goes sideways.
|
||||
- Branch can be pushed to origin freely for visibility / review checkpoints.
|
||||
|
||||
## Pointers
|
||||
|
||||
- Spec: `docs/superpowers/specs/2026-04-30-redactor-design.md`.
|
||||
- Synthetic fixtures the integration test will reuse: `test/src/Games/ProjectZomboid/fixtures/*.txt`.
|
||||
- Existing per-game layout precedent: `src/Analyser/ProjectZomboid/`, `src/Pattern/ProjectZomboid/`, `src/Log/ProjectZomboid/`.
|
||||
- Workflow conventions and pitfalls: `CLAUDE.md`.
|
||||
@@ -0,0 +1,117 @@
|
||||
# ProjectZomboid analyser design (Phase B.3 — deferred analysers)
|
||||
|
||||
> Retroactive: written 2026-05-01.
|
||||
|
||||
## Summary
|
||||
|
||||
Add the three remaining Project Zomboid analysers from the original Step D candidate list — connection failure pairing, item duplication heuristic, and skill progression anomaly detection — by introducing custom `Analyser` subclasses under `src/Analyser/ProjectZomboid/`. These are the first analysers in the tree that cannot be expressed as configured `PatternAnalyser` instances; they require cross-entry state (event pairing, sliding windows, snapshot deltas) that `PatternAnalyser` does not provide.
|
||||
|
||||
This document covers Phase B.3. Phase B.1 / B.2 docs are at `2026-04-30-pz-analysers-design.md` / `2026-04-30-pz-analysers-pvp-admin-design.md`. With Phase B.3, the original eight-analyser candidate list from Step D is fully implemented.
|
||||
|
||||
## Scope
|
||||
|
||||
- **In scope:** `ConnectionFailureAnalyser` + `ConnectionFailureProblem` (UserLog, event pairing); `ItemDuplicationAnalyser` + `ItemDuplicationProblem` (ItemLog, sliding-window heuristic); `SkillProgressionAnomalyAnalyser` + `SkillProgressionAnomalyProblem` (PerkLog, consecutive-snapshot delta); wiring three Log subclasses' `getDefaultAnalyser()`; extending two synthetic fixtures to exercise trigger and non-trigger cases; end-to-end tests.
|
||||
- **Out of scope (B.3):** the five other PZ logs whose `getDefaultAnalyser()` continues returning an empty `PatternAnalyser` stub (Chat, ClientAction, Cmd, Map, BurdJournals); the codex-side `Redactor` utility; Hytale / Minecraft / Seven Days To Die analysers; v0.1.0 release plumbing.
|
||||
|
||||
## Architectural shift: custom `Analyser` subclasses
|
||||
|
||||
Phases B.1 and B.2 established the convention that vanilla `PatternAnalyser` plus `Insight::isEqual()` coalescing is sufficient for per-entry pattern matching, and a custom Analyser subclass is **not** needed even for multi-line records (PatternParser's continuation-line behaviour combined with `Entry::__toString()` joins solves multi-line capture without subclassing).
|
||||
|
||||
Phase B.3's three analysers genuinely require cross-entry state:
|
||||
|
||||
- **ConnectionFailureAnalyser** must count `attempting to join` and `allowed to join` events per Steam ID and report unmatched attempts. PatternAnalyser dispatches each entry independently and has no mechanism to compare counts across entries.
|
||||
- **ItemDuplicationAnalyser** must group positive-delta item events by `(steamid, item)` tuple and slide a fixed-second window across each group. Sliding-window logic spans multiple entries by definition.
|
||||
- **SkillProgressionAnomalyAnalyser** must collect all perks-row snapshots per Steam ID, sort them by time, then compute pairwise deltas between consecutive snapshots. Pairwise comparison spans entries.
|
||||
|
||||
Each subclass extends the framework's abstract `Analyser`, overrides `analyse(): AnalysisInterface`, walks `$this->log` once to aggregate state, and emits `Problem` insights at the end. The CLAUDE.md "Framework architecture" section was updated alongside Phase B.3 to document this pattern.
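The cross-entry shape can be sketched without the framework. The example below is an illustration only: `unmatchedJoinAttempts()` is a hypothetical standalone function, the line regex is an assumed approximation of the `[time] <steamid> "<player>" <event>` shape rather than `UserPattern::PLAYER_EVENT` itself, and the sample lines are synthetic. It aggregates counts while walking the lines and only decides what to report at the end, which is the part a per-entry dispatcher cannot do.

```php
<?php
declare(strict_types=1);

/**
 * Count join attempts that have no matching "allowed to join" event.
 *
 * @param list<string> $lines
 * @return array<int|string,int> Steam ID => number of unmatched attempts
 */
function unmatchedJoinAttempts(array $lines): array
{
    $attempts = [];
    $allowed = [];

    foreach ($lines as $line) {
        // Assumed line shape: [time] <steamid> "<player>" <event>
        if (!preg_match('/^\[[^\]]+\] (?<sid>\d{17}) "[^"]+" (?<event>.+)$/u', $line, $m)) {
            continue;
        }
        if (str_starts_with($m['event'], 'attempting to join')) {
            $attempts[$m['sid']] = ($attempts[$m['sid']] ?? 0) + 1;
        } elseif (str_starts_with($m['event'], 'allowed to join')) {
            $allowed[$m['sid']] = ($allowed[$m['sid']] ?? 0) + 1;
        }
    }

    // Report only after the whole log has been seen.
    $unmatched = [];
    foreach ($attempts as $sid => $count) {
        $diff = $count - ($allowed[$sid] ?? 0);
        if ($diff > 0) {
            $unmatched[$sid] = $diff;
        }
    }
    return $unmatched;
}

$lines = [
    '[01-01-26 18:00:00.000] 76561198000000001 "Player1" attempting to join.',
    '[01-01-26 18:00:05.000] 76561198000000001 "Player1" attempting to join used queue.',
    '[01-01-26 18:00:06.000] 76561198000000001 "Player1" allowed to join.',
    '[01-01-26 18:01:00.000] 76561198000000002 "Player2" attempting to join.',
    '[01-01-26 18:01:01.000] 76561198000000002 "Player2" allowed to join.',
];
$result = unmatchedJoinAttempts($lines);
```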
|
||||
|
||||
## Components
|
||||
|
||||
Three `Analyser` subclasses under `src/Analyser/ProjectZomboid/` (the directory's `.gitkeep` placeholder is removed in this phase):
|
||||
|
||||
| Analyser | Target Log | Logic shape | Threshold constants |
|
||||
|---|---|---|---|
|
||||
| `ConnectionFailureAnalyser` | `ProjectZomboidUserLog` | Two-pass count of attempt vs allowed events per Steam ID; emits one Problem per Steam ID where attempts > allowed | None — strict pairing |
|
||||
| `ItemDuplicationAnalyser` | `ProjectZomboidItemLog` | Sliding-window heuristic over `(steamid, item)` groups | `THRESHOLD_COUNT = 5`, `THRESHOLD_WINDOW_SECONDS = 10` |
|
||||
| `SkillProgressionAnomalyAnalyser` | `ProjectZomboidPerkLog` | Consecutive-snapshot delta per `(steamid, skill)`; only positive-delta perks-row entries (Login/Logout/LevelUp event tokens are filtered out) | `THRESHOLD_DELTA = 3` |
|
||||
|
||||
Three `Problem` subclasses under `src/Analysis/ProjectZomboid/`:
|
||||
|
||||
| Problem | Coalescing |
|
||||
|---|---|
|
||||
| `ConnectionFailureProblem` | By Steam ID — one problem per player regardless of how many unmatched attempts |
|
||||
| `ItemDuplicationProblem` | By `(steamid, item)` tuple — one problem per suspicious group |
|
||||
| `SkillProgressionAnomalyProblem` | By `(steamid, skill)` — one problem per skill exceeding the delta threshold |
|
||||
|
||||
## Threshold rationale (recorded as docblocks)
|
||||
|
||||
The constants are first-pass heuristics expected to be tuned once production logs flow through codex. Each is documented inline in its analyser class:
|
||||
|
||||
- **`ItemDuplicationAnalyser::THRESHOLD_COUNT = 5`**: Five identical item gains in a fixed window. Legitimate gameplay rarely produces five identical items quickly — crafting has animation delays, looting is one-at-a-time, zombie drops are similarly serial. A burst of five suggests admin-spawn or exploit. Tune downward if false negatives appear.
|
||||
- **`ItemDuplicationAnalyser::THRESHOLD_WINDOW_SECONDS = 10`**: Ten seconds covers a realistic burst-loot scenario (e.g. a crate full of identical items) without collapsing onto unrelated events. Combined with `THRESHOLD_COUNT` this means an effective rate of 0.5 same-item events per second.
|
||||
- **`SkillProgressionAnomalyAnalyser::THRESHOLD_DELTA = 3`**: PZ skills require thousands of XP per level; even active grinding rarely produces four-or-more level jumps in a single session bridge. Set to 3 as baseline; modded XP servers may need to raise this via subclass override.
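How the two item-duplication constants interact can be sketched as a two-pointer window over one `(steamid, item)` group's timestamps. This is a hedged illustration, not the analyser's code: `hasBurst()` is a hypothetical helper, and it assumes the trigger is "at least `THRESHOLD_COUNT` events" within an inclusive window, neither of which is stated above.

```php
<?php
declare(strict_types=1);

const THRESHOLD_COUNT = 5;
const THRESHOLD_WINDOW_SECONDS = 10;

/**
 * @param list<int> $timestamps Unix seconds for a single (steamid, item) group
 */
function hasBurst(array $timestamps): bool
{
    sort($timestamps); // also reindexes the array from 0
    $start = 0;
    foreach ($timestamps as $end => $time) {
        // Advance the left edge until the window spans at most the allowed seconds.
        while ($time - $timestamps[$start] > THRESHOLD_WINDOW_SECONDS) {
            $start++;
        }
        // Assumption: reaching the count (>=) is enough to trigger.
        if ($end - $start + 1 >= THRESHOLD_COUNT) {
            return true;
        }
    }
    return false;
}

$sixInOneSecond = hasBurst([100, 100, 100, 100, 100, 100]);
$fourOverMinutes = hasBurst([0, 60, 120, 180]);
$fiveSpreadTooWide = hasBurst([0, 3, 6, 9, 12]);
```

Under these assumptions a subclass-style override only needs to change the two constants; the window walk itself stays the same.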
|
||||
|
||||
## Patterns
|
||||
|
||||
No new pattern constants. Existing constants from Phase A are reused inside the per-entry walks:
|
||||
|
||||
- `UserPattern::PLAYER_EVENT` — decode `[time] <steamid> "<player>" <event>` lines
|
||||
- `ItemPattern::FIELDS` — decode `[time] <steamid> "<player>" <location> <delta> <coords> [<item>]` lines
|
||||
- `PerkPattern::FIELDS` — decode the bracket-heavy perks log line
|
||||
- `PerkPattern::PERK_PAIR` — extract individual `Skill=N` pairs from the perks-row event field
|
||||
|
||||
`Entry::getTime()` returns integer Unix seconds (sub-second precision is dropped by `DateTime::getTimestamp()`). For `ItemDuplicationAnalyser` this means events within the same second collapse to time-diff zero, which is acceptable for v1.
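The sub-second collapse is easy to confirm with plain PHP date objects. The sketch below does not use `Entry`; it assumes a hypothetical calendar date and reuses the `19:50:00.001` and `.006` times from the item fixture to show that the microseconds survive parsing but not `getTimestamp()`.

```php
<?php
declare(strict_types=1);

$utc = new DateTimeZone('UTC');
$first = new DateTimeImmutable('2026-01-01 19:50:00.001', $utc);
$last = new DateTimeImmutable('2026-01-01 19:50:00.006', $utc);

// getTimestamp() returns whole Unix seconds, so the 5 ms gap disappears.
$diffSeconds = $last->getTimestamp() - $first->getTimestamp();

// The fractional part is still present on the object itself.
$microseconds = $first->format('u');

var_dump($diffSeconds, $microseconds);
```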
|
||||
|
||||
## Wiring
|
||||
|
||||
Three `getDefaultAnalyser()` overrides (each was previously `return new PatternAnalyser();`):
|
||||
|
||||
```php
|
||||
// ProjectZomboidUserLog
|
||||
return new ConnectionFailureAnalyser();
|
||||
|
||||
// ProjectZomboidItemLog
|
||||
return new ItemDuplicationAnalyser();
|
||||
|
||||
// ProjectZomboidPerkLog
|
||||
return new SkillProgressionAnomalyAnalyser();
|
||||
```
|
||||
|
||||
The unused `PatternAnalyser` import is removed from each Log subclass.
|
||||
|
||||
## Test plan
|
||||
|
||||
End-to-end tests under `test/tests/Games/ProjectZomboid/Analyser/`, one per Log:
|
||||
|
||||
- **`UserLogAnalysisTest`** — drives `user-minimal.txt`. Asserts exactly one `ConnectionFailureProblem` for Player1 (Steam ID `76561198000000001`) with `unmatchedAttempts == 1` (Player1 has two `attempting to join` events, one of which is `attempting to join used queue`, and one `allowed to join`). Asserts that Player2 (matched 1+1) is not flagged.
|
||||
- **`ItemLogAnalysisTest`** — drives the extended `item-minimal.txt`. Asserts one `ItemDuplicationProblem` for AdminUser + Base.Bullets9mm with `eventCount == 6`, and verifies the four-event Base.Plank group does not trigger. Also asserts the threshold constants are positive and documented.
|
||||
- **`PerkLogAnalysisTest`** — drives the extended `perk-minimal.txt`. Asserts exactly two `SkillProgressionAnomalyProblem` insights for PlayerSuspect (Steam ID `76561198000000004`), one for Strength (delta +8) and one for Fitness (delta +6). Verifies that Maintenance (delta exactly +3) does not trigger because the comparison is strict `>`. Verifies that single-snapshot players (Player1, Player2) are not flagged. Asserts the threshold constant is positive and documented.
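The boundary behaviour the perk test relies on can be sketched as a plain comparison of two snapshots. `anomalousSkills()` is a hypothetical helper rather than the analyser's method; the snapshot values are the PlayerSuspect figures quoted in this document, and the strict `>` mirrors the comparison described above.

```php
<?php
declare(strict_types=1);

const THRESHOLD_DELTA = 3;

/**
 * @param array<string,int> $before Earlier snapshot, skill => level
 * @param array<string,int> $after  Later snapshot, skill => level
 * @return array<string,int> Skills whose increase strictly exceeds the threshold
 */
function anomalousSkills(array $before, array $after): array
{
    $flagged = [];
    foreach ($after as $skill => $level) {
        if (!isset($before[$skill])) {
            continue; // no earlier value to compare against
        }
        $delta = $level - $before[$skill];
        if ($delta > THRESHOLD_DELTA) {
            $flagged[$skill] = $delta;
        }
    }
    return $flagged;
}

$baseline = ['Strength' => 2, 'Fitness' => 2, 'Maintenance' => 0];
$inflated = ['Strength' => 10, 'Fitness' => 8, 'Maintenance' => 3];
$flagged = anomalousSkills($baseline, $inflated);
```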
|
||||
|
||||
## Fixture changes
|
||||
|
||||
Two synthetic fixtures extended (no new files, no real-log content):
|
||||
|
||||
- **`item-minimal.txt`** — appended 10 lines: a 6-event Bullets9mm burst by AdminUser at sub-second timestamps `19:50:00.001`–`.006` (triggers the dupe heuristic), and a 4-event Plank group by Player1 scattered across 4 minutes (`20:00:00`–`20:03:00`, sub-threshold). The Phase A entry-count assertion in `ProjectZomboidItemLogTest` was bumped from 10 → 20.
|
||||
- **`perk-minimal.txt`** — appended 4 lines: PlayerSuspect (Steam ID `76561198000000004`) with two perks snapshots — a low-stat baseline at `18:30:00.000` and an inflated set at `22:00:00.000` showing Strength 2→10, Fitness 2→8, and Maintenance 0→3 (boundary case). The Phase A entry-count assertion in `ProjectZomboidPerkLogTest` was bumped from 6 → 10.
|
||||
|
||||
All identifiers are placeholder per the Privacy / Fixture Rules in CLAUDE.md (`76561198000000001`–`76561198000000004` for Steam IDs, `Player1`/`Player2`/`AdminUser`/`PlayerSuspect` for names, coords in the `1000-1100, 2000-2200, 0` range).
|
||||
|
||||
## Commits (as-built, in order)
|
||||
|
||||
1. `c444e85` — `pre-phase-B.3 checkpoint` (`--allow-empty`)
|
||||
2. `73e9ca6` — `Add ConnectionFailureAnalyser`
|
||||
3. `ba3fae8` — `Add ItemDuplicationAnalyser`
|
||||
4. `0c90e40` — `Add SkillProgressionAnomalyAnalyser`
|
||||
|
||||
4 commits total. Each non-checkpoint commit ships an Analyser + Problem + (optional) fixture extension + updated count assertion + e2e test in one logical unit, per the per-analyser commit shape requested up front.
|
||||
|
||||
## Open issues
|
||||
|
||||
None blocking. All three threshold constants are heuristic guesses pending production data calibration; tuning is expected once iblogs starts feeding real logs through codex. The values are tunable via subclass override and the rationale is in the source docblocks.
|
||||
|
||||
## Pointers
|
||||
|
||||
- Phase B.1 (foundation, ServerLog analysers): `2026-04-30-pz-analysers-design.md` and `2026-04-30-pz-analysers.md`.
|
||||
- Phase B.2 (vanilla PatternAnalyser PvP/Admin coverage): `2026-04-30-pz-analysers-pvp-admin-design.md` and `2026-04-30-pz-analysers-pvp-admin.md`.
|
||||
- Workflow conventions and architecture overview: `CLAUDE.md`.
|
||||
- The Phase B.3 commit set begins at `c444e85` (pre-checkpoint) and ends at `0c90e40` (the third analyser).
|
||||
@@ -0,0 +1,106 @@
|
||||
# ProjectZomboid analyser design (Phase B.2)
|
||||
|
||||
> Retroactive: written 2026-05-01.
|
||||
|
||||
## Summary
|
||||
|
||||
Add Project Zomboid PvP combat detection (filtering zombie hits and zero-damage events) and admin verb-dispatch coverage of six action types, by registering seven new `Information` insight classes onto the existing `PatternAnalyser`. No custom `Analyser` subclasses are introduced in this phase — all dispatch fits within `PatternAnalyser`'s per-entry pattern matching.
|
||||
|
||||
This document covers Phase B.2. Phase B.1 is in `2026-04-30-pz-analysers-design.md`. Phase B.3 (cross-entry / threshold analysers requiring custom `Analyser` subclasses) is in `2026-04-30-pz-analysers-deferred-design.md`.
|
||||
|
||||
## Scope
|
||||
|
||||
- **In scope:** `PvpDamageInformation` + `PvpPattern::COMBAT_REAL` regex; six `Admin<Verb>Information` classes + six `AdminPattern::<VERB>_ENTRY` regex constants; wiring `ProjectZomboidPvpLog::getDefaultAnalyser()` and `ProjectZomboidAdminLog::getDefaultAnalyser()`; end-to-end tests for both logs.
|
||||
- **Out of scope (B.2):** any cross-entry / threshold / pairing logic (deferred to B.3); the eight other PZ logs whose `getDefaultAnalyser()` continues returning an empty `PatternAnalyser` stub; the codex-side `Redactor` utility (deferred — see `2026-04-30-redactor-design.md`).
|
||||
|
||||
## Architectural decision: vanilla PatternAnalyser
|
||||
|
||||
Phase B.1 established that `PatternAnalyser` plus `Insight::isEqual()` coalescing covers single-entry pattern matching cleanly. Phase B.2's analysers (PvP damage rows, admin verb lines) all fit that mould — each interesting line is independent of the others, dispatch is per-entry, and counter-coalescing handles repeats. No `Analyser` subclassing required. (Phase B.3 will deviate from this when cross-entry logic enters the picture.)
|
||||
|
||||
## Components
|
||||
|
||||
All under `src/Analysis/ProjectZomboid/`:
|
||||
|
||||
| Class | Type | Pattern | Coalescing |
|
||||
|---|---|---|---|
|
||||
| `PvpDamageInformation` | Information | `PvpPattern::COMBAT_REAL` | Default `Information::isEqual` (label + value) — same attacker/victim/weapon coalesces |
|
||||
| `AdminAddedItemInformation` | Information | `AdminPattern::ADDED_ITEM_ENTRY` | Default — same admin/item/target coalesces |
|
||||
| `AdminAddedXpInformation` | Information | `AdminPattern::ADDED_XP_ENTRY` | Default — same admin/amount/skill/target coalesces |
|
||||
| `AdminGrantedAccessInformation` | Information | `AdminPattern::GRANTED_ACCESS_ENTRY` | Default — same admin/level/target coalesces |
|
||||
| `AdminChangedOptionInformation` | Information | `AdminPattern::CHANGED_OPTION_ENTRY` | Default — same admin/option/value coalesces |
|
||||
| `AdminReloadedOptionsInformation` | Information | `AdminPattern::RELOADED_OPTIONS_ENTRY` | Default — same admin coalesces |
|
||||
| `AdminTeleportedInformation` | Information | `AdminPattern::TELEPORTED_ENTRY` | Default — same admin/target/coords coalesces |
|
||||
|
||||
## Patterns
|
||||
|
||||
Seven new constants total.
|
||||
|
||||
**`PvpPattern::COMBAT_REAL`** — combat regex with the noise filter baked in. The negative lookahead `(?!zombie")` rejects zombie weapon rows; the damage clause uses alternation to match only positive non-zero floats:
|
||||
|
||||
```
|
||||
'/Combat: "(?<attacker>[^"]+)" \([^)]+\) hit "(?<victim>[^"]+)" \([^)]+\) weapon="(?<weapon>(?!zombie")[^"]+)" damage=(?<damage>0\.0*[1-9][0-9]*|[1-9][0-9]*\.[0-9]+)/'
|
||||
```
|
||||
|
||||
The damage alternation rejects `0.000000` and any leading-minus value: the first branch requires `0.` followed by at least one non-zero digit, and the second requires a non-zero leading digit before the decimal point.
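The filter can be exercised directly with `preg_match()`. In the sketch below the regex is copied from the `PvpPattern::COMBAT_REAL` listing, while `isRealCombat()` and the sample combat lines are hypothetical: they are built to fit the regex and are not taken from a real log or from `pvp-minimal.txt`.

```php
<?php
declare(strict_types=1);

// Regex copied from the COMBAT_REAL listing in this document.
const COMBAT_REAL = '/Combat: "(?<attacker>[^"]+)" \([^)]+\) hit "(?<victim>[^"]+)" \([^)]+\) weapon="(?<weapon>(?!zombie")[^"]+)" damage=(?<damage>0\.0*[1-9][0-9]*|[1-9][0-9]*\.[0-9]+)/';

function isRealCombat(string $line): bool
{
    return preg_match(COMBAT_REAL, $line) === 1;
}

// Synthetic line template; only the weapon and damage fields vary.
$template = 'Combat: "Player1" (1000,2000,0) hit "Player2" (1001,2000,0) weapon="%s" damage=%s';

$bareHands = isRealCombat(sprintf($template, 'Bare Hands', '0.500000'));
$heavyHit = isRealCombat(sprintf($template, 'Tire Iron (Worn)', '12.500000'));
$zombieRow = isRealCombat(sprintf($template, 'zombie', '1.000000'));
$zeroDamage = isRealCombat(sprintf($template, 'Bare Hands', '0.000000'));
$negative = isRealCombat(sprintf($template, 'Bare Hands', '-1.000000'));
```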
|
||||
|
||||
**`AdminPattern::<VERB>_ENTRY`** — six entry-anchored variants of the existing body-only verb constants. Necessary because `PatternAnalyser` calls `preg_match_all` against the full Entry text (including the `[time]` prefix), so the Phase A verb constants anchored at `^<admin>` would never match. The Phase A constants stay intact for direct-message use; new ones live alongside them on the same `AdminPattern` class.
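The anchoring problem can be reproduced with two toy patterns. Everything in this sketch is an assumption made for illustration: the two regexes are simplified stand-ins for a body-only and an entry-anchored constant, not the real `AdminPattern` constants, and the admin line is a synthetic guess at the shape.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for a body-only and an entry-anchored verb pattern.
$bodyOnly = '/^(?<admin>\S+) added item (?<item>\S+)/';
$entryAnchored = '/^\[[^\]]+\] (?<admin>\S+) added item (?<item>\S+)/';

$message = 'AdminUser added item Base.ShotgunShells';
$entryText = '[01-01-26 12:00:00.000] ' . $message;

// The body-only pattern matches the bare message...
$bodyOnMessage = preg_match($bodyOnly, $message);
// ...but never the full entry text, because ^ lands on the [time] prefix.
$bodyOnEntry = preg_match($bodyOnly, $entryText);
// The entry-anchored variant consumes the prefix first.
$entryOnEntry = preg_match($entryAnchored, $entryText, $m);
```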
|
||||
|
||||
## Wiring
|
||||
|
||||
Two `getDefaultAnalyser()` overrides (was `return new PatternAnalyser();` for both):
|
||||
|
||||
```php
|
||||
// ProjectZomboidPvpLog
|
||||
return (new PatternAnalyser())
|
||||
->addPossibleInsightClass(PvpDamageInformation::class);
|
||||
```
|
||||
|
||||
```php
|
||||
// ProjectZomboidAdminLog
|
||||
return (new PatternAnalyser())
|
||||
->addPossibleInsightClass(AdminAddedItemInformation::class)
|
||||
->addPossibleInsightClass(AdminAddedXpInformation::class)
|
||||
->addPossibleInsightClass(AdminGrantedAccessInformation::class)
|
||||
->addPossibleInsightClass(AdminChangedOptionInformation::class)
|
||||
->addPossibleInsightClass(AdminReloadedOptionsInformation::class)
|
||||
->addPossibleInsightClass(AdminTeleportedInformation::class);
|
||||
```
|
||||
|
||||
## Test plan
|
||||
|
||||
Unit tests under `test/tests/Games/ProjectZomboid/Analysis/`, one per Insight class, each exercising `getPatterns()` shape and `setMatches()` extraction, plus at least one filter-rejection case for `PvpDamageInformation` (zombie weapon and zero-damage rejection).
|
||||
|
||||
End-to-end tests under `test/tests/Games/ProjectZomboid/Analyser/`:
|
||||
|
||||
- `PvpLogAnalysisTest` against `pvp-minimal.txt`: asserts exactly three `PvpDamageInformation` insights (Bare Hands, Tire Iron (Worn), Hunting Knife). Zombie and vehicle rows must be filtered out by the regex.
|
||||
- `AdminLogAnalysisTest` against `admin-minimal.txt`: asserts 2 + 2 + 2 + 2 + 1 + 2 = 11 insights across the six admin classes, with the duplicate ShotgunShells row coalescing into a single insight at `counter == 2`.
|
||||
|
||||
## Fixture changes
|
||||
|
||||
None. The Phase A synthetic fixtures `pvp-minimal.txt` and `admin-minimal.txt` already cover every code path Phase B.2 exercises.
|
||||
|
||||
## Commits (as-built, in order)
|
||||
|
||||
1. `df62da1` — `pre-phase-B.2 checkpoint` (`--allow-empty`)
|
||||
2. `55f769c` — `Add PvpDamageInformation insight`
|
||||
3. `90c85a0` — `Add AdminAddedItemInformation insight` ⚠️ broken — see `2026-04-30-pz-analysers-pvp-admin.md` §Deviations
|
||||
4. `0d85a05` — `Fix missing closing brace in AdminPattern` (forward-fix for #3)
|
||||
5. `a2faa55` — `Add AdminAddedXpInformation insight`
|
||||
6. `caed04d` — `Add AdminGrantedAccessInformation insight`
|
||||
7. `b7b89ef` — `Add AdminChangedOptionInformation insight`
|
||||
8. `64641fa` — `Add AdminReloadedOptionsInformation insight`
|
||||
9. `d15fc81` — `Add AdminTeleportedInformation insight`
|
||||
10. `51eb2de` — `Wire ProjectZomboidPvpLog default analyser`
|
||||
11. `c57d646` — `Wire ProjectZomboidAdminLog default analyser`
|
||||
|
||||
11 commits total, vs 10 originally planned. The brace-fix commit accounts for the discrepancy.
|
||||
|
||||
## Open issues
|
||||
|
||||
None blocking. Phase A Q4 (admin verb scope) was settled before B.2 began. Phase B Q2 confirmed PvP fixtures contain real combat events worth analysing.
|
||||
|
||||
## Pointers
|
||||
|
||||
- Phase B.1 (foundation): `2026-04-30-pz-analysers-design.md` and `2026-04-30-pz-analysers.md`.
|
||||
- Phase B.3 (deferred analysers requiring custom `Analyser` subclasses): `2026-04-30-pz-analysers-deferred-design.md`.
|
||||
- Workflow conventions: `CLAUDE.md` § Workflow conventions and § Pitfalls.
|
||||
150
docs/superpowers/specs/2026-04-30-redactor-design.md
Normal file
@@ -0,0 +1,150 @@
|
||||
# Codex Redactor utility — design spec
|
||||
|
||||
> Retroactive: written 2026-05-01.
|
||||
> **Status: implemented on the `redactor` branch (2026-05-01).** Plan: `docs/superpowers/plans/2026-05-01-redactor.md`. Arrival commit set documented in `CHANGELOG.md` `[Unreleased]`. The "Status: deferred" framing below is preserved for historical context; treat this file as the as-built design contract.
|
||||
|
||||
## Summary
|
||||
|
||||
Codex grows a small utility surface for redacting personally-identifying data from log content before it is stored, displayed, or analysed in environments where preservation of PII is unwanted. The shape is a thin generic interface plus per-game implementations that know each game's log format. iblogs is the primary line of defence (upload-time filter); codex's redactor is the optional helper consumers can call when they want codex itself to scrub data.
|
||||
|
||||
## Why deferred
|
||||
|
||||
The Phase A Step E open-questions table (Q5) marked the codex-side redactor as "defer to its own session" because the iblogs upload-time filter is the actual privacy boundary — anything codex does in this layer is a convenience, not a guarantee. Phase B (the analyser arc) shipped without the redactor and remains useful: synthetic fixtures use placeholder identifiers throughout, real Logs.zip never reaches the index, and the privacy story for codex's tests does not depend on this utility. Building it remains worthwhile when iblogs starts consuming codex output and wants a one-line option for "scrub before analyse."
|
||||
|
||||
## Scope
|
||||
|
||||
- **In scope (when this spec is implemented):** a `RedactorInterface` under `src/Util/`, a `ProjectZomboidRedactor` implementation that handles the three PII categories observed in PZ logs (Steam IDs, player names, world coordinates), per-category toggles with a defaults-on stance, replacement-string conventions matching the synthetic fixture placeholders.
|
||||
- **Out of scope:** non-PZ game redactors (those land alongside their respective game implementations); UI / CLI wrappers; redaction of mod-specific identifiers (e.g. BurdJournals scientific-notation Steam IDs) — handled by an extension of the PZ implementation if/when needed; storage / persistence of redaction maps.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
+-------------------------+
|
||||
| RedactorInterface |
|
||||
| (src/Util/) |
|
||||
| redact(string): string|
|
||||
+-----------+-------------+
|
||||
|
|
||||
+-----------------------+-----------------------+
|
||||
| |
|
||||
+------------v-----------------+ +--------------v-------------+
|
||||
| ProjectZomboidRedactor | | (Future) MinecraftRedactor |
|
||||
| (src/Util/ProjectZomboid/) | | (src/Util/Minecraft/) |
|
||||
+------------------------------+ +----------------------------+
|
||||
```
|
||||
|
||||
A thin interface in the framework's `Util` namespace. One concrete implementation per supported game, mirroring the existing components-outer-with-game-suffix layout used everywhere else in the tree (Analyser, Analysis, Detective, Log, Parser, Pattern). Future games' redactors land alongside their analyser surface.
|
||||
|
||||
## Why per-game implementations rather than a single regex utility
|
||||
|
||||
PII detection in log text is **context-sensitive**, not just regex matching:
|
||||
|
||||
- **Steam IDs** are 17-digit decimal numbers. Almost regexable, but care is needed not to chew through unrelated long numbers (timestamps, build numbers, GUIDs that happen to be 17 digits).
|
||||
- **Player names** are arbitrary strings. They cannot be detected from text alone — a redactor needs to know the lexical contexts where names appear (`<steamid> "Name"`, `ChatMessage{author='Name'}`, `Combat: "Name"`). Without that knowledge a naive `\w+`-style match would shred the entire log.
|
||||
- **Coordinates** are number triples in specific shapes (`x,y,z` after `at`, `[x,y,z]` between brackets, `(x,y,z)` in PvP combat lines). Stripping every comma-separated run of numbers would over-redact (e.g. `f:0, t:1776297642406, st:48,648,157,584` is server metadata, not coordinates).
|
||||
|
||||
Per-game implementations encode the lexical contexts. PZ's redactor uses the same regex shapes Phase A's Pattern classes encode for parsing, applied in a different direction (replacement instead of extraction).
|
||||
|
||||
## Components
|
||||
|
||||
### `src/Util/RedactorInterface.php`
|
||||
|
||||
```php
|
||||
namespace IndifferentKetchup\Codex\Util;
|
||||
|
||||
interface RedactorInterface
|
||||
{
|
||||
/**
|
||||
* Return a copy of $content with PII replaced by placeholder tokens
|
||||
* according to the redactor's enabled toggles.
|
||||
*/
|
||||
public function redact(string $content): string;
|
||||
}
|
||||
```
|
||||
|
||||
A single method. Stateless from the caller's perspective; toggles are configured on the concrete implementation before `redact()` is called.
|
||||
|
||||
### `src/Util/ProjectZomboid/ProjectZomboidRedactor.php`
|
||||
|
||||
Implements `RedactorInterface`. Three independent toggles (defaults all on) and three regex-driven replacement passes:
|
||||
|
||||
```php
|
||||
namespace IndifferentKetchup\Codex\Util\ProjectZomboid;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\RedactorInterface;
|
||||
|
||||
class ProjectZomboidRedactor implements RedactorInterface
|
||||
{
|
||||
private bool $redactSteamIds = true;
|
||||
private bool $redactPlayerNames = true;
|
||||
private bool $redactCoordinates = true;
|
||||
|
||||
public function redactSteamIds(bool $on): static { /* ... */ }
|
||||
public function redactPlayerNames(bool $on): static { /* ... */ }
|
||||
public function redactCoordinates(bool $on): static { /* ... */ }
|
||||
|
||||
public function redact(string $content): string
|
||||
{
|
||||
if ($this->redactSteamIds) { /* preg_replace */ }
|
||||
if ($this->redactPlayerNames) { /* preg_replace */ }
|
||||
if ($this->redactCoordinates) { /* preg_replace */ }
|
||||
return $content;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Replacement conventions
|
||||
|
||||
To match the synthetic fixture placeholders already used throughout the test suite (per the Privacy / fixture rules in CLAUDE.md):
|
||||
|
||||
| PII category | Replacement |
|
||||
|---|---|
|
||||
| Steam ID (17 decimal digits in a Steam ID context) | `76561198000000000` |
|
||||
| Player name (between `"..."` after a 17-digit Steam ID, between `'...'` in `ChatMessage{author='...'}`, between `"..."` after subsystem keywords like `Combat:` / `Safety:`) | `<player>` |
|
||||
| World coordinates (the `x,y,z` or `(x,y,z)` triples in PZ log lines, distinguished by leading-context anchors so server metadata triples are not stripped) | `0,0,0` |
|
||||
|
||||
The replacements are deliberately not reversible — codex makes no attempt to maintain a map between original and redacted values. Reversibility is a different feature scope (encryption / tokenization) and is not what this utility provides.
|
||||
|
||||
### Lexical anchors for the regex passes
|
||||
|
||||
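Steam ID: `(?<![\w])(?P<sid>76561198\d{9})(?![\w])`. The `76561198` prefix matches the leading digits shared by most SteamID64 values for individual accounts in Steam's public universe (account type "Individual") and avoids matching unrelated 17-digit numbers. The individual-account range itself starts at `76561197960265728`, so IDs beginning `76561197` or `76561199` fall outside this prefix; widening it is a tuning decision for the implementation pass. Boundary classes prevent matching inside a longer alphanumeric token.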
|
||||
|
||||
Player name (PZ-specific contexts):
|
||||
- After Steam ID quoted: `(?<sid>76561198000000000) "(?P<name>[^"]+)"` → preserve the redacted Steam ID, replace the quoted name. (Redaction order matters: SIDs first, names second.)
|
||||
- ChatMessage author: `ChatMessage\{chat=\w+, author='(?P<name>[^']+)',` → replace the captured author.
|
||||
- PvP / Safety subsystem: `(?P<sub>Combat|Safety): "(?P<name>[^"]+)"` → replace the captured name.
|
||||
|
||||
Coordinates:
|
||||
- ItemLog / MapLog / CmdLog `at` clauses: `at (?P<coords>[\d.]+,[\d.]+,-?[\d.]+)\.` → replace with `0,0,0.`
|
||||
- ClientActionLog / PerkLog bracketed coords: `\[(?P<coords>\d+,\d+,-?\d+)\]` → replace with `[0,0,0]`
|
||||
- PvP combat parenthesised coords: `\((?P<coords>\d+,\d+,-?\d+)\) (?:hit|restore|store|true|false)` — the trailing context disambiguates from server metadata triples.
|
||||
|
||||
These regex shapes are not yet committed to the spec implementation; tuning is expected during the actual implementation pass against the real `Logs.zip` content under `.scratch/pz/Logs/`.
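To make the ordering invariant concrete, here is a minimal sketch of three of the passes chained together. It is not the shipped `ProjectZomboidRedactor`: `sketchRedact()` is a hypothetical standalone function, it covers only the Steam ID, after-Steam-ID name, and `at` coordinate shapes listed above, and the sample line is synthetic.

```php
<?php
declare(strict_types=1);

function sketchRedact(string $content): string
{
    // Pass 1: Steam IDs collapse to the fixed placeholder.
    $content = preg_replace(
        '/(?<![\w])76561198\d{9}(?![\w])/u',
        '76561198000000000',
        $content
    ) ?? $content;

    // Pass 2: quoted names, anchored on the already-redacted Steam ID.
    $content = preg_replace(
        '/(76561198000000000) "[^"]+"/u',
        '$1 "<player>"',
        $content
    ) ?? $content;

    // Pass 3: "at x,y,z." coordinate clauses.
    $content = preg_replace(
        '/at [\d.]+,[\d.]+,-?[\d.]+\./u',
        'at 0,0,0.',
        $content
    ) ?? $content;

    return $content;
}

$line = '[01-01-26 12:00:00.000] 76561198000000001 "Player1" added Base.Plank at 1000,2000,0.';
$once = sketchRedact($line);
$twice = sketchRedact($once);
```

Swapping passes 1 and 2 would leave the name untouched in this sketch, because the name pattern only recognises the placeholder Steam ID. The `?? $content` fallbacks are one possible way to handle a `null` return from `preg_replace()`; the spec does not say how errors should be treated.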
|
||||
|
||||
## Where this fits relative to iblogs
|
||||
|
||||
The Phase A Step D Section e split holds: **iblogs is the primary line of defence**. iblogs filters PII at upload time, before storage, mirroring the mclogs IP/token redaction approach. Stored logs in iblogs are pre-sanitised. The codex `Redactor` is the *option* iblogs (or any other consumer) reaches for if they want codex itself to do the scrubbing — for example in a preview pipeline that wants to render redacted output without writing the raw paste to disk first, or in a dev environment where the same code path runs without iblogs's upload filter.
|
||||
|
||||
This means the codex Redactor is **non-load-bearing** for the privacy story. iblogs implementing redaction independently is the actual safety guarantee; codex's helper is a convenience.
|
||||
|
||||
## Test plan (when implemented)
|
||||
|
||||
Synthetic-only fixtures, no real-log content:
|
||||
|
||||
1. Three pairs of fixture-input / expected-output strings exercising each category in isolation.
|
||||
2. One combined-input fixture demonstrating that all three categories applied to the same content produce a fully-scrubbed output.
|
||||
3. Toggle tests: each of the three booleans turned off in isolation produces partial scrubbing; all three off produces an unchanged copy of input (the redactor returns input verbatim).
|
||||
4. Idempotence test: `redact(redact($x)) == redact($x)`.
|
||||
5. A small "negative" test: server metadata triples (`f:0, t:1776297642406, st:48,648,157,584`) are not mistaken for coordinates.
|
||||
|
||||
## Open questions
|
||||
|
||||
1. **Should the redactor optionally preserve some structure for analysers downstream?** For example, after redaction the analysers can no longer correlate by Steam ID across events because every Steam ID is the same placeholder. Two paths: (a) accept the loss — redaction is done before storage and you don't analyse redacted content, or (b) provide a "tokenizing redactor" that maps each unique input value to a unique placeholder (`76561198000000001`, `76561198000000002`, ...) preserving cardinality. Recommend (a) for v1; (b) is its own design pass.
|
||||
2. **What about `BurdJournals.txt`'s scientific-notation Steam IDs?** Phase A Step C noted these as `7.656119799341651E16` form. The PZ redactor's Steam ID regex doesn't match this shape. v1 leaves them intact (tag `[BurdJournals]` already disambiguates them as mod-internal). v2 could add a separate regex for the sci-notation form.
|
||||
3. **Should `coords` redaction try to preserve relative location** (e.g. round to the nearest 1000-tile chunk so the *region* is visible without giving precise base coords)? Out of scope for v1.
|
||||
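To make option (b) of the first open question concrete, here is a minimal Python sketch of a tokenizing redactor for Steam IDs. Everything in it is an assumption for illustration: the `STEAM_ID` regex (17-digit runs starting with `7656119`), the helper name `tokenize_steam_ids`, and the numbering that counts up from the existing placeholder `76561198000000000`.

```python
import re

# Assumption: Steam IDs appear as 17-digit runs starting with 7656119.
STEAM_ID = re.compile(r"\b7656119\d{10}\b")
PLACEHOLDER_BASE = 76561198000000000


def tokenize_steam_ids(text: str) -> str:
    """Map each distinct Steam ID to its own placeholder, preserving cardinality."""
    seen = {}

    def repl(match):
        raw = match.group(0)
        if raw not in seen:
            # First distinct ID becomes ...001, the second ...002, and so on.
            seen[raw] = str(PLACEHOLDER_BASE + len(seen) + 1)
        return seen[raw]

    return STEAM_ID.sub(repl, text)
```

Under this scheme events can still be correlated per player after redaction, and the scientific-notation form from question 2 is left intact because it never contains a contiguous 17-digit run.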
|
||||
## Pointers
|
||||
|
||||
- Phase A original Q5 deferral: `2026-04-30-pz-analysers-design.md` referenced this; the explicit deferral lived in chat (Phase A Step E open-questions table).
|
||||
- iblogs upload-time filtering decisions: see the iblogs bootstrap spec at `2026-05-01-iblogs-bootstrap-design.md`.
|
||||
- Existing Pattern classes that the regex shapes will mirror in reverse: `src/Pattern/ProjectZomboid/{CmdPattern,ItemPattern,MapPattern,PerkPattern,ClientActionPattern,ChatPattern,PvpPattern,UserPattern}.php`.
|
||||
186
docs/superpowers/specs/2026-05-01-iblogs-bootstrap-design.md
Normal file
@@ -0,0 +1,186 @@
|
||||
# iblogs bootstrap design
|
||||
|
||||
> Written 2026-05-01.
|
||||
> **Scope:** design only. No iblogs code is written by this document; the actual fork, rename, and rewire happen in a follow-up session after this design is approved.
|
||||
|
||||
## Summary
|
||||
|
||||
iblogs is a Project-Zomboid-first log triage service forked from `aternosorg/mclogs`. It consumes `indifferentketchup/codex` (pinned at `v0.1.0`) for log detection, parsing, and analysis, replacing mclogs's `aternos/codex-minecraft` / `aternos/codex-hytale` / `aternos/sherlock` dependency stack. The data model gains a session entity that wraps the multiple files Project Zomboid produces per server session (eleven file types per session), while mclogs's existing single-paste paths remain alive as legacy routes that map to "session of size 1."
|
||||
|
||||
## (a) Fork target verification
|
||||
|
||||
| Check | Value |
|
||||
|---|---|
|
||||
| Upstream | `github.com/aternosorg/mclogs` |
|
||||
| Default branch | `main` |
|
||||
| License | **MIT** (SPDX `MIT`) — compatible with `indifferentketchup/codex`'s MIT |
|
||||
| Last push | `2026-03-30` (active; ~30 days ago) |
|
||||
| Last update | `2026-04-26` |
|
||||
| Archived | no |
|
||||
| Stars / open issues | 290 / 2 |
|
||||
| PHP requirement | `>=8.5`, plus `ext-frankenphp`, `ext-mongodb`, `ext-uri`, `ext-zlib`, `ext-mbstring`, `ext-json` |
|
||||
| Storage | MongoDB |
|
||||
| Existing codex dep | yes — `aternos/codex-minecraft ^5.0.1` and `aternos/codex-hytale ^2.0` |
|
||||
|
||||
**Verdict: GO.** License is compatible. Project is actively maintained. No archival or licensing blockers. The fact that mclogs already integrates Aternos's codex stack tells us the fork's swap surface is well-defined: replace those Composer deps and the codex-facing call sites in `src/Api/Action/AnalyseLogAction.php` / `src/Api/Action/LogInsightsAction.php` / `src/Api/Response/CodexLogResponse.php` / `src/Detective.php` / `src/Log.php`.
|
||||
|
||||
The PHP `>=8.5` floor is stricter than codex's `>=8.4` — iblogs inherits the stricter constraint, which is fine. The `ext-frankenphp` requirement means iblogs runs on the FrankenPHP runtime rather than vanilla PHP-FPM; preserving this is the path of least resistance.
|
||||
|
||||
`aternos/sherlock` (MIT, "PHP library to apply minecraft mappings to log files") is Minecraft-specific (Mojang obfuscation maps). It is **not needed for PZ** and gets dropped. If iblogs ever adds Minecraft support, it can come back.
|
||||
|
||||
## (b) Repo plan
|
||||
|
||||
**Primary remote:** Gitea at `git.indifferentketchup.com:2222`. Fork as `indifferentketchup/iblogs`. SSH clone URL: `ssh://git@git.indifferentketchup.com:2222/indifferentketchup/iblogs.git`. Match the codex repo's existing Gitea setup.
|
||||
|
||||
**GitHub mirror:** Push-only secondary, configured via Gitea's Mirror feature (Repo Settings → Mirror Settings → Push Mirror). Same pattern any team using Gitea-as-primary uses for visibility.
|
||||
|
||||
**Composer dep on codex.** iblogs's `composer.json` gains a `repositories` entry of type `vcs` pointing at the codex Gitea URL (`ssh://git@git.indifferentketchup.com:2222/indifferentketchup/ik-codex.git`), and a `require` entry for `indifferentketchup/codex` pinned to exactly `0.1.0`. The exact pin is preferred over `^0.1.0`, which Composer resolves to `>=0.1.0 <0.2.0` and which would therefore still admit `0.1.x` patch releases; in an early-version (0.x) package any release may carry breaking changes.
|
||||
|
||||
**Removed deps:** `aternos/codex-minecraft`, `aternos/codex-hytale`, `aternos/sherlock`. The first two are replaced by `indifferentketchup/codex` (which covers Project Zomboid and ships detective stubs for Minecraft / Hytale / SevenDaysToDie that iblogs will not use in v0.1). The third (Sherlock) is Minecraft-mapping-specific and not relevant to PZ.
|
||||
|
||||
**Package name.** `aternos/mclogs` becomes `indifferentketchup/iblogs`. Composer name and the PSR-4 namespace move together: `Aternos\Mclogs\` → `IndifferentKetchup\Iblogs\`.
|
||||
|
||||
## (c) Multi-file / session paste model
|
||||
|
||||
Project Zomboid produces eleven log files per server session. The data model needs to accommodate this without breaking mclogs's existing single-paste consumers.
|
||||
|
||||
### Option (i) — 1 file = 1 paste, sibling-link via shared `session_id`
|
||||
|
||||
- **Pros:** minimal schema change. Reuse mclogs's existing `Log` per file. Sibling discovery is a `session_id` index.
|
||||
- **Cons:** no atomic ingest (zip becomes N independent uploads). Session views require runtime joins. `session_id` propagation through upload UX is fiddly (URL param? cookie? hidden form field?).
|
||||
- **Effort:** low.
|
||||
|
||||
### Option (ii) — zip upload explodes server-side into N linked pastes
|
||||
|
||||
- **Pros:** atomic ingest. One endpoint for whole-session upload. Maps cleanly to PZ's natural zip-of-logs deliverable.
|
||||
- **Cons:** zip-only ingest is restrictive (no single-file paste UX for users with just `DebugLog-server.txt`). Server-side zip extraction is attack surface (zip bombs, path traversal). Doubles upload paths if single-file is also supported.
|
||||
- **Effort:** medium.
|
||||
|
||||
### Option (iii) — session entity wraps N file entities (1:N relation)
|
||||
|
||||
- **Pros:** rich session model. Single URL for the whole session; child URLs per file. PZ's eleven-file natural session maps cleanly. mclogs's single-paste maps to "session of size 1," so the model degenerates gracefully into legacy behaviour. Session-level metadata (server name, date range, total size) becomes first-class.
|
||||
- **Cons:** most schema migration. Two URL types in routing. More concepts in the API.
|
||||
- **Effort:** medium-high.
|
||||
|
||||
### Recommendation: option (iii)
|
||||
|
||||
PZ's natural unit IS a session — the server emits all eleven files per restart, ZIP-bundled in production. Single-file uploads (the mclogs default UX) become "session of size 1" with no special-case code; the legacy `/api/1/log` routes return a paste that happens to belong to a singleton session. Cross-file analysis (e.g. correlating a `ServerExceptionProblem` from `DebugLog-server.txt` with a `ConnectionFailureProblem` from `user.txt`) is unlocked because both files share a `session_id`. The 1:N model is the only one that supports cross-file analysers in any future Phase B.4-equivalent on iblogs's side.
|
||||
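The 1:N shape recommended above can be sketched in a few lines of Python. This is an illustration only: the class and field names (`Session`, `Paste`, `file_type`, `singleton_session`) are hypothetical and do not describe the iblogs schema, which is still to be designed on the PHP / MongoDB side.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Paste:
    paste_id: str
    file_type: str  # one of the eleven PZ file-type tokens, e.g. "server"
    content: str


@dataclass
class Session:
    session_id: str
    pastes: List[Paste] = field(default_factory=list)

    def file(self, file_type: str) -> Optional[Paste]:
        """Return the paste for one file type, or None if the session lacks it."""
        return next((p for p in self.pastes if p.file_type == file_type), None)


def singleton_session(session_id: str, paste: Paste) -> Session:
    """Legacy single-paste upload: a session that happens to hold one file."""
    return Session(session_id, [paste])
```

The point of the sketch is that the legacy path needs no special case: a single paste is just a `Session` whose `pastes` list has length 1.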
|
||||
## (d) UI changes
|
||||
|
||||
**Primary nav: file-type tabs.** Within a session, eleven tabs (one per PZ file type) with a count badge (e.g. `DebugLog (6,998 lines)`, `chat (115)`). Clicking a tab loads that file's content + analysis. Tab order: DebugLog-server first (most useful for triage), then admin, user, chat, item, map, perk, pvp, ClientActionLog, cmd, BurdJournals.
|
||||
|
||||
**Secondary nav: session index sidebar.** Lists the user's recent sessions (cookie-driven, like mclogs's history). Less primary than tabs.
|
||||
|
||||
**Default view.** `/session/{id}` lands on the DebugLog-server tab by default — that file is what admins want to see when something is broken.
|
||||
|
||||
**Redaction toggle.** Per-session checkbox in the toolbar: "Redact PII". Behaviour depends on Step 4 (codex Redactor) status:
|
||||
- If Redactor ships first: toggle invokes `ProjectZomboidRedactor::redact()` on the rendered file content client-side or server-side (decision for the implementation pass).
|
||||
- If Redactor is still deferred: toggle is hidden in v0.1 of iblogs. Upload-time PII filtering still happens via the ported `Filter` chain (see `src/Filter/*` upstream — `IPv4Filter`, `IPv6Filter`, `AccessTokenFilter`, `UsernameFilter`).
|
||||
|
||||
**Branding.** Drop the "Built for Minecraft & Hytale" tagline and visual cues. Replace `mclo.gs` brand references with whatever short-domain iblogs uses (open question — see (h)). Color palette decision is open; mclogs's green accent (`#5cb85c` in `example.config.json`) is fine to keep or change.
|
||||
|
||||
## (e) API surface
|
||||
|
||||
iblogs exposes a session-oriented API built on the recommended option (iii) model and keeps the legacy mclogs paths alive.
|
||||
|
||||
| Path | Method | Purpose |
|
||||
|---|---|---|
|
||||
| `/api/session` | POST | Create a session by uploading one zip OR multiple file fields. Returns `session_id` plus a list of `{type, paste_id}` for each contained file. |
|
||||
| `/api/session/{id}` | GET | Return session metadata + array of contained pastes (`{type, paste_id, line_count, size_bytes}`). |
|
||||
| `/api/session/{id}/file/{type}` | GET | Return one file's content and its codex analysis result. `{type}` is one of the eleven PZ file-type tokens (`server`, `chat`, `clientaction`, `cmd`, `item`, `map`, `perk`, `pvp`, `admin`, `user`, `burdjournals`). |
|
||||
| `/api/paste/{id}` | GET | Single-paste back-compat. Returns content + analysis for any paste (whether part of a multi-file session or a singleton). |
|
||||
| `/api/1/log` | POST | Legacy mclogs path — kept alive. Internally creates a singleton session under the hood and returns the existing-shape mclogs response. |
|
||||
| `/api/1/log/{id}` | GET | Legacy mclogs path — kept alive. Same as `/api/paste/{id}` with the legacy response shape. |
|
||||
|
||||
The legacy paths preserve mclogs's API contract for any third-party clients that already integrate with `mclo.gs` or self-hosted mclogs instances. Upgrading clients to the session-aware API is opt-in.
|
||||
|
||||
## (f) String / branding inventory
|
||||
|
||||
Producing exact `path:line` references requires the cloned working copy of the fork. This section gives directional pointers from the fetched-but-not-cloned upstream tree at `aternosorg/mclogs:main`. The actual line-precise inventory belongs in a follow-up commit on the iblogs side, after the fork exists and can be `grep`ped.
|
||||
|
||||
**Composer / package metadata** — file `composer.json` upstream (no local clone, line refs not yet known):
|
||||
- `"name": "aternos/mclogs"` → `"indifferentketchup/iblogs"`
|
||||
- `"description": "Paste, share and analyse Minecraft logs"` → describe iblogs scope (PZ-first, server-log triage)
|
||||
- `"authors"` block (currently `Matthias Neid <matthias@aternos.org>`) → replace with `indifferentketchup` author
|
||||
- `require` block:
|
||||
- drop `aternos/codex-minecraft`
|
||||
- drop `aternos/codex-hytale`
|
||||
- drop `aternos/sherlock`
|
||||
- add `indifferentketchup/codex` pinned to `0.1.0`
|
||||
- `autoload.psr-4` mapping `"Aternos\\Mclogs\\": "src/"` → `"IndifferentKetchup\\Iblogs\\": "src/"`
|
||||
- new top-level `repositories` array entry of type `vcs` pointing at the codex Gitea URL
|
||||
|
||||
**Namespace bulk substitution** — every PHP file under `src/` (which is roughly 50+ files based on the upstream tree). The pattern mirrors the codex rename in commit `66a2fcc`: bulk `Aternos\Mclogs` → `IndifferentKetchup\Iblogs` across `namespace`, `use`, fully-qualified refs, and PHPDoc tags. Done as one logical commit on the iblogs side per the codex-side precedent.
|
||||
|
||||
**Codex API call sites** — the files mclogs uses to integrate Aternos's codex stack, all under `src/`:
|
||||
- `src/Detective.php` — likely a wrapper around `aternos/codex-minecraft`'s Detective. Swap to `IndifferentKetchup\Codex\Detective\ProjectZomboid\ProjectZomboidDetective` (or wrap multiple game detectives if iblogs ever supports more games).
|
||||
- `src/Log.php` — likely a wrapper. Re-point to codex's `Log` hierarchy.
|
||||
- `src/Api/Action/AnalyseLogAction.php` — the `analyse` endpoint. Update to call codex's `AnalysableLog::analyse()` with the new analyser surface.
|
||||
- `src/Api/Action/LogInsightsAction.php` — insights endpoint.
|
||||
- `src/Api/Response/CodexLogResponse.php` — response shape; verify field-by-field against `IndifferentKetchup\Codex\Analysis\AnalysisInterface::jsonSerialize()`.
|
||||
- `src/Api/Action/CreateLogAction.php` — log creation; integration with codex's `Detective::detect()`.
|
||||
- `src/Api/Action/RawLogAction.php`, `src/Api/Action/LogInfoAction.php` — verify these don't depend on Minecraft-specific codex behaviour.
|
||||
|
||||
**Frontend templates and assets** — file paths only, exact branding strings discovered post-clone:
|
||||
- `web/frontend/start.php` — landing page; "Paste, share and analyse Minecraft logs" hero copy lives here.
|
||||
- `web/frontend/api-docs.php` — API documentation page.
|
||||
- `web/frontend/parts/header.php`, `parts/footer.php`, `parts/head.php` — site title, meta tags, footer links to legal info.
|
||||
- `web/frontend/log.php` — log view template (probably hardcodes the syntax-highlighting language token — needs to handle multiple PZ file types).
|
||||
- `web/frontend/404.php` — error page copy.
|
||||
- `web/public/css/mclogs.css` — file is **renamed** to `iblogs.css` and CSS class names referencing `mclogs` are renamed.
|
||||
- `web/public/js/start.js`, `web/public/js/log.js` — likely contain text constants and reference `mclogs.css` filename.
|
||||
- `web/public/img/logo-icon.svg`, `logo.svg`, `favicon.ico` — replaced with iblogs assets.
|
||||
|
||||
**Configuration** — file `example.config.json`:
|
||||
- database name `mclogs` → `iblogs`
|
||||
- abuse contact `abuse@aternos.org` → iblogs contact (open question — see (h))
|
||||
- imprint and privacy policy links currently point at `aternos.gmbh` → iblogs equivalents
|
||||
- `mclo.gs` brand reference in the frontend styling section → new iblogs short-domain (open question)
|
||||
- worker request limit, ID length, TTL — review for iblogs-appropriate values; PZ sessions are larger than mclogs single pastes so size and line limits may need raising.
|
||||
|
||||
**Docker / deployment** — files `Dockerfile`, `docker/Caddyfile`, `docker/compose.production.yaml`, `docker/mclogs.ini`:
|
||||
- Image label maintainer references
|
||||
- Caddyfile likely hardcodes `mclo.gs` hostname for TLS certificates → replace with iblogs hostname
|
||||
- Compose service name `mclogs` → `iblogs`
|
||||
- File `docker/mclogs.ini` is renamed and its contents updated
|
||||
|
||||
**`LICENSE` file** — per MIT requirements, the original Aternos copyright line stays byte-for-byte unchanged. iblogs's LICENSE preserves the upstream copyright header. This mirrors codex's handling of its own upstream LICENSE.
|
||||
|
||||
**`README.md`** — full rewrite. Title, description, install line, links to upstream codex repo, scope statement (PZ-first, server-log triage). Drop Minecraft / Hytale framing entirely.
|
||||
|
||||
**Filter classes for PZ-specific PII** — upstream's filter chain (`src/Filter/IPv4Filter.php`, `IPv6Filter.php`, `AccessTokenFilter.php`, `UsernameFilter.php`) handles Minecraft-style PII (server access tokens, Minecraft-pattern usernames). For PZ, iblogs may need new filters: `SteamIdFilter`, `WorldCoordinateFilter`, and a PZ-aware username filter (Steam usernames look different from Minecraft ones). These are net-new code, not branding renames.
|
||||
|
||||
## (g) Migration
|
||||
|
||||
**Keep mclogs's existing single-paste API routes alive as legacy.** Two reasons:
|
||||
1. mclogs has live API consumers calling `POST /api/1/log` and `GET /api/1/log/{id}` against `mclo.gs` and self-hosted instances. Iblogs's primary value is PZ support, not breaking compat with the broader mclogs ecosystem.
|
||||
2. Under model option (iii), legacy single pastes are naturally "sessions of size 1." Zero extra schema work to support legacy routes — they just internally create singleton sessions.
|
||||
|
||||
**Strip:** `aternos/codex-minecraft`, `aternos/codex-hytale`, `aternos/sherlock` Composer deps; the `Aternos\Mclogs\` namespace; mclogs-specific branding strings; the `mclo.gs` hostname hardcodes; Minecraft-mapping deobfuscation code paths.
|
||||
|
||||
**Preserve:** the upstream `Filter` chain (it solves real problems — IP redaction, access tokens, usernames); the FrankenPHP runtime; MongoDB storage layer; the cookie-based session-history UX; the Caddy fronting.
|
||||
|
||||
## (h) Open questions
|
||||
|
||||
1. **`aternos/sherlock` license confirmation** — verified MIT (this design doc fetched the metadata) but iblogs is dropping it. No issue.
|
||||
2. **`ext-frankenphp` keep / replace decision** — recommend keep for v0.1 (path of least resistance). Migrating to vanilla nginx+php-fpm is its own project and can come later.
|
||||
3. **Branding decisions:**
|
||||
- Site name: `iblogs` (lowercase) seems chosen given the project mention `indifferentketchup/iblogs`. Confirm.
|
||||
- Tagline: needs writing. "Project Zomboid server log triage" is honest; longer-form copy is open.
|
||||
- Short-domain: mclogs uses `mclo.gs`. Is there an iblogs equivalent (`iblo.gs`? `ib.gs`?)? Affects Caddyfile, frontend assets, and docs links.
|
||||
- Accent / palette: keep mclogs green (`#5cb85c`) or pick a different colour?
|
||||
4. **Database choice:** keep MongoDB or migrate to PostgreSQL / SQLite? Migrating away from Mongo is a significant project; recommend keep for v0.1.
|
||||
5. **API URL versioning:** mclogs uses `/api/1/`. Stay with `/api/1/` for legacy paths (compat) and add `/api/session/...` for new endpoints (no version prefix), or use `/api/v2/session/...`? Recommend the former — minimum surface change.
|
||||
6. **Session-ID generation:** mclogs uses 7-character IDs. For iblogs sessions of N files, pick (a) one session-ID + N independent paste-IDs (richer URLs) or (b) single ID per paste with a sibling `session_id` field (simpler). Affects URL shape.
|
||||
7. **The codex Redactor utility.** Iblogs's redaction toggle (section d) depends on whether Step 4 (Redactor implementation) ships before or after iblogs scaffolding. **Decision deferred to user (Step 4 of the careful run).**
|
||||
8. **PZ-specific filter classes** (`SteamIdFilter`, `WorldCoordinateFilter`, etc.) — net-new work for iblogs. Could lift the regex shapes from `docs/superpowers/specs/2026-04-30-redactor-design.md` (they're the same PII categories). Implementation order: iblogs likely wants these for its upload-time filter chain regardless of whether the codex `Redactor` ships.
|
||||
9. **Multi-game support trajectory.** v0.1 of iblogs is PZ-first. If Minecraft / Hytale / SevenDaysToDie support is on the roadmap, iblogs's Detective wiring needs to be a multi-game dispatcher (not just `ProjectZomboidDetective`). Codex provides the per-game detectives separately; iblogs would compose them. Out of scope for v0.1.
|
||||
10. **The exact line-precise branding inventory** (every file:line ref of `Minecraft` / `Hytale` / `MC` / `mc` / `mclogs` / `mclo.gs` / `Aternos`). This document gives file-level pointers; the line-precise version is produced as a separate work item once the fork is cloned and grep-able.
|
||||
|
||||
## Pointers
|
||||
|
||||
- Codex package consumed: `indifferentketchup/codex` v0.1.0, tag SHA `8a89550` (annotated tag) pointing at commit `52ff8cb`.
|
||||
- Codex Redactor design (deferred): `docs/superpowers/specs/2026-04-30-redactor-design.md`.
|
||||
- Codex CHANGELOG: `CHANGELOG.md` in this repo.
|
||||
- Upstream mclogs: `https://github.com/aternosorg/mclogs` (MIT, `main` default branch, last push 2026-03-30).
|
||||
@@ -0,0 +1,246 @@
|
||||
# PZ deterministic classifier — design spec
|
||||
|
||||
> Drafted 2026-05-04. Status: design-approved, awaiting implementation plan.
|
||||
> Sibling tool to the existing pre-production Qwen analyzer (`pz_error_analysis.py`), which is unaffected by this work.
|
||||
|
||||
## Summary
|
||||
|
||||
A new deterministic-only Project Zomboid log classifier that lives alongside the existing Qwen-based analyzer in `tools/pz-analyzer/`. Walks redacted `DebugLog-server*.txt` files, extracts errors/warnings, attributes each to a mod where evidence allows, classifies by kind, and emits a structured JSON report. **Zero AI dependency**: this is the artefact that informs the future PHP / iblogs production path.
|
||||
|
||||
The patterns it implements are inspired by `paraxaQQ/pzmm`'s `core/inspector.py`: Lua mod-marker attribution, multi-fallback file:line extraction, bidirectional stack collection, cause-chain unwinding, and engine-noise tagging. The implementation is written from scratch; no code is copied verbatim.
|
||||
|
||||
## Why a separate tool, not an edit of `pz_error_analysis.py`
|
||||
|
||||
Two artefacts, two purposes:
|
||||
|
||||
- `pz_error_analysis.py` (existing, untouched) — pre-production discovery tool. Sends residual log content to Qwen so the developer can see what categories the deterministic side hasn't yet captured.
|
||||
- `pz_classify.py` (new) — production-bound deterministic classifier. Output is what an iblogs PHP port would eventually emit. Runs in seconds, no API dependency, no PII-going-to-LLM consideration.
|
||||
|
||||
Keeping both tools side by side lets the developer compare their outputs and treat the LLM's residual output as the "deterministic to-do list."
|
||||
|
||||
## Scope
|
||||
|
||||
**In scope:**
|
||||
- Two new files: `tools/pz-analyzer/pz_parser.py` (pure module) and `tools/pz-analyzer/pz_classify.py` (CLI orchestrator).
|
||||
- Tests under `tools/pz-analyzer/tests/` with synthetic fixtures.
|
||||
- Operates exclusively on the already-redacted directory produced by `pz_redact_all.sh` (`.scratch/pz/Logs.redacted/`).
|
||||
|
||||
**Out of scope:**
|
||||
- Any modification to `pz_error_analysis.py`, `pz_redact_all.sh`, or PHP codex source.
|
||||
- Filesystem-based mod-scan reattribution (pzmm's symbol-index, vehicle-index, file-path-ownership reattribution requires an actual mod folder we don't have on the server side).
|
||||
- iblogs / bosslogs integration. The output schema is designed with that future port in mind, but no PHP code is written here.
|
||||
- Generic AI tab patterns from pzmm's `core/ai.py`. Explicitly excluded.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
redacted .txt files
|
||||
|
|
||||
v
|
||||
+---------------------------+
|
||||
| pz_classify.py | argparse · directory walk · aggregate · JSON write
|
||||
| (orchestrator) |
|
||||
+-------------+-------------+
|
||||
|
|
||||
v
|
||||
+---------------------------+
|
||||
| pz_parser.py | regexes · parse · classify · sign
|
||||
| (pure module, no I/O |
|
||||
| beyond reading the path |
|
||||
| it is handed) |
|
||||
+---------------------------+
|
||||
```
|
||||
|
||||
Two files inside `tools/pz-analyzer/`:
|
||||
|
||||
- **`pz_parser.py`** — stateless. All regex constants, `parse_file(path) -> list[Entry]`, attribution helpers, file:line extractors, cause-chain extractor, signature computation. No `argparse`, no JSON writing, no directory walking. Unit-testable in isolation.
|
||||
- **`pz_classify.py`** — entry point. CLI args, walks the redacted directory, calls `pz_parser`, aggregates records by signature, writes JSON, prints a one-line stats summary.
|
||||
|
||||
The split is deliberate: `pz_parser.py` is the module that eventually wants to be ported to PHP codex (separate spec). Keeping it pure makes that port mechanical and Python-side tests trivial.
|
||||
|
||||
## Parser pipeline phases
|
||||
|
||||
For each `*DebugLog-server*.txt`, the parser walks lines once and emits records via the following phases.
|
||||
|
||||
### 1. Severity-prefix recognition
|
||||
|
||||
Regex: `^\s*(ERROR|SEVERE|WARN)\s*[:\s]`. This is broader than the existing `pz_error_analysis.py` regex: it adds `SEVERE` (the `java.util.logging` convention, which appears in some PZ Java exception blocks). `LOG` and `INFO` lines are ignored at this layer.
|
||||
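A minimal Python sketch of this phase, using the regex exactly as written above. The helper name `severity_of` is hypothetical, and the sample lines in the usage note are synthetic.

```python
import re
from typing import Optional

# Regex copied from the spec text for phase 1.
SEVERITY_RE = re.compile(r"^\s*(ERROR|SEVERE|WARN)\s*[:\s]")


def severity_of(line: str) -> Optional[str]:
    """Return the severity token if the line opens with one, else None."""
    match = SEVERITY_RE.match(line)
    return match.group(1) if match else None
```

With this shape, `severity_of("WARN : something")` yields `"WARN"`, while a `LOG  : ...` line yields `None`. Note that `WARNING: ...` also yields `None`, because the token must be followed by a colon or whitespace.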
|
||||
### 2. Stack collection — bidirectional
|
||||
|
||||
This pattern comes from pzmm: PZ emits stack frames *before* the ERROR/WARN line as often as after it.
|
||||
|
||||
- **Pre-stack**: walk up to 25 lines back from the severity line. Stop at another severity line or once 8 lines are collected. Keep the block only if at least one line looks stack-shaped (`at `, `[string ...]`, `function:`, `file:`, `.lua` markers).
|
||||
- **Post-stack**: walk forward up to 25 lines, gated by engine-noise detection. Stop at another severity line or once 8 lines are collected.
|
||||
- Merge deduped, preserving order; cap at 8 frames per record.
|
||||
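The bidirectional walk above could look roughly like the following Python sketch. Several details are assumptions the spec leaves open: "gated by engine-noise detection" is read here as "skip noise lines", the post-stack keeps only stack-shaped lines, blank lines are dropped, and the names `collect_stack`, `STACK_RE`, `WINDOW` and `CAP` are hypothetical.

```python
import re

SEVERITY_RE = re.compile(r"^\s*(ERROR|SEVERE|WARN)\s*[:\s]")
# Stack-shaped markers listed in the pre-stack bullet.
STACK_RE = re.compile(r"\bat \S|\[string |function:|file:|\.lua")
# Engine-noise markers from phase 7 (compared lowercase).
NOISE = ("kahluathread.flusherrormessage", "dumping lua stack trace")
WINDOW = 25
CAP = 8


def collect_stack(lines, idx):
    """Collect up to CAP frames around the severity line at lines[idx]."""
    pre = []
    for j in range(idx - 1, max(idx - WINDOW - 1, -1), -1):
        if SEVERITY_RE.match(lines[j]) or len(pre) >= CAP:
            break
        if lines[j].strip():
            pre.append(lines[j].strip())
    pre.reverse()
    # Keep the pre-block only if something in it looks like a stack frame.
    if not any(STACK_RE.search(line) for line in pre):
        pre = []

    post = []
    for j in range(idx + 1, min(idx + WINDOW + 1, len(lines))):
        if SEVERITY_RE.match(lines[j]) or len(post) >= CAP:
            break
        line = lines[j].strip()
        if any(marker in line.lower() for marker in NOISE):
            continue  # assumption: noise lines are skipped, not terminal
        if STACK_RE.search(line):
            post.append(line)

    merged = []
    for frame in pre + post:
        if frame not in merged:  # dedup while preserving order
            merged.append(frame)
    return merged[:CAP]
```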
|
||||
### 3. Mod attribution — three buckets
|
||||
|
||||
| Bucket | Trigger | Confidence |
|
||||
|---|---|---|
|
||||
| `direct` | Line itself matches `Lua\(\(MOD:([^)]+)\)\)` (or the `require("X") failed` shape, or an explicit `needed by <mod>` hint elsewhere in the entry) | `high` |
|
||||
| `inferred` | No marker on this line, but body is Lua-shaped (see below) *and* a `Lua((MOD:Y))` was emitted within the previous 40 lines | `medium` |
|
||||
| `unattributed` | Neither of the above | `low`; `mod_id = "__unattributed__"` |
|
||||
|
||||
"Lua-shaped" means the body matches at least one of (case-insensitive): `luamanager.getfunctionobject`, `no such function`, `exception thrown`, `runtimeexception`, `illegalstateexception`, or contains the bare token `lua`. This filter prevents inferred attribution from latching onto unrelated severity lines that happened to fall within the lookback window.
|
||||
|
||||
`mod_id` derives from the marker's raw name with a `_norm_mod_key` transform: lowercase, strip spaces / apostrophes / hyphens. `mod_name` preserves the human-readable form.
|
||||
|
||||
We do **not** attempt pzmm's filesystem-based reattribution.
|
||||
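The three buckets can be sketched as follows. This is a simplified illustration under stated assumptions: only the `Lua((MOD:...))` marker is handled for `direct` (the `require("X") failed` and `needed by <mod>` triggers are omitted), "the bare token `lua`" is read as a word-boundary match, the empty `mod_name` for unattributed records is a guess, and `attribute` / `norm_mod_key` are illustrative names.

```python
import re

MOD_MARKER = re.compile(r"Lua\(\(MOD:([^)]+)\)\)")
LUA_SHAPED = (
    "luamanager.getfunctionobject",
    "no such function",
    "exception thrown",
    "runtimeexception",
    "illegalstateexception",
)
LUA_TOKEN = re.compile(r"\blua\b", re.IGNORECASE)
LOOKBACK = 40


def norm_mod_key(name):
    """Lowercase and strip spaces, apostrophes and hyphens."""
    return re.sub(r"[ '\-]", "", name.lower())


def is_lua_shaped(body):
    low = body.lower()
    return any(s in low for s in LUA_SHAPED) or bool(LUA_TOKEN.search(body))


def _hit(name, attribution, confidence):
    return {
        "mod_id": norm_mod_key(name),
        "mod_name": name,
        "attribution": attribution,
        "confidence": confidence,
    }


def attribute(lines, idx):
    """Attribute the severity line at lines[idx] to a mod where evidence allows."""
    match = MOD_MARKER.search(lines[idx])
    if match:
        return _hit(match.group(1), "direct", "high")
    if is_lua_shaped(lines[idx]):
        # Look back at most LOOKBACK lines for the most recent marker.
        for j in range(idx - 1, max(idx - LOOKBACK - 1, -1), -1):
            match = MOD_MARKER.search(lines[j])
            if match:
                return _hit(match.group(1), "inferred", "medium")
    return {
        "mod_id": "__unattributed__",
        "mod_name": "",
        "attribution": "unattributed",
        "confidence": "low",
    }
```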
|
||||
### 4. File:line extraction — five fallbacks
|
||||
|
||||
Tried in order against the entry body and stack frames:
|
||||
|
||||
1. `at <path>.lua:<n>`
|
||||
2. `function: ... file: <path>.lua line #<n>` (or `: <n>`)
|
||||
3. `[string "<path>.lua"]:<n>`
|
||||
4. quoted path ending in `.lua` / `.txt` / `.xml` / `.json` / `.ini` / `.cfg` / `.bin`
|
||||
5. unquoted path segment beginning with `media/`, `maps/`, `lua/`, `scripts/`
|
||||
|
||||
Returns `(file, line)`; `line=0` if the matched form had no line number.
|
||||
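A possible Python rendering of the five fallbacks, tried in the listed order against the body and then each stack frame. The regexes are this sketch's interpretation of the prose shapes, not the committed patterns, and the `(None, 0)` return for "no match" is an assumption.

```python
import re

PATTERNS = [
    # 1. at <path>.lua:<n>
    re.compile(r"\bat\s+(?P<file>\S+\.lua):(?P<line>\d+)"),
    # 2. function: ... file: <path>.lua line #<n>  (or ": <n>")
    re.compile(
        r"function:.*?file:\s*(?P<file>\S+\.lua)\s+line\s*[#:]\s*(?P<line>\d+)"
    ),
    # 3. [string "<path>.lua"]:<n>
    re.compile(r'\[string "(?P<file>[^"]+\.lua)"\]:(?P<line>\d+)'),
    # 4. quoted path with a known extension (no line number)
    re.compile(
        r"""["'](?P<file>[^"']+\.(?:lua|txt|xml|json|ini|cfg|bin))["']"""
    ),
    # 5. unquoted path segment with a known root (no line number)
    re.compile(r"\b(?P<file>(?:media|maps|lua|scripts)/\S+)"),
]


def extract_file_line(body, stack=()):
    """Return (file, line); line is 0 when the matched form carries none."""
    texts = [body, *stack]
    for pattern in PATTERNS:
        for text in texts:
            match = pattern.search(text)
            if match:
                line = match.groupdict().get("line")
                return match.group("file"), int(line) if line else 0
    return None, 0
```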
|
||||
### 5. Cause-chain extraction
|
||||
|
||||
`Caused by: <X>` chains plus standalone exception lines (`(\w+\.)+\w+(Exception|Error): <msg>`) are normalised to `<ExceptionClass>: <msg>` tokens and joined with ` -> `. Up to 6 chain levels, deduped. Captures both Java exception nesting and Lua-wrapped exception chains.
|
||||
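The cause-chain step might be sketched like this. Assumptions are labelled in the comments: the exception class is reduced to its simple (unqualified) name, lines that begin with `at ` are skipped so stack frames do not produce false matches, and `cause_chain` is a hypothetical helper name.

```python
import re

# Exception shape from the spec, with an optional "Caused by:" prefix.
EXC_RE = re.compile(
    r"(?:Caused by:\s*)?"
    r"(?P<cls>(?:\w+\.)+\w+(?:Exception|Error))"
    r"(?::\s*(?P<msg>.*))?"
)
MAX_LEVELS = 6


def cause_chain(lines):
    """Join up to MAX_LEVELS deduplicated '<Class>: <msg>' tokens with ' -> '."""
    tokens = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("at "):
            continue  # assumption: Java stack frames are not chain links
        match = EXC_RE.search(line)
        if not match:
            continue
        # Assumption: keep only the simple class name.
        simple = match.group("cls").rsplit(".", 1)[-1]
        msg = (match.group("msg") or "").strip()
        token = f"{simple}: {msg}" if msg else simple
        if token not in tokens:
            tokens.append(token)
        if len(tokens) == MAX_LEVELS:
            break
    return " -> ".join(tokens)
```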
|
||||
### 6. Java exception kind detection
|
||||
|
||||
DebugLog-server has both Lua and Java exceptions; pzmm targets `console.txt` which is Lua-dominant. Extension here:
|
||||
|
||||
- `kind = "java_exception"` when the entry body or stack contains `(\w+\.)+\w+(Exception|Error)` AND no `Lua((MOD:X))` marker is present anywhere in the entry.
|
||||
- These typically resolve to `mod_id: __unattributed__` because Java code in PZ is engine code, not mod code. The exception class name becomes part of the message skeleton so similar Java exceptions dedup tightly.
|
||||
|
||||
### 7. Engine-noise tagging
|
||||
|
||||
`kind = "engine_noise"` when the body contains `kahluathread.flusherrormessage` or `dumping lua stack trace`. These severity-ERROR lines are PZ's own diagnostic chatter about its error reporting, not actual errors. They stay in the output (consumer can filter on `kind`).
|
||||
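Phases 6 and 7 together amount to a small decision function. The sketch below covers only the two kinds whose rules are spelled out here; checking engine noise before the Java test is an assumption about ordering, and `detect_kind` returns `None` for everything else rather than guessing at the rules for the remaining kinds.

```python
import re

JAVA_EXC = re.compile(r"(\w+\.)+\w+(Exception|Error)")
MOD_MARKER = re.compile(r"Lua\(\(MOD:([^)]+)\)\)")
NOISE = ("kahluathread.flusherrormessage", "dumping lua stack trace")


def detect_kind(body, stack=()):
    """Return 'engine_noise', 'java_exception', or None for other kinds."""
    if any(marker in body.lower() for marker in NOISE):
        return "engine_noise"
    entry = "\n".join([body, *stack])
    # Java exception shape with no Lua mod marker anywhere in the entry.
    if JAVA_EXC.search(entry) and not MOD_MARKER.search(entry):
        return "java_exception"
    return None
```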
|
||||
### 8. Signature computation
|
||||
|
||||
Two-level deterministic identity, both stored on every record:
|
||||
|
||||
```
|
||||
pattern_id = sha256(level + normalized_first_line)[:16]
|
||||
signature = sha256(pattern_id + mod_id)[:16]
|
||||
```
|
||||
|
||||
Normalization for `pattern_id`:
|
||||
- Strip session metadata prefix (`General f:N, t:N, st:N,N,N,N>` shape)
|
||||
- Strip body-prefix severity token (`ERROR:` / `SEVERE:` / `WARN:` / `FATAL:`, case-insensitive) so a body that opens with the severity word still hashes the same as one that doesn't.
|
||||
- Flatten double- and single-quoted strings to `"<S>"` / `'<S>'`
|
||||
- Flatten ≥2-digit numeric runs to `<N>`
|
||||
- Collapse whitespace
|
||||
- Truncate to 200 chars
|
||||
|
||||
Together the two fields support two consumer views, neither of which requires an LLM:
|
||||
|
||||
- **Per-mod view** (signature is the dedup key): one record per `(mod_id, error_shape)` pair.
|
||||
- **Pattern fan-out view** (group records by `pattern_id`): see all mods that hit the same shape.
|
||||
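The two-level identity and its normalization steps can be sketched with `hashlib`. The regexes are this sketch's reading of the bullet list (applied in the listed order), the metadata-prefix pattern treats the `t:` field as optional, and the helper names are hypothetical. The `sha256:` display prefix shown in the output schema is left out here.

```python
import hashlib
import re

META_PREFIX = re.compile(r"^\s*\w+\s+f:\d+,\s*(?:t:\d+,\s*)?st:\d+(?:,\d+)*>\s*")
SEV_PREFIX = re.compile(r"^\s*(?:ERROR|SEVERE|WARN|FATAL)\s*:\s*", re.IGNORECASE)


def normalize_first_line(text):
    text = META_PREFIX.sub("", text)            # strip session metadata prefix
    text = SEV_PREFIX.sub("", text)             # strip body-prefix severity token
    text = re.sub(r'"[^"]*"', '"<S>"', text)    # flatten double-quoted strings
    text = re.sub(r"'[^']*'", "'<S>'", text)    # flatten single-quoted strings
    text = re.sub(r"\d{2,}", "<N>", text)       # flatten numeric runs of 2+ digits
    text = " ".join(text.split())               # collapse whitespace
    return text[:200]


def _short_hash(value):
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def pattern_id(level, first_line):
    return _short_hash(level + normalize_first_line(first_line))


def signature(level, first_line, mod_id):
    return _short_hash(pattern_id(level, first_line) + mod_id)
```

Two lines that differ only in metadata, quoted names, numbers, or spacing then share a `pattern_id`, while the same shape attributed to different mods yields different `signature` values.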
|
||||
### 9. Aggregation
|
||||
|
||||
Records dedup on `signature`. On second-and-subsequent occurrences: `occurrence_count++`, `files` set-extends, attribution-confidence promotes (direct beats inferred beats unattributed), stack and `cause_chain` merge.
|
||||
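A rough sketch of the aggregation step, assuming each parsed record is a dict carrying `signature`, `attribution`, `confidence`, `stack`, and a hypothetical `source_file` key naming the log it came from. `cause_chain` merging is omitted because the merge rule is not specified; the 8-frame cap is carried over from phase 2.

```python
RANK = {"unattributed": 0, "inferred": 1, "direct": 2}
STACK_CAP = 8


def aggregate(records):
    """Deduplicate records on 'signature', merging counts, files and stacks."""
    merged = {}
    for rec in records:
        sig = rec["signature"]
        cur = merged.get(sig)
        if cur is None:
            cur = {k: v for k, v in rec.items() if k != "source_file"}
            cur["occurrence_count"] = 1
            cur["files"] = [rec["source_file"]]
            cur["stack"] = list(rec["stack"])
            merged[sig] = cur
            continue
        cur["occurrence_count"] += 1
        if rec["source_file"] not in cur["files"]:
            cur["files"].append(rec["source_file"])
        # Promote attribution: direct beats inferred beats unattributed.
        if RANK[rec["attribution"]] > RANK[cur["attribution"]]:
            cur["attribution"] = rec["attribution"]
            cur["confidence"] = rec["confidence"]
        for frame in rec["stack"]:
            if frame not in cur["stack"] and len(cur["stack"]) < STACK_CAP:
                cur["stack"].append(frame)
    return list(merged.values())
```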
|
||||
## Output schema
|
||||
|
||||
```json
|
||||
{
|
||||
"meta": {
|
||||
"input_dir": "/opt/ik-codex/.scratch/pz/Logs.redacted",
|
||||
"files_scanned": 6,
|
||||
"log_lines_total": 78654,
|
||||
"error_lines_total": 30984,
|
||||
"unique_signatures": N,
|
||||
"unique_patterns": M,
|
||||
"redacted": true,
|
||||
"started": "ISO8601",
|
||||
"finished": "ISO8601"
|
||||
},
|
||||
"signatures": [
|
||||
{
|
||||
"signature": "sha256:...",
|
||||
"pattern_id": "sha256:...",
|
||||
"level": "ERROR",
|
||||
"kind": "lua_runtime|require_failed|java_exception|engine_noise|runtime",
|
||||
"mod_id": "spongiesclothing",
|
||||
"mod_name": "Spongie's Clothing",
|
||||
"attribution": "direct|inferred|unattributed",
|
||||
"confidence": "high|medium|low",
|
||||
"attribution_reason": "...",
|
||||
"file": "media/lua/client/X.lua",
|
||||
"line": 42,
|
||||
"cause_chain": "ExceptionA: msg -> ExceptionB: msg",
|
||||
"stack": ["at A.lua:12", "at B.lua:34"],
|
||||
"first_seen": {"file": "...", "line": 1234, "timestamp": "26-04-26 17:14:35.128"},
|
||||
"occurrence_count": 47,
|
||||
"files": ["..."],
|
||||
"excerpt": "..."
|
||||
}
|
||||
],
|
||||
"summary": {
|
||||
"errors": N,
|
||||
"warnings": N,
|
||||
"by_kind": {"lua_runtime": ..., "java_exception": ..., "require_failed": ..., "engine_noise": ..., "runtime": ...},
|
||||
"by_attribution": {"direct": ..., "inferred": ..., "unattributed": ...},
|
||||
"by_confidence": {"high": ..., "medium": ..., "low": ...},
|
||||
"top_mods": [{"mod_id": "...", "mod_name": "...", "occurrence_count": N}, ...]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Default output path: `/opt/ik-codex/.scratch/pz/classify.json` (gitignored under `.scratch/`).
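
A consumer might sanity-check each entry of `signatures` against the enumerated values in the schema before rendering it. The following is a minimal sketch; `validate_record` is a hypothetical helper, and the checks cover only constraints that the schema above spells out (the `sha256:` prefix, the three enumerations, and a positive occurrence count).

```python
# Allowed values copied from the schema's "a|b|c" enumerations.
KINDS = {"lua_runtime", "require_failed", "java_exception", "engine_noise", "runtime"}
ATTRIBUTIONS = {"direct", "inferred", "unattributed"}
CONFIDENCES = {"high", "medium", "low"}


def validate_record(rec: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record looks sane."""
    problems = []
    for key in ("signature", "pattern_id"):
        if not str(rec.get(key, "")).startswith("sha256:"):
            problems.append(f"{key}: expected 'sha256:' prefix")
    for key, allowed in (("kind", KINDS),
                         ("attribution", ATTRIBUTIONS),
                         ("confidence", CONFIDENCES)):
        if rec.get(key) not in allowed:
            problems.append(f"{key}: unexpected value {rec.get(key)!r}")
    count = rec.get("occurrence_count")
    if not isinstance(count, int) or count < 1:
        problems.append("occurrence_count: expected a positive integer")
    return problems
```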

## CLI

```
pz_classify.py [--input <dir>] [--out <path>] [--quiet]
```

- `--input` defaults to `<repo>/.scratch/pz/Logs.redacted`
- `--out` defaults to `<repo>/.scratch/pz/classify.json`
- `--quiet` suppresses the trailing summary line

No `--limit`, `--resume`, or `--checkpoint-every`. Runs in seconds; nothing to throttle or resume.
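
This interface maps directly onto the standard-library `argparse` module. The sketch below is one possible wiring, not the tool's actual entry point: `build_parser` and its `repo_root` parameter are hypothetical, introduced so the `<repo>` defaults can be resolved explicitly.

```python
import argparse
from pathlib import Path


def build_parser(repo_root: Path) -> argparse.ArgumentParser:
    """Build the three-flag CLI with defaults under <repo>/.scratch/pz/."""
    scratch = repo_root / ".scratch" / "pz"
    parser = argparse.ArgumentParser(prog="pz_classify.py")
    parser.add_argument("--input", type=Path, metavar="<dir>",
                        default=scratch / "Logs.redacted",
                        help="directory of redacted logs to classify")
    parser.add_argument("--out", type=Path, metavar="<path>",
                        default=scratch / "classify.json",
                        help="where to write the classification JSON")
    parser.add_argument("--quiet", action="store_true",
                        help="suppress the trailing summary line")
    return parser
```

Because no other flags are registered, `argparse` rejects `--limit`, `--resume`, and `--checkpoint-every` with a usage error rather than silently ignoring them.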

## Tests

New directory `tools/pz-analyzer/tests/`. Stdlib `unittest`. Three files, ~18 tests total.

- **`test_parser.py`** (~10 tests): one fixture per scenario in `tests/fixtures/` (synthetic, tracked in git): pure-Lua-attributed, pure-Java-exception, inferred-from-context, unattributed-engine-noise, multi-cause-chain, pre-stack-collection, post-stack-collection, severity-variants, file-line-extraction-fallbacks. All synthetic identifiers (placeholder Steam IDs / mod names) per the existing PHP-side `test/src/Games/ProjectZomboid/fixtures/` convention.
- **`test_attribution.py`** (~5 tests): three confidence buckets, the 40-line lookback boundary, "needed by X" extraction, and the rejection of inferred attribution when the message isn't Lua-shaped.
- **`test_signatures.py`** (~3 tests): `pattern_id` stability across formatting variations (whitespace, numeric values, quoted strings) and `signature` uniqueness across mods.

Invocation: `python -m unittest discover tools/pz-analyzer/tests/`. No external deps.

## Verification

End-to-end smoke against the redacted real-data directory:

```
bash /opt/ik-codex/tools/pz-analyzer/pz_redact_all.sh   # one-time, already done
python /opt/ik-codex/tools/pz-analyzer/pz_classify.py
```

Expect:
- 6 files scanned, ~30,984 error lines processed.
- A meaningful number of unique signatures and patterns (likely in the low hundreds for signatures; fewer patterns).
- `top_mods` lists the highest-occurrence mods.
- PII audit: no real Steam IDs, IPs, or coordinates in the output JSON (input is already redacted; classifier doesn't introduce PII).
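
The PII audit can be partly automated. The sketch below scans text (for example the contents of `classify.json`) for Steam IDs other than the zeroed placeholder and for valid IPv4 addresses. The Steam ID pattern and placeholder mirror the constants in `ProjectZomboidRedactor` from this change set; the `audit_text` helper and the coarse IPv4 candidate regex are assumptions, and coordinates and IPv6 are not covered.

```python
import ipaddress
import re

# Same shape as ProjectZomboidRedactor::STEAM_ID_REGEX / STEAM_ID_REPLACEMENT.
STEAM_ID = re.compile(r"(?<![A-Za-z0-9])76561198\d{9}(?![A-Za-z0-9])")
STEAM_PLACEHOLDER = "76561198000000000"
# Coarse dotted-quad candidate; validated below with the ipaddress module.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")


def audit_text(text: str) -> list:
    """Return (kind, value) pairs for anything that still looks like PII."""
    findings = []
    for match in STEAM_ID.finditer(text):
        if match.group(0) != STEAM_PLACEHOLDER:
            findings.append(("steam_id", match.group(0)))
    for match in IPV4_CANDIDATE.finditer(text):
        try:
            ipaddress.IPv4Address(match.group(0))
        except ValueError:
            continue  # not a valid address (e.g. an octet above 255)
        findings.append(("ipv4", match.group(0)))
    return findings
```

A four-part version string would be reported as an IPv4 address by this check, so findings are prompts for manual review rather than proof of a leak.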

Test invocation: `python -m unittest discover tools/pz-analyzer/tests/` should be all-green.

## Risks and open questions

- **Inferred attribution accuracy.** The 40-line lookback is pzmm's heuristic; it's correct for tightly-paced server bursts but can mis-attribute when an unrelated mod logs in the gap. Inferred attributions surface as `confidence: medium` so consumers can choose to treat them differently. Acceptable for v1; tunable via a constant in `pz_parser.py`.
- **Pzmm targets `console.txt`, we target `DebugLog-server.txt`.** Format overlap is high (both share `Lua((MOD:X))` markers, Caused-by chains, Java exception shapes), but some patterns may be `console.txt`-specific. Tests use `DebugLog-server`-shaped fixtures only.
- **Future PHP port.** `pz_parser.py` is structured for mechanical translation to a `LuaErrorAnalyser` / `ModAttributionAnalyser` pair under `src/Analyser/ProjectZomboid/` in a separate spec. Output schema chosen to be PHP-codex-compatible (Insight subclasses with typed fields).
- **Licence.** The `paraxaQQ/pzmm` zip we reviewed has no top-level LICENSE, so this spec mandates writing the patterns from scratch rather than copying code. Regex shapes and heuristics are general programming patterns rather than author-specific work, and no code blocks are lifted verbatim.
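
To make the lookback risk concrete, here is a minimal sketch of the 40-line inferred-attribution heuristic. It is an illustration only: the marker regex is derived from the `Lua((MOD:X))` shape mentioned above, the `attribute` function is hypothetical, the mapping of direct to `high` and unattributed to `low` is an assumption (only inferred to `medium` is stated), and the "Lua-shaped message" rejection rule is omitted.

```python
import re

LOOKBACK_LINES = 40  # the tunable constant referred to above
MOD_MARKER = re.compile(r"\(\(MOD:([^)]+)\)\)")


def attribute(lines, error_index):
    """Return (mod_id, attribution, confidence) for the error at lines[error_index]."""
    direct = MOD_MARKER.search(lines[error_index])
    if direct:
        return direct.group(1), "direct", "high"
    # Walk backwards over at most LOOKBACK_LINES preceding lines, nearest first.
    start = max(0, error_index - LOOKBACK_LINES)
    for idx in range(error_index - 1, start - 1, -1):
        marker = MOD_MARKER.search(lines[idx])
        if marker:
            return marker.group(1), "inferred", "medium"
    return None, "unattributed", "low"
```

The mis-attribution case is visible here: the nearest marker wins, so an unrelated mod that logs between the real culprit and the error takes the blame.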

## Out of scope (explicit)

- Editing `pz_error_analysis.py` or `pz_redact_all.sh`.
- Modifying any file in `/opt/ik-codex/src/`, `/opt/ik-codex/test/`, or `/opt/iblogs/`.
- AI / LLM integration of any kind in the new tool.
- LLM inference at runtime in iblogs / bosslogs production. The Qwen analyzer (`pz_error_analysis.py`) is a developer-only discovery tool used to expand the deterministic ruleset in `pz_parser.py` (and its future PHP port). Production rendering is deterministic-only, forever.
- iblogs front-end rendering of the classification output.
- Filesystem mod-scan reattribution (pzmm's symbol/vehicle indexes).
131
src/Analyser/ProjectZomboid/ErrorContextAnalyser.php
Normal file
@@ -0,0 +1,131 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Analyser\ProjectZomboid;
|
||||
|
||||
use IndifferentKetchup\Codex\Analyser\Analyser;
|
||||
use IndifferentKetchup\Codex\Analysis\Analysis;
|
||||
use IndifferentKetchup\Codex\Analysis\AnalysisInterface;
|
||||
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextProblem;
|
||||
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextTruncatedInformation;
|
||||
use IndifferentKetchup\Codex\Log\EntryInterface;
|
||||
use IndifferentKetchup\Codex\Log\Level;
|
||||
|
||||
/**
|
||||
* Surfaces ERROR or WARNING entries with a sliding context window of
|
||||
* surrounding entries, so a viewer can see the lead-up and aftermath of
|
||||
* each event without scanning the full log. PatternAnalyser cannot
|
||||
* express this because windows span multiple entries; this walks once,
|
||||
* classifies by Level (already resolved by the parser), and emits one
|
||||
* ErrorContextProblem per hit.
|
||||
*
|
||||
* Stack-trace continuation lines are absorbed into the same Entry as the
|
||||
* level header that preceded them by PatternParser, so noise filtering
|
||||
* happens at parse time — windows here count Entries, not raw lines, and
|
||||
* a stack-trace ERROR contributes exactly one window.
|
||||
*
|
||||
* Overlapping windows are de-duplicated rather than emitted twice: when
* two error/warning entries fall within CONTEXT_BEFORE + CONTEXT_AFTER
* of each other, the later window's before- and after-ranges are clipped
* to start past the previously emitted range so no Entry appears in two
* context arrays. The hit cap is checked before each emission: once
* HIT_CAP problems have been emitted, the next hit stops the walk and
* adds an ErrorContextTruncatedInformation to the analysis instead of
* a further problem.
|
||||
*/
|
||||
class ErrorContextAnalyser extends Analyser
|
||||
{
|
||||
/**
|
||||
* Number of entries preceding a hit captured as leading context.
|
||||
* Twenty entries is wide enough to surface the immediate precursor
|
||||
* events (mod load, player join, prior warning) for a server-log
|
||||
* error without dragging in unrelated activity from minutes earlier.
|
||||
*/
|
||||
public const int CONTEXT_BEFORE = 20;
|
||||
|
||||
/**
|
||||
* Number of entries following a hit captured as trailing context.
|
||||
* Mirrors CONTEXT_BEFORE so windows are symmetric and the maximum
|
||||
* window size is CONTEXT_BEFORE + 1 (hit) + CONTEXT_AFTER = 41
|
||||
* entries.
|
||||
*/
|
||||
public const int CONTEXT_AFTER = 20;
|
||||
|
||||
/**
|
||||
* Maximum number of hits emitted before truncation. Caps memory and
|
||||
* output size on logs with cascading errors (e.g. a save-system
|
||||
* failure that produces an error every tick). Reaching the cap adds
|
||||
* an ErrorContextTruncatedInformation to the analysis so consumers
|
||||
* can flag truncation rather than silently dropping later hits.
|
||||
*/
|
||||
public const int HIT_CAP = 500;
|
||||
|
||||
public function analyse(): AnalysisInterface
|
||||
{
|
||||
$analysis = new Analysis();
|
||||
$analysis->setLog($this->log);
|
||||
|
||||
$entries = [];
|
||||
foreach ($this->log as $entry) {
|
||||
$entries[] = $entry;
|
||||
}
|
||||
$count = count($entries);
|
||||
|
||||
$hits = 0;
|
||||
$truncated = false;
|
||||
$lastEmittedIndex = -1;
|
||||
|
||||
for ($i = 0; $i < $count; $i++) {
|
||||
$type = $this->classify($entries[$i]);
|
||||
if ($type === null) {
|
||||
continue;
|
||||
}
|
||||
|
||||
if ($hits >= self::HIT_CAP) {
|
||||
$truncated = true;
|
||||
break;
|
||||
}
|
||||
|
||||
$beforeStart = max($lastEmittedIndex + 1, $i - self::CONTEXT_BEFORE);
|
||||
if ($beforeStart > $i) {
|
||||
$beforeStart = $i;
|
||||
}
|
||||
$afterStart = max($lastEmittedIndex + 1, $i + 1);
|
||||
$afterEnd = min($count - 1, $i + self::CONTEXT_AFTER);
|
||||
$afterLength = max(0, $afterEnd - $afterStart + 1);
|
||||
|
||||
$analysis->addInsight((new ErrorContextProblem())
|
||||
->setEntry($entries[$i])
|
||||
->setType($type)
|
||||
->setEntryIndex($i + 1)
|
||||
->setBefore(array_slice($entries, $beforeStart, $i - $beforeStart))
|
||||
->setAfter(array_slice($entries, $afterStart, $afterLength)));
|
||||
|
||||
$hits++;
|
||||
$lastEmittedIndex = max($lastEmittedIndex, $afterEnd);
|
||||
}
|
||||
|
||||
if ($truncated) {
|
||||
$analysis->addInsight((new ErrorContextTruncatedInformation())
|
||||
->setHitCap(self::HIT_CAP));
|
||||
}
|
||||
|
||||
return $analysis;
|
||||
}
|
||||
|
||||
/**
|
||||
* Classify an entry as 'error', 'warning', or null based on its Level.
* Levels at least as severe as ERROR, i.e. with an integer value at or
* below ERROR's (EMERGENCY/ALERT/CRITICAL/ERROR), collapse into 'error';
* WARNING alone collapses into 'warning'. Returns null for anything less
* severe so the analyser skips it.
|
||||
*/
|
||||
protected function classify(EntryInterface $entry): ?string
|
||||
{
|
||||
$level = $entry->getLevel()->asInt();
|
||||
if ($level <= Level::ERROR->asInt()) {
|
||||
return 'error';
|
||||
}
|
||||
if ($level === Level::WARNING->asInt()) {
|
||||
return 'warning';
|
||||
}
|
||||
return null;
|
||||
}
|
||||
}
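
The clipping arithmetic in `ErrorContextAnalyser::analyse()` is easier to follow on bare indices. The sketch below is a Python transliteration of that loop for illustration only; it is not part of the change set, `context_windows` is a hypothetical name, it takes a list of booleans instead of Entry objects, and it leaves out the `HIT_CAP` check.

```python
CONTEXT_BEFORE = 20
CONTEXT_AFTER = 20


def context_windows(is_hit):
    """For each hit, return (1-based entry index, before indices, after indices).

    Indices inside the before/after lists are 0-based positions in is_hit.
    """
    count = len(is_hit)
    last_emitted = -1  # highest 0-based index already used as after-context
    windows = []
    for i, hit in enumerate(is_hit):
        if not hit:
            continue
        # Start past anything already emitted, but never beyond the hit itself.
        before_start = min(i, max(last_emitted + 1, i - CONTEXT_BEFORE))
        after_start = max(last_emitted + 1, i + 1)
        after_end = min(count - 1, i + CONTEXT_AFTER)
        before = list(range(before_start, i))
        after = list(range(after_start, after_end + 1))  # empty if fully consumed
        windows.append((i + 1, before, after))
        last_emitted = max(last_emitted, after_end)
    return windows
```

With hits at entries 10 and 15 in a 50-entry log, the second window has no before-context and only five after-entries (31 to 35), because entries 11 to 30 were already emitted as the first hit's after-context.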
|
||||
130
src/Analysis/ProjectZomboid/ErrorContextProblem.php
Normal file
@@ -0,0 +1,130 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Analysis\ProjectZomboid;
|
||||
|
||||
use IndifferentKetchup\Codex\Analysis\InsightInterface;
|
||||
use IndifferentKetchup\Codex\Analysis\Problem;
|
||||
use IndifferentKetchup\Codex\Log\EntryInterface;
|
||||
|
||||
/**
|
||||
* Problem emitted by ErrorContextAnalyser for each ERROR or WARNING entry,
|
||||
* carrying a sliding window of surrounding entries as before/after
|
||||
* context. Coalesced by 1-based entryIndex so re-adding the same hit
|
||||
* never produces duplicate problems.
|
||||
*/
|
||||
class ErrorContextProblem extends Problem
|
||||
{
|
||||
private string $type = 'error';
|
||||
private int $entryIndex = 0;
|
||||
|
||||
/**
|
||||
* @var EntryInterface[]
|
||||
*/
|
||||
private array $before = [];
|
||||
|
||||
/**
|
||||
* @var EntryInterface[]
|
||||
*/
|
||||
private array $after = [];
|
||||
|
||||
/**
|
||||
* @param string $type 'error' or 'warning'
|
||||
* @return $this
|
||||
*/
|
||||
public function setType(string $type): static
|
||||
{
|
||||
$this->type = $type;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* @return string
|
||||
*/
|
||||
public function getType(): string
|
||||
{
|
||||
return $this->type;
|
||||
}
|
||||
|
||||
/**
|
||||
* @param int $entryIndex 1-based index of the hit entry within the log
|
||||
* @return $this
|
||||
*/
|
||||
public function setEntryIndex(int $entryIndex): static
|
||||
{
|
||||
$this->entryIndex = $entryIndex;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* @return int 1-based index of the hit entry within the log
|
||||
*/
|
||||
public function getEntryIndex(): int
|
||||
{
|
||||
return $this->entryIndex;
|
||||
}
|
||||
|
||||
/**
|
||||
* @param EntryInterface[] $entries
|
||||
* @return $this
|
||||
*/
|
||||
public function setBefore(array $entries): static
|
||||
{
|
||||
$this->before = $entries;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* @return EntryInterface[]
|
||||
*/
|
||||
public function getBefore(): array
|
||||
{
|
||||
return $this->before;
|
||||
}
|
||||
|
||||
/**
|
||||
* @param EntryInterface[] $entries
|
||||
* @return $this
|
||||
*/
|
||||
public function setAfter(array $entries): static
|
||||
{
|
||||
$this->after = $entries;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* @return EntryInterface[]
|
||||
*/
|
||||
public function getAfter(): array
|
||||
{
|
||||
return $this->after;
|
||||
}
|
||||
|
||||
/**
|
||||
* Convenience accessor returning before-context, hit entry, and
|
||||
* after-context as a single ordered array of at most
|
||||
* ErrorContextAnalyser::CONTEXT_BEFORE + 1 + CONTEXT_AFTER = 41
|
||||
* entries.
|
||||
*
|
||||
* @return EntryInterface[]
|
||||
*/
|
||||
public function getContext(): array
|
||||
{
|
||||
return [...$this->before, $this->getEntry(), ...$this->after];
|
||||
}
|
||||
|
||||
public function getMessage(): string
|
||||
{
|
||||
return sprintf(
|
||||
'%s at entry %d (%d before, %d after)',
|
||||
strtoupper($this->type),
|
||||
$this->entryIndex,
|
||||
count($this->before),
|
||||
count($this->after)
|
||||
);
|
||||
}
|
||||
|
||||
public function isEqual(InsightInterface $insight): bool
|
||||
{
|
||||
return $insight instanceof self && $insight->getEntryIndex() === $this->entryIndex;
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,42 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Analysis\ProjectZomboid;
|
||||
|
||||
use IndifferentKetchup\Codex\Analysis\Information;
|
||||
use IndifferentKetchup\Codex\Analysis\InsightInterface;
|
||||
|
||||
/**
|
||||
* Emitted by ErrorContextAnalyser exactly once when its hit cap is
|
||||
* reached, so downstream consumers can surface a "results truncated"
|
||||
* notice instead of silently dropping subsequent error/warning hits.
|
||||
*/
|
||||
class ErrorContextTruncatedInformation extends Information
|
||||
{
|
||||
private int $hitCap = 0;
|
||||
|
||||
/**
|
||||
* @param int $hitCap the cap that was hit (mirrors
|
||||
* ErrorContextAnalyser::HIT_CAP at emission time)
|
||||
* @return $this
|
||||
*/
|
||||
public function setHitCap(int $hitCap): static
|
||||
{
|
||||
$this->hitCap = $hitCap;
|
||||
$this->setLabel('Error context');
|
||||
$this->setValue(sprintf('truncated after %d hits', $hitCap));
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* @return int
|
||||
*/
|
||||
public function getHitCap(): int
|
||||
{
|
||||
return $this->hitCap;
|
||||
}
|
||||
|
||||
public function isEqual(InsightInterface $insight): bool
|
||||
{
|
||||
return $insight instanceof self;
|
||||
}
|
||||
}
|
||||
@@ -15,7 +15,7 @@ namespace IndifferentKetchup\Codex\Pattern\ProjectZomboid;
|
||||
*/
|
||||
class DebugServerPattern
|
||||
{
|
||||
-    public const string LINE = '/^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+(\w+)\s*:\s+(\S+)\s+f:\d+,\s+t:\d+,\s+st:[\d,]+>\s+.*$/';
+    public const string LINE = '/^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+(\w+)\s*:\s+(\S+)\s+f:\d+(?:,\s+t:\d+)?,?\s+st:[\d,]+>\s+.*$/';
|
||||
|
||||
public const string VERSION = '/version=(?<version>\S+) (?<hash>[a-f0-9]{40}) (?<date>\d{4}-\d{2}-\d{2}) (?<time>\d{2}:\d{2}:\d{2})/';
|
||||
|
||||
|
||||
185
src/Util/ProjectZomboid/ProjectZomboidRedactor.php
Normal file
@@ -0,0 +1,185 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Util\ProjectZomboid;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\RedactorInterface;
|
||||
|
||||
/**
|
||||
* Render-time PII filter for Project Zomboid log content.
|
||||
*
|
||||
* Applies up to four sequential regex passes over the raw log string,
|
||||
* each controlled by a boolean toggle (all enabled by default):
|
||||
*
|
||||
* 1. IP address pass — replaces IPv4 addresses (with optional :port
|
||||
* suffix) and IPv6 addresses (full, abbreviated, bracketed, and
|
||||
* IPv4-mapped forms; all with optional :port when bracketed) with
|
||||
* a placeholder token. Pattern-disjoint from the other passes.
|
||||
* 2. Steam ID pass — replaces 17-digit Steam IDs with a placeholder
|
||||
* token.
|
||||
* 3. Player name pass — replaces player display names with a placeholder
|
||||
* token. This pass anchors on the already-redacted Steam ID token, so
|
||||
* the ordering Steam ID -> name -> coordinates is mandatory.
|
||||
* 4. Coordinates pass — replaces world coordinate triplets with a
|
||||
* placeholder token.
|
||||
*
|
||||
* Pass 1 runs first by convention, not dependency: it shares no anchors
|
||||
* with passes 2-4 and could run anywhere in the chain without affecting
|
||||
* their output.
|
||||
*
|
||||
* All regex passes use the /u flag for Unicode safety.
|
||||
*
|
||||
* Replacements are not reversible; do not apply to content that must later be
|
||||
* restored to its original form.
|
||||
*/
|
||||
class ProjectZomboidRedactor implements RedactorInterface
|
||||
{
|
||||
/** Generic placeholder substituted for every matched IPv4 or IPv6 address (with port suffix consumed when present). */
|
||||
public const string IP_REPLACEMENT = '[REDACTED_IP]';
|
||||
|
||||
/** Strict IPv4 with valid 0-255 octets and optional :port suffix. Lookarounds reject matches embedded in longer alphanumeric or dotted-decimal tokens; the (?<!\d\.) / (?!\.\d) pair specifically prevents matching inside an N-octet (N>4) sequence like 1.2.3.4.5 while still allowing a trailing sentence period after the IP/port. */
|
||||
public const string IPV4_REGEX = '/'
|
||||
. '(?<![A-Za-z0-9_:])(?<!\d\.)'
|
||||
. '(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
|
||||
. '(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}'
|
||||
. '(?::\d{1,5})?'
|
||||
. '(?![A-Za-z0-9_:])(?!\.\d)'
|
||||
. '/u';
|
||||
|
||||
/** Coarse IPv6 candidate matcher (bracketed-with-port, or bare 2-7-colon hex form covering full / abbreviated / IPv4-mapped). Each match is validated with filter_var() in the redact() callback so PHP/Java scope ops like Foo::Bar and PZ timestamps like 12:00:00.000 are rejected. Boundary lookarounds mirror the IPv4 regex so trailing sentence periods don't block the match. */
|
||||
public const string IPV6_REGEX = '/'
|
||||
. '(?<![A-Za-z0-9_:])(?<!\d\.)'
|
||||
. '(?:'
|
||||
. '\[(?<bracketed>[0-9a-fA-F:.]+)\](?::\d{1,5})?'
|
||||
. '|'
|
||||
. '(?<bare>(?:[0-9a-fA-F]{0,4}:){2,7}[0-9a-fA-F.]*)'
|
||||
. ')'
|
||||
. '(?![A-Za-z0-9_:])(?!\.\d)'
|
||||
. '/u';
|
||||
|
||||
/** Regex matching a 17-digit SteamID64 anchored on the 76561198 universe prefix, with lookaround boundaries that reject embedded occurrences. */
|
||||
public const string STEAM_ID_REGEX = '/(?<![A-Za-z0-9])76561198\d{9}(?![A-Za-z0-9])/u';
|
||||
|
||||
/** Zeroed-out SteamID64 placeholder; syntactically valid but refers to no real account. */
|
||||
public const string STEAM_ID_REPLACEMENT = '76561198000000000';
|
||||
|
||||
/** Generic placeholder substituted for every matched player display name. */
|
||||
public const string PLAYER_NAME_REPLACEMENT = '<player>';
|
||||
|
||||
/** Matches a double-quoted player name that immediately follows the redacted Steam ID placeholder (cmd.txt / admin.txt shape); relies on the Steam ID pass having run first. */
|
||||
public const string PLAYER_AFTER_STEAMID_REGEX = '/(?<=76561198000000000) "(?<name>[^"]+)"/u';
|
||||
|
||||
/** Matches the author value inside a ChatMessage{...} envelope, using a fixed-length lookbehind on ", author='" and a lookahead on the closing "'" so only the bare name is replaced. */
|
||||
public const string PLAYER_IN_CHATMESSAGE_REGEX = '/(?<=, author=\')(?<name>[^\']+)(?=\')/u';
|
||||
|
||||
/** Matches the first double-quoted player name following a Combat: or Safety: subsystem token (pvp.txt shape); does NOT redact the second name after "hit" — deferred to v2. */
|
||||
public const string PLAYER_IN_PVP_SUBSYSTEM_REGEX = '/(?<=(?:Combat|Safety): )"(?<name>[^"]+)"/u';
|
||||
|
||||
/** Zeroed-out coordinate triple used as the inner replacement; bracket/paren/`at` wrapper is preserved by the regex lookaround anchors. */
|
||||
public const string COORDS_REPLACEMENT = '0,0,0';
|
||||
|
||||
/** Matches integer or float coordinate triplets that immediately follow the literal ` at ` token (map.txt / item.txt shape); the trailing dot is preserved via lookahead. */
|
||||
public const string COORDS_AT_CLAUSE_REGEX = '/(?<= at )(?<x>[\d.]+),(?<y>[\d.]+),(?<z>-?[\d.]+)(?=\.)/u';
|
||||
|
||||
/** Matches integer coordinate triplets enclosed in square brackets (ClientActionLog.txt / PerkLog.txt / cmd.txt @-context shape); the surrounding brackets are preserved via lookaround. */
|
||||
public const string COORDS_BRACKETED_REGEX = '/(?<=\[)(?<x>\d+),(?<y>\d+),(?<z>-?\d+)(?=\])/u';
|
||||
|
||||
/** Matches integer coordinate triplets enclosed in round parentheses, anchored on a trailing PvP verb to disambiguate from server-metadata triples (pvp.txt Combat:/Safety: shape); only the attacker/first-coord set is redacted per line — the victim coords lack the trailing keyword and are deferred to v2. */
|
||||
public const string COORDS_PARENTHESISED_REGEX = '/(?<=\()(?<x>\d+),(?<y>\d+),(?<z>-?\d+)(?=\) (?:hit|restore|store|true|false))/u';
|
||||
|
||||
private bool $redactIpAddresses = true;
|
||||
private bool $redactSteamIds = true;
|
||||
private bool $redactPlayerNames = true;
|
||||
private bool $redactCoordinates = true;
|
||||
|
||||
/**
|
||||
* Enable or disable the IP address redaction pass (covers IPv4 and IPv6).
|
||||
*
|
||||
* @param bool $on Pass true to enable, false to disable.
|
||||
* @return static
|
||||
*/
|
||||
public function redactIpAddresses(bool $on): static
|
||||
{
|
||||
$this->redactIpAddresses = $on;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* Enable or disable the Steam ID redaction pass.
|
||||
*
|
||||
* @param bool $on Pass true to enable, false to disable.
|
||||
* @return static
|
||||
*/
|
||||
public function redactSteamIds(bool $on): static
|
||||
{
|
||||
$this->redactSteamIds = $on;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* Enable or disable the player-name redaction pass.
|
||||
*
|
||||
* @param bool $on Pass true to enable, false to disable.
|
||||
* @return static
|
||||
*/
|
||||
public function redactPlayerNames(bool $on): static
|
||||
{
|
||||
$this->redactPlayerNames = $on;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* Enable or disable the coordinates redaction pass.
|
||||
*
|
||||
* @param bool $on Pass true to enable, false to disable.
|
||||
* @return static
|
||||
*/
|
||||
public function redactCoordinates(bool $on): static
|
||||
{
|
||||
$this->redactCoordinates = $on;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* Redact PII from the given Project Zomboid log content.
|
||||
*
|
||||
* Passes are applied in the order: IP address -> Steam ID -> player
|
||||
* name -> coordinates. The Steam ID -> name -> coordinates ordering
|
||||
* is mandatory (see class docblock); the IP pass is pattern-disjoint
|
||||
* and runs first by convention.
|
||||
*
|
||||
* @param string $content Raw log content that may contain PII.
|
||||
* @return string Content with enabled PII categories replaced by tokens.
|
||||
*/
|
||||
public function redact(string $content): string
|
||||
{
|
||||
if ($this->redactIpAddresses) {
|
||||
$content = preg_replace_callback(
|
||||
self::IPV6_REGEX,
|
||||
static function (array $matches): string {
|
||||
$candidate = ($matches['bracketed'] ?? '') !== ''
|
||||
? $matches['bracketed']
|
||||
: ($matches['bare'] ?? '');
|
||||
return filter_var($candidate, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false
|
||||
? self::IP_REPLACEMENT
|
||||
: $matches[0];
|
||||
},
|
||||
$content
|
||||
);
|
||||
$content = preg_replace(self::IPV4_REGEX, self::IP_REPLACEMENT, $content);
|
||||
}
|
||||
if ($this->redactSteamIds) {
|
||||
$content = preg_replace(self::STEAM_ID_REGEX, self::STEAM_ID_REPLACEMENT, $content);
|
||||
}
|
||||
if ($this->redactPlayerNames) {
|
||||
$content = preg_replace(self::PLAYER_AFTER_STEAMID_REGEX, ' "' . self::PLAYER_NAME_REPLACEMENT . '"', $content);
|
||||
$content = preg_replace(self::PLAYER_IN_CHATMESSAGE_REGEX, self::PLAYER_NAME_REPLACEMENT, $content);
|
||||
$content = preg_replace(self::PLAYER_IN_PVP_SUBSYSTEM_REGEX, '"' . self::PLAYER_NAME_REPLACEMENT . '"', $content);
|
||||
}
|
||||
if ($this->redactCoordinates) {
|
||||
$content = preg_replace(self::COORDS_AT_CLAUSE_REGEX, self::COORDS_REPLACEMENT, $content);
|
||||
$content = preg_replace(self::COORDS_BRACKETED_REGEX, self::COORDS_REPLACEMENT, $content);
|
||||
$content = preg_replace(self::COORDS_PARENTHESISED_REGEX, self::COORDS_REPLACEMENT, $content);
|
||||
}
|
||||
return $content;
|
||||
}
|
||||
}
|
||||
20
src/Util/RedactorInterface.php
Normal file
@@ -0,0 +1,20 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Util;
|
||||
|
||||
interface RedactorInterface
|
||||
{
|
||||
/**
|
||||
* Redact PII from the given content string and return the result.
|
||||
*
|
||||
* The method is stateless from the caller's perspective: the same instance
|
||||
* may be called repeatedly and each call operates independently on its
|
||||
* input. Configuration (which passes are enabled, replacement tokens, etc.)
|
||||
* is applied once via implementation-specific setters before the first call
|
||||
* to redact().
|
||||
*
|
||||
* @param string $content Raw log content that may contain PII.
|
||||
* @return string Content with PII replaced by redaction tokens.
|
||||
*/
|
||||
public function redact(string $content): string;
|
||||
}
|
||||
@@ -0,0 +1,22 @@
|
||||
[16-04-26 00:00:42.314] LOG : General f:0 st:48,648,157,434> SLF4J(W): No SLF4J providers were found..
|
||||
[16-04-26 00:00:42.315] LOG : General f:0 st:48,648,157,492> SLF4J(W): Defaulting to no-operation (NOP) logger implementation.
|
||||
[16-04-26 00:00:42.407] LOG : General f:0 st:48,648,157,584> version=42.17.0 0000000000000000000000000000000000000000 2026-04-20 14:34:44 (ZB) demo=false.
|
||||
[16-04-26 00:00:42.407] LOG : General f:0 st:48,648,157,585> revision=0000000000000000000000000000000000000000 date=2026-04-20 time=14:34:44 (ZB).
|
||||
[16-04-26 00:01:19.080] ERROR: General f:0 st:48,648,194,258> DebugFileWatcher.registerDir> Exception thrown
|
||||
java.nio.file.NoSuchFileException: /placeholder/config/mods at UnixException.translateToIOException(null:-1).
|
||||
Stack trace:
|
||||
java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
|
||||
java.base/sun.nio.fs.UnixException.asIOException(Unknown Source)
|
||||
java.base/sun.nio.fs.LinuxWatchService$Poller.implRegister(Unknown Source)
|
||||
java.base/sun.nio.fs.AbstractPoller.processRequests(Unknown Source)
|
||||
java.base/sun.nio.fs.LinuxWatchService$Poller.run(Unknown Source)
|
||||
[16-04-26 00:01:19.131] LOG : Mod f:0 st:48,648,194,309> loading example_mod_alpha.
|
||||
[16-04-26 00:01:19.142] LOG : Mod f:0 st:48,648,194,320> loading example_mod_beta.
|
||||
[16-04-26 00:01:19.155] LOG : Mod f:0 st:48,648,194,333> loading example_mod_gamma.
|
||||
[16-04-26 00:01:19.200] WARN : Mod f:0 st:48,648,194,378> ZomboidFileSystem.loadModAndRequired> required mod "absent_mod" not found.
|
||||
[16-04-26 00:01:45.937] ERROR: WorldGen f:0 st:48,648,221,115> IsoPropertyType.lookupOrDefaultStr> Exception thrown
|
||||
zombie.core.properties.IsoPropertyType$IsoPropertyTypeNotFoundException: Property Name not found: ladderW at IsoPropertyType.lookup(IsoPropertyType.java:269). Message: Property Name not found: ladderW
|
||||
at zombie.core.properties.IsoPropertyType.lookup(IsoPropertyType.java:269)
|
||||
at zombie.iso.IsoChunkData.PostProcessChunk(IsoChunkData.java:512)
|
||||
[16-04-26 00:02:00.000] LOG : General f:0 st:48,648,235,178> server initialised.
|
||||
[16-04-26 00:05:00.000] LOG : General f:0 st:48,648,415,178> shutdown requested.
|
||||
@@ -0,0 +1,128 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Games\ProjectZomboid\Analyser;
|
||||
|
||||
use IndifferentKetchup\Codex\Analyser\AnalyserInterface;
|
||||
use IndifferentKetchup\Codex\Analyser\ProjectZomboid\ErrorContextAnalyser;
|
||||
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextProblem;
|
||||
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextTruncatedInformation;
|
||||
use IndifferentKetchup\Codex\Log\AnalysableLog;
|
||||
use IndifferentKetchup\Codex\Log\Entry;
|
||||
use IndifferentKetchup\Codex\Log\Level;
|
||||
use IndifferentKetchup\Codex\Log\Line;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ErrorContextAnalyserTest extends TestCase
|
||||
{
|
||||
/**
|
||||
* Build an in-memory AnalysableLog with $count entries; entries whose
|
||||
* 1-based index is in $errorIndices are tagged Level::ERROR, the rest
|
||||
* Level::INFO. Anonymous AnalysableLog subclass keeps the fixture
|
||||
* inline since we exercise the analyser directly via setLog().
|
||||
*
|
||||
* @param int[] $errorIndices 1-based entry indices to mark as ERROR
|
||||
*/
|
||||
private function makeLog(array $errorIndices, int $count): AnalysableLog
|
||||
{
|
||||
$errorSet = array_flip($errorIndices);
|
||||
$log = new class extends AnalysableLog {
|
||||
public static function getDefaultAnalyser(): AnalyserInterface
|
||||
{
|
||||
return new ErrorContextAnalyser();
|
||||
}
|
||||
};
|
||||
for ($n = 1; $n <= $count; $n++) {
|
||||
$level = isset($errorSet[$n]) ? Level::ERROR : Level::INFO;
|
||||
$entry = (new Entry())
|
||||
->setLevel($level)
|
||||
->addLine(new Line($n, sprintf('line %d', $n)));
|
||||
$log->addEntry($entry);
|
||||
}
|
||||
return $log;
|
||||
}
|
||||
|
||||
public function testEmitsThreeNonOverlappingWindows(): void
|
||||
{
|
||||
$log = $this->makeLog([10, 50, 95], 100);
|
||||
$analysis = (new ErrorContextAnalyser())->setLog($log)->analyse();
|
||||
|
||||
$problems = $analysis->getFilteredInsights(ErrorContextProblem::class);
|
||||
$this->assertCount(3, $problems);
|
||||
|
||||
$this->assertSame(10, $problems[0]->getEntryIndex());
|
||||
$this->assertSame(50, $problems[1]->getEntryIndex());
|
||||
$this->assertSame(95, $problems[2]->getEntryIndex());
|
||||
|
||||
// First hit (entry 10): 9 entries before (1..9), 20 after (11..30).
|
||||
$this->assertCount(9, $problems[0]->getBefore());
|
||||
$this->assertCount(20, $problems[0]->getAfter());
|
||||
|
||||
// Second hit (entry 50): clipped to 19 before (31..49), 20 after (51..70).
|
||||
$this->assertCount(19, $problems[1]->getBefore());
|
||||
$this->assertCount(20, $problems[1]->getAfter());
|
||||
|
||||
// Third hit (entry 95): full 20 before (75..94, no clipping needed); after is cut short by the end of the log to 5 (96..100).
|
||||
$this->assertCount(20, $problems[2]->getBefore());
|
||||
$this->assertCount(5, $problems[2]->getAfter());
|
||||
|
||||
// Total window per hit never exceeds 1 + CONTEXT_BEFORE + CONTEXT_AFTER = 41.
|
||||
foreach ($problems as $problem) {
|
||||
$this->assertLessThanOrEqual(ErrorContextAnalyser::CONTEXT_BEFORE, count($problem->getBefore()));
|
||||
$this->assertLessThanOrEqual(ErrorContextAnalyser::CONTEXT_AFTER, count($problem->getAfter()));
|
||||
$this->assertLessThanOrEqual(41, count($problem->getContext()));
|
||||
}
|
||||
|
||||
// No entry appears in two problems' context arrays.
|
||||
$seen = [];
|
||||
foreach ($problems as $problem) {
|
||||
foreach ([...$problem->getBefore(), ...$problem->getAfter()] as $entry) {
|
||||
$id = spl_object_id($entry);
|
||||
$this->assertArrayNotHasKey($id, $seen, 'Entry duplicated across problem context arrays');
|
||||
$seen[$id] = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
public function testMergesAdjacentWindowsWhenWithinContextRange(): void
|
||||
{
|
||||
// Errors 5 entries apart; without merge their windows would
|
||||
// overlap heavily.
|
||||
$log = $this->makeLog([10, 15], 50);
|
||||
$analysis = (new ErrorContextAnalyser())->setLog($log)->analyse();
|
||||
|
||||
$problems = $analysis->getFilteredInsights(ErrorContextProblem::class);
|
||||
$this->assertCount(2, $problems);
|
||||
|
||||
// First hit: 9 before (1..9), 20 after (11..30). lastEmittedIndex=29 (0-based).
|
||||
$this->assertCount(9, $problems[0]->getBefore());
|
||||
$this->assertCount(20, $problems[0]->getAfter());
|
||||
|
||||
// Second hit at entry 15 (i=14). beforeStart clamped past i so before is empty.
|
||||
// afterStart=max(30, 15)=30, afterEnd=min(49, 34)=34, so after=entries 31..35
|
||||
// (5 entries, all unseen).
|
||||
$this->assertCount(0, $problems[1]->getBefore());
|
||||
$this->assertCount(5, $problems[1]->getAfter());
|
||||
|
||||
// Confirm no entry appears in both problems' context arrays.
|
||||
$first = [...$problems[0]->getBefore(), ...$problems[0]->getAfter()];
|
||||
$second = [...$problems[1]->getBefore(), ...$problems[1]->getAfter()];
|
||||
foreach ($second as $entry) {
|
||||
$this->assertNotContains($entry, $first, 'Entry duplicated across merged windows');
|
||||
}
|
||||
}
|
||||
|
||||
public function testTruncatesAtHitCap(): void
|
||||
{
|
||||
// 600 consecutive ERROR entries — analyser should cap emission at
|
||||
// HIT_CAP and add exactly one truncation Information.
|
||||
$log = $this->makeLog(range(1, 600), 600);
|
||||
$analysis = (new ErrorContextAnalyser())->setLog($log)->analyse();
|
||||
|
||||
$problems = $analysis->getFilteredInsights(ErrorContextProblem::class);
|
||||
$this->assertCount(ErrorContextAnalyser::HIT_CAP, $problems);
|
||||
|
||||
$information = $analysis->getFilteredInsights(ErrorContextTruncatedInformation::class);
|
||||
$this->assertCount(1, $information);
|
||||
$this->assertSame(ErrorContextAnalyser::HIT_CAP, $information[0]->getHitCap());
|
||||
}
|
||||
}
|
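The three tests above pin down how context windows are clipped so that no entry is emitted twice. A minimal sketch of that clipping arithmetic follows, assuming a window of 20 entries on each side (the values the test comments imply for `CONTEXT_BEFORE` and `CONTEXT_AFTER`) and 1-based entry numbers. `contextWindows()` is a hypothetical helper written for illustration, not the analyser's actual implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: compute non-overlapping before/after context windows.
 *
 * @param int[] $hits  1-based entry numbers of error hits, sorted ascending
 * @param int   $total number of entries in the log
 * @return array<int, array{hit: int, before: int[], after: int[]}>
 */
function contextWindows(array $hits, int $total, int $before = 20, int $after = 20): array
{
    $windows = [];
    $lastEmitted = 0; // highest entry number already emitted as a hit or as context

    foreach ($hits as $hit) {
        // Before-window: at most $before entries, never reaching back into emitted ones.
        $beforeStart = max($hit - $before, $lastEmitted + 1, 1);
        $beforeEntries = $beforeStart < $hit ? range($beforeStart, $hit - 1) : [];

        // After-window: at most $after entries, clipped by emitted entries and end of log.
        $afterStart = max($hit + 1, $lastEmitted + 1);
        $afterEnd = min($hit + $after, $total);
        $afterEntries = $afterStart <= $afterEnd ? range($afterStart, $afterEnd) : [];

        $windows[] = ['hit' => $hit, 'before' => $beforeEntries, 'after' => $afterEntries];
        $lastEmitted = max($lastEmitted, $hit, $afterEnd);
    }

    return $windows;
}
```

Under these assumptions, hits at entries 10, 50 and 95 in a 100-entry log give before/after counts of 9/20, 19/20 and 20/5, and hits at 10 and 15 in a 50-entry log give 9/20 and 0/5, which are the counts the assertions above expect.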
||||
@@ -6,18 +6,31 @@ use IndifferentKetchup\Codex\Detective\Detective;
|
||||
use IndifferentKetchup\Codex\Log\File\PathLogFile;
|
||||
use IndifferentKetchup\Codex\Log\Level;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidServerLog;
|
||||
use PHPUnit\Framework\Attributes\DataProvider;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ProjectZomboidServerLogTest extends TestCase
|
||||
{
|
||||
private function fixturePath(): string
|
||||
/**
|
||||
* Both PZ B41 and B42 line shapes must parse identically. B41 (and the
|
||||
* fixture used by every analyser test) emits `f:N, t:N, st:N,N,N,N>`;
|
||||
* B42 (release branch from 2026-04 onward, e.g. build 42.17) drops the
|
||||
* `t:` millisecond field entirely and tightens whitespace to
|
||||
* `f:N st:N,N,N,N>`.
|
||||
*/
|
||||
public static function fixtureProvider(): array
|
||||
{
|
||||
return __DIR__ . '/../../../../src/Games/ProjectZomboid/fixtures/debug-server-minimal.txt';
|
||||
$base = __DIR__ . '/../../../../src/Games/ProjectZomboid/fixtures';
|
||||
return [
|
||||
'pz41-format' => [$base . '/debug-server-minimal.txt'],
|
||||
'pz42-format' => [$base . '/debug-server-42x-minimal.txt'],
|
||||
];
|
||||
}
|
||||
|
||||
public function testParsesEntriesWithLevelAndPrefix(): void
|
||||
#[DataProvider('fixtureProvider')]
|
||||
public function testParsesEntriesWithLevelAndPrefix(string $fixturePath): void
|
||||
{
|
||||
$log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($this->fixturePath()));
|
||||
$log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($fixturePath));
|
||||
$log->parse();
|
||||
|
||||
$entries = $log->getEntries();
|
||||
@@ -29,9 +42,10 @@ class ProjectZomboidServerLogTest extends TestCase
|
||||
$this->assertNotNull($first->getTime());
|
||||
}
|
||||
|
||||
public function testStackTraceLinesAttachToTriggeringErrorEntry(): void
|
||||
#[DataProvider('fixtureProvider')]
|
||||
public function testStackTraceLinesAttachToTriggeringErrorEntry(string $fixturePath): void
|
||||
{
|
||||
$log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($this->fixturePath()));
|
||||
$log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($fixturePath));
|
||||
$log->parse();
|
||||
|
||||
$errorEntry = null;
|
||||
@@ -46,19 +60,21 @@ class ProjectZomboidServerLogTest extends TestCase
|
||||
$this->assertGreaterThan(1, count($errorEntry->getLines()));
|
||||
}
|
||||
|
||||
public function testWarnLevelMapsCorrectly(): void
|
||||
#[DataProvider('fixtureProvider')]
|
||||
public function testWarnLevelMapsCorrectly(string $fixturePath): void
|
||||
{
|
||||
$log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($this->fixturePath()));
|
||||
$log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($fixturePath));
|
||||
$log->parse();
|
||||
|
||||
$warnEntries = array_filter($log->getEntries(), fn($e) => $e->getLevel() === Level::WARNING);
|
||||
$this->assertNotEmpty($warnEntries);
|
||||
}
|
||||
|
||||
public function testDetectiveDispatchesByContent(): void
|
||||
#[DataProvider('fixtureProvider')]
|
||||
public function testDetectiveDispatchesByContent(string $fixturePath): void
|
||||
{
|
||||
$detective = (new Detective())
|
||||
->setLogFile(new PathLogFile($this->fixturePath()))
|
||||
->setLogFile(new PathLogFile($fixturePath))
|
||||
->addPossibleLogClass(ProjectZomboidServerLog::class);
|
||||
|
||||
$log = $detective->detect();
|
||||
|
||||
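The `fixtureProvider` docblock above describes two header shapes: `f:N, t:N, st:N,N,N,N>` for B41 and `f:N st:N,N,N,N>` for B42. One way to accept both is to make the `t:` field and the separating commas optional. The pattern below is a sketch under that assumption. `PZ_HEADER_SKETCH`, `parseHeader()`, the capture-group names and the B42 sample line are illustrative and are not taken from `ProjectZomboidServerLog`; only the B41 sample line appears in the tests.

```php
<?php
declare(strict_types=1);

// Hypothetical pattern: commas after f:/t: are optional, and so is the whole t: field.
const PZ_HEADER_SKETCH = '/^\[(?<time>[^\]]+)\]\s+(?<level>[A-Z]+)\s*:\s+(?<prefix>\S+)\s+'
    . 'f:\d+,?\s*(?:t:\d+,?\s*)?st:\d+(?:,\d+){3}>\s?(?<message>.*)$/';

/** Returns level, prefix and message for a header line, or null if it does not match. */
function parseHeader(string $line): ?array
{
    if (preg_match(PZ_HEADER_SKETCH, $line, $m) !== 1) {
        return null;
    }

    return ['level' => $m['level'], 'prefix' => $m['prefix'], 'message' => $m['message']];
}

// B41 shape (taken from the redactor tests) and an assumed B42 equivalent.
$b41 = '[16-04-26 00:01:19.080] ERROR: General f:0, t:1776297642406, st:48,648,157,584> Server starting up.';
$b42 = '[16-04-26 00:01:19.080] ERROR: General f:0 st:48,648,157,584> Server starting up.';

// Both shapes should yield the same level, prefix and message.
var_dump(parseHeader($b41) === parseHeader($b42));
```

Keeping a single pattern with optional parts means level and prefix attribution cannot drift between the two builds, which is the property the data provider checks by running every test against both fixtures.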
146
test/tests/Util/Redactor/ProjectZomboidRedactorCombinedTest.php
Normal file
@@ -0,0 +1,146 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ProjectZomboidRedactorCombinedTest extends TestCase
|
||||
{
|
||||
public function testFullScrubAllTogglesOn(): void
|
||||
{
|
||||
// Realistic multi-line input touching all three PII categories:
|
||||
// Steam IDs, player names in multiple contexts (after Steam ID, in ChatMessage,
|
||||
// after Combat:/Safety:), and coordinates in multiple shapes (at clause,
|
||||
// bracketed, parenthesised before PvP verb).
|
||||
$input = implode("\n", [
|
||||
// cmd.txt / admin.txt: Steam ID + quoted name + at-clause coords (keyword " at ")
|
||||
'[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
|
||||
// map.txt: Steam ID + quoted name + at-clause float coords
|
||||
'[16-04-26 12:00:01.000] 76561198222222222 "Player2" added IsoObject (fence_01) at 1050.0,2050.0,0.0.',
|
||||
// chat.txt: ChatMessage author
|
||||
"[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hello'}.",
|
||||
// pvp.txt Combat: name + attacker parenthesised coords before "hit"
|
||||
'[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.',
|
||||
// pvp.txt Safety: name + parenthesised coords before "restore"
|
||||
'[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.',
|
||||
// ClientActionLog: bracketed Steam ID + action + name + coords bracket
|
||||
'[16-04-26 12:00:02.000] [76561198333333333][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].',
|
||||
]);
|
||||
|
||||
$expected = implode("\n", [
|
||||
'[16-04-26 12:00:00.000] 76561198000000000 "<player>" added Base.Aerosolbomb at 0,0,0.',
|
||||
'[16-04-26 12:00:01.000] 76561198000000000 "<player>" added IsoObject (fence_01) at 0,0,0.',
|
||||
"[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='<player>', text='hello'}.",
|
||||
'[16-04-26 17:14:35.128][INFO] Combat: "<player>" (0,0,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.',
|
||||
'[16-04-26 16:17:49.731][LOG] Safety: "<player>" (0,0,0) restore true.',
|
||||
'[16-04-26 12:00:02.000] [76561198000000000][ISEnterVehicle][Player2][0,0,0][Van_LectroMax].',
|
||||
]);
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'With all three toggles on, every Steam ID and every covered player-name context and coord shape must be replaced; the victim name and coords after "hit" and bracketed names are known v1 gaps.');
|
||||
}
|
||||
|
||||
public function testSteamIdToggleOffLeavesSteamIdsIntact(): void
|
||||
{
|
||||
// All three PII categories present; Steam ID toggle is disabled.
|
||||
//
|
||||
// Important nuance: PLAYER_AFTER_STEAMID_REGEX anchors on the redacted placeholder
|
||||
// 76561198000000000. With redactSteamIds(false) the raw Steam ID survives, so the
|
||||
// regex does NOT fire for lines in the "after-Steam-ID" shape — those names survive
|
||||
// too. Names anchored by other contexts (ChatMessage author, Combat:/Safety:) are
|
||||
// still redacted because those regexes don't depend on the Steam ID pass.
|
||||
$input = implode("\n", [
|
||||
// after-Steam-ID shape: name will NOT be redacted because the Steam ID is raw
|
||||
'[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
|
||||
// ChatMessage author: still redacted (anchor is independent of Steam ID pass)
|
||||
"[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hello'}.",
|
||||
// Combat: name + attacker coords
|
||||
'[16-04-26 17:14:35.128][INFO] Combat: "Player2" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Pipe Bomb" damage=1.0.',
|
||||
]);
|
||||
|
||||
$expected = implode("\n", [
|
||||
// Steam ID intact; "Player1" NOT redacted (anchor regex didn't fire); at-clause coords still redacted
|
||||
'[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 0,0,0.',
|
||||
// ChatMessage author redacted (this line carries no coords)
|
||||
"[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='<player>', text='hello'}.",
|
||||
// Combat: name + attacker coords both redacted
|
||||
'[16-04-26 17:14:35.128][INFO] Combat: "<player>" (0,0,0) hit "Player1" (1006,2005,0) weapon="Pipe Bomb" damage=1.0.',
|
||||
]);
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame(
|
||||
$expected,
|
||||
$output,
|
||||
'With Steam ID toggle off: raw Steam IDs survive; PLAYER_AFTER_STEAMID_REGEX does not fire (no placeholder to anchor on) so those names also survive; ChatMessage and Combat:/Safety: names are still redacted; coords are still redacted.',
|
||||
);
|
||||
}
|
||||
|
||||
public function testPlayerNameToggleOffLeavesNamesIntact(): void
|
||||
{
|
||||
// Steam IDs and coords redact; player names survive verbatim.
|
||||
$input = implode("\n", [
|
||||
'[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
|
||||
"[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player2', text='bye'}.",
|
||||
'[16-04-26 16:17:49.731][LOG] Safety: "AdminUser" (1050,2050,0) restore true.',
|
||||
]);
|
||||
|
||||
$expected = implode("\n", [
|
||||
'[16-04-26 12:00:00.000] 76561198000000000 "Player1" added Base.Aerosolbomb at 0,0,0.',
|
||||
"[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player2', text='bye'}.",
|
||||
'[16-04-26 16:17:49.731][LOG] Safety: "AdminUser" (0,0,0) restore true.',
|
||||
]);
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactPlayerNames(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'With player-name toggle off, all player names must survive; Steam IDs and coords must still be redacted.');
|
||||
}
|
||||
|
||||
public function testCoordinatesToggleOffLeavesCoordsIntact(): void
|
||||
{
|
||||
// Steam IDs and player names redact; coordinates survive verbatim.
|
||||
$input = implode("\n", [
|
||||
'[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
|
||||
'[16-04-26 12:00:01.000] [76561198222222222][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].',
|
||||
'[16-04-26 17:14:35.128][INFO] Combat: "AdminUser" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Baseball Bat" damage=0.5.',
|
||||
]);
|
||||
|
||||
$expected = implode("\n", [
|
||||
'[16-04-26 12:00:00.000] 76561198000000000 "<player>" added Base.Aerosolbomb at 1000,2000,0.',
|
||||
'[16-04-26 12:00:01.000] [76561198000000000][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].',
|
||||
'[16-04-26 17:14:35.128][INFO] Combat: "<player>" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Baseball Bat" damage=0.5.',
|
||||
]);
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactCoordinates(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'With coordinates toggle off, all coord triplets must survive; Steam IDs and player names must still be redacted.');
|
||||
}
|
||||
|
||||
public function testAllTogglesOffReturnsInputByteForByte(): void
|
||||
{
|
||||
// Disabling every toggle must produce an output identical to the input —
|
||||
// the "passthrough" contract: opt-out means truly nothing happens.
|
||||
$input = implode("\n", [
|
||||
'[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
|
||||
"[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player2', text='hello'}.",
|
||||
'[16-04-26 17:14:35.128][INFO] Combat: "AdminUser" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.',
|
||||
'[16-04-26 12:00:01.000] [76561198333333333][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].',
|
||||
]);
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redactPlayerNames(false)
|
||||
->redactCoordinates(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'With all three toggles disabled, the output must be byte-for-byte identical to the input.');
|
||||
}
|
||||
}
|
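`testSteamIdToggleOffLeavesSteamIdsIntact` above depends on pass ordering: the after-Steam-ID name pass anchors on the placeholder that the Steam ID pass writes. A toy two-pass redactor makes that coupling visible. The regexes and `sketchRedact()` are assumptions made for illustration, not the patterns used by `ProjectZomboidRedactor`.

```php
<?php
declare(strict_types=1);

/** Hypothetical two-pass redactor: Steam IDs first, then names anchored on the placeholder. */
function sketchRedact(string $text, bool $steamIds = true, bool $names = true): string
{
    if ($steamIds) {
        // Pass 1: normalise every 17-digit Steam ID to the zero placeholder.
        $text = preg_replace(
            '/(?<![A-Za-z0-9])76561198\d{9}(?![A-Za-z0-9])/',
            '76561198000000000',
            $text
        ) ?? $text;
    }

    if ($names) {
        // Pass 2a: quoted name directly after the *placeholder*, so it relies on pass 1.
        $text = preg_replace('/(76561198000000000 ")[^"]+(")/', '$1<player>$2', $text) ?? $text;
        // Pass 2b: ChatMessage author, which has its own anchor and is independent of pass 1.
        $text = preg_replace("/(author=')[^']+(')/", '$1<player>$2', $text) ?? $text;
    }

    return $text;
}

$cmd  = '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.';
$chat = "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hello'}.";

echo sketchRedact($cmd), "\n";                    // ID and name both replaced
echo sketchRedact($cmd, steamIds: false), "\n";   // raw ID survives, so the name survives too
echo sketchRedact($chat, steamIds: false), "\n";  // author still replaced
```

Anchoring the name pass on the placeholder keeps it from matching arbitrary quoted strings, at the cost of the toggle interaction the test documents.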
||||
@@ -0,0 +1,124 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ProjectZomboidRedactorCoordinatesTest extends TestCase
|
||||
{
|
||||
public function testRedactsAtClauseCoords(): void
|
||||
{
|
||||
// map.txt / item.txt shape: integer coords following " at " with trailing dot.
|
||||
$input = '[16-04-26 12:00:00.000] 76561198000000001 "Player1" added Base.Aerosolbomb at 1000,2000,0.';
|
||||
$expected = '[16-04-26 12:00:00.000] 76561198000000001 "Player1" added Base.Aerosolbomb at 0,0,0.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redactPlayerNames(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'Integer coords following " at " must be replaced; leading "at " and trailing "." must be preserved.');
|
||||
}
|
||||
|
||||
public function testRedactsAtClauseFloatCoords(): void
|
||||
{
|
||||
// map.txt shape: IsoObject form with float coords (x.x,y.y,z.z).
|
||||
$input = '[16-04-26 12:00:01.000] 76561198000000001 "Player1" added IsoObject (fencing_damaged_01_124) at 1010.0,2010.0,0.0.';
|
||||
$expected = '[16-04-26 12:00:01.000] 76561198000000001 "Player1" added IsoObject (fencing_damaged_01_124) at 0,0,0.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redactPlayerNames(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'Float coords following " at " must be replaced; the IsoObject parenthesised form must be unaffected.');
|
||||
}
|
||||
|
||||
public function testRedactsBracketedCoords(): void
|
||||
{
|
||||
// ClientActionLog.txt shape: strict 5-field bracketed structure.
|
||||
// The Steam ID bracket and action/player/param brackets must survive.
|
||||
$input = '[16-04-26 12:00:02.000] [76561198000000001][ISEnterVehicle][Player1][1000,2000,0][Van_LectroMax].';
|
||||
$expected = '[16-04-26 12:00:02.000] [76561198000000001][ISEnterVehicle][Player1][0,0,0][Van_LectroMax].';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redactPlayerNames(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'Coord bracket must become [0,0,0]; Steam ID, action, player name, and param brackets must be unaffected.');
|
||||
}
|
||||
|
||||
public function testRedactsBracketedNegativeZ(): void
|
||||
{
|
||||
// Basement Z coordinates are negative; the regex must handle the leading minus.
|
||||
$input = '[16-04-26 12:00:03.000] [76561198000000001][ISEnterVehicle][Player1][1020,2020,-1][Van_LectroMax].';
|
||||
$expected = '[16-04-26 12:00:03.000] [76561198000000001][ISEnterVehicle][Player1][0,0,0][Van_LectroMax].';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redactPlayerNames(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'Negative Z (basement level) inside square brackets must be replaced.');
|
||||
}
|
||||
|
||||
public function testRedactsParenthesisedCoordsBeforeHit(): void
|
||||
{
|
||||
// pvp.txt Combat: shape. The attacker coords are followed by ") hit" and ARE
|
||||
// redacted. The victim coords are followed by ") weapon=" and are NOT redacted
|
||||
// in v1 — the trailing-keyword anchor is intentionally absent for that position.
|
||||
$input = '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.';
|
||||
$expected = '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (0,0,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redactPlayerNames(false)
|
||||
->redact($input);
|
||||
|
||||
// Attacker coords (before "hit") are redacted; victim coords (before "weapon=") are NOT — deferred to v2.
|
||||
$this->assertSame($expected, $output, 'Attacker coords before "hit" must be replaced; victim coords without a trailing keyword must survive.');
|
||||
}
|
||||
|
||||
public function testRedactsParenthesisedCoordsBeforeSafetyVerb(): void
|
||||
{
|
||||
// pvp.txt Safety: shape; coords followed by ") restore true".
|
||||
$input = '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.';
|
||||
$expected = '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (0,0,0) restore true.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redactPlayerNames(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'Coords followed by ") restore" must be replaced.');
|
||||
}
|
||||
|
||||
public function testServerMetadataTriplesAreNotRedacted(): void
|
||||
{
|
||||
// DebugLog-server.txt entries contain server-state metadata that superficially
|
||||
// resembles coordinates but is not: "st:48,648,157,584" is a 4-component token,
|
||||
// "t:1776297642406" is a millisecond timestamp. Neither pattern lives inside
|
||||
// brackets, parentheses followed by a PvP verb, or after " at " — so none of
|
||||
// the three coordinate regexes should fire.
|
||||
$input = '[16-04-26 00:01:19.080] ERROR: General f:0, t:1776297642406, st:48,648,157,584> Server starting up.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'Server metadata tokens (4-component st:) and millisecond timestamps (t:) must pass through unchanged.');
|
||||
}
|
||||
|
||||
public function testToggleOffLeavesCoordsIntact(): void
|
||||
{
|
||||
$input = '[16-04-26 12:00:04.000] 76561198000000001 "Player1" added Base.Aerosolbomb at 1000,2000,0.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redactPlayerNames(false)
|
||||
->redactCoordinates(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'With the coordinates toggle disabled the original input must be returned unchanged.');
|
||||
}
|
||||
}
|
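The coordinate tests above distinguish three lexical shapes: a triplet after ` at `, a triplet in square brackets, and a parenthesised triplet followed by a PvP verb. The sketch below shows how three anchored patterns could separate these from look-alike metadata such as `st:48,648,157,584`. The patterns, the verb list (`hit`, `restore`) and `sketchRedactCoords()` are assumptions for illustration; the real `COORDS_*` regexes are not reproduced here.

```php
<?php
declare(strict_types=1);

/** Hypothetical coordinate pass covering the three shapes exercised by the tests. */
function sketchRedactCoords(string $text): string
{
    $n = '-?\d+(?:\.\d+)?';            // one signed integer or float component
    $t = implode(',', [$n, $n, $n]);   // an x,y,z triplet

    $patterns = [
        // " at x,y,z": keep the keyword. "${1}" avoids "$10" being read as group ten.
        '/( at )' . $t . '/' => '${1}0,0,0',
        // "[x,y,z]": the whole bracket must be exactly one triplet.
        '/\[' . $t . '\]/' => '[0,0,0]',
        // "(x,y,z)" only when followed by an assumed PvP verb.
        '/\(' . $t . '\)(?= (?:hit|restore)\b)/' => '(0,0,0)',
    ];

    return preg_replace(array_keys($patterns), array_values($patterns), $text) ?? $text;
}

$combat = '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.';
$server = '[16-04-26 00:01:19.080] ERROR: General f:0, t:1776297642406, st:48,648,157,584> Server starting up.';

echo sketchRedactCoords($combat), "\n"; // attacker coords replaced, victim coords untouched
echo sketchRedactCoords($server), "\n"; // no anchor matches, line passes through
```

Because every pattern requires a surrounding anchor, a bare comma-separated run of numbers is never rewritten, and the replacement `0,0,0` matches the same pattern it came from, so a second pass changes nothing.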
||||
@@ -0,0 +1,99 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
/**
|
||||
* Verifies the idempotence property of ProjectZomboidRedactor::redact().
|
||||
*
|
||||
* Idempotence: redact(redact(x)) === redact(x) for all valid inputs.
|
||||
*
|
||||
* A downstream consumer might accidentally double-pipe content through the
|
||||
* Redactor. The result must be stable — a second pass must make no further
|
||||
* changes. If a regex were poorly anchored such that the post-redact placeholder
|
||||
* itself matched and was re-redacted to something different, idempotence would
|
||||
* fail. Specifically, the player-name regex PLAYER_AFTER_STEAMID_REGEX anchors
|
||||
* on 76561198000000000 — the same value the Steam ID pass writes. This test
|
||||
* suite verifies that applying redact() twice is safe: on the second pass, names
|
||||
* already written as <player> do not accidentally re-match and produce a doubly-
|
||||
* nested result like "<player>" → something else.
|
||||
*/
|
||||
class ProjectZomboidRedactorIdempotenceTest extends TestCase
|
||||
{
|
||||
public function testIdempotenceSteamIdOnly(): void
|
||||
{
|
||||
$input = implode("\n", [
|
||||
'Players: 76561198111111111, 76561198222222222, 76561198333333333 connected.',
|
||||
'[16-04-26 12:00:00.000] [76561198111111111][ISEnterVehicle][Player1][1000,2000,0][Van_LectroMax].',
|
||||
]);
|
||||
|
||||
$redactor = new ProjectZomboidRedactor();
|
||||
$redacted = $redactor->redact($input);
|
||||
$redactedAgain = $redactor->redact($redacted);
|
||||
|
||||
$this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to Steam-ID-only input must produce the same result as applying it once.');
|
||||
}
|
||||
|
||||
public function testIdempotencePlayerNamesOnly(): void
|
||||
{
|
||||
// Input already carries the Steam ID placeholder (as the Steam ID pass would
// have written it), so PLAYER_AFTER_STEAMID_REGEX can fire even though
// redactSteamIds(false) is set. The first pass rewrites the name to "<player>".
// On the second pass "<player>" still satisfies [^"]+ after the placeholder, so
// a re-match is possible; it is harmless as long as the replacement is again
// "<player>". This test pins that down: the second pass must return a result
// identical to the first.
|
||||
$input = implode("\n", [
|
||||
'76561198000000000 "Player1" ISLogSystem.writeLog @ 1000,2000,0.',
|
||||
"[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hi'}.",
|
||||
'[16-04-26 16:17:49.731][LOG] Safety: "Player2" (1000,2000,0) restore true.',
|
||||
]);
|
||||
|
||||
$redactor = (new ProjectZomboidRedactor())->redactSteamIds(false)->redactCoordinates(false);
|
||||
$redacted = $redactor->redact($input);
|
||||
$redactedAgain = $redactor->redact($redacted);
|
||||
|
||||
$this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to player-name-only input must produce the same result as applying it once.');
|
||||
}
|
||||
|
||||
public function testIdempotenceCoordsOnly(): void
|
||||
{
|
||||
$input = implode("\n", [
|
||||
'[16-04-26 12:00:00.000] 76561198000000001 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
|
||||
'[16-04-26 12:00:01.000] [76561198000000001][ISEnterVehicle][Player1][1020,2020,-1][Van_LectroMax].',
|
||||
'[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.',
|
||||
'[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.',
|
||||
]);
|
||||
|
||||
$redactor = (new ProjectZomboidRedactor())->redactSteamIds(false)->redactPlayerNames(false);
|
||||
$redacted = $redactor->redact($input);
|
||||
$redactedAgain = $redactor->redact($redacted);
|
||||
|
||||
$this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to coords-only input must produce the same result as applying it once; the placeholder 0,0,0 must not be re-matched.');
|
||||
}
|
||||
|
||||
public function testIdempotenceAllCategories(): void
|
||||
{
|
||||
// Full input: all three PII categories in multiple lexical contexts.
|
||||
// After the first redact(), every placeholder is in place. The second
|
||||
// redact() must make no further changes.
|
||||
$input = implode("\n", [
|
||||
'[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
|
||||
'[16-04-26 12:00:01.000] 76561198222222222 "Player2" teleported to 1050,2050,0.',
|
||||
"[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hello'}.",
|
||||
'[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.',
|
||||
'[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.',
|
||||
'[16-04-26 12:00:02.000] [76561198333333333][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].',
|
||||
]);
|
||||
|
||||
$redactor = new ProjectZomboidRedactor();
|
||||
$redacted = $redactor->redact($input);
|
||||
$redactedAgain = $redactor->redact($redacted);
|
||||
|
||||
$this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to input with all PII categories must produce the same result as applying it once; no placeholder must re-match on the second pass.');
|
||||
}
|
||||
}
|
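The idempotence property stated in the class docblock above, redact(redact(x)) === redact(x), can be checked with a small generic helper. The sketch below uses two stand-in transforms rather than the real redactor; `isIdempotentOn()`, `$stable` and `$unstable` are hypothetical names chosen for this example.

```php
<?php
declare(strict_types=1);

/** Returns true when applying $transform twice gives the same result as applying it once. */
function isIdempotentOn(callable $transform, string $input): bool
{
    $once = $transform($input);

    return $transform($once) === $once;
}

// Stable: the placeholder is itself a valid match, but it maps back to itself.
$stable = fn (string $s): string =>
    preg_replace('/(?<![A-Za-z0-9])76561198\d{9}(?![A-Za-z0-9])/', '76561198000000000', $s) ?? $s;

// Unstable: the replacement wraps the match, so each pass nests one level deeper.
$unstable = fn (string $s): string =>
    preg_replace('/"([^"]+)"/', '"<$1>"', $s) ?? $s;

$line = '76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.';

var_dump(isIdempotentOn($stable, $line));
var_dump(isIdempotentOn($unstable, $line));
```

The unstable transform illustrates the failure mode the docblock warns about: a placeholder that re-matches and is rewritten to something different on the second pass.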
||||
@@ -0,0 +1,272 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Log\File\PathLogFile;
|
||||
use IndifferentKetchup\Codex\Log\File\StringLogFile;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidAdminLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidBurdJournalsLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidChatLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidClientActionLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidCmdLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidItemLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidMapLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidPerkLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidPvpLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidServerLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidUserLog;
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\Attributes\DataProvider;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
/**
|
||||
* Integration tests: drive all 11 existing PZ fixtures through ProjectZomboidRedactor
|
||||
* and verify that the output is well-formed.
|
||||
*
|
||||
* Three properties are checked across all fixtures:
|
||||
*
|
||||
* 1. Steam ID normalisation — no non-zero-placeholder Steam IDs survive.
|
||||
* 2. Structural preservation — parsing the redacted content yields the same
|
||||
* entry count as parsing the original.
|
||||
* 3. Idempotence — applying redact() a second time produces no further changes.
|
||||
*
|
||||
* Known v1 limitations documented inline:
|
||||
*
|
||||
* - pvp.txt: victim names after `hit "..."` are NOT redacted (Task 3 limitation).
|
||||
* Player2 can therefore still appear after `hit` in the redacted pvp output.
|
||||
* - pvp.txt: victim coords after `hit "..." (x,y,z)` are NOT redacted (Task 4
* limitation). COORDS_PARENTHESISED_REGEX anchors on the trailing PvP verb,
* which follows only the attacker's parenthesised coords.
|
||||
* - admin.txt: `teleported X to <x,y,z>` coords survive because COORDS_AT_CLAUSE_REGEX
|
||||
* anchors on ` at `, not ` to `.
|
||||
*/
|
||||
class ProjectZomboidRedactorIntegrationTest extends TestCase
|
||||
{
|
||||
private static string $fixturesDir = __DIR__ . '/../../../src/Games/ProjectZomboid/fixtures';
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Data providers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* Yields [fixturePath] for every PZ fixture file.
|
||||
*/
|
||||
public static function fixturePathProvider(): array
|
||||
{
|
||||
$dir = self::$fixturesDir;
|
||||
return [
|
||||
'admin' => [$dir . '/admin-minimal.txt'],
|
||||
'burd-journals' => [$dir . '/burd-journals-minimal.txt'],
|
||||
'chat' => [$dir . '/chat-minimal.txt'],
|
||||
'client-action' => [$dir . '/client-action-minimal.txt'],
|
||||
'cmd' => [$dir . '/cmd-minimal.txt'],
|
||||
'debug-server' => [$dir . '/debug-server-minimal.txt'],
|
||||
'item' => [$dir . '/item-minimal.txt'],
|
||||
'map' => [$dir . '/map-minimal.txt'],
|
||||
'perk' => [$dir . '/perk-minimal.txt'],
|
||||
'pvp' => [$dir . '/pvp-minimal.txt'],
|
||||
'user' => [$dir . '/user-minimal.txt'],
|
||||
];
|
||||
}
|
||||
|
||||
/**
|
||||
* Yields [fixturePath] for the subset of fixtures where every synthetic
|
||||
* player name (Player1 / Player2 / AdminUser / PlayerSuspect) appears
|
||||
* exclusively in a context the redactor recognises:
|
||||
*
|
||||
* - chat: ChatMessage{author='...'} envelope
|
||||
* - cmd, item, map, user: 17-digit Steam ID followed by "..." quoted name
|
||||
*
|
||||
* Fixtures intentionally excluded:
|
||||
*
|
||||
* - admin: names appear in free-text positions (no Steam-ID anchor,
|
||||
* no quotes, no Combat:/Safety: prefix). Names survive in v1.
|
||||
* - client-action,
|
||||
* perk: names appear inside [...] brackets, not "..." quotes.
|
||||
* PLAYER_AFTER_STEAMID_REGEX requires double-quotes.
|
||||
* - pvp: attacker name redacts but victim name after `hit "..."`
|
||||
* survives in v1 (Task 3 limitation).
|
||||
* - burd-journals,
|
||||
* debug-server: no synthetic player names present.
|
||||
*/
|
||||
public static function fixturesWhereAllNamesAreInCoveredContextsProvider(): array
|
||||
{
|
||||
$dir = self::$fixturesDir;
|
||||
return [
|
||||
'chat' => [$dir . '/chat-minimal.txt'],
|
||||
'cmd' => [$dir . '/cmd-minimal.txt'],
|
||||
'item' => [$dir . '/item-minimal.txt'],
|
||||
'map' => [$dir . '/map-minimal.txt'],
|
||||
'user' => [$dir . '/user-minimal.txt'],
|
||||
];
|
||||
}
|
||||
|
||||
/**
|
||||
* Yields [fixturePath, logClass] pairs: each fixture with the log class that
* parses it. All 11 fixtures are represented.
|
||||
*/
|
||||
public static function fixtureWithLogClassProvider(): array
|
||||
{
|
||||
$dir = self::$fixturesDir;
|
||||
return [
|
||||
'admin' => [$dir . '/admin-minimal.txt', ProjectZomboidAdminLog::class],
|
||||
'burd-journals' => [$dir . '/burd-journals-minimal.txt', ProjectZomboidBurdJournalsLog::class],
|
||||
'chat' => [$dir . '/chat-minimal.txt', ProjectZomboidChatLog::class],
|
||||
'client-action' => [$dir . '/client-action-minimal.txt', ProjectZomboidClientActionLog::class],
|
||||
'cmd' => [$dir . '/cmd-minimal.txt', ProjectZomboidCmdLog::class],
|
||||
'debug-server' => [$dir . '/debug-server-minimal.txt', ProjectZomboidServerLog::class],
|
||||
'item' => [$dir . '/item-minimal.txt', ProjectZomboidItemLog::class],
|
||||
'map' => [$dir . '/map-minimal.txt', ProjectZomboidMapLog::class],
|
||||
'perk' => [$dir . '/perk-minimal.txt', ProjectZomboidPerkLog::class],
|
||||
'pvp' => [$dir . '/pvp-minimal.txt', ProjectZomboidPvpLog::class],
|
||||
'user' => [$dir . '/user-minimal.txt', ProjectZomboidUserLog::class],
|
||||
];
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Helper
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
private function redact(string $content): string
|
||||
{
|
||||
return (new ProjectZomboidRedactor())->redact($content);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 1 — Steam ID normalisation
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* After redaction every 17-digit Steam ID that is NOT the zero-placeholder
|
||||
* must be gone. The zero-placeholder itself (76561198000000000) is the only
|
||||
* Steam ID that may remain.
|
||||
*/
|
||||
#[DataProvider('fixturePathProvider')]
|
||||
public function testFixtureContainsNoSteamIdsAfterRedaction(string $fixturePath): void
|
||||
{
|
||||
$content = (new PathLogFile($fixturePath))->getContent();
|
||||
$redacted = $this->redact($content);
|
||||
|
||||
$matches = preg_match_all('/(?<![A-Za-z0-9])76561198(?!000000000)\d{9}(?![A-Za-z0-9])/u', $redacted);
|
||||
|
||||
$this->assertSame(
|
||||
0,
|
||||
$matches,
|
||||
sprintf(
|
||||
'After redaction, fixture "%s" must contain no non-zero-placeholder Steam IDs, but %d were found.',
|
||||
basename($fixturePath),
|
||||
$matches,
|
||||
),
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 2 — Structural preservation (re-parse after redaction)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* The redacted content, fed back through the corresponding parser, must
|
||||
* produce exactly the same number of log entries as the original content.
|
||||
*
|
||||
* This asserts that the redactor does not corrupt timestamps, delimiters,
|
||||
* or structural tokens that the parser relies on.
|
||||
*
|
||||
* @param string $fixturePath Path to the fixture file.
|
||||
* @param class-string<\IndifferentKetchup\Codex\Log\Log> $logClass
|
||||
* Fully-qualified name of the Log subclass that corresponds to this fixture.
|
||||
*/
|
||||
#[DataProvider('fixtureWithLogClassProvider')]
|
||||
public function testFixtureRedactedOutputParsesToSameEntryCount(string $fixturePath, string $logClass): void
|
||||
{
|
||||
$content = (new PathLogFile($fixturePath))->getContent();
|
||||
|
||||
/** @var \IndifferentKetchup\Codex\Log\Log $originalLog */
|
||||
$originalLog = (new $logClass())->setLogFile(new PathLogFile($fixturePath));
|
||||
$originalLog->parse();
|
||||
$originalCount = count($originalLog->getEntries());
|
||||
|
||||
$redacted = $this->redact($content);
|
||||
|
||||
/** @var \IndifferentKetchup\Codex\Log\Log $redactedLog */
|
||||
$redactedLog = (new $logClass())->setLogFile(new StringLogFile($redacted));
|
||||
$redactedLog->parse();
|
||||
$redactedCount = count($redactedLog->getEntries());
|
||||
|
||||
$this->assertSame(
|
||||
$originalCount,
|
||||
$redactedCount,
|
||||
sprintf(
|
||||
'Parsing the redacted "%s" fixture with %s must yield the same entry count (%d) as parsing the original, but got %d.',
|
||||
basename($fixturePath),
|
||||
$logClass,
|
||||
$originalCount,
|
||||
$redactedCount,
|
||||
),
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 3 — Idempotence
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* Applying redact() a second time must produce no further changes:
|
||||
* redact(redact(content)) === redact(content).
|
||||
*
|
||||
* This guards against poorly-anchored regexes that would re-match the
|
||||
* redaction placeholders themselves on a second pass.
|
||||
*/
|
||||
#[DataProvider('fixturePathProvider')]
|
||||
public function testFixtureIsIdempotent(string $fixturePath): void
|
||||
{
|
||||
$content = (new PathLogFile($fixturePath))->getContent();
|
||||
|
||||
$redactor = new ProjectZomboidRedactor();
|
||||
$once = $redactor->redact($content);
|
||||
$twice = $redactor->redact($once);
|
||||
|
||||
$this->assertSame(
|
||||
$once,
|
||||
$twice,
|
||||
sprintf(
|
||||
'redact(redact(content)) must equal redact(content) for fixture "%s"; a second pass must be a no-op.',
|
||||
basename($fixturePath),
|
||||
),
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 4 — Player-name collapse in fully-covered fixtures
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* For fixtures where every synthetic player name appears exclusively in a
|
||||
* context the redactor recognises, no synthetic name should remain after
|
||||
* redaction.
|
||||
*
|
||||
* This addresses observation #3 from the final code review (the integration
|
||||
* tests previously asserted Steam-ID elimination + structural preservation
|
||||
* + idempotence, but did not directly verify name collapse). The unit tests
|
||||
* in ProjectZomboidRedactorPlayerNameTest cover this property exhaustively
|
||||
* per-context; this integration test re-verifies it end-to-end against the
|
||||
* fixtures that ride into iblogs.
|
||||
*/
|
||||
#[DataProvider('fixturesWhereAllNamesAreInCoveredContextsProvider')]
|
||||
public function testFixturePlayerNamesCollapseInCoveredContexts(string $fixturePath): void
|
||||
{
|
||||
$content = (new PathLogFile($fixturePath))->getContent();
|
||||
$redacted = $this->redact($content);
|
||||
|
||||
foreach (['Player1', 'Player2', 'AdminUser', 'PlayerSuspect'] as $name) {
|
||||
$this->assertStringNotContainsString(
|
||||
$name,
|
||||
$redacted,
|
||||
sprintf(
|
||||
'Fixture "%s": synthetic name %s survived redaction. Every name in this fixture should appear only in a covered lexical context.',
|
||||
basename($fixturePath),
|
||||
$name,
|
||||
),
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
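The integration tests above check properties of the redactor as a whole rather than individual patterns. As a rough illustration only, the same checks can be sketched in Python. This is not the project's PHP code: `check_redactor` is a hypothetical helper, the redactor is passed in as a plain callable, and the line-count comparison is a crude stand-in for the real test's "re-parse and compare entry counts" step. Only the residual Steam ID pattern is copied from the test itself.

```python
import re
from typing import Callable

# Pattern copied from the integration test: any SteamID64 with the 76561198
# prefix except the all-zero placeholder 76561198000000000.
RESIDUAL_STEAM_ID = re.compile(
    r"(?<![A-Za-z0-9])76561198(?!000000000)\d{9}(?![A-Za-z0-9])"
)


def check_redactor(redact: Callable[[str], str], content: str) -> list[str]:
    """Return the names of violated properties (empty list when all hold).

    Hypothetical helper: `redact` stands in for ProjectZomboidRedactor::redact().
    """
    problems: list[str] = []
    once = redact(content)

    # Property 1: no real Steam ID survives redaction.
    if RESIDUAL_STEAM_ID.search(once):
        problems.append("residual Steam ID")

    # Property 2: a second pass must be a no-op.
    if redact(once) != once:
        problems.append("not idempotent")

    # Property 3 (assumed proxy): redaction must not add or drop lines.
    if once.count("\n") != content.count("\n"):
        problems.append("line count changed")

    return problems


# An identity "redactor" leaves the Steam ID in place, so property 1 fails.
print(check_redactor(lambda s: s, "id 76561198111111111"))
```

Returning a list of violations instead of raising on the first one makes it easier to see every broken property for a fixture in a single run.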
114
test/tests/Util/Redactor/ProjectZomboidRedactorIpv4Test.php
Normal file
@@ -0,0 +1,114 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ProjectZomboidRedactorIpv4Test extends TestCase
|
||||
{
|
||||
public function testRedactsBareIpv4(): void
|
||||
{
|
||||
$input = 'Connection from 192.168.1.1 closed.';
|
||||
$expected = 'Connection from [REDACTED_IP] closed.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsIpv4WithPortSuffix(): void
|
||||
{
|
||||
$input = 'Connected to 10.0.0.42:27015.';
|
||||
$expected = 'Connected to [REDACTED_IP].';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsMultipleIpv4OnOneLine(): void
|
||||
{
|
||||
$input = 'Peer 192.168.1.10 -> 192.168.1.20 via 10.0.0.1:8080.';
|
||||
$expected = 'Peer [REDACTED_IP] -> [REDACTED_IP] via [REDACTED_IP].';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsLoopbackAndBoundaryAddresses(): void
|
||||
{
|
||||
$input = implode("\n", [
|
||||
'127.0.0.1',
|
||||
'0.0.0.0',
|
||||
'255.255.255.255',
|
||||
]);
|
||||
$expected = implode("\n", [
|
||||
'[REDACTED_IP]',
|
||||
'[REDACTED_IP]',
|
||||
'[REDACTED_IP]',
|
||||
]);
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactOutOfRangeOctets(): void
|
||||
{
|
||||
// 999 is not a valid octet under the 0-255 alternation; the address
|
||||
// must therefore be left untouched.
|
||||
$input = 'Bogus: 999.999.999.999';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactInsideLongerDottedSequence(): void
|
||||
{
|
||||
// Five dotted segments are not an IPv4 address; the lookarounds must
|
||||
// reject any partial match inside the longer sequence.
|
||||
$input = 'Path frag 1.2.3.4.5 should not match.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactThreeSegmentBuildNumbers(): void
|
||||
{
|
||||
// PZ build numbers are 3-segment (e.g. 41.78.16) and must not match.
|
||||
$input = 'Build 41.78.16 starting up.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testToggleOffLeavesIpv4Intact(): void
|
||||
{
|
||||
$input = 'Connection from 192.168.1.1:27015 closed.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactIpAddresses(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testIdempotence(): void
|
||||
{
|
||||
$input = implode("\n", [
|
||||
'Connection from 192.168.1.1:27015 closed.',
|
||||
'Peer 10.0.0.42 -> 10.0.0.43 via 172.16.0.1:8080.',
|
||||
]);
|
||||
|
||||
$redactor = new ProjectZomboidRedactor();
|
||||
$once = $redactor->redact($input);
|
||||
$twice = $redactor->redact($once);
|
||||
|
||||
$this->assertSame($once, $twice);
|
||||
}
|
||||
}
|
||||
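The IPv4 tests above mention a 0-255 octet alternation, lookarounds that reject longer dotted sequences, and an optional port suffix that is swallowed by the placeholder. The redactor's actual pattern is not shown in this diff, so the following Python is only a sketch of one regex that would satisfy the same cases. The names `IPV4` and `redact_ipv4` are assumptions made for illustration.

```python
import re

PLACEHOLDER = "[REDACTED_IP]"

# One octet in the range 0-255 (assumed shape of the "0-255 alternation").
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Four octets, not preceded by a digit or dot, not followed by another digit
# or by ".<digit>" (which would make it a 5+ segment sequence), plus an
# optional ":port" suffix that is consumed together with the address.
IPV4 = re.compile(
    rf"(?<![\d.]){_OCTET}(?:\.{_OCTET}){{3}}(?!\d)(?!\.\d)(?::\d{{1,5}})?"
)


def redact_ipv4(text: str) -> str:
    """Replace every IPv4 address (and trailing port) with the placeholder."""
    return IPV4.sub(PLACEHOLDER, text)
```

Because the placeholder contains no digits, running `redact_ipv4` on its own output cannot produce new matches, which is what the idempotence test relies on.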
135
test/tests/Util/Redactor/ProjectZomboidRedactorIpv6Test.php
Normal file
@@ -0,0 +1,135 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ProjectZomboidRedactorIpv6Test extends TestCase
|
||||
{
|
||||
public function testRedactsFullIpv6(): void
|
||||
{
|
||||
$input = 'Bound 2001:0db8:85a3:0000:0000:8a2e:0370:7334 ok.';
|
||||
$expected = 'Bound [REDACTED_IP] ok.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsAbbreviatedIpv6(): void
|
||||
{
|
||||
$input = 'Server peer 2001:db8::1 connected.';
|
||||
$expected = 'Server peer [REDACTED_IP] connected.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsLoopbackIpv6(): void
|
||||
{
|
||||
$input = 'localhost ::1 reachable.';
|
||||
$expected = 'localhost [REDACTED_IP] reachable.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsBracketedIpv6WithPort(): void
|
||||
{
|
||||
$input = 'Bound to [2001:db8::1]:8080 ok.';
|
||||
$expected = 'Bound to [REDACTED_IP] ok.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsBracketedLoopbackWithPort(): void
|
||||
{
|
||||
$input = 'Listening on [::1]:27015.';
|
||||
$expected = 'Listening on [REDACTED_IP].';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsIpv4MappedIpv6(): void
|
||||
{
|
||||
// IPv4-mapped form must be handled by the IPv6 pass before the IPv4
|
||||
// pass so the leading "::ffff:" doesn't get orphaned. With the IPv6
|
||||
// pass first, the whole token collapses into a single placeholder.
|
||||
$input = 'Mapped ::ffff:192.168.1.1 ok.';
|
||||
$expected = 'Mapped [REDACTED_IP] ok.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactJavaScopeOperator(): void
|
||||
{
|
||||
// Java method references and PHP scope operators look superficially
|
||||
// like leading-:: IPv6 forms but fail filter_var validation; the
|
||||
// word-boundary lookbehind also rejects matches that follow letters.
|
||||
$input = 'Foo::bar called Object::toString.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactTimestampShape(): void
|
||||
{
|
||||
// PZ log timestamps include hh:mm:ss.v segments which match the coarse
|
||||
// IPv6 candidate pattern but are rejected by filter_var.
|
||||
$input = '[16-04-26 12:00:00.000][LOG] startup complete';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactSteamIdAsIpv6(): void
|
||||
{
|
||||
// 17-digit Steam IDs contain no colons and so cannot parse as IPv6, but assert
// explicitly so a future change to the IPv6 regex doesn't accidentally
// collide with the Steam ID pass.
|
||||
$input = 'Player 76561198111111111 joined.';
|
||||
$expected = 'Player 76561198000000000 joined.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testToggleOffLeavesIpv6Intact(): void
|
||||
{
|
||||
$input = 'Bound to [2001:db8::1]:8080 ok.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactIpAddresses(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testIdempotence(): void
|
||||
{
|
||||
$input = implode("\n", [
|
||||
'Server peer 2001:db8::1 connected.',
|
||||
'Listening on [::1]:27015.',
|
||||
'Mapped ::ffff:192.168.1.1 ok.',
|
||||
'[16-04-26 12:00:00.000][LOG] startup complete',
|
||||
]);
|
||||
|
||||
$redactor = new ProjectZomboidRedactor();
|
||||
$once = $redactor->redact($input);
|
||||
$twice = $redactor->redact($once);
|
||||
|
||||
$this->assertSame($once, $twice);
|
||||
}
|
||||
}
|
||||
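The comments in the IPv6 tests describe a two-stage approach: a coarse candidate pattern finds anything that might be an IPv6 address, and a strict validator (`filter_var` in the PHP code) decides whether to redact it. A possible Python analogue is sketched below, using the standard library's `ipaddress.IPv6Address` as the validator. The two regexes and the function names are assumptions for illustration and are not taken from the redactor's source.

```python
import ipaddress
import re

PLACEHOLDER = "[REDACTED_IP]"

# Bracketed form with optional port, e.g. "[2001:db8::1]:8080".
_BRACKETED = re.compile(r"\[([0-9A-Fa-f:.]+)\](?::\d{1,5})?")

# Coarse bare candidate: hex digits, colons and dots containing at least one
# colon, not glued to surrounding letters, digits, colons or dots.
_BARE = re.compile(
    r"(?<![0-9A-Za-z:.])[0-9A-Fa-f]{0,4}:[0-9A-Fa-f:.]*[0-9A-Fa-f](?![0-9A-Za-z:])"
)


def _is_ipv6(candidate: str) -> bool:
    """Strict validation step, playing the role filter_var plays in PHP."""
    try:
        ipaddress.IPv6Address(candidate)
    except ValueError:
        return False
    return True


def redact_ipv6(text: str) -> str:
    """Redact validated IPv6 addresses; leave rejected candidates untouched."""
    # Bracketed addresses first, so the brackets and port collapse with them.
    text = _BRACKETED.sub(
        lambda m: PLACEHOLDER if _is_ipv6(m.group(1)) else m.group(0), text
    )
    return _BARE.sub(
        lambda m: PLACEHOLDER if _is_ipv6(m.group(0)) else m.group(0), text
    )
```

Keeping the candidate pattern loose and delegating correctness to a validator avoids encoding every IPv6 abbreviation rule in a regex. A timestamp fragment such as `12:00:00.000` is matched as a candidate but fails validation and is returned unchanged.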
@@ -0,0 +1,93 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ProjectZomboidRedactorPlayerNameTest extends TestCase
|
||||
{
|
||||
public function testRedactsPlayerNameAfterRedactedSteamId(): void
|
||||
{
|
||||
// The Steam ID pass has already run; the literal placeholder 76561198000000000
|
||||
// precedes the quoted name. The player-name pass must redact the name.
|
||||
$input = '76561198000000000 "AdminUser" admin.broadcastMessage @ 1020,2020,0.';
|
||||
$expected = '76561198000000000 "<player>" admin.broadcastMessage @ 1020,2020,0.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'Player name following the redacted Steam ID placeholder must be replaced.');
|
||||
}
|
||||
|
||||
public function testRedactsChatMessageAuthor(): void
|
||||
{
|
||||
// The author field inside ChatMessage{...} must be replaced; the text
|
||||
// payload ('hello') is not in scope for player-name redaction and must
|
||||
// survive unchanged.
|
||||
$input = "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player1', text='hello'}.";
|
||||
$expected = "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='<player>', text='hello'}.";
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'ChatMessage author must be replaced while the text payload remains unchanged.');
|
||||
}
|
||||
|
||||
public function testRedactsCombatNameInPvpLog(): void
|
||||
{
|
||||
// Only the FIRST quoted name (after "Combat: ") is redacted in v1.
|
||||
// The second name (after "hit") is NOT yet redacted — deferred to v2.
|
||||
// The weapon name ("Tire Iron (Worn)") must also survive unchanged.
|
||||
$input = '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.';
|
||||
// Attacker coords (before "hit") are also replaced by the coordinates pass.
|
||||
// Victim coords (before "weapon=") lack the trailing keyword and are NOT replaced — deferred to v2.
|
||||
$expected = '[16-04-26 17:14:35.128][INFO] Combat: "<player>" (0,0,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redact($input);
|
||||
|
||||
// Player1 (after "Combat: ") is replaced; attacker coords (before "hit") are also replaced.
|
||||
// Player2 (after "hit") and victim coords (before "weapon=") are NOT replaced in v1 — deferred.
|
||||
$this->assertSame($expected, $output, 'First Combat: player name and attacker coords must be replaced; second name, victim coords, and weapon must survive.');
|
||||
}
|
||||
|
||||
public function testRedactsSafetyNameInPvpLog(): void
|
||||
{
|
||||
$input = '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.';
|
||||
// Coords (before ") restore") are also replaced by the coordinates pass.
|
||||
$expected = '[16-04-26 16:17:49.731][LOG] Safety: "<player>" (0,0,0) restore true.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'Player name and coords following the Safety: token must both be replaced.');
|
||||
}
|
||||
|
||||
public function testBareQuotedStringWithoutAnchorIsNotTouched(): void
|
||||
{
|
||||
// "foo" is not preceded by a redacted Steam ID, not inside ChatMessage{...},
|
||||
// and not after Combat:/Safety: — it must pass through unchanged.
|
||||
$input = 'option changed to "foo" successfully.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'A quoted string with no matching anchor must not be redacted.');
|
||||
}
|
||||
|
||||
public function testToggleOffLeavesNamesIntact(): void
|
||||
{
|
||||
$input = '76561198000000000 "Player1" ISLogSystem.writeLog @ 1000,2000,0.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redactPlayerNames(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'With the player-name toggle disabled the original input must be returned unchanged.');
|
||||
}
|
||||
}
|
||||
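The player-name tests above only redact a quoted name when it follows a known anchor: the Steam ID placeholder, a `ChatMessage{... author='...'}` field, or a `Combat:` / `Safety:` token. A minimal Python sketch of that anchored approach follows. The patterns are assumptions written to satisfy the cases shown, not the redactor's real regexes, and the separate coordinates pass is deliberately left out, so coordinates stay unchanged here.

```python
import re

# (pattern, replacement) pairs; each pattern captures its anchor so the
# replacement can keep the anchor and swap only the name.
_RULES = [
    # Quoted name directly after the redacted Steam ID placeholder.
    (re.compile(r'(76561198000000000 )"[^"\n]*"'), r'\1"<player>"'),
    # author='...' inside a ChatMessage{...} payload.
    (re.compile(r"(ChatMessage\{[^}\n]*?author=')[^'\n]*'"), r"\1<player>'"),
    # First quoted name after "Combat: " or "Safety: " (the victim name
    # after `hit` is not covered, mirroring the v1 limitation in the tests).
    (re.compile(r'((?:Combat|Safety): )"[^"\n]*"'), r'\1"<player>"'),
]


def redact_player_names(text: str) -> str:
    """Apply each anchored rule in turn; unanchored quotes are left alone."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

Requiring an anchor is what keeps a line such as `option changed to "foo" successfully.` intact: the quoted string matches none of the three contexts.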
@@ -0,0 +1,52 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ProjectZomboidRedactorSteamIdTest extends TestCase
|
||||
{
|
||||
public function testCollapsesDistinctSteamIdsToZeroPlaceholder(): void
|
||||
{
|
||||
$input = 'Players: 76561198111111111, 76561198222222222, 76561198333333333 connected.';
|
||||
$expected = 'Players: 76561198000000000, 76561198000000000, 76561198000000000 connected.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'All three distinct Steam IDs should be replaced with the zero placeholder.');
|
||||
}
|
||||
|
||||
public function testNonSteamIdLongDigitsAreNotTouched(): void
|
||||
{
|
||||
// 13-digit Unix-millisecond timestamp (PZ log t: shape) and a 17-digit number
|
||||
// that does not begin with 76561198 — neither should be altered.
|
||||
$input = 't:1776297642406 score=12345678901234567';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'Non-SteamID digit sequences must not be modified.');
|
||||
}
|
||||
|
||||
public function testEmbeddedSteamIdInsideLongerAlphanumericTokenIsNotTouched(): void
|
||||
{
|
||||
// The SteamID64 pattern is embedded inside a longer alphanumeric token;
|
||||
// the negative lookaround boundaries should prevent a match.
|
||||
$input = 'token=abc76561198000000001def other=data';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'A Steam ID embedded inside an alphanumeric token must not be redacted.');
|
||||
}
|
||||
|
||||
public function testToggleOffLeavesSteamIdsIntact(): void
|
||||
{
|
||||
$input = 'Connected: 76561198111111111 and 76561198222222222.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'With the Steam ID toggle disabled the original input must be returned unchanged.');
|
||||
}
|
||||
}
|
||||
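The Steam ID tests above depend on two details: the `76561198` prefix and negative lookarounds that stop a match inside a longer alphanumeric token. The Python below is a hedged sketch of a collapse pass with those properties. The boundary lookarounds mirror the ones visible in the integration test's verification regex, while `collapse_steam_ids` is a hypothetical name.

```python
import re

ZERO_PLACEHOLDER = "76561198000000000"

# 17 digits starting with 76561198, not embedded in a longer alphanumeric token.
STEAM_ID = re.compile(r"(?<![A-Za-z0-9])76561198\d{9}(?![A-Za-z0-9])")


def collapse_steam_ids(text: str) -> str:
    """Replace every SteamID64 with the all-zero placeholder."""
    return STEAM_ID.sub(ZERO_PLACEHOLDER, text)
```

The placeholder itself matches the pattern, but it is replaced with an identical string, so a second pass changes nothing.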
310
tools/pz-analyzer/pz_classify.py
Normal file
@@ -0,0 +1,310 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
pz_classify.py — Deterministic Project Zomboid log classifier orchestrator.
|
||||
|
||||
Walks ``*DebugLog-server*.txt`` files under the redacted-logs directory,
|
||||
runs the pz_parser pipeline per file, merges records cross-file by their
|
||||
deterministic ``signature``, and emits the spec-shaped JSON report.
|
||||
|
||||
Companion to the existing Qwen-backed discovery tool ``pz_error_analysis.py``
|
||||
(left untouched). Zero AI dependency, stdlib-only, runs in seconds.
|
||||
|
||||
By convention the input is always the redacted directory produced by
|
||||
``pz_redact_all.sh``; ``meta.redacted`` is therefore hard-coded ``true``.
|
||||
If the user overrides ``--input`` to a non-redacted source we still emit
|
||||
``true`` because we have no upstream way to verify redaction status.
|
||||
|
||||
Pipeline:
    pz_parser.parse_file         per-file Entry list
    pz_parser.classify_entries   per-file deduped Record list
    _merge_cross_file            global Record list deduped across files
    _build_summary               top-line stats + by_kind / by_attribution / top_mods
|
||||
|
||||
Output schema, CLI flags, and aggregation rules are defined in
|
||||
``docs/superpowers/specs/2026-05-04-pz-deterministic-classifier-design.md``.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import dataclasses
|
||||
import json
|
||||
import sys
|
||||
from collections import Counter
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
from pz_parser import (
|
||||
MAX_CAUSE_CHAIN_LEVELS,
|
||||
MAX_STACK_FRAMES,
|
||||
SEVERITY_LEVELS,
|
||||
Record,
|
||||
classify_entries,
|
||||
parse_file,
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Defaults / constants
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_REPO_ROOT = Path(__file__).resolve().parents[2]
|
||||
DEFAULT_INPUT: Path = _REPO_ROOT / ".scratch" / "pz" / "Logs.redacted"
|
||||
DEFAULT_OUT: Path = _REPO_ROOT / ".scratch" / "pz" / "classify.json"
|
||||
|
||||
#: Filename glob driving the directory walk.
|
||||
INPUT_GLOB: str = "*DebugLog-server*.txt"
|
||||
#: Cap on entries in ``summary.top_mods`` (mods ranked by total occurrence count).
|
||||
TOP_MODS_LIMIT: int = 10
|
||||
|
||||
#: Confidence / attribution promotion ladders (higher rank wins on merge).
|
||||
_CONFIDENCE_RANK: dict[str, int] = {"low": 0, "medium": 1, "high": 2}
|
||||
_ATTRIBUTION_RANK: dict[str, int] = {
|
||||
"unattributed": 0,
|
||||
"inferred": 1,
|
||||
"direct": 2,
|
||||
}
|
||||
#: Levels that count as errors (vs warnings) in the summary.
|
||||
_ERROR_LEVELS: frozenset[str] = frozenset({"ERROR", "SEVERE", "FATAL"})
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Cross-file aggregation (spec §9, inter-file equivalent of parser dedup)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _merge_cross_file(per_file_records: list[Record]) -> list[Record]:
|
||||
"""Merge ``Record`` instances across files by ``signature``.
|
||||
|
||||
The parser already dedups within a single file. This is the inter-file
|
||||
equivalent: when the same signature appears in records from multiple
|
||||
files, sum occurrences, union file lists, promote attribution/confidence,
|
||||
and merge stack and cause-chain (deduped, capped at parser constants).
|
||||
First-seen is the earliest by file-then-line; since callers feed records
|
||||
in sorted file order, the first record we encounter per signature is
|
||||
already the earliest.
|
||||
"""
|
||||
by_signature: dict[str, Record] = {}
|
||||
for incoming in per_file_records:
|
||||
existing = by_signature.get(incoming.signature)
|
||||
if existing is None:
|
||||
# First occurrence: copy so later merges don't mutate the caller's Record.
|
||||
by_signature[incoming.signature] = Record(
|
||||
signature=incoming.signature,
|
||||
pattern_id=incoming.pattern_id,
|
||||
level=incoming.level,
|
||||
kind=incoming.kind,
|
||||
mod_id=incoming.mod_id,
|
||||
mod_name=incoming.mod_name,
|
||||
attribution=incoming.attribution,
|
||||
confidence=incoming.confidence,
|
||||
attribution_reason=incoming.attribution_reason,
|
||||
file=incoming.file,
|
||||
line=incoming.line,
|
||||
cause_chain=incoming.cause_chain,
|
||||
stack=list(incoming.stack),
|
||||
first_seen=incoming.first_seen,
|
||||
occurrence_count=incoming.occurrence_count,
|
||||
files=list(incoming.files),
|
||||
excerpt=incoming.excerpt,
|
||||
)
|
||||
continue
|
||||
# Aggregate.
|
||||
existing.occurrence_count += incoming.occurrence_count
|
||||
for fname in incoming.files:
|
||||
if fname not in existing.files:
|
||||
existing.files.append(fname)
|
||||
# Promote attribution / confidence / mod_name on stronger evidence.
|
||||
if _ATTRIBUTION_RANK[incoming.attribution] > _ATTRIBUTION_RANK[existing.attribution]:
|
||||
existing.attribution = incoming.attribution
|
||||
existing.attribution_reason = incoming.attribution_reason
|
||||
if incoming.mod_name:
|
||||
existing.mod_name = incoming.mod_name
|
||||
if _CONFIDENCE_RANK[incoming.confidence] > _CONFIDENCE_RANK[existing.confidence]:
|
||||
existing.confidence = incoming.confidence
|
||||
# Merge stack frames preserving order, capped.
|
||||
for frame in incoming.stack:
|
||||
if frame not in existing.stack and len(existing.stack) < MAX_STACK_FRAMES:
|
||||
existing.stack.append(frame)
|
||||
# Merge cause chain (deduped tokens, capped).
|
||||
if incoming.cause_chain and incoming.cause_chain != existing.cause_chain:
|
||||
old = existing.cause_chain.split(" -> ") if existing.cause_chain else []
|
||||
new = incoming.cause_chain.split(" -> ")
|
||||
merged = list(old)
|
||||
for tok in new:
|
||||
if tok and tok not in merged:
|
||||
merged.append(tok)
|
||||
existing.cause_chain = " -> ".join(merged[:MAX_CAUSE_CHAIN_LEVELS])
|
||||
return list(by_signature.values())
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Summary computation
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _build_summary(records: list[Record]) -> dict[str, object]:
|
||||
"""Build the ``summary`` block per spec.
|
||||
|
||||
Counts records (signatures), not raw occurrences, except for ``top_mods``
|
||||
which sums ``occurrence_count`` per mod_id so that volume-driving mods
|
||||
surface even when they hit the same shape repeatedly.
|
||||
"""
|
||||
errors = sum(1 for r in records if r.level in _ERROR_LEVELS)
|
||||
warnings = sum(1 for r in records if r.level == "WARN")
|
||||
by_kind = Counter(r.kind for r in records)
|
||||
by_attribution = Counter(r.attribution for r in records)
|
||||
by_confidence = Counter(r.confidence for r in records)
|
||||
|
||||
# Group by mod_id summing total occurrence_count; preserve any mod_name.
|
||||
mod_totals: dict[str, int] = {}
|
||||
mod_names: dict[str, str] = {}
|
||||
for r in records:
|
||||
mod_totals[r.mod_id] = mod_totals.get(r.mod_id, 0) + r.occurrence_count
|
||||
# First non-empty mod_name wins; subsequent records may have empty
|
||||
# mod_name (e.g. for unattributed) so don't overwrite with "".
|
||||
if r.mod_name and r.mod_id not in mod_names:
|
||||
mod_names[r.mod_id] = r.mod_name
|
||||
top_mods = sorted(
|
||||
(
|
||||
{
|
||||
"mod_id": mod_id,
|
||||
"mod_name": mod_names.get(mod_id, ""),
|
||||
"occurrence_count": total,
|
||||
}
|
||||
for mod_id, total in mod_totals.items()
|
||||
),
|
||||
key=lambda d: d["occurrence_count"],
|
||||
reverse=True,
|
||||
)[:TOP_MODS_LIMIT]
|
||||
|
||||
return {
|
||||
"errors": errors,
|
||||
"warnings": warnings,
|
||||
"by_kind": dict(by_kind),
|
||||
"by_attribution": dict(by_attribution),
|
||||
"by_confidence": dict(by_confidence),
|
||||
"top_mods": top_mods,
|
||||
}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Driver
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _run(input_dir: Path, out_path: Path, *, quiet: bool) -> int:
|
||||
if not input_dir.is_dir():
|
||||
print(
|
||||
f"pz_classify: --input directory not found: {input_dir}",
|
||||
file=sys.stderr,
|
||||
)
|
||||
return 2
|
||||
|
||||
started = datetime.now(timezone.utc).isoformat(timespec="seconds")
|
||||
files = sorted(input_dir.glob(INPUT_GLOB))
|
||||
|
||||
all_records: list[Record] = []
|
||||
log_lines_total = 0
|
||||
error_lines_total = 0
|
||||
|
||||
for path in files:
|
||||
try:
|
||||
entries = parse_file(path)
|
||||
except Exception as exc: # noqa: BLE001 — orchestrator must keep going.
|
||||
print(
|
||||
f"pz_classify: warning: failed to parse {path.name}: {exc}",
|
||||
file=sys.stderr,
|
||||
)
|
||||
continue
|
||||
# Body-line totals: every line under every parsed entry contributes
|
||||
# to log_lines_total; severity-level entries' body lines feed
|
||||
# error_lines_total. Counted before dedup so it reflects raw volume.
|
||||
for e in entries:
|
||||
log_lines_total += len(e.body)
|
||||
if e.level in SEVERITY_LEVELS:
|
||||
error_lines_total += len(e.body)
|
||||
all_records.extend(classify_entries(entries, source_file=path.name))
|
||||
|
||||
merged = _merge_cross_file(all_records)
|
||||
merged.sort(key=lambda r: r.occurrence_count, reverse=True)
|
||||
|
||||
finished = datetime.now(timezone.utc).isoformat(timespec="seconds")
|
||||
|
||||
unique_patterns = len({r.pattern_id for r in merged})
|
||||
|
||||
document: dict[str, object] = {
|
||||
"meta": {
|
||||
"input_dir": str(input_dir),
|
||||
"files_scanned": len(files),
|
||||
"log_lines_total": log_lines_total,
|
||||
"error_lines_total": error_lines_total,
|
||||
"unique_signatures": len(merged),
|
||||
"unique_patterns": unique_patterns,
|
||||
"redacted": True,
|
||||
"started": started,
|
||||
"finished": finished,
|
||||
},
|
||||
"signatures": [dataclasses.asdict(r) for r in merged],
|
||||
"summary": _build_summary(merged),
|
||||
}
|
||||
|
||||
tmp = out_path.with_suffix(out_path.suffix + ".tmp")
|
||||
try:
|
||||
out_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with tmp.open("w", encoding="utf-8") as f:
|
||||
json.dump(document, f, ensure_ascii=False, indent=2)
|
||||
f.write("\n")
|
||||
tmp.replace(out_path)
|
||||
except OSError as exc:
|
||||
print(f"pz_classify: failed to write {out_path}: {exc}", file=sys.stderr)
|
||||
# Best-effort cleanup of the temp file.
|
||||
try:
|
||||
tmp.unlink()
|
||||
except OSError:
|
||||
pass
|
||||
return 1
|
||||
|
||||
if not quiet:
|
||||
print(
|
||||
f"pz_classify: {len(files)} file(s), {log_lines_total} log lines, "
|
||||
f"{error_lines_total} error lines, {len(merged)} records "
|
||||
f"({unique_patterns} unique patterns) -> {out_path}"
|
||||
)
|
||||
return 0
|
||||
|
||||
|
||||
def _parse_args(argv: list[str] | None = None) -> argparse.Namespace:
|
||||
parser = argparse.ArgumentParser(
|
||||
prog="pz_classify",
|
||||
description=(
|
||||
"Deterministic Project Zomboid log classifier. Walks redacted "
|
||||
"DebugLog-server*.txt files, classifies errors/warnings, and "
|
||||
"emits a JSON report."
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
"--input",
|
||||
type=Path,
|
||||
default=DEFAULT_INPUT,
|
||||
help=f"Input directory of redacted log files (default: {DEFAULT_INPUT}).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--out",
|
||||
type=Path,
|
||||
default=DEFAULT_OUT,
|
||||
help=f"Output JSON path (default: {DEFAULT_OUT}).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--quiet",
|
||||
action="store_true",
|
||||
help="Suppress the trailing one-line summary.",
|
||||
)
|
||||
return parser.parse_args(argv)
|
||||
|
||||
|
||||
def main(argv: list[str] | None = None) -> int:
    args = _parse_args(argv)
    return _run(args.input, args.out, quiet=args.quiet)


if __name__ == "__main__":
    sys.exit(main())
|
||||
467
tools/pz-analyzer/pz_error_analysis.py
Normal file
@@ -0,0 +1,467 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
pz_error_analysis.py — Qwen-backed Project Zomboid error analyzer.
|
||||
|
||||
Walks `*DebugLog-server*.txt` files (DEFAULT_INPUT — already PII-redacted by
|
||||
pz_redact_all.sh), groups WARN/ERROR/FATAL entries with surrounding context,
|
||||
deduplicates by signature hash, and asks Qwen to classify each unique
|
||||
signature into a fixed taxonomy (missing_mod, java_exception, lua_error,
|
||||
out_of_memory, ...) with a short title / summary / likely_cause /
|
||||
suggested_fix / confidence.
|
||||
|
||||
Standalone: requires Python 3.10+ and the `openai` package
|
||||
(`pip install 'openai>=1.30'`). Talks to a local OpenAI-compatible endpoint
|
||||
(default sam-desktop llama-swap on port 8401); override with QWEN_BASE_URL
|
||||
and QWEN_MODEL env vars.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import datetime as dt
|
||||
import hashlib
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Any, Iterator
|
||||
|
||||
from openai import OpenAI
|
||||
|
||||
_REPO_ROOT = Path(__file__).resolve().parents[2]
|
||||
|
||||
DEFAULT_INPUT = _REPO_ROOT / ".scratch" / "pz" / "Logs.redacted"
|
||||
DEFAULT_OUT = _REPO_ROOT / ".scratch" / "pz" / "analysis.json"
|
||||
|
||||
# --- Qwen client (inlined from /opt/analytics/ib_analytics/llm/local_client.py
|
||||
# so this script has no cross-repo dependency; mirror upstream changes if
|
||||
# the analytics client API evolves) ---
|
||||
|
||||
QWEN_DEFAULT_BASE_URL = "http://100.101.41.16:8401/v1"
|
||||
QWEN_DEFAULT_MODEL = "qwen3.6-35b-a3b"
|
||||
|
||||
SAMPLING_STRUCTURED: dict[str, Any] = {
|
||||
"temperature": 0.7,
|
||||
"top_p": 0.80,
|
||||
"extra_body": {
|
||||
"top_k": 20,
|
||||
"presence_penalty": 1.5,
|
||||
"chat_template_kwargs": {"enable_thinking": False},
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
def get_client() -> OpenAI:
    return OpenAI(
        base_url=os.environ.get("QWEN_BASE_URL", QWEN_DEFAULT_BASE_URL),
        api_key="EMPTY",
    )


def get_model() -> str:
    return os.environ.get("QWEN_MODEL", QWEN_DEFAULT_MODEL)
|
||||
|
||||
|
||||
def structured_call(
|
||||
tool_schema: dict[str, Any],
|
||||
messages: list[dict[str, Any]],
|
||||
*,
|
||||
sampling: dict[str, Any] = SAMPLING_STRUCTURED,
|
||||
client: OpenAI | None = None,
|
||||
model: str | None = None,
|
||||
max_tokens: int = 4096,
|
||||
) -> dict[str, Any]:
|
||||
cli = client or get_client()
|
||||
mdl = model or get_model()
|
||||
fn_name = tool_schema["function"]["name"]
|
||||
kwargs = dict(sampling)
|
||||
extra_body = dict(kwargs.pop("extra_body", {}))
|
||||
response = cli.chat.completions.create(
|
||||
model=mdl,
|
||||
messages=messages,
|
||||
tools=[tool_schema],
|
||||
tool_choice="required",
|
||||
max_tokens=max_tokens,
|
||||
extra_body=extra_body,
|
||||
**kwargs,
|
||||
)
|
||||
choice = response.choices[0]
|
||||
tool_calls = getattr(choice.message, "tool_calls", None) or []
|
||||
if not tool_calls:
|
||||
raise ValueError(
|
||||
f"Qwen did not invoke {fn_name}; finish_reason={choice.finish_reason}, "
|
||||
f"content={(choice.message.content or '')[:500]}"
|
||||
)
|
||||
call = tool_calls[0]
|
||||
if call.function.name != fn_name:
|
||||
raise ValueError(
|
||||
f"Qwen invoked unexpected tool {call.function.name!r}; expected {fn_name!r}"
|
||||
)
|
||||
try:
|
||||
return json.loads(call.function.arguments)
|
||||
except json.JSONDecodeError as e:
|
||||
raise ValueError(
|
||||
f"Malformed tool-call arguments for {fn_name}: {e}; "
|
||||
f"raw={call.function.arguments[:500]}"
|
||||
) from e
|
||||
|
||||
|
||||
# --- Parser ---
|
||||
|
||||
ENTRY_RE = re.compile(
|
||||
r"^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
|
||||
r"(LOG|WARN|ERROR|FATAL)\s*:\s*(.*)"
|
||||
)
|
||||
SESSION_META_RE = re.compile(r"^[A-Za-z]+\s+f:\d+,?\s*(?:t:\d+,?\s*)?st:[\d,]+>\s*")
|
||||
DOUBLE_QUOTED_RE = re.compile(r'"[^"]*"')
|
||||
SINGLE_QUOTED_RE = re.compile(r"'[^']*'")
|
||||
NUMERIC_RUN_RE = re.compile(r"\d{2,}")
|
||||
WS_RUN_RE = re.compile(r"\s+")
|
||||
|
||||
CATEGORIES = [
|
||||
"missing_mod", "mod_conflict", "lua_error", "java_exception",
|
||||
"out_of_memory", "corrupt_save", "network_error", "load_order",
|
||||
"performance", "server_crash", "unknown",
|
||||
]
|
||||
|
||||
TOOL_SCHEMA: dict[str, Any] = {
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "submit_error_analysis",
|
||||
"description": (
|
||||
"Analyse a single Project Zomboid server error block and emit "
|
||||
"structured insight."
|
||||
),
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"category": {"type": "string", "enum": CATEGORIES},
|
||||
"severity": {"type": "string", "enum": ["problem", "warning", "info"]},
|
||||
"title": {"type": "string", "description": "One-line headline (<=80 chars)"},
|
||||
"summary": {"type": "string", "description": "1-3 sentences explaining what happened"},
|
||||
"likely_cause": {"type": "string", "description": "Most plausible cause given the context"},
|
||||
"suggested_fix": {"type": "string", "description": "Concrete remediation, server-admin actionable"},
|
||||
"confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
|
||||
},
|
||||
"required": [
|
||||
"category", "severity", "title", "summary",
|
||||
"likely_cause", "suggested_fix", "confidence",
|
||||
],
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
SYSTEM_PROMPT = """You are a Project Zomboid dedicated server administrator
|
||||
diagnosing a server log. You receive one error/warning event with surrounding
|
||||
context (entries marked with `>>>` are the hit; the rest are leading or
|
||||
trailing context). Classify the event using the submit_error_analysis tool
|
||||
ONLY — never reply in plain text.
|
||||
|
||||
Rules:
|
||||
- `category` must be one of the enum values; choose `unknown` only if no
|
||||
other fits.
|
||||
- `severity`: problem = breaks something users notice; warning = degraded
|
||||
but functional; info = noteworthy but not failing.
|
||||
- `title`: at most 80 chars, neutral and specific.
|
||||
- `suggested_fix`: a concrete admin action ("subscribe to mod X", "increase
|
||||
-Xmx to 8G", "remove the conflicting mod from Mods= line"), not generic
|
||||
advice.
|
||||
- `confidence`: 0.0-1.0; lower it when the evidence is ambiguous.
|
||||
"""
|
||||
|
||||
MAX_PROMPT_CHARS = 4000
|
||||
|
||||
|
||||
def parse_file(path: Path) -> list[dict[str, Any]]:
|
||||
"""Parse a DebugLog-server file into a list of multi-line entries.
|
||||
|
||||
Continuation lines (lines that don't match ENTRY_RE) append to the
|
||||
previous entry, mirroring codex's PatternParser behaviour.
|
||||
"""
|
||||
entries: list[dict[str, Any]] = []
|
||||
current: dict[str, Any] | None = None
|
||||
with path.open("r", encoding="utf-8", errors="replace") as f:
|
||||
for lineno, raw in enumerate(f, start=1):
|
||||
line = raw.rstrip("\n")
|
||||
m = ENTRY_RE.match(line)
|
||||
if m:
|
||||
if current is not None:
|
||||
entries.append(current)
|
||||
current = {
|
||||
"timestamp": m.group(1),
|
||||
"level": m.group(2),
|
||||
"body": [m.group(3)],
|
||||
"line_start": lineno,
|
||||
"line_end": lineno,
|
||||
}
|
||||
elif current is not None:
|
||||
current["body"].append(line)
|
||||
current["line_end"] = lineno
|
||||
# else: orphan line at start of file (no preceding entry); ignore.
|
||||
if current is not None:
|
||||
entries.append(current)
|
||||
return entries
|
||||
|
||||
|
||||
def signature_for(level: str, body_lines: list[str]) -> str:
    """Stable signature derived from the first body line only.

    Stack-trace continuations are deliberately ignored: the same logical
    exception can produce slightly different traces (e.g. timing-related
    code paths) but should still collapse to one signature. Quoted strings
    (vehicle names, mod IDs, paths) are flattened to <S>; numeric runs of
    length >= 2 are flattened to <N>; session-metadata prefix
    (`General f:0,t:N,st:N,N,N>`) is stripped.
    """
    first = (body_lines[0] if body_lines else "").strip()
    first = SESSION_META_RE.sub("", first)
    first = DOUBLE_QUOTED_RE.sub('"<S>"', first)
    first = SINGLE_QUOTED_RE.sub("'<S>'", first)
    first = NUMERIC_RUN_RE.sub("<N>", first)
    first = WS_RUN_RE.sub(" ", first)
    first = first[:200]
    h = hashlib.sha256(f"{level}\n{first}".encode("utf-8")).hexdigest()
    return f"sha256:{h[:16]}"
|
||||
|
||||
|
||||
def build_excerpt(
|
||||
entries: list[dict[str, Any]], hit_idx: int, context: int
|
||||
) -> str:
|
||||
"""Render an excerpt centered on entries[hit_idx] with ±context entries."""
|
||||
start = max(0, hit_idx - context)
|
||||
end = min(len(entries), hit_idx + context + 1)
|
||||
lines: list[str] = []
|
||||
for i in range(start, end):
|
||||
e = entries[i]
|
||||
is_hit = i == hit_idx
|
||||
marker = ">>>" if is_hit else " "
|
||||
prefix = f'{marker} [{e["timestamp"]}] {e["level"]}: '
|
||||
body = e["body"]
|
||||
if is_hit:
|
||||
for j, body_line in enumerate(body):
|
||||
lines.append(prefix + body_line if j == 0 else " " + body_line)
|
||||
else:
|
||||
first = (body[0] if body else "").strip()[:200]
|
||||
lines.append(prefix + first)
|
||||
if len(body) > 1:
|
||||
lines.append(f' ... (+{len(body) - 1} more lines)')
|
||||
excerpt = "\n".join(lines)
|
||||
if len(excerpt) > MAX_PROMPT_CHARS:
|
||||
excerpt = excerpt[:MAX_PROMPT_CHARS] + "\n... [truncated]"
|
||||
return excerpt
|
||||
|
||||
|
||||
def iter_warn_or_error(entries: list[dict[str, Any]]) -> Iterator[int]:
    for i, e in enumerate(entries):
        if e["level"] in ("WARN", "ERROR", "FATAL"):
            yield i
|
||||
|
||||
|
||||
def collect_signatures(
|
||||
input_dir: Path, context: int
|
||||
) -> tuple[dict[str, dict[str, Any]], dict[str, int]]:
|
||||
"""Walk DebugLog-server files and collect dedup'd signatures."""
|
||||
signatures: dict[str, dict[str, Any]] = {}
|
||||
files_scanned = 0
|
||||
log_lines_total = 0
|
||||
error_lines_total = 0
|
||||
|
||||
for path in sorted(input_dir.glob("*DebugLog-server*.txt")):
|
||||
files_scanned += 1
|
||||
entries = parse_file(path)
|
||||
log_lines_total += sum(len(e["body"]) for e in entries)
|
||||
for hit_idx in iter_warn_or_error(entries):
|
||||
hit = entries[hit_idx]
|
||||
error_lines_total += len(hit["body"])
|
||||
sig = signature_for(hit["level"], hit["body"])
|
||||
occurrence = {
|
||||
"file": path.name,
|
||||
"line": hit["line_start"],
|
||||
"timestamp": hit["timestamp"],
|
||||
}
|
||||
if sig not in signatures:
|
||||
signatures[sig] = {
|
||||
"signature": sig,
|
||||
"level": hit["level"],
|
||||
"first_seen": occurrence,
|
||||
"occurrence_count": 1,
|
||||
"files": [path.name],
|
||||
"excerpt": build_excerpt(entries, hit_idx, context),
|
||||
}
|
||||
else:
|
||||
rec = signatures[sig]
|
||||
rec["occurrence_count"] += 1
|
||||
if path.name not in rec["files"]:
|
||||
rec["files"].append(path.name)
|
||||
return signatures, {
|
||||
"files_scanned": files_scanned,
|
||||
"log_lines_total": log_lines_total,
|
||||
"error_lines_total": error_lines_total,
|
||||
}
|
||||
|
||||
|
||||
def call_qwen(client: OpenAI, model: str, sig_rec: dict[str, Any]) -> dict[str, Any]:
|
||||
user_prompt = (
|
||||
f'Level: {sig_rec["level"]}\n'
|
||||
f'First seen: {sig_rec["first_seen"]["file"]} '
|
||||
f'line {sig_rec["first_seen"]["line"]}\n'
|
||||
f'Occurrences across this run: {sig_rec["occurrence_count"]} '
|
||||
f'(across {len(sig_rec["files"])} file(s))\n\n'
|
||||
f'Log excerpt:\n{sig_rec["excerpt"]}'
|
||||
)
|
||||
return structured_call(
|
||||
TOOL_SCHEMA,
|
||||
[
|
||||
{"role": "system", "content": SYSTEM_PROMPT},
|
||||
{"role": "user", "content": user_prompt},
|
||||
],
|
||||
sampling=SAMPLING_STRUCTURED,
|
||||
client=client,
|
||||
model=model,
|
||||
)
|
||||
|
||||
|
||||
def atomic_write(path: Path, payload: Any) -> None:
    """Write payload as JSON to a sibling .tmp file, then rename it over path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2, ensure_ascii=False)
    tmp.replace(path)
|
||||
|
||||
|
||||
def load_existing(path: Path) -> dict[str, dict[str, Any]]:
    """Reload signatures previously written to --out.

    Only signatures with an `llm` field count as completed. Bare records
    (left behind when --limit truncated a prior run) get re-attempted on
    resume so progressive analysis converges. Records whose `llm` field
    holds only an `error` also count as completed and are not retried.
    """
    if not path.exists():
        return {}
    try:
        with path.open("r", encoding="utf-8") as f:
            data = json.load(f)
        return {
            s["signature"]: s
            for s in data.get("signatures", [])
            if "signature" in s and "llm" in s
        }
    except Exception:
        return {}
|
||||
|
||||
|
||||
def summarise(analyzed: list[dict[str, Any]]) -> dict[str, Any]:
|
||||
sev_counts = {"problem": 0, "warning": 0, "info": 0}
|
||||
by_cat: dict[str, int] = {}
|
||||
for s in analyzed:
|
||||
llm = s.get("llm") or {}
|
||||
sev = llm.get("severity")
|
||||
cat = llm.get("category")
|
||||
if sev in sev_counts:
|
||||
sev_counts[sev] += 1
|
||||
if cat:
|
||||
by_cat[cat] = by_cat.get(cat, 0) + 1
|
||||
return {
|
||||
"problems": sev_counts["problem"],
|
||||
"warnings": sev_counts["warning"],
|
||||
"info": sev_counts["info"],
|
||||
"by_category": by_cat,
|
||||
}
|
||||
|
||||
|
||||
def main() -> None:
|
||||
ap = argparse.ArgumentParser(description=__doc__)
|
||||
ap.add_argument("--input", type=Path, default=DEFAULT_INPUT)
|
||||
ap.add_argument("--out", type=Path, default=DEFAULT_OUT)
|
||||
ap.add_argument("--context", type=int, default=20)
|
||||
ap.add_argument("--limit", type=int, default=None,
|
||||
help="Stop after N new signatures analysed.")
|
||||
ap.add_argument("--resume", action="store_true",
|
||||
help="Reuse existing analysis from --out if present.")
|
||||
ap.add_argument("--checkpoint-every", type=int, default=25)
|
||||
args = ap.parse_args()
|
||||
|
||||
if not args.input.is_dir():
|
||||
print(f"error: {args.input} not a directory", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
started = dt.datetime.now(dt.timezone.utc).isoformat(timespec="seconds")
|
||||
print(f"[init] scanning {args.input}")
|
||||
signatures, file_stats = collect_signatures(args.input, args.context)
|
||||
print(
|
||||
f"[init] {file_stats['files_scanned']} file(s), "
|
||||
f"{file_stats['log_lines_total']} log lines, "
|
||||
f"{file_stats['error_lines_total']} error lines, "
|
||||
f"{len(signatures)} unique signature(s)"
|
||||
)
|
||||
|
||||
existing = load_existing(args.out) if args.resume else {}
|
||||
if existing:
|
||||
print(f"[init] {len(existing)} signature(s) already analysed; resuming")
|
||||
|
||||
client = get_client()
|
||||
model = get_model()
|
||||
print(f"[init] qwen model={model}")
|
||||
|
||||
n_new = 0
|
||||
t0 = time.time()
|
||||
analyzed: list[dict[str, Any]] = []
|
||||
|
||||
# Process in occurrence_count desc so --limit N picks the most-impactful
|
||||
# signatures rather than whichever happened to scan first.
|
||||
for sig, rec in sorted(
|
||||
signatures.items(), key=lambda kv: -kv[1]["occurrence_count"]
|
||||
):
|
||||
if sig in existing:
|
||||
analyzed.append(existing[sig])
|
||||
continue
|
||||
if args.limit is not None and n_new >= args.limit:
|
||||
analyzed.append(rec) # keep raw record so it's not lost on resume
|
||||
continue
|
||||
try:
|
||||
llm = call_qwen(client, model, rec)
|
||||
rec["llm"] = llm
|
||||
except Exception as e:
|
||||
rec["llm"] = {"error": str(e)[:500]}
|
||||
print(f" [{n_new + 1}] LLM error on {sig}: {e}", file=sys.stderr)
|
||||
analyzed.append(rec)
|
||||
n_new += 1
|
||||
if n_new % args.checkpoint_every == 0:
|
||||
payload = {
|
||||
"meta": {
|
||||
"input_dir": str(args.input),
|
||||
**file_stats,
|
||||
"unique_signatures": len(signatures),
|
||||
"redacted": True,
|
||||
"qwen_model": model,
|
||||
"started": started,
|
||||
"checkpoint_at": dt.datetime.now(dt.timezone.utc).isoformat(timespec="seconds"),
|
||||
},
|
||||
"signatures": analyzed,
|
||||
"summary": summarise(analyzed),
|
||||
}
|
||||
atomic_write(args.out, payload)
|
||||
rate = n_new / max(time.time() - t0, 1e-3)
|
||||
print(f" [{n_new}] checkpoint @ {rate:.2f} sig/s")
|
||||
|
||||
finished = dt.datetime.now(dt.timezone.utc).isoformat(timespec="seconds")
|
||||
payload = {
|
||||
"meta": {
|
||||
"input_dir": str(args.input),
|
||||
**file_stats,
|
||||
"unique_signatures": len(signatures),
|
||||
"redacted": True,
|
||||
"qwen_model": model,
|
||||
"started": started,
|
||||
"finished": finished,
|
||||
},
|
||||
"signatures": analyzed,
|
||||
"summary": summarise(analyzed),
|
||||
}
|
||||
atomic_write(args.out, payload)
|
||||
print(f"[done] {n_new} new, {len(analyzed)} total -> {args.out}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
777
tools/pz-analyzer/pz_parser.py
Normal file
@@ -0,0 +1,777 @@
|
||||
"""
|
||||
pz_parser.py — Deterministic Project Zomboid log parser.
|
||||
|
||||
Pure module (no I/O beyond reading the path it is handed). Walks a redacted
|
||||
DebugLog-server*.txt file, extracts errors/warnings, attributes each to a mod
|
||||
where evidence allows, classifies by kind, and computes deterministic
|
||||
signatures. Output records are designed to be `dataclasses.asdict()`-ready
|
||||
for direct JSON serialisation.
|
||||
|
||||
Pipeline phases (per design spec at
|
||||
docs/superpowers/specs/2026-05-04-pz-deterministic-classifier-design.md):
|
||||
|
||||
1. Severity-prefix recognition (ERROR|SEVERE|WARN)
|
||||
2. Bidirectional stack collection (pre-stack walk back, post-stack walk forward)
|
||||
3. Mod attribution (direct, inferred, unattributed)
|
||||
4. File:line extraction (five fallbacks)
|
||||
5. Cause-chain extraction (Caused by: chains + standalone exception lines)
|
||||
6. Java exception kind detection
|
||||
7. Engine-noise tagging
|
||||
8. Signature computation (pattern_id + signature)
|
||||
9. Aggregation (dedup on signature)
|
||||
|
||||
Style notes mirror sibling tool pz_error_analysis.py: type hints with built-in
|
||||
generics, `from __future__ import annotations`, regex precompilation as
|
||||
module-level constants, stdlib-only.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import hashlib
|
||||
import pathlib
|
||||
import re
|
||||
from dataclasses import dataclass
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Tunable constants
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: Lookback window (in raw file lines) for inferred mod attribution.
|
||||
INFERRED_LOOKBACK_LINES: int = 40
|
||||
#: Maximum frames retained per record after pre+post stack merge.
|
||||
MAX_STACK_FRAMES: int = 8
|
||||
#: Maximum lines walked in each direction during bidirectional stack collection.
|
||||
STACK_WALK_LINES: int = 25
|
||||
#: Maximum cause-chain depth retained.
|
||||
MAX_CAUSE_CHAIN_LEVELS: int = 6
|
||||
#: Truncation length for the normalised first line that feeds pattern_id.
|
||||
PATTERN_ID_FIRST_LINE_MAX: int = 200
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Line-shape regexes (parsing)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: PZ DebugLog entry header.
|
||||
#: Example: ``[16-04-26 00:01:19.080] ERROR: General f:0, t:1, st:1,2,3,4> body``
|
||||
ENTRY_RE = re.compile(
|
||||
r"^\[(?P<ts>\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
|
||||
r"(?P<level>[A-Z]+)\s*:\s*(?P<rest>.*)$"
|
||||
)
|
||||
|
||||
#: Strips the "General f:N, t:N, st:N,N,N,N>" prefix from a body line.
|
||||
SESSION_META_RE = re.compile(
|
||||
r"^[A-Za-z][A-Za-z0-9]*\s+f:\d+,?\s*(?:t:\d+,?\s*)?st:[\d,]+>\s*"
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Severity-prefix recognition (phase 1)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: Severity tokens that flag a body line as an error/warning event when they
|
||||
#: appear at the start of body text. Per spec: broader than the existing
|
||||
#: pz_error_analysis.py regex (adds SEVERE for Java util-logging).
|
||||
SEVERITY_BODY_RE = re.compile(r"^\s*(ERROR|SEVERE|WARN)\s*[:\s]")
|
||||
#: Bracketed-level tokens that map to severity events.
|
||||
SEVERITY_LEVELS: tuple[str, ...] = ("ERROR", "WARN", "SEVERE", "FATAL")
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Stack-frame recognition (phase 2)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: Markers that identify a line as stack-shaped. Used to gate pre/post stack
|
||||
#: collection so we don't latch onto non-stack continuation text.
|
||||
STACK_HINT_RE = re.compile(
|
||||
r"(?:\bat\s+\S+|\[string\s+\"|function:\s|file:\s|\.lua\b)",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Mod attribution (phase 3)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: Direct attribution marker: ``Lua((MOD:<name>))``.
|
||||
LUA_MOD_MARKER_RE = re.compile(r"Lua\(\(MOD:([^)]+)\)\)")
|
||||
#: Direct attribution: ``require("X") failed`` shape.
|
||||
REQUIRE_FAILED_RE = re.compile(
|
||||
r"""require\s*\(\s*["']([^"']+)["']\s*\)\s+failed""",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
#: Direct attribution: explicit ``needed by <mod>`` hint.
|
||||
NEEDED_BY_RE = re.compile(r"needed\s+by\s+([A-Za-z0-9_'\- ]+?)(?:[,.]|$)", re.IGNORECASE)
|
||||
|
||||
#: Patterns that flag a body as "Lua-shaped" — gating filter for inferred
|
||||
#: attribution. Mirrors the spec's enumeration.
|
||||
LUA_SHAPED_PATTERNS: tuple[re.Pattern[str], ...] = (
|
||||
re.compile(r"luamanager\.getfunctionobject", re.IGNORECASE),
|
||||
re.compile(r"no\s+such\s+function", re.IGNORECASE),
|
||||
re.compile(r"exception\s+thrown", re.IGNORECASE),
|
||||
re.compile(r"runtimeexception", re.IGNORECASE),
|
||||
re.compile(r"illegalstateexception", re.IGNORECASE),
|
||||
re.compile(r"\blua\b", re.IGNORECASE),
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# File:line extraction (phase 4) — five fallbacks tried in order
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: 1. ``at <path>.lua:<n>`` — typical Lua stack frame.
|
||||
FILE_LINE_AT_RE = re.compile(r"\bat\s+([^\s:]+\.lua):(\d+)")
|
||||
#: 2. ``function: ... file: <path>.lua line #<n>`` (or `: <n>`).
|
||||
FILE_LINE_FUNCTION_RE = re.compile(
|
||||
r"function:\s*[^,]*?file:\s*([^\s,]+\.lua)\s+line\s*(?:#|:)\s*(\d+)",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
#: 3. ``[string "<path>.lua"]:<n>`` — Lua VM source string.
|
||||
FILE_LINE_STRING_RE = re.compile(r"""\[string\s+["']([^"']+\.lua)["']\]:(\d+)""")
|
||||
#: 4. quoted path ending in a known extension; line # optional.
|
||||
FILE_LINE_QUOTED_RE = re.compile(
|
||||
r"""["']([^"']+\.(?:lua|txt|xml|json|ini|cfg|bin))["'](?::(\d+))?"""
|
||||
)
|
||||
#: 5. unquoted path segment beginning with a recognised root.
|
||||
FILE_LINE_UNQUOTED_RE = re.compile(
|
||||
r"\b((?:media|maps|lua|scripts)/[\w./\-]+\.(?:lua|txt|xml|json|ini|cfg|bin))(?::(\d+))?"
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Cause-chain extraction (phase 5)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: ``Caused by: <ExceptionClass>: <msg>`` (msg optional).
|
||||
CAUSED_BY_RE = re.compile(
|
||||
r"Caused\s+by:\s+((?:\w+\.)+\w+(?:Exception|Error))(?::\s*(.+?))?\s*$",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
#: Standalone Java exception line: ``com.foo.BarException: msg``.
|
||||
EXCEPTION_LINE_RE = re.compile(
|
||||
r"((?:\w+\.)+\w+(?:Exception|Error))(?::\s*(.+?))?(?=\s+at\s|\s*$)"
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Engine-noise tagging (phase 7)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
ENGINE_NOISE_PATTERNS: tuple[re.Pattern[str], ...] = (
|
||||
re.compile(r"kahluathread\.flusherrormessage", re.IGNORECASE),
|
||||
re.compile(r"dumping\s+lua\s+stack\s+trace", re.IGNORECASE),
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Signature normalisation (phase 8)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
DOUBLE_QUOTED_RE = re.compile(r'"[^"]*"')
|
||||
SINGLE_QUOTED_RE = re.compile(r"'[^']*'")
|
||||
NUMERIC_RUN_RE = re.compile(r"\d{2,}")
|
||||
WS_RUN_RE = re.compile(r"\s+")
|
||||
#: Strips a leading ``ERROR:`` / ``SEVERE:`` / ``WARN:`` / ``FATAL:`` token
|
||||
#: from a body line so a body that happens to begin with the severity word
|
||||
#: hashes to the same pattern_id as the bracketed-only variant. Matches the
|
||||
#: token plus any colon and trailing whitespace; case-insensitive.
|
||||
SEVERITY_PREFIX_STRIP_RE = re.compile(
|
||||
r"^\s*(?:ERROR|SEVERE|WARN|FATAL)\s*[:\s]\s*", re.IGNORECASE
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Dataclasses — match the JSON keys the spec mandates so consumers can
|
||||
# `dataclasses.asdict(record)` straight to JSON.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@dataclass
|
||||
class Entry:
|
||||
"""One parsed log entry. Continuation lines (TAB-indented or otherwise
|
||||
non-header lines) are folded into ``body``. Phase-2 stack collection
|
||||
walks neighbouring entries (not raw lines), so no extra context is
|
||||
stored here.
|
||||
"""
|
||||
|
||||
timestamp: str
|
||||
level: str
|
||||
body: list[str]
|
||||
line_start: int
|
||||
line_end: int
|
||||
|
||||
|
||||
@dataclass
|
||||
class FirstSeen:
|
||||
"""Provenance for the first occurrence of a deduped record."""
|
||||
|
||||
file: str
|
||||
line: int
|
||||
timestamp: str
|
||||
|
||||
|
||||
@dataclass
|
||||
class Record:
|
||||
"""One classified, deduplicated error/warning record. Field names mirror
|
||||
the JSON output schema in the spec verbatim — this object is intended to
|
||||
be `dataclasses.asdict()`-ed straight into the output document.
|
||||
"""
|
||||
|
||||
signature: str
|
||||
pattern_id: str
|
||||
level: str
|
||||
kind: str
|
||||
mod_id: str
|
||||
mod_name: str
|
||||
attribution: str
|
||||
confidence: str
|
||||
attribution_reason: str
|
||||
file: str
|
||||
line: int
|
||||
cause_chain: str
|
||||
stack: list[str]
|
||||
first_seen: FirstSeen
|
||||
occurrence_count: int
|
||||
files: list[str]
|
||||
excerpt: str
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Phase 0: file parse
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def parse_file(path: pathlib.Path) -> list[Entry]:
|
||||
"""Parse a DebugLog-server file into a list of multi-line entries.
|
||||
|
||||
Continuation lines (those not matching ENTRY_RE) append to the previous
|
||||
entry's body, mirroring codex's PatternParser behaviour for multi-line
|
||||
Java stack traces under an ERROR header.
|
||||
"""
|
||||
entries: list[Entry] = []
|
||||
current: Entry | None = None
|
||||
with path.open("r", encoding="utf-8", errors="replace") as f:
|
||||
for lineno, raw in enumerate(f, start=1):
|
||||
line = raw.rstrip("\n")
|
||||
m = ENTRY_RE.match(line)
|
||||
if m:
|
||||
if current is not None:
|
||||
entries.append(current)
|
||||
current = Entry(
|
||||
timestamp=m.group("ts"),
|
||||
level=m.group("level"),
|
||||
body=[m.group("rest")],
|
||||
line_start=lineno,
|
||||
line_end=lineno,
|
||||
)
|
||||
elif current is not None:
|
||||
current.body.append(line)
|
||||
current.line_end = lineno
|
||||
# else: orphan line at start of file (no preceding entry); ignore.
|
||||
if current is not None:
|
||||
entries.append(current)
|
||||
return entries
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Phase 1: severity-prefix recognition
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def is_severity_entry(entry: Entry) -> bool:
    """True if this entry is an ERROR/WARN/SEVERE/FATAL, either by the
    bracketed level or a leading SEVERE/ERROR/WARN token in the body (after
    stripping the session-meta prefix)."""
    if entry.level in SEVERITY_LEVELS:
        return True
    if entry.body and SEVERITY_BODY_RE.match(_strip_session_meta(entry.body[0])):
        return True
    return False


def effective_level(entry: Entry) -> str:
    """Return the effective severity for an entry. Body-prefix takes
    precedence: this covers the SEVERE-in-body case where bracketed level is
    LOG *and* the case where bracketed level is ERROR but body says SEVERE.
    """
    if entry.body:
        m = SEVERITY_BODY_RE.match(_strip_session_meta(entry.body[0]))
        if m:
            return m.group(1).upper()
    return entry.level
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Phase 2: bidirectional stack collection
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _is_stack_shaped(line: str) -> bool:
    return bool(STACK_HINT_RE.search(line))


def _strip_session_meta(body_line: str) -> str:
    """Strip the ``General f:N, t:N, st:...> `` session-metadata prefix from
    a body's first line so pattern matching can run against the meaningful tail.
    """
    return SESSION_META_RE.sub("", body_line)
|
||||
|
||||
|
||||
def _collect_pre_stack(entries: list[Entry], hit_idx: int) -> list[str]:
    """Walk back through prior entries; collect stack-shaped lines from each
    entry's body. Stop at the previous severity-flagged entry. Cap collection
    at MAX_STACK_FRAMES and at STACK_WALK_LINES of body lines examined.
    Per spec, only return the block if at least one line looks stack-shaped.
    """
    collected: list[str] = []
    lines_examined = 0
    for j in range(hit_idx - 1, -1, -1):
        prior = entries[j]
        # Stop at another severity line (the previous error's boundary).
        if is_severity_entry(prior):
            break
        # Walk this entry's body in reverse; for body[0] the session-meta
        # prefix is part of the line, so strip it before the stack-shape check.
        for k in range(len(prior.body) - 1, -1, -1):
            line = prior.body[k]
            stripped = _strip_session_meta(line) if k == 0 else line
            lines_examined += 1
            if _is_stack_shaped(stripped):
                collected.append(stripped.strip())
                if len(collected) >= MAX_STACK_FRAMES:
                    break
            if lines_examined >= STACK_WALK_LINES:
                break
        if len(collected) >= MAX_STACK_FRAMES or lines_examined >= STACK_WALK_LINES:
            break
    if not collected:
        return []
    collected.reverse()  # restore source order
    return collected


def _collect_post_stack(entries: list[Entry], hit_idx: int) -> list[str]:
    """Look at the entry's own body continuation lines first (stack frames
    attached to the ERROR header become continuation lines after parsing),
    then walk forward through subsequent entries. Stop at the next severity
    entry. Cap at MAX_STACK_FRAMES and at STACK_WALK_LINES of body lines."""
    entry = entries[hit_idx]
    collected: list[str] = []
    lines_examined = 0
    # Body continuations (skip body[0] which is the headline itself).
    for line in entry.body[1:]:
        lines_examined += 1
        if _is_stack_shaped(line):
            collected.append(line.strip())
            if len(collected) >= MAX_STACK_FRAMES:
                return collected
        if lines_examined >= STACK_WALK_LINES:
            return collected
    for j in range(hit_idx + 1, len(entries)):
        next_entry = entries[j]
        if is_severity_entry(next_entry):
            break
        for k, line in enumerate(next_entry.body):
            stripped = _strip_session_meta(line) if k == 0 else line
            lines_examined += 1
            if _is_stack_shaped(stripped):
                collected.append(stripped.strip())
                if len(collected) >= MAX_STACK_FRAMES:
                    return collected
            if lines_examined >= STACK_WALK_LINES:
                return collected
    return collected


def collect_stack(entries: list[Entry], hit_idx: int) -> list[str]:
    """Merge pre + post stack, dedup preserving order, cap at MAX_STACK_FRAMES."""
    pre = _collect_pre_stack(entries, hit_idx)
    post = _collect_post_stack(entries, hit_idx)
    seen: set[str] = set()
    merged: list[str] = []
    for frame in pre + post:
        if frame in seen:
            continue
        seen.add(frame)
        merged.append(frame)
        if len(merged) >= MAX_STACK_FRAMES:
            break
    return merged


# ---------------------------------------------------------------------------
# Phase 3: mod attribution
# ---------------------------------------------------------------------------


def _norm_mod_key(raw_name: str) -> str:
    """Lowercase, strip spaces / apostrophes / hyphens. Used as mod_id."""
    s = raw_name.lower()
    for ch in (" ", "'", "-"):
        s = s.replace(ch, "")
    return s


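A minimal sketch of how this normalisation behaves, re-declaring the helper locally (as `norm_mod_key`, a name chosen for this example) so the snippet runs on its own. The mod-name spellings are illustrative; the first echoes the `Spongies Clothing` fixture. Only spaces, apostrophes, and hyphens are removed, so underscores and other punctuation survive.

```python
def norm_mod_key(raw_name: str) -> str:
    """Local copy of the normalisation: lowercase, drop ' ', "'", and '-'."""
    s = raw_name.lower()
    for ch in (" ", "'", "-"):
        s = s.replace(ch, "")
    return s


# Hypothetical spellings of one mod collapse to a single mod_id.
variants = ["Spongie's Clothing", "Spongies Clothing", "spongies-clothing"]
keys = {norm_mod_key(v) for v in variants}
print(keys)

# Underscores are not stripped, so this spelling stays distinct.
underscored = norm_mod_key("Spongies_Clothing")
print(underscored)
```

Because `mod_id` feeds the signature hash, any spelling difference that survives this step yields a separate record for what may be the same mod.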
def _entry_text(entry: Entry) -> str:
    """Whole-entry text (headline plus continuation lines) for marker
    scanning. Stack frames collected from neighbouring entries are not
    included."""
    return "\n".join(entry.body)


def attribute_entry(entry: Entry, prior_lookback_lines: list[str]) -> tuple[str, str, str, str, str]:
    """Determine ``(mod_id, mod_name, attribution, confidence, reason)``.

    ``prior_lookback_lines`` holds the body lines from prior entries that fall
    within INFERRED_LOOKBACK_LINES raw-file-line distance from this entry's
    start, in source order. The list is scanned in reverse for the nearest
    ``Lua((MOD:Y))`` marker when inferred attribution is being attempted.

    Direct-attribution priority: Lua marker -> needed-by -> require-failed.

    Rationale: ``needed by <mod>`` names the dependent mod (more semantically
    targeted) and is preferred over ``require("...") failed`` which only names
    the missing module path. ``Lua((MOD:...))`` is unambiguous and wins
    outright.
    """
    text = _entry_text(entry)
    # 1. Direct via Lua((MOD:X)): unambiguous; outranks every other signal.
    m = LUA_MOD_MARKER_RE.search(text)
    if m:
        raw = m.group(1).strip()
        return (
            _norm_mod_key(raw),
            raw,
            "direct",
            "high",
            "Lua((MOD:...)) marker on the entry itself",
        )
    # 2. Direct via "needed by <mod>"
    m = NEEDED_BY_RE.search(text)
    if m:
        raw = m.group(1).strip().rstrip(".,;")
        return (
            _norm_mod_key(raw),
            raw,
            "direct",
            "high",
            "needed by <mod> hint",
        )
    # 3. Direct via require("X") failed: attribute to the required module name.
    m = REQUIRE_FAILED_RE.search(text)
    if m:
        raw = m.group(1).strip()
        # Mod-name first segment (PZ paths often look like Mod/Foo/Bar).
        mod_name = raw.split("/")[0] if "/" in raw else raw
        return (
            _norm_mod_key(mod_name),
            mod_name,
            "direct",
            "high",
            'require("...") failed shape',
        )
    # 4. Inferred: Lua-shaped body + recent Lua((MOD:Y)) within lookback.
    if any(p.search(text) for p in LUA_SHAPED_PATTERNS):
        for line in reversed(prior_lookback_lines):
            mm = LUA_MOD_MARKER_RE.search(line)
            if mm:
                raw = mm.group(1).strip()
                return (
                    _norm_mod_key(raw),
                    raw,
                    "inferred",
                    "medium",
                    f"Lua-shaped body; nearest Lua((MOD:{raw})) within "
                    f"{INFERRED_LOOKBACK_LINES}-line lookback",
                )
    return (
        "__unattributed__",
        "",
        "unattributed",
        "low",
        "no marker; body not Lua-shaped or no recent Lua((MOD:...))",
    )


# ---------------------------------------------------------------------------
# Phase 4: file:line extraction (five fallbacks, in order)
# ---------------------------------------------------------------------------


def extract_file_line(text: str) -> tuple[str, int]:
    """Run the five fallbacks in order. Returns ``(file, line)`` with line=0
    when only a path was matched."""
    m = FILE_LINE_AT_RE.search(text)
    if m:
        return m.group(1), int(m.group(2))
    m = FILE_LINE_FUNCTION_RE.search(text)
    if m:
        return m.group(1), int(m.group(2))
    m = FILE_LINE_STRING_RE.search(text)
    if m:
        return m.group(1), int(m.group(2))
    m = FILE_LINE_QUOTED_RE.search(text)
    if m:
        return m.group(1), int(m.group(2)) if m.group(2) else 0
    m = FILE_LINE_UNQUOTED_RE.search(text)
    if m:
        return m.group(1), int(m.group(2)) if m.group(2) else 0
    return "", 0


# ---------------------------------------------------------------------------
# Phase 5: cause-chain extraction
# ---------------------------------------------------------------------------


def extract_cause_chain(text: str) -> str:
    """Return ``ExceptionA: msg -> ExceptionB: msg`` joined chain, deduped,
    capped at MAX_CAUSE_CHAIN_LEVELS levels.
    """
    tokens: list[str] = []
    seen: set[str] = set()
    for line in text.splitlines():
        cb = CAUSED_BY_RE.search(line)
        if cb:
            cls = cb.group(1)
            msg = cb.group(2) or ""
            tok = f"{cls}: {msg.strip()}".rstrip(": ").strip()
            if tok not in seen:
                seen.add(tok)
                tokens.append(tok)
            continue
        ex = EXCEPTION_LINE_RE.search(line)
        if ex:
            cls = ex.group(1)
            msg = ex.group(2) or ""
            tok = f"{cls}: {msg.strip()}".rstrip(": ").strip()
            if tok not in seen:
                seen.add(tok)
                tokens.append(tok)
        if len(tokens) >= MAX_CAUSE_CHAIN_LEVELS:
            break
    return " -> ".join(tokens[:MAX_CAUSE_CHAIN_LEVELS])


# ---------------------------------------------------------------------------
# Phase 6: Java exception kind detection
# ---------------------------------------------------------------------------


JAVA_EXCEPTION_RE = re.compile(r"(?:\w+\.)+\w+(?:Exception|Error)\b")


def detect_kind(entry: Entry, attribution: str, body_text: str) -> str:
    """Determine the ``kind`` field. Order: engine_noise > require_failed >
    java_exception > lua_runtime > runtime."""
    # Phase 7 short-circuit (engine noise outranks others per spec; engine
    # noise is PZ's own diagnostic chatter regardless of class).
    if any(p.search(body_text) for p in ENGINE_NOISE_PATTERNS):
        return "engine_noise"
    if REQUIRE_FAILED_RE.search(body_text):
        return "require_failed"
    has_java = bool(JAVA_EXCEPTION_RE.search(body_text))
    has_lua_marker = bool(LUA_MOD_MARKER_RE.search(body_text))
    if has_java and not has_lua_marker:
        return "java_exception"
    # Lua-attributed runtime / inferred
    if has_lua_marker or attribution in ("direct", "inferred"):
        return "lua_runtime"
    return "runtime"


# ---------------------------------------------------------------------------
# Phase 8: signature computation
# ---------------------------------------------------------------------------


def normalize_first_line(first: str) -> str:
    """Per spec: strip session metadata prefix, strip any leading severity
    word (so ``SEVERE: foo`` and ``foo`` produce the same pattern_id when both
    are SEVERE-level), flatten quoted strings to ``"<S>"`` / ``'<S>'``, flatten
    ≥2-digit numeric runs to ``<N>``, collapse whitespace, truncate to 200
    chars.
    """
    s = first.strip()
    s = SESSION_META_RE.sub("", s)
    # Strip any leading ERROR:/SEVERE:/WARN:/FATAL: that survived in the body.
    # The bracketed level already feeds pattern_id separately, so leaving
    # the body-prefix in place would fragment signatures across "body has
    # SEVERE: prefix" vs "body has no prefix but bracketed level is SEVERE."
    s = SEVERITY_PREFIX_STRIP_RE.sub("", s)
    s = DOUBLE_QUOTED_RE.sub('"<S>"', s)
    s = SINGLE_QUOTED_RE.sub("'<S>'", s)
    s = NUMERIC_RUN_RE.sub("<N>", s)
    s = WS_RUN_RE.sub(" ", s)
    return s[:PATTERN_ID_FIRST_LINE_MAX]


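The normalisation steps can be sketched end to end as follows. The regular expressions here are assumptions written for this example (the module's real `SESSION_META_RE`, `SEVERITY_PREFIX_STRIP_RE`, and friends are defined elsewhere and may differ); only the order of operations and the 200-character cap are taken from the function above. The two input lines are made up.

```python
import re

# Assumed stand-ins for the module-level patterns; the real ones may differ.
SESSION_META_RE = re.compile(r"^\w+\s+f:\d+(?:,\s*t:\d+)?,\s*st:[\d,]+>\s*")
SEVERITY_PREFIX_STRIP_RE = re.compile(r"^(?:ERROR|SEVERE|WARN|FATAL):\s*")
DOUBLE_QUOTED_RE = re.compile(r'"[^"]*"')
SINGLE_QUOTED_RE = re.compile(r"'[^']*'")
NUMERIC_RUN_RE = re.compile(r"\d{2,}")
WS_RUN_RE = re.compile(r"\s+")
PATTERN_ID_FIRST_LINE_MAX = 200


def normalize_sketch(first: str) -> str:
    """Same step order as normalize_first_line, using the assumed patterns."""
    s = first.strip()
    s = SESSION_META_RE.sub("", s)           # drop "General f:N, t:N, st:...> "
    s = SEVERITY_PREFIX_STRIP_RE.sub("", s)  # drop a leading "SEVERE: " etc.
    s = DOUBLE_QUOTED_RE.sub('"<S>"', s)     # flatten "..." literals
    s = SINGLE_QUOTED_RE.sub("'<S>'", s)     # flatten '...' literals
    s = NUMERIC_RUN_RE.sub("<N>", s)         # flatten runs of 2+ digits
    s = WS_RUN_RE.sub(" ", s)                # collapse whitespace
    return s[:PATTERN_ID_FIRST_LINE_MAX]


# Two hypothetical lines: one with the t: field and a SEVERE: body prefix,
# one in the shape without t: and without the prefix.
a = normalize_sketch(
    'General f:0, t:1776297660000, st:48,648,175,178> '
    'SEVERE: failed to load "foo.lua" at line 123'
)
b = normalize_sketch(
    'General f:0, st:48,648,175,178> failed to load "bar.lua"   at line 45'
)
print(a)
print(a == b)
```

Under these assumed patterns both lines reduce to the same string, which is what lets repeated errors with different file names or line numbers share one `pattern_id`.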
def compute_pattern_id(level: str, first_line: str) -> str:
    """First 16 hex chars of ``sha256`` over ``level`` and the normalized
    first line, joined by a newline, prefixed ``sha256:``.

    16 hex chars (64 bits) chosen for JSON readability vs collision-resistance
    trade-off; consumers treat as opaque.
    """
    norm = normalize_first_line(first_line)
    h = hashlib.sha256(f"{level}\n{norm}".encode("utf-8")).hexdigest()
    return f"sha256:{h[:16]}"


def compute_signature(pattern_id: str, mod_id: str) -> str:
    """First 16 hex chars of ``sha256`` over ``pattern_id`` and ``mod_id``,
    joined by a newline, prefixed ``sha256:``.

    16 hex chars (64 bits) chosen for JSON readability vs collision-resistance
    trade-off; consumers treat as opaque.
    """
    h = hashlib.sha256(f"{pattern_id}\n{mod_id}".encode("utf-8")).hexdigest()
    return f"sha256:{h[:16]}"


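The two-stage hashing can be illustrated with a small standalone sketch. `short_sha256` is a helper name invented for this example; it newline-joins its parts exactly as the two functions above do. The normalized first line and the mod ids are hypothetical inputs.

```python
import hashlib


def short_sha256(*parts: str) -> str:
    """Newline-join the parts, hash, keep the first 16 hex chars (64 bits)."""
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return f"sha256:{digest[:16]}"


# Stage 1: level + already-normalized first line -> pattern_id.
pattern_id = short_sha256("ERROR", 'failed to load "<S>" at line <N>')

# Stage 2: pattern_id + mod_id -> signature (the dedup key).
sig_alpha = short_sha256(pattern_id, "testmodalpha")
sig_unattributed = short_sha256(pattern_id, "__unattributed__")

print(pattern_id)
print(sig_alpha != sig_unattributed)
```

The same error shape therefore produces one `pattern_id` but a distinct `signature` per mod, so the aggregation step keeps separate records for each mod that triggers it.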
# ---------------------------------------------------------------------------
# Aggregation (phase 9) and the public classify_entries entry point
# ---------------------------------------------------------------------------


_CONFIDENCE_RANK: dict[str, int] = {"low": 0, "medium": 1, "high": 2}
_ATTRIBUTION_RANK: dict[str, int] = {
    "unattributed": 0,
    "inferred": 1,
    "direct": 2,
}


def _build_excerpt(entry: Entry, max_chars: int = 1000) -> str:
    """Best-effort one-block excerpt of the entry (header + continuations)."""
    lines: list[str] = []
    header = f'[{entry.timestamp}] {entry.level}: '
    if entry.body:
        lines.append(header + entry.body[0])
        for cont in entry.body[1:]:
            lines.append(cont)
    text = "\n".join(lines)
    if len(text) > max_chars:
        text = text[:max_chars] + "\n... [truncated]"
    return text


def _build_lookback_window(entries: list[Entry], hit_idx: int) -> list[str]:
    """Collect body lines from prior entries whose ``line_start`` falls within
    INFERRED_LOOKBACK_LINES raw-file-line distance from the current entry.

    Spec wording is "within the previous 40 lines", measured in raw file lines
    (mirrors pzmm's ``(i - last_mod_line) <= 40``, inclusive of 40). Counting
    raw lines means a multi-line entry (e.g., a 5-line Java stack trace) does
    not shrink the practical window the way a body-line budget would.

    Returned list is in source order (oldest first) so callers can call
    ``reversed()`` on it.
    """
    if hit_idx <= 0:
        return []
    threshold = entries[hit_idx].line_start - INFERRED_LOOKBACK_LINES
    in_window: list[Entry] = []
    for j in range(hit_idx - 1, -1, -1):
        prior = entries[j]
        if prior.line_start < threshold:
            break
        in_window.append(prior)
    # We accumulated newest-first; reverse so we emit in source order.
    in_window.reverse()
    collected: list[str] = []
    for prior in in_window:
        collected.extend(prior.body)
    return collected


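A worked version of the window arithmetic, as a sketch only. The `Entry` class below is a trimmed stand-in with just the fields needed here, `in_lookback` is a name invented for this example, and `INFERRED_LOOKBACK_LINES = 40` is taken from the docstring's "previous 40 lines". The list comprehension gives the same result as the backwards walk with `break` whenever `line_start` increases monotonically, which is assumed here. The layout mirrors `fixture_lookback_boundary.txt`: 45 single-line entries, with the mod marker on line 2 and the error on line 44.

```python
from dataclasses import dataclass, field

INFERRED_LOOKBACK_LINES = 40  # assumed from the docstring above


@dataclass
class Entry:
    """Trimmed-down stand-in for the real Entry; only the fields used here."""
    line_start: int
    body: list[str] = field(default_factory=list)


def in_lookback(entries: list[Entry], hit_idx: int) -> list[int]:
    """Return line_start of each prior entry inside the raw-line window."""
    threshold = entries[hit_idx].line_start - INFERRED_LOOKBACK_LINES
    return [e.line_start for e in entries[:hit_idx] if e.line_start >= threshold]


# 45 single-line entries on raw lines 1..45; the error is entries[43] (line 44).
entries = [Entry(line_start=n) for n in range(1, 46)]
window = in_lookback(entries, hit_idx=43)

# threshold = 44 - 40 = 4, so the marker on line 2 (distance 42) is excluded.
print(min(window), max(window), len(window))
print(2 in window)
```

An entry exactly 40 raw lines before the hit passes the `>=` comparison, which matches the "inclusive of 40" wording in the docstring.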
def classify_entries(entries: list[Entry], source_file: str = "") -> list[Record]:
    """Apply phases 1-9 to a parsed-file entry list. Returns one Record per
    unique (mod_id, error_shape) pair after dedup on signature.
    """
    by_signature: dict[str, Record] = {}
    for hit_idx, entry in enumerate(entries):
        if not is_severity_entry(entry):
            continue
        level = effective_level(entry)
        body_text = _entry_text(entry)
        # Phase 2: stack collection
        stack = collect_stack(entries, hit_idx)
        # Phase 3: attribution (with INFERRED_LOOKBACK_LINES lookback)
        prior_window = _build_lookback_window(entries, hit_idx)
        mod_id, mod_name, attribution, confidence, attribution_reason = attribute_entry(
            entry, prior_window
        )
        # Phase 4: file:line extraction (search body + stack frames)
        search_text = body_text + "\n" + "\n".join(stack)
        file_path, line_no = extract_file_line(search_text)
        # Phase 5: cause-chain extraction
        cause_chain = extract_cause_chain(search_text)
        # Phase 6 & 7: kind detection (engine_noise short-circuits)
        kind = detect_kind(entry, attribution, body_text)
        # Phase 8: signature computation
        pattern_id = compute_pattern_id(level, entry.body[0] if entry.body else "")
        signature = compute_signature(pattern_id, mod_id)
        # Phase 9: dedup & aggregate
        if signature not in by_signature:
            by_signature[signature] = Record(
                signature=signature,
                pattern_id=pattern_id,
                level=level,
                kind=kind,
                mod_id=mod_id,
                mod_name=mod_name,
                attribution=attribution,
                confidence=confidence,
                attribution_reason=attribution_reason,
                file=file_path,
                line=line_no,
                cause_chain=cause_chain,
                stack=list(stack),
                first_seen=FirstSeen(
                    file=source_file,
                    line=entry.line_start,
                    timestamp=entry.timestamp,
                ),
                occurrence_count=1,
                files=[source_file] if source_file else [],
                excerpt=_build_excerpt(entry),
            )
        else:
            rec = by_signature[signature]
            rec.occurrence_count += 1
            if source_file and source_file not in rec.files:
                rec.files.append(source_file)
            # Promote attribution / confidence if this hit is stronger.
            if _ATTRIBUTION_RANK[attribution] > _ATTRIBUTION_RANK[rec.attribution]:
                rec.attribution = attribution
                rec.attribution_reason = attribution_reason
                if mod_name:
                    rec.mod_name = mod_name
            if _CONFIDENCE_RANK[confidence] > _CONFIDENCE_RANK[rec.confidence]:
                rec.confidence = confidence
            # Merge stack frames (preserving order, capped).
            for frame in stack:
                if frame not in rec.stack and len(rec.stack) < MAX_STACK_FRAMES:
                    rec.stack.append(frame)
            # Extend cause chain if the new hit has additional segments.
            if cause_chain and cause_chain != rec.cause_chain:
                # Concatenate unseen tokens.
                old = rec.cause_chain.split(" -> ") if rec.cause_chain else []
                new = cause_chain.split(" -> ")
                merged = list(old)
                for tok in new:
                    if tok and tok not in merged:
                        merged.append(tok)
                rec.cause_chain = " -> ".join(merged[:MAX_CAUSE_CHAIN_LEVELS])
    return list(by_signature.values())


__all__ = [
    "Entry",
    "FirstSeen",
    "Record",
    "parse_file",
    "classify_entries",
    "is_severity_entry",
    "effective_level",
    "collect_stack",
    "attribute_entry",
    "extract_file_line",
    "extract_cause_chain",
    "detect_kind",
    "normalize_first_line",
    "compute_pattern_id",
    "compute_signature",
    "INFERRED_LOOKBACK_LINES",
    "MAX_STACK_FRAMES",
    "STACK_WALK_LINES",
    "MAX_CAUSE_CHAIN_LEVELS",
    "SEVERITY_LEVELS",
]
36
tools/pz-analyzer/pz_redact_all.sh
Executable file
@@ -0,0 +1,36 @@
#!/usr/bin/env bash
# One-shot PII redaction over the PZ DebugLog-server files extracted from
# /opt/ik-codex/Logs.zip. Produces /opt/ik-codex/.scratch/pz/Logs.redacted/
# (gitignored alongside the source). Single Docker invocation; the codex
# library's vendor/autoload.php is mounted read-write only because composer's
# image refuses world-readable mounts under -u UID:GID.
#
# Re-runnable: rewrites every output file. Add --refresh-cache semantics by
# rm -rf'ing the OUT directory first if you want.
set -euo pipefail

IN=/opt/ik-codex/.scratch/pz/Logs
OUT=/opt/ik-codex/.scratch/pz/Logs.redacted

if [ ! -d "$IN" ]; then
    echo "error: input directory $IN missing — extract Logs.zip first" >&2
    exit 1
fi

mkdir -p "$OUT"

docker run --rm \
    --entrypoint php \
    -v /opt/ik-codex:/app -w /app \
    -v "$IN":/in:ro -v "$OUT":/out \
    -u "$(id -u):$(id -g)" \
    composer:latest \
    -r '
require "vendor/autoload.php";
$r = new IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor();
$files = glob("/in/*DebugLog-server*.txt");
foreach ($files as $f) {
    file_put_contents("/out/" . basename($f), $r->redact(file_get_contents($f)));
}
fprintf(STDERR, "redacted %d file(s)\n", count($files));
'
0
tools/pz-analyzer/tests/__init__.py
Normal file
7
tools/pz-analyzer/tests/fixtures/fixture_cause_chain.txt
vendored
Normal file
@@ -0,0 +1,7 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:04:00.000] ERROR: General f:0, t:1776297840000, st:48,648,355,178> Lua((MOD:Test Mod Alpha)) wrapper failure
java.lang.RuntimeException: outer wrapper at zombie.Foo(Foo.java:10)
Caused by: java.lang.IllegalStateException: middle layer
Caused by: java.lang.NullPointerException: deepest cause
at zombie.Bar(Bar.java:99)
[16-04-26 00:04:01.000] LOG : General f:0, t:1776297841000, st:48,648,356,178> after.
8
tools/pz-analyzer/tests/fixtures/fixture_dedup.txt
vendored
Normal file
@@ -0,0 +1,8 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod Alpha)) crash 1
at media/lua/client/A.lua:11
[16-04-26 00:01:01.000] ERROR: General f:0, t:1776297661000, st:48,648,176,178> Lua((MOD:Test Mod Alpha)) crash 1
at media/lua/client/A.lua:11
[16-04-26 00:01:02.000] ERROR: General f:0, t:1776297662000, st:48,648,177,178> Lua((MOD:Test Mod Alpha)) crash 1
at media/lua/client/A.lua:11
[16-04-26 00:01:03.000] LOG : General f:0, t:1776297663000, st:48,648,178,178> ok.
0
tools/pz-analyzer/tests/fixtures/fixture_empty.txt
vendored
Normal file
4
tools/pz-analyzer/tests/fixtures/fixture_engine_noise.txt
vendored
Normal file
@@ -0,0 +1,4 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:03:00.000] ERROR: General f:0, t:1776297780000, st:48,648,295,178> KahluaThread.flusherrormessage> dumping lua stack trace
at media/lua/client/Foo.lua:1
[16-04-26 00:03:01.000] LOG : General f:0, t:1776297781000, st:48,648,296,178> after.
10
tools/pz-analyzer/tests/fixtures/fixture_file_line_fallbacks.txt
vendored
Normal file
@@ -0,0 +1,10 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod A)) format1
at media/lua/client/F1.lua:11
[16-04-26 00:01:01.000] ERROR: General f:0, t:1776297661000, st:48,648,176,178> Lua((MOD:Test Mod B)) format2
function: doStuff -- file: media/lua/client/F2.lua line # 22
[16-04-26 00:01:02.000] ERROR: General f:0, t:1776297662000, st:48,648,177,178> Lua((MOD:Test Mod C)) format3
[string "media/lua/client/F3.lua"]:33: bang
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> Lua((MOD:Test Mod D)) format4 about "media/lua/client/F4.lua" failure
[16-04-26 00:01:04.000] ERROR: General f:0, t:1776297664000, st:48,648,179,178> Lua((MOD:Test Mod E)) format5 path media/lua/client/F5.lua mention
[16-04-26 00:01:05.000] LOG : General f:0, t:1776297665000, st:48,648,180,178> ok.
7
tools/pz-analyzer/tests/fixtures/fixture_inferred.txt
vendored
Normal file
@@ -0,0 +1,7 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Spongies Clothing)) initialised.
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> ordinary log line.
[16-04-26 00:01:02.000] LOG : General f:0, t:1776297662000, st:48,648,177,178> another log line.
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> LuaManager.GetFunctionObject> no such function: doStuff
at media/lua/client/Spongie.lua:7
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.
8
tools/pz-analyzer/tests/fixtures/fixture_java_exception.txt
vendored
Normal file
@@ -0,0 +1,8 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:19.080] ERROR: General f:0, t:1776297679080, st:48,648,194,258> DebugFileWatcher.registerDir> Exception thrown
java.nio.file.NoSuchFileException: /placeholder/config/mods at UnixException.translateToIOException(null:-1).
Stack trace:
at java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
at java.base/sun.nio.fs.UnixException.asIOException(Unknown Source)
at java.base/sun.nio.fs.LinuxWatchService$Poller.implRegister(Unknown Source)
[16-04-26 00:01:19.090] LOG : General f:0, t:1776297679090, st:48,648,194,268> after.
45
tools/pz-analyzer/tests/fixtures/fixture_lookback_boundary.txt
vendored
Normal file
@@ -0,0 +1,45 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod Distant)) initialised.
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> filler 1.
[16-04-26 00:01:02.000] LOG : General f:0, t:1776297662000, st:48,648,177,178> filler 2.
[16-04-26 00:01:03.000] LOG : General f:0, t:1776297663000, st:48,648,178,178> filler 3.
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> filler 4.
[16-04-26 00:01:05.000] LOG : General f:0, t:1776297665000, st:48,648,180,178> filler 5.
[16-04-26 00:01:06.000] LOG : General f:0, t:1776297666000, st:48,648,181,178> filler 6.
[16-04-26 00:01:07.000] LOG : General f:0, t:1776297667000, st:48,648,182,178> filler 7.
[16-04-26 00:01:08.000] LOG : General f:0, t:1776297668000, st:48,648,183,178> filler 8.
[16-04-26 00:01:09.000] LOG : General f:0, t:1776297669000, st:48,648,184,178> filler 9.
[16-04-26 00:01:10.000] LOG : General f:0, t:1776297670000, st:48,648,185,178> filler 10.
[16-04-26 00:01:11.000] LOG : General f:0, t:1776297671000, st:48,648,186,178> filler 11.
[16-04-26 00:01:12.000] LOG : General f:0, t:1776297672000, st:48,648,187,178> filler 12.
[16-04-26 00:01:13.000] LOG : General f:0, t:1776297673000, st:48,648,188,178> filler 13.
[16-04-26 00:01:14.000] LOG : General f:0, t:1776297674000, st:48,648,189,178> filler 14.
[16-04-26 00:01:15.000] LOG : General f:0, t:1776297675000, st:48,648,190,178> filler 15.
[16-04-26 00:01:16.000] LOG : General f:0, t:1776297676000, st:48,648,191,178> filler 16.
[16-04-26 00:01:17.000] LOG : General f:0, t:1776297677000, st:48,648,192,178> filler 17.
[16-04-26 00:01:18.000] LOG : General f:0, t:1776297678000, st:48,648,193,178> filler 18.
[16-04-26 00:01:19.000] LOG : General f:0, t:1776297679000, st:48,648,194,178> filler 19.
[16-04-26 00:01:20.000] LOG : General f:0, t:1776297680000, st:48,648,195,178> filler 20.
[16-04-26 00:01:21.000] LOG : General f:0, t:1776297681000, st:48,648,196,178> filler 21.
[16-04-26 00:01:22.000] LOG : General f:0, t:1776297682000, st:48,648,197,178> filler 22.
[16-04-26 00:01:23.000] LOG : General f:0, t:1776297683000, st:48,648,198,178> filler 23.
[16-04-26 00:01:24.000] LOG : General f:0, t:1776297684000, st:48,648,199,178> filler 24.
[16-04-26 00:01:25.000] LOG : General f:0, t:1776297685000, st:48,648,200,178> filler 25.
[16-04-26 00:01:26.000] LOG : General f:0, t:1776297686000, st:48,648,201,178> filler 26.
[16-04-26 00:01:27.000] LOG : General f:0, t:1776297687000, st:48,648,202,178> filler 27.
[16-04-26 00:01:28.000] LOG : General f:0, t:1776297688000, st:48,648,203,178> filler 28.
[16-04-26 00:01:29.000] LOG : General f:0, t:1776297689000, st:48,648,204,178> filler 29.
[16-04-26 00:01:30.000] LOG : General f:0, t:1776297690000, st:48,648,205,178> filler 30.
[16-04-26 00:01:31.000] LOG : General f:0, t:1776297691000, st:48,648,206,178> filler 31.
[16-04-26 00:01:32.000] LOG : General f:0, t:1776297692000, st:48,648,207,178> filler 32.
[16-04-26 00:01:33.000] LOG : General f:0, t:1776297693000, st:48,648,208,178> filler 33.
[16-04-26 00:01:34.000] LOG : General f:0, t:1776297694000, st:48,648,209,178> filler 34.
[16-04-26 00:01:35.000] LOG : General f:0, t:1776297695000, st:48,648,210,178> filler 35.
[16-04-26 00:01:36.000] LOG : General f:0, t:1776297696000, st:48,648,211,178> filler 36.
[16-04-26 00:01:37.000] LOG : General f:0, t:1776297697000, st:48,648,212,178> filler 37.
[16-04-26 00:01:38.000] LOG : General f:0, t:1776297698000, st:48,648,213,178> filler 38.
[16-04-26 00:01:39.000] LOG : General f:0, t:1776297699000, st:48,648,214,178> filler 39.
[16-04-26 00:01:40.000] LOG : General f:0, t:1776297700000, st:48,648,215,178> filler 40.
[16-04-26 00:01:41.000] LOG : General f:0, t:1776297701000, st:48,648,216,178> filler 41.
[16-04-26 00:01:42.000] ERROR: General f:0, t:1776297702000, st:48,648,217,178> LuaManager.GetFunctionObject> no such function (way past lookback)
[16-04-26 00:01:43.000] LOG : General f:0, t:1776297703000, st:48,648,218,178> ok.
6
tools/pz-analyzer/tests/fixtures/fixture_lua_attributed.txt
vendored
Normal file
@@ -0,0 +1,6 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:19.131] LOG : Mod f:0, t:1776297679131, st:48,648,194,309> loading example_mod_alpha.
[16-04-26 00:05:00.000] ERROR: General f:0, t:1776297900000, st:48,648,415,178> Lua((MOD:Test Mod Alpha)) something broke
at media/lua/client/Foo.lua:42
function: doStuff -- file: media/lua/client/Foo.lua line # 42
[16-04-26 00:05:01.000] LOG : General f:0, t:1776297901000, st:48,648,416,178> after the error.
3
tools/pz-analyzer/tests/fixtures/fixture_no_errors.txt
vendored
Normal file
@@ -0,0 +1,3 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> ordinary line.
[16-04-26 00:02:00.000] LOG : General f:0, t:1776297720000, st:48,648,235,178> nothing wrong.
5
tools/pz-analyzer/tests/fixtures/fixture_non_lua_no_inferred.txt
vendored
Normal file
@@ -0,0 +1,5 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Spongies Clothing)) initialised.
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> ordinary log line.
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> Disk full while writing chunk data
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.
6
tools/pz-analyzer/tests/fixtures/fixture_post_stack.txt
vendored
Normal file
@@ -0,0 +1,6 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod Alpha)) crash now
at media/lua/client/X.lua:11
at media/lua/client/Y.lua:22
[string "media/lua/client/Z.lua"]:33: oops
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.
6
tools/pz-analyzer/tests/fixtures/fixture_pre_stack.txt
vendored
Normal file
@@ -0,0 +1,6 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> at media/lua/client/A.lua:11
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> at media/lua/client/B.lua:22
[16-04-26 00:01:02.000] LOG : General f:0, t:1776297662000, st:48,648,177,178> [string "media/lua/client/C.lua"]:33: oops
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> Lua((MOD:Test Mod Alpha)) crash
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.
3
tools/pz-analyzer/tests/fixtures/fixture_require_failed.txt
vendored
Normal file
@@ -0,0 +1,3 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> require("DependencyMod/Foo") failed: needed by Test Mod Alpha
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> ok.
5
tools/pz-analyzer/tests/fixtures/fixture_severity_variants.txt
vendored
Normal file
@@ -0,0 +1,5 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> ERROR: top-level error message
[16-04-26 00:01:01.000] WARN : General f:0, t:1776297661000, st:48,648,176,178> WARN: top-level warn message
[16-04-26 00:01:02.000] ERROR: General f:0, t:1776297662000, st:48,648,177,178> SEVERE: java-style severe message at zombie.Foo(Foo.java:5)
[16-04-26 00:01:03.000] LOG : General f:0, t:1776297663000, st:48,648,178,178> ok.
3
tools/pz-analyzer/tests/fixtures/fixture_unattributed.txt
vendored
Normal file
@@ -0,0 +1,3 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:02:00.000] WARN : General f:0, t:1776297720000, st:48,648,235,178> ZomboidFileSystem.loadModAndRequired> required mod "absent_mod" not found.
[16-04-26 00:02:01.000] LOG : General f:0, t:1776297721000, st:48,648,236,178> after.
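Every fixture above uses the same bracketed header shape: timestamp, level, category, a metadata prefix ending in `>`, then the message. The real `pz_parser` regex is not part of this chunk, so the following is only a minimal sketch of how such a header might be split. `HEADER_RE` and `parse_header` are hypothetical names, and the metadata prefix is captured loosely (anything up to the first `>`) as an assumption, so a header lacking the `t:` field would still match.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for illustration; not the regex used by pz_parser.
HEADER_RE = re.compile(
    r"^\[(?P<timestamp>\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
    r"(?P<level>[A-Z]+)\s*:\s*"      # "LOG :", "WARN :" or "ERROR:"
    r"(?P<category>\w+)\s+"          # e.g. "General", "Mod"
    r"(?P<prefix>[^>]*)>\s?"         # metadata up to the first '>'
    r"(?P<message>.*)$"
)


def parse_header(line: str) -> Optional[Dict[str, str]]:
    """Return the header fields, or None for a continuation line."""
    match = HEADER_RE.match(line)
    return match.groupdict() if match else None
```

Lines such as `[string "media/lua/client/Z.lua"]:33: oops` also start with `[`, which is why the sketch anchors on the full timestamp shape rather than on the bracket alone.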
225
tools/pz-analyzer/tests/test_attribution.py
Normal file
@@ -0,0 +1,225 @@
"""Tests for pz_parser phase 3: mod attribution."""
from __future__ import annotations

import pathlib
import sys
import unittest

sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[1]))

import pz_parser  # noqa: E402

FIXTURE_DIR = pathlib.Path(__file__).resolve().parent / "fixtures"


def fixture(name: str) -> pathlib.Path:
    return FIXTURE_DIR / name


class AttributionBucketTests(unittest.TestCase):
    """Three confidence buckets: direct (high), inferred (medium),
    unattributed (low)."""

    def test_direct_attribution_when_lua_marker_on_entry(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_lua_attributed.txt"))
        records = pz_parser.classify_entries(entries, source_file="la.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        self.assertEqual(rec.attribution, "direct")
        self.assertEqual(rec.confidence, "high")
        # mod_id is normalised: lowercase, no spaces / apostrophes / hyphens.
        self.assertEqual(rec.mod_id, "testmodalpha")
        self.assertEqual(rec.mod_name, "Test Mod Alpha")

    def test_inferred_attribution_within_lookback_window(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_inferred.txt"))
        records = pz_parser.classify_entries(entries, source_file="in.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        self.assertEqual(rec.attribution, "inferred")
        self.assertEqual(rec.confidence, "medium")
        self.assertEqual(rec.mod_id, "spongiesclothing")

    def test_unattributed_when_no_marker_and_not_lua_shaped(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_unattributed.txt"))
        records = pz_parser.classify_entries(entries, source_file="ua.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        self.assertEqual(rec.attribution, "unattributed")
        self.assertEqual(rec.confidence, "low")
        self.assertEqual(rec.mod_id, "__unattributed__")


class LookbackBoundaryTests(unittest.TestCase):
    """Phase 3: 40-line inferred-attribution window boundary."""

    def test_lua_marker_beyond_lookback_does_not_attribute(self) -> None:
        # Fixture places the Lua((MOD:...)) marker more than 40 lines
        # before the ERROR.
        entries = pz_parser.parse_file(fixture("fixture_lookback_boundary.txt"))
        records = pz_parser.classify_entries(entries, source_file="lb.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        # The marker is too far back, so the Lua-shaped ERROR stays
        # unattributed.
        self.assertEqual(rec.attribution, "unattributed")
        self.assertEqual(rec.mod_id, "__unattributed__")

    def test_non_lua_shaped_body_rejects_inferred_attribution(self) -> None:
        # Recent Lua((MOD:Spongies Clothing)) emitted, but the ERROR body
        # ("Disk full while writing chunk data") isn't Lua-shaped.
        entries = pz_parser.parse_file(fixture("fixture_non_lua_no_inferred.txt"))
        records = pz_parser.classify_entries(entries, source_file="nl.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        self.assertEqual(rec.attribution, "unattributed")


class NeededByTests(unittest.TestCase):
    """Phase 3: direct attribution via "needed by <mod>" hint."""

    def test_needed_by_extracts_dependent_mod(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_require_failed.txt"))
        records = pz_parser.classify_entries(entries, source_file="rf.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        # "needed by Test Mod Alpha" should set the mod to Test Mod Alpha
        # (preferred over the require("...") side which would mention
        # DependencyMod). Either way we want direct/high.
        self.assertEqual(rec.attribution, "direct")
        self.assertEqual(rec.confidence, "high")
        # The "needed by" branch is checked before the require() branch in
        # the priority order; mod_id should reflect Test Mod Alpha.
        self.assertEqual(rec.mod_id, "testmodalpha")


def _make_marker_line(idx: int) -> str:
    """Synthesise a single LOG-level entry containing a Lua((MOD:...)) marker."""
    # Vary timestamps so the bracketed prefix is unique-ish; not strictly
    # required, since they only feed Entry.timestamp, not parsing.
    return (
        f"[16-04-26 00:00:{idx:02d}.000] LOG : General f:0, "
        f"t:1776297642{idx:03d}, st:48,648,157,434> "
        "Lua((MOD:Test Mod Alpha)) initialised."
    )


def _make_filler_line(idx: int) -> str:
    """A plain LOG-level entry with no marker; one raw line."""
    return (
        f"[16-04-26 00:01:{idx % 60:02d}.000] LOG : General f:0, "
        f"t:177629760{idx:04d}, st:48,648,200,178> filler entry {idx}."
    )


def _make_error_line() -> str:
    """A Lua-shaped ERROR with no Lua((MOD:...)) marker on the entry itself,
    so attribution must come from the lookback window if it comes at all."""
    return (
        "[16-04-26 00:02:00.000] ERROR: General f:0, "
        "t:1776297900000, st:48,648,300,178> "
        "LuaManager.GetFunctionObject> no such function: doStuff"
    )


class RawLineLookbackTests(unittest.TestCase):
    """Phase 3: lookback semantics measure raw file lines, not body-line
    budgets. Multi-line entries inside the window must not shrink the
    practical reach."""

    def _write_fixture(self, name: str, lines: list[str]) -> pathlib.Path:
        path = FIXTURE_DIR / name
        path.write_text("\n".join(lines) + "\n")
        return path

    def test_marker_exactly_at_lookback_boundary_attributes(self) -> None:
        # Marker on line 1, ERROR on line 41 -> raw-line distance = 40
        # (inclusive of INFERRED_LOOKBACK_LINES=40 -> still attributed).
        lines = [_make_marker_line(0)]
        for i in range(1, 40):
            lines.append(_make_filler_line(i))
        lines.append(_make_error_line())  # line 41 in the fixture
        path = self._write_fixture("_rawline_at_boundary.txt", lines)
        try:
            entries = pz_parser.parse_file(path)
            self.assertEqual(entries[0].line_start, 1)
            self.assertEqual(entries[-1].line_start, 41)
            records = pz_parser.classify_entries(entries, source_file="b1.txt")
            self.assertEqual(len(records), 1)
            self.assertEqual(records[0].attribution, "inferred")
            self.assertEqual(records[0].mod_id, "testmodalpha")
        finally:
            path.unlink()

    def test_marker_one_line_past_boundary_does_not_attribute(self) -> None:
        # Marker on line 1, ERROR on line 42 -> raw-line distance = 41
        # (just outside INFERRED_LOOKBACK_LINES -> unattributed).
        lines = [_make_marker_line(0)]
        for i in range(1, 41):
            lines.append(_make_filler_line(i))
        lines.append(_make_error_line())  # line 42 in the fixture
        path = self._write_fixture("_rawline_past_boundary.txt", lines)
        try:
            entries = pz_parser.parse_file(path)
            self.assertEqual(entries[0].line_start, 1)
            self.assertEqual(entries[-1].line_start, 42)
            records = pz_parser.classify_entries(entries, source_file="b2.txt")
            self.assertEqual(len(records), 1)
            self.assertEqual(records[0].attribution, "unattributed")
            self.assertEqual(records[0].mod_id, "__unattributed__")
        finally:
            path.unlink()

    def test_multiline_entry_does_not_shrink_practical_lookback(self) -> None:
        """Multi-line entries inside the lookback window do not break
        attribution. (Old body-line-budget and new raw-line-distance semantics
        happen to be equivalent on contiguous PZ entries; this test locks the
        post-fix semantic against future regression to a budget that *would*
        differ, e.g. a body-line cap with a smaller value.)
        """
        # Lay out the file so a multi-line entry sits between marker and ERROR.
        # The marker on line 1 is within 40 raw lines of the ERROR even though
        # the file has a 6-line multi-line entry in between.
        lines = [_make_marker_line(0)]  # raw line 1: marker entry
        # Single-line fillers on raw lines 2..30 (29 entries).
        for i in range(1, 30):
            lines.append(_make_filler_line(i))
        # Multi-line entry: header on raw line 31, 5 continuations on lines
        # 32..36 (Java-stack-trace shape).
        lines.append(
            "[16-04-26 00:01:30.000] LOG : General f:0, "
            "t:1776297930000, st:48,648,200,178> stack trace dump"
        )
        for k in range(5):
            lines.append(f"\tat zombie.SomeClass.method{k}(SomeClass.java:{k + 1})")
        # Single-line fillers on raw lines 37..40 (4 entries).
        for i in range(30, 34):
            lines.append(_make_filler_line(i))
        # ERROR at raw line 41 -> N - 1 = 40 -> within window.
        lines.append(_make_error_line())
        path = self._write_fixture("_rawline_multiline.txt", lines)
        try:
            entries = pz_parser.parse_file(path)
            # Sanity-check the layout: first entry at line 1, multi-line entry
            # sits at line 31 with 6 body lines (header + 5 continuations),
            # ERROR at line 41.
            self.assertEqual(entries[0].line_start, 1)
            multi = next(
                e for e in entries
                if e.line_start == 31 and len(e.body) == 6
            )
            self.assertEqual(multi.line_end, 36)
            self.assertEqual(entries[-1].line_start, 41)
            records = pz_parser.classify_entries(entries, source_file="ml.txt")
            self.assertEqual(len(records), 1)
            # Raw-line-distance semantics: the marker on line 1 is 40 raw
            # lines from the ERROR on line 41, so attribution holds. (Old
            # body-line-budget would also pass here on contiguous entries;
            # this assertion locks the post-fix behavior against future
            # regression to a tighter cap.)
            self.assertEqual(records[0].attribution, "inferred")
            self.assertEqual(records[0].mod_id, "testmodalpha")
        finally:
            path.unlink()


if __name__ == "__main__":
    unittest.main()
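The boundary tests above pin down a raw-line-distance rule: a marker counts if the ERROR header is at most `INFERRED_LOOKBACK_LINES` raw lines after the marker's header. The sketch below is one way such a rule could be written, not the `pz_parser` implementation (which is not shown in this chunk). `SketchEntry`, `MARKER_RE`, `normalise_mod_id` and `infer_mod_id` are hypothetical names, and the additional "ERROR body must be Lua-shaped" check that the tests exercise is deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence

INFERRED_LOOKBACK_LINES = 40  # name and value taken from the test comments

# Hypothetical marker pattern; the real one is not shown in this diff.
MARKER_RE = re.compile(r"Lua\(\(MOD:(?P<name>[^)]+)\)\)")


@dataclass
class SketchEntry:
    line_start: int  # 1-based raw line number of the entry header
    text: str        # header text (continuation lines omitted here)


def normalise_mod_id(mod_name: str) -> str:
    """Lowercase and drop spaces, apostrophes and hyphens."""
    return re.sub(r"[\s'\-]", "", mod_name.lower())


def infer_mod_id(entries: Sequence[SketchEntry], error_index: int) -> Optional[str]:
    """Walk backwards from the ERROR, stopping once the raw-line gap exceeds the window."""
    error = entries[error_index]
    for prev in reversed(entries[:error_index]):
        if error.line_start - prev.line_start > INFERRED_LOOKBACK_LINES:
            break
        match = MARKER_RE.search(prev.text)
        if match:
            return normalise_mod_id(match.group("name"))
    return None
```

Because the distance is computed from `line_start` values rather than by counting entries or body lines, a multi-line entry between marker and ERROR consumes exactly as many raw lines as it occupies in the file, which is the property the third test locks in.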
199
tools/pz-analyzer/tests/test_parser.py
Normal file
@@ -0,0 +1,199 @@
"""Tests for pz_parser parsing pipeline (phases 1, 2, 4-7, 9)."""
from __future__ import annotations

import pathlib
import sys
import unittest

# Make the parser module importable when running via `python -m unittest
# discover -s tools/pz-analyzer/tests`.
sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[1]))

import pz_parser  # noqa: E402

FIXTURE_DIR = pathlib.Path(__file__).resolve().parent / "fixtures"


def fixture(name: str) -> pathlib.Path:
    return FIXTURE_DIR / name


class ParseFileTests(unittest.TestCase):
    """Phase 0: basic line-shape recognition and continuation folding."""

    def test_parse_file_groups_continuations_under_entry(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_java_exception.txt"))
        # 3 bracketed entries; the ERROR has 4 continuation lines.
        self.assertEqual(len(entries), 3)
        error_entry = entries[1]
        self.assertEqual(error_entry.level, "ERROR")
        self.assertGreater(len(error_entry.body), 1)
        # First continuation should be the java exception line.
        self.assertIn("NoSuchFileException", error_entry.body[1])

    def test_parse_file_handles_empty_file(self) -> None:
        self.assertEqual(pz_parser.parse_file(fixture("fixture_empty.txt")), [])

    def test_parse_file_handles_no_errors(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_no_errors.txt"))
        self.assertEqual(len(entries), 3)
        self.assertTrue(all(e.level == "LOG" for e in entries))


class SeverityRecognitionTests(unittest.TestCase):
    """Phase 1: ERROR / WARN / SEVERE recognition."""

    def test_classify_picks_up_error_warn_and_severe(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_severity_variants.txt"))
        records = pz_parser.classify_entries(entries, source_file="severity.txt")
        levels = sorted({r.level for r in records})
        # Spec accepts ERROR / WARN / SEVERE. The third entry has bracketed
        # ERROR but body starts with SEVERE: ; effective_level should be SEVERE.
        self.assertIn("ERROR", levels)
        self.assertIn("WARN", levels)
        self.assertIn("SEVERE", levels)

    def test_log_lines_are_ignored(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_no_errors.txt"))
        records = pz_parser.classify_entries(entries, source_file="x.txt")
        self.assertEqual(records, [])


class StackCollectionTests(unittest.TestCase):
    """Phase 2: bidirectional stack collection."""

    def test_pre_stack_walk_picks_up_preceding_lua_frames(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_pre_stack.txt"))
        # The ERROR entry is the 5th bracketed entry; its three immediate
        # predecessors are LOG-level entries whose bodies are stack-shaped lines.
        records = pz_parser.classify_entries(entries, source_file="pre.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        # Pre-stack walk should pick up at least the "at media/lua/.../A.lua:11" frame.
        self.assertTrue(any("A.lua:11" in f for f in rec.stack))

    def test_post_stack_collected_from_entry_body_continuations(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_post_stack.txt"))
        records = pz_parser.classify_entries(entries, source_file="post.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        self.assertTrue(any("X.lua:11" in f for f in rec.stack))
        self.assertTrue(any("Y.lua:22" in f for f in rec.stack))
        # Lua [string "..."]:N form preserves quoting in the captured frame.
        self.assertTrue(any("Z.lua" in f and ":33" in f for f in rec.stack))

    def test_stack_capped_at_eight_frames(self) -> None:
        # Synthesise an ERROR with many continuation frames.
        lines = ["[16-04-26 00:00:42.314] ERROR: General f:0, t:1, st:1,2,3,4> Lua((MOD:Test Mod Alpha)) crash"]
        for i in range(20):
            lines.append(f"\tat media/lua/client/F{i}.lua:{i + 1}")
        path = FIXTURE_DIR / "_runtime_stack_cap.txt"
        path.write_text("\n".join(lines) + "\n")
        try:
            entries = pz_parser.parse_file(path)
            records = pz_parser.classify_entries(entries, source_file="cap.txt")
            self.assertEqual(len(records), 1)
            self.assertLessEqual(len(records[0].stack), pz_parser.MAX_STACK_FRAMES)
            # And it should be exactly MAX_STACK_FRAMES given >MAX inputs.
            self.assertEqual(len(records[0].stack), pz_parser.MAX_STACK_FRAMES)
        finally:
            path.unlink()


class FileLineExtractionTests(unittest.TestCase):
    """Phase 4: five-fallback file:line extraction."""

    def test_each_fallback_form_extracts_path(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_file_line_fallbacks.txt"))
        records = pz_parser.classify_entries(entries, source_file="ff.txt")
        # 5 distinct ERRORs, distinct mods: should produce 5 records.
        files = sorted(r.file for r in records)
        self.assertEqual(
            files,
            sorted([
                "media/lua/client/F1.lua",
                "media/lua/client/F2.lua",
                "media/lua/client/F3.lua",
                "media/lua/client/F4.lua",
                "media/lua/client/F5.lua",
            ]),
        )

    def test_quoted_path_without_line_number_yields_zero(self) -> None:
        # Format 4 fixture line lacks a :NN suffix on the quoted path.
        file_path, line_no = pz_parser.extract_file_line(
            'failure about "media/lua/client/F4.lua" tail'
        )
        self.assertEqual(file_path, "media/lua/client/F4.lua")
        self.assertEqual(line_no, 0)


class CauseChainTests(unittest.TestCase):
    """Phase 5: Caused-by chain unwinding."""

    def test_caused_by_chain_renders_with_arrow_separator(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_cause_chain.txt"))
        records = pz_parser.classify_entries(entries, source_file="cc.txt")
        self.assertEqual(len(records), 1)
        chain = records[0].cause_chain
        self.assertIn("RuntimeException", chain)
        self.assertIn("IllegalStateException", chain)
        self.assertIn("NullPointerException", chain)
        # Order preserved (outer -> inner).
        idx_runtime = chain.index("RuntimeException")
        idx_illegal = chain.index("IllegalStateException")
        idx_null = chain.index("NullPointerException")
        self.assertLess(idx_runtime, idx_illegal)
        self.assertLess(idx_illegal, idx_null)

    def test_no_cause_chain_when_no_exceptions(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_unattributed.txt"))
        records = pz_parser.classify_entries(entries, source_file="u.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].cause_chain, "")


class KindDetectionTests(unittest.TestCase):
    """Phases 6 & 7: kind classification."""

    def test_java_exception_kind_when_no_lua_marker(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_java_exception.txt"))
        records = pz_parser.classify_entries(entries, source_file="je.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "java_exception")
        # Java engine errors should resolve to __unattributed__.
        self.assertEqual(records[0].mod_id, "__unattributed__")

    def test_engine_noise_kind_for_kahluathread(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_engine_noise.txt"))
        records = pz_parser.classify_entries(entries, source_file="en.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "engine_noise")

    def test_lua_runtime_kind_for_attributed_lua_error(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_lua_attributed.txt"))
        records = pz_parser.classify_entries(entries, source_file="la.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "lua_runtime")

    def test_require_failed_kind(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_require_failed.txt"))
        records = pz_parser.classify_entries(entries, source_file="rf.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "require_failed")


class AggregationTests(unittest.TestCase):
    """Phase 9: dedup, occurrence_count, files-set growth."""

    def test_three_identical_errors_dedup_to_one_record(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_dedup.txt"))
        records = pz_parser.classify_entries(entries, source_file="dd.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].occurrence_count, 3)
        # files list shouldn't duplicate "dd.txt".
        self.assertEqual(records[0].files, ["dd.txt"])


if __name__ == "__main__":
    unittest.main()
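`ParseFileTests` and the stack tests rely on continuation folding: any line that does not open a new bracketed entry is attached to the entry before it. A minimal sketch of that grouping follows, under the assumption that an entry starts only on a full timestamp header. `FoldedEntry`, `ENTRY_START_RE` and `fold_lines` are hypothetical names. The attribute names `line_start`, `line_end` and `body` mirror the ones the tests read, but the real `Entry` type is not shown in this chunk.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

# Hypothetical entry-start pattern: a bracketed dd-mm-yy hh:mm:ss.mmm stamp.
ENTRY_START_RE = re.compile(r"^\[\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}\]")


@dataclass
class FoldedEntry:
    line_start: int                      # raw line of the header (1-based)
    line_end: int                        # raw line of the last continuation
    body: List[str] = field(default_factory=list)  # header + continuations


def fold_lines(lines: Iterable[str]) -> List[FoldedEntry]:
    """Group raw lines into entries, folding continuations under their header."""
    entries: List[FoldedEntry] = []
    for lineno, line in enumerate(lines, start=1):
        if ENTRY_START_RE.match(line):
            entries.append(FoldedEntry(lineno, lineno, [line]))
        elif entries:
            entries[-1].body.append(line)
            entries[-1].line_end = lineno
        # Lines before the first header are dropped in this sketch.
    return entries
```

With this shape, `body[0]` is always the header and `body[1]` the first continuation, which matches how `test_parse_file_groups_continuations_under_entry` indexes into the ERROR entry.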
91
tools/pz-analyzer/tests/test_signatures.py
Normal file
@@ -0,0 +1,91 @@
"""Tests for pz_parser phase 8: signature computation."""
from __future__ import annotations

import pathlib
import sys
import unittest

sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[1]))

import pz_parser  # noqa: E402


class PatternIdStabilityTests(unittest.TestCase):
    """pattern_id should be invariant under formatting variations."""

    def test_pattern_id_collapses_numeric_runs(self) -> None:
        a = pz_parser.compute_pattern_id(
            "ERROR",
            "General f:0, t:1776297642, st:48,648,157,434> failed at offset 12345",
        )
        b = pz_parser.compute_pattern_id(
            "ERROR",
            "General f:0, t:9999999999, st:99,99,99,99> failed at offset 99999",
        )
        self.assertEqual(a, b)

    def test_pattern_id_collapses_quoted_strings_and_whitespace(self) -> None:
        a = pz_parser.compute_pattern_id(
            "ERROR",
            'no such function "doStuff" in module',
        )
        b = pz_parser.compute_pattern_id(
            "ERROR",
            'no such function "fooBarBaz" in module',
        )
        # Whitespace-collapse plus quoted-string-flatten => same pattern_id.
        self.assertEqual(a, b)

    def test_pattern_id_changes_with_level(self) -> None:
        a = pz_parser.compute_pattern_id("ERROR", "exception thrown")
        b = pz_parser.compute_pattern_id("WARN", "exception thrown")
        self.assertNotEqual(a, b)


class SignatureUniquenessTests(unittest.TestCase):
    """signature should fan out across mods sharing a pattern_id."""

    def test_signature_unique_per_mod_for_shared_pattern(self) -> None:
        # Same first line, different mod_ids: different signatures, same pattern_id.
        pat = pz_parser.compute_pattern_id("ERROR", "Lua((MOD:X)) crash")
        sig_a = pz_parser.compute_signature(pat, "spongiesclothing")
        sig_b = pz_parser.compute_signature(pat, "testmodalpha")
        self.assertNotEqual(sig_a, sig_b)
        # Both signatures derive from the same pattern_id (consumer's
        # pattern-fanout view); the pattern_id itself carries a "sha256:" prefix.
        self.assertEqual(pat[:7], "sha256:")


class SeverityPrefixStripTests(unittest.TestCase):
    """A body line that begins with a literal severity word (``SEVERE:``,
    ``ERROR:``, ``WARN:``, ``FATAL:``) should not fragment pattern_id away
    from the otherwise-identical body that lacks the prefix. The bracketed
    level already feeds pattern_id; the prefix is redundant and varies in
    practice."""

    def test_pattern_id_invariant_under_body_prefix_severe(self) -> None:
        # Same logical error: one line carries ``SEVERE: `` body prefix, the
        # other doesn't. Both classified as SEVERE by their bracketed level.
        with_prefix = pz_parser.compute_pattern_id(
            "SEVERE",
            "SEVERE: foo at zombie.X(File.java:42)",
        )
        without_prefix = pz_parser.compute_pattern_id(
            "SEVERE",
            "foo at zombie.X(File.java:42)",
        )
        self.assertEqual(with_prefix, without_prefix)

    def test_pattern_id_invariant_under_body_prefix_error(self) -> None:
        with_prefix = pz_parser.compute_pattern_id(
            "ERROR",
            "ERROR: doStuff failed in module",
        )
        without_prefix = pz_parser.compute_pattern_id(
            "ERROR",
            "doStuff failed in module",
        )
        self.assertEqual(with_prefix, without_prefix)


if __name__ == "__main__":
    unittest.main()
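The signature tests constrain `compute_pattern_id` only through invariants: numeric runs and quoted strings must not matter, a leading severity word in the body must not matter, the level must matter, and the result starts with `sha256:`. One normalisation that would satisfy those invariants is sketched below. The function names `sketch_pattern_id` and `sketch_signature`, the placeholder tokens, the `|` joiner and the `sha256:` prefix on the signature are all assumptions for illustration. The actual `pz_parser` functions are not shown here and may normalise differently.

```python
import hashlib
import re

# Hypothetical normalisation rules, chosen to satisfy the invariants in the tests.
_SEVERITY_PREFIX_RE = re.compile(r"^(?:SEVERE|ERROR|WARN|FATAL):\s*")
_QUOTED_RE = re.compile(r'"[^"]*"')
_NUMBER_RE = re.compile(r"\d+")
_WHITESPACE_RE = re.compile(r"\s+")


def sketch_pattern_id(level: str, first_line: str) -> str:
    """Hash the level plus a formatting-insensitive form of the first line."""
    text = _SEVERITY_PREFIX_RE.sub("", first_line.strip())  # drop "SEVERE: " etc.
    text = _QUOTED_RE.sub('"S"', text)       # flatten quoted strings
    text = _NUMBER_RE.sub("N", text)         # collapse numeric runs
    text = _WHITESPACE_RE.sub(" ", text)     # collapse whitespace
    digest = hashlib.sha256(f"{level}|{text}".encode("utf-8")).hexdigest()
    return "sha256:" + digest


def sketch_signature(pattern_id: str, mod_id: str) -> str:
    """Fan a shared pattern_id out per mod by hashing the pair."""
    digest = hashlib.sha256(f"{pattern_id}|{mod_id}".encode("utf-8")).hexdigest()
    return "sha256:" + digest
```

Keeping the level inside the hashed string while stripping the body's severity word is what lets the same message at ERROR and WARN stay distinct while `SEVERE: foo` and `foo` at the same level collapse together.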