Compare commits
32 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 656142dbf8 | | |
| | c63adb06c4 | | |
| | 0d18cfbfc6 | | |
| | 45a5e1a3da | | |
| | 6978175dff | | |
| | 3df6836909 | | |
| | b6949ff0c3 | | |
| | f1d2831d92 | | |
| | bb4ee0d16a | | |
| | 58d0ef187b | | |
| | 9cd898bc9f | | |
| | 87a0562bd6 | | |
| | fdf70a0c06 | | |
| | 2e7bebc911 | | |
| | 4fec3a58f6 | | |
| | 511583035b | | |
| | e1a7785cf4 | | |
| | 2bd4fe6189 | | |
| | 5b4f77a72f | | |
| | 1657be7711 | | |
| | 50194c72b2 | | |
| | 6bf63f1823 | | |
| | 081d40c208 | | |
| | d6831c5851 | | |
| | c2cb64e9a7 | | |
| | 2d1cbccc5d | | |
| | 44b6b99047 | | |
| | 0c8dad9502 | | |
| | 7755d8385c | | |
| | 409de16003 | | |
| | aec835e0eb | | |
| | 6fde2d49ff | | |
7
.gitignore
vendored
@@ -5,3 +5,10 @@ Logs.zip
|
||||
.scratch/
|
||||
.claude/
|
||||
.claude.local.md
|
||||
|
||||
# Python bytecode caches from tools/pz-analyzer/.
|
||||
__pycache__/
|
||||
|
||||
# Editor / manual backup files.
|
||||
*.bak
|
||||
*.bak-*
|
||||
|
||||
47
CHANGELOG.md
@@ -4,6 +4,50 @@ All notable changes to `indifferentketchup/codex` are documented here.
|
||||
|
||||
The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
## [0.3.0] — 2026-05-04
|
||||
|
||||
Adds IP-address redaction to the PZ redactor, a new `ErrorContextAnalyser` for surrounding-context surfacing, the `tools/pz-analyzer/` Python toolset (pre-production Qwen-driven research analyser and production-bound deterministic classifier), and a parser fix for the PZ B42 log shape that was silently breaking level/prefix attribution since The Indie Stone dropped the per-line `t:` field. New public API surface across the redactor and the analyser-side classes makes this a minor bump rather than a patch.
|
||||
|
||||
### Added
|
||||
|
||||
- **IP redaction in `ProjectZomboidRedactor`** (`src/Util/ProjectZomboid/ProjectZomboidRedactor.php`) — fourth pass that scrubs IPv4 (strict 0-255 octets, optional `:port` suffix) and IPv6 (full, abbreviated, bracketed-with-port, IPv4-mapped) addresses, replacing them with the literal `[REDACTED_IP]`. New public API: `IP_REPLACEMENT`, `IPV4_REGEX`, `IPV6_REGEX` constants and a `redactIpAddresses(bool)` toggle (defaults on, mirroring the existing three category toggles). Pattern-disjoint from the Steam-ID → name → coordinates chain; runs first by convention. Strict regexes plus `filter_var()` validation prevent false positives on PZ timestamps and PHP / Java scope ops. 20 new unit tests across two files (`ProjectZomboidRedactorIpv4Test.php`, `ProjectZomboidRedactorIpv6Test.php`).
|
||||
- **`ErrorContextAnalyser`** (`src/Analyser/ProjectZomboid/ErrorContextAnalyser.php`) — generic-purpose analyser that walks `Entry[]` once and emits one `ErrorContextProblem` per ERROR / WARNING entry with up to `CONTEXT_BEFORE` (20) entries of leading context and `CONTEXT_AFTER` (20) entries of trailing context. Overlapping windows clip to `lastEmittedIndex + 1` so no Entry appears in two context arrays; emission caps at `HIT_CAP` (500) with a single `ErrorContextTruncatedInformation` appended when reached. Standalone — not auto-registered to any existing Log subclass's `getDefaultAnalyser()`; consumers wire it in explicitly. Companion classes `ErrorContextProblem` and `ErrorContextTruncatedInformation` under `src/Analysis/ProjectZomboid/`. 3 unit tests, 134 assertions.
|
||||
- **`tools/pz-analyzer/`** — Python toolset adjacent to the library (not part of the Composer package's autoload surface). `pz_redact_all.sh` is a one-shot Docker wrapper that runs the PHP redactor over `.scratch/pz/Logs/` and produces a gitignored `.scratch/pz/Logs.redacted/` directory. `pz_error_analysis.py` is a developer-facing Qwen-backed pre-production analyser that calls a local OpenAI-compatible endpoint to classify residual log shapes the deterministic side hasn't yet captured. `pz_parser.py` + `pz_classify.py` are the production-bound deterministic-only counterpart: pure parser module with mod attribution, file:line extraction, cause-chain unwinding, engine-noise tagging, and a two-level signature scheme (`pattern_id` + `signature`), plus a stdlib-only orchestrator that walks the redacted directory and emits a JSON report. 32 Python unit tests across three files, 16 synthetic fixtures.
|
||||
- `docs/superpowers/specs/2026-05-04-pz-deterministic-classifier-design.md` — design contract for `pz_parser.py` / `pz_classify.py`. The PHP-side `ErrorContextAnalyser` ships without a separate spec; its design fell out of a brainstorming session inline with the pzmm-pattern-port discussion.
|
||||
- New synthetic fixture `test/src/Games/ProjectZomboid/fixtures/debug-server-42x-minimal.txt` mirroring the existing B41 fixture in PZ B42 line shape.
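
The window clipping described for `ErrorContextAnalyser` can be sketched as follows. This is a minimal illustration, not the class itself: `contextWindows()` is a hypothetical helper, it works on plain level strings rather than `Entry` objects, and it assumes that a hit falling inside an earlier trailing window gets only the not-yet-emitted entries as context. The real analyser also appends an `ErrorContextTruncatedInformation` when the cap is reached, which this sketch omits.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of clipped context windows (not ErrorContextAnalyser).
 *
 * @param list<string> $levels one level token per entry, in log order
 * @return list<array{hit:int, before:list<int>, after:list<int>}>
 */
function contextWindows(array $levels, int $before = 20, int $after = 20, int $hitCap = 500): array
{
    $windows = [];
    $lastEmitted = -1;               // highest index already used as context
    $last = count($levels) - 1;

    foreach ($levels as $i => $level) {
        if ($level !== 'ERROR' && $level !== 'WARNING') {
            continue;
        }
        if (count($windows) >= $hitCap) {
            break;                   // cap reached: stop emitting
        }
        // Both sides are clipped to lastEmitted + 1 so that no index
        // lands in two context arrays.
        $start = max($i - $before, $lastEmitted + 1);
        $afterStart = max($i + 1, $lastEmitted + 1);
        $end = min($i + $after, $last);

        $windows[] = [
            'hit' => $i,
            'before' => $start < $i ? range($start, $i - 1) : [],
            'after' => $afterStart <= $end ? range($afterStart, $end) : [],
        ];
        $lastEmitted = max($lastEmitted, $end);
    }

    return $windows;
}

$windows = contextWindows(['INFO', 'ERROR', 'INFO', 'WARNING', 'INFO'], 2, 2);
```

With a two-entry window on each side, the second hit receives no leading context because entries 2 and 3 already belong to the first window.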
|
||||
|
||||
### Changed
|
||||
|
||||
- **`DebugServerPattern::LINE` regex relaxed** to handle PZ build 42.x. The Indie Stone dropped the per-line `t:` (microsecond) field and tightened the spacing between `f:N`, `t:N`, and `st:N,N,N,N>` markers somewhere on the way to build 42.17. The previous regex required the full `f:\d+,\s+t:\d+,\s+st:` triplet and silently failed on every B42 line. Now `(?:,\s+t:\d+)?` makes the `t:N,` field optional and `,?` makes the inter-field comma optional. Backwards-compatible — every B41 line continues to parse identically. `ProjectZomboidServerLogTest` now runs each parser-shape assertion via `#[DataProvider]` against both fixtures.
|
||||
- **Pass order in `ProjectZomboidRedactor::redact()`**: the new IP pass runs first, so the chain is now `IP → Steam ID → player name → coordinates`. The mandatory Steam ID → name → coordinates ordering is preserved; placement of the IP pass is by convention since its regexes are pattern-disjoint from the rest.
|
||||
- **`CLAUDE.md`** documents `iblogs` as the primary downstream consumer with a per-component checklist for cross-repo public API impact; the release-flow cadence; the feature-branch workflow set by the `redactor` and `iblogs-bootstrap` precedents; and the `docs/superpowers/specs|plans/` path convention.
|
||||
- **`.gitignore`** excludes `__pycache__/` (Python bytecode caches generated under `tools/pz-analyzer/`) and `*.bak` / `*.bak-*` (editor / manual backup files).
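
As a rough illustration of the IP pass that now runs first, the sketch below pairs a strict-octet IPv4 pattern with `filter_var()` validation. `IPV4_SKETCH`, `OCTET`, and `redactIpv4()` are hypothetical names, the pattern is an assumption and may differ from the library's `IPV4_REGEX`, and the timestamp shape in the usage lines is invented for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical pattern for illustration; the shipped IPV4_REGEX may differ.
const OCTET = '(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)';
const IPV4_SKETCH = '/(?<![\d.])(' . OCTET . '(?:\.' . OCTET . '){3})(?::\d{1,5})?(?!\d|\.\d)/';

function redactIpv4(string $content): string
{
    return preg_replace_callback(
        IPV4_SKETCH,
        // filter_var() is a second check on top of the strict regex.
        static fn (array $m): string =>
            filter_var($m[1], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false
                ? '[REDACTED_IP]'
                : $m[0],
        $content
    ) ?? $content;
}

$withPort  = redactIpv4('client 192.168.1.10:16261 connected');
$timestamp = redactIpv4('[04-05-26 12:00:00.123] tick');
$version   = redactIpv4('version 999.1.2.3');
```

The optional `:port` suffix is consumed together with the address, so the whole `host:port` token collapses to a single placeholder.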
|
||||
|
||||
### Fixed
|
||||
|
||||
- PZ build 42.x server logs now parse with proper level / prefix attribution. Previously, every B42 line failed `DebugServerPattern::LINE` and the resulting ServerLog entries fell through as level `INFO` with no prefix. This silently disabled `ServerExceptionProblem` and `ModMissingProblem` (their regexes anchor on `[timestamp]...` at entry start, which a level-less orphan entry doesn't emit). The anchorless `EngineVersionInformation` continued to fire against the joined entry text, producing the user-visible symptom "one Information badge, empty Problems panel" on B42 logs. The fix restores per-line parsing, re-enables both Problem classes, and makes the error-count badge populate correctly.
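
The effect of the relaxed marker triplet can be checked in isolation. The patterns below are simplified, marker-only assumptions built from the fragments quoted in this changelog; the real `DebugServerPattern::LINE` covers the whole line. The B41 sample uses the metadata shape quoted in the redactor plan, while the B42 sample is an assumed shape with the `t:` field and commas dropped.

```php
<?php
declare(strict_types=1);

// Simplified, marker-only sketches; not the library's LINE constant.
const STRICT_MARKERS  = '/f:\d+,\s+t:\d+,\s+st:[\d,]+>/';
const RELAXED_MARKERS = '/f:\d+(?:,\s+t:\d+)?,?\s+st:[\d,]+>/';

function matchesMarkers(string $regex, string $line): bool
{
    return preg_match($regex, $line) === 1;
}

$b41 = 'f:0, t:1776297642406, st:48,648,157,584>'; // B41 shape
$b42 = 'f:0 st:48,648,157,584>';                   // assumed B42 shape

$strictB41  = matchesMarkers(STRICT_MARKERS, $b41);
$strictB42  = matchesMarkers(STRICT_MARKERS, $b42);
$relaxedB41 = matchesMarkers(RELAXED_MARKERS, $b41);
$relaxedB42 = matchesMarkers(RELAXED_MARKERS, $b42);
```

Under these assumptions the strict pattern rejects the B42 sample, while the relaxed pattern accepts both shapes.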
|
||||
|
||||
### Test counts
|
||||
|
||||
- PHP suite: **287 tests, 654 assertions** (up from 260 / 492 at v0.2.0).
|
||||
- Python suite under `tools/pz-analyzer/`: **32 tests** (stdlib `unittest`, sub-10 ms).
|
||||
|
||||
## [0.2.0] — 2026-05-01
|
||||
|
||||
Render-time PII redaction utility added on the same calendar day as v0.1.0. Cut as a minor version bump because it adds a new public API surface (`RedactorInterface` plus the per-game implementation), which semver treats as a minor change rather than a patch. Consumers (notably iblogs) pin to `^0.2.0` to opt into the redactor-aware version.
|
||||
|
||||
### Added
|
||||
|
||||
- `RedactorInterface` (`src/Util/RedactorInterface.php`) and `ProjectZomboidRedactor` (`src/Util/ProjectZomboid/ProjectZomboidRedactor.php`) — render-time PII filter that scrubs Steam IDs, player names, and world coordinates from Project Zomboid log content. Three independent toggles default to on. Designed as a string-in/string-out utility so consumers can apply it at any rendering or export step. Documented v1 limitations: in PvP combat lines, only the attacker's name and coords are redacted; victim's name and coords (after `hit`) are deferred to v2. In admin lines, `teleported X to <coords>` coordinates are not redacted in v1.
|
||||
- 65 new test methods across six files under `test/tests/Util/Redactor/` — per-category unit tests, combined / toggle / idempotence matrix, and integration coverage that drives all 11 existing PZ fixtures through the redactor end-to-end. Suite total: 260 tests, 492 assertions.
|
||||
- `docs/superpowers/specs/2026-04-30-redactor-design.md` flipped from "deferred" to "implemented" status. Plan committed at `docs/superpowers/plans/2026-05-01-redactor.md`.
|
||||
|
||||
### Changed
|
||||
|
||||
- New top-level `src/Util/` directory introduced. The Redactor is its first occupant; future utilities (e.g. tokenising redactor variants) land here.
|
||||
|
||||
## [0.1.0] — 2026-05-01
|
||||
|
||||
First public release. Codex is a generic PHP log parsing and analysis framework with full Project Zomboid server-log support across eight analysers. The Composer package name is `indifferentketchup/codex` (the repository directory and Gitea slug are `ik-codex`; the package name is not).
|
||||
@@ -32,8 +76,9 @@ First public release. Codex is a generic PHP log parsing and analysis framework
|
||||
|
||||
### Deferred
|
||||
|
||||
- **Codex `Redactor` utility** — design captured in `docs/superpowers/specs/2026-04-30-redactor-design.md`. Not implemented in v0.1.0. iblogs (the downstream consumer) handles upload-time PII filtering for this release; codex itself ships no PII helper. The deferred spec exists so iblogs's privacy story has a referenced design to point at and so a future implementation pass has a clear contract to start from.
|
||||
- **Other game implementations** — `Minecraft`, `Hytale`, and `SevenDaysToDie` are detective-stub-only. Each has a TODO `<Game>Detective` extending base `Detective`; their per-component subdirectories under `Analyser`, `Log`, `Parser`, and `Pattern` contain only `.gitkeep` placeholders. Real implementations land if and when fixtures and demand exist.
|
||||
- **Packagist publication** — v0.1.0 is consumable via Composer's `vcs` repository entry pointing at the Gitea remote. Pushing to Packagist is a separate decision and is not in scope for this release.
|
||||
|
||||
[0.3.0]: https://git.indifferentketchup.com/indifferentketchup/ik-codex/releases/tag/v0.3.0
|
||||
[0.2.0]: https://git.indifferentketchup.com/indifferentketchup/ik-codex/releases/tag/v0.2.0
|
||||
[0.1.0]: https://git.indifferentketchup.com/indifferentketchup/ik-codex/releases/tag/v0.1.0
|
||||
|
||||
15
CLAUDE.md
@@ -49,6 +49,7 @@ Analysis of Insight[]
|
||||
- **`PatternParser`** is regex-driven. Lines that don't match the LINE regex append to the previous `Entry` — this is the mechanism that handles multi-line records like Java stack traces under an ERROR header.
|
||||
- **`PatternAnalyser`** walks entries, runs each registered insight class's static `getPatterns()` against entry text via `preg_match_all`, and emits coalesced insights (equal insights bump a counter instead of duplicating).
|
||||
- **Custom `Analyser` subclasses** are the right move when analysis needs cross-entry state — pairing events, sliding-window thresholds, comparing consecutive snapshots. `PatternAnalyser` operates per-entry only and can't express those. Phase B.3 (`ConnectionFailureAnalyser`, `ItemDuplicationAnalyser`, `SkillProgressionAnomalyAnalyser`) shows the shape: extend `Analyser`, override `analyse()`, walk `$this->log` once, aggregate, then emit coalesced `Problem`/`Information` insights at the end. Tunable thresholds belong as `public const` constants on the subclass with the rationale in a docblock.
|
||||
- **`RedactorInterface`** is a render-time PII filter — string-in/string-out, configured per game, implemented at `src/Util/<Game>/<Game>Redactor.php`. Consumers call `redact(string $content): string` on a concrete instance before rendering or exporting log content.
|
||||
- Detectors available out of the box: `SinglePatternDetector`, `WeightedSinglePatternDetector`, `LinePatternDetector` (returns match ratio), `MultiPatternDetector` (AND), and the path-based `FilenameDetector` (uses `LogFileInterface::getPath()`, returns `false` when no path is available).
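
The continuation-line behaviour described for `PatternParser` can be sketched roughly as below. `groupEntries()` is a hypothetical helper that groups raw strings rather than building `Entry` objects, and the timestamp prefix in the sample lines is an invented shape, not a Project Zomboid format.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of continuation-line grouping (not PatternParser).
 *
 * @param list<string> $lines
 * @return list<list<string>> one inner list of lines per entry
 */
function groupEntries(array $lines, string $lineRegex): array
{
    $entries = [];
    foreach ($lines as $line) {
        if ($entries === [] || preg_match($lineRegex, $line) === 1) {
            $entries[] = [$line];   // a header line starts a new entry
        } else {
            // Non-matching lines join the most recent entry.
            $entries[array_key_last($entries)][] = $line;
        }
    }

    return $entries;
}

$entries = groupEntries(
    [
        '[12:00:00] ERROR: boom',
        'java.lang.RuntimeException: x',
        "\tat Foo.bar(Foo.java:1)",
        '[12:00:01] LOG: ok',
    ],
    '/^\[\d{2}:\d{2}:\d{2}\]/'
);
```

The two stack-trace lines end up attached to the ERROR header, which is why a single entry can carry a full Java trace.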
|
||||
|
||||
## Game subtrees
|
||||
@@ -58,11 +59,14 @@ Layout is **components-outer with game suffix**, not games-outer:
|
||||
```
src/<Component>/<Game>/... e.g. src/Log/ProjectZomboid/ProjectZomboidServerLog.php
src/Pattern/<Game>/<Type>Pattern.php (regex string constants; not a framework abstraction)
src/Util/<Game>/... e.g. src/Util/ProjectZomboid/ProjectZomboidRedactor.php
test/tests/Games/<Game>/...
test/src/Games/<Game>/fixtures/<type>-minimal.txt (synthetic fixtures only)
```
|
||||
|
||||
Scaffolded games: `Minecraft`, `Hytale`, `SevenDaysToDie` (stubs only — empty `.gitkeep`s plus a TODO `<Game>Detective` extending base `Detective`). `ProjectZomboid` is fully implemented: 11 log subclasses, 11 pattern classes, detective wired with all 11, synthetic fixtures, dispatch tests, plus the analyser surface — 12 `PatternAnalyser`-driven Insight classes under `src/Analysis/ProjectZomboid/` and 3 custom `Analyser` subclasses under `src/Analyser/ProjectZomboid/` for cross-entry / threshold logic.
|
||||
`src/Util/` is the sixth top-level component directory, introduced post-v0.1.0-tag. Its first occupant is the Redactor; future game-agnostic utilities (tokenising redactor variants, etc.) land here too.
|
||||
|
||||
Scaffolded games: `Minecraft`, `Hytale`, `SevenDaysToDie` (stubs only — empty `.gitkeep`s plus a TODO `<Game>Detective` extending base `Detective`). `ProjectZomboid` is fully implemented: 11 log subclasses, 11 pattern classes, detective wired with all 11, synthetic fixtures, dispatch tests, plus the analyser surface — 11 `PatternAnalyser`-driven Insight classes under `src/Analysis/ProjectZomboid/` and 3 custom `Analyser` subclasses under `src/Analyser/ProjectZomboid/` for cross-entry / threshold logic.
|
||||
|
||||
`src/Pattern/` is **not a framework abstraction** — patterns are plain `string` class constants. Each `<Type>Pattern` typically holds a `LINE` constant for the parser plus named-group extractor constants (`FIELDS`, `COMBAT`, `MOD_LOAD`, etc.) for analysers.
|
||||
|
||||
@@ -74,23 +78,32 @@ Scaffolded games: `Minecraft`, `Hytale`, `SevenDaysToDie` (stubs only — empty
|
||||
- A custom `Analyser` subclass (cross-entry logic): `UserLog → ConnectionFailureAnalyser`, `ItemLog → ItemDuplicationAnalyser`, `PerkLog → SkillProgressionAnomalyAnalyser`.
|
||||
- A configured `PatternAnalyser` (per-entry pattern matching): `ServerLog`, `PvpLog`, `AdminLog` register their respective Insight classes.
|
||||
- An empty `PatternAnalyser` for logs with no analysers yet: `ChatLog`, `ClientActionLog`, `CmdLog`, `MapLog`, `BurdJournalsLog`. These are wiring stubs awaiting future analysis work.
|
||||
- **`ProjectZomboidRedactor`** at `src/Util/ProjectZomboid/ProjectZomboidRedactor.php` — concrete `RedactorInterface` implementation. Downstream consumers call `redact(string): string` to scrub Steam IDs (zeroed placeholder), player names (`<player>`), and world coordinates (`0,0,0`) from log content. Three independent toggle methods default to on: `redactSteamIds(bool)`, `redactPlayerNames(bool)`, `redactCoordinates(bool)`. Pass order (Steam ID → player name → coords) is mandatory and enforced internally — see Pitfall 5.
|
||||
|
||||
### Standard test template for a Log subclass
|
||||
|
||||
At minimum: (1) entry count after `parse()` matches the synthetic fixture's line count, (2) one or more named-group `FIELDS` regexes from the `<Type>Pattern` class extract correctly from a representative line, (3) `Detective` handed the fixture path returns an instance of this Log class. Use `#[DataProvider]` when the same shape repeats per file.
|
||||
|
||||
### Downstream consumers
|
||||
|
||||
`iblogs` (sibling repo at `/opt/iblogs`, package `indifferentketchup/iblogs`, fork of `aternosorg/mclogs`) is the primary consumer of codex via a Composer `vcs` repository entry pinned to the latest minor tag. Public-API changes in `src/{Detective,Log,Printer,Util}/*.php` and `src/Analysis/*.php` propagate there; when modifying those types, sanity-check the iblogs call sites at `/opt/iblogs/src/{Detective.php,Log.php,Printer/Printer.php,Printer/FormatModification.php,Api/Response/CodexLogResponse.php}` and the stub class at `/opt/iblogs/src/Data/Deobfuscator.php`.
|
||||
|
||||
## Pitfalls
|
||||
|
||||
1. **`PatternParser` is incompatible with named regex groups.** PHP's `preg_match` returns named groups *plus* their numeric duplicates in the same array; `PatternParser`'s foreach iterates both and throws on the string-key entries. Convention: `LINE` regexes (used by the parser) use **unnamed** groups with field order documented in the Pattern class's docblock. Named groups are fine inside extractor regexes invoked from analysers, since `PatternAnalyser` hands the whole match array to `Insight::setMatches`.
|
||||
2. **PHPUnit 12 requires the `#[DataProvider('methodName')]` attribute.** The legacy `@dataProvider` annotation silently passes zero args and fails with `ArgumentCountError`.
|
||||
3. **`Level::fromString()` defaults to `Level::INFO` for unknown tokens.** Project Zomboid log levels map: `LOG`/`INFO` → INFO; `WARN` → WARNING; `ERROR` → ERROR.
|
||||
4. **`PatternParser` matches array** must declare a match-type for **every** capture group in the regex (`TIME`, `LEVEL`, or `PREFIX`); otherwise the parser throws on the unmapped index. Use non-capturing groups `(?:...)` for fields you want to skip.
|
||||
5. **`ProjectZomboidRedactor` pass order is mandatory.** `PLAYER_AFTER_STEAMID_REGEX` anchors on the already-redacted Steam ID placeholder — it will not match raw Steam IDs. Do NOT swap the Steam ID and player-name passes, and do NOT stub out the Steam ID pass while leaving the player-name pass enabled.
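
Pitfall 1 is easy to reproduce outside the library. The snippet below only demonstrates the `preg_match` behaviour; the regex and the sample line are invented for the example.

```php
<?php
declare(strict_types=1);

// Named group: the match array carries the name AND its numeric twin.
preg_match('/^(?<level>[A-Z]+): (.*)$/', 'ERROR: disk full', $named);
$namedKeys = array_keys($named);

// A loop that expects only numeric group indexes also sees 'level'.
$stringKeys = array_values(array_filter($namedKeys, 'is_string'));

// Unnamed groups: numeric keys only, which is what a LINE regex should use.
preg_match('/^([A-Z]+): (.*)$/', 'ERROR: disk full', $unnamed);
$unnamedKeys = array_keys($unnamed);
```

The extra string key is the entry that trips a parser iterating the whole match array.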
|
||||
|
||||
## Workflow conventions
|
||||
|
||||
- **One commit per concrete log type** when adding game support: pattern class + log subclass + synthetic fixture + test in a single commit, run `composer test`, then move on. `<Game>Detective::__construct()` wiring goes in its own follow-up commit once all log types are present.
|
||||
- **Out-of-scope cleanup goes in its own commit.** Tempting workflow/lint fixes (e.g. deprecated CI syntax, comment hygiene) noticed mid-feature should not be folded in — separate commit or follow-up PR.
|
||||
- **Pre-destructive checkpoint pattern.** Before bulk renames/moves: `git commit --allow-empty -m "pre-X checkpoint"` as a revert anchor. Skip the empty slot if it produces no diff at the end of a plan.
|
||||
- **Release flow.** Semver: a new public API surface bumps the minor version, not the patch (`v0.1.x → v0.2.x`). Cut: rename `[Unreleased]` to `[X.Y.Z] — YYYY-MM-DD` in `CHANGELOG.md`, add a `[X.Y.Z]:` link reference at the bottom, fresh empty `[Unreleased]` above; lightweight `backup/pre-vX.Y.Z` tag (local only) before annotated `git tag -a vX.Y.Z`; push the annotated tag only.
|
||||
- **Feature branches.** Substantive feature work lands on a `<feature>-bootstrap`-style branch off master with a `backup/pre-<feature>` lightweight tag at the branch start, merged `--no-ff` after user review. The `redactor` and `iblogs-bootstrap` branches set the precedent.
|
||||
- **Specs and plans live at** `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md` and `docs/superpowers/plans/YYYY-MM-DD-<topic>.md` per the brainstorming and writing-plans skill conventions.
|
||||
|
||||
## Privacy / fixture rules
|
||||
|
||||
|
||||
15
README.md
@@ -59,6 +59,21 @@ Project Zomboid Debug Server Log
|
||||
|
||||
If the log content arrives without a filesystem path (clipboard paste, web upload, stream), use `StringLogFile` or `StreamLogFile` instead of `PathLogFile`. The detective falls back to content signatures when the filename hint is absent.
|
||||
|
||||
## Redaction
|
||||
|
||||
Before rendering or exporting log content, pass it through `ProjectZomboidRedactor` to strip PII:
|
||||
|
||||
```php
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;

$redactor = new ProjectZomboidRedactor();
$safe = $redactor->redact($logContent);
```
|
||||
|
||||
This scrubs three categories in a fixed pass order: Steam IDs are replaced with a zeroed placeholder, player names with `<player>`, and world coordinates with `0,0,0`. All three passes are on by default; opt out per category with `redactSteamIds(bool)`, `redactPlayerNames(bool)`, or `redactCoordinates(bool)`.
|
||||
|
||||
Documented v1 limitations: in PvP combat lines, only the attacker's name and coords are redacted — the victim's name and coords (appearing after `hit`) are deferred to v2. In admin lines, `teleported X to <coords>` coordinates are not redacted in v1.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
|
||||
211
docs/superpowers/plans/2026-05-01-redactor.md
Normal file
@@ -0,0 +1,211 @@
|
||||
# Redactor Utility Implementation Plan
|
||||
|
||||
> Forward-looking. No code is written by this document.
|
||||
> Branch: `redactor` (off master `aec835e`). Backup tag: `backup/pre-redactor`.
|
||||
> Spec: `docs/superpowers/specs/2026-04-30-redactor-design.md`.
|
||||
|
||||
**Goal:** Land the `RedactorInterface` plus a concrete `ProjectZomboidRedactor` implementation so iblogs (and any other downstream consumer) can scrub Project Zomboid log content of Steam IDs, player names, and world coordinates with a single call. The Redactor is a render-time filter on raw string content; raw stays canonical at the storage layer.
|
||||
|
||||
**Architecture:** Standalone string-in/string-out utility under a new top-level `src/Util/` directory, with per-game implementations under `src/Util/<Game>/`. Each implementation owns the lexical regex anchors for its game's PII shapes. Three independent toggles per implementation (`redactSteamIds`, `redactPlayerNames`, `redactCoordinates`); defaults all on; "all toggles off" yields verbatim passthrough.
|
||||
|
||||
**Tech stack:** PHP 8.4+, PHPUnit 12, Composer (`indifferentketchup/codex` v0.1.0+). All command invocations wrap in the `composer:latest` Docker image per `CLAUDE.md`.
|
||||
|
||||
---
|
||||
|
||||
## Design questions — resolved
|
||||
|
||||
### a. Render-time vs ingest-time
|
||||
|
||||
**Decision: render-time. Confirms the spec's lean.**
|
||||
|
||||
Raw log content is canonical. Redaction is a view filter that consumers apply when they want to display, export, or analyse a redacted projection. iblogs's storage layer holds the unredacted upload (subject to iblogs's own upload-time `Filter` chain for IPs/access-tokens, which is a different layer of defence); the codex Redactor runs on the way *out* of storage, not on the way in.
|
||||
|
||||
**Why:** the alternative (ingest-time, where storage holds redacted content) is destructive — once stored, the original cannot be recovered for legitimate operator use. Render-time leaves the original in place and lets each render path opt in. iblogs gets a per-session toggle without needing to keep two copies of every paste.
|
||||
|
||||
**Implication for iblogs schema:** iblogs stores raw content; the redaction toggle in the iblogs UI invokes `ProjectZomboidRedactor::redact()` at render time (server-side) or at fetch time (API consumers' choice). No schema migration required for the redaction feature.
|
||||
|
||||
### b. Redactor as standalone class vs Printer decorator
|
||||
|
||||
**Decision: standalone utility (option iii from the question).**
|
||||
|
||||
The Redactor is a `string → string` function. It does not know about `Insight`, `Printer`, or any other codex type. Three options were considered:
|
||||
|
||||
- **(i) Printer wrapper.** Cleanly composable but ties the Redactor to the Printer abstraction. Doesn't help iblogs's most common case: redacting raw log content for display in a non-Printer rendering path (HTML page rendered server-side, raw download served to API client).
|
||||
- **(ii) Pre-Printer pass on Insights.** Heavy. Insights are typed objects with structured fields; redacting them means per-Insight code that knows which fields are PII-bearing. That falls on the wrong side of the YAGNI line for v1.
|
||||
- **(iii) Standalone string utility.** Simple, generic, works on any string input — raw log content, JSON-serialised analysis output, rendered Printer output piped through. Doesn't know about Insights.
|
||||
|
||||
The spec describes (iii). v1 ships (iii) only. If a Printer-wrapper convenience is later wanted, it can be added as a thin adapter that calls the standalone Redactor on the Printer's output; it doesn't require restructuring the core.
|
||||
|
||||
### c. PII field taxonomy for PZ
|
||||
|
||||
**Decision: regex-based with lexical context anchors. No structured-field detection in v1.**
|
||||
|
||||
PZ-specific PII categories observed in the in-tree fixtures and the `.scratch/pz/Logs/` reference corpus:
|
||||
|
||||
| Field | Detection | Rationale |
|---|---|---|
| Steam ID | regex with `76561198\d{9}` prefix anchor and word-boundary classes | Steam's `76561198` SteamID64 universe prefix lets us cleanly distinguish from other long numbers (timestamps, build numbers). |
| Player name | regex with multi-context lexical anchors (after-Steam-ID-quoted, ChatMessage author, `Combat:`/`Safety:` subsystem) | Names are arbitrary strings; they are not detectable without context. The contexts are well-defined by the parser-side pattern classes. |
| World coordinate triple | regex with bracket / paren / `at`-clause anchors | Generic `\d+,\d+,\d+` would over-redact server metadata (`f:0, t:NNNN, st:48,648,157,584`). Lexical context disambiguates. |
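
To make the over-redaction risk in the coordinate row concrete, the sketch below compares a generic triple pattern with an `at`-clause anchor. Both constants are hypothetical, only one of the three anchor shapes is shown, and the `died at` sample line is invented rather than taken from a fixture.

```php
<?php
declare(strict_types=1);

// Hypothetical patterns for illustration; not the library's constants.
const GENERIC_TRIPLE = '/\d+,\d+,\d+/';
const AT_CLAUSE      = '/\bat \d+,\d+,\d+\b/';

$metadata = 'f:0, t:1776297642406, st:48,648,157,584';
$event    = 'Player1 died at 10482,9011,0';   // invented sample line

// The generic pattern corrupts server metadata.
$genericOnMetadata = preg_replace(GENERIC_TRIPLE, '0,0,0', $metadata);

// The anchored pattern leaves metadata alone and still redacts the event.
$anchoredOnMetadata = preg_replace(AT_CLAUSE, 'at 0,0,0', $metadata);
$anchoredOnEvent    = preg_replace(AT_CLAUSE, 'at 0,0,0', $event);
```

The generic pattern rewrites the first three fields of the `st:` quadruple, which is exactly the damage the lexical anchors are meant to prevent.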
|
||||
|
||||
**Not redacted in v1:**
|
||||
|
||||
- **IP addresses.** PZ logs do not normally include IPs in any of the eleven file types observed. iblogs's upload-side `IPv4Filter` / `IPv6Filter` (ported from upstream mclogs) covers the rare case where a mod might log them.
|
||||
- **Server-side usernames distinct from player names.** PZ uses Steam display name as the player identity; there's no separate auth username layer. Mclogs's `UsernameFilter` is Minecraft-specific and isn't mirrored here.
|
||||
- **BurdJournals scientific-notation Steam IDs** (`7.65611…E16`). Spec open-question 2 explicitly defers this to v2; the `[BurdJournals]` tag already disambiguates them as mod-internal.
|
||||
|
||||
**Hybrid (regex + structured-field) deferred.** A v2 enhancement could redact specific Insight fields at JSON-serialisation time (e.g. `ConnectionFailureProblem::$steamId` → placeholder when serialised). Useful only if iblogs starts shipping the structured analysis JSON to redacted views — a real but currently hypothetical need.
|
||||
|
||||
### d. Replacement strategy
|
||||
|
||||
**Decision: per-category placeholder strings matching the synthetic-fixture conventions. Configurable replacement style is YAGNI for v1.**
|
||||
|
||||
Per the spec:
|
||||
|
||||
| Category | Replacement |
|---|---|
| Steam ID | `76561198000000000` (zeroed placeholder, still a syntactically valid Steam ID) |
| Player name | `<player>` |
| Coordinates | `0,0,0` (with shape preserved per anchor: bracketed, parenthesised, or `at` clause) |
|
||||
|
||||
Why these specifically and not `[REDACTED]` / `[STEAM_ID]` / hashed:
|
||||
|
||||
- The placeholders **match the existing synthetic test fixtures** (`76561198000000001`–`76561198000000004` collapse to `76561198000000000`; player names `Player1`/`Player2`/`AdminUser` collapse to `<player>`). Tests can verify "redacted output looks like a synthetic fixture."
|
||||
- Shape preservation means downstream consumers can still parse the redacted output with the same Pattern classes — a redacted log is still a syntactically valid PZ log, it just contains no identities.
|
||||
- Type-tagged replacements (`[STEAM_ID]`) break shape preservation: a Pattern looking for `\d{17}` would fail. Worth offering as a config option if a consumer specifically wants type-visibility, but v1 ships placeholder-only.
|
||||
- Hashing breaks shape preservation similarly and adds determinism / collision concerns.
|
||||
|
||||
If a consumer later needs `[STEAM_ID]`-style output, a `setReplacementStyle('typed' | 'placeholder' | 'redacted')` setter can be added without breaking the v1 API. v1 ships placeholder-only.
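
The shape-preservation argument can be checked with a few lines. The Steam ID below is made up, and `$downstream` stands in for any consumer-side pattern that expects a 17-digit ID; neither is taken from the codebase.

```php
<?php
declare(strict_types=1);

$raw = 'user 76561198012345678 connected';   // made-up 17-digit ID

// Placeholder style keeps the 17-digit shape; typed style does not.
$placeholder = str_replace('76561198012345678', '76561198000000000', $raw);
$typed       = str_replace('76561198012345678', '[STEAM_ID]', $raw);

// Hypothetical downstream pattern that expects a 17-digit Steam ID.
$downstream = '/\b\d{17}\b/';

$placeholderStillParses = preg_match($downstream, $placeholder) === 1;
$typedStillParses       = preg_match($downstream, $typed) === 1;
```

Only the placeholder output still satisfies the downstream pattern, which is the reason v1 prefers it over type-tagged replacements.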
|
||||
|
||||
### e. Game-agnostic vs PZ-specific layout
|
||||
|
||||
**Decision: thin generic interface in `src/Util/` plus PZ-specific implementation in `src/Util/ProjectZomboid/`.**
|
||||
|
||||
```
src/Util/
├── RedactorInterface.php          (1 method: redact(string): string)
└── ProjectZomboid/
    └── ProjectZomboidRedactor.php (toggles + regex passes)
```
|
||||
|
||||
**YAGNI tradeoff stated:** the interface has one method and currently one implementation. Strictly, YAGNI says collapse to just `ProjectZomboidRedactor` and skip the interface. The interface earns its keep because **iblogs's call sites will type-hint against `RedactorInterface`**, not the concrete class — that's the architectural payoff. Consumer code stays loosely coupled; when Minecraft or another game ships a redactor, iblogs swaps the implementation by changing one DI binding rather than touching call sites.
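
A minimal sketch of that payoff, assuming a consumer function that type-hints the interface. The interface is redeclared locally so the snippet runs on its own; `PassthroughRedactor`, `DigitMaskingRedactor`, and `renderForDisplay()` are hypothetical stand-ins, not codex or iblogs code.

```php
<?php
declare(strict_types=1);

// Local stand-in for the one-method contract described in this plan.
interface RedactorInterface
{
    public function redact(string $content): string;
}

// Two toy implementations, purely for illustration.
final class PassthroughRedactor implements RedactorInterface
{
    public function redact(string $content): string
    {
        return $content;
    }
}

final class DigitMaskingRedactor implements RedactorInterface
{
    public function redact(string $content): string
    {
        return preg_replace('/\d/', '0', $content) ?? $content;
    }
}

// The call site depends on the interface only, so swapping the
// implementation does not touch this function.
function renderForDisplay(RedactorInterface $redactor, string $raw): string
{
    return $redactor->redact($raw);
}

$plain  = renderForDisplay(new PassthroughRedactor(), 'id 42');
$masked = renderForDisplay(new DigitMaskingRedactor(), 'id 42');
```

Changing which object is passed in is the whole swap; in a DI container that would be one binding.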
|
||||
|
||||
The cost is two files instead of one. Acceptable given the dependency-inversion benefit. The directory layout (`src/Util/<Game>/`) mirrors the components-outer-with-game-suffix convention used everywhere else in the tree (Analyser, Analysis, Detective, Log, Parser, Pattern).
|
||||
|
||||
**Note on the new `src/Util/` directory.** Codex currently has no `src/Util/` (the Phase A scaffolding established Analyser / Analysis / Detective / Log / Parser / Pattern / Printer; Phase B.3 added Analyser/ProjectZomboid content but not Util). The Redactor introduces this new top-level. This is an additive change — no existing code is modified.
|
||||
|
||||
### f. Test strategy
|
||||
|
||||
**Decision: hybrid — small dedicated synthetic fixtures under `test/src/Util/Redactor/` for direct unit tests, plus an integration test that runs the Redactor over an existing PZ fixture and asserts idempotence.**
|
||||
|
||||
**Dedicated unit fixtures** (small string constants in test classes, not separate files): per spec test plan #1–#5. Each test class owns its input/expected pairs. Keeps unit tests self-contained and fast.
|
||||
|
||||
**Integration test** that re-uses an existing PZ fixture (e.g. `test/src/Games/ProjectZomboid/fixtures/admin-minimal.txt`). Two assertions:
|
||||
|
||||
- The Redactor's output is a syntactically valid log (still parses cleanly through the corresponding `ProjectZomboidAdminLog`).
|
||||
- Idempotence: `redact(redact($x)) === redact($x)`. Existing fixture content is already placeholder-shaped, so the redactor should leave it byte-for-byte identical OR apply the canonical normalisation once and then no-op.
|
||||
|
||||
**False-positive avoidance.** The synthetic fixtures use `76561198000000001` etc. as placeholder Steam IDs. The Redactor's Steam ID regex matches the `76561198\d{9}` prefix and replaces with `76561198000000000` — so `76561198000000001` becomes `76561198000000000` (a normalisation, not a corruption). Tests verify this normalisation is correct and that legitimate-non-PII data (e.g. server metadata triples like `f:0, t:1776297642406, st:48,648,157,584`) is **not** touched.
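
The normalisation, idempotence, and false-positive expectations can be sketched together. `STEAM_ID_SKETCH` is an assumption: it uses digit lookarounds in place of the spec's word-boundary classes, and `redactSteamIdsSketch()` is a hypothetical free function rather than the planned class method.

```php
<?php
declare(strict_types=1);

// Hypothetical pattern; the spec's exact boundary classes may differ.
const STEAM_ID_SKETCH = '/(?<!\d)76561198\d{9}(?!\d)/u';
const STEAM_ID_PLACEHOLDER = '76561198000000000';

function redactSteamIdsSketch(string $content): string
{
    return preg_replace(STEAM_ID_SKETCH, STEAM_ID_PLACEHOLDER, $content) ?? $content;
}

$input = 'id=76561198000000001 f:0, t:1776297642406, st:48,648,157,584';
$once  = redactSteamIdsSketch($input);
$twice = redactSteamIdsSketch($once);

// An 18-digit number that merely starts with the prefix is left alone.
$longNumber = redactSteamIdsSketch('seq 765611980000000012');
```

The synthetic ID is normalised to the zeroed placeholder, a second pass changes nothing, and the metadata triple survives untouched.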
|
||||
|
||||
---
|
||||
|
||||
## Tasks
|
||||
|
||||
Tasks are intended for the `redactor` branch. Each is a single logical commit. Test-running between commits uses the standard Docker invocation. Work proceeds only after Step 0 sign-off (this plan reviewed).
|
||||
|
||||
### Task 0 — Plan doc commit
|
||||
|
||||
- [ ] **Step 0.1.** Already done out-of-band: `git checkout -b redactor` off master `aec835e`; `git tag backup/pre-redactor` at branch tip; this plan written.
|
||||
- [ ] **Step 0.2.** Commit this plan: `docs: add Redactor implementation plan` on branch `redactor`. Push branch to origin for review.
|
||||
|
||||
### Task 1 — Scaffold (interface + skeleton class with toggles)
|
||||
|
||||
- [ ] **Step 1.1.** Create `src/Util/RedactorInterface.php`. Single method: `public function redact(string $content): string;` PHPDoc describing the contract: stateless from the caller's perspective; configuration happens via implementation-specific setters before `redact()`.
|
||||
- [ ] **Step 1.2.** Create `src/Util/ProjectZomboid/ProjectZomboidRedactor.php` that implements the interface. Class structure: three private bool properties (`$redactSteamIds`, `$redactPlayerNames`, `$redactCoordinates`) all defaulting to `true`; three fluent setters (`redactSteamIds(bool): static`, etc.); `redact(string): string` body that returns input unchanged when all toggles are off (for now — regex passes added in subsequent tasks).
|
||||
- [ ] **Step 1.3.** Run `composer test` — expect 195 tests still green (no Redactor tests yet).
|
||||
- [ ] **Step 1.4.** Commit: `feat: scaffold RedactorInterface and ProjectZomboidRedactor with toggles`.
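
The scaffold in Task 1 can be pictured with a minimal sketch. It is written in Python (the language of `tools/pz-analyzer/`) rather than the PHP the plan targets, so the class and method names below are transliterations and assumptions, not the shipped API. It shows only the shape: one `redact` method, three toggles defaulting to on, fluent setters, and a body that returns its input unchanged until the regex passes land.

```python
from abc import ABC, abstractmethod


class Redactor(ABC):
    """Python stand-in for the planned single-method RedactorInterface."""

    @abstractmethod
    def redact(self, content: str) -> str:
        ...


class ProjectZomboidRedactor(Redactor):
    """Skeleton only: three toggles and fluent setters, no regex passes yet."""

    def __init__(self) -> None:
        # All three categories default to enabled, as in Step 1.2.
        self._redact_steam_ids = True
        self._redact_player_names = True
        self._redact_coordinates = True

    def redact_steam_ids(self, enabled: bool) -> "ProjectZomboidRedactor":
        self._redact_steam_ids = enabled
        return self  # returning self allows chained configuration

    def redact_player_names(self, enabled: bool) -> "ProjectZomboidRedactor":
        self._redact_player_names = enabled
        return self

    def redact_coordinates(self, enabled: bool) -> "ProjectZomboidRedactor":
        self._redact_coordinates = enabled
        return self

    def redact(self, content: str) -> str:
        # The scaffold has no passes, so the input comes back unchanged.
        return content
```

Configuration then chains before the call, for example `ProjectZomboidRedactor().redact_coordinates(False).redact(text)`.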
|
||||
|
||||
### Task 2 — Steam ID redaction pass
|
||||
|
||||
- [ ] **Step 2.1.** Add `STEAM_ID_REGEX` and `STEAM_ID_REPLACEMENT` constants on `ProjectZomboidRedactor`. Regex uses the `76561198\d{9}` prefix anchor with word-boundary classes (per spec). The `/u` flag is added to all regexes for Unicode safety even though Steam IDs themselves are ASCII.
|
||||
- [ ] **Step 2.2.** Implement the Steam ID branch of `redact()`: when `$redactSteamIds` is true, run `preg_replace` against the input.
|
||||
- [ ] **Step 2.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorSteamIdTest.php`. Tests: redaction of various distinct synthetic Steam IDs collapses all to `76561198000000000`; non-Steam-ID 17-digit numbers (e.g. timestamps) are not touched; toggle-off leaves Steam IDs intact.
|
||||
- [ ] **Step 2.4.** Run `composer test`. Expect new tests pass; old 195 unaffected.
|
||||
- [ ] **Step 2.5.** Commit: `feat: add Steam ID redaction pass`.
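
As a rough illustration of the Steam ID pass, here is a Python sketch. The digit lookarounds are an assumed stand-in for the spec's "word-boundary classes", which this plan does not spell out, and the function name is hypothetical. The constants reuse the names from Step 2.1.

```python
import re

STEAM_ID_REPLACEMENT = "76561198000000000"

# Assumed boundary form: the 17-digit ID must not be embedded in a longer
# run of digits. [0-9] keeps the match ASCII-only.
STEAM_ID_REGEX = re.compile(r"(?<![0-9])76561198[0-9]{9}(?![0-9])")


def redact_steam_ids(content: str) -> str:
    """Collapse every Steam ID to the zeroed placeholder."""
    return STEAM_ID_REGEX.sub(STEAM_ID_REPLACEMENT, content)
```

Under these assumptions a placeholder ID such as `76561198000000001` is normalised to the zeroed form, while the metadata triple `f:0, t:1776297642406, st:48,648,157,584` contains no `76561198` prefix and is left alone.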
|
||||
|
||||
### Task 3 — Player name redaction pass
|
||||
|
||||
- [ ] **Step 3.1.** Add three regex constants on `ProjectZomboidRedactor` for the three player-name lexical contexts: `PLAYER_AFTER_STEAMID_REGEX`, `PLAYER_IN_CHATMESSAGE_REGEX`, `PLAYER_IN_PVP_SUBSYSTEM_REGEX`. Replacement is `<player>` for all. **Order constraint:** the after-Steam-ID context anchors on the post-redaction Steam ID `76561198000000000`, so the player-name pass must run *after* the Steam ID pass. Document this in a class-level docblock.
|
||||
- [ ] **Step 3.2.** Implement the player-name branch of `redact()`: three sequential `preg_replace` calls when `$redactPlayerNames` is true.
|
||||
- [ ] **Step 3.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorPlayerNameTest.php`. Tests: each of the three contexts redacts correctly when paired with its anchor; a bare quoted string (e.g. `"foo"` not preceded by a Steam ID) is **not** touched; toggle-off leaves names intact; the after-Steam-ID context works correctly when the Steam ID has already been redacted to the zeroed placeholder.
|
||||
- [ ] **Step 3.4.** Run `composer test`. Expect new tests pass.
|
||||
- [ ] **Step 3.5.** Commit: `feat: add player name redaction pass`.
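
The order constraint from Step 3.1 is easier to see in a small Python sketch. The line shape (a quoted name directly after the Steam ID) and the choice to keep the surrounding quotes are assumptions made for illustration; the real regex constants come from the spec. Only the after-Steam-ID context is shown.

```python
import re

ZEROED_ID = "76561198000000000"
STEAM_ID_REGEX = re.compile(r"(?<![0-9])76561198[0-9]{9}(?![0-9])")

# Hypothetical context: a quoted name immediately after the zeroed ID.
PLAYER_AFTER_STEAMID_REGEX = re.compile(r'(76561198000000000 )"[^"\n]*"')
PLAYER_REPLACEMENT = r'\1"<player>"'


def redact(content: str) -> str:
    """Steam IDs first, names second, so the name anchor can match."""
    content = STEAM_ID_REGEX.sub(ZEROED_ID, content)
    return PLAYER_AFTER_STEAMID_REGEX.sub(PLAYER_REPLACEMENT, content)


def redact_wrong_order(content: str) -> str:
    """Names first: the anchor has not been zeroed yet, so names survive."""
    content = PLAYER_AFTER_STEAMID_REGEX.sub(PLAYER_REPLACEMENT, content)
    return STEAM_ID_REGEX.sub(ZEROED_ID, content)
```

With the passes reversed, the name regex looks for an anchor that does not exist yet and the name leaks through, which is why the ordering belongs in the class docblock.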
|
||||
|
||||
### Task 4 — Coordinates redaction pass
|
||||
|
||||
- [ ] **Step 4.1.** Add three regex constants on `ProjectZomboidRedactor` for the three coordinate contexts: `COORDS_AT_CLAUSE_REGEX`, `COORDS_BRACKETED_REGEX`, `COORDS_PARENTHESISED_REGEX`. Replacements preserve shape (`0,0,0` inside whatever bracket/paren wrapper).
|
||||
- [ ] **Step 4.2.** Implement the coords branch of `redact()`: three sequential `preg_replace_callback` (or `preg_replace`) calls when `$redactCoordinates` is true.
|
||||
- [ ] **Step 4.3.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorCoordinatesTest.php`. Tests: each of the three contexts redacts correctly; **negative test** — server metadata `f:0, t:1776297642406, st:48,648,157,584` is not touched; basement Z-coordinates (`-1`) are handled; toggle-off leaves coords intact.
|
||||
- [ ] **Step 4.4.** Run `composer test`. Expect new tests pass.
|
||||
- [ ] **Step 4.5.** Commit: `feat: add coordinates redaction pass`.
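
A possible shape for the coordinates pass, sketched in Python. The three regexes are hypothetical stand-ins for the spec's patterns: each requires its context (an `at ` clause, square brackets, or parentheses) around an `x,y,z` triple, which is what keeps the four-number `st:` metadata field from matching.

```python
import re

# x,y,z with optional minus signs (basement levels have negative Z).
TRIPLE = r"-?[0-9]+,-?[0-9]+,-?[0-9]+"

# Hypothetical patterns, one per context named in Step 4.1.
COORDS_AT_CLAUSE_REGEX = re.compile(r"(\bat )" + TRIPLE + r"(?![0-9,])")
COORDS_BRACKETED_REGEX = re.compile(r"\[" + TRIPLE + r"\]")
COORDS_PARENTHESISED_REGEX = re.compile(r"\(" + TRIPLE + r"\)")


def redact_coordinates(content: str) -> str:
    """Replace each triple with 0,0,0 while keeping its wrapper."""
    content = COORDS_AT_CLAUSE_REGEX.sub(r"\g<1>0,0,0", content)
    content = COORDS_BRACKETED_REGEX.sub("[0,0,0]", content)
    content = COORDS_PARENTHESISED_REGEX.sub("(0,0,0)", content)
    return content
```

The trailing lookahead on the `at ` pattern rejects runs of four or more comma-separated numbers, so only true triples are rewritten.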
|
||||
|
||||
### Task 5 — Combined / toggle / idempotence tests
|
||||
|
||||
- [ ] **Step 5.1.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorCombinedTest.php`. Tests cover: combined input with all three PII categories present produces fully-scrubbed output when all toggles on; each toggle off in isolation produces partial scrubbing matching the toggle's category; all toggles off returns input byte-for-byte identical (`===` equality).
|
||||
- [ ] **Step 5.2.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorIdempotenceTest.php`. Tests: `redact(redact($x)) === redact($x)` for several input shapes including all three PII categories.
|
||||
- [ ] **Step 5.3.** Run `composer test`. Expect new tests pass.
|
||||
- [ ] **Step 5.4.** Commit: `test: add Redactor combined and idempotence coverage`.
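
The idempotence property in Step 5.2 can be checked with a small generic helper. This is a Python sketch, not the PHPUnit test the plan calls for; `toy_redact` and `first_non_idempotent` are hypothetical names, and the regex is the same assumed Steam ID form used only to give the helper something to exercise.

```python
import re
from typing import Callable, Iterable, Optional

_STEAM_ID = re.compile(r"(?<![0-9])76561198[0-9]{9}(?![0-9])")


def toy_redact(content: str) -> str:
    """Toy single-pass redactor used only to exercise the helper."""
    return _STEAM_ID.sub("76561198000000000", content)


def first_non_idempotent(
    redact: Callable[[str], str], samples: Iterable[str]
) -> Optional[str]:
    """Return the first sample where a second pass changes the output."""
    for sample in samples:
        once = redact(sample)
        if redact(once) != once:
            return sample
    return None
```

A redactor passes when the helper returns `None` for every input shape; any returned sample is a concrete counterexample to put in a regression test.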
|
||||
|
||||
### Task 6 — Existing-fixture integration tests
|
||||
|
||||
- [ ] **Step 6.1.** Create `test/tests/Util/Redactor/ProjectZomboidRedactorIntegrationTest.php`. Loads each existing PZ fixture (`admin-minimal.txt`, `chat-minimal.txt`, etc.) via `PathLogFile`, calls `redact()` on the content, and asserts: (a) the redacted content still parses cleanly through the corresponding `ProjectZomboid<X>Log`'s parser without throwing; (b) the synthetic Steam IDs `76561198000000001`–`76561198000000004` all collapse to `76561198000000000`; (c) the synthetic player names (`Player1`, `Player2`, `AdminUser`, `PlayerSuspect`) all collapse to `<player>`.
|
||||
- [ ] **Step 6.2.** Run `composer test`. Expect all integration assertions pass without modifying any existing test or fixture.
|
||||
- [ ] **Step 6.3.** Commit: `test: add Redactor integration coverage against existing PZ fixtures`.
|
||||
|
||||
### Task 7 — Documentation updates
|
||||
|
||||
- [ ] **Step 7.1.** Update `CLAUDE.md`: add a one-line `src/Util/` mention to the framework architecture section; one-line note in the ProjectZomboid specifics section pointing at `ProjectZomboidRedactor` for downstream PII scrubbing; update the "Scaffolded games" line to mention that `ProjectZomboid` now also has a Redactor implementation under `src/Util/ProjectZomboid/`.
|
||||
- [ ] **Step 7.2.** Update `README.md`: add a short usage block showing `(new ProjectZomboidRedactor())->redact($logContent)` as a render-time scrub option, alongside the existing worked example.
|
||||
- [ ] **Step 7.3.** Update `CHANGELOG.md`: move Redactor out of the **Deferred** section under `[0.1.0]`, OR add a new `[Unreleased]` section if the v0.1.0 line should remain accurate as-shipped. Decision: **add `[Unreleased]`** — v0.1.0 was tagged without the Redactor and the changelog should reflect the historical truth.
|
||||
- [ ] **Step 7.4.** Run `composer test` once more for safety; confirm 195+(redactor tests) green.
|
||||
- [ ] **Step 7.5.** Commit: `docs: document Redactor utility in CLAUDE.md, README, CHANGELOG`.
|
||||
|
||||
### Task 8 — Final verification
|
||||
|
||||
- [ ] **Step 8.1.** Run `composer test`. All tests green.
|
||||
- [ ] **Step 8.2.** Re-run `vendor/bin/phpunit --display-deprecations --display-warnings --display-notices --display-errors`. Expect zero output beyond the standard pass summary.
|
||||
- [ ] **Step 8.3.** Sanity-check the branch with `git log --oneline master..redactor`. Should be the plan-doc commit plus 7 implementation commits = 8 commits total.
|
||||
- [ ] **Step 8.4.** Push final state: `git push origin redactor`. **Do NOT merge to master.** User reviews diff and approves merge separately.
|
||||
|
||||
---
|
||||
|
||||
## Open questions / spec gaps
|
||||
|
||||
The spec is generally tight. Items worth flagging while implementing:
|
||||
|
||||
1. **`/u` flag for Unicode safety.** Spec doesn't specify regex flags. PZ player names can contain non-ASCII characters (Steam display names are Unicode-permissive). The implementation will use `/u` on all regexes to avoid mangling multi-byte sequences. Documenting in the class docblock.
|
||||
2. **Replacement order.** Spec says "Redaction order matters: SIDs first, names second" because the after-Steam-ID player-name regex anchors on the redacted Steam ID. The implementation will enforce this order in `redact()` (Steam ID pass first, then names, then coords). The class docblock will document the ordering invariant.
|
||||
3. **HTML / JSON-encoded input.** Spec assumes plain log text. If a consumer feeds HTML-escaped content (e.g. `&quot;` instead of `"`), the player-name regex won't match. Document as a v2 concern: callers feed plain text in, render afterwards. v1 does not implement HTML/JSON-aware mode.
|
||||
4. **Future PII categories.** v1 ships exactly the three toggles per spec. New categories (emails, IPs from mods, etc.) extend the toggle set in a future release; v1 does not pre-build extension points beyond what the interface already provides.
|
||||
5. **`src/Util/` is a new top-level directory** in this codebase. The Redactor is the first occupant. Future utilities (e.g. a tokenizing variant per spec open-question 1) would also live here. No existing-code modification is needed; the new directory is purely additive.
|
||||
6. **The empty `src/Printer/<Game>/.gitkeep` situation.** Phase A scaffolding chose not to create `Printer/<Game>/` directories at all (only Analyser/Detective/Log/Parser/Pattern got per-game subdirs). The Redactor's home in `src/Util/<Game>/` mirrors that — `src/Util/` is created with PZ as its first occupant; no stub `Hytale/`/`Minecraft/`/`SevenDaysToDie/` placeholders are scaffolded. When other games' redactors land, they create their own subdirectories at that point.
|
||||
|
||||
No spec contradictions found. No existing-code modifications required (additive-only design).
|
||||
|
||||
---
|
||||
|
||||
## Branch / commit invariants
|
||||
|
||||
- All commits land on the `redactor` branch.
|
||||
- Master is not touched until the user explicitly approves merge after reviewing the diff.
|
||||
- Conventional commit prefixes: `docs:`, `feat:`, `test:`, `refactor:`. (No `fix:` expected — this is greenfield work.)
|
||||
- One logical concept per commit. Task 1 is a scaffold-only commit; Tasks 2, 3, 4 each ship implementation + per-pass tests in one commit; Tasks 5 / 6 / 7 are pure-test or pure-docs commits.
|
||||
- Backup tag `backup/pre-redactor` at `aec835e` lets us discard the branch and recover if the implementation goes sideways.
|
||||
- Branch can be pushed to origin freely for visibility / review checkpoints.
|
||||
|
||||
## Pointers
|
||||
|
||||
- Spec: `docs/superpowers/specs/2026-04-30-redactor-design.md`.
|
||||
- Synthetic fixtures the integration test will reuse: `test/src/Games/ProjectZomboid/fixtures/*.txt`.
|
||||
- Existing per-game layout precedent: `src/Analyser/ProjectZomboid/`, `src/Pattern/ProjectZomboid/`, `src/Log/ProjectZomboid/`.
|
||||
- Workflow conventions and pitfalls: `CLAUDE.md`.
|
||||
@@ -1,7 +1,7 @@
|
||||
# Codex Redactor utility — design spec
|
||||
|
||||
> Retroactive: written 2026-05-01.
|
||||
> **Status: deferred — not implemented.** This is a forward-looking design captured here for backfill symmetry and to inform iblogs's upload-time PII handling.
|
||||
> **Status: implemented on the `redactor` branch (2026-05-01).** Plan: `docs/superpowers/plans/2026-05-01-redactor.md`. Arrival commit set documented in `CHANGELOG.md` `[Unreleased]`. The "Status: deferred" framing below is preserved for historical context; treat this file as the as-built design contract.
|
||||
|
||||
## Summary
|
||||
|
||||
|
||||
186
docs/superpowers/specs/2026-05-01-iblogs-bootstrap-design.md
Normal file
@@ -0,0 +1,186 @@
|
||||
# iblogs bootstrap design
|
||||
|
||||
> Written 2026-05-01.
|
||||
> **Scope:** design only. No iblogs code is written by this document; the actual fork, rename, and rewire happen in a follow-up session after this design is approved.
|
||||
|
||||
## Summary
|
||||
|
||||
iblogs is a Project-Zomboid-first log triage service forked from `aternosorg/mclogs`. It consumes `indifferentketchup/codex` (pinned at `v0.1.0`) for log detection, parsing, and analysis, replacing mclogs's `aternos/codex-minecraft` / `aternos/codex-hytale` / `aternos/sherlock` dependency stack. The data model gains a session entity that wraps the multiple files Project Zomboid produces per server session (eleven file types per session), while mclogs's existing single-paste paths remain alive as legacy routes that map to "session of size 1."
|
||||
|
||||
## (a) Fork target verification
|
||||
|
||||
| Check | Value |
|---|---|
| Upstream | `github.com/aternosorg/mclogs` |
| Default branch | `main` |
| License | **MIT** (SPDX `MIT`), compatible with `indifferentketchup/codex`'s MIT |
| Last push | `2026-03-30` (active; ~30 days ago) |
| Last update | `2026-04-26` |
| Archived | no |
| Stars / open issues | 290 / 2 |
| PHP requirement | `>=8.5`, plus `ext-frankenphp`, `ext-mongodb`, `ext-uri`, `ext-zlib`, `ext-mbstring`, `ext-json` |
| Storage | MongoDB |
| Existing codex dep | yes: `aternos/codex-minecraft ^5.0.1` and `aternos/codex-hytale ^2.0` |
|
||||
**Verdict: GO.** License is compatible. Project is actively maintained. No archival or licensing blockers. The fact that mclogs already integrates Aternos's codex stack tells us the fork's swap surface is well-defined: replace those Composer deps and the codex-facing call sites in `src/Api/Action/AnalyseLogAction.php` / `src/Api/Action/LogInsightsAction.php` / `src/Api/Response/CodexLogResponse.php` / `src/Detective.php` / `src/Log.php`.
|
||||
|
||||
The PHP `>=8.5` floor is stricter than codex's `>=8.4` — iblogs inherits the stricter constraint, which is fine. The `ext-frankenphp` requirement means iblogs runs on the FrankenPHP runtime rather than vanilla PHP-FPM; preserving this is the path of least resistance.
|
||||
|
||||
`aternos/sherlock` (MIT, "PHP library to apply minecraft mappings to log files") is Minecraft-specific (Mojang obfuscation maps). It is **not needed for PZ** and gets dropped. If iblogs ever adds Minecraft support, it can come back.
|
||||
|
||||
## (b) Repo plan
|
||||
|
||||
**Primary remote:** Gitea at `git.indifferentketchup.com:2222`. Fork as `indifferentketchup/iblogs`. SSH clone URL: `ssh://git@git.indifferentketchup.com:2222/indifferentketchup/iblogs.git`. Match the codex repo's existing Gitea setup.
|
||||
|
||||
**GitHub mirror:** Push-only secondary, configured via Gitea's Mirror feature (Repo Settings → Mirror Settings → Push Mirror). Same pattern any team using Gitea-as-primary uses for visibility.
|
||||
|
||||
**Composer dep on codex.** iblogs's `composer.json` gains a `repositories` entry of type `vcs` pointing at the codex Gitea URL (`ssh://git@git.indifferentketchup.com:2222/indifferentketchup/ik-codex.git`), and a `require` entry for `indifferentketchup/codex` pinned to exactly `0.1.0`. The exact pin is preferred over `^0.1.0`, which Composer resolves to `>=0.1.0 <0.2.0` for 0.x versions, because in early-version (0.x) releases any bump, including a patch, may carry breaking changes.
|
||||
|
||||
**Removed deps:** `aternos/codex-minecraft`, `aternos/codex-hytale`, `aternos/sherlock`. The first two are replaced by `indifferentketchup/codex` (which covers Project Zomboid and ships detective stubs for Minecraft / Hytale / SevenDaysToDie that iblogs will not use in v0.1). The third (Sherlock) is Minecraft-mapping-specific and not relevant to PZ.
|
||||
|
||||
**Package name.** `aternos/mclogs` becomes `indifferentketchup/iblogs`. Composer name and the PSR-4 namespace move together: `Aternos\Mclogs\` → `IndifferentKetchup\Iblogs\`.
|
||||
|
||||
## (c) Multi-file / session paste model
|
||||
|
||||
Project Zomboid produces eleven log files per server session. The data model needs to accommodate this without breaking mclogs's existing single-paste consumers.
|
||||
|
||||
### Option (i) — 1 file = 1 paste, sibling-link via shared `session_id`
|
||||
|
||||
- **Pros:** minimal schema change. Reuse mclogs's existing `Log` per file. Sibling discovery is a `session_id` index.
|
||||
- **Cons:** no atomic ingest (zip becomes N independent uploads). Session views require runtime joins. `session_id` propagation through upload UX is fiddly (URL param? cookie? hidden form field?).
|
||||
- **Effort:** low.
|
||||
|
||||
### Option (ii) — zip upload explodes server-side into N linked pastes
|
||||
|
||||
- **Pros:** atomic ingest. One endpoint for whole-session upload. Maps cleanly to PZ's natural zip-of-logs deliverable.
|
||||
- **Cons:** zip-only ingest is restrictive (no single-file paste UX for users with just `DebugLog-server.txt`). Server-side zip extraction is attack surface (zip bombs, path traversal). Doubles upload paths if single-file is also supported.
|
||||
- **Effort:** medium.
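
To make the attack-surface concern in option (ii) concrete, here is a hedged Python sketch of pre-extraction checks using the standard `zipfile` module. The limits and the function name are hypothetical, and iblogs itself would be PHP. The sketch only inspects archive metadata; declared sizes can be forged, so a real extractor would also cap the bytes it actually reads.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import List

MAX_MEMBERS = 32                      # hypothetical per-session file limit
MAX_TOTAL_BYTES = 64 * 1024 * 1024    # hypothetical uncompressed-size cap


def check_session_zip(data: bytes) -> List[str]:
    """Return member names if basic checks pass; raise ValueError otherwise."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = [info for info in archive.infolist() if not info.is_dir()]
        if len(infos) > MAX_MEMBERS:
            raise ValueError("too many files in archive")
        total = 0
        names = []
        for info in infos:
            path = PurePosixPath(info.filename)
            # Path traversal: reject absolute paths and any ".." component.
            if path.is_absolute() or ".." in path.parts:
                raise ValueError("unsafe member path: " + info.filename)
            # Zip bomb heuristic: sum the *declared* uncompressed sizes.
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise ValueError("declared uncompressed size too large")
            names.append(info.filename)
        return names
```

Running the checks before any bytes are written to disk keeps a rejected upload from leaving partial state behind.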
|
||||
|
||||
### Option (iii) — session entity wraps N file entities (1:N relation)
|
||||
|
||||
- **Pros:** rich session model. Single URL for the whole session; child URLs per file. PZ's eleven-file natural session maps cleanly. mclogs's single-paste maps to "session of size 1," so the model degenerates gracefully into legacy behaviour. Session-level metadata (server name, date range, total size) becomes first-class.
|
||||
- **Cons:** most schema migration. Two URL types in routing. More concepts in the API.
|
||||
- **Effort:** medium-high.
|
||||
|
||||
### Recommendation: option (iii)
|
||||
|
||||
PZ's natural unit IS a session — the server emits all eleven files per restart, ZIP-bundled in production. Single-file uploads (the mclogs default UX) become "session of size 1" with no special-case code; the legacy `/api/1/log` routes return a paste that happens to belong to a singleton session. Cross-file analysis (e.g. correlating a `ServerExceptionProblem` from `DebugLog-server.txt` with a `ConnectionFailureProblem` from `user.txt`) is unlocked because both files share a `session_id`. The 1:N model is the only one that supports cross-file analysers in any future Phase B.4-equivalent on iblogs's side.
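
The 1:N relation recommended above might look like the following Python sketch. The field and function names are illustrative assumptions, not the iblogs schema; the point is that a legacy single paste is simply a session holding one file, and that files in one session can be looked up by type for cross-file analysis.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Paste:
    paste_id: str
    file_type: str   # e.g. "server" or "user"
    content: str


@dataclass
class Session:
    session_id: str
    pastes: List[Paste] = field(default_factory=list)

    def by_type(self) -> Dict[str, Paste]:
        """Index the session's files by type for cross-file lookups."""
        return {paste.file_type: paste for paste in self.pastes}

    @property
    def is_singleton(self) -> bool:
        return len(self.pastes) == 1


def singleton_session(session_id: str, paste: Paste) -> Session:
    """Legacy single-paste upload: wrap the paste in a session of size 1."""
    return Session(session_id, [paste])
```

Because the legacy path goes through the same `Session` type, no separate single-paste code path is needed in this sketch.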
|
||||
|
||||
## (d) UI changes
|
||||
|
||||
**Primary nav: file-type tabs.** Within a session, eleven tabs (one per PZ file type) with a count badge (e.g. `DebugLog (6,998 lines)`, `chat (115)`). Clicking a tab loads that file's content + analysis. Tab order: DebugLog-server first (most useful for triage), then admin, user, chat, item, map, perk, pvp, ClientActionLog, cmd, BurdJournals.
|
||||
|
||||
**Secondary nav: session index sidebar.** Lists the user's recent sessions (cookie-driven, like mclogs's history). Less primary than tabs.
|
||||
|
||||
**Default view.** `/session/{id}` lands on the DebugLog-server tab by default — that file is what admins want to see when something is broken.
|
||||
|
||||
**Redaction toggle.** Per-session checkbox in the toolbar: "Redact PII". Behaviour depends on Step 4 (codex Redactor) status:
|
||||
- If Redactor ships first: toggle invokes `ProjectZomboidRedactor::redact()` on the rendered file content client-side or server-side (decision for the implementation pass).
|
||||
- If Redactor is still deferred: toggle is hidden in v0.1 of iblogs. Upload-time PII filtering still happens via the ported `Filter` chain (see `src/Filter/*` upstream — `IPv4Filter`, `IPv6Filter`, `AccessTokenFilter`, `UsernameFilter`).
|
||||
|
||||
**Branding.** Drop the "Built for Minecraft & Hytale" tagline and visual cues. Replace `mclo.gs` brand references with whatever short-domain iblogs uses (open question — see (h)). Color palette decision is open; mclogs's green accent (`#5cb85c` in `example.config.json`) is fine to keep or change.
|
||||
|
||||
## (e) API surface
|
||||
|
||||
Iblogs exposes a session-oriented API on top of the recommended (iii) model, plus the legacy mclogs paths kept alive.
|
||||
|
||||
| Path | Method | Purpose |
|---|---|---|
| `/api/session` | POST | Create a session by uploading one zip OR multiple file fields. Returns `session_id` plus a list of `{type, paste_id}` for each contained file. |
| `/api/session/{id}` | GET | Return session metadata + array of contained pastes (`{type, paste_id, line_count, size_bytes}`). |
| `/api/session/{id}/file/{type}` | GET | Return one file's content and its codex analysis result. `{type}` is one of the eleven PZ file-type tokens (`server`, `chat`, `clientaction`, `cmd`, `item`, `map`, `perk`, `pvp`, `admin`, `user`, `burdjournals`). |
| `/api/paste/{id}` | GET | Single-paste back-compat. Returns content + analysis for any paste (whether part of a multi-file session or a singleton). |
| `/api/1/log` | POST | Legacy mclogs path, kept alive. Internally creates a singleton session and returns the existing-shape mclogs response. |
| `/api/1/log/{id}` | GET | Legacy mclogs path, kept alive. Same as `/api/paste/{id}` with the legacy response shape. |
|
||||
The legacy paths preserve mclogs's API contract for any third-party clients that already integrate with `mclo.gs` or self-hosted mclogs instances. Upgrading clients to the session-aware API is opt-in.
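
As an illustration of how the per-file route could validate its `{type}` segment, here is a small Python sketch. The eleven tokens come from the table in this section; the ID character set, the regex, and the function name are assumptions, and the real routing would live in iblogs's PHP code.

```python
import re
from typing import Optional, Tuple

# The eleven PZ file-type tokens listed for /api/session/{id}/file/{type}.
PZ_FILE_TYPES = (
    "server", "chat", "clientaction", "cmd", "item", "map",
    "perk", "pvp", "admin", "user", "burdjournals",
)

# Assumed ID alphabet: ASCII letters and digits only.
FILE_ROUTE = re.compile(r"/api/session/(?P<id>[A-Za-z0-9]+)/file/(?P<type>[a-z]+)")


def parse_file_route(path: str) -> Optional[Tuple[str, str]]:
    """Return (session_id, file_type) for a valid path, else None."""
    match = FILE_ROUTE.fullmatch(path)
    if match is None or match.group("type") not in PZ_FILE_TYPES:
        return None
    return match.group("id"), match.group("type")
```

Checking the token against a closed set means an unknown type yields a clean "not found" instead of reaching the storage layer.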
|
||||
|
||||
## (f) String / branding inventory
|
||||
|
||||
Producing exact `path:line` references requires the cloned working copy of the fork. This section gives directional pointers from the fetched-but-not-cloned upstream tree at `aternosorg/mclogs:main`. The actual line-precise inventory belongs in a follow-up commit on the iblogs side, after the fork exists and can be `grep`ped.
|
||||
|
||||
**Composer / package metadata** — file `composer.json` upstream (no local clone, line refs not yet known):
|
||||
- `"name": "aternos/mclogs"` → `"indifferentketchup/iblogs"`
|
||||
- `"description": "Paste, share and analyse Minecraft logs"` → describe iblogs scope (PZ-first, server-log triage)
|
||||
- `"authors"` block (currently `Matthias Neid <matthias@aternos.org>`) → replace with `indifferentketchup` author
|
||||
- `require` block:
|
||||
- drop `aternos/codex-minecraft`
|
||||
- drop `aternos/codex-hytale`
|
||||
- drop `aternos/sherlock`
|
||||
- add `indifferentketchup/codex` pinned to `0.1.0`
|
||||
- `autoload.psr-4` mapping `"Aternos\\Mclogs\\": "src/"` → `"IndifferentKetchup\\Iblogs\\": "src/"`
|
||||
- new top-level `repositories` array entry of type `vcs` pointing at the codex Gitea URL
|
||||
|
||||
**Namespace bulk substitution** — every PHP file under `src/` (which is roughly 50+ files based on the upstream tree). The pattern mirrors the codex rename in commit `66a2fcc`: bulk `Aternos\Mclogs` → `IndifferentKetchup\Iblogs` across `namespace`, `use`, fully-qualified refs, and PHPDoc tags. Done as one logical commit on the iblogs side per the codex-side precedent.
|
||||
|
||||
**Codex API call sites** — the files mclogs uses to integrate Aternos's codex stack, all under `src/`:
|
||||
- `src/Detective.php` — likely a wrapper around `aternos/codex-minecraft`'s Detective. Swap to `IndifferentKetchup\Codex\Detective\ProjectZomboid\ProjectZomboidDetective` (or wrap multiple game detectives if iblogs ever supports more games).
|
||||
- `src/Log.php` — likely a wrapper. Re-point to codex's `Log` hierarchy.
|
||||
- `src/Api/Action/AnalyseLogAction.php` — the `analyse` endpoint. Update to call codex's `AnalysableLog::analyse()` with the new analyser surface.
|
||||
- `src/Api/Action/LogInsightsAction.php` — insights endpoint.
|
||||
- `src/Api/Response/CodexLogResponse.php` — response shape; verify field-by-field against `IndifferentKetchup\Codex\Analysis\AnalysisInterface::jsonSerialize()`.
|
||||
- `src/Api/Action/CreateLogAction.php` — log creation; integration with codex's `Detective::detect()`.
|
||||
- `src/Api/Action/RawLogAction.php`, `src/Api/Action/LogInfoAction.php` — verify these don't depend on Minecraft-specific codex behaviour.
|
||||
|
||||
**Frontend templates and assets** — file paths only, exact branding strings discovered post-clone:
|
||||
- `web/frontend/start.php` — landing page; "Paste, share and analyse Minecraft logs" hero copy lives here.
|
||||
- `web/frontend/api-docs.php` — API documentation page.
|
||||
- `web/frontend/parts/header.php`, `parts/footer.php`, `parts/head.php` — site title, meta tags, footer links to legal info.
|
||||
- `web/frontend/log.php` — log view template (probably hardcodes the syntax-highlighting language token — needs to handle multiple PZ file types).
|
||||
- `web/frontend/404.php` — error page copy.
|
||||
- `web/public/css/mclogs.css` — file is **renamed** to `iblogs.css` and CSS class names referencing `mclogs` are renamed.
|
||||
- `web/public/js/start.js`, `web/public/js/log.js` — likely contain text constants and reference `mclogs.css` filename.
|
||||
- `web/public/img/logo-icon.svg`, `logo.svg`, `favicon.ico` — replaced with iblogs assets.
|
||||
|
||||
**Configuration** — file `example.config.json`:
|
||||
- database name `mclogs` → `iblogs`
|
||||
- abuse contact `abuse@aternos.org` → iblogs contact (open question — see (h))
|
||||
- imprint and privacy policy links currently point at `aternos.gmbh` → iblogs equivalents
|
||||
- `mclo.gs` brand reference in the frontend styling section → new iblogs short-domain (open question)
|
||||
- worker request limit, ID length, TTL — review for iblogs-appropriate values; PZ sessions are larger than mclogs single pastes so size and line limits may need raising.
|
||||
|
||||
**Docker / deployment** — files `Dockerfile`, `docker/Caddyfile`, `docker/compose.production.yaml`, `docker/mclogs.ini`:
|
||||
- Image label maintainer references
|
||||
- Caddyfile likely hardcodes `mclo.gs` hostname for TLS certificates → replace with iblogs hostname
|
||||
- Compose service name `mclogs` → `iblogs`
|
||||
- File `docker/mclogs.ini` is renamed and its contents updated
|
||||
|
||||
**`LICENSE` file** — per MIT requirements, the original Aternos copyright line stays byte-for-byte unchanged. iblogs's LICENSE preserves the upstream copyright header. This mirrors codex's handling of its own upstream LICENSE.
|
||||
|
||||
**`README.md`** — full rewrite. Title, description, install line, links to upstream codex repo, scope statement (PZ-first, server-log triage). Drop Minecraft / Hytale framing entirely.
|
||||
|
||||
**Filter classes for PZ-specific PII** — upstream's filter chain (`src/Filter/IPv4Filter.php`, `IPv6Filter.php`, `AccessTokenFilter.php`, `UsernameFilter.php`) handles Minecraft-style PII (server access tokens, Minecraft-pattern usernames). For PZ, iblogs may need new filters: `SteamIdFilter`, `WorldCoordinateFilter`, and a PZ-aware username filter (Steam usernames look different from Minecraft ones). These are net-new code, not branding renames.
|
||||
|
||||
## (g) Migration
|
||||
|
||||
**Keep mclogs's existing single-paste API routes alive as legacy.** Two reasons:
|
||||
1. mclogs has live API consumers calling `POST /api/1/log` and `GET /api/1/log/{id}` against `mclo.gs` and self-hosted instances. Iblogs's primary value is PZ support, not breaking compat with the broader mclogs ecosystem.
|
||||
2. Under model option (iii), legacy single pastes are naturally "sessions of size 1." Zero extra schema work to support legacy routes — they just internally create singleton sessions.
|
||||
|
||||
**Strip:** `aternos/codex-minecraft`, `aternos/codex-hytale`, `aternos/sherlock` Composer deps; the `Aternos\Mclogs\` namespace; mclogs-specific branding strings; the `mclo.gs` hostname hardcodes; Minecraft-mapping deobfuscation code paths.
|
||||
|
||||
**Preserve:** the upstream `Filter` chain (it solves real problems — IP redaction, access tokens, usernames); the FrankenPHP runtime; MongoDB storage layer; the cookie-based session-history UX; the Caddy fronting.
|
||||
|
||||
## (h) Open questions
|
||||
|
||||
1. **`aternos/sherlock` license confirmation** — verified MIT (this design doc fetched the metadata) but iblogs is dropping it. No issue.
|
||||
2. **`ext-frankenphp` keep / replace decision** — recommend keep for v0.1 (path of least resistance). Migrating to vanilla nginx+php-fpm is its own project and can come later.
|
||||
3. **Branding decisions:**
|
||||
- Site name: `iblogs` (lowercase) seems chosen given the project mention `indifferentketchup/iblogs`. Confirm.
|
||||
- Tagline: needs writing. "Project Zomboid server log triage" is honest; longer-form copy is open.
|
||||
- Short-domain: mclogs uses `mclo.gs`. Is there an iblogs equivalent (`iblo.gs`? `ib.gs`?)? Affects Caddyfile, frontend assets, and docs links.
|
||||
- Accent / palette: keep mclogs green (`#5cb85c`) or pick a different colour?
|
||||
4. **Database choice:** keep MongoDB or migrate to PostgreSQL / SQLite? Migrating away from Mongo is a significant project; recommend keep for v0.1.
|
||||
5. **API URL versioning:** mclogs uses `/api/1/`. Stay with `/api/1/` for legacy paths (compat) and add `/api/session/...` for new endpoints (no version prefix), or use `/api/v2/session/...`? Recommend the former — minimum surface change.
|
||||
6. **Session-ID generation:** mclogs uses 7-character IDs. For iblogs sessions of N files, pick (a) one session-ID + N independent paste-IDs (richer URLs) or (b) single ID per paste with a sibling `session_id` field (simpler). Affects URL shape.
|
||||
7. **The codex Redactor utility.** Iblogs's redaction toggle (section d) depends on whether Step 4 (Redactor implementation) ships before or after iblogs scaffolding. **Decision deferred to user (Step 4 of the careful run).**
|
||||
8. **PZ-specific filter classes** (`SteamIdFilter`, `WorldCoordinateFilter`, etc.) — net-new work for iblogs. Could lift the regex shapes from `docs/superpowers/specs/2026-04-30-redactor-design.md` (they're the same PII categories). Implementation order: iblogs likely wants these for its upload-time filter chain regardless of whether the codex `Redactor` ships.
|
||||
9. **Multi-game support trajectory.** v0.1 of iblogs is PZ-first. If Minecraft / Hytale / SevenDaysToDie support is on the roadmap, iblogs's Detective wiring needs to be a multi-game dispatcher (not just `ProjectZomboidDetective`). Codex provides the per-game detectives separately; iblogs would compose them. Out of scope for v0.1.
|
||||
10. **The exact line-precise branding inventory** (every file:line ref of `Minecraft` / `Hytale` / `MC` / `mc` / `mclogs` / `mclo.gs` / `Aternos`). This document gives file-level pointers; the line-precise version is produced as a separate work item once the fork is cloned and grep-able.
|
||||
|
||||
## Pointers
|
||||
|
||||
- Codex package consumed: `indifferentketchup/codex` v0.1.0, tag SHA `8a89550` (annotated tag) pointing at commit `52ff8cb`.
|
||||
- Codex Redactor design (deferred): `docs/superpowers/specs/2026-04-30-redactor-design.md`.
|
||||
- Codex CHANGELOG: `CHANGELOG.md` in this repo.
|
||||
- Upstream mclogs: `https://github.com/aternosorg/mclogs` (MIT, `main` default branch, last push 2026-03-30).
|
||||
@@ -0,0 +1,246 @@
|
||||
# PZ deterministic classifier — design spec
|
||||
|
||||
> Drafted 2026-05-04. Status: design-approved, awaiting implementation plan.
|
||||
> Sibling tool to the existing pre-production Qwen analyzer (`pz_error_analysis.py`), which is unaffected by this work.
|
||||
|
||||
## Summary
|
||||
|
||||
A new deterministic-only Project Zomboid log classifier that lives alongside the existing Qwen-based analyzer in `tools/pz-analyzer/`. Walks redacted `DebugLog-server*.txt` files, extracts errors/warnings, attributes each to a mod where evidence allows, classifies by kind, and emits a structured JSON report. **Zero AI dependency**: this is the artefact that informs the future PHP / iblogs production path.
|
||||
|
||||
The patterns it implements are inspired by `paraxaQQ/pzmm`'s `core/inspector.py`: Lua mod-marker attribution, multi-fallback file:line extraction, bidirectional stack collection, cause-chain unwinding, and engine-noise tagging. They are reimplemented independently; no code is copied verbatim.
|
||||
|
||||
## Why a separate tool, not an edit of `pz_error_analysis.py`
|
||||
|
||||
Two artefacts, two purposes:
|
||||
|
||||
- `pz_error_analysis.py` (existing, untouched) — pre-production discovery tool. Sends residual log content to Qwen so the developer can see what categories the deterministic side hasn't yet captured.
|
||||
- `pz_classify.py` (new) — production-bound deterministic classifier. Output is what an iblogs PHP port would eventually emit. Runs in seconds, no API dependency, no PII-going-to-LLM consideration.
|
||||
|
||||
Keeping both tools side by side lets the developer compare outputs and treat the LLM's residual output as the "deterministic to-do list."
|
||||
|
||||
## Scope
|
||||
|
||||
**In scope:**
|
||||
- Two new files: `tools/pz-analyzer/pz_parser.py` (pure module) and `tools/pz-analyzer/pz_classify.py` (CLI orchestrator).
|
||||
- Tests under `tools/pz-analyzer/tests/` with synthetic fixtures.
|
||||
- Operates exclusively on the already-redacted directory produced by `pz_redact_all.sh` (`.scratch/pz/Logs.redacted/`).
|
||||
|
||||
**Out of scope:**
|
||||
- Any modification to `pz_error_analysis.py`, `pz_redact_all.sh`, or PHP codex source.
|
||||
- Filesystem-based mod-scan reattribution (pzmm's symbol-index, vehicle-index, file-path-ownership reattribution requires an actual mod folder we don't have on the server side).
|
||||
- iblogs / bosslogs integration. The output schema is designed with that future port in mind, but no PHP code is written here.
|
||||
- Generic AI tab patterns from pzmm's `core/ai.py`. Explicitly excluded.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
     redacted .txt files
              |
              v
+---------------------------+
|  pz_classify.py           |  argparse · directory walk · aggregate · JSON write
|  (orchestrator)           |
+-------------+-------------+
              |
              v
+---------------------------+
|  pz_parser.py             |  regexes · parse · classify · sign
|  (pure module, no I/O     |
|   beyond reading the path |
|   it is handed)           |
+---------------------------+
```
|
||||
|
||||
Two files inside `tools/pz-analyzer/`:
|
||||
|
||||
- **`pz_parser.py`** — stateless. All regex constants, `parse_file(path) -> list[Entry]`, attribution helpers, file:line extractors, cause-chain extractor, signature computation. No `argparse`, no JSON writing, no directory walking. Unit-testable in isolation.
|
||||
- **`pz_classify.py`** — entry point. CLI args, walks the redacted directory, calls `pz_parser`, aggregates records by signature, writes JSON, prints a one-line stats summary.
|
||||
|
||||
The split is deliberate: `pz_parser.py` is the module intended for an eventual port to PHP codex (separate spec). Keeping it pure makes that port mechanical and keeps the Python-side tests trivial.
|
||||
|
||||
## Parser pipeline phases
|
||||
|
||||
For each `*DebugLog-server*.txt`, the parser walks lines once and emits records via the following phases.
|
||||
|
||||
### 1. Severity-prefix recognition
|
||||
|
||||
Regex: `^\s*(ERROR|SEVERE|WARN)\s*[:\s]`. This is broader than the existing `pz_error_analysis.py` regex: it adds `SEVERE` (the Java util-logging convention, which appears in some PZ Java exception blocks). `LOG` and `INFO` lines are ignored at this layer.
|
||||
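A minimal sketch of the severity gate, assuming the regex is applied to the message body after the timestamp and session-metadata prefix have been removed (the spec does not say which part of the line it sees). `SEVERITY_RE` and `severity_of` are illustrative names, not taken from `pz_parser.py`.

```python
import re
from typing import Optional

# Regex quoted from the spec; the constant and function names are hypothetical.
SEVERITY_RE = re.compile(r"^\s*(ERROR|SEVERE|WARN)\s*[:\s]")


def severity_of(body: str) -> Optional[str]:
    """Return the severity token that opens `body`, or None if there is none."""
    match = SEVERITY_RE.match(body)
    return match.group(1) if match else None
```

As written, the token must be followed by whitespace or a colon, so a body opening with `WARNING:` does not match.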
|
||||
### 2. Stack collection — bidirectional
|
||||
|
||||
The observation borrowed from pzmm: PZ emits stack frames *before* the ERROR/WARN line about as often as after it, so the collector looks in both directions.
|
||||
|
||||
- **Pre-stack**: walk up to 25 lines back from the severity line. Stop at another severity line or once 8 lines are collected. Keep the block only if at least one line looks stack-shaped (`at `, `[string ...]`, `function:`, `file:`, `.lua` markers).
|
||||
- **Post-stack**: walk forward up to 25 lines, gated by engine-noise detection. Stop at another severity line or once 8 lines are collected.
|
||||
- Merge deduped, preserving order; cap at 8 frames per record.
|
||||
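The two walks and the merge can be sketched as follows. This is an illustration under stated assumptions, not the `pz_parser.py` implementation: lines are message bodies with the timestamp prefix already removed, the pre-stack keeps every non-blank line while the post-stack keeps only stack-shaped lines, and the engine-noise gate is left out because the spec does not define it precisely. All names are hypothetical.

```python
import re
from typing import Callable, Iterable, List

SEVERITY_RE = re.compile(r"^\s*(ERROR|SEVERE|WARN)\s*[:\s]")
STACK_MARKERS = ("at ", "[string ", "function:", "file:", ".lua")
LOOKAROUND = 25  # maximum lines walked in each direction
MAX_FRAMES = 8   # cap per direction and for the merged result


def looks_stack_shaped(line: str) -> bool:
    return any(marker in line for marker in STACK_MARKERS)


def _walk(lines: List[str], indices: Iterable[int],
          keep: Callable[[str], bool]) -> List[str]:
    frames: List[str] = []
    for j in indices:
        text = lines[j].strip()
        if SEVERITY_RE.match(text):
            break  # another severity line ends the block
        if keep(text):
            frames.append(text)
            if len(frames) >= MAX_FRAMES:
                break
    return frames


def collect_stack(lines: List[str], hit: int) -> List[str]:
    """Collect frames around the severity line at index `hit`."""
    back = range(hit - 1, max(hit - 1 - LOOKAROUND, -1), -1)
    pre = _walk(lines, back, bool)[::-1]  # reverse to restore file order
    if not any(looks_stack_shaped(f) for f in pre):
        pre = []  # the block is dropped unless something looks like a frame
    forward = range(hit + 1, min(hit + 1 + LOOKAROUND, len(lines)))
    post = _walk(lines, forward, looks_stack_shaped)
    # Order-preserving dedup, then the per-record cap.
    return list(dict.fromkeys(pre + post))[:MAX_FRAMES]
```

`dict.fromkeys` is used for the merge because it removes duplicates while keeping first-seen order, which matches the "deduped, preserving order" requirement.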
|
||||
### 3. Mod attribution — three buckets
|
||||
|
||||
| Bucket | Trigger | Confidence |
|---|---|---|
| `direct` | Line itself matches `Lua\(\(MOD:([^)]+)\)\)` (or the `require("X") failed` shape, or an explicit `needed by <mod>` hint elsewhere in the entry) | `high` |
| `inferred` | No marker on this line, but body is Lua-shaped (see below) *and* a `Lua((MOD:Y))` was emitted within the previous 40 lines | `medium` |
| `unattributed` | Neither of the above | `low`; `mod_id = "__unattributed__"` |
|
||||
|
||||
"Lua-shaped" means the body matches at least one of (case-insensitive): `luamanager.getfunctionobject`, `no such function`, `exception thrown`, `runtimeexception`, `illegalstateexception`, or contains the bare token `lua`. This filter prevents inferred attribution from latching onto unrelated severity lines that happened to fall within the lookback window.
|
||||
|
||||
`mod_id` derives from the marker's raw name with a `_norm_mod_key` transform: lowercase, strip spaces / apostrophes / hyphens. `mod_name` preserves the human-readable form.
|
||||
|
||||
We do **not** attempt pzmm's filesystem-based reattribution.
|
||||
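The three buckets can be sketched as below. The marker regex, the Lua-shaped tokens, the 40-line lookback and the key transform come from this section; everything else is an assumption made for illustration: the function names, the tuple return shape, reading "bare token `lua`" as a whole-word match, and the omission of the `require("X") failed` and `needed by <mod>` shapes, whose exact patterns are not given here.

```python
import re
from typing import List, Optional, Tuple

MOD_MARKER_RE = re.compile(r"Lua\(\(MOD:([^)]+)\)\)")
LUA_SHAPED_TOKENS = (
    "luamanager.getfunctionobject", "no such function", "exception thrown",
    "runtimeexception", "illegalstateexception",
)
LOOKBACK = 40
UNATTRIBUTED = "__unattributed__"

Attribution = Tuple[str, Optional[str], str, str]  # mod_id, mod_name, bucket, confidence


def norm_mod_key(name: str) -> str:
    """Lowercase and strip spaces, apostrophes and hyphens."""
    return re.sub(r"[ '\-]", "", name.lower())


def is_lua_shaped(body: str) -> bool:
    lowered = body.lower()
    if any(token in lowered for token in LUA_SHAPED_TOKENS):
        return True
    return re.search(r"\blua\b", lowered) is not None  # assumed meaning of "bare token"


def attribute(lines: List[str], hit: int) -> Attribution:
    marker = MOD_MARKER_RE.search(lines[hit])
    if marker:
        name = marker.group(1)
        return norm_mod_key(name), name, "direct", "high"
    if is_lua_shaped(lines[hit]):
        # Nearest marker wins: walk back from the hit, at most LOOKBACK lines.
        for j in range(hit - 1, max(hit - 1 - LOOKBACK, -1), -1):
            marker = MOD_MARKER_RE.search(lines[j])
            if marker:
                name = marker.group(1)
                return norm_mod_key(name), name, "inferred", "medium"
    return UNATTRIBUTED, None, "unattributed", "low"
```

Under the stated transform, a marker naming `Spongie's Clothing` yields the key `spongiesclothing`.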
|
||||
### 4. File:line extraction — five fallbacks
|
||||
|
||||
Tried in order against the entry body and stack frames:
|
||||
|
||||
1. `at <path>.lua:<n>`
|
||||
2. `function: ... file: <path>.lua line #<n>` (or `: <n>`)
|
||||
3. `[string "<path>.lua"]:<n>`
|
||||
4. quoted path ending in `.lua` / `.txt` / `.xml` / `.json` / `.ini` / `.cfg` / `.bin`
|
||||
5. unquoted path segment beginning with `media/`, `maps/`, `lua/`, `scripts/`
|
||||
|
||||
Returns `(file, line)`; `line=0` if the matched form had no line number.
|
||||
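One way to express the fallback order is a list of patterns tried in sequence. The regexes below are approximations written for this sketch from the five shapes listed above, not the patterns in `pz_parser.py`; `extract_file_line` and the `None` return for "no match" are likewise hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# (pattern, has_line_number), in fallback order.
FILE_LINE_PATTERNS = [
    (re.compile(r"\bat\s+(\S+\.lua):(\d+)"), True),
    (re.compile(r"function:.*?file:\s*(\S+\.lua)\s*(?:line\s*#\s*|:\s*)(\d+)"), True),
    (re.compile(r'\[string "([^"]+\.lua)"\]:(\d+)'), True),
    (re.compile(r"[\"']([^\"']+\.(?:lua|txt|xml|json|ini|cfg|bin))[\"']"), False),
    (re.compile(r"(?<![\w/])((?:media|maps|lua|scripts)/[^\s\"',;)]+)"), False),
]


def extract_file_line(texts: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Try each fallback in order against the body and stack frames."""
    candidates = list(texts)
    for pattern, has_line in FILE_LINE_PATTERNS:
        for text in candidates:
            match = pattern.search(text)
            if match:
                line = int(match.group(2)) if has_line else 0
                return match.group(1), line
    return None
```

Pattern priority is applied across all texts before moving to the next fallback, so a precise `at <path>.lua:<n>` frame deep in the stack still beats a bare quoted path in the body. Whether the real parser orders it that way is not stated in the spec.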
|
||||
### 5. Cause-chain extraction
|
||||
|
||||
`Caused by: <X>` chains plus standalone exception lines (`(\w+\.)+\w+(Exception|Error): <msg>`) are normalised to `<ExceptionClass>: <msg>` tokens and joined with ` -> `. Up to 6 chain levels, deduped. Captures both Java exception nesting and Lua-wrapped exception chains.
|
||||
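A minimal sketch of the chain builder. It reuses the exception regex quoted above; because a `Caused by:` line contains the same exception shape, one search covers both cases. Using the short class name (the last dotted segment) for `<ExceptionClass>` is an assumption, as are the function and constant names.

```python
import re
from typing import Iterable, List

EXCEPTION_RE = re.compile(r"((?:\w+\.)+\w+(?:Exception|Error))(?::\s*(.*))?")
MAX_LEVELS = 6


def cause_chain(lines: Iterable[str]) -> str:
    """Join deduped `<ExceptionClass>: <msg>` tokens with ' -> '."""
    tokens: List[str] = []
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if not match:
            continue
        short_name = match.group(1).rsplit(".", 1)[-1]
        message = (match.group(2) or "").strip()
        token = f"{short_name}: {message}" if message else short_name
        if token not in tokens:
            tokens.append(token)
        if len(tokens) == MAX_LEVELS:
            break
    return " -> ".join(tokens)
```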
|
||||
### 6. Java exception kind detection
|
||||
|
||||
DebugLog-server has both Lua and Java exceptions; pzmm targets `console.txt` which is Lua-dominant. Extension here:
|
||||
|
||||
- `kind = "java_exception"` when the entry body or stack contains `(\w+\.)+\w+(Exception|Error)` AND no `Lua((MOD:X))` marker is present anywhere in the entry.
|
||||
- These typically resolve to `mod_id: __unattributed__` because Java code in PZ is engine, not mod. The exception class name becomes part of the message skeleton so similar Java exceptions dedup tightly.
|
||||
|
||||
### 7. Engine-noise tagging
|
||||
|
||||
`kind = "engine_noise"` when the body contains `kahluathread.flusherrormessage` or `dumping lua stack trace`. These severity-ERROR lines are PZ's own diagnostic chatter about its error reporting, not actual errors. They stay in the output (consumer can filter on `kind`).
|
||||
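Phases 6 and 7 can be sketched as one small function. The regexes and tokens are the ones quoted above; checking engine noise before the Java test and returning `None` for every other kind are assumptions of this sketch, since the spec does not state a precedence or define the remaining kinds here. `detect_kind` is a hypothetical name.

```python
import re
from typing import List, Optional

JAVA_EXCEPTION_RE = re.compile(r"(?:\w+\.)+\w+(?:Exception|Error)")
MOD_MARKER_RE = re.compile(r"Lua\(\(MOD:([^)]+)\)\)")
ENGINE_NOISE_TOKENS = ("kahluathread.flusherrormessage", "dumping lua stack trace")


def detect_kind(body: str, stack: List[str]) -> Optional[str]:
    """Return 'engine_noise', 'java_exception', or None for any other kind."""
    lowered = body.lower()
    if any(token in lowered for token in ENGINE_NOISE_TOKENS):
        return "engine_noise"
    entry = "\n".join([body, *stack])
    # Java only when no mod marker appears anywhere in the entry.
    if JAVA_EXCEPTION_RE.search(entry) and not MOD_MARKER_RE.search(entry):
        return "java_exception"
    return None
```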
|
||||
### 8. Signature computation
|
||||
|
||||
Two-level deterministic identity, both stored on every record:
|
||||
|
||||
```
pattern_id = sha256(level + normalized_first_line)[:16]
signature  = sha256(pattern_id + mod_id)[:16]
```
|
||||
|
||||
Normalization for `pattern_id`:
|
||||
- Strip session metadata prefix (`General f:N, t:N, st:N,N,N,N>` shape)
|
||||
- Strip body-prefix severity token (`ERROR:` / `SEVERE:` / `WARN:` / `FATAL:`, case-insensitive) so a body that opens with the severity word still hashes the same as one that doesn't.
|
||||
- Flatten double- and single-quoted strings to `"<S>"` / `'<S>'`
|
||||
- Flatten ≥2-digit numeric runs to `<N>`
|
||||
- Collapse whitespace
|
||||
- Truncate to 200 chars
|
||||
|
||||
The two fields support two consumer views, neither of which requires an LLM:
|
||||
|
||||
- **Per-mod view** (signature is the dedup key): one record per `(mod_id, error_shape)` pair.
|
||||
- **Pattern fan-out view** (group records by `pattern_id`): see all mods that hit the same shape.
|
||||
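The two hashes and the normalisation steps can be sketched as follows. Several details are assumptions made for this illustration: the digest is taken as hex and cut to 16 characters, inputs are concatenated without a separator and encoded as UTF-8, and the session-prefix regex is an approximation of the `General f:N, t:N, st:N,N,N,N>` shape (with `t:` treated as optional). Function names are hypothetical.

```python
import hashlib
import re

SESSION_PREFIX_RE = re.compile(r"^\s*\S+\s+f:\d+(?:,\s*t:\d+)?,?\s*st:[\d,]+>\s*")
SEVERITY_TOKEN_RE = re.compile(r"^\s*(?:ERROR|SEVERE|WARN|FATAL)\s*:\s*", re.IGNORECASE)


def normalize_first_line(line: str) -> str:
    text = SESSION_PREFIX_RE.sub("", line)       # strip session metadata prefix
    text = SEVERITY_TOKEN_RE.sub("", text)       # strip body-prefix severity token
    text = re.sub(r'"[^"]*"', '"<S>"', text)     # flatten double-quoted strings
    text = re.sub(r"'[^']*'", "'<S>'", text)     # flatten single-quoted strings
    text = re.sub(r"\d{2,}", "<N>", text)        # flatten numeric runs of 2+ digits
    text = " ".join(text.split())                # collapse whitespace
    return text[:200]                            # truncate


def _short_hash(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def pattern_id(level: str, first_line: str) -> str:
    return _short_hash(level + normalize_first_line(first_line))


def signature(pid: str, mod_id: str) -> str:
    return _short_hash(pid + mod_id)
```

Quoted strings are flattened before numeric runs so that digits inside a quoted path never influence the hash.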
|
||||
### 9. Aggregation
|
||||
|
||||
Records dedup on `signature`. On second-and-subsequent occurrences: `occurrence_count++`, `files` set-extends, attribution-confidence promotes (direct beats inferred beats unattributed), stack and `cause_chain` merge.
|
||||
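A sketch of the merge step, using plain dicts. The record fields used here (`source_file` in particular) and the merge rules for `stack` and `cause_chain` are assumptions: the spec says they merge but not how, so this version unions stack frames in first-seen order up to 8 and keeps the first non-empty cause chain.

```python
from typing import Dict, Iterable, List

ATTRIBUTION_RANK = {"unattributed": 0, "inferred": 1, "direct": 2}
CONFIDENCE = {"unattributed": "low", "inferred": "medium", "direct": "high"}
MAX_FRAMES = 8


def aggregate(records: Iterable[dict]) -> List[dict]:
    """Dedup records on `signature`, merging repeat occurrences."""
    merged: Dict[str, dict] = {}
    for rec in records:
        agg = merged.get(rec["signature"])
        if agg is None:
            merged[rec["signature"]] = {
                "signature": rec["signature"],
                "attribution": rec["attribution"],
                "confidence": CONFIDENCE[rec["attribution"]],
                "stack": list(rec["stack"])[:MAX_FRAMES],
                "cause_chain": rec["cause_chain"],
                "files": [rec["source_file"]],
                "occurrence_count": 1,
            }
            continue
        agg["occurrence_count"] += 1
        if rec["source_file"] not in agg["files"]:
            agg["files"].append(rec["source_file"])
        # direct beats inferred beats unattributed
        if ATTRIBUTION_RANK[rec["attribution"]] > ATTRIBUTION_RANK[agg["attribution"]]:
            agg["attribution"] = rec["attribution"]
            agg["confidence"] = CONFIDENCE[rec["attribution"]]
        agg["stack"] = list(dict.fromkeys(agg["stack"] + list(rec["stack"])))[:MAX_FRAMES]
        agg["cause_chain"] = agg["cause_chain"] or rec["cause_chain"]
    return list(merged.values())
```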
|
||||
## Output schema
|
||||
|
||||
```json
{
  "meta": {
    "input_dir": "/opt/ik-codex/.scratch/pz/Logs.redacted",
    "files_scanned": 6,
    "log_lines_total": 78654,
    "error_lines_total": 30984,
    "unique_signatures": N,
    "unique_patterns": M,
    "redacted": true,
    "started": "ISO8601",
    "finished": "ISO8601"
  },
  "signatures": [
    {
      "signature": "sha256:...",
      "pattern_id": "sha256:...",
      "level": "ERROR",
      "kind": "lua_runtime|require_failed|java_exception|engine_noise|runtime",
      "mod_id": "spongiesclothing",
      "mod_name": "Spongie's Clothing",
      "attribution": "direct|inferred|unattributed",
      "confidence": "high|medium|low",
      "attribution_reason": "...",
      "file": "media/lua/client/X.lua",
      "line": 42,
      "cause_chain": "ExceptionA: msg -> ExceptionB: msg",
      "stack": ["at A.lua:12", "at B.lua:34"],
      "first_seen": {"file": "...", "line": 1234, "timestamp": "26-04-26 17:14:35.128"},
      "occurrence_count": 47,
      "files": ["..."],
      "excerpt": "..."
    }
  ],
  "summary": {
    "errors": N,
    "warnings": N,
    "by_kind": {"lua_runtime": ..., "java_exception": ..., "require_failed": ..., "engine_noise": ..., "runtime": ...},
    "by_attribution": {"direct": ..., "inferred": ..., "unattributed": ...},
    "by_confidence": {"high": ..., "medium": ..., "low": ...},
    "top_mods": [{"mod_id": "...", "mod_name": "...", "occurrence_count": N}, ...]
  }
}
```
|
||||
|
||||
Default output path: `/opt/ik-codex/.scratch/pz/classify.json` (gitignored under `.scratch/`).
|
||||
|
||||
## CLI
|
||||
|
||||
```
pz_classify.py [--input <dir>] [--out <path>] [--quiet]
```
|
||||
|
||||
- `--input` defaults to `<repo>/.scratch/pz/Logs.redacted`
|
||||
- `--out` defaults to `<repo>/.scratch/pz/classify.json`
|
||||
- `--quiet` suppresses the trailing summary line
|
||||
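The flags above map onto `argparse` directly. In this sketch the repository root is passed in as a parameter; how the real script locates `<repo>` is not stated in the spec, and `build_parser` is a hypothetical name.

```python
import argparse
from pathlib import Path


def build_parser(repo_root: Path) -> argparse.ArgumentParser:
    """Build the three-flag CLI with defaults under <repo>/.scratch/pz/."""
    scratch = repo_root / ".scratch" / "pz"
    parser = argparse.ArgumentParser(prog="pz_classify.py")
    parser.add_argument("--input", type=Path, default=scratch / "Logs.redacted",
                        help="directory of already-redacted logs")
    parser.add_argument("--out", type=Path, default=scratch / "classify.json",
                        help="path of the JSON report")
    parser.add_argument("--quiet", action="store_true",
                        help="suppress the trailing summary line")
    return parser
```

For example, `build_parser(Path("/opt/ik-codex")).parse_args(["--quiet"])` would leave both paths at their defaults and set `quiet` to `True`.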
|
||||
No `--limit`, `--resume`, or `--checkpoint-every`. Runs in seconds; nothing to throttle or resume.
|
||||
|
||||
## Tests
|
||||
|
||||
New directory `tools/pz-analyzer/tests/`. Stdlib `unittest`. Three files, ~18 tests total.
|
||||
|
||||
- **`test_parser.py`** (~10 tests) — one fixture per scenario in `tests/fixtures/` (synthetic, tracked in git): pure-Lua-attributed, pure-Java-exception, inferred-from-context, unattributed-engine-noise, multi-cause-chain, pre-stack-collection, post-stack-collection, severity-variants, file-line-extraction-fallbacks. All synthetic identifiers (placeholder Steam IDs / mod names) per the existing PHP-side `test/src/Games/ProjectZomboid/fixtures/` convention.
|
||||
- **`test_attribution.py`** (~5 tests) — three confidence buckets, the 40-line lookback boundary, "needed by X" extraction, and the rejection of inferred attribution when the message isn't Lua-shaped.
|
||||
- **`test_signatures.py`** (~3 tests) — `pattern_id` stability across formatting variations (whitespace, numeric values, quoted strings) and `signature` uniqueness across mods.
|
||||
|
||||
Invocation: `python -m unittest discover tools/pz-analyzer/tests/`. No external deps.
|
||||
|
||||
## Verification
|
||||
|
||||
End-to-end smoke against the redacted real-data directory:
|
||||
|
||||
```
bash /opt/ik-codex/tools/pz-analyzer/pz_redact_all.sh   # one-time, already done
python /opt/ik-codex/tools/pz-analyzer/pz_classify.py
```
|
||||
|
||||
Expect:
|
||||
- 6 files scanned, ~30,984 error lines processed.
|
||||
- A meaningful number of unique signatures and patterns (likely in the low hundreds for signatures; fewer patterns).
|
||||
- `top_mods` lists the highest-occurrence mods.
|
||||
- PII audit: no real Steam IDs, IPs, or coordinates in the output JSON (input is already redacted; classifier doesn't introduce PII).
|
||||
|
||||
Test invocation: `python -m unittest discover tools/pz-analyzer/tests/` should be all-green.
|
||||
|
||||
## Risks and open questions
|
||||
|
||||
- **Inferred attribution accuracy.** The 40-line lookback is pzmm's heuristic; it's correct for tightly-paced server bursts but can mis-attribute when an unrelated mod logs in the gap. Surface as `confidence: medium` so consumers can choose to treat them differently. Acceptable for v1; tunable via a constant in `pz_parser.py`.
|
||||
- **Pzmm targets `console.txt`, we target `DebugLog-server.txt`.** Format overlap is high (both share `Lua((MOD:X))` markers, Caused-by chains, Java exception shapes), but some patterns may be `console.txt`-specific. Tests use `DebugLog-server`-shaped fixtures only.
|
||||
- **Future PHP port.** `pz_parser.py` is structured for mechanical translation to a `LuaErrorAnalyser` / `ModAttributionAnalyser` pair under `src/Analyser/ProjectZomboid/` in a separate spec. Output schema chosen to be PHP-codex-compatible (Insight subclasses with typed fields).
|
||||
- **Licence.** The `paraxaQQ/pzmm` zip we reviewed has no top-level LICENSE, so this spec mandates reimplementing the patterns independently rather than copying code. Regex shapes and heuristics are general programming patterns and not author-specific, but no code blocks are lifted verbatim.
|
||||
|
||||
## Out of scope (explicit)
|
||||
|
||||
- Editing `pz_error_analysis.py` or `pz_redact_all.sh`.
|
||||
- Modifying any file in `/opt/ik-codex/src/`, `/opt/ik-codex/test/`, or `/opt/iblogs/`.
|
||||
- AI / LLM integration of any kind in the new tool.
|
||||
- LLM inference at runtime in iblogs / bosslogs production. The Qwen analyzer (`pz_error_analysis.py`) is a developer-only discovery tool used to expand the deterministic ruleset in `pz_parser.py` (and its future PHP port). Production rendering is deterministic-only, forever.
|
||||
- iblogs front-end rendering of the classification output.
|
||||
- Filesystem mod-scan reattribution (pzmm's symbol/vehicle indexes).
|
||||
131
src/Analyser/ProjectZomboid/ErrorContextAnalyser.php
Normal file
@@ -0,0 +1,131 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Analyser\ProjectZomboid;
|
||||
|
||||
use IndifferentKetchup\Codex\Analyser\Analyser;
|
||||
use IndifferentKetchup\Codex\Analysis\Analysis;
|
||||
use IndifferentKetchup\Codex\Analysis\AnalysisInterface;
|
||||
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextProblem;
|
||||
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextTruncatedInformation;
|
||||
use IndifferentKetchup\Codex\Log\EntryInterface;
|
||||
use IndifferentKetchup\Codex\Log\Level;
|
||||
|
||||
/**
|
||||
* Surfaces ERROR or WARNING entries with a sliding context window of
|
||||
* surrounding entries, so a viewer can see the lead-up and aftermath of
|
||||
* each event without scanning the full log. PatternAnalyser cannot
|
||||
* express this because windows span multiple entries; this walks once,
|
||||
* classifies by Level (already resolved by the parser), and emits one
|
||||
* ErrorContextProblem per hit.
|
||||
*
|
||||
* Stack-trace continuation lines are absorbed into the same Entry as the
|
||||
* level header that preceded them by PatternParser, so noise filtering
|
||||
* happens at parse time — windows here count Entries, not raw lines, and
|
||||
* a stack-trace ERROR contributes exactly one window.
|
||||
*
|
||||
 * Overlapping windows are clipped rather than merged: when two error/warning entries fall
|
||||
* within CONTEXT_BEFORE + CONTEXT_AFTER of each other, the later
|
||||
* window's before- and after-ranges are clipped to start past the
|
||||
* previously emitted range so no Entry appears in two context arrays.
|
||||
 * The hit cap is checked before each emission; a hit beyond it adds an
|
||||
* ErrorContextTruncatedInformation to the analysis instead of further
|
||||
* problems.
|
||||
*/
|
||||
class ErrorContextAnalyser extends Analyser
|
||||
{
|
||||
/**
|
||||
* Number of entries preceding a hit captured as leading context.
|
||||
* Twenty entries is wide enough to surface the immediate precursor
|
||||
* events (mod load, player join, prior warning) for a server-log
|
||||
* error without dragging in unrelated activity from minutes earlier.
|
||||
*/
|
||||
public const int CONTEXT_BEFORE = 20;
|
||||
|
||||
/**
|
||||
* Number of entries following a hit captured as trailing context.
|
||||
* Mirrors CONTEXT_BEFORE so windows are symmetric and the maximum
|
||||
* window size is CONTEXT_BEFORE + 1 (hit) + CONTEXT_AFTER = 41
|
||||
* entries.
|
||||
*/
|
||||
public const int CONTEXT_AFTER = 20;
|
||||
|
||||
/**
|
||||
* Maximum number of hits emitted before truncation. Caps memory and
|
||||
* output size on logs with cascading errors (e.g. a save-system
|
||||
 * failure that produces an error every tick). A hit past the cap adds
|
||||
* an ErrorContextTruncatedInformation to the analysis so consumers
|
||||
* can flag truncation rather than silently dropping later hits.
|
||||
*/
|
||||
public const int HIT_CAP = 500;
|
||||
|
||||
public function analyse(): AnalysisInterface
|
||||
{
|
||||
$analysis = new Analysis();
|
||||
$analysis->setLog($this->log);
|
||||
|
||||
$entries = [];
|
||||
foreach ($this->log as $entry) {
|
||||
$entries[] = $entry;
|
||||
}
|
||||
$count = count($entries);
|
||||
|
||||
$hits = 0;
|
||||
$truncated = false;
|
||||
$lastEmittedIndex = -1;
|
||||
|
||||
for ($i = 0; $i < $count; $i++) {
|
||||
$type = $this->classify($entries[$i]);
|
||||
if ($type === null) {
|
||||
continue;
|
||||
}
|
||||
|
||||
if ($hits >= self::HIT_CAP) {
|
||||
$truncated = true;
|
||||
break;
|
||||
}
|
||||
|
||||
$beforeStart = max($lastEmittedIndex + 1, $i - self::CONTEXT_BEFORE);
|
||||
if ($beforeStart > $i) {
|
||||
$beforeStart = $i;
|
||||
}
|
||||
$afterStart = max($lastEmittedIndex + 1, $i + 1);
|
||||
$afterEnd = min($count - 1, $i + self::CONTEXT_AFTER);
|
||||
$afterLength = max(0, $afterEnd - $afterStart + 1);
|
||||
|
||||
$analysis->addInsight((new ErrorContextProblem())
|
||||
->setEntry($entries[$i])
|
||||
->setType($type)
|
||||
->setEntryIndex($i + 1)
|
||||
->setBefore(array_slice($entries, $beforeStart, $i - $beforeStart))
|
||||
->setAfter(array_slice($entries, $afterStart, $afterLength)));
|
||||
|
||||
$hits++;
|
||||
$lastEmittedIndex = max($lastEmittedIndex, $afterEnd);
|
||||
}
|
||||
|
||||
if ($truncated) {
|
||||
$analysis->addInsight((new ErrorContextTruncatedInformation())
|
||||
->setHitCap(self::HIT_CAP));
|
||||
}
|
||||
|
||||
return $analysis;
|
||||
}
|
||||
|
||||
/**
|
||||
* Classify an entry as 'error', 'warning', or null based on its Level.
|
||||
 * Levels numerically at or below ERROR, i.e. at least as severe (EMERGENCY/ALERT/CRITICAL/ERROR), collapse
|
||||
* into 'error'; WARNING alone collapses into 'warning'. Returns null
|
||||
* for anything less severe so the analyser skips it.
|
||||
*/
|
||||
protected function classify(EntryInterface $entry): ?string
|
||||
{
|
||||
$level = $entry->getLevel()->asInt();
|
||||
if ($level <= Level::ERROR->asInt()) {
|
||||
return 'error';
|
||||
}
|
||||
if ($level === Level::WARNING->asInt()) {
|
||||
return 'warning';
|
||||
}
|
||||
return null;
|
||||
}
|
||||
}
|
||||
130
src/Analysis/ProjectZomboid/ErrorContextProblem.php
Normal file
@@ -0,0 +1,130 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Analysis\ProjectZomboid;
|
||||
|
||||
use IndifferentKetchup\Codex\Analysis\InsightInterface;
|
||||
use IndifferentKetchup\Codex\Analysis\Problem;
|
||||
use IndifferentKetchup\Codex\Log\EntryInterface;
|
||||
|
||||
/**
|
||||
* Problem emitted by ErrorContextAnalyser for each ERROR or WARNING entry,
|
||||
* carrying a sliding window of surrounding entries as before/after
|
||||
* context. Coalesced by 1-based entryIndex so re-adding the same hit
|
||||
* never produces duplicate problems.
|
||||
*/
|
||||
class ErrorContextProblem extends Problem
|
||||
{
|
||||
private string $type = 'error';
|
||||
private int $entryIndex = 0;
|
||||
|
||||
/**
|
||||
* @var EntryInterface[]
|
||||
*/
|
||||
private array $before = [];
|
||||
|
||||
/**
|
||||
* @var EntryInterface[]
|
||||
*/
|
||||
private array $after = [];
|
||||
|
||||
/**
|
||||
* @param string $type 'error' or 'warning'
|
||||
* @return $this
|
||||
*/
|
||||
public function setType(string $type): static
|
||||
{
|
||||
$this->type = $type;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* @return string
|
||||
*/
|
||||
public function getType(): string
|
||||
{
|
||||
return $this->type;
|
||||
}
|
||||
|
||||
/**
|
||||
* @param int $entryIndex 1-based index of the hit entry within the log
|
||||
* @return $this
|
||||
*/
|
||||
public function setEntryIndex(int $entryIndex): static
|
||||
{
|
||||
$this->entryIndex = $entryIndex;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* @return int 1-based index of the hit entry within the log
|
||||
*/
|
||||
public function getEntryIndex(): int
|
||||
{
|
||||
return $this->entryIndex;
|
||||
}
|
||||
|
||||
/**
|
||||
* @param EntryInterface[] $entries
|
||||
* @return $this
|
||||
*/
|
||||
public function setBefore(array $entries): static
|
||||
{
|
||||
$this->before = $entries;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* @return EntryInterface[]
|
||||
*/
|
||||
public function getBefore(): array
|
||||
{
|
||||
return $this->before;
|
||||
}
|
||||
|
||||
/**
|
||||
* @param EntryInterface[] $entries
|
||||
* @return $this
|
||||
*/
|
||||
public function setAfter(array $entries): static
|
||||
{
|
||||
$this->after = $entries;
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* @return EntryInterface[]
|
||||
*/
|
||||
public function getAfter(): array
|
||||
{
|
||||
return $this->after;
|
||||
}
|
||||
|
||||
/**
|
||||
* Convenience accessor returning before-context, hit entry, and
|
||||
* after-context as a single ordered array of at most
|
||||
* ErrorContextAnalyser::CONTEXT_BEFORE + 1 + CONTEXT_AFTER = 41
|
||||
* entries.
|
||||
*
|
||||
* @return EntryInterface[]
|
||||
*/
|
||||
public function getContext(): array
|
||||
{
|
||||
return [...$this->before, $this->getEntry(), ...$this->after];
|
||||
}
|
||||
|
||||
public function getMessage(): string
|
||||
{
|
||||
return sprintf(
|
||||
'%s at entry %d (%d before, %d after)',
|
||||
strtoupper($this->type),
|
||||
$this->entryIndex,
|
||||
count($this->before),
|
||||
count($this->after)
|
||||
);
|
||||
}
|
||||
|
||||
public function isEqual(InsightInterface $insight): bool
|
||||
{
|
||||
return $insight instanceof self && $insight->getEntryIndex() === $this->entryIndex;
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,42 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Analysis\ProjectZomboid;
|
||||
|
||||
use IndifferentKetchup\Codex\Analysis\Information;
|
||||
use IndifferentKetchup\Codex\Analysis\InsightInterface;
|
||||
|
||||
/**
|
||||
 * Emitted by ErrorContextAnalyser at most once, when a further hit is found after its hit cap is
|
||||
* reached, so downstream consumers can surface a "results truncated"
|
||||
* notice instead of silently dropping subsequent error/warning hits.
|
||||
*/
|
||||
class ErrorContextTruncatedInformation extends Information
|
||||
{
|
||||
private int $hitCap = 0;
|
||||
|
||||
/**
|
||||
* @param int $hitCap the cap that was hit (mirrors
|
||||
* ErrorContextAnalyser::HIT_CAP at emission time)
|
||||
* @return $this
|
||||
*/
|
||||
public function setHitCap(int $hitCap): static
|
||||
{
|
||||
$this->hitCap = $hitCap;
|
||||
$this->setLabel('Error context');
|
||||
$this->setValue(sprintf('truncated after %d hits', $hitCap));
|
||||
return $this;
|
||||
}
|
||||
|
||||
/**
|
||||
* @return int
|
||||
*/
|
||||
public function getHitCap(): int
|
||||
{
|
||||
return $this->hitCap;
|
||||
}
|
||||
|
||||
public function isEqual(InsightInterface $insight): bool
|
||||
{
|
||||
return $insight instanceof self;
|
||||
}
|
||||
}
|
||||
@@ -15,7 +15,7 @@ namespace IndifferentKetchup\Codex\Pattern\ProjectZomboid;
|
||||
*/
|
||||
class DebugServerPattern
|
||||
{
|
||||
-public const string LINE = '/^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+(\w+)\s*:\s+(\S+)\s+f:\d+,\s+t:\d+,\s+st:[\d,]+>\s+.*$/';
|
||||
+public const string LINE = '/^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+(\w+)\s*:\s+(\S+)\s+f:\d+(?:,\s+t:\d+)?,?\s+st:[\d,]+>\s+.*$/';
|
||||
|
||||
public const string VERSION = '/version=(?<version>\S+) (?<hash>[a-f0-9]{40}) (?<date>\d{4}-\d{2}-\d{2}) (?<time>\d{2}:\d{2}:\d{2})/';
|
||||
|
||||
|
||||
185
src/Util/ProjectZomboid/ProjectZomboidRedactor.php
Normal file
@@ -0,0 +1,185 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Util\ProjectZomboid;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\RedactorInterface;
|
||||
|
||||
/**
|
||||
* Render-time PII filter for Project Zomboid log content.
|
||||
*
|
||||
* Applies up to four sequential regex passes over the raw log string,
|
||||
* each controlled by a boolean toggle (all enabled by default):
|
||||
*
|
||||
* 1. IP address pass — replaces IPv4 addresses (with optional :port
|
||||
* suffix) and IPv6 addresses (full, abbreviated, bracketed, and
|
||||
* IPv4-mapped forms; all with optional :port when bracketed) with
|
||||
* a placeholder token. Pattern-disjoint from the other passes.
|
||||
* 2. Steam ID pass — replaces 17-digit Steam IDs with a placeholder
|
||||
* token.
|
||||
* 3. Player name pass — replaces player display names with a placeholder
|
||||
* token. This pass anchors on the already-redacted Steam ID token, so
|
||||
* the ordering Steam ID -> name -> coordinates is mandatory.
|
||||
* 4. Coordinates pass — replaces world coordinate triplets with a
|
||||
* placeholder token.
|
||||
*
|
||||
* Pass 1 runs first by convention, not dependency: it shares no anchors
|
||||
* with passes 2-4 and could run anywhere in the chain without affecting
|
||||
* their output.
|
||||
*
|
||||
* All regex passes use the /u flag for Unicode safety.
|
||||
*
|
||||
* Replacements are not reversible; do not apply to content that must later be
|
||||
* restored to its original form.
|
||||
*/
|
||||
class ProjectZomboidRedactor implements RedactorInterface
|
||||
{
|
||||
/** Generic placeholder substituted for every matched IPv4 or IPv6 address (with port suffix consumed when present). */
|
||||
public const string IP_REPLACEMENT = '[REDACTED_IP]';
|
||||
|
||||
/** Strict IPv4 with valid 0-255 octets and optional :port suffix. Lookarounds reject matches embedded in longer alphanumeric or dotted-decimal tokens; the (?<!\d\.) / (?!\.\d) pair specifically prevents matching inside an N-octet (N>4) sequence like 1.2.3.4.5 while still allowing a trailing sentence period after the IP/port. */
|
||||
public const string IPV4_REGEX = '/'
|
||||
. '(?<![A-Za-z0-9_:])(?<!\d\.)'
|
||||
. '(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
|
||||
. '(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}'
|
||||
. '(?::\d{1,5})?'
|
||||
. '(?![A-Za-z0-9_:])(?!\.\d)'
|
||||
. '/u';
|
||||
|
||||
/** Coarse IPv6 candidate matcher (bracketed-with-port, or bare 2-7-colon hex form covering full / abbreviated / IPv4-mapped). Each match is validated with filter_var() in the redact() callback so PHP/Java scope ops like Foo::Bar and PZ timestamps like 12:00:00.000 are rejected. Boundary lookarounds mirror the IPv4 regex so trailing sentence periods don't block the match. */
|
||||
public const string IPV6_REGEX = '/'
|
||||
. '(?<![A-Za-z0-9_:])(?<!\d\.)'
|
||||
. '(?:'
|
||||
. '\[(?<bracketed>[0-9a-fA-F:.]+)\](?::\d{1,5})?'
|
||||
. '|'
|
||||
. '(?<bare>(?:[0-9a-fA-F]{0,4}:){2,7}[0-9a-fA-F.]*)'
|
||||
. ')'
|
||||
. '(?![A-Za-z0-9_:])(?!\.\d)'
|
||||
. '/u';
|
||||
|
||||
/** Regex matching a 17-digit SteamID64 anchored on the 76561198 universe prefix, with lookaround boundaries that reject embedded occurrences. */
|
||||
public const string STEAM_ID_REGEX = '/(?<![A-Za-z0-9])76561198\d{9}(?![A-Za-z0-9])/u';
|
||||
|
||||
/** Zeroed-out SteamID64 placeholder; syntactically valid but refers to no real account. */
|
||||
public const string STEAM_ID_REPLACEMENT = '76561198000000000';
|
||||
|
||||
/** Generic placeholder substituted for every matched player display name. */
|
||||
public const string PLAYER_NAME_REPLACEMENT = '<player>';
|
||||
|
||||
/** Matches a double-quoted player name that immediately follows the redacted Steam ID placeholder (cmd.txt / admin.txt shape); relies on the Steam ID pass having run first. */
|
||||
public const string PLAYER_AFTER_STEAMID_REGEX = '/(?<=76561198000000000) "(?<name>[^"]+)"/u';
|
||||
|
||||
/** Matches the author value inside a ChatMessage{...} envelope, using a fixed-length lookbehind on ", author='" and a lookahead on the closing "'" so only the bare name is replaced. */
|
||||
public const string PLAYER_IN_CHATMESSAGE_REGEX = '/(?<=, author=\')(?<name>[^\']+)(?=\')/u';
|
||||
|
||||
/** Matches the first double-quoted player name following a Combat: or Safety: subsystem token (pvp.txt shape); does NOT redact the second name after "hit" — deferred to v2. */
|
||||
public const string PLAYER_IN_PVP_SUBSYSTEM_REGEX = '/(?<=(?:Combat|Safety): )"(?<name>[^"]+)"/u';
|
||||
|
||||
/** Zeroed-out coordinate triple used as the inner replacement; bracket/paren/`at` wrapper is preserved by the regex lookaround anchors. */
|
||||
public const string COORDS_REPLACEMENT = '0,0,0';
|
||||
|
||||
    /** Matches integer or float coordinate triplets that immediately follow the literal ` at ` token (map.txt / item.txt shape); the trailing dot is preserved via lookahead. */
    public const string COORDS_AT_CLAUSE_REGEX = '/(?<= at )(?<x>[\d.]+),(?<y>[\d.]+),(?<z>-?[\d.]+)(?=\.)/u';

    /** Matches integer coordinate triplets enclosed in square brackets (ClientActionLog.txt / PerkLog.txt / cmd.txt @-context shape); the surrounding brackets are preserved via lookaround. */
    public const string COORDS_BRACKETED_REGEX = '/(?<=\[)(?<x>\d+),(?<y>\d+),(?<z>-?\d+)(?=\])/u';

    /** Matches integer coordinate triplets enclosed in round parentheses, anchored on a trailing PvP verb to disambiguate from server-metadata triples (pvp.txt Combat:/Safety: shape); only the attacker/first-coord set is redacted per line. The victim coords lack the trailing keyword and are deferred to v2. */
    public const string COORDS_PARENTHESISED_REGEX = '/(?<=\()(?<x>\d+),(?<y>\d+),(?<z>-?\d+)(?=\) (?:hit|restore|store|true|false))/u';

    private bool $redactIpAddresses = true;
    private bool $redactSteamIds = true;
    private bool $redactPlayerNames = true;
    private bool $redactCoordinates = true;

    /**
     * Enable or disable the IP address redaction pass (covers IPv4 and IPv6).
     *
     * @param bool $on Pass true to enable, false to disable.
     * @return static
     */
    public function redactIpAddresses(bool $on): static
    {
        $this->redactIpAddresses = $on;
        return $this;
    }

    /**
     * Enable or disable the Steam ID redaction pass.
     *
     * @param bool $on Pass true to enable, false to disable.
     * @return static
     */
    public function redactSteamIds(bool $on): static
    {
        $this->redactSteamIds = $on;
        return $this;
    }

    /**
     * Enable or disable the player-name redaction pass.
     *
     * @param bool $on Pass true to enable, false to disable.
     * @return static
     */
    public function redactPlayerNames(bool $on): static
    {
        $this->redactPlayerNames = $on;
        return $this;
    }

    /**
     * Enable or disable the coordinates redaction pass.
     *
     * @param bool $on Pass true to enable, false to disable.
     * @return static
     */
    public function redactCoordinates(bool $on): static
    {
        $this->redactCoordinates = $on;
        return $this;
    }

    /**
     * Redact PII from the given Project Zomboid log content.
     *
     * Passes are applied in the order: IP address -> Steam ID -> player
     * name -> coordinates. The Steam ID -> name -> coordinates ordering
     * is mandatory (see class docblock); the IP pass is pattern-disjoint
     * and runs first by convention.
     *
     * @param string $content Raw log content that may contain PII.
     * @return string Content with enabled PII categories replaced by tokens.
     */
    public function redact(string $content): string
    {
        if ($this->redactIpAddresses) {
            $content = preg_replace_callback(
                self::IPV6_REGEX,
                static function (array $matches): string {
                    $candidate = ($matches['bracketed'] ?? '') !== ''
                        ? $matches['bracketed']
                        : ($matches['bare'] ?? '');
                    return filter_var($candidate, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false
                        ? self::IP_REPLACEMENT
                        : $matches[0];
                },
                $content
            );
            $content = preg_replace(self::IPV4_REGEX, self::IP_REPLACEMENT, $content);
        }
        if ($this->redactSteamIds) {
            $content = preg_replace(self::STEAM_ID_REGEX, self::STEAM_ID_REPLACEMENT, $content);
        }
        if ($this->redactPlayerNames) {
            $content = preg_replace(self::PLAYER_AFTER_STEAMID_REGEX, ' "' . self::PLAYER_NAME_REPLACEMENT . '"', $content);
            $content = preg_replace(self::PLAYER_IN_CHATMESSAGE_REGEX, self::PLAYER_NAME_REPLACEMENT, $content);
            $content = preg_replace(self::PLAYER_IN_PVP_SUBSYSTEM_REGEX, '"' . self::PLAYER_NAME_REPLACEMENT . '"', $content);
        }
        if ($this->redactCoordinates) {
            $content = preg_replace(self::COORDS_AT_CLAUSE_REGEX, self::COORDS_REPLACEMENT, $content);
            $content = preg_replace(self::COORDS_BRACKETED_REGEX, self::COORDS_REPLACEMENT, $content);
            $content = preg_replace(self::COORDS_PARENTHESISED_REGEX, self::COORDS_REPLACEMENT, $content);
        }
        return $content;
    }
}
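Note on pass ordering: the `redact()` docblock says the Steam ID pass must run before the player-name pass. A minimal sketch of why, assuming two simplified stand-in patterns (the real `STEAM_ID_REGEX` and `PLAYER_AFTER_STEAMID_REGEX` are defined earlier in the class and may well differ; `sketchRedact()` and the `SKETCH_*` constants are hypothetical names used only here):

```php
<?php

// Hypothetical stand-ins, NOT the library's actual patterns.
const SKETCH_STEAM_ID_REGEX = '/\b7656119\d{10}\b/';
// Anchors on the placeholder that the Steam ID pass writes.
const SKETCH_NAME_AFTER_ID_REGEX = '/(?<=76561198000000000) "[^"]+"/';

function sketchRedact(string $content, bool $steamIds = true, bool $names = true): string
{
    if ($steamIds) {
        // Pass 1: rewrite every Steam ID to the fixed placeholder.
        $content = preg_replace(SKETCH_STEAM_ID_REGEX, '76561198000000000', $content);
    }
    if ($names) {
        // Pass 2: only fires when the placeholder from pass 1 is present.
        $content = preg_replace(SKETCH_NAME_AFTER_ID_REGEX, ' "<player>"', $content);
    }
    return $content;
}

$line = '76561198111111111 "Player1" added Base.Aerosolbomb';

echo sketchRedact($line), "\n";
// 76561198000000000 "<player>" added Base.Aerosolbomb

echo sketchRedact($line, steamIds: false), "\n";
// 76561198111111111 "Player1" added Base.Aerosolbomb
```

With the Steam ID pass disabled the name pass has no placeholder to anchor on, which is the behaviour `testSteamIdToggleOffLeavesSteamIdsIntact` pins down.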
20
src/Util/RedactorInterface.php
Normal file
@@ -0,0 +1,20 @@
<?php

namespace IndifferentKetchup\Codex\Util;

interface RedactorInterface
{
    /**
     * Redact PII from the given content string and return the result.
     *
     * The method is stateless from the caller's perspective: the same instance
     * may be called repeatedly and each call operates independently on its
     * input. Configuration (which passes are enabled, replacement tokens, etc.)
     * is applied once via implementation-specific setters before the first call
     * to redact().
     *
     * @param string $content Raw log content that may contain PII.
     * @return string Content with PII replaced by redaction tokens.
     */
    public function redact(string $content): string;
}
@@ -0,0 +1,22 @@
[16-04-26 00:00:42.314] LOG : General f:0 st:48,648,157,434> SLF4J(W): No SLF4J providers were found..
[16-04-26 00:00:42.315] LOG : General f:0 st:48,648,157,492> SLF4J(W): Defaulting to no-operation (NOP) logger implementation.
[16-04-26 00:00:42.407] LOG : General f:0 st:48,648,157,584> version=42.17.0 0000000000000000000000000000000000000000 2026-04-20 14:34:44 (ZB) demo=false.
[16-04-26 00:00:42.407] LOG : General f:0 st:48,648,157,585> revision=0000000000000000000000000000000000000000 date=2026-04-20 time=14:34:44 (ZB).
[16-04-26 00:01:19.080] ERROR: General f:0 st:48,648,194,258> DebugFileWatcher.registerDir> Exception thrown
java.nio.file.NoSuchFileException: /placeholder/config/mods at UnixException.translateToIOException(null:-1).
Stack trace:
java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
java.base/sun.nio.fs.UnixException.asIOException(Unknown Source)
java.base/sun.nio.fs.LinuxWatchService$Poller.implRegister(Unknown Source)
java.base/sun.nio.fs.AbstractPoller.processRequests(Unknown Source)
java.base/sun.nio.fs.LinuxWatchService$Poller.run(Unknown Source)
[16-04-26 00:01:19.131] LOG : Mod f:0 st:48,648,194,309> loading example_mod_alpha.
[16-04-26 00:01:19.142] LOG : Mod f:0 st:48,648,194,320> loading example_mod_beta.
[16-04-26 00:01:19.155] LOG : Mod f:0 st:48,648,194,333> loading example_mod_gamma.
[16-04-26 00:01:19.200] WARN : Mod f:0 st:48,648,194,378> ZomboidFileSystem.loadModAndRequired> required mod "absent_mod" not found.
[16-04-26 00:01:45.937] ERROR: WorldGen f:0 st:48,648,221,115> IsoPropertyType.lookupOrDefaultStr> Exception thrown
zombie.core.properties.IsoPropertyType$IsoPropertyTypeNotFoundException: Property Name not found: ladderW at IsoPropertyType.lookup(IsoPropertyType.java:269). Message: Property Name not found: ladderW
at zombie.core.properties.IsoPropertyType.lookup(IsoPropertyType.java:269)
at zombie.iso.IsoChunkData.PostProcessChunk(IsoChunkData.java:512)
[16-04-26 00:02:00.000] LOG : General f:0 st:48,648,235,178> server initialised.
[16-04-26 00:05:00.000] LOG : General f:0 st:48,648,415,178> shutdown requested.
@@ -0,0 +1,128 @@
<?php

namespace IndifferentKetchup\Codex\Test\Tests\Games\ProjectZomboid\Analyser;

use IndifferentKetchup\Codex\Analyser\AnalyserInterface;
use IndifferentKetchup\Codex\Analyser\ProjectZomboid\ErrorContextAnalyser;
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextProblem;
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextTruncatedInformation;
use IndifferentKetchup\Codex\Log\AnalysableLog;
use IndifferentKetchup\Codex\Log\Entry;
use IndifferentKetchup\Codex\Log\Level;
use IndifferentKetchup\Codex\Log\Line;
use PHPUnit\Framework\TestCase;

class ErrorContextAnalyserTest extends TestCase
{
    /**
     * Build an in-memory AnalysableLog with $count entries; entries whose
     * 1-based index is in $errorIndices are tagged Level::ERROR, the rest
     * Level::INFO. Anonymous AnalysableLog subclass keeps the fixture
     * inline since we exercise the analyser directly via setLog().
     *
     * @param int[] $errorIndices 1-based entry indices to mark as ERROR
     */
    private function makeLog(array $errorIndices, int $count): AnalysableLog
    {
        $errorSet = array_flip($errorIndices);
        $log = new class extends AnalysableLog {
            public static function getDefaultAnalyser(): AnalyserInterface
            {
                return new ErrorContextAnalyser();
            }
        };
        for ($n = 1; $n <= $count; $n++) {
            $level = isset($errorSet[$n]) ? Level::ERROR : Level::INFO;
            $entry = (new Entry())
                ->setLevel($level)
                ->addLine(new Line($n, sprintf('line %d', $n)));
            $log->addEntry($entry);
        }
        return $log;
    }

    public function testEmitsThreeNonOverlappingWindows(): void
    {
        $log = $this->makeLog([10, 50, 95], 100);
        $analysis = (new ErrorContextAnalyser())->setLog($log)->analyse();

        $problems = $analysis->getFilteredInsights(ErrorContextProblem::class);
        $this->assertCount(3, $problems);

        $this->assertSame(10, $problems[0]->getEntryIndex());
        $this->assertSame(50, $problems[1]->getEntryIndex());
        $this->assertSame(95, $problems[2]->getEntryIndex());

        // First hit (entry 10): before clipped to 9 by the start of the log (1..9), 20 after (11..30).
        $this->assertCount(9, $problems[0]->getBefore());
        $this->assertCount(20, $problems[0]->getAfter());

        // Second hit (entry 50): before clipped to 19 (31..49) because entry 30 was already emitted, 20 after (51..70).
        $this->assertCount(19, $problems[1]->getBefore());
        $this->assertCount(20, $problems[1]->getAfter());

        // Third hit (entry 95): full 20 before (75..94), after clipped to 5 by the end of the log (96..100).
        $this->assertCount(20, $problems[2]->getBefore());
        $this->assertCount(5, $problems[2]->getAfter());

        // Total window per hit never exceeds 1 + CONTEXT_BEFORE + CONTEXT_AFTER = 41.
        foreach ($problems as $problem) {
            $this->assertLessThanOrEqual(ErrorContextAnalyser::CONTEXT_BEFORE, count($problem->getBefore()));
            $this->assertLessThanOrEqual(ErrorContextAnalyser::CONTEXT_AFTER, count($problem->getAfter()));
            $this->assertLessThanOrEqual(41, count($problem->getContext()));
        }

        // No entry appears in two problems' context arrays.
        $seen = [];
        foreach ($problems as $problem) {
            foreach ([...$problem->getBefore(), ...$problem->getAfter()] as $entry) {
                $id = spl_object_id($entry);
                $this->assertArrayNotHasKey($id, $seen, 'Entry duplicated across problem context arrays');
                $seen[$id] = true;
            }
        }
    }

    public function testMergesAdjacentWindowsWhenWithinContextRange(): void
    {
        // Errors 5 entries apart; without merge their windows would
        // overlap heavily.
        $log = $this->makeLog([10, 15], 50);
        $analysis = (new ErrorContextAnalyser())->setLog($log)->analyse();

        $problems = $analysis->getFilteredInsights(ErrorContextProblem::class);
        $this->assertCount(2, $problems);

        // First hit: 9 before (1..9), 20 after (11..30). lastEmittedIndex=29 (0-based).
        $this->assertCount(9, $problems[0]->getBefore());
        $this->assertCount(20, $problems[0]->getAfter());

        // Second hit at entry 15 (i=14). beforeStart clamped past i so before is empty.
        // afterStart=max(30, 15)=30, afterEnd=min(49, 34)=34, so after=entries 31..35
        // (5 entries, all unseen).
        $this->assertCount(0, $problems[1]->getBefore());
        $this->assertCount(5, $problems[1]->getAfter());

        // Confirm no entry appears in both problems' context arrays.
        $first = [...$problems[0]->getBefore(), ...$problems[0]->getAfter()];
        $second = [...$problems[1]->getBefore(), ...$problems[1]->getAfter()];
        foreach ($second as $entry) {
            $this->assertNotContains($entry, $first, 'Entry duplicated across merged windows');
        }
    }

    public function testTruncatesAtHitCap(): void
    {
        // 600 consecutive ERROR entries: the analyser should cap emission at
        // HIT_CAP and add exactly one truncation Information.
        $log = $this->makeLog(range(1, 600), 600);
        $analysis = (new ErrorContextAnalyser())->setLog($log)->analyse();

        $problems = $analysis->getFilteredInsights(ErrorContextProblem::class);
        $this->assertCount(ErrorContextAnalyser::HIT_CAP, $problems);

        $information = $analysis->getFilteredInsights(ErrorContextTruncatedInformation::class);
        $this->assertCount(1, $information);
        $this->assertSame(ErrorContextAnalyser::HIT_CAP, $information[0]->getHitCap());
    }
}
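Note on the window arithmetic: the counts asserted above (9/20, 19/20, 20/5 and the 0/5 merge case) follow from a simple clamp against the last emitted index. The sketch below is inferred from the test comments only; it is not the `ErrorContextAnalyser` implementation, `sketchContextWindows()` is a hypothetical helper, and the defaults of 20 assume `CONTEXT_BEFORE` and `CONTEXT_AFTER` are both 20, as the "= 41" comment implies.

```php
<?php

/**
 * Hypothetical helper, not part of the library: counts the context entries
 * each hit receives when already-emitted entries are never reused.
 *
 * @param int[] $hits 0-based indices of ERROR entries, in ascending order
 * @return array<int, array{hit: int, before: int, after: int}>
 */
function sketchContextWindows(array $hits, int $count, int $before = 20, int $after = 20): array
{
    $windows = [];
    $lastEmitted = -1; // highest 0-based index already emitted (hit or context)

    foreach ($hits as $i) {
        // Never reach back into entries a previous window already used.
        $beforeStart = max($i - $before, $lastEmitted + 1);
        $afterStart = max($i + 1, $lastEmitted + 1);
        // Never run past the end of the log.
        $afterEnd = min($count - 1, $i + $after);

        $windows[] = [
            'hit' => $i,
            'before' => max(0, $i - $beforeStart),
            'after' => max(0, $afterEnd - $afterStart + 1),
        ];
        $lastEmitted = max($lastEmitted, $afterEnd);
    }

    return $windows;
}

// Entries 10, 50 and 95 (1-based) in a 100-entry log.
foreach (sketchContextWindows([9, 49, 94], 100) as $w) {
    printf("hit %d: %d before, %d after\n", $w['hit'] + 1, $w['before'], $w['after']);
}
// hit 10: 9 before, 20 after
// hit 50: 19 before, 20 after
// hit 95: 20 before, 5 after
```

Calling `sketchContextWindows([9, 14], 50)` reproduces the merge case: the second hit gets 0 entries before and 5 after.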
@@ -6,18 +6,31 @@ use IndifferentKetchup\Codex\Detective\Detective;
 use IndifferentKetchup\Codex\Log\File\PathLogFile;
 use IndifferentKetchup\Codex\Log\Level;
 use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidServerLog;
+use PHPUnit\Framework\Attributes\DataProvider;
 use PHPUnit\Framework\TestCase;

 class ProjectZomboidServerLogTest extends TestCase
 {
-    private function fixturePath(): string
+    /**
+     * Both PZ B41 and B42 line shapes must parse identically. B41 (and the
+     * fixture used by every analyser test) emits `f:N, t:N, st:N,N,N,N>`;
+     * B42 (release branch from 2026-04 onward, e.g. build 42.17) drops the
+     * `t:` millisecond field entirely and tightens whitespace to
+     * `f:N st:N,N,N,N>`.
+     */
+    public static function fixtureProvider(): array
     {
-        return __DIR__ . '/../../../../src/Games/ProjectZomboid/fixtures/debug-server-minimal.txt';
+        $base = __DIR__ . '/../../../../src/Games/ProjectZomboid/fixtures';
+        return [
+            'pz41-format' => [$base . '/debug-server-minimal.txt'],
+            'pz42-format' => [$base . '/debug-server-42x-minimal.txt'],
+        ];
     }

-    public function testParsesEntriesWithLevelAndPrefix(): void
+    #[DataProvider('fixtureProvider')]
+    public function testParsesEntriesWithLevelAndPrefix(string $fixturePath): void
     {
-        $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($this->fixturePath()));
+        $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($fixturePath));
         $log->parse();

         $entries = $log->getEntries();
@@ -29,9 +42,10 @@ class ProjectZomboidServerLogTest extends TestCase
         $this->assertNotNull($first->getTime());
     }

-    public function testStackTraceLinesAttachToTriggeringErrorEntry(): void
+    #[DataProvider('fixtureProvider')]
+    public function testStackTraceLinesAttachToTriggeringErrorEntry(string $fixturePath): void
     {
-        $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($this->fixturePath()));
+        $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($fixturePath));
         $log->parse();

         $errorEntry = null;
@@ -46,19 +60,21 @@ class ProjectZomboidServerLogTest extends TestCase
         $this->assertGreaterThan(1, count($errorEntry->getLines()));
     }

-    public function testWarnLevelMapsCorrectly(): void
+    #[DataProvider('fixtureProvider')]
+    public function testWarnLevelMapsCorrectly(string $fixturePath): void
     {
-        $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($this->fixturePath()));
+        $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($fixturePath));
         $log->parse();

         $warnEntries = array_filter($log->getEntries(), fn($e) => $e->getLevel() === Level::WARNING);
         $this->assertNotEmpty($warnEntries);
     }

-    public function testDetectiveDispatchesByContent(): void
+    #[DataProvider('fixtureProvider')]
+    public function testDetectiveDispatchesByContent(string $fixturePath): void
     {
         $detective = (new Detective())
-            ->setLogFile(new PathLogFile($this->fixturePath()))
+            ->setLogFile(new PathLogFile($fixturePath))
             ->addPossibleLogClass(ProjectZomboidServerLog::class);

         $log = $detective->detect();
146
test/tests/Util/Redactor/ProjectZomboidRedactorCombinedTest.php
Normal file
@@ -0,0 +1,146 @@
<?php

namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;

use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
use PHPUnit\Framework\TestCase;

class ProjectZomboidRedactorCombinedTest extends TestCase
{
    public function testFullScrubAllTogglesOn(): void
    {
        // Realistic multi-line input touching all three PII categories:
        // Steam IDs, player names in multiple contexts (after Steam ID, in ChatMessage,
        // after Combat:/Safety:), and coordinates in multiple shapes (at clause,
        // bracketed, parenthesised before PvP verb).
        $input = implode("\n", [
            // cmd.txt / admin.txt: Steam ID + quoted name + at-clause coords (keyword " at ")
            '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
            // map.txt: Steam ID + quoted name + at-clause float coords
            '[16-04-26 12:00:01.000] 76561198222222222 "Player2" added IsoObject (fence_01) at 1050.0,2050.0,0.0.',
            // chat.txt: ChatMessage author
            "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hello'}.",
            // pvp.txt Combat: name + attacker parenthesised coords before "hit" (victim name and coords survive in v1)
            '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.',
            // pvp.txt Safety: name + parenthesised coords before "restore"
            '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.',
            // ClientActionLog: bracketed Steam ID + action + name + coords bracket (the bracketed name is not covered and survives)
            '[16-04-26 12:00:02.000] [76561198333333333][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].',
        ]);

        $expected = implode("\n", [
            '[16-04-26 12:00:00.000] 76561198000000000 "<player>" added Base.Aerosolbomb at 0,0,0.',
            '[16-04-26 12:00:01.000] 76561198000000000 "<player>" added IsoObject (fence_01) at 0,0,0.',
            "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='<player>', text='hello'}.",
            '[16-04-26 17:14:35.128][INFO] Combat: "<player>" (0,0,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.',
            '[16-04-26 16:17:49.731][LOG] Safety: "<player>" (0,0,0) restore true.',
            '[16-04-26 12:00:02.000] [76561198000000000][ISEnterVehicle][Player2][0,0,0][Van_LectroMax].',
        ]);

        $output = (new ProjectZomboidRedactor())->redact($input);

        $this->assertSame($expected, $output, 'With all three toggles on, every Steam ID and every supported player-name context and coord shape must be replaced.');
    }

    public function testSteamIdToggleOffLeavesSteamIdsIntact(): void
    {
        // All three PII categories present; Steam ID toggle is disabled.
        //
        // Important nuance: PLAYER_AFTER_STEAMID_REGEX anchors on the redacted placeholder
        // 76561198000000000. With redactSteamIds(false) the raw Steam ID survives, so the
        // regex does NOT fire for lines in the "after-Steam-ID" shape, and those names
        // survive too. Names anchored by other contexts (ChatMessage author, Combat:/Safety:)
        // are still redacted because those regexes don't depend on the Steam ID pass.
        $input = implode("\n", [
            // after-Steam-ID shape: name will NOT be redacted because the Steam ID is raw
            '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
            // ChatMessage author: still redacted (anchor is independent of Steam ID pass)
            "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hello'}.",
            // Combat: name + attacker coords
            '[16-04-26 17:14:35.128][INFO] Combat: "Player2" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Pipe Bomb" damage=1.0.',
        ]);

        $expected = implode("\n", [
            // Steam ID intact; "Player1" NOT redacted (anchor regex didn't fire); at-clause coords → redacted
            '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 0,0,0.',
            // ChatMessage author redacted
            "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='<player>', text='hello'}.",
            // Combat: name + attacker coords both redacted
            '[16-04-26 17:14:35.128][INFO] Combat: "<player>" (0,0,0) hit "Player1" (1006,2005,0) weapon="Pipe Bomb" damage=1.0.',
        ]);

        $output = (new ProjectZomboidRedactor())
            ->redactSteamIds(false)
            ->redact($input);

        $this->assertSame(
            $expected,
            $output,
            'With Steam ID toggle off: raw Steam IDs survive; PLAYER_AFTER_STEAMID_REGEX does not fire (no placeholder to anchor on) so those names also survive; ChatMessage and Combat:/Safety: names are still redacted; coords are still redacted.',
        );
    }

    public function testPlayerNameToggleOffLeavesNamesIntact(): void
    {
        // Steam IDs and coords redact; player names survive verbatim.
        $input = implode("\n", [
            '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
            "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player2', text='bye'}.",
            '[16-04-26 16:17:49.731][LOG] Safety: "AdminUser" (1050,2050,0) restore true.',
        ]);

        $expected = implode("\n", [
            '[16-04-26 12:00:00.000] 76561198000000000 "Player1" added Base.Aerosolbomb at 0,0,0.',
            "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player2', text='bye'}.",
            '[16-04-26 16:17:49.731][LOG] Safety: "AdminUser" (0,0,0) restore true.',
        ]);

        $output = (new ProjectZomboidRedactor())
            ->redactPlayerNames(false)
            ->redact($input);

        $this->assertSame($expected, $output, 'With player-name toggle off, all player names must survive; Steam IDs and coords must still be redacted.');
    }

    public function testCoordinatesToggleOffLeavesCoordsIntact(): void
    {
        // Steam IDs and player names redact; coordinates survive verbatim.
        $input = implode("\n", [
            '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
            '[16-04-26 12:00:01.000] [76561198222222222][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].',
            '[16-04-26 17:14:35.128][INFO] Combat: "AdminUser" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Baseball Bat" damage=0.5.',
        ]);

        $expected = implode("\n", [
            '[16-04-26 12:00:00.000] 76561198000000000 "<player>" added Base.Aerosolbomb at 1000,2000,0.',
            '[16-04-26 12:00:01.000] [76561198000000000][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].',
            '[16-04-26 17:14:35.128][INFO] Combat: "<player>" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Baseball Bat" damage=0.5.',
        ]);

        $output = (new ProjectZomboidRedactor())
            ->redactCoordinates(false)
            ->redact($input);

        $this->assertSame($expected, $output, 'With coordinates toggle off, all coord triplets must survive; Steam IDs and player names must still be redacted.');
    }

    public function testAllTogglesOffReturnsInputByteForByte(): void
    {
        // Disabling the Steam ID, player-name and coordinate toggles must leave this input
        // untouched (it contains no IP addresses, so the default-on IP pass matches nothing).
        $input = implode("\n", [
            '[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
            "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player2', text='hello'}.",
            '[16-04-26 17:14:35.128][INFO] Combat: "AdminUser" (1005,2005,0) hit "Player1" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.',
            '[16-04-26 12:00:01.000] [76561198333333333][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].',
        ]);

        $output = (new ProjectZomboidRedactor())
            ->redactSteamIds(false)
            ->redactPlayerNames(false)
            ->redactCoordinates(false)
            ->redact($input);

        $this->assertSame($input, $output, 'With the Steam ID, player-name and coordinate toggles disabled and no IP addresses in the input, the output must be byte-for-byte identical to the input.');
    }
}
@@ -0,0 +1,124 @@
<?php

namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;

use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
use PHPUnit\Framework\TestCase;

class ProjectZomboidRedactorCoordinatesTest extends TestCase
{
    public function testRedactsAtClauseCoords(): void
    {
        // map.txt / item.txt shape: integer coords following " at " with trailing dot.
        $input = '[16-04-26 12:00:00.000] 76561198000000001 "Player1" added Base.Aerosolbomb at 1000,2000,0.';
        $expected = '[16-04-26 12:00:00.000] 76561198000000001 "Player1" added Base.Aerosolbomb at 0,0,0.';

        $output = (new ProjectZomboidRedactor())
            ->redactSteamIds(false)
            ->redactPlayerNames(false)
            ->redact($input);

        $this->assertSame($expected, $output, 'Integer coords following " at " must be replaced; leading "at " and trailing "." must be preserved.');
    }

    public function testRedactsAtClauseFloatCoords(): void
    {
        // map.txt shape: IsoObject form with float coords (x.x,y.y,z.z).
        $input = '[16-04-26 12:00:01.000] 76561198000000001 "Player1" added IsoObject (fencing_damaged_01_124) at 1010.0,2010.0,0.0.';
        $expected = '[16-04-26 12:00:01.000] 76561198000000001 "Player1" added IsoObject (fencing_damaged_01_124) at 0,0,0.';

        $output = (new ProjectZomboidRedactor())
            ->redactSteamIds(false)
            ->redactPlayerNames(false)
            ->redact($input);

        $this->assertSame($expected, $output, 'Float coords following " at " must be replaced; the IsoObject parenthesised form must be unaffected.');
    }

    public function testRedactsBracketedCoords(): void
    {
        // ClientActionLog.txt shape: strict 5-field bracketed structure.
        // The Steam ID bracket and action/player/param brackets must survive.
        $input = '[16-04-26 12:00:02.000] [76561198000000001][ISEnterVehicle][Player1][1000,2000,0][Van_LectroMax].';
        $expected = '[16-04-26 12:00:02.000] [76561198000000001][ISEnterVehicle][Player1][0,0,0][Van_LectroMax].';

        $output = (new ProjectZomboidRedactor())
            ->redactSteamIds(false)
            ->redactPlayerNames(false)
            ->redact($input);

        $this->assertSame($expected, $output, 'Coord bracket must become [0,0,0]; Steam ID, action, player name, and param brackets must be unaffected.');
    }

    public function testRedactsBracketedNegativeZ(): void
    {
        // Basement Z coordinates are negative; the regex must handle the leading minus.
        $input = '[16-04-26 12:00:03.000] [76561198000000001][ISEnterVehicle][Player1][1020,2020,-1][Van_LectroMax].';
        $expected = '[16-04-26 12:00:03.000] [76561198000000001][ISEnterVehicle][Player1][0,0,0][Van_LectroMax].';

        $output = (new ProjectZomboidRedactor())
            ->redactSteamIds(false)
            ->redactPlayerNames(false)
            ->redact($input);

        $this->assertSame($expected, $output, 'Negative Z (basement level) inside square brackets must be replaced.');
    }

    public function testRedactsParenthesisedCoordsBeforeHit(): void
    {
        // pvp.txt Combat: shape. The attacker coords are followed by ") hit" and ARE
        // redacted. The victim coords are followed by ") weapon=" and are NOT redacted
        // in v1: the trailing-keyword anchor is intentionally absent for that position.
        $input = '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.';
        $expected = '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (0,0,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.';

        $output = (new ProjectZomboidRedactor())
            ->redactSteamIds(false)
            ->redactPlayerNames(false)
            ->redact($input);

        // Attacker coords (before "hit") are redacted; victim coords (before "weapon=") are NOT (deferred to v2).
        $this->assertSame($expected, $output, 'Attacker coords before "hit" must be replaced; victim coords without a trailing keyword must survive.');
    }

    public function testRedactsParenthesisedCoordsBeforeSafetyVerb(): void
    {
        // pvp.txt Safety: shape; coords followed by ") restore true".
        $input = '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.';
        $expected = '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (0,0,0) restore true.';

        $output = (new ProjectZomboidRedactor())
            ->redactSteamIds(false)
            ->redactPlayerNames(false)
            ->redact($input);

        $this->assertSame($expected, $output, 'Coords followed by ") restore" must be replaced.');
    }

    public function testServerMetadataTriplesAreNotRedacted(): void
    {
        // DebugLog-server.txt entries contain server-state metadata that superficially
        // resembles coordinates but is not: "st:48,648,157,584" is a 4-component token,
        // "t:1776297642406" is a millisecond timestamp. Neither pattern lives inside
        // brackets, parentheses followed by a PvP verb, or after " at ", so none of
        // the three coordinate regexes should fire.
        $input = '[16-04-26 00:01:19.080] ERROR: General f:0, t:1776297642406, st:48,648,157,584> Server starting up.';

        $output = (new ProjectZomboidRedactor())->redact($input);

        $this->assertSame($input, $output, 'Server metadata triples (st:) and millisecond timestamps (t:) must pass through unchanged.');
    }

    public function testToggleOffLeavesCoordsIntact(): void
    {
        $input = '[16-04-26 12:00:04.000] 76561198000000001 "Player1" added Base.Aerosolbomb at 1000,2000,0.';

        $output = (new ProjectZomboidRedactor())
            ->redactSteamIds(false)
            ->redactPlayerNames(false)
            ->redactCoordinates(false)
            ->redact($input);

        $this->assertSame($input, $output, 'With the coordinates toggle disabled the original input must be returned unchanged.');
    }
}
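Note on the coordinate patterns: the three `COORDS_*_REGEX` constants can be exercised on their own, without the redactor class. The sketch below copies the patterns verbatim from this diff; the `0,0,0` replacement is an assumption inferred from the expected strings in the tests (the `COORDS_REPLACEMENT` constant itself is not shown here), and `sketchRedactCoords()` is a hypothetical helper name.

```php
<?php

// Patterns copied verbatim from the COORDS_*_REGEX constants in this diff.
const SKETCH_COORD_PATTERNS = [
    '/(?<= at )(?<x>[\d.]+),(?<y>[\d.]+),(?<z>-?[\d.]+)(?=\.)/u',
    '/(?<=\[)(?<x>\d+),(?<y>\d+),(?<z>-?\d+)(?=\])/u',
    '/(?<=\()(?<x>\d+),(?<y>\d+),(?<z>-?\d+)(?=\) (?:hit|restore|store|true|false))/u',
];

// Assumed replacement token, inferred from the expected strings in the tests.
const SKETCH_COORDS_REPLACEMENT = '0,0,0';

function sketchRedactCoords(string $content): string
{
    // preg_replace() accepts an array of patterns and applies them in order.
    return preg_replace(SKETCH_COORD_PATTERNS, SKETCH_COORDS_REPLACEMENT, $content);
}

$line = 'Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)".';
$once = sketchRedactCoords($line);
$twice = sketchRedactCoords($once);

echo $once, "\n";
// Combat: "Player1" (0,0,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)".

var_dump($once === $twice);
// bool(true)
```

On the second pass `(0,0,0) hit` still matches the parenthesised pattern, but it is rewritten to the same token, so the result is stable.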
@@ -0,0 +1,99 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
/**
|
||||
* Verifies the idempotence property of ProjectZomboidRedactor::redact().
|
||||
*
|
||||
* Idempotence: redact(redact(x)) === redact(x) for all valid inputs.
|
||||
*
|
||||
* A downstream consumer might accidentally double-pipe content through the
|
||||
* Redactor. The result must be stable — a second pass must make no further
|
||||
* changes. If a regex were poorly anchored such that the post-redact placeholder
|
||||
* itself matched and was re-redacted to something different, idempotence would
|
||||
* fail. Specifically, the player-name regex PLAYER_AFTER_STEAMID_REGEX anchors
|
||||
* on 76561198000000000 — the same value the Steam ID pass writes. This test
|
||||
* suite verifies that applying redact() twice is safe: on the second pass, names
|
||||
* already written as <player> do not accidentally re-match and produce a doubly-
|
||||
* nested result like "<player>" → something else.
|
||||
*/
|
||||
class ProjectZomboidRedactorIdempotenceTest extends TestCase
|
||||
{
|
||||
public function testIdempotenceSteamIdOnly(): void
|
||||
{
|
||||
$input = implode("\n", [
|
||||
'Players: 76561198111111111, 76561198222222222, 76561198333333333 connected.',
|
||||
'[16-04-26 12:00:00.000] [76561198111111111][ISEnterVehicle][Player1][1000,2000,0][Van_LectroMax].',
|
||||
]);
|
||||
|
||||
$redactor = new ProjectZomboidRedactor();
|
||||
$redacted = $redactor->redact($input);
|
||||
$redactedAgain = $redactor->redact($redacted);
|
||||
|
||||
$this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to Steam-ID-only input must produce the same result as applying it once.');
|
||||
}
|
||||
|
||||
public function testIdempotencePlayerNamesOnly(): void
|
||||
{
|
||||
// The input already carries the Steam ID placeholder (as the Steam ID pass
// would have written it), so PLAYER_AFTER_STEAMID_REGEX can fire. The first
// pass rewrites the name, leaving: 76561198000000000 "<player>".
// On the second pass "<player>" (with the angle brackets) still satisfies
// [^"]+ after the placeholder anchor, so the regex can match again. Because
// the replacement is the fixed literal "<player>", the second pass must
// still produce an identical result.
|
||||
$input = implode("\n", [
|
||||
'76561198000000000 "Player1" ISLogSystem.writeLog @ 1000,2000,0.',
|
||||
"[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hi'}.",
|
||||
'[16-04-26 16:17:49.731][LOG] Safety: "Player2" (1000,2000,0) restore true.',
|
||||
]);
|
||||
|
||||
$redactor = (new ProjectZomboidRedactor())->redactSteamIds(false)->redactCoordinates(false);
|
||||
$redacted = $redactor->redact($input);
|
||||
$redactedAgain = $redactor->redact($redacted);
|
||||
|
||||
$this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to player-name-only input must produce the same result as applying it once.');
|
||||
}
|
||||
|
||||
public function testIdempotenceCoordsOnly(): void
|
||||
{
|
||||
$input = implode("\n", [
|
||||
'[16-04-26 12:00:00.000] 76561198000000001 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
|
||||
'[16-04-26 12:00:01.000] [76561198000000001][ISEnterVehicle][Player1][1020,2020,-1][Van_LectroMax].',
|
||||
'[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.',
|
||||
'[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.',
|
||||
]);
|
||||
|
||||
$redactor = (new ProjectZomboidRedactor())->redactSteamIds(false)->redactPlayerNames(false);
|
||||
$redacted = $redactor->redact($input);
|
||||
$redactedAgain = $redactor->redact($redacted);
|
||||
|
||||
$this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to coords-only input must produce the same result as applying it once; the placeholder 0,0,0 must not be re-matched.');
|
||||
}
|
||||
|
||||
public function testIdempotenceAllCategories(): void
|
||||
{
|
||||
// Full input: all three PII categories in multiple lexical contexts.
|
||||
// After the first redact(), every placeholder is in place. The second
|
||||
// redact() must make no further changes.
|
||||
$input = implode("\n", [
|
||||
'[16-04-26 12:00:00.000] 76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.',
|
||||
'[16-04-26 12:00:01.000] 76561198222222222 "Player2" teleported to 1050,2050,0.',
|
||||
"[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='AdminUser', text='hello'}.",
|
||||
'[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.',
|
||||
'[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.',
|
||||
'[16-04-26 12:00:02.000] [76561198333333333][ISEnterVehicle][Player2][1020,2020,0][Van_LectroMax].',
|
||||
]);
|
||||
|
||||
$redactor = new ProjectZomboidRedactor();
|
||||
$redacted = $redactor->redact($input);
|
||||
$redactedAgain = $redactor->redact($redacted);
|
||||
|
||||
$this->assertSame($redacted, $redactedAgain, 'Applying redact() twice to input with all PII categories must produce the same result as applying it once; no placeholder must re-match on the second pass.');
|
||||
}
|
||||
}
|
||||
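The idempotence property asserted by the tests above can be illustrated outside PHP. The Python sketch below is not the library's implementation: the function names and both regexes are assumptions made for illustration only. It contrasts a rewrite that emits a fixed placeholder, which is stable under a second pass, with a hypothetical rewrite that wraps the captured name, which is not.

```python
import re

# Assumed patterns for illustration; the real constants live in the PHP class.
STEAM_ID = re.compile(r"(?<![A-Za-z0-9])76561198\d{9}(?![A-Za-z0-9])")
PLACEHOLDER_ID = "76561198000000000"


def redact_fixed(text: str) -> str:
    """Replace Steam IDs and the anchored quoted name with fixed literals."""
    text = STEAM_ID.sub(PLACEHOLDER_ID, text)
    return re.sub(PLACEHOLDER_ID + r' "[^"]+"', PLACEHOLDER_ID + ' "<player>"', text)


def redact_wrapping(text: str) -> str:
    """Hypothetical buggy variant: wraps the captured name instead of replacing it."""
    return re.sub(r'"([^"]+)"', r'"<\1>"', text)


def is_idempotent(redact, text: str) -> bool:
    """True when a second pass leaves the first pass's output unchanged."""
    once = redact(text)
    return redact(once) == once


line = '76561198111111111 "Player1" added Base.Aerosolbomb at 1000,2000,0.'
print(is_idempotent(redact_fixed, line))     # fixed placeholder: second pass is a no-op
print(is_idempotent(redact_wrapping, line))  # wrapped capture: nests further on each pass
```

The fixed-placeholder variant may well re-match its own output, but it rewrites it to the same text, which is all the property requires.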
@@ -0,0 +1,272 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Log\File\PathLogFile;
|
||||
use IndifferentKetchup\Codex\Log\File\StringLogFile;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidAdminLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidBurdJournalsLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidChatLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidClientActionLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidCmdLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidItemLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidMapLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidPerkLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidPvpLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidServerLog;
|
||||
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidUserLog;
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\Attributes\DataProvider;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
/**
|
||||
* Integration tests: drive all 11 existing PZ fixtures through ProjectZomboidRedactor
|
||||
* and verify that the output is well-formed.
|
||||
*
|
||||
* Three properties are checked across all fixtures:
|
||||
*
|
||||
* 1. Steam ID normalisation — no non-zero-placeholder Steam IDs survive.
|
||||
* 2. Structural preservation — parsing the redacted content yields the same
|
||||
* entry count as parsing the original.
|
||||
* 3. Idempotence — applying redact() a second time produces no further changes.
|
||||
*
|
||||
* Known v1 limitations documented inline:
|
||||
*
|
||||
* - pvp.txt: victim names after `hit "..."` are NOT redacted (Task 3 limitation).
|
||||
* Player2 can therefore still appear after `hit` in the redacted pvp output.
|
||||
* - pvp.txt: victim coords after `hit "..." (x,y,z)` are NOT redacted (Task 4
|
||||
* limitation). COORDS_PARENTHESISED_REGEX anchors on the trailing PvP verb
|
||||
* which is present only for the attacker bracket.
|
||||
* - admin.txt: `teleported X to <x,y,z>` coords survive because COORDS_AT_CLAUSE_REGEX
|
||||
* anchors on ` at `, not ` to `.
|
||||
*/
|
||||
class ProjectZomboidRedactorIntegrationTest extends TestCase
|
||||
{
|
||||
private static string $fixturesDir = __DIR__ . '/../../../src/Games/ProjectZomboid/fixtures';
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Data providers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* Yields [fixturePath] for every PZ fixture file.
|
||||
*/
|
||||
public static function fixturePathProvider(): array
|
||||
{
|
||||
$dir = self::$fixturesDir;
|
||||
return [
|
||||
'admin' => [$dir . '/admin-minimal.txt'],
|
||||
'burd-journals' => [$dir . '/burd-journals-minimal.txt'],
|
||||
'chat' => [$dir . '/chat-minimal.txt'],
|
||||
'client-action' => [$dir . '/client-action-minimal.txt'],
|
||||
'cmd' => [$dir . '/cmd-minimal.txt'],
|
||||
'debug-server' => [$dir . '/debug-server-minimal.txt'],
|
||||
'item' => [$dir . '/item-minimal.txt'],
|
||||
'map' => [$dir . '/map-minimal.txt'],
|
||||
'perk' => [$dir . '/perk-minimal.txt'],
|
||||
'pvp' => [$dir . '/pvp-minimal.txt'],
|
||||
'user' => [$dir . '/user-minimal.txt'],
|
||||
];
|
||||
}
|
||||
|
||||
/**
|
||||
* Yields [fixturePath] for the subset of fixtures where every synthetic
|
||||
* player name (Player1 / Player2 / AdminUser / PlayerSuspect) appears
|
||||
* exclusively in a context the redactor recognises:
|
||||
*
|
||||
* - chat: ChatMessage{author='...'} envelope
|
||||
* - cmd, item, map, user: 17-digit Steam ID followed by a "..." quoted name
|
||||
*
|
||||
* Fixtures intentionally excluded:
|
||||
*
|
||||
* - admin: names appear in free-text positions (no Steam-ID anchor,
|
||||
* no quotes, no Combat:/Safety: prefix). Names survive in v1.
|
||||
* - client-action,
|
||||
* perk: names appear inside [...] brackets, not "..." quotes.
|
||||
* PLAYER_AFTER_STEAMID_REGEX requires double-quotes.
|
||||
* - pvp: attacker name redacts but victim name after `hit "..."`
|
||||
* survives in v1 (Task 3 limitation).
|
||||
* - burd-journals,
|
||||
* debug-server: no synthetic player names present.
|
||||
*/
|
||||
public static function fixturesWhereAllNamesAreInCoveredContextsProvider(): array
|
||||
{
|
||||
$dir = self::$fixturesDir;
|
||||
return [
|
||||
'chat' => [$dir . '/chat-minimal.txt'],
|
||||
'cmd' => [$dir . '/cmd-minimal.txt'],
|
||||
'item' => [$dir . '/item-minimal.txt'],
|
||||
'map' => [$dir . '/map-minimal.txt'],
|
||||
'user' => [$dir . '/user-minimal.txt'],
|
||||
];
|
||||
}
|
||||
|
||||
/**
|
||||
* Yields [fixturePath, logClass] for the fixtures whose log class parses
|
||||
* them. All 11 fixtures are represented.
|
||||
*/
|
||||
public static function fixtureWithLogClassProvider(): array
|
||||
{
|
||||
$dir = self::$fixturesDir;
|
||||
return [
|
||||
'admin' => [$dir . '/admin-minimal.txt', ProjectZomboidAdminLog::class],
|
||||
'burd-journals' => [$dir . '/burd-journals-minimal.txt', ProjectZomboidBurdJournalsLog::class],
|
||||
'chat' => [$dir . '/chat-minimal.txt', ProjectZomboidChatLog::class],
|
||||
'client-action' => [$dir . '/client-action-minimal.txt', ProjectZomboidClientActionLog::class],
|
||||
'cmd' => [$dir . '/cmd-minimal.txt', ProjectZomboidCmdLog::class],
|
||||
'debug-server' => [$dir . '/debug-server-minimal.txt', ProjectZomboidServerLog::class],
|
||||
'item' => [$dir . '/item-minimal.txt', ProjectZomboidItemLog::class],
|
||||
'map' => [$dir . '/map-minimal.txt', ProjectZomboidMapLog::class],
|
||||
'perk' => [$dir . '/perk-minimal.txt', ProjectZomboidPerkLog::class],
|
||||
'pvp' => [$dir . '/pvp-minimal.txt', ProjectZomboidPvpLog::class],
|
||||
'user' => [$dir . '/user-minimal.txt', ProjectZomboidUserLog::class],
|
||||
];
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Helper
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
private function redact(string $content): string
|
||||
{
|
||||
return (new ProjectZomboidRedactor())->redact($content);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 1 — Steam ID normalisation
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* After redaction every 17-digit Steam ID that is NOT the zero-placeholder
|
||||
* must be gone. The zero-placeholder itself (76561198000000000) is the only
|
||||
* Steam ID that may remain.
|
||||
*/
|
||||
#[DataProvider('fixturePathProvider')]
|
||||
public function testFixtureContainsNoSteamIdsAfterRedaction(string $fixturePath): void
|
||||
{
|
||||
$content = (new PathLogFile($fixturePath))->getContent();
|
||||
$redacted = $this->redact($content);
|
||||
|
||||
$matches = preg_match_all('/(?<![A-Za-z0-9])76561198(?!000000000)\d{9}(?![A-Za-z0-9])/u', $redacted);
|
||||
|
||||
$this->assertSame(
|
||||
0,
|
||||
$matches,
|
||||
sprintf(
|
||||
'After redaction, fixture "%s" must contain no non-zero-placeholder Steam IDs, but %d were found.',
|
||||
basename($fixturePath),
|
||||
$matches,
|
||||
),
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 2 — Structural preservation (re-parse after redaction)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* The redacted content, fed back through the corresponding parser, must
|
||||
* produce exactly the same number of log entries as the original content.
|
||||
*
|
||||
* This asserts that the redactor does not corrupt timestamps, delimiters,
|
||||
* or structural tokens that the parser relies on.
|
||||
*
|
||||
* @param string $fixturePath Path to the fixture file.
|
||||
* @param class-string<\IndifferentKetchup\Codex\Log\Log> $logClass
|
||||
* Fully-qualified name of the Log subclass that corresponds to this fixture.
|
||||
*/
|
||||
#[DataProvider('fixtureWithLogClassProvider')]
|
||||
public function testFixtureRedactedOutputParsesToSameEntryCount(string $fixturePath, string $logClass): void
|
||||
{
|
||||
$content = (new PathLogFile($fixturePath))->getContent();
|
||||
|
||||
/** @var \IndifferentKetchup\Codex\Log\Log $originalLog */
|
||||
$originalLog = (new $logClass())->setLogFile(new PathLogFile($fixturePath));
|
||||
$originalLog->parse();
|
||||
$originalCount = count($originalLog->getEntries());
|
||||
|
||||
$redacted = $this->redact($content);
|
||||
|
||||
/** @var \IndifferentKetchup\Codex\Log\Log $redactedLog */
|
||||
$redactedLog = (new $logClass())->setLogFile(new StringLogFile($redacted));
|
||||
$redactedLog->parse();
|
||||
$redactedCount = count($redactedLog->getEntries());
|
||||
|
||||
$this->assertSame(
|
||||
$originalCount,
|
||||
$redactedCount,
|
||||
sprintf(
|
||||
'Parsing the redacted "%s" fixture with %s must yield the same entry count (%d) as parsing the original, but got %d.',
|
||||
basename($fixturePath),
|
||||
$logClass,
|
||||
$originalCount,
|
||||
$redactedCount,
|
||||
),
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 3 — Idempotence
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* Applying redact() a second time must produce no further changes:
|
||||
* redact(redact(content)) === redact(content).
|
||||
*
|
||||
* This guards against poorly-anchored regexes that would re-match the
|
||||
* redaction placeholders themselves on a second pass.
|
||||
*/
|
||||
#[DataProvider('fixturePathProvider')]
|
||||
public function testFixtureIsIdempotent(string $fixturePath): void
|
||||
{
|
||||
$content = (new PathLogFile($fixturePath))->getContent();
|
||||
|
||||
$redactor = new ProjectZomboidRedactor();
|
||||
$once = $redactor->redact($content);
|
||||
$twice = $redactor->redact($once);
|
||||
|
||||
$this->assertSame(
|
||||
$once,
|
||||
$twice,
|
||||
sprintf(
|
||||
'redact(redact(content)) must equal redact(content) for fixture "%s"; a second pass must be a no-op.',
|
||||
basename($fixturePath),
|
||||
),
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test 4 — Player-name collapse in fully-covered fixtures
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* For fixtures where every synthetic player name appears exclusively in a
|
||||
* context the redactor recognises, no synthetic name should remain after
|
||||
* redaction.
|
||||
*
|
||||
* This addresses observation #3 from the final code review (the integration
|
||||
* tests previously asserted Steam-ID elimination + structural preservation
|
||||
* + idempotence, but did not directly verify name collapse). The unit tests
|
||||
* in ProjectZomboidRedactorPlayerNameTest cover this property exhaustively
|
||||
* per-context; this integration test re-verifies it end-to-end against the
|
||||
* fixtures that ride into iblogs.
|
||||
*/
|
||||
#[DataProvider('fixturesWhereAllNamesAreInCoveredContextsProvider')]
|
||||
public function testFixturePlayerNamesCollapseInCoveredContexts(string $fixturePath): void
|
||||
{
|
||||
$content = (new PathLogFile($fixturePath))->getContent();
|
||||
$redacted = $this->redact($content);
|
||||
|
||||
foreach (['Player1', 'Player2', 'AdminUser', 'PlayerSuspect'] as $name) {
|
||||
$this->assertStringNotContainsString(
|
||||
$name,
|
||||
$redacted,
|
||||
sprintf(
|
||||
'Fixture "%s": synthetic name %s survived redaction. Every name in this fixture should appear only in a covered lexical context.',
|
||||
basename($fixturePath),
|
||||
$name,
|
||||
),
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
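The Steam ID survivor check in Test 1 relies on a negative lookahead that excludes the zero placeholder. The Python sketch below mirrors the pattern quoted in that test; the helper name `count_surviving_steam_ids` is hypothetical and exists only for this illustration.

```python
import re

# Mirrors the pattern used in testFixtureContainsNoSteamIdsAfterRedaction:
# a 17-digit ID starting with 76561198, not embedded in a longer alphanumeric
# token, and not equal to the zero placeholder 76561198000000000.
SURVIVING_STEAM_ID = re.compile(
    r"(?<![A-Za-z0-9])76561198(?!000000000)\d{9}(?![A-Za-z0-9])"
)


def count_surviving_steam_ids(redacted: str) -> int:
    """Count Steam IDs that are still identifiable after redaction."""
    return len(SURVIVING_STEAM_ID.findall(redacted))


print(count_surviving_steam_ids("Player 76561198000000000 joined."))  # placeholder only
print(count_surviving_steam_ids("Player 76561198111111111 joined."))  # one real ID left
```

A count of zero is what the integration test expects for every fixture.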
114
test/tests/Util/Redactor/ProjectZomboidRedactorIpv4Test.php
Normal file
@@ -0,0 +1,114 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ProjectZomboidRedactorIpv4Test extends TestCase
|
||||
{
|
||||
public function testRedactsBareIpv4(): void
|
||||
{
|
||||
$input = 'Connection from 192.168.1.1 closed.';
|
||||
$expected = 'Connection from [REDACTED_IP] closed.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsIpv4WithPortSuffix(): void
|
||||
{
|
||||
$input = 'Connected to 10.0.0.42:27015.';
|
||||
$expected = 'Connected to [REDACTED_IP].';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsMultipleIpv4OnOneLine(): void
|
||||
{
|
||||
$input = 'Peer 192.168.1.10 -> 192.168.1.20 via 10.0.0.1:8080.';
|
||||
$expected = 'Peer [REDACTED_IP] -> [REDACTED_IP] via [REDACTED_IP].';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsLoopbackAndBoundaryAddresses(): void
|
||||
{
|
||||
$input = implode("\n", [
|
||||
'127.0.0.1',
|
||||
'0.0.0.0',
|
||||
'255.255.255.255',
|
||||
]);
|
||||
$expected = implode("\n", [
|
||||
'[REDACTED_IP]',
|
||||
'[REDACTED_IP]',
|
||||
'[REDACTED_IP]',
|
||||
]);
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactOutOfRangeOctets(): void
|
||||
{
|
||||
// 999 is not a valid octet under the 0-255 alternation; the address
|
||||
// must therefore be left untouched.
|
||||
$input = 'Bogus: 999.999.999.999';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactInsideLongerDottedSequence(): void
|
||||
{
|
||||
// Five dotted segments are not an IPv4 address; the lookarounds must
|
||||
// reject any partial match inside the longer sequence.
|
||||
$input = 'Path frag 1.2.3.4.5 should not match.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactThreeSegmentBuildNumbers(): void
|
||||
{
|
||||
// PZ build numbers are 3-segment (e.g. 41.78.16) and must not match.
|
||||
$input = 'Build 41.78.16 starting up.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testToggleOffLeavesIpv4Intact(): void
|
||||
{
|
||||
$input = 'Connection from 192.168.1.1:27015 closed.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactIpAddresses(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testIdempotence(): void
|
||||
{
|
||||
$input = implode("\n", [
|
||||
'Connection from 192.168.1.1:27015 closed.',
|
||||
'Peer 10.0.0.42 -> 10.0.0.43 via 172.16.0.1:8080.',
|
||||
]);
|
||||
|
||||
$redactor = new ProjectZomboidRedactor();
|
||||
$once = $redactor->redact($input);
|
||||
$twice = $redactor->redact($once);
|
||||
|
||||
$this->assertSame($once, $twice);
|
||||
}
|
||||
}
|
||||
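The behaviours exercised by the IPv4 tests above (strict 0-255 octets, an optional `:port` suffix, and rejection of longer dotted sequences) can be sketched in Python as follows. This is an assumption-laden illustration, not a port of `IPV4_REGEX`: the pattern, the `OCTET` fragment, and the function name are all hypothetical.

```python
import re

PLACEHOLDER = "[REDACTED_IP]"

# One octet in the range 0-255 (assumed alternation, written for this sketch).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

IPV4 = re.compile(
    r"(?<![0-9.])"                        # not preceded by a digit or dot
    + OCTET + r"(?:\." + OCTET + r"){3}"  # exactly four dotted octets
    + r"(?::[0-9]{1,5})?"                 # optional :port suffix
    + r"(?![0-9]|\.[0-9])"                # not followed by a digit or another segment
)


def redact_ipv4(text: str) -> str:
    """Replace each IPv4 address (and any port suffix) with the placeholder."""
    return IPV4.sub(PLACEHOLDER, text)


print(redact_ipv4("Connected to 10.0.0.42:27015."))
print(redact_ipv4("Path frag 1.2.3.4.5 should not match."))
print(redact_ipv4("Build 41.78.16 starting up."))
```

The trailing lookahead permits a sentence-ending full stop after the address while still rejecting a fifth dotted segment.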
135
test/tests/Util/Redactor/ProjectZomboidRedactorIpv6Test.php
Normal file
@@ -0,0 +1,135 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ProjectZomboidRedactorIpv6Test extends TestCase
|
||||
{
|
||||
public function testRedactsFullIpv6(): void
|
||||
{
|
||||
$input = 'Bound 2001:0db8:85a3:0000:0000:8a2e:0370:7334 ok.';
|
||||
$expected = 'Bound [REDACTED_IP] ok.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsAbbreviatedIpv6(): void
|
||||
{
|
||||
$input = 'Server peer 2001:db8::1 connected.';
|
||||
$expected = 'Server peer [REDACTED_IP] connected.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsLoopbackIpv6(): void
|
||||
{
|
||||
$input = 'localhost ::1 reachable.';
|
||||
$expected = 'localhost [REDACTED_IP] reachable.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsBracketedIpv6WithPort(): void
|
||||
{
|
||||
$input = 'Bound to [2001:db8::1]:8080 ok.';
|
||||
$expected = 'Bound to [REDACTED_IP] ok.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsBracketedLoopbackWithPort(): void
|
||||
{
|
||||
$input = 'Listening on [::1]:27015.';
|
||||
$expected = 'Listening on [REDACTED_IP].';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testRedactsIpv4MappedIpv6(): void
|
||||
{
|
||||
// IPv4-mapped form must be handled by the IPv6 pass before the IPv4
|
||||
// pass so the leading "::ffff:" doesn't get orphaned. With the IPv6
|
||||
// pass first, the whole token collapses into a single placeholder.
|
||||
$input = 'Mapped ::ffff:192.168.1.1 ok.';
|
||||
$expected = 'Mapped [REDACTED_IP] ok.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactJavaScopeOperator(): void
|
||||
{
|
||||
// Java method references and PHP scope operators look superficially
|
||||
// like leading-:: IPv6 forms but fail filter_var validation; the
|
||||
// word-boundary lookbehind also rejects matches that follow letters.
|
||||
$input = 'Foo::bar called Object::toString.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactTimestampShape(): void
|
||||
{
|
||||
// PZ log timestamps include hh:mm:ss.v segments which match the coarse
|
||||
// IPv6 candidate pattern but are rejected by filter_var.
|
||||
$input = '[16-04-26 12:00:00.000][LOG] startup complete';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testDoesNotRedactSteamIdAsIpv6(): void
|
||||
{
|
||||
// 17-digit Steam IDs contain no colons and so cannot form IPv6 syntax, but assert
|
||||
// explicitly so a future change to the IPv6 regex doesn't accidentally
|
||||
// collide with the Steam ID pass.
|
||||
$input = 'Player 76561198111111111 joined.';
|
||||
$expected = 'Player 76561198000000000 joined.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output);
|
||||
}
|
||||
|
||||
public function testToggleOffLeavesIpv6Intact(): void
|
||||
{
|
||||
$input = 'Bound to [2001:db8::1]:8080 ok.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactIpAddresses(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($input, $output);
|
||||
}
|
||||
|
||||
public function testIdempotence(): void
|
||||
{
|
||||
$input = implode("\n", [
|
||||
'Server peer 2001:db8::1 connected.',
|
||||
'Listening on [::1]:27015.',
|
||||
'Mapped ::ffff:192.168.1.1 ok.',
|
||||
'[16-04-26 12:00:00.000][LOG] startup complete',
|
||||
]);
|
||||
|
||||
$redactor = new ProjectZomboidRedactor();
|
||||
$once = $redactor->redact($input);
|
||||
$twice = $redactor->redact($once);
|
||||
|
||||
$this->assertSame($once, $twice);
|
||||
}
|
||||
}
|
||||
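The IPv6 tests above describe a two-stage approach: a coarse candidate pattern followed by validation, so that timestamps and scope operators are rejected. The Python sketch below illustrates that idea under stated assumptions: the candidate regex is invented for this example, and Python's `ipaddress.IPv6Address` stands in for the PHP `filter_var()` check mentioned in the test comments.

```python
import ipaddress
import re

PLACEHOLDER = "[REDACTED_IP]"

# Coarse candidate (assumed): either a bracketed run with an optional :port,
# or a bare run of hex digits, colons and dots that contains at least one colon.
CANDIDATE = re.compile(
    r"(?<![0-9A-Za-z:.])"
    r"(?:\[(?P<bracketed>[0-9A-Fa-f:.]+)\](?::[0-9]{1,5})?"
    r"|(?P<bare>[0-9A-Fa-f:.]*:[0-9A-Fa-f:.]*[0-9A-Fa-f:]))"
    r"(?![0-9A-Za-z:])"
)


def _replace(match: re.Match) -> str:
    """Redact the candidate only if it parses as a real IPv6 address."""
    address = match.group("bracketed") or match.group("bare")
    try:
        ipaddress.IPv6Address(address)
    except ValueError:
        return match.group(0)  # not an address: leave the text untouched
    return PLACEHOLDER


def redact_ipv6(text: str) -> str:
    return CANDIDATE.sub(_replace, text)


print(redact_ipv6("Bound to [2001:db8::1]:8080 ok."))
print(redact_ipv6("Mapped ::ffff:192.168.1.1 ok."))
print(redact_ipv6("[16-04-26 12:00:00.000][LOG] startup complete"))
```

Keeping the regex coarse and delegating correctness to a validator avoids encoding every abbreviated IPv6 form in the pattern itself.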
@@ -0,0 +1,93 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ProjectZomboidRedactorPlayerNameTest extends TestCase
|
||||
{
|
||||
public function testRedactsPlayerNameAfterRedactedSteamId(): void
|
||||
{
|
||||
// The Steam ID pass has already run; the literal placeholder 76561198000000000
|
||||
// precedes the quoted name. The player-name pass must redact the name.
|
||||
$input = '76561198000000000 "AdminUser" admin.broadcastMessage @ 1020,2020,0.';
|
||||
$expected = '76561198000000000 "<player>" admin.broadcastMessage @ 1020,2020,0.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'Player name following the redacted Steam ID placeholder must be replaced.');
|
||||
}
|
||||
|
||||
public function testRedactsChatMessageAuthor(): void
|
||||
{
|
||||
// The author field inside ChatMessage{...} must be replaced; the text
|
||||
// payload ('hello') is not in scope for player-name redaction and must
|
||||
// survive unchanged.
|
||||
$input = "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='Player1', text='hello'}.";
|
||||
$expected = "[16-04-26 17:05:03.280][info] Got message:ChatMessage{chat=Local, author='<player>', text='hello'}.";
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'ChatMessage author must be replaced while the text payload remains unchanged.');
|
||||
}
|
||||
|
||||
public function testRedactsCombatNameInPvpLog(): void
|
||||
{
|
||||
// Only the FIRST quoted name (after "Combat: ") is redacted in v1.
|
||||
// The second name (after "hit") is NOT yet redacted — deferred to v2.
|
||||
// The weapon name ("Tire Iron (Worn)") must also survive unchanged.
|
||||
$input = '[16-04-26 17:14:35.128][INFO] Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.';
|
||||
// Attacker coords (before "hit") are also replaced by the coordinates pass.
|
||||
// Victim coords (before "weapon=") lack the trailing keyword and are NOT replaced — deferred to v2.
|
||||
$expected = '[16-04-26 17:14:35.128][INFO] Combat: "<player>" (0,0,0) hit "Player2" (1006,2005,0) weapon="Tire Iron (Worn)" damage=0.112317.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redact($input);
|
||||
|
||||
// Player1 (after "Combat: ") is replaced; attacker coords (before "hit") are also replaced.
|
||||
// Player2 (after "hit") and victim coords (before "weapon=") are NOT replaced in v1 — deferred.
|
||||
$this->assertSame($expected, $output, 'First Combat: player name and attacker coords must be replaced; second name, victim coords, and weapon must survive.');
|
||||
}
|
||||
|
||||
public function testRedactsSafetyNameInPvpLog(): void
|
||||
{
|
||||
$input = '[16-04-26 16:17:49.731][LOG] Safety: "Player1" (1000,2000,0) restore true.';
|
||||
// Coords (before ") restore") are also replaced by the coordinates pass.
|
||||
$expected = '[16-04-26 16:17:49.731][LOG] Safety: "<player>" (0,0,0) restore true.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'Player name and coords following the Safety: token must both be replaced.');
|
||||
}
|
||||
|
||||
public function testBareQuotedStringWithoutAnchorIsNotTouched(): void
|
||||
{
|
||||
// "foo" is not preceded by a redacted Steam ID, not inside ChatMessage{...},
|
||||
// and not after Combat:/Safety: — it must pass through unchanged.
|
||||
$input = 'option changed to "foo" successfully.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'A quoted string with no matching anchor must not be redacted.');
|
||||
}
|
||||
|
||||
public function testToggleOffLeavesNamesIntact(): void
|
||||
{
|
||||
$input = '76561198000000000 "Player1" ISLogSystem.writeLog @ 1000,2000,0.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redactPlayerNames(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'With the player-name toggle disabled the original input must be returned unchanged.');
|
||||
}
|
||||
}
|
||||
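The player-name tests above describe three lexical anchors: the Steam ID placeholder, the `ChatMessage{author='...'}` envelope, and the `Combat:` / `Safety:` prefixes. A rough Python sketch of anchor-based replacement follows. The patterns are assumptions written for this illustration and are not the library's `PLAYER_AFTER_STEAMID_REGEX` or its siblings; coordinates are deliberately left alone here.

```python
import re

PLAYER = "<player>"

# (pattern, replacement) pairs; each pattern captures its anchor in group 1
# so the anchor is preserved and only the name is rewritten.
RULES = [
    (re.compile(r'(76561198000000000 )"[^"]+"'), r'\g<1>"' + PLAYER + '"'),
    (re.compile(r"(author=)'[^']*'"), r"\g<1>'" + PLAYER + "'"),
    (re.compile(r'((?:Combat|Safety): )"[^"]+"'), r'\g<1>"' + PLAYER + '"'),
]


def redact_player_names(text: str) -> str:
    """Rewrite names that follow a recognised anchor; leave other quotes alone."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(redact_player_names('Combat: "Player1" (1005,2005,0) hit "Player2" (1006,2005,0)'))
print(redact_player_names('option changed to "foo" successfully.'))
```

As in the v1 behaviour the tests document, only the first quoted name after `Combat: ` is rewritten; the name after `hit` has no anchor in this sketch and survives.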
@@ -0,0 +1,52 @@
|
||||
<?php
|
||||
|
||||
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
|
||||
|
||||
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
|
||||
use PHPUnit\Framework\TestCase;
|
||||
|
||||
class ProjectZomboidRedactorSteamIdTest extends TestCase
|
||||
{
|
||||
public function testCollapsesDistinctSteamIdsToZeroPlaceholder(): void
|
||||
{
|
||||
$input = 'Players: 76561198111111111, 76561198222222222, 76561198333333333 connected.';
|
||||
$expected = 'Players: 76561198000000000, 76561198000000000, 76561198000000000 connected.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($expected, $output, 'All three distinct Steam IDs should be replaced with the zero placeholder.');
|
||||
}
|
||||
|
||||
public function testNonSteamIdLongDigitsAreNotTouched(): void
|
||||
{
|
||||
// 13-digit Unix-millisecond timestamp (PZ log t: shape) and a 17-digit number
|
||||
// that does not begin with 76561198 — neither should be altered.
|
||||
$input = 't:1776297642406 score=12345678901234567';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'Non-SteamID digit sequences must not be modified.');
|
||||
}
|
||||
|
||||
public function testEmbeddedSteamIdInsideLongerAlphanumericTokenIsNotTouched(): void
|
||||
{
|
||||
// The SteamID64 pattern is embedded inside a longer alphanumeric token;
|
||||
// the negative lookaround boundaries should prevent a match.
|
||||
$input = 'token=abc76561198000000001def other=data';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'A Steam ID embedded inside an alphanumeric token must not be redacted.');
|
||||
}
|
||||
|
||||
public function testToggleOffLeavesSteamIdsIntact(): void
|
||||
{
|
||||
$input = 'Connected: 76561198111111111 and 76561198222222222.';
|
||||
|
||||
$output = (new ProjectZomboidRedactor())
|
||||
->redactSteamIds(false)
|
||||
->redact($input);
|
||||
|
||||
$this->assertSame($input, $output, 'With the Steam ID toggle disabled the original input must be returned unchanged.');
|
||||
}
|
||||
}
|
||||
310
tools/pz-analyzer/pz_classify.py
Normal file
@@ -0,0 +1,310 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
pz_classify.py — Deterministic Project Zomboid log classifier orchestrator.
|
||||
|
||||
Walks ``*DebugLog-server*.txt`` files under the redacted-logs directory,
|
||||
runs the pz_parser pipeline per file, merges records cross-file by their
|
||||
deterministic ``signature``, and emits the spec-shaped JSON report.
|
||||
|
||||
Companion to the existing Qwen-backed discovery tool ``pz_error_analysis.py``
|
||||
(left untouched). Zero AI dependency, stdlib-only, runs in seconds.
|
||||
|
||||
By convention the input is always the redacted directory produced by
|
||||
``pz_redact_all.sh``; ``meta.redacted`` is therefore hard-coded ``true``.
|
||||
If the user overrides ``--input`` to a non-redacted source we still emit
|
||||
``true`` because we have no upstream way to verify redaction status.
|
||||
|
||||
Pipeline:
|
||||
parser.parse_file per-file Entry list
|
||||
parser.classify_entries per-file deduped Record list
|
||||
_merge_cross_file global Record list deduped across files
|
||||
_build_summary top-line stats + by_kind / by_attribution / top_mods
|
||||
|
||||
Output schema, CLI flags, and aggregation rules are defined in
|
||||
``docs/superpowers/specs/2026-05-04-pz-deterministic-classifier-design.md``.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import dataclasses
|
||||
import json
|
||||
import sys
|
||||
from collections import Counter
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
from pz_parser import (
|
||||
MAX_CAUSE_CHAIN_LEVELS,
|
||||
MAX_STACK_FRAMES,
|
||||
SEVERITY_LEVELS,
|
||||
Record,
|
||||
classify_entries,
|
||||
parse_file,
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Defaults / constants
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_REPO_ROOT = Path(__file__).resolve().parents[2]
|
||||
DEFAULT_INPUT: Path = _REPO_ROOT / ".scratch" / "pz" / "Logs.redacted"
|
||||
DEFAULT_OUT: Path = _REPO_ROOT / ".scratch" / "pz" / "classify.json"
|
||||
|
||||
#: Filename glob driving the directory walk.
|
||||
INPUT_GLOB: str = "*DebugLog-server*.txt"
|
||||
#: Cap on entries in ``summary.top_mods`` (the mods with the highest summed occurrence counts).
|
||||
TOP_MODS_LIMIT: int = 10
|
||||
|
||||
#: Confidence / attribution promotion ladders (higher rank wins on merge).
|
||||
_CONFIDENCE_RANK: dict[str, int] = {"low": 0, "medium": 1, "high": 2}
|
||||
_ATTRIBUTION_RANK: dict[str, int] = {
|
||||
"unattributed": 0,
|
||||
"inferred": 1,
|
||||
"direct": 2,
|
||||
}
|
||||
#: Levels that count as errors (vs warnings) in the summary.
|
||||
_ERROR_LEVELS: frozenset[str] = frozenset({"ERROR", "SEVERE", "FATAL"})
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Cross-file aggregation (spec §9, inter-file equivalent of parser dedup)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _merge_cross_file(per_file_records: list[Record]) -> list[Record]:
|
||||
"""Merge ``Record`` instances across files by ``signature``.
|
||||
|
||||
The parser already dedups within a single file. This is the inter-file
|
||||
equivalent: when the same signature appears in records from multiple
|
||||
files, sum occurrences, union file lists, promote attribution/confidence,
|
||||
and merge stack and cause-chain (deduped, capped at parser constants).
|
||||
First-seen is the earliest by file-then-line; since callers feed records
|
||||
in sorted file order, the first record we encounter per signature is
|
||||
already the earliest.
|
||||
"""
|
||||
by_signature: dict[str, Record] = {}
|
||||
for incoming in per_file_records:
|
||||
existing = by_signature.get(incoming.signature)
|
||||
if existing is None:
|
||||
# First occurrence: copy so later merges don't mutate the caller's Record.
|
||||
by_signature[incoming.signature] = Record(
|
||||
signature=incoming.signature,
|
||||
pattern_id=incoming.pattern_id,
|
||||
level=incoming.level,
|
||||
kind=incoming.kind,
|
||||
mod_id=incoming.mod_id,
|
||||
mod_name=incoming.mod_name,
|
||||
attribution=incoming.attribution,
|
||||
confidence=incoming.confidence,
|
||||
attribution_reason=incoming.attribution_reason,
|
||||
file=incoming.file,
|
||||
line=incoming.line,
|
||||
cause_chain=incoming.cause_chain,
|
||||
stack=list(incoming.stack),
|
||||
first_seen=incoming.first_seen,
|
||||
occurrence_count=incoming.occurrence_count,
|
||||
files=list(incoming.files),
|
||||
excerpt=incoming.excerpt,
|
||||
)
|
||||
continue
|
||||
# Aggregate.
|
||||
existing.occurrence_count += incoming.occurrence_count
|
||||
for fname in incoming.files:
|
||||
if fname not in existing.files:
|
||||
existing.files.append(fname)
|
||||
# Promote attribution / confidence / mod_name on stronger evidence.
|
||||
if _ATTRIBUTION_RANK[incoming.attribution] > _ATTRIBUTION_RANK[existing.attribution]:
|
||||
existing.attribution = incoming.attribution
|
||||
existing.attribution_reason = incoming.attribution_reason
|
||||
if incoming.mod_name:
|
||||
existing.mod_name = incoming.mod_name
|
||||
if _CONFIDENCE_RANK[incoming.confidence] > _CONFIDENCE_RANK[existing.confidence]:
|
||||
existing.confidence = incoming.confidence
|
||||
# Merge stack frames preserving order, capped.
|
||||
for frame in incoming.stack:
|
||||
if frame not in existing.stack and len(existing.stack) < MAX_STACK_FRAMES:
|
||||
existing.stack.append(frame)
|
||||
# Merge cause chain (deduped tokens, capped).
|
||||
if incoming.cause_chain and incoming.cause_chain != existing.cause_chain:
|
||||
old = existing.cause_chain.split(" -> ") if existing.cause_chain else []
|
||||
new = incoming.cause_chain.split(" -> ")
|
||||
merged = list(old)
|
||||
for tok in new:
|
||||
if tok and tok not in merged:
|
||||
merged.append(tok)
|
||||
existing.cause_chain = " -> ".join(merged[:MAX_CAUSE_CHAIN_LEVELS])
|
||||
return list(by_signature.values())
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Summary computation
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _build_summary(records: list[Record]) -> dict[str, object]:
|
||||
"""Build the ``summary`` block per spec.
|
||||
|
||||
Counts records (signatures), not raw occurrences, except for ``top_mods``
|
||||
which sums ``occurrence_count`` per mod_id so that volume-driving mods
|
||||
surface even when they hit the same shape repeatedly.
|
||||
"""
|
||||
errors = sum(1 for r in records if r.level in _ERROR_LEVELS)
|
||||
warnings = sum(1 for r in records if r.level == "WARN")
|
||||
by_kind = Counter(r.kind for r in records)
|
||||
by_attribution = Counter(r.attribution for r in records)
|
||||
by_confidence = Counter(r.confidence for r in records)
|
||||
|
||||
# Group by mod_id summing total occurrence_count; preserve any mod_name.
|
||||
mod_totals: dict[str, int] = {}
|
||||
mod_names: dict[str, str] = {}
|
||||
for r in records:
|
||||
mod_totals[r.mod_id] = mod_totals.get(r.mod_id, 0) + r.occurrence_count
|
||||
# First non-empty mod_name wins; subsequent records may have empty
|
||||
# mod_name (e.g. for unattributed) so don't overwrite with "".
|
||||
if r.mod_name and r.mod_id not in mod_names:
|
||||
mod_names[r.mod_id] = r.mod_name
|
||||
top_mods = sorted(
|
||||
(
|
||||
{
|
||||
"mod_id": mod_id,
|
||||
"mod_name": mod_names.get(mod_id, ""),
|
||||
"occurrence_count": total,
|
||||
}
|
||||
for mod_id, total in mod_totals.items()
|
||||
),
|
||||
key=lambda d: d["occurrence_count"],
|
||||
reverse=True,
|
||||
)[:TOP_MODS_LIMIT]
|
||||
|
||||
return {
|
||||
"errors": errors,
|
||||
"warnings": warnings,
|
||||
"by_kind": dict(by_kind),
|
||||
"by_attribution": dict(by_attribution),
|
||||
"by_confidence": dict(by_confidence),
|
||||
"top_mods": top_mods,
|
||||
}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Driver
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _run(input_dir: Path, out_path: Path, *, quiet: bool) -> int:
|
||||
if not input_dir.is_dir():
|
||||
print(
|
||||
f"pz_classify: --input directory not found: {input_dir}",
|
||||
file=sys.stderr,
|
||||
)
|
||||
return 2
|
||||
|
||||
started = datetime.now(timezone.utc).isoformat(timespec="seconds")
|
||||
files = sorted(input_dir.glob(INPUT_GLOB))
|
||||
|
||||
all_records: list[Record] = []
|
||||
log_lines_total = 0
|
||||
error_lines_total = 0
|
||||
|
||||
for path in files:
|
||||
try:
|
||||
entries = parse_file(path)
|
||||
except Exception as exc: # noqa: BLE001 — orchestrator must keep going.
|
||||
print(
|
||||
f"pz_classify: warning: failed to parse {path.name}: {exc}",
|
||||
file=sys.stderr,
|
||||
)
|
||||
continue
|
||||
# Body-line totals: every line under every parsed entry contributes
|
||||
# to log_lines_total; severity-level entries' body lines feed
|
||||
# error_lines_total. Counted before dedup so it reflects raw volume.
|
||||
for e in entries:
|
||||
log_lines_total += len(e.body)
|
||||
if e.level in SEVERITY_LEVELS:
|
||||
error_lines_total += len(e.body)
|
||||
all_records.extend(classify_entries(entries, source_file=path.name))
|
||||
|
||||
merged = _merge_cross_file(all_records)
|
||||
merged.sort(key=lambda r: r.occurrence_count, reverse=True)
|
||||
|
||||
finished = datetime.now(timezone.utc).isoformat(timespec="seconds")
|
||||
|
||||
unique_patterns = len({r.pattern_id for r in merged})
|
||||
|
||||
document: dict[str, object] = {
|
||||
"meta": {
|
||||
"input_dir": str(input_dir),
|
||||
"files_scanned": len(files),
|
||||
"log_lines_total": log_lines_total,
|
||||
"error_lines_total": error_lines_total,
|
||||
"unique_signatures": len(merged),
|
||||
"unique_patterns": unique_patterns,
|
||||
"redacted": True,
|
||||
"started": started,
|
||||
"finished": finished,
|
||||
},
|
||||
"signatures": [dataclasses.asdict(r) for r in merged],
|
||||
"summary": _build_summary(merged),
|
||||
}
|
||||
|
||||
tmp = out_path.with_suffix(out_path.suffix + ".tmp")
|
||||
try:
|
||||
out_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with tmp.open("w", encoding="utf-8") as f:
|
||||
json.dump(document, f, ensure_ascii=False, indent=2)
|
||||
f.write("\n")
|
||||
tmp.replace(out_path)
|
||||
except OSError as exc:
|
||||
print(f"pz_classify: failed to write {out_path}: {exc}", file=sys.stderr)
|
||||
# Best-effort cleanup of the temp file.
|
||||
try:
|
||||
tmp.unlink()
|
||||
except OSError:
|
||||
pass
|
||||
return 1
|
||||
|
||||
if not quiet:
|
||||
print(
|
||||
f"pz_classify: {len(files)} file(s), {log_lines_total} log lines, "
|
||||
f"{error_lines_total} error lines, {len(merged)} records "
|
||||
f"({unique_patterns} unique patterns) -> {out_path}"
|
||||
)
|
||||
return 0
|
||||
|
||||
|
||||
def _parse_args(argv: list[str] | None = None) -> argparse.Namespace:
|
||||
parser = argparse.ArgumentParser(
|
||||
prog="pz_classify",
|
||||
description=(
|
||||
"Deterministic Project Zomboid log classifier. Walks redacted "
|
||||
"DebugLog-server*.txt files, classifies errors/warnings, and "
|
||||
"emits a JSON report."
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
"--input",
|
||||
type=Path,
|
||||
default=DEFAULT_INPUT,
|
||||
help=f"Input directory of redacted log files (default: {DEFAULT_INPUT}).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--out",
|
||||
type=Path,
|
||||
default=DEFAULT_OUT,
|
||||
help=f"Output JSON path (default: {DEFAULT_OUT}).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--quiet",
|
||||
action="store_true",
|
||||
help="Suppress the trailing one-line summary.",
|
||||
)
|
||||
return parser.parse_args(argv)
|
||||
|
||||
|
||||
def main(argv: list[str] | None = None) -> int:
|
||||
args = _parse_args(argv)
|
||||
return _run(args.input, args.out, quiet=args.quiet)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
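The cross-file merge in `_merge_cross_file` can be hard to follow in diff form, so here is a reduced sketch of the same technique: dedupe by signature, sum occurrences, union file lists, and promote attribution when stronger evidence arrives. `MiniRecord` and `merge_by_signature` are hypothetical stand-ins with only four fields; the real `Record` carries more fields and also merges stack frames and cause chains.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Same ladder as _ATTRIBUTION_RANK above: higher rank wins on merge.
ATTRIBUTION_RANK = {"unattributed": 0, "inferred": 1, "direct": 2}


@dataclass
class MiniRecord:
    """Hypothetical, trimmed-down stand-in for pz_parser.Record."""

    signature: str
    occurrence_count: int
    attribution: str
    files: list[str] = field(default_factory=list)


def merge_by_signature(records: list[MiniRecord]) -> list[MiniRecord]:
    merged: dict[str, MiniRecord] = {}
    for rec in records:
        seen = merged.get(rec.signature)
        if seen is None:
            # Copy so the caller's objects are never mutated.
            merged[rec.signature] = MiniRecord(
                rec.signature, rec.occurrence_count, rec.attribution, list(rec.files)
            )
            continue
        seen.occurrence_count += rec.occurrence_count
        for name in rec.files:
            if name not in seen.files:
                seen.files.append(name)
        if ATTRIBUTION_RANK[rec.attribution] > ATTRIBUTION_RANK[seen.attribution]:
            seen.attribution = rec.attribution
    # dicts preserve insertion order, so first-seen order is kept.
    return list(merged.values())
```

Because the result keeps first-seen order, feeding records in sorted file order means the first record per signature is also the earliest one, which is the property the docstring above relies on.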
467
tools/pz-analyzer/pz_error_analysis.py
Normal file
@@ -0,0 +1,467 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
pz_error_analysis.py — Qwen-backed Project Zomboid error analyzer.
|
||||
|
||||
Walks `*DebugLog-server*.txt` files (DEFAULT_INPUT — already PII-redacted by
|
||||
pz_redact_all.sh), groups WARN/ERROR/FATAL entries with surrounding context,
|
||||
deduplicates by signature hash, and asks Qwen to classify each unique
|
||||
signature into a fixed taxonomy (missing_mod, java_exception, lua_error,
|
||||
out_of_memory, ...) with a short title / summary / likely_cause /
|
||||
suggested_fix / confidence.
|
||||
|
||||
Standalone: requires Python 3.10+ and the `openai` package
|
||||
(`pip install "openai>=1.30"`). Talks to a local OpenAI-compatible endpoint
|
||||
(default sam-desktop llama-swap on port 8401); override with QWEN_BASE_URL
|
||||
and QWEN_MODEL env vars.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import datetime as dt
|
||||
import hashlib
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Any, Iterator
|
||||
|
||||
from openai import OpenAI
|
||||
|
||||
_REPO_ROOT = Path(__file__).resolve().parents[2]
|
||||
|
||||
DEFAULT_INPUT = _REPO_ROOT / ".scratch" / "pz" / "Logs.redacted"
|
||||
DEFAULT_OUT = _REPO_ROOT / ".scratch" / "pz" / "analysis.json"
|
||||
|
||||
# --- Qwen client (inlined from /opt/analytics/ib_analytics/llm/local_client.py
|
||||
# so this script has no cross-repo dependency; mirror upstream changes if
|
||||
# the analytics client API evolves) ---
|
||||
|
||||
QWEN_DEFAULT_BASE_URL = "http://100.101.41.16:8401/v1"
|
||||
QWEN_DEFAULT_MODEL = "qwen3.6-35b-a3b"
|
||||
|
||||
SAMPLING_STRUCTURED: dict[str, Any] = {
|
||||
"temperature": 0.7,
|
||||
"top_p": 0.80,
|
||||
"extra_body": {
|
||||
"top_k": 20,
|
||||
"presence_penalty": 1.5,
|
||||
"chat_template_kwargs": {"enable_thinking": False},
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
def get_client() -> OpenAI:
|
||||
return OpenAI(
|
||||
base_url=os.environ.get("QWEN_BASE_URL", QWEN_DEFAULT_BASE_URL),
|
||||
api_key="EMPTY",
|
||||
)
|
||||
|
||||
|
||||
def get_model() -> str:
|
||||
return os.environ.get("QWEN_MODEL", QWEN_DEFAULT_MODEL)
|
||||
|
||||
|
||||
def structured_call(
|
||||
tool_schema: dict[str, Any],
|
||||
messages: list[dict[str, Any]],
|
||||
*,
|
||||
sampling: dict[str, Any] = SAMPLING_STRUCTURED,
|
||||
client: OpenAI | None = None,
|
||||
model: str | None = None,
|
||||
max_tokens: int = 4096,
|
||||
) -> dict[str, Any]:
|
||||
cli = client or get_client()
|
||||
mdl = model or get_model()
|
||||
fn_name = tool_schema["function"]["name"]
|
||||
kwargs = dict(sampling)
|
||||
extra_body = dict(kwargs.pop("extra_body", {}))
|
||||
response = cli.chat.completions.create(
|
||||
model=mdl,
|
||||
messages=messages,
|
||||
tools=[tool_schema],
|
||||
tool_choice="required",
|
||||
max_tokens=max_tokens,
|
||||
extra_body=extra_body,
|
||||
**kwargs,
|
||||
)
|
||||
choice = response.choices[0]
|
||||
tool_calls = getattr(choice.message, "tool_calls", None) or []
|
||||
if not tool_calls:
|
||||
raise ValueError(
|
||||
f"Qwen did not invoke {fn_name}; finish_reason={choice.finish_reason}, "
|
||||
f"content={(choice.message.content or '')[:500]}"
|
||||
)
|
||||
call = tool_calls[0]
|
||||
if call.function.name != fn_name:
|
||||
raise ValueError(
|
||||
f"Qwen invoked unexpected tool {call.function.name!r}; expected {fn_name!r}"
|
||||
)
|
||||
try:
|
||||
return json.loads(call.function.arguments)
|
||||
except json.JSONDecodeError as e:
|
||||
raise ValueError(
|
||||
f"Malformed tool-call arguments for {fn_name}: {e}; "
|
||||
f"raw={call.function.arguments[:500]}"
|
||||
) from e
|
||||
|
||||
|
||||
# --- Parser ---
|
||||
|
||||
ENTRY_RE = re.compile(
|
||||
r"^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
|
||||
r"(LOG|WARN|ERROR|FATAL)\s*:\s*(.*)"
|
||||
)
|
||||
SESSION_META_RE = re.compile(r"^[A-Za-z]+\s+f:\d+,?\s*(?:t:\d+,?\s*)?st:[\d,]+>\s*")
|
||||
DOUBLE_QUOTED_RE = re.compile(r'"[^"]*"')
|
||||
SINGLE_QUOTED_RE = re.compile(r"'[^']*'")
|
||||
NUMERIC_RUN_RE = re.compile(r"\d{2,}")
|
||||
WS_RUN_RE = re.compile(r"\s+")
|
||||
|
||||
CATEGORIES = [
|
||||
"missing_mod", "mod_conflict", "lua_error", "java_exception",
|
||||
"out_of_memory", "corrupt_save", "network_error", "load_order",
|
||||
"performance", "server_crash", "unknown",
|
||||
]
|
||||
|
||||
TOOL_SCHEMA: dict[str, Any] = {
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "submit_error_analysis",
|
||||
"description": (
|
||||
"Analyse a single Project Zomboid server error block and emit "
|
||||
"structured insight."
|
||||
),
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"category": {"type": "string", "enum": CATEGORIES},
|
||||
"severity": {"type": "string", "enum": ["problem", "warning", "info"]},
|
||||
"title": {"type": "string", "description": "One-line headline (<=80 chars)"},
|
||||
"summary": {"type": "string", "description": "1-3 sentences explaining what happened"},
|
||||
"likely_cause": {"type": "string", "description": "Most plausible cause given the context"},
|
||||
"suggested_fix": {"type": "string", "description": "Concrete remediation, server-admin actionable"},
|
||||
"confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
|
||||
},
|
||||
"required": [
|
||||
"category", "severity", "title", "summary",
|
||||
"likely_cause", "suggested_fix", "confidence",
|
||||
],
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
SYSTEM_PROMPT = """You are a Project Zomboid dedicated server administrator
|
||||
diagnosing a server log. You receive one error/warning event with surrounding
|
||||
context (entries marked with `>>>` are the hit; the rest are leading or
|
||||
trailing context). Classify the event using the submit_error_analysis tool
|
||||
ONLY — never reply in plain text.
|
||||
|
||||
Rules:
|
||||
- `category` must be one of the enum values; choose `unknown` only if no
|
||||
other fits.
|
||||
- `severity`: problem = breaks something users notice; warning = degraded
|
||||
but functional; info = noteworthy but not failing.
|
||||
- `title`: at most 80 chars, neutral and specific.
|
||||
- `suggested_fix`: a concrete admin action ("subscribe to mod X", "increase
|
||||
-Xmx to 8G", "remove the conflicting mod from Mods= line"), not generic
|
||||
advice.
|
||||
- `confidence`: 0.0-1.0; lower it when the evidence is ambiguous.
|
||||
"""
|
||||
|
||||
MAX_PROMPT_CHARS = 4000
|
||||
|
||||
|
||||
def parse_file(path: Path) -> list[dict[str, Any]]:
|
||||
"""Parse a DebugLog-server file into a list of multi-line entries.
|
||||
|
||||
Continuation lines (lines that don't match ENTRY_RE) append to the
|
||||
previous entry, mirroring codex's PatternParser behaviour.
|
||||
"""
|
||||
entries: list[dict[str, Any]] = []
|
||||
current: dict[str, Any] | None = None
|
||||
with path.open("r", encoding="utf-8", errors="replace") as f:
|
||||
for lineno, raw in enumerate(f, start=1):
|
||||
line = raw.rstrip("\n")
|
||||
m = ENTRY_RE.match(line)
|
||||
if m:
|
||||
if current is not None:
|
||||
entries.append(current)
|
||||
current = {
|
||||
"timestamp": m.group(1),
|
||||
"level": m.group(2),
|
||||
"body": [m.group(3)],
|
||||
"line_start": lineno,
|
||||
"line_end": lineno,
|
||||
}
|
||||
elif current is not None:
|
||||
current["body"].append(line)
|
||||
current["line_end"] = lineno
|
||||
# else: orphan line at start of file (no preceding entry); ignore.
|
||||
if current is not None:
|
||||
entries.append(current)
|
||||
return entries
|
||||
|
||||
|
||||
def signature_for(level: str, body_lines: list[str]) -> str:
|
||||
"""Stable signature derived from the first body line only.
|
||||
|
||||
Stack-trace continuations are deliberately ignored: the same logical
|
||||
exception can produce slightly different traces (e.g. timing-related
|
||||
code paths) but should still collapse to one signature. Quoted strings
|
||||
(vehicle names, mod IDs, paths) are flattened to <S>; numeric runs of
|
||||
length >= 2 are flattened to <N>; session-metadata prefix
|
||||
(`General f:0,t:N,st:N,N,N>`) is stripped.
|
||||
"""
|
||||
first = (body_lines[0] if body_lines else "").strip()
|
||||
first = SESSION_META_RE.sub("", first)
|
||||
first = DOUBLE_QUOTED_RE.sub('"<S>"', first)
|
||||
first = SINGLE_QUOTED_RE.sub("'<S>'", first)
|
||||
first = NUMERIC_RUN_RE.sub("<N>", first)
|
||||
first = WS_RUN_RE.sub(" ", first)
|
||||
first = first[:200]
|
||||
h = hashlib.sha256(f"{level}\n{first}".encode("utf-8")).hexdigest()
|
||||
return f"sha256:{h[:16]}"
|
||||
|
||||
|
||||
def build_excerpt(
|
||||
entries: list[dict[str, Any]], hit_idx: int, context: int
|
||||
) -> str:
|
||||
"""Render an excerpt centered on entries[hit_idx] with ±context entries."""
|
||||
start = max(0, hit_idx - context)
|
||||
end = min(len(entries), hit_idx + context + 1)
|
||||
lines: list[str] = []
|
||||
for i in range(start, end):
|
||||
e = entries[i]
|
||||
is_hit = i == hit_idx
|
||||
marker = ">>>" if is_hit else " "
|
||||
prefix = f'{marker} [{e["timestamp"]}] {e["level"]}: '
|
||||
body = e["body"]
|
||||
if is_hit:
|
||||
for j, body_line in enumerate(body):
|
||||
lines.append(prefix + body_line if j == 0 else " " + body_line)
|
||||
else:
|
||||
first = (body[0] if body else "").strip()[:200]
|
||||
lines.append(prefix + first)
|
||||
if len(body) > 1:
|
||||
lines.append(f' ... (+{len(body) - 1} more lines)')
|
||||
excerpt = "\n".join(lines)
|
||||
if len(excerpt) > MAX_PROMPT_CHARS:
|
||||
excerpt = excerpt[:MAX_PROMPT_CHARS] + "\n... [truncated]"
|
||||
return excerpt
|
||||
|
||||
|
||||
def iter_warn_or_error(entries: list[dict[str, Any]]) -> Iterator[int]:
|
||||
for i, e in enumerate(entries):
|
||||
if e["level"] in ("WARN", "ERROR", "FATAL"):
|
||||
yield i
|
||||
|
||||
|
||||
def collect_signatures(
|
||||
input_dir: Path, context: int
|
||||
) -> tuple[dict[str, dict[str, Any]], dict[str, int]]:
|
||||
"""Walk DebugLog-server files and collect deduplicated signatures."""
|
||||
signatures: dict[str, dict[str, Any]] = {}
|
||||
files_scanned = 0
|
||||
log_lines_total = 0
|
||||
error_lines_total = 0
|
||||
|
||||
for path in sorted(input_dir.glob("*DebugLog-server*.txt")):
|
||||
files_scanned += 1
|
||||
entries = parse_file(path)
|
||||
log_lines_total += sum(len(e["body"]) for e in entries)
|
||||
for hit_idx in iter_warn_or_error(entries):
|
||||
hit = entries[hit_idx]
|
||||
error_lines_total += len(hit["body"])
|
||||
sig = signature_for(hit["level"], hit["body"])
|
||||
occurrence = {
|
||||
"file": path.name,
|
||||
"line": hit["line_start"],
|
||||
"timestamp": hit["timestamp"],
|
||||
}
|
||||
if sig not in signatures:
|
||||
signatures[sig] = {
|
||||
"signature": sig,
|
||||
"level": hit["level"],
|
||||
"first_seen": occurrence,
|
||||
"occurrence_count": 1,
|
||||
"files": [path.name],
|
||||
"excerpt": build_excerpt(entries, hit_idx, context),
|
||||
}
|
||||
else:
|
||||
rec = signatures[sig]
|
||||
rec["occurrence_count"] += 1
|
||||
if path.name not in rec["files"]:
|
||||
rec["files"].append(path.name)
|
||||
return signatures, {
|
||||
"files_scanned": files_scanned,
|
||||
"log_lines_total": log_lines_total,
|
||||
"error_lines_total": error_lines_total,
|
||||
}
|
||||
|
||||
|
||||
def call_qwen(client: OpenAI, model: str, sig_rec: dict[str, Any]) -> dict[str, Any]:
|
||||
user_prompt = (
|
||||
f'Level: {sig_rec["level"]}\n'
|
||||
f'First seen: {sig_rec["first_seen"]["file"]} '
|
||||
f'line {sig_rec["first_seen"]["line"]}\n'
|
||||
f'Occurrences across this run: {sig_rec["occurrence_count"]} '
|
||||
f'(across {len(sig_rec["files"])} file(s))\n\n'
|
||||
f'Log excerpt:\n{sig_rec["excerpt"]}'
|
||||
)
|
||||
return structured_call(
|
||||
TOOL_SCHEMA,
|
||||
[
|
||||
{"role": "system", "content": SYSTEM_PROMPT},
|
||||
{"role": "user", "content": user_prompt},
|
||||
],
|
||||
sampling=SAMPLING_STRUCTURED,
|
||||
client=client,
|
||||
model=model,
|
||||
)
|
||||
|
||||
|
||||
def atomic_write(path: Path, payload: Any) -> None:
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
tmp = path.with_suffix(path.suffix + ".tmp")
|
||||
with tmp.open("w", encoding="utf-8") as f:
|
||||
json.dump(payload, f, indent=2, ensure_ascii=False)
|
||||
tmp.replace(path)
|
||||
|
||||
|
||||
def load_existing(path: Path) -> dict[str, dict[str, Any]]:
|
||||
"""Reload signatures previously written to --out.
|
||||
|
||||
Only signatures with a successful `llm` result count as completed. Bare
|
||||
records (left behind when --limit truncated a prior run) and records whose
|
||||
LLM call failed are re-attempted on resume so the analysis converges.
|
||||
"""
|
||||
if not path.exists():
|
||||
return {}
|
||||
try:
|
||||
with path.open("r", encoding="utf-8") as f:
|
||||
data = json.load(f)
|
||||
return {
|
||||
s["signature"]: s
|
||||
for s in data.get("signatures", [])
|
||||
if "signature" in s and isinstance(s.get("llm"), dict) and "error" not in s["llm"]
|
||||
}
|
||||
except Exception:
|
||||
return {}
|
||||
|
||||
|
||||
def summarise(analyzed: list[dict[str, Any]]) -> dict[str, Any]:
|
||||
sev_counts = {"problem": 0, "warning": 0, "info": 0}
|
||||
by_cat: dict[str, int] = {}
|
||||
for s in analyzed:
|
||||
llm = s.get("llm") or {}
|
||||
sev = llm.get("severity")
|
||||
cat = llm.get("category")
|
||||
if sev in sev_counts:
|
||||
sev_counts[sev] += 1
|
||||
if cat:
|
||||
by_cat[cat] = by_cat.get(cat, 0) + 1
|
||||
return {
|
||||
"problems": sev_counts["problem"],
|
||||
"warnings": sev_counts["warning"],
|
||||
"info": sev_counts["info"],
|
||||
"by_category": by_cat,
|
||||
}
|
||||
|
||||
|
||||
def main() -> None:
|
||||
ap = argparse.ArgumentParser(description=__doc__)
|
||||
ap.add_argument("--input", type=Path, default=DEFAULT_INPUT)
|
||||
ap.add_argument("--out", type=Path, default=DEFAULT_OUT)
|
||||
ap.add_argument("--context", type=int, default=20)
|
||||
ap.add_argument("--limit", type=int, default=None,
|
||||
help="Stop after N new signatures analysed.")
|
||||
ap.add_argument("--resume", action="store_true",
|
||||
help="Reuse existing analysis from --out if present.")
|
||||
ap.add_argument("--checkpoint-every", type=int, default=25)
|
||||
args = ap.parse_args()
|
||||
|
||||
if not args.input.is_dir():
|
||||
print(f"error: {args.input} is not a directory", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
started = dt.datetime.now(dt.timezone.utc).isoformat(timespec="seconds")
|
||||
print(f"[init] scanning {args.input}")
|
||||
signatures, file_stats = collect_signatures(args.input, args.context)
|
||||
print(
|
||||
f"[init] {file_stats['files_scanned']} file(s), "
|
||||
f"{file_stats['log_lines_total']} log lines, "
|
||||
f"{file_stats['error_lines_total']} error lines, "
|
||||
f"{len(signatures)} unique signature(s)"
|
||||
)
|
||||
|
||||
existing = load_existing(args.out) if args.resume else {}
|
||||
if existing:
|
||||
print(f"[init] {len(existing)} signature(s) already analysed; resuming")
|
||||
|
||||
client = get_client()
|
||||
model = get_model()
|
||||
print(f"[init] qwen model={model}")
|
||||
|
||||
n_new = 0
|
||||
t0 = time.time()
|
||||
analyzed: list[dict[str, Any]] = []
|
||||
|
||||
# Process in occurrence_count desc so --limit N picks the most-impactful
|
||||
# signatures rather than whichever happened to scan first.
|
||||
for sig, rec in sorted(
|
||||
signatures.items(), key=lambda kv: -kv[1]["occurrence_count"]
|
||||
):
|
||||
if sig in existing:
|
||||
analyzed.append(existing[sig])
|
||||
continue
|
||||
if args.limit is not None and n_new >= args.limit:
|
||||
analyzed.append(rec) # keep raw record so it's not lost on resume
|
||||
continue
|
||||
try:
|
||||
llm = call_qwen(client, model, rec)
|
||||
rec["llm"] = llm
|
||||
except Exception as e:
|
||||
rec["llm"] = {"error": str(e)[:500]}
|
||||
print(f" [{n_new + 1}] LLM error on {sig}: {e}", file=sys.stderr)
|
||||
analyzed.append(rec)
|
||||
n_new += 1
|
||||
if n_new % args.checkpoint_every == 0:
|
||||
payload = {
|
||||
"meta": {
|
||||
"input_dir": str(args.input),
|
||||
**file_stats,
|
||||
"unique_signatures": len(signatures),
|
||||
"redacted": True,
|
||||
"qwen_model": model,
|
||||
"started": started,
|
||||
"checkpoint_at": dt.datetime.now(dt.timezone.utc).isoformat(timespec="seconds"),
|
||||
},
|
||||
"signatures": analyzed,
|
||||
"summary": summarise(analyzed),
|
||||
}
|
||||
atomic_write(args.out, payload)
|
||||
rate = n_new / max(time.time() - t0, 1e-3)
|
||||
print(f" [{n_new}] checkpoint @ {rate:.2f} sig/s")
|
||||
|
||||
finished = dt.datetime.now(dt.timezone.utc).isoformat(timespec="seconds")
|
||||
payload = {
|
||||
"meta": {
|
||||
"input_dir": str(args.input),
|
||||
**file_stats,
|
||||
"unique_signatures": len(signatures),
|
||||
"redacted": True,
|
||||
"qwen_model": model,
|
||||
"started": started,
|
||||
"finished": finished,
|
||||
},
|
||||
"signatures": analyzed,
|
||||
"summary": summarise(analyzed),
|
||||
}
|
||||
atomic_write(args.out, payload)
|
||||
print(f"[done] {n_new} new, {len(analyzed)} total -> {args.out}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
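To make the deduplication in `signature_for` concrete, the sketch below reuses the same regexes and normalisation order from the file above and shows two log lines collapsing to one signature. The two sample log lines are invented for illustration and do not come from a real server log; `normalise` and `signature` are illustrative names for the two halves of `signature_for`.

```python
import hashlib
import re

# Regexes copied from pz_error_analysis.py above.
SESSION_META_RE = re.compile(r"^[A-Za-z]+\s+f:\d+,?\s*(?:t:\d+,?\s*)?st:[\d,]+>\s*")
DOUBLE_QUOTED_RE = re.compile(r'"[^"]*"')
SINGLE_QUOTED_RE = re.compile(r"'[^']*'")
NUMERIC_RUN_RE = re.compile(r"\d{2,}")
WS_RUN_RE = re.compile(r"\s+")


def normalise(first_line: str) -> str:
    """Flatten the volatile parts of a first body line."""
    text = first_line.strip()
    text = SESSION_META_RE.sub("", text)        # drop "General f:0, st:...>" prefix
    text = DOUBLE_QUOTED_RE.sub('"<S>"', text)  # quoted names, IDs, paths
    text = SINGLE_QUOTED_RE.sub("'<S>'", text)
    text = NUMERIC_RUN_RE.sub("<N>", text)      # runs of two or more digits
    text = WS_RUN_RE.sub(" ", text)
    return text[:200]


def signature(level: str, first_line: str) -> str:
    digest = hashlib.sha256(f"{level}\n{normalise(first_line)}".encode("utf-8")).hexdigest()
    return f"sha256:{digest[:16]}"


if __name__ == "__main__":
    # Invented sample lines: with and without the optional t: field.
    a = 'General f:0, t:1714, st:1,2,3,4> Failed to load vehicle "Base.VanSeats" at 10432'
    b = 'General f:0, st:5,6,7,8> Failed to load vehicle "Base.PickUpTruck" at 99'
    print(normalise(a))
    print(signature("ERROR", a) == signature("ERROR", b))
```

The level is part of the hashed input, so the same message logged at `WARN` and at `ERROR` yields two distinct signatures.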
777
tools/pz-analyzer/pz_parser.py
Normal file
@@ -0,0 +1,777 @@
|
||||
"""
|
||||
pz_parser.py — Deterministic Project Zomboid log parser.
|
||||
|
||||
Pure module (no I/O beyond reading the path it is handed). Walks a redacted
|
||||
DebugLog-server*.txt file, extracts errors/warnings, attributes each to a mod
|
||||
where evidence allows, classifies by kind, and computes deterministic
|
||||
signatures. Output records are designed to be `dataclasses.asdict()`-ready
|
||||
for direct JSON serialisation.
|
||||
|
||||
Pipeline phases (per design spec at
|
||||
docs/superpowers/specs/2026-05-04-pz-deterministic-classifier-design.md):
|
||||
|
||||
1. Severity-prefix recognition (ERROR|SEVERE|WARN)
|
||||
2. Bidirectional stack collection (pre-stack walk back, post-stack walk forward)
|
||||
3. Mod attribution (direct, inferred, unattributed)
|
||||
4. File:line extraction (five fallbacks)
|
||||
5. Cause-chain extraction (Caused by: chains + standalone exception lines)
|
||||
6. Java exception kind detection
|
||||
7. Engine-noise tagging
|
||||
8. Signature computation (pattern_id + signature)
|
||||
9. Aggregation (dedup on signature)
|
||||
|
||||
Style notes mirror sibling tool pz_error_analysis.py: type hints with built-in
|
||||
generics, `from __future__ import annotations`, regex precompilation as
|
||||
module-level constants, stdlib-only.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import hashlib
|
||||
import pathlib
|
||||
import re
|
||||
from dataclasses import dataclass
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Tunable constants
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: Lookback window (in raw file lines) for inferred mod attribution.
|
||||
INFERRED_LOOKBACK_LINES: int = 40
|
||||
#: Maximum frames retained per record after pre+post stack merge.
|
||||
MAX_STACK_FRAMES: int = 8
|
||||
#: Maximum lines walked in each direction during bidirectional stack collection.
|
||||
STACK_WALK_LINES: int = 25
|
||||
#: Maximum cause-chain depth retained.
|
||||
MAX_CAUSE_CHAIN_LEVELS: int = 6
|
||||
#: Truncation length for the normalised first line that feeds pattern_id.
|
||||
PATTERN_ID_FIRST_LINE_MAX: int = 200
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Line-shape regexes (parsing)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: PZ DebugLog entry header.
|
||||
#: Example: ``[16-04-26 00:01:19.080] ERROR: General f:0, t:1, st:1,2,3,4> body``
|
||||
ENTRY_RE = re.compile(
|
||||
r"^\[(?P<ts>\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
|
||||
r"(?P<level>[A-Z]+)\s*:\s*(?P<rest>.*)$"
|
||||
)
|
||||
|
||||
#: Strips the "General f:N, t:N, st:N,N,N,N>" prefix from a body line (the ``t:N`` field is optional).
|
||||
SESSION_META_RE = re.compile(
|
||||
r"^[A-Za-z][A-Za-z0-9]*\s+f:\d+,?\s*(?:t:\d+,?\s*)?st:[\d,]+>\s*"
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Severity-prefix recognition (phase 1)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: Severity tokens that flag a body line as an error/warning event when they
|
||||
#: appear at the start of body text. Per spec: broader than the existing
|
||||
#: pz_error_analysis.py regex (adds SEVERE for Java util-logging).
|
||||
SEVERITY_BODY_RE = re.compile(r"^\s*(ERROR|SEVERE|WARN)\s*[:\s]")
|
||||
#: Bracketed-level tokens that map to severity events.
|
||||
SEVERITY_LEVELS: tuple[str, ...] = ("ERROR", "WARN", "SEVERE", "FATAL")
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Stack-frame recognition (phase 2)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: Markers that identify a line as stack-shaped. Used to gate pre/post stack
|
||||
#: collection so we don't latch onto non-stack continuation text.
|
||||
STACK_HINT_RE = re.compile(
|
||||
r"(?:\bat\s+\S+|\[string\s+\"|function:\s|file:\s|\.lua\b)",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Mod attribution (phase 3)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: Direct attribution marker: ``Lua((MOD:<name>))``.
|
||||
LUA_MOD_MARKER_RE = re.compile(r"Lua\(\(MOD:([^)]+)\)\)")
|
||||
#: Direct attribution: ``require("X") failed`` shape.
|
||||
REQUIRE_FAILED_RE = re.compile(
|
||||
r"""require\s*\(\s*["']([^"']+)["']\s*\)\s+failed""",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
#: Direct attribution: explicit ``needed by <mod>`` hint.
|
||||
NEEDED_BY_RE = re.compile(r"needed\s+by\s+([A-Za-z0-9_'\- ]+?)(?:[,.]|$)", re.IGNORECASE)
|
||||
|
||||
#: Patterns that flag a body as "Lua-shaped" — gating filter for inferred
|
||||
#: attribution. Mirrors the spec's enumeration.
|
||||
LUA_SHAPED_PATTERNS: tuple[re.Pattern[str], ...] = (
|
||||
re.compile(r"luamanager\.getfunctionobject", re.IGNORECASE),
|
||||
re.compile(r"no\s+such\s+function", re.IGNORECASE),
|
||||
re.compile(r"exception\s+thrown", re.IGNORECASE),
|
||||
re.compile(r"runtimeexception", re.IGNORECASE),
|
||||
re.compile(r"illegalstateexception", re.IGNORECASE),
|
||||
re.compile(r"\blua\b", re.IGNORECASE),
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# File:line extraction (phase 4) — five fallbacks tried in order
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: 1. ``at <path>.lua:<n>`` — typical Lua stack frame.
|
||||
FILE_LINE_AT_RE = re.compile(r"\bat\s+([^\s:]+\.lua):(\d+)")
|
||||
#: 2. ``function: ... file: <path>.lua line #<n>`` (or `: <n>`).
|
||||
FILE_LINE_FUNCTION_RE = re.compile(
|
||||
r"function:\s*[^,]*?file:\s*([^\s,]+\.lua)\s+line\s*(?:#|:)\s*(\d+)",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
#: 3. ``[string "<path>.lua"]:<n>`` — Lua VM source string.
|
||||
FILE_LINE_STRING_RE = re.compile(r"""\[string\s+["']([^"']+\.lua)["']\]:(\d+)""")
|
||||
#: 4. quoted path ending in a known extension; line # optional.
|
||||
FILE_LINE_QUOTED_RE = re.compile(
|
||||
r"""["']([^"']+\.(?:lua|txt|xml|json|ini|cfg|bin))["'](?::(\d+))?"""
|
||||
)
|
||||
#: 5. unquoted path segment beginning with a recognised root.
|
||||
FILE_LINE_UNQUOTED_RE = re.compile(
|
||||
r"\b((?:media|maps|lua|scripts)/[\w./\-]+\.(?:lua|txt|xml|json|ini|cfg|bin))(?::(\d+))?"
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Cause-chain extraction (phase 5)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: ``Caused by: <ExceptionClass>: <msg>`` (msg optional).
|
||||
CAUSED_BY_RE = re.compile(
|
||||
r"Caused\s+by:\s+((?:\w+\.)+\w+(?:Exception|Error))(?::\s*(.+?))?\s*$",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
#: Standalone Java exception line: ``com.foo.BarException: msg``.
|
||||
EXCEPTION_LINE_RE = re.compile(
|
||||
r"((?:\w+\.)+\w+(?:Exception|Error))(?::\s*(.+?))?(?=\s+at\s|\s*$)"
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Engine-noise tagging (phase 7)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
ENGINE_NOISE_PATTERNS: tuple[re.Pattern[str], ...] = (
|
||||
re.compile(r"kahluathread\.flusherrormessage", re.IGNORECASE),
|
||||
re.compile(r"dumping\s+lua\s+stack\s+trace", re.IGNORECASE),
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Signature normalisation (phase 8)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
DOUBLE_QUOTED_RE = re.compile(r'"[^"]*"')
|
||||
SINGLE_QUOTED_RE = re.compile(r"'[^']*'")
|
||||
NUMERIC_RUN_RE = re.compile(r"\d{2,}")
|
||||
WS_RUN_RE = re.compile(r"\s+")
|
||||
#: Strips a leading ``ERROR:`` / ``SEVERE:`` / ``WARN:`` / ``FATAL:`` token
|
||||
#: from a body line so a body that happens to begin with the severity word
|
||||
#: hashes to the same pattern_id as the bracketed-only variant. Matches the
|
||||
#: token plus any colon and trailing whitespace; case-insensitive.
|
||||
SEVERITY_PREFIX_STRIP_RE = re.compile(
|
||||
r"^\s*(?:ERROR|SEVERE|WARN|FATAL)\s*[:\s]\s*", re.IGNORECASE
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Dataclasses — match the JSON keys the spec mandates so consumers can
|
||||
# `dataclasses.asdict(record)` straight to JSON.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@dataclass
|
||||
class Entry:
|
||||
"""One parsed log entry. Continuation lines (TAB-indented or otherwise
|
||||
non-header lines) are folded into ``body``. Phase-2 stack collection
|
||||
walks neighbouring entries (not raw lines), so no extra context is
|
||||
stored here.
|
||||
"""
|
||||
|
||||
timestamp: str
|
||||
level: str
|
||||
body: list[str]
|
||||
line_start: int
|
||||
line_end: int
|
||||
|
||||
|
||||
@dataclass
|
||||
class FirstSeen:
|
||||
"""Provenance for the first occurrence of a deduped record."""
|
||||
|
||||
file: str
|
||||
line: int
|
||||
timestamp: str
|
||||
|
||||
|
||||
@dataclass
|
||||
class Record:
|
||||
"""One classified, deduplicated error/warning record. Field names mirror
|
||||
the JSON output schema in the spec verbatim — this object is intended to
|
||||
be `dataclasses.asdict()`-ed straight into the output document.
|
||||
"""
|
||||
|
||||
signature: str
|
||||
pattern_id: str
|
||||
level: str
|
||||
kind: str
|
||||
mod_id: str
|
||||
mod_name: str
|
||||
attribution: str
|
||||
confidence: str
|
||||
attribution_reason: str
|
||||
file: str
|
||||
line: int
|
||||
cause_chain: str
|
||||
stack: list[str]
|
||||
first_seen: FirstSeen
|
||||
occurrence_count: int
|
||||
files: list[str]
|
||||
excerpt: str
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Phase 0: file parse
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def parse_file(path: pathlib.Path) -> list[Entry]:
|
||||
"""Parse a DebugLog-server file into a list of multi-line entries.
|
||||
|
||||
Continuation lines (those not matching ENTRY_RE) append to the previous
|
||||
entry's body, mirroring codex's PatternParser behaviour for multi-line
|
||||
Java stack traces under an ERROR header.
|
||||
"""
|
||||
entries: list[Entry] = []
|
||||
current: Entry | None = None
|
||||
with path.open("r", encoding="utf-8", errors="replace") as f:
|
||||
for lineno, raw in enumerate(f, start=1):
|
||||
line = raw.rstrip("\n")
|
||||
m = ENTRY_RE.match(line)
|
||||
if m:
|
||||
if current is not None:
|
||||
entries.append(current)
|
||||
current = Entry(
|
||||
timestamp=m.group("ts"),
|
||||
level=m.group("level"),
|
||||
body=[m.group("rest")],
|
||||
line_start=lineno,
|
||||
line_end=lineno,
|
||||
)
|
||||
elif current is not None:
|
||||
current.body.append(line)
|
||||
current.line_end = lineno
|
||||
# else: orphan line at start of file (no preceding entry); ignore.
|
||||
if current is not None:
|
||||
entries.append(current)
|
||||
return entries
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Phase 1: severity-prefix recognition
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def is_severity_entry(entry: Entry) -> bool:
|
||||
"""True if this entry is an ERROR/WARN/SEVERE/FATAL — either by the
|
||||
bracketed level or a leading SEVERE/ERROR/WARN token in the body (after
|
||||
stripping the session-meta prefix)."""
|
||||
if entry.level in SEVERITY_LEVELS:
|
||||
return True
|
||||
if entry.body and SEVERITY_BODY_RE.match(_strip_session_meta(entry.body[0])):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def effective_level(entry: Entry) -> str:
|
||||
"""Return the effective severity for an entry. Body-prefix takes
|
||||
precedence — covers the SEVERE-in-body case where bracketed level is LOG
|
||||
*and* the case where bracketed level is ERROR but body says SEVERE.
|
||||
"""
|
||||
if entry.body:
|
||||
m = SEVERITY_BODY_RE.match(_strip_session_meta(entry.body[0]))
|
||||
if m:
|
||||
return m.group(1).upper()
|
||||
return entry.level
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Phase 2: bidirectional stack collection
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _is_stack_shaped(line: str) -> bool:
|
||||
return bool(STACK_HINT_RE.search(line))
|
||||
|
||||
|
||||
def _strip_session_meta(body_line: str) -> str:
|
||||
"""Strip the ``General f:N, t:N, st:...> `` session-metadata prefix from
|
||||
a body's first line so pattern matching can run against the meaningful tail.
|
||||
"""
|
||||
return SESSION_META_RE.sub("", body_line)
|
||||
|
||||
|
||||
def _collect_pre_stack(entries: list[Entry], hit_idx: int) -> list[str]:
|
||||
"""Walk back through prior entries; collect stack-shaped lines from each
|
||||
entry's body. Stop at the previous severity-flagged entry. Cap collection
|
||||
at MAX_STACK_FRAMES and at STACK_WALK_LINES of body lines examined.
|
||||
Per spec, only return the block if at least one line looks stack-shaped.
|
||||
"""
|
||||
collected: list[str] = []
|
||||
lines_examined = 0
|
||||
for j in range(hit_idx - 1, -1, -1):
|
||||
prior = entries[j]
|
||||
# Stop at another severity line (the previous error's boundary).
|
||||
if is_severity_entry(prior):
|
||||
break
|
||||
# Walk this entry's body in reverse; for body[0] the session-meta
|
||||
# prefix is part of the line — strip it before stack-shape check.
|
||||
for k in range(len(prior.body) - 1, -1, -1):
|
||||
line = prior.body[k]
|
||||
stripped = _strip_session_meta(line) if k == 0 else line
|
||||
lines_examined += 1
|
||||
if _is_stack_shaped(stripped):
|
||||
collected.append(stripped.strip())
|
||||
if len(collected) >= MAX_STACK_FRAMES:
|
||||
break
|
||||
if lines_examined >= STACK_WALK_LINES:
|
||||
break
|
||||
if len(collected) >= MAX_STACK_FRAMES or lines_examined >= STACK_WALK_LINES:
|
||||
break
|
||||
if not collected:
|
||||
return []
|
||||
collected.reverse() # restore source order
|
||||
return collected
|
||||
|
||||
|
||||
def _collect_post_stack(entries: list[Entry], hit_idx: int) -> list[str]:
|
||||
"""Look at the entry's own body continuation lines first (stack frames
|
||||
attached to the ERROR header become continuation lines after parsing),
|
||||
then walk forward through subsequent entries. Stop at the next severity
|
||||
entry. Cap at MAX_STACK_FRAMES and at STACK_WALK_LINES of body lines."""
|
||||
entry = entries[hit_idx]
|
||||
collected: list[str] = []
|
||||
lines_examined = 0
|
||||
# Body continuations (skip body[0] which is the headline itself).
|
||||
for line in entry.body[1:]:
|
||||
lines_examined += 1
|
||||
if _is_stack_shaped(line):
|
||||
collected.append(line.strip())
|
||||
if len(collected) >= MAX_STACK_FRAMES:
|
||||
return collected
|
||||
if lines_examined >= STACK_WALK_LINES:
|
||||
return collected
|
||||
for j in range(hit_idx + 1, len(entries)):
|
||||
next_entry = entries[j]
|
||||
if is_severity_entry(next_entry):
|
||||
break
|
||||
for k, line in enumerate(next_entry.body):
|
||||
stripped = _strip_session_meta(line) if k == 0 else line
|
||||
lines_examined += 1
|
||||
if _is_stack_shaped(stripped):
|
||||
collected.append(stripped.strip())
|
||||
if len(collected) >= MAX_STACK_FRAMES:
|
||||
return collected
|
||||
if lines_examined >= STACK_WALK_LINES:
|
||||
return collected
|
||||
return collected
|
||||
|
||||
|
||||
def collect_stack(entries: list[Entry], hit_idx: int) -> list[str]:
|
||||
"""Merge pre + post stack, dedup preserving order, cap at MAX_STACK_FRAMES."""
|
||||
pre = _collect_pre_stack(entries, hit_idx)
|
||||
post = _collect_post_stack(entries, hit_idx)
|
||||
seen: set[str] = set()
|
||||
merged: list[str] = []
|
||||
for frame in pre + post:
|
||||
if frame in seen:
|
||||
continue
|
||||
seen.add(frame)
|
||||
merged.append(frame)
|
||||
if len(merged) >= MAX_STACK_FRAMES:
|
||||
break
|
||||
return merged
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Phase 3: mod attribution
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _norm_mod_key(raw_name: str) -> str:
|
||||
"""Lowercase, strip spaces / apostrophes / hyphens. Used as mod_id."""
|
||||
s = raw_name.lower()
|
||||
for ch in (" ", "'", "-"):
|
||||
s = s.replace(ch, "")
|
||||
return s
|
||||
|
||||
|
||||
def _entry_text(entry: Entry) -> str:
|
||||
"""Whole-entry text (all body lines joined with newlines) for marker scanning."""
|
||||
return "\n".join(entry.body)
|
||||
|
||||
|
||||
def attribute_entry(entry: Entry, prior_lookback_lines: list[str]) -> tuple[str, str, str, str, str]:
|
||||
"""Determine ``(mod_id, mod_name, attribution, confidence, reason)``.
|
||||
|
||||
``prior_lookback_lines`` is the body lines from prior entries that fall
|
||||
within INFERRED_LOOKBACK_LINES raw-file-line distance from this entry's
|
||||
start, in source order. The list is scanned in reverse for the nearest
|
||||
``Lua((MOD:Y))`` marker when inferred attribution is being attempted.
|
||||
|
||||
Direct-attribution priority: Lua marker -> needed-by -> require-failed.
|
||||
|
||||
Rationale: ``needed by <mod>`` names the dependent mod (more semantically
|
||||
targeted) and is preferred over ``require("...") failed`` which only names
|
||||
the missing module path. ``Lua((MOD:...))`` is unambiguous and wins
|
||||
outright.
|
||||
"""
|
||||
text = _entry_text(entry)
|
||||
# 1. Direct via Lua((MOD:X)) — unambiguous; outranks every other signal.
|
||||
m = LUA_MOD_MARKER_RE.search(text)
|
||||
if m:
|
||||
raw = m.group(1).strip()
|
||||
return (
|
||||
_norm_mod_key(raw),
|
||||
raw,
|
||||
"direct",
|
||||
"high",
|
||||
"Lua((MOD:...)) marker on the entry itself",
|
||||
)
|
||||
# 2. Direct via "needed by <mod>"
|
||||
m = NEEDED_BY_RE.search(text)
|
||||
if m:
|
||||
raw = m.group(1).strip().rstrip(".,;")
|
||||
return (
|
||||
_norm_mod_key(raw),
|
||||
raw,
|
||||
"direct",
|
||||
"high",
|
||||
"needed by <mod> hint",
|
||||
)
|
||||
# 3. Direct via require("X") failed — attribute to required module name.
|
||||
m = REQUIRE_FAILED_RE.search(text)
|
||||
if m:
|
||||
raw = m.group(1).strip()
|
||||
# Mod-name first segment (PZ paths often look like Mod/Foo/Bar).
|
||||
mod_name = raw.split("/")[0] if "/" in raw else raw
|
||||
return (
|
||||
_norm_mod_key(mod_name),
|
||||
mod_name,
|
||||
"direct",
|
||||
"high",
|
||||
'require("...") failed shape',
|
||||
)
|
||||
# 4. Inferred — Lua-shaped body + recent Lua((MOD:Y)) within lookback.
|
||||
if any(p.search(text) for p in LUA_SHAPED_PATTERNS):
|
||||
for line in reversed(prior_lookback_lines):
|
||||
mm = LUA_MOD_MARKER_RE.search(line)
|
||||
if mm:
|
||||
raw = mm.group(1).strip()
|
||||
return (
|
||||
_norm_mod_key(raw),
|
||||
raw,
|
||||
"inferred",
|
||||
"medium",
|
||||
f"Lua-shaped body; nearest Lua((MOD:{raw})) within "
|
||||
f"{INFERRED_LOOKBACK_LINES}-line lookback",
|
||||
)
|
||||
return (
|
||||
"__unattributed__",
|
||||
"",
|
||||
"unattributed",
|
||||
"low",
|
||||
"no marker; body not Lua-shaped or no recent Lua((MOD:...))",
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Phase 4: file:line extraction (five fallbacks, in order)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def extract_file_line(text: str) -> tuple[str, int]:
|
||||
"""Run the five fallbacks in order. Returns ``(file, line)`` with line=0
|
||||
when only a path was matched."""
|
||||
m = FILE_LINE_AT_RE.search(text)
|
||||
if m:
|
||||
return m.group(1), int(m.group(2))
|
||||
m = FILE_LINE_FUNCTION_RE.search(text)
|
||||
if m:
|
||||
return m.group(1), int(m.group(2))
|
||||
m = FILE_LINE_STRING_RE.search(text)
|
||||
if m:
|
||||
return m.group(1), int(m.group(2))
|
||||
m = FILE_LINE_QUOTED_RE.search(text)
|
||||
if m:
|
||||
return m.group(1), int(m.group(2)) if m.group(2) else 0
|
||||
m = FILE_LINE_UNQUOTED_RE.search(text)
|
||||
if m:
|
||||
return m.group(1), int(m.group(2)) if m.group(2) else 0
|
||||
return "", 0
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Phase 5: cause-chain extraction
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def extract_cause_chain(text: str) -> str:
|
||||
"""Return ``ExceptionA: msg -> ExceptionB: msg`` joined chain, deduped,
|
||||
capped at MAX_CAUSE_CHAIN_LEVELS levels.
|
||||
"""
|
||||
tokens: list[str] = []
|
||||
seen: set[str] = set()
|
||||
for line in text.splitlines():
|
||||
cb = CAUSED_BY_RE.search(line)
|
||||
if cb:
|
||||
cls = cb.group(1)
|
||||
msg = cb.group(2) or ""
|
||||
tok = f"{cls}: {msg.strip()}".rstrip(": ").strip()
|
||||
if tok not in seen:
|
||||
seen.add(tok)
|
||||
tokens.append(tok)
|
||||
continue
|
||||
ex = EXCEPTION_LINE_RE.search(line)
|
||||
if ex:
|
||||
cls = ex.group(1)
|
||||
msg = ex.group(2) or ""
|
||||
tok = f"{cls}: {msg.strip()}".rstrip(": ").strip()
|
||||
if tok not in seen:
|
||||
seen.add(tok)
|
||||
tokens.append(tok)
|
||||
if len(tokens) >= MAX_CAUSE_CHAIN_LEVELS:
|
||||
break
|
||||
return " -> ".join(tokens[:MAX_CAUSE_CHAIN_LEVELS])
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Phase 6: Java exception kind detection
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
JAVA_EXCEPTION_RE = re.compile(r"(?:\w+\.)+\w+(?:Exception|Error)\b")
|
||||
|
||||
|
||||
def detect_kind(entry: Entry, attribution: str, body_text: str) -> str:
|
||||
"""Determine the ``kind`` field. Order: engine_noise > require_failed >
|
||||
java_exception > lua_runtime > runtime."""
|
||||
# Phase 7 short-circuit (engine noise outranks others per spec — engine
|
||||
# noise is PZ's own diagnostic chatter regardless of class).
|
||||
if any(p.search(body_text) for p in ENGINE_NOISE_PATTERNS):
|
||||
return "engine_noise"
|
||||
if REQUIRE_FAILED_RE.search(body_text):
|
||||
return "require_failed"
|
||||
has_java = bool(JAVA_EXCEPTION_RE.search(body_text))
|
||||
has_lua_marker = bool(LUA_MOD_MARKER_RE.search(body_text))
|
||||
if has_java and not has_lua_marker:
|
||||
return "java_exception"
|
||||
# Lua-attributed runtime / inferred
|
||||
if has_lua_marker or attribution in ("direct", "inferred"):
|
||||
return "lua_runtime"
|
||||
return "runtime"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Phase 8: signature computation
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def normalize_first_line(first: str) -> str:
|
||||
"""Per spec: strip session metadata prefix, strip any leading severity
|
||||
word (so ``SEVERE: foo`` and ``foo`` produce the same pattern_id when both
|
||||
are SEVERE-level), flatten quoted strings to ``"<S>"`` / ``'<S>'``, flatten
|
||||
≥2-digit numeric runs to ``<N>``, collapse whitespace, truncate to 200
|
||||
chars.
|
||||
"""
|
||||
s = first.strip()
|
||||
s = SESSION_META_RE.sub("", s)
|
||||
# Strip any leading ERROR:/SEVERE:/WARN:/FATAL: that survived in the body
|
||||
# — the bracketed level already feeds pattern_id separately, so leaving
|
||||
# the body-prefix in place would fragment signatures across "body has
|
||||
# SEVERE: prefix" vs "body has no prefix but bracketed level is SEVERE."
|
||||
s = SEVERITY_PREFIX_STRIP_RE.sub("", s)
|
||||
s = DOUBLE_QUOTED_RE.sub('"<S>"', s)
|
||||
s = SINGLE_QUOTED_RE.sub("'<S>'", s)
|
||||
s = NUMERIC_RUN_RE.sub("<N>", s)
|
||||
s = WS_RUN_RE.sub(" ", s)
|
||||
return s[:PATTERN_ID_FIRST_LINE_MAX]
|
||||
|
||||
|
||||
def compute_pattern_id(level: str, first_line: str) -> str:
|
||||
"""First 16 hex chars of ``sha256`` over ``level`` and the normalised first line joined by a newline, prefixed ``sha256:``.
|
||||
|
||||
16 hex chars (64 bits) chosen for JSON readability vs collision-resistance
|
||||
trade-off; consumers treat as opaque.
|
||||
"""
|
||||
norm = normalize_first_line(first_line)
|
||||
h = hashlib.sha256(f"{level}\n{norm}".encode("utf-8")).hexdigest()
|
||||
return f"sha256:{h[:16]}"
|
||||
|
||||
|
||||
def compute_signature(pattern_id: str, mod_id: str) -> str:
|
||||
"""First 16 hex chars of ``sha256`` over ``pattern_id`` and ``mod_id`` joined by a newline, prefixed ``sha256:``.
|
||||
|
||||
16 hex chars (64 bits) chosen for JSON readability vs collision-resistance
|
||||
trade-off; consumers treat as opaque.
|
||||
"""
|
||||
h = hashlib.sha256(f"{pattern_id}\n{mod_id}".encode("utf-8")).hexdigest()
|
||||
return f"sha256:{h[:16]}"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Aggregation (phase 9) and the public classify_entries entry point
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
_CONFIDENCE_RANK: dict[str, int] = {"low": 0, "medium": 1, "high": 2}
|
||||
_ATTRIBUTION_RANK: dict[str, int] = {
|
||||
"unattributed": 0,
|
||||
"inferred": 1,
|
||||
"direct": 2,
|
||||
}
|
||||
|
||||
|
||||
def _build_excerpt(entry: Entry, max_chars: int = 1000) -> str:
|
||||
"""Best-effort one-block excerpt of the entry (header + continuations)."""
|
||||
lines: list[str] = []
|
||||
header = f'[{entry.timestamp}] {entry.level}: '
|
||||
if entry.body:
|
||||
lines.append(header + entry.body[0])
|
||||
for cont in entry.body[1:]:
|
||||
lines.append(cont)
|
||||
text = "\n".join(lines)
|
||||
if len(text) > max_chars:
|
||||
text = text[:max_chars] + "\n... [truncated]"
|
||||
return text
|
||||
|
||||
|
||||
def _build_lookback_window(entries: list[Entry], hit_idx: int) -> list[str]:
|
||||
"""Collect body lines from prior entries whose ``line_start`` falls within
|
||||
INFERRED_LOOKBACK_LINES raw-file-line distance from the current entry.
|
||||
|
||||
Spec wording is "within the previous 40 lines", measured in raw file lines
|
||||
(mirrors pzmm's ``(i - last_mod_line) <= 40``, inclusive of 40). Counting
|
||||
raw lines means a multi-line entry (e.g., a 5-line Java stack trace) does
|
||||
not shrink the practical window the way a body-line budget would.
|
||||
|
||||
Returned list is in source order (oldest first) so callers can call
|
||||
``reversed()`` on it.
|
||||
"""
|
||||
if hit_idx <= 0:
|
||||
return []
|
||||
threshold = entries[hit_idx].line_start - INFERRED_LOOKBACK_LINES
|
||||
in_window: list[Entry] = []
|
||||
for j in range(hit_idx - 1, -1, -1):
|
||||
prior = entries[j]
|
||||
if prior.line_start < threshold:
|
||||
break
|
||||
in_window.append(prior)
|
||||
# We accumulated newest-first; reverse so we emit in source order.
|
||||
in_window.reverse()
|
||||
collected: list[str] = []
|
||||
for prior in in_window:
|
||||
collected.extend(prior.body)
|
||||
return collected
|
||||
|
||||
|
||||
def classify_entries(entries: list[Entry], source_file: str = "") -> list[Record]:
|
||||
"""Apply phases 1-9 to a parsed-file entry list. Returns one Record per
|
||||
unique (mod_id, error_shape) pair after dedup on signature.
|
||||
"""
|
||||
by_signature: dict[str, Record] = {}
|
||||
for hit_idx, entry in enumerate(entries):
|
||||
if not is_severity_entry(entry):
|
||||
continue
|
||||
level = effective_level(entry)
|
||||
body_text = _entry_text(entry)
|
||||
# Phase 2: stack collection
|
||||
stack = collect_stack(entries, hit_idx)
|
||||
# Phase 3: attribution (with INFERRED_LOOKBACK_LINES lookback)
|
||||
prior_window = _build_lookback_window(entries, hit_idx)
|
||||
mod_id, mod_name, attribution, confidence, attribution_reason = attribute_entry(
|
||||
entry, prior_window
|
||||
)
|
||||
# Phase 4: file:line extraction (search body + stack frames)
|
||||
search_text = body_text + "\n" + "\n".join(stack)
|
||||
file_path, line_no = extract_file_line(search_text)
|
||||
# Phase 5: cause-chain extraction
|
||||
cause_chain = extract_cause_chain(search_text)
|
||||
# Phase 6 & 7: kind detection (engine_noise short-circuits)
|
||||
kind = detect_kind(entry, attribution, body_text)
|
||||
# Phase 8: signature computation
|
||||
pattern_id = compute_pattern_id(level, entry.body[0] if entry.body else "")
|
||||
signature = compute_signature(pattern_id, mod_id)
|
||||
# Phase 9: dedup & aggregate
|
||||
if signature not in by_signature:
|
||||
by_signature[signature] = Record(
|
||||
signature=signature,
|
||||
pattern_id=pattern_id,
|
||||
level=level,
|
||||
kind=kind,
|
||||
mod_id=mod_id,
|
||||
mod_name=mod_name,
|
||||
attribution=attribution,
|
||||
confidence=confidence,
|
||||
attribution_reason=attribution_reason,
|
||||
file=file_path,
|
||||
line=line_no,
|
||||
cause_chain=cause_chain,
|
||||
stack=list(stack),
|
||||
first_seen=FirstSeen(
|
||||
file=source_file,
|
||||
line=entry.line_start,
|
||||
timestamp=entry.timestamp,
|
||||
),
|
||||
occurrence_count=1,
|
||||
files=[source_file] if source_file else [],
|
||||
excerpt=_build_excerpt(entry),
|
||||
)
|
||||
else:
|
||||
rec = by_signature[signature]
|
||||
rec.occurrence_count += 1
|
||||
if source_file and source_file not in rec.files:
|
||||
rec.files.append(source_file)
|
||||
# Promote attribution / confidence if this hit is stronger.
|
||||
if _ATTRIBUTION_RANK[attribution] > _ATTRIBUTION_RANK[rec.attribution]:
|
||||
rec.attribution = attribution
|
||||
rec.attribution_reason = attribution_reason
|
||||
if mod_name:
|
||||
rec.mod_name = mod_name
|
||||
if _CONFIDENCE_RANK[confidence] > _CONFIDENCE_RANK[rec.confidence]:
|
||||
rec.confidence = confidence
|
||||
# Merge stack frames (preserving order, capped).
|
||||
for frame in stack:
|
||||
if frame not in rec.stack and len(rec.stack) < MAX_STACK_FRAMES:
|
||||
rec.stack.append(frame)
|
||||
# Extend cause chain if the new hit has additional segments.
|
||||
if cause_chain and cause_chain != rec.cause_chain:
|
||||
# Concatenate unseen tokens.
|
||||
old = rec.cause_chain.split(" -> ") if rec.cause_chain else []
|
||||
new = cause_chain.split(" -> ")
|
||||
merged = list(old)
|
||||
for tok in new:
|
||||
if tok and tok not in merged:
|
||||
merged.append(tok)
|
||||
rec.cause_chain = " -> ".join(merged[:MAX_CAUSE_CHAIN_LEVELS])
|
||||
return list(by_signature.values())
|
||||
|
||||
|
||||
__all__ = [
|
||||
"Entry",
|
||||
"FirstSeen",
|
||||
"Record",
|
||||
"parse_file",
|
||||
"classify_entries",
|
||||
"is_severity_entry",
|
||||
"effective_level",
|
||||
"collect_stack",
|
||||
"attribute_entry",
|
||||
"extract_file_line",
|
||||
"extract_cause_chain",
|
||||
"detect_kind",
|
||||
"normalize_first_line",
|
||||
"compute_pattern_id",
|
||||
"compute_signature",
|
||||
"INFERRED_LOOKBACK_LINES",
|
||||
"MAX_STACK_FRAMES",
|
||||
"STACK_WALK_LINES",
|
||||
"MAX_CAUSE_CHAIN_LEVELS",
|
||||
"SEVERITY_LEVELS",
|
||||
]
|
||||
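The phase 8 normalisation in the module above can be illustrated with a small standalone sketch. The regexes are copied from the module; the two sample log lines are hypothetical and only serve to show that differing quoted strings, multi-digit numbers, session metadata and a body-level `SEVERE:` prefix should all collapse to one `pattern_id`.

```python
import hashlib
import re

# Regexes copied from the module above.
SESSION_META_RE = re.compile(
    r"^[A-Za-z][A-Za-z0-9]*\s+f:\d+,?\s*(?:t:\d+,?\s*)?st:[\d,]+>\s*"
)
SEVERITY_PREFIX_STRIP_RE = re.compile(
    r"^\s*(?:ERROR|SEVERE|WARN|FATAL)\s*[:\s]\s*", re.IGNORECASE
)
DOUBLE_QUOTED_RE = re.compile(r'"[^"]*"')
SINGLE_QUOTED_RE = re.compile(r"'[^']*'")
NUMERIC_RUN_RE = re.compile(r"\d{2,}")
WS_RUN_RE = re.compile(r"\s+")
PATTERN_ID_FIRST_LINE_MAX = 200


def normalize_first_line(first: str) -> str:
    """Apply the same substitutions, in the same order, as the module."""
    s = SESSION_META_RE.sub("", first.strip())
    s = SEVERITY_PREFIX_STRIP_RE.sub("", s)
    s = DOUBLE_QUOTED_RE.sub('"<S>"', s)
    s = SINGLE_QUOTED_RE.sub("'<S>'", s)
    s = NUMERIC_RUN_RE.sub("<N>", s)
    s = WS_RUN_RE.sub(" ", s)
    return s[:PATTERN_ID_FIRST_LINE_MAX]


def compute_pattern_id(level: str, first_line: str) -> str:
    """Hash level and normalised line, joined by a newline, to 16 hex chars."""
    payload = f"{level}\n{normalize_first_line(first_line)}".encode("utf-8")
    return f"sha256:{hashlib.sha256(payload).hexdigest()[:16]}"


# Hypothetical body lines: one with the t: field and a SEVERE: prefix,
# one without either.
line_a = 'General f:0, t:1776297660000, st:48,648,175,178> SEVERE: failed to load "foo.lua" after 120 ms'
line_b = 'General f:12, st:48,648,999,001> failed to load "bar.lua" after 4500 ms'

print(normalize_first_line(line_a))
print(compute_pattern_id("SEVERE", line_a) == compute_pattern_id("SEVERE", line_b))
```

Because `NUMERIC_RUN_RE` only matches runs of two or more digits, a single-digit difference (for example `crash 1` versus `crash 2`) would still yield distinct pattern IDs under this scheme.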
36
tools/pz-analyzer/pz_redact_all.sh
Executable file
@@ -0,0 +1,36 @@
|
||||
#!/usr/bin/env bash
|
||||
# One-shot PII redaction over the PZ DebugLog-server files extracted from
|
||||
# /opt/ik-codex/Logs.zip. Produces /opt/ik-codex/.scratch/pz/Logs.redacted/
|
||||
# (gitignored alongside the source). Single Docker invocation; the codex
|
||||
# library's vendor/autoload.php is mounted read-write only because composer's
|
||||
# image refuses world-readable mounts under -u UID:GID.
|
||||
#
|
||||
# Re-runnable: rewrites every output file. Add --refresh-cache semantics by
|
||||
# rm -rf'ing the OUT directory first if you want.
|
||||
set -euo pipefail
|
||||
|
||||
IN=/opt/ik-codex/.scratch/pz/Logs
|
||||
OUT=/opt/ik-codex/.scratch/pz/Logs.redacted
|
||||
|
||||
if [ ! -d "$IN" ]; then
|
||||
echo "error: input directory $IN missing — extract Logs.zip first" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
mkdir -p "$OUT"
|
||||
|
||||
docker run --rm \
|
||||
--entrypoint php \
|
||||
-v /opt/ik-codex:/app -w /app \
|
||||
-v "$IN":/in:ro -v "$OUT":/out \
|
||||
-u "$(id -u):$(id -g)" \
|
||||
composer:latest \
|
||||
-r '
|
||||
require "vendor/autoload.php";
|
||||
$r = new IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor();
|
||||
$files = glob("/in/*DebugLog-server*.txt");
|
||||
foreach ($files as $f) {
|
||||
file_put_contents("/out/" . basename($f), $r->redact(file_get_contents($f)));
|
||||
}
|
||||
fprintf(STDERR, "redacted %d file(s)\n", count($files));
|
||||
'
|
||||
0
tools/pz-analyzer/tests/__init__.py
Normal file
7
tools/pz-analyzer/tests/fixtures/fixture_cause_chain.txt
vendored
Normal file
@@ -0,0 +1,7 @@
|
||||
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
|
||||
[16-04-26 00:04:00.000] ERROR: General f:0, t:1776297840000, st:48,648,355,178> Lua((MOD:Test Mod Alpha)) wrapper failure
|
||||
java.lang.RuntimeException: outer wrapper at zombie.Foo(Foo.java:10)
|
||||
Caused by: java.lang.IllegalStateException: middle layer
|
||||
Caused by: java.lang.NullPointerException: deepest cause
|
||||
at zombie.Bar(Bar.java:99)
|
||||
[16-04-26 00:04:01.000] LOG : General f:0, t:1776297841000, st:48,648,356,178> after.
|
||||
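To make the intent of this fixture concrete, the following sketch runs a condensed version of the cause-chain logic over the ERROR entry's body lines as rendered above. The two regexes are copied from the classifier module; the helper is a simplified stand-in written for illustration, not the module's own function, and it takes a list of lines rather than a single text blob.

```python
import re

# Regexes copied from the classifier module.
CAUSED_BY_RE = re.compile(
    r"Caused\s+by:\s+((?:\w+\.)+\w+(?:Exception|Error))(?::\s*(.+?))?\s*$",
    re.IGNORECASE,
)
EXCEPTION_LINE_RE = re.compile(
    r"((?:\w+\.)+\w+(?:Exception|Error))(?::\s*(.+?))?(?=\s+at\s|\s*$)"
)
MAX_CAUSE_CHAIN_LEVELS = 6

# Body of the ERROR entry, with the session-meta prefix already removed.
BODY_LINES = [
    "Lua((MOD:Test Mod Alpha)) wrapper failure",
    "java.lang.RuntimeException: outer wrapper at zombie.Foo(Foo.java:10)",
    "Caused by: java.lang.IllegalStateException: middle layer",
    "Caused by: java.lang.NullPointerException: deepest cause",
    "at zombie.Bar(Bar.java:99)",
]


def extract_cause_chain(lines: list[str]) -> str:
    """Collect 'Class: msg' tokens in source order, deduplicated and capped."""
    tokens: list[str] = []
    for line in lines:
        # Prefer the explicit "Caused by:" shape, then a bare exception line.
        m = CAUSED_BY_RE.search(line) or EXCEPTION_LINE_RE.search(line)
        if not m:
            continue
        cls = m.group(1)
        msg = (m.group(2) or "").strip()
        tok = f"{cls}: {msg}" if msg else cls
        if tok not in tokens:
            tokens.append(tok)
        if len(tokens) >= MAX_CAUSE_CHAIN_LEVELS:
            break
    return " -> ".join(tokens)


chain = extract_cause_chain(BODY_LINES)
print(chain)
```

The lookahead in `EXCEPTION_LINE_RE` is what keeps the trailing `at zombie.Foo(...)` frame out of the first message, so the outer wrapper is recorded as `outer wrapper` only.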
8
tools/pz-analyzer/tests/fixtures/fixture_dedup.txt
vendored
Normal file
@@ -0,0 +1,8 @@
|
||||
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
|
||||
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod Alpha)) crash 1
|
||||
at media/lua/client/A.lua:11
|
||||
[16-04-26 00:01:01.000] ERROR: General f:0, t:1776297661000, st:48,648,176,178> Lua((MOD:Test Mod Alpha)) crash 1
|
||||
at media/lua/client/A.lua:11
|
||||
[16-04-26 00:01:02.000] ERROR: General f:0, t:1776297662000, st:48,648,177,178> Lua((MOD:Test Mod Alpha)) crash 1
|
||||
at media/lua/client/A.lua:11
|
||||
[16-04-26 00:01:03.000] LOG : General f:0, t:1776297663000, st:48,648,178,178> ok.
|
||||
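The point of this fixture is that three ERROR entries differing only in timestamp and session metadata should collapse into one record. A minimal sketch of that idea, assuming the `ENTRY_RE` and `SESSION_META_RE` patterns from the classifier module and using a plain `Counter` as a stand-in for the real signature-keyed aggregation:

```python
import re
from collections import Counter

# Patterns copied from the classifier module.
ENTRY_RE = re.compile(
    r"^\[(?P<ts>\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
    r"(?P<level>[A-Z]+)\s*:\s*(?P<rest>.*)$"
)
SESSION_META_RE = re.compile(
    r"^[A-Za-z][A-Za-z0-9]*\s+f:\d+,?\s*(?:t:\d+,?\s*)?st:[\d,]+>\s*"
)

# Fixture content as rendered above.
FIXTURE = """\
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod Alpha)) crash 1
at media/lua/client/A.lua:11
[16-04-26 00:01:01.000] ERROR: General f:0, t:1776297661000, st:48,648,176,178> Lua((MOD:Test Mod Alpha)) crash 1
at media/lua/client/A.lua:11
[16-04-26 00:01:02.000] ERROR: General f:0, t:1776297662000, st:48,648,177,178> Lua((MOD:Test Mod Alpha)) crash 1
at media/lua/client/A.lua:11
[16-04-26 00:01:03.000] LOG : General f:0, t:1776297663000, st:48,648,178,178> ok.
"""


def fold_entries(text: str) -> list[tuple[str, list[str]]]:
    """Fold continuation lines into the preceding header's body."""
    entries: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            entries.append((m.group("level"), [m.group("rest")]))
        elif entries:
            entries[-1][1].append(line)
    return entries


entries = fold_entries(FIXTURE)
# Key on (level, headline without session metadata); the real module hashes
# a further-normalised headline together with the mod id instead.
counts = Counter(
    (level, SESSION_META_RE.sub("", body[0]))
    for level, body in entries
    if level == "ERROR"
)
print(counts)
```

Without the session-metadata strip, the varying `t:` and `st:` fields would give each entry a distinct key and the three occurrences would not merge.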
0
tools/pz-analyzer/tests/fixtures/fixture_empty.txt
vendored
Normal file
4
tools/pz-analyzer/tests/fixtures/fixture_engine_noise.txt
vendored
Normal file
@@ -0,0 +1,4 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:03:00.000] ERROR: General f:0, t:1776297780000, st:48,648,295,178> KahluaThread.flusherrormessage> dumping lua stack trace
at media/lua/client/Foo.lua:1
[16-04-26 00:03:01.000] LOG : General f:0, t:1776297781000, st:48,648,296,178> after.
10
tools/pz-analyzer/tests/fixtures/fixture_file_line_fallbacks.txt
vendored
Normal file
@@ -0,0 +1,10 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod A)) format1
at media/lua/client/F1.lua:11
[16-04-26 00:01:01.000] ERROR: General f:0, t:1776297661000, st:48,648,176,178> Lua((MOD:Test Mod B)) format2
function: doStuff -- file: media/lua/client/F2.lua line # 22
[16-04-26 00:01:02.000] ERROR: General f:0, t:1776297662000, st:48,648,177,178> Lua((MOD:Test Mod C)) format3
[string "media/lua/client/F3.lua"]:33: bang
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> Lua((MOD:Test Mod D)) format4 about "media/lua/client/F4.lua" failure
[16-04-26 00:01:04.000] ERROR: General f:0, t:1776297664000, st:48,648,179,178> Lua((MOD:Test Mod E)) format5 path media/lua/client/F5.lua mention
[16-04-26 00:01:05.000] LOG : General f:0, t:1776297665000, st:48,648,180,178> ok.
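The five line shapes in this fixture correspond to the "five-fallback file:line extraction" that the parser tests exercise. A minimal sketch of such a fallback chain follows, assuming an ordered list of regular expressions tried first to last. The patterns and the name `extract_file_line_sketch` are hypothetical and are not taken from `pz_parser`; the real `extract_file_line` may work differently.

```python
import re

# Hypothetical patterns, tried in order (most specific first).
# These are illustrative guesses, not the regexes used by pz_parser.
_FILE_LINE_PATTERNS = [
    # 1. "at media/lua/client/F1.lua:11"
    re.compile(r'\bat\s+(?P<file>\S+\.lua):(?P<line>\d+)'),
    # 2. "function: doStuff -- file: media/lua/client/F2.lua line # 22"
    re.compile(r'file:\s*(?P<file>\S+\.lua)\s+line\s*#\s*(?P<line>\d+)'),
    # 3. '[string "media/lua/client/F3.lua"]:33: bang'
    re.compile(r'\[string\s+"(?P<file>[^"]+\.lua)"\]:(?P<line>\d+)'),
    # 4. quoted path with no line number
    re.compile(r'"(?P<file>[^"]+\.lua)"'),
    # 5. bare path mention with no line number
    re.compile(r'(?P<file>[\w./-]+\.lua)\b'),
]


def extract_file_line_sketch(text):
    """Return (path, line) from the first matching pattern.

    The line number is 0 when the matched form carries none;
    ("", 0) is returned when nothing matches.
    """
    for pattern in _FILE_LINE_PATTERNS:
        match = pattern.search(text)
        if match:
            line = match.groupdict().get("line")
            return match.group("file"), int(line) if line else 0
    return "", 0
```

Ordering matters in this sketch: the shapes that carry a line number are tried before the quoted and bare path forms, which report line 0.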
7
tools/pz-analyzer/tests/fixtures/fixture_inferred.txt
vendored
Normal file
@@ -0,0 +1,7 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Spongies Clothing)) initialised.
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> ordinary log line.
[16-04-26 00:01:02.000] LOG : General f:0, t:1776297662000, st:48,648,177,178> another log line.
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> LuaManager.GetFunctionObject> no such function: doStuff
at media/lua/client/Spongie.lua:7
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.
8
tools/pz-analyzer/tests/fixtures/fixture_java_exception.txt
vendored
Normal file
@@ -0,0 +1,8 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:19.080] ERROR: General f:0, t:1776297679080, st:48,648,194,258> DebugFileWatcher.registerDir> Exception thrown
java.nio.file.NoSuchFileException: /placeholder/config/mods at UnixException.translateToIOException(null:-1).
Stack trace:
at java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
at java.base/sun.nio.fs.UnixException.asIOException(Unknown Source)
at java.base/sun.nio.fs.LinuxWatchService$Poller.implRegister(Unknown Source)
[16-04-26 00:01:19.090] LOG : General f:0, t:1776297679090, st:48,648,194,268> after.
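The parser tests use this fixture to check continuation folding: a bracketed header starts an entry, and every following non-header line is attached to that entry. A rough sketch of that folding is shown here, assuming the header shape seen in these fixtures. `EntrySketch` and `fold_entries` are illustrative names and are not the real `pz_parser.parse_file` or its `Entry` type.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed header shape: "[dd-mm-yy hh:mm:ss.mmm] LEVEL: rest of line".
_HEADER = re.compile(
    r"^\[\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}\]\s+"
    r"(?P<level>[A-Z]+)\s*:\s?(?P<rest>.*)$"
)


@dataclass
class EntrySketch:
    level: str
    line_start: int  # 1-based raw line number of the header
    body: list[str] = field(default_factory=list)


def fold_entries(lines: list[str]) -> list[EntrySketch]:
    """Group raw lines into entries; non-header lines extend the last entry."""
    entries: list[EntrySketch] = []
    for lineno, raw in enumerate(lines, start=1):
        match = _HEADER.match(raw)
        if match:
            entries.append(
                EntrySketch(match.group("level"), lineno, [match.group("rest")])
            )
        elif entries:
            # Continuation line (stack frame, exception text, ...).
            entries[-1].body.append(raw.strip())
    return entries
```

In this sketch the header's own message is stored as `body[0]`, so the first continuation line lands at `body[1]`, which matches how the parser tests index into `body`.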
45
tools/pz-analyzer/tests/fixtures/fixture_lookback_boundary.txt
vendored
Normal file
@@ -0,0 +1,45 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod Distant)) initialised.
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> filler 1.
[16-04-26 00:01:02.000] LOG : General f:0, t:1776297662000, st:48,648,177,178> filler 2.
[16-04-26 00:01:03.000] LOG : General f:0, t:1776297663000, st:48,648,178,178> filler 3.
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> filler 4.
[16-04-26 00:01:05.000] LOG : General f:0, t:1776297665000, st:48,648,180,178> filler 5.
[16-04-26 00:01:06.000] LOG : General f:0, t:1776297666000, st:48,648,181,178> filler 6.
[16-04-26 00:01:07.000] LOG : General f:0, t:1776297667000, st:48,648,182,178> filler 7.
[16-04-26 00:01:08.000] LOG : General f:0, t:1776297668000, st:48,648,183,178> filler 8.
[16-04-26 00:01:09.000] LOG : General f:0, t:1776297669000, st:48,648,184,178> filler 9.
[16-04-26 00:01:10.000] LOG : General f:0, t:1776297670000, st:48,648,185,178> filler 10.
[16-04-26 00:01:11.000] LOG : General f:0, t:1776297671000, st:48,648,186,178> filler 11.
[16-04-26 00:01:12.000] LOG : General f:0, t:1776297672000, st:48,648,187,178> filler 12.
[16-04-26 00:01:13.000] LOG : General f:0, t:1776297673000, st:48,648,188,178> filler 13.
[16-04-26 00:01:14.000] LOG : General f:0, t:1776297674000, st:48,648,189,178> filler 14.
[16-04-26 00:01:15.000] LOG : General f:0, t:1776297675000, st:48,648,190,178> filler 15.
[16-04-26 00:01:16.000] LOG : General f:0, t:1776297676000, st:48,648,191,178> filler 16.
[16-04-26 00:01:17.000] LOG : General f:0, t:1776297677000, st:48,648,192,178> filler 17.
[16-04-26 00:01:18.000] LOG : General f:0, t:1776297678000, st:48,648,193,178> filler 18.
[16-04-26 00:01:19.000] LOG : General f:0, t:1776297679000, st:48,648,194,178> filler 19.
[16-04-26 00:01:20.000] LOG : General f:0, t:1776297680000, st:48,648,195,178> filler 20.
[16-04-26 00:01:21.000] LOG : General f:0, t:1776297681000, st:48,648,196,178> filler 21.
[16-04-26 00:01:22.000] LOG : General f:0, t:1776297682000, st:48,648,197,178> filler 22.
[16-04-26 00:01:23.000] LOG : General f:0, t:1776297683000, st:48,648,198,178> filler 23.
[16-04-26 00:01:24.000] LOG : General f:0, t:1776297684000, st:48,648,199,178> filler 24.
[16-04-26 00:01:25.000] LOG : General f:0, t:1776297685000, st:48,648,200,178> filler 25.
[16-04-26 00:01:26.000] LOG : General f:0, t:1776297686000, st:48,648,201,178> filler 26.
[16-04-26 00:01:27.000] LOG : General f:0, t:1776297687000, st:48,648,202,178> filler 27.
[16-04-26 00:01:28.000] LOG : General f:0, t:1776297688000, st:48,648,203,178> filler 28.
[16-04-26 00:01:29.000] LOG : General f:0, t:1776297689000, st:48,648,204,178> filler 29.
[16-04-26 00:01:30.000] LOG : General f:0, t:1776297690000, st:48,648,205,178> filler 30.
[16-04-26 00:01:31.000] LOG : General f:0, t:1776297691000, st:48,648,206,178> filler 31.
[16-04-26 00:01:32.000] LOG : General f:0, t:1776297692000, st:48,648,207,178> filler 32.
[16-04-26 00:01:33.000] LOG : General f:0, t:1776297693000, st:48,648,208,178> filler 33.
[16-04-26 00:01:34.000] LOG : General f:0, t:1776297694000, st:48,648,209,178> filler 34.
[16-04-26 00:01:35.000] LOG : General f:0, t:1776297695000, st:48,648,210,178> filler 35.
[16-04-26 00:01:36.000] LOG : General f:0, t:1776297696000, st:48,648,211,178> filler 36.
[16-04-26 00:01:37.000] LOG : General f:0, t:1776297697000, st:48,648,212,178> filler 37.
[16-04-26 00:01:38.000] LOG : General f:0, t:1776297698000, st:48,648,213,178> filler 38.
[16-04-26 00:01:39.000] LOG : General f:0, t:1776297699000, st:48,648,214,178> filler 39.
[16-04-26 00:01:40.000] LOG : General f:0, t:1776297700000, st:48,648,215,178> filler 40.
[16-04-26 00:01:41.000] LOG : General f:0, t:1776297701000, st:48,648,216,178> filler 41.
[16-04-26 00:01:42.000] ERROR: General f:0, t:1776297702000, st:48,648,217,178> LuaManager.GetFunctionObject> no such function (way past lookback)
[16-04-26 00:01:43.000] LOG : General f:0, t:1776297703000, st:48,648,218,178> ok.
6
tools/pz-analyzer/tests/fixtures/fixture_lua_attributed.txt
vendored
Normal file
@@ -0,0 +1,6 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:19.131] LOG : Mod f:0, t:1776297679131, st:48,648,194,309> loading example_mod_alpha.
[16-04-26 00:05:00.000] ERROR: General f:0, t:1776297900000, st:48,648,415,178> Lua((MOD:Test Mod Alpha)) something broke
at media/lua/client/Foo.lua:42
function: doStuff -- file: media/lua/client/Foo.lua line # 42
[16-04-26 00:05:01.000] LOG : General f:0, t:1776297901000, st:48,648,416,178> after the error.
3
tools/pz-analyzer/tests/fixtures/fixture_no_errors.txt
vendored
Normal file
@@ -0,0 +1,3 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> ordinary line.
[16-04-26 00:02:00.000] LOG : General f:0, t:1776297720000, st:48,648,235,178> nothing wrong.
5
tools/pz-analyzer/tests/fixtures/fixture_non_lua_no_inferred.txt
vendored
Normal file
@@ -0,0 +1,5 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Spongies Clothing)) initialised.
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> ordinary log line.
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> Disk full while writing chunk data
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.
6
tools/pz-analyzer/tests/fixtures/fixture_post_stack.txt
vendored
Normal file
@@ -0,0 +1,6 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod Alpha)) crash now
at media/lua/client/X.lua:11
at media/lua/client/Y.lua:22
[string "media/lua/client/Z.lua"]:33: oops
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.
6
tools/pz-analyzer/tests/fixtures/fixture_pre_stack.txt
vendored
Normal file
@@ -0,0 +1,6 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> at media/lua/client/A.lua:11
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> at media/lua/client/B.lua:22
[16-04-26 00:01:02.000] LOG : General f:0, t:1776297662000, st:48,648,177,178> [string "media/lua/client/C.lua"]:33: oops
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> Lua((MOD:Test Mod Alpha)) crash
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.
3
tools/pz-analyzer/tests/fixtures/fixture_require_failed.txt
vendored
Normal file
@@ -0,0 +1,3 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> require("DependencyMod/Foo") failed: needed by Test Mod Alpha
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> ok.
5
tools/pz-analyzer/tests/fixtures/fixture_severity_variants.txt
vendored
Normal file
@@ -0,0 +1,5 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> ERROR: top-level error message
[16-04-26 00:01:01.000] WARN : General f:0, t:1776297661000, st:48,648,176,178> WARN: top-level warn message
[16-04-26 00:01:02.000] ERROR: General f:0, t:1776297662000, st:48,648,177,178> SEVERE: java-style severe message at zombie.Foo(Foo.java:5)
[16-04-26 00:01:03.000] LOG : General f:0, t:1776297663000, st:48,648,178,178> ok.
3
tools/pz-analyzer/tests/fixtures/fixture_unattributed.txt
vendored
Normal file
@@ -0,0 +1,3 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:02:00.000] WARN : General f:0, t:1776297720000, st:48,648,235,178> ZomboidFileSystem.loadModAndRequired> required mod "absent_mod" not found.
[16-04-26 00:02:01.000] LOG : General f:0, t:1776297721000, st:48,648,236,178> after.
225
tools/pz-analyzer/tests/test_attribution.py
Normal file
@@ -0,0 +1,225 @@
"""Tests for pz_parser phase 3: mod attribution."""
from __future__ import annotations

import pathlib
import sys
import unittest

sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[1]))

import pz_parser  # noqa: E402

FIXTURE_DIR = pathlib.Path(__file__).resolve().parent / "fixtures"


def fixture(name: str) -> pathlib.Path:
    return FIXTURE_DIR / name


class AttributionBucketTests(unittest.TestCase):
    """Three confidence buckets: direct (high), inferred (medium),
    unattributed (low)."""

    def test_direct_attribution_when_lua_marker_on_entry(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_lua_attributed.txt"))
        records = pz_parser.classify_entries(entries, source_file="la.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        self.assertEqual(rec.attribution, "direct")
        self.assertEqual(rec.confidence, "high")
        # mod_id is normalised: lowercase, no spaces / apostrophes / hyphens.
        self.assertEqual(rec.mod_id, "testmodalpha")
        self.assertEqual(rec.mod_name, "Test Mod Alpha")

    def test_inferred_attribution_within_lookback_window(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_inferred.txt"))
        records = pz_parser.classify_entries(entries, source_file="in.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        self.assertEqual(rec.attribution, "inferred")
        self.assertEqual(rec.confidence, "medium")
        self.assertEqual(rec.mod_id, "spongiesclothing")

    def test_unattributed_when_no_marker_and_not_lua_shaped(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_unattributed.txt"))
        records = pz_parser.classify_entries(entries, source_file="ua.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        self.assertEqual(rec.attribution, "unattributed")
        self.assertEqual(rec.confidence, "low")
        self.assertEqual(rec.mod_id, "__unattributed__")


class LookbackBoundaryTests(unittest.TestCase):
    """Phase 3: 40-line inferred-attribution window boundary."""

    def test_lua_marker_beyond_lookback_does_not_attribute(self) -> None:
        # Fixture places the Lua((MOD:...)) >40 lines before the ERROR.
        entries = pz_parser.parse_file(fixture("fixture_lookback_boundary.txt"))
        records = pz_parser.classify_entries(entries, source_file="lb.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        # The marker is far enough back that the Lua-shaped ERROR is unattributed.
        self.assertEqual(rec.attribution, "unattributed")
        self.assertEqual(rec.mod_id, "__unattributed__")

    def test_non_lua_shaped_body_rejects_inferred_attribution(self) -> None:
        # Recent Lua((MOD:Spongies Clothing)) emitted, but the ERROR body
        # ("Disk full while writing chunk data") isn't Lua-shaped.
        entries = pz_parser.parse_file(fixture("fixture_non_lua_no_inferred.txt"))
        records = pz_parser.classify_entries(entries, source_file="nl.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        self.assertEqual(rec.attribution, "unattributed")


class NeededByTests(unittest.TestCase):
    """Phase 3: direct attribution via "needed by <mod>" hint."""

    def test_needed_by_extracts_dependent_mod(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_require_failed.txt"))
        records = pz_parser.classify_entries(entries, source_file="rf.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        # "needed by Test Mod Alpha" should set the mod to Test Mod Alpha
        # (preferred over the require("...") side which would mention
        # DependencyMod). Either way we want direct/high.
        self.assertEqual(rec.attribution, "direct")
        self.assertEqual(rec.confidence, "high")
        # The "needed by" branch is checked before the require() branch in
        # the priority order; mod_id should reflect Test Mod Alpha.
        self.assertEqual(rec.mod_id, "testmodalpha")


def _make_marker_line(idx: int) -> str:
    """Synthesise a single LOG-level entry containing a Lua((MOD:...)) marker."""
    # Vary timestamps so the bracketed prefix is unique-ish; not strictly
    # required, since they only feed Entry.timestamp, not parsing.
    return (
        f"[16-04-26 00:00:{idx:02d}.000] LOG : General f:0, "
        f"t:1776297642{idx:03d}, st:48,648,157,434> "
        "Lua((MOD:Test Mod Alpha)) initialised."
    )


def _make_filler_line(idx: int) -> str:
    """A plain LOG-level entry with no marker; one raw line."""
    return (
        f"[16-04-26 00:01:{idx % 60:02d}.000] LOG : General f:0, "
        f"t:177629760{idx:04d}, st:48,648,200,178> filler entry {idx}."
    )


def _make_error_line() -> str:
    """A Lua-shaped ERROR with no Lua((MOD:...)) marker on the entry itself,
    so attribution must come from the lookback window if it comes at all."""
    return (
        "[16-04-26 00:02:00.000] ERROR: General f:0, "
        "t:1776297900000, st:48,648,300,178> "
        "LuaManager.GetFunctionObject> no such function: doStuff"
    )


class RawLineLookbackTests(unittest.TestCase):
    """Phase 3: lookback semantics measure raw file lines, not body-line
    budgets. Multi-line entries inside the window must not shrink the
    practical reach."""

    def _write_fixture(self, name: str, lines: list[str]) -> pathlib.Path:
        path = FIXTURE_DIR / name
        path.write_text("\n".join(lines) + "\n")
        return path

    def test_marker_exactly_at_lookback_boundary_attributes(self) -> None:
        # Marker on line 1, ERROR on line 41 -> raw-line distance = 40
        # (inclusive of INFERRED_LOOKBACK_LINES=40 -> still attributed).
        lines = [_make_marker_line(0)]
        for i in range(1, 40):
            lines.append(_make_filler_line(i))
        lines.append(_make_error_line())  # line 41 in the fixture
        path = self._write_fixture("_rawline_at_boundary.txt", lines)
        try:
            entries = pz_parser.parse_file(path)
            self.assertEqual(entries[0].line_start, 1)
            self.assertEqual(entries[-1].line_start, 41)
            records = pz_parser.classify_entries(entries, source_file="b1.txt")
            self.assertEqual(len(records), 1)
            self.assertEqual(records[0].attribution, "inferred")
            self.assertEqual(records[0].mod_id, "testmodalpha")
        finally:
            path.unlink()

    def test_marker_one_line_past_boundary_does_not_attribute(self) -> None:
        # Marker on line 1, ERROR on line 42 -> raw-line distance = 41
        # (just outside INFERRED_LOOKBACK_LINES -> unattributed).
        lines = [_make_marker_line(0)]
        for i in range(1, 41):
            lines.append(_make_filler_line(i))
        lines.append(_make_error_line())  # line 42 in the fixture
        path = self._write_fixture("_rawline_past_boundary.txt", lines)
        try:
            entries = pz_parser.parse_file(path)
            self.assertEqual(entries[0].line_start, 1)
            self.assertEqual(entries[-1].line_start, 42)
            records = pz_parser.classify_entries(entries, source_file="b2.txt")
            self.assertEqual(len(records), 1)
            self.assertEqual(records[0].attribution, "unattributed")
            self.assertEqual(records[0].mod_id, "__unattributed__")
        finally:
            path.unlink()

    def test_multiline_entry_does_not_shrink_practical_lookback(self) -> None:
        """Multi-line entries inside the lookback window do not break
        attribution. (Old body-line-budget and new raw-line-distance semantics
        happen to be equivalent on contiguous PZ entries; this test locks the
        post-fix semantic against future regression to a budget that *would*
        differ, e.g. a body-line cap with a smaller value.)
        """
        # Lay out the file so a multi-line entry sits between marker and ERROR.
        # The marker on line 1 is within 40 raw lines of the ERROR even though
        # the file has a 6-line multi-line entry in between.
        lines = [_make_marker_line(0)]  # raw line 1: marker entry
        # Single-line fillers on raw lines 2..30 (29 entries).
        for i in range(1, 30):
            lines.append(_make_filler_line(i))
        # Multi-line entry: header on raw line 31, 5 continuations on lines
        # 32..36 (Java-stack-trace shape).
        lines.append(
            "[16-04-26 00:01:30.000] LOG : General f:0, "
            "t:1776297930000, st:48,648,200,178> stack trace dump"
        )
        for k in range(5):
            lines.append(f"\tat zombie.SomeClass.method{k}(SomeClass.java:{k + 1})")
        # Single-line fillers on raw lines 37..40 (4 entries).
        for i in range(30, 34):
            lines.append(_make_filler_line(i))
        # ERROR at raw line 41 -> N - 1 = 40 -> within window.
        lines.append(_make_error_line())
        path = self._write_fixture("_rawline_multiline.txt", lines)
        try:
            entries = pz_parser.parse_file(path)
            # Sanity-check the layout: first entry at line 1, multi-line entry
            # sits at line 31 with 6 body lines (header + 5 continuations),
            # ERROR at line 41.
            self.assertEqual(entries[0].line_start, 1)
            multi = next(
                e for e in entries
                if e.line_start == 31 and len(e.body) == 6
            )
            self.assertEqual(multi.line_end, 36)
            self.assertEqual(entries[-1].line_start, 41)
            records = pz_parser.classify_entries(entries, source_file="ml.txt")
            self.assertEqual(len(records), 1)
            # Raw-line-distance semantics: the marker on line 1 is 40 raw
            # lines from the ERROR on line 41, so attribution holds. (Old
            # body-line-budget would also pass here on contiguous entries;
            # this assertion locks the post-fix behavior against future
            # regression to a tighter cap.)
            self.assertEqual(records[0].attribution, "inferred")
            self.assertEqual(records[0].mod_id, "testmodalpha")
        finally:
            path.unlink()


if __name__ == "__main__":
    unittest.main()
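The boundary tests in `test_attribution.py` pin down two behaviours: `mod_id` normalisation and a lookback measured in raw file lines. A small sketch of both is given here under stated assumptions: the marker regex, `normalise_mod_id` and `infer_mod_id` are hypothetical names, and the Lua-shaped-body check that the real classifier also applies is left out.

```python
import re

INFERRED_LOOKBACK_LINES = 40  # window size named in the tests
_MARKER = re.compile(r"Lua\(\(MOD:(?P<name>[^)]+)\)\)")  # assumed marker shape


def normalise_mod_id(name: str) -> str:
    """Lowercase and drop whitespace, apostrophes and hyphens."""
    return re.sub(r"[\s'\-]", "", name.lower())


def infer_mod_id(raw_lines: list, error_line: int) -> str:
    """Find the nearest marker at most 40 raw lines above error_line.

    error_line is 1-based. The distance is counted in raw file lines,
    so multi-line entries in between do not change the reach.
    """
    first = max(1, error_line - INFERRED_LOOKBACK_LINES)
    for lineno in range(error_line - 1, first - 1, -1):
        match = _MARKER.search(raw_lines[lineno - 1])
        if match:
            return normalise_mod_id(match.group("name"))
    return "__unattributed__"
```

With a marker on line 1, an ERROR on line 41 is still inside the window (distance 40), while an ERROR on line 42 falls outside it, which mirrors the two boundary tests.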
199
tools/pz-analyzer/tests/test_parser.py
Normal file
@@ -0,0 +1,199 @@
"""Tests for pz_parser parsing pipeline (phases 1, 2, 4-7, 9)."""
from __future__ import annotations

import pathlib
import sys
import unittest

# Make the parser module importable when running via `python -m unittest
# discover -s tools/pz-analyzer/tests`.
sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[1]))

import pz_parser  # noqa: E402

FIXTURE_DIR = pathlib.Path(__file__).resolve().parent / "fixtures"


def fixture(name: str) -> pathlib.Path:
    return FIXTURE_DIR / name


class ParseFileTests(unittest.TestCase):
    """Phase 0: basic line-shape recognition and continuation folding."""

    def test_parse_file_groups_continuations_under_entry(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_java_exception.txt"))
        # 3 bracketed entries; the ERROR has 5 continuation lines.
        self.assertEqual(len(entries), 3)
        error_entry = entries[1]
        self.assertEqual(error_entry.level, "ERROR")
        self.assertGreater(len(error_entry.body), 1)
        # First continuation should be the java exception line.
        self.assertIn("NoSuchFileException", error_entry.body[1])

    def test_parse_file_handles_empty_file(self) -> None:
        self.assertEqual(pz_parser.parse_file(fixture("fixture_empty.txt")), [])

    def test_parse_file_handles_no_errors(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_no_errors.txt"))
        self.assertEqual(len(entries), 3)
        self.assertTrue(all(e.level == "LOG" for e in entries))


class SeverityRecognitionTests(unittest.TestCase):
    """Phase 1: ERROR / WARN / SEVERE recognition."""

    def test_classify_picks_up_error_warn_and_severe(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_severity_variants.txt"))
        records = pz_parser.classify_entries(entries, source_file="severity.txt")
        levels = sorted({r.level for r in records})
        # Spec accepts ERROR / WARN / SEVERE. The third entry has bracketed
        # ERROR but body starts with SEVERE: ; effective_level should be SEVERE.
        self.assertIn("ERROR", levels)
        self.assertIn("WARN", levels)
        self.assertIn("SEVERE", levels)

    def test_log_lines_are_ignored(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_no_errors.txt"))
        records = pz_parser.classify_entries(entries, source_file="x.txt")
        self.assertEqual(records, [])


class StackCollectionTests(unittest.TestCase):
    """Phase 2: bidirectional stack collection."""

    def test_pre_stack_walk_picks_up_preceding_lua_frames(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_pre_stack.txt"))
        # The ERROR entry is the 5th bracketed line; its predecessors are
        # LOG-bracketed entries whose bodies are stack-shaped lines.
        records = pz_parser.classify_entries(entries, source_file="pre.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        # Pre-stack walk should pick up at least the "at media/lua/.../A.lua:11" frame.
        self.assertTrue(any("A.lua:11" in f for f in rec.stack))

    def test_post_stack_collected_from_entry_body_continuations(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_post_stack.txt"))
        records = pz_parser.classify_entries(entries, source_file="post.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        self.assertTrue(any("X.lua:11" in f for f in rec.stack))
        self.assertTrue(any("Y.lua:22" in f for f in rec.stack))
        # Lua [string "..."]:N form preserves quoting in the captured frame.
        self.assertTrue(any("Z.lua" in f and ":33" in f for f in rec.stack))

    def test_stack_capped_at_eight_frames(self) -> None:
        # Synthesise an ERROR with many continuation frames.
        lines = ["[16-04-26 00:00:42.314] ERROR: General f:0, t:1, st:1,2,3,4> Lua((MOD:Test Mod Alpha)) crash"]
        for i in range(20):
            lines.append(f"\tat media/lua/client/F{i}.lua:{i + 1}")
        path = FIXTURE_DIR / "_runtime_stack_cap.txt"
        path.write_text("\n".join(lines) + "\n")
        try:
            entries = pz_parser.parse_file(path)
            records = pz_parser.classify_entries(entries, source_file="cap.txt")
            self.assertEqual(len(records), 1)
            self.assertLessEqual(len(records[0].stack), pz_parser.MAX_STACK_FRAMES)
            # And it should be exactly MAX_STACK_FRAMES given >MAX inputs.
            self.assertEqual(len(records[0].stack), pz_parser.MAX_STACK_FRAMES)
        finally:
            path.unlink()


class FileLineExtractionTests(unittest.TestCase):
    """Phase 4: five-fallback file:line extraction."""

    def test_each_fallback_form_extracts_path(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_file_line_fallbacks.txt"))
        records = pz_parser.classify_entries(entries, source_file="ff.txt")
        # 5 distinct ERRORs with distinct mods should produce 5 records.
        files = sorted(r.file for r in records)
        self.assertEqual(
            files,
            sorted([
                "media/lua/client/F1.lua",
                "media/lua/client/F2.lua",
                "media/lua/client/F3.lua",
                "media/lua/client/F4.lua",
                "media/lua/client/F5.lua",
            ]),
        )

    def test_quoted_path_without_line_number_yields_zero(self) -> None:
        # Format 4 fixture line lacks a :NN suffix on the quoted path.
        file_path, line_no = pz_parser.extract_file_line(
            'failure about "media/lua/client/F4.lua" tail'
        )
        self.assertEqual(file_path, "media/lua/client/F4.lua")
        self.assertEqual(line_no, 0)


class CauseChainTests(unittest.TestCase):
    """Phase 5: Caused-by chain unwinding."""

    def test_caused_by_chain_renders_with_arrow_separator(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_cause_chain.txt"))
        records = pz_parser.classify_entries(entries, source_file="cc.txt")
        self.assertEqual(len(records), 1)
        chain = records[0].cause_chain
        self.assertIn("RuntimeException", chain)
        self.assertIn("IllegalStateException", chain)
        self.assertIn("NullPointerException", chain)
        # Order preserved (outer -> inner).
        idx_runtime = chain.index("RuntimeException")
        idx_illegal = chain.index("IllegalStateException")
        idx_null = chain.index("NullPointerException")
        self.assertLess(idx_runtime, idx_illegal)
        self.assertLess(idx_illegal, idx_null)

    def test_no_cause_chain_when_no_exceptions(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_unattributed.txt"))
        records = pz_parser.classify_entries(entries, source_file="u.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].cause_chain, "")


class KindDetectionTests(unittest.TestCase):
    """Phases 6 & 7: kind classification."""

    def test_java_exception_kind_when_no_lua_marker(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_java_exception.txt"))
        records = pz_parser.classify_entries(entries, source_file="je.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "java_exception")
        # Java engine errors should resolve to __unattributed__.
        self.assertEqual(records[0].mod_id, "__unattributed__")

    def test_engine_noise_kind_for_kahluathread(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_engine_noise.txt"))
        records = pz_parser.classify_entries(entries, source_file="en.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "engine_noise")

    def test_lua_runtime_kind_for_attributed_lua_error(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_lua_attributed.txt"))
        records = pz_parser.classify_entries(entries, source_file="la.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "lua_runtime")

    def test_require_failed_kind(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_require_failed.txt"))
        records = pz_parser.classify_entries(entries, source_file="rf.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "require_failed")


class AggregationTests(unittest.TestCase):
    """Phase 9: dedup, occurrence_count, files-set growth."""

    def test_three_identical_errors_dedup_to_one_record(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_dedup.txt"))
        records = pz_parser.classify_entries(entries, source_file="dd.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].occurrence_count, 3)
        # files list shouldn't duplicate "dd.txt".
        self.assertEqual(records[0].files, ["dd.txt"])


if __name__ == "__main__":
    unittest.main()
91
tools/pz-analyzer/tests/test_signatures.py
Normal file
@@ -0,0 +1,91 @@
"""Tests for pz_parser phase 8 — signature computation."""
from __future__ import annotations

import pathlib
import sys
import unittest

sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[1]))

import pz_parser  # noqa: E402


class PatternIdStabilityTests(unittest.TestCase):
    """pattern_id should be invariant under formatting variations."""

    def test_pattern_id_collapses_numeric_runs(self) -> None:
        a = pz_parser.compute_pattern_id(
            "ERROR",
            "General f:0, t:1776297642, st:48,648,157,434> failed at offset 12345",
        )
        b = pz_parser.compute_pattern_id(
            "ERROR",
            "General f:0, t:9999999999, st:99,99,99,99> failed at offset 99999",
        )
        self.assertEqual(a, b)

    def test_pattern_id_collapses_quoted_strings_and_whitespace(self) -> None:
        a = pz_parser.compute_pattern_id(
            "ERROR",
            'no such function "doStuff" in module',
        )
        b = pz_parser.compute_pattern_id(
            "ERROR",
            'no such function "fooBarBaz" in module',
        )
        # Whitespace-collapse plus quoted-string-flatten => same pattern_id.
        self.assertEqual(a, b)

    def test_pattern_id_changes_with_level(self) -> None:
        a = pz_parser.compute_pattern_id("ERROR", "exception thrown")
        b = pz_parser.compute_pattern_id("WARN", "exception thrown")
        self.assertNotEqual(a, b)


class SignatureUniquenessTests(unittest.TestCase):
    """signature should fan out across mods sharing a pattern_id."""

    def test_signature_unique_per_mod_for_shared_pattern(self) -> None:
        # Same first line, different mod_ids — different signatures, same pattern_id.
        pat = pz_parser.compute_pattern_id("ERROR", "Lua((MOD:X)) crash")
        sig_a = pz_parser.compute_signature(pat, "spongiesclothing")
        sig_b = pz_parser.compute_signature(pat, "testmodalpha")
        self.assertNotEqual(sig_a, sig_b)
        # The pattern_id shared by both signatures (consumer's pattern-fanout
        # view) should carry the "sha256:" prefix.
        self.assertEqual(pat[:7], "sha256:")


class SeverityPrefixStripTests(unittest.TestCase):
    """A body line that begins with a literal severity word (``SEVERE:``,
    ``ERROR:``, ``WARN:``, ``FATAL:``) should not fragment pattern_id away
    from the otherwise-identical body that lacks the prefix. The bracketed
    level already feeds pattern_id; the prefix is redundant and varies in
    practice."""

    def test_pattern_id_invariant_under_body_prefix_severe(self) -> None:
        # Same logical error: one line carries ``SEVERE: `` body prefix, the
        # other doesn't. Both classified as SEVERE by their bracketed level.
        with_prefix = pz_parser.compute_pattern_id(
            "SEVERE",
            "SEVERE: foo at zombie.X(File.java:42)",
        )
        without_prefix = pz_parser.compute_pattern_id(
            "SEVERE",
            "foo at zombie.X(File.java:42)",
        )
        self.assertEqual(with_prefix, without_prefix)

    def test_pattern_id_invariant_under_body_prefix_error(self) -> None:
        with_prefix = pz_parser.compute_pattern_id(
            "ERROR",
            "ERROR: doStuff failed in module",
        )
        without_prefix = pz_parser.compute_pattern_id(
            "ERROR",
            "doStuff failed in module",
        )
        self.assertEqual(with_prefix, without_prefix)


if __name__ == "__main__":
    unittest.main()
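The invariants that `test_signatures.py` checks can be illustrated with a small normaliser. This is only a sketch under stated assumptions, not the `pz_parser` implementation, which this part of the diff does not show. The names `sketch_pattern_id` and `sketch_signature`, the placeholder tokens, the regexes, and the `level|body` hash layout are all hypothetical. Only the `sha256:` prefix and the expected invariants come from the tests.

```python
import hashlib
import re

# Hypothetical normalisation rules; the real pz_parser rules may differ.
_SEVERITY_PREFIX = re.compile(r"^\s*(?:SEVERE|ERROR|WARN|FATAL):\s*")
_QUOTED = re.compile(r'"[^"]*"')
_NUMERIC_RUN = re.compile(r"\d+(?:,\d+)*")  # e.g. 12345 or 48,648,157,434
_WHITESPACE = re.compile(r"\s+")


def sketch_pattern_id(level: str, first_line: str) -> str:
    """Hash the level plus a normalised first line (illustrative only)."""
    body = _SEVERITY_PREFIX.sub("", first_line)   # drop redundant body prefix
    body = _QUOTED.sub('"<str>"', body)           # flatten quoted strings
    body = _NUMERIC_RUN.sub("<n>", body)          # collapse numeric runs
    body = _WHITESPACE.sub(" ", body).strip()     # collapse whitespace
    digest = hashlib.sha256(f"{level}|{body}".encode("utf-8")).hexdigest()
    return "sha256:" + digest


def sketch_signature(pattern_id: str, mod_id: str) -> str:
    """Fan a shared pattern_id out per mod (illustrative only)."""
    digest = hashlib.sha256(f"{pattern_id}|{mod_id}".encode("utf-8")).hexdigest()
    return "sha256:" + digest
```

Because the level is part of the hashed input, the same body under `ERROR` and `WARN` yields different ids, while a leading `SEVERE: ` in the body is stripped before hashing and so does not change the result.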