17 Commits

Author SHA1 Message Date
656142dbf8 docs: cut v0.3.0 in CHANGELOG
Some checks failed
Tests / Run tests on PHP v8.4 (push) Failing after 1s
Tests / Run tests on PHP v8.5 (push) Failing after 1s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 19:04:37 +00:00
c63adb06c4 Merge branch 'pz42x-line-regex'
Some checks failed
Tests / Run tests on PHP v8.4 (push) Failing after 2s
Tests / Run tests on PHP v8.5 (push) Failing after 1s
Fixes DebugServerPattern::LINE so PZ build 42.x logs (which dropped
the per-line `t:` microsecond field) parse with proper level/prefix
attribution. Without the fix every B42 entry fell through as level
INFO and ServerExceptionProblem / ModMissingProblem silently failed
to fire, leaving B42 log views with at most a single
EngineVersionInformation badge and no Problems panel. Backwards
compatible with B41 format; ProjectZomboidServerLogTest now runs
parameterised against both shapes via #[DataProvider].
2026-05-06 13:33:43 +00:00
0d18cfbfc6 fix: relax DebugServerPattern::LINE for PZ B42 log format
PZ build 42.x dropped the per-line `t:` (microsecond) field and
tightened the spacing between `f:N`, `t:N`, and `st:N,N,N,N>` markers.
The hardcoded `f:\d+,\s+t:\d+,\s+st:` requirement caused every B42
line to fail the parser's LINE regex, leaving ServerLog entries
without their level/prefix and silently disabling
ServerExceptionProblem and ModMissingProblem (the anchorless
EngineVersionInformation still fired against the joined entry text,
which is why the symptom was "one Information, no Problems").

Make `t:N,` optional via `(?:,\s+t:\d+)?` and the comma between
`f:N` and `st:` optional via `,?`. The B41 format remains a strict
match. Add `debug-server-42x-minimal.txt` mirroring the existing
synthetic fixture in the new format, and parameterise
ProjectZomboidServerLogTest with a #[DataProvider] so all four
parser-shape assertions now run against both formats. Spot-check:
analysers emit 3 Problems (2 exceptions, 1 missing mod) and 4
Information entries against the new fixture, identical to B41.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 13:33:35 +00:00
45a5e1a3da Merge branch 'feature/error-context-analyser'
Some checks failed
Tests / Run tests on PHP v8.4 (push) Failing after 1s
Tests / Run tests on PHP v8.5 (push) Failing after 1s
Adds a generic ErrorContextAnalyser under
src/Analyser/ProjectZomboid/ that walks Entry[] and emits one
ErrorContextProblem per ERROR or WARNING entry with up to 20
entries of before/after context. Overlapping windows clip so no
Entry appears in two context arrays; emission caps at 500 hits
with an ErrorContextTruncatedInformation note when reached.
2026-05-04 16:31:56 +00:00
6978175dff chore: track pre-production Qwen analyser and redactor wrapper
pz_redact_all.sh is the one-shot Docker wrapper that runs the PHP
ProjectZomboidRedactor over .scratch/pz/Logs/ and produces the
gitignored .scratch/pz/Logs.redacted/ directory consumed by both
the Qwen analyser and the deterministic classifier.

pz_error_analysis.py is the developer-facing Qwen-backed log
analyser: walks the redacted directory, dedupes signatures, and
asks the local sam-desktop Qwen endpoint to classify each unique
shape into a fixed taxonomy with title / cause / fix / confidence.
Runtime depends on the Qwen endpoint; the deterministic classifier
at pz_classify.py is the production-bound counterpart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 16:31:23 +00:00
3df6836909 feat: redact IPv4 and IPv6 addresses from PZ log content
Adds a fourth pass to ProjectZomboidRedactor that scrubs IPv4
(strict 0-255 octets, optional :port suffix) and IPv6 (full,
abbreviated, bracketed-with-port, IPv4-mapped) addresses, replacing
them with the literal [REDACTED_IP]. The new pass runs first
because it is pattern-disjoint from the Steam-ID -> name -> coords
chain. A single redactIpAddresses(bool) toggle controls both
families; the existing toggles are unchanged. Strict regexes plus
filter_var() validation prevent false positives on PZ timestamps
(12:00:00.000) and PHP/Java scope ops (Foo::bar). 20 new tests
cover bare/with-port/multiple/loopback/boundary IPv4, full /
abbreviated / bracketed / IPv4-mapped IPv6, scope-op rejection,
timestamp rejection, Steam-ID non-collision, toggle-off, and
idempotence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 16:31:10 +00:00
b6949ff0c3 docs: add downstream-consumers, release flow, feature-branch conventions
Captures iblogs as primary codex consumer with the call-site checklist
for cross-repo public-API changes; spells out the semver / changelog
cadence; documents the <feature>-bootstrap branch + --no-ff merge
pattern set by the redactor and iblogs-bootstrap branches; pins the
specs/plans path convention from the superpowers skills.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 16:30:50 +00:00
f1d2831d92 chore: gitignore Python bytecode caches and editor backup files
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 16:30:14 +00:00
bb4ee0d16a Merge branch 'pz-classifier'
Adds a deterministic-only Project Zomboid log classifier under
tools/pz-analyzer/, parallel to the existing Qwen-based research
tool. pz_parser.py is a pure module (parsing, attribution, file:line,
cause-chain, kind detection, two-level signatures); pz_classify.py
walks the redacted DebugLog-server directory, merges cross-file by
signature, and writes the spec-shaped JSON. 32 unit tests.
2026-05-04 15:58:20 +00:00
58d0ef187b chore: declare SEVERITY_LEVELS in pz_parser.__all__
Constant was already imported by pz_classify; this just formalises
it as part of the public surface so __all__ matches actual usage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:55:02 +00:00
9cd898bc9f fix: route parent-directory creation through the JSON write try/except
Was leaking unhandled OSError tracebacks when the output's parent
path could not be created. Exit code stays 1; user-facing message
matches the existing write-failure path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:50:52 +00:00
87a0562bd6 feat: deterministic PZ log classifier orchestrator
Walks DebugLog-server*.txt under the redacted directory, runs the
parser per file, merges cross-file by signature, and emits the
spec-shaped JSON report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:43:15 +00:00
fdf70a0c06 docs: align lookback test purpose and spec normalization list
Honest test docstring (old/new semantics equivalent on contiguous
entries; test locks post-fix behavior against future regressions),
and add severity-prefix strip to the spec's normalization list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:39:44 +00:00
2e7bebc911 fix: address code review findings on pz_parser
- Strip body-prefix severity in normalize_first_line so pattern_id
  is stable across body-prefix vs bracketed-only variants.
- Lookback for inferred attribution now counts raw file lines
  (per spec literal), not body-line budget across entries.
- Document hash truncation (64-bit) and direct-attribution priority.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:33:56 +00:00
4fec3a58f6 feat: deterministic PZ log parser module + unit tests
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:18:41 +00:00
511583035b docs: add design spec for deterministic PZ log classifier
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:02:28 +00:00
e1a7785cf4 feat: add ErrorContextAnalyser for sliding-window error/warning surfacing
Walks Entry[] once and emits one ErrorContextProblem per ERROR or
WARNING entry, attaching up to 20 entries before and 20 after as
context. Overlapping windows clip the second hit's before- and
after-ranges so no Entry appears in two context arrays. Caps emission
at 500 hits and adds an ErrorContextTruncatedInformation when reached.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 11:29:52 +00:00
38 changed files with 3318 additions and 18 deletions

7
.gitignore vendored

@@ -5,3 +5,10 @@ Logs.zip
.scratch/
.claude/
.claude.local.md
# Python bytecode caches from tools/pz-analyzer/.
__pycache__/
# Editor / manual backup files.
*.bak
*.bak-*

CHANGELOG.md

@@ -6,6 +6,34 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and
## [Unreleased]
## [0.3.0] — 2026-05-04
Adds IP-address redaction to the PZ redactor, a new `ErrorContextAnalyser` for surrounding-context surfacing, the `tools/pz-analyzer/` Python toolset (pre-production Qwen-driven research analyser and production-bound deterministic classifier), and a parser fix for the PZ B42 log shape that was silently breaking level/prefix attribution since The Indie Stone dropped the per-line `t:` field. New public API surface across the redactor and the analyser-side classes makes this a minor bump rather than a patch.
### Added
- **IP redaction in `ProjectZomboidRedactor`** (`src/Util/ProjectZomboid/ProjectZomboidRedactor.php`) — fourth pass that scrubs IPv4 (strict 0-255 octets, optional `:port` suffix) and IPv6 (full, abbreviated, bracketed-with-port, IPv4-mapped) addresses, replacing them with the literal `[REDACTED_IP]`. New public API: `IP_REPLACEMENT`, `IPV4_REGEX`, `IPV6_REGEX` constants and a `redactIpAddresses(bool)` toggle (defaults on, mirroring the existing three category toggles). Pattern-disjoint from the Steam-ID → name → coordinates chain; runs first by convention. Strict regexes plus `filter_var()` validation prevent false positives on PZ timestamps and PHP / Java scope ops. 20 new unit tests across two files (`ProjectZomboidRedactorIpv4Test.php`, `ProjectZomboidRedactorIpv6Test.php`).
- **`ErrorContextAnalyser`** (`src/Analyser/ProjectZomboid/ErrorContextAnalyser.php`) — generic-purpose analyser that walks `Entry[]` once and emits one `ErrorContextProblem` per ERROR / WARNING entry with up to `CONTEXT_BEFORE` (20) entries of leading context and `CONTEXT_AFTER` (20) entries of trailing context. Overlapping windows clip to `lastEmittedIndex + 1` so no Entry appears in two context arrays; emission caps at `HIT_CAP` (500) with a single `ErrorContextTruncatedInformation` appended when reached. Standalone — not auto-registered to any existing Log subclass's `getDefaultAnalyser()`; consumers wire it in explicitly. Companion classes `ErrorContextProblem` and `ErrorContextTruncatedInformation` under `src/Analysis/ProjectZomboid/`. 3 unit tests, 134 assertions.
- **`tools/pz-analyzer/`** — Python toolset adjacent to the library (not part of the Composer package's autoload surface). `pz_redact_all.sh` is a one-shot Docker wrapper that runs the PHP redactor over `.scratch/pz/Logs/` and produces a gitignored `.scratch/pz/Logs.redacted/` directory. `pz_error_analysis.py` is a developer-facing Qwen-backed pre-production analyser that calls a local OpenAI-compatible endpoint to classify residual log shapes the deterministic side hasn't yet captured. `pz_parser.py` + `pz_classify.py` are the production-bound deterministic-only counterpart: pure parser module with mod attribution, file:line extraction, cause-chain unwinding, engine-noise tagging, and a two-level signature scheme (`pattern_id` + `signature`), plus a stdlib-only orchestrator that walks the redacted directory and emits a JSON report. 32 Python unit tests across three files, 16 synthetic fixtures.
- `docs/superpowers/specs/2026-05-04-pz-deterministic-classifier-design.md` — design contract for `pz_parser.py` / `pz_classify.py`. The PHP-side `ErrorContextAnalyser` ships without a separate spec; its design fell out of a brainstorming session inline with the pzmm-pattern-port discussion.
- New synthetic fixture `test/src/Games/ProjectZomboid/fixtures/debug-server-42x-minimal.txt` mirroring the existing B41 fixture in PZ B42 line shape.
### Changed
- **`DebugServerPattern::LINE` regex relaxed** to handle PZ build 42.x. The Indie Stone dropped the per-line `t:` (microsecond) field and tightened the spacing between `f:N`, `t:N`, and `st:N,N,N,N>` markers somewhere on the way to build 42.17. The previous regex required the full `f:\d+,\s+t:\d+,\s+st:` triplet and silently failed on every B42 line. Now `(?:,\s+t:\d+)?` makes the `t:N,` field optional and `,?` makes the inter-field comma optional. Backwards-compatible — every B41 line continues to parse identically. `ProjectZomboidServerLogTest` now runs each parser-shape assertion via `#[DataProvider]` against both fixtures.
- **Pass order in `ProjectZomboidRedactor::redact()`**: the new IP pass runs first, so the chain is now `IP → Steam ID → player name → coordinates`. The mandatory Steam ID → name → coordinates ordering is preserved; placement of the IP pass is by convention since its regexes are pattern-disjoint from the rest.
- **`CLAUDE.md`** documents `iblogs` as the primary downstream consumer with a per-component checklist for cross-repo public API impact; the release-flow cadence; the feature-branch workflow set by the `redactor` and `iblogs-bootstrap` precedents; and the `docs/superpowers/specs|plans/` path convention.
- **`.gitignore`** excludes `__pycache__/` (Python bytecode caches generated under `tools/pz-analyzer/`) and `*.bak` / `*.bak-*` (editor / manual backup files).
### Fixed
- PZ build 42.x server logs now parse with proper level / prefix attribution. Previously, every B42 line failed `DebugServerPattern::LINE` and the resulting ServerLog entries fell through as level `INFO` with no prefix. This silently disabled `ServerExceptionProblem` and `ModMissingProblem` (their regexes anchor on `[timestamp]...` at entry start, which a level-less orphan entry doesn't emit). The anchorless `EngineVersionInformation` continued to fire against the joined entry text, producing the user-visible symptom "one Information badge, empty Problems panel" on B42 logs. The fix restores per-line parsing, re-enables both Problem classes, and makes the error-count badge populate correctly.
### Test counts
- PHP suite: **287 tests, 654 assertions** (up from 260 / 492 at v0.2.0).
- Python suite under `tools/pz-analyzer/`: **32 tests** (stdlib `unittest`, sub-10 ms).
## [0.2.0] — 2026-05-01
Render-time PII redaction utility added on the same calendar day as v0.1.0. Cut as a minor version bump rather than a patch because it adds a new public API surface (`RedactorInterface` plus the per-game implementation), which under semver is a minor change, not a patch. Consumers (notably iblogs) pin to `^0.2.0` to opt into the redactor-aware version.
@@ -51,5 +79,6 @@ First public release. Codex is a generic PHP log parsing and analysis framework
- **Other game implementations** — `Minecraft`, `Hytale`, and `SevenDaysToDie` are detective-stub-only. Each has a TODO `<Game>Detective` extending base `Detective`; their per-component subdirectories under `Analyser`, `Log`, `Parser`, and `Pattern` contain only `.gitkeep` placeholders. Real implementations land if and when fixtures and demand exist.
- **Packagist publication** — v0.1.0 is consumable via Composer's `vcs` repository entry pointing at the Gitea remote. Pushing to Packagist is a separate decision and is not in scope for this release.
[0.3.0]: https://git.indifferentketchup.com/indifferentketchup/ik-codex/releases/tag/v0.3.0
[0.2.0]: https://git.indifferentketchup.com/indifferentketchup/ik-codex/releases/tag/v0.2.0
[0.1.0]: https://git.indifferentketchup.com/indifferentketchup/ik-codex/releases/tag/v0.1.0

CLAUDE.md

@@ -84,6 +84,10 @@ Scaffolded games: `Minecraft`, `Hytale`, `SevenDaysToDie` (stubs only — empty
At minimum: (1) entry count after `parse()` matches the synthetic fixture's line count, (2) one or more named-group `FIELDS` regexes from the `<Type>Pattern` class extract correctly from a representative line, (3) `Detective` handed the fixture path returns an instance of this Log class. Use `#[DataProvider]` when the same shape repeats per file.
### Downstream consumers
`iblogs` (sibling repo at `/opt/iblogs`, package `indifferentketchup/iblogs`, fork of `aternosorg/mclogs`) is the primary consumer of codex via a Composer `vcs` repository entry pinned to the latest minor tag. Public-API changes in `src/{Detective,Log,Printer,Util}/*.php` and `src/Analysis/*.php` propagate there; when modifying those types, sanity-check the iblogs call sites at `/opt/iblogs/src/{Detective.php,Log.php,Printer/Printer.php,Printer/FormatModification.php,Api/Response/CodexLogResponse.php}` and the stub class at `/opt/iblogs/src/Data/Deobfuscator.php`.
## Pitfalls
1. **`PatternParser` is incompatible with named regex groups.** PHP's `preg_match` returns named groups *plus* their numeric duplicates in the same array; `PatternParser`'s foreach iterates both and throws on the string-key entries. Convention: `LINE` regexes (used by the parser) use **unnamed** groups with field order documented in the Pattern class's docblock. Named groups are fine inside extractor regexes invoked from analysers, since `PatternAnalyser` hands the whole match array to `Insight::setMatches`.
@@ -97,6 +101,9 @@ At minimum: (1) entry count after `parse()` matches the synthetic fixture's line
- **One commit per concrete log type** when adding game support: pattern class + log subclass + synthetic fixture + test in a single commit, run `composer test`, then move on. `<Game>Detective::__construct()` wiring goes in its own follow-up commit once all log types are present.
- **Out-of-scope cleanup goes in its own commit.** Tempting workflow/lint fixes (e.g. deprecated CI syntax, comment hygiene) noticed mid-feature should not be folded in — separate commit or follow-up PR.
- **Pre-destructive checkpoint pattern.** Before bulk renames/moves: `git commit --allow-empty -m "pre-X checkpoint"` as a revert anchor. Skip the empty slot if it produces no diff at the end of a plan.
- **Release flow.** Semver: a new public API surface bumps the minor version, not the patch (`v0.1.x → v0.2.x`). Cut: rename `[Unreleased]` to `[X.Y.Z] — YYYY-MM-DD` in `CHANGELOG.md`, add a `[X.Y.Z]:` link reference at the bottom, fresh empty `[Unreleased]` above; lightweight `backup/pre-vX.Y.Z` tag (local only) before annotated `git tag -a vX.Y.Z`; push the annotated tag only.
- **Feature branches.** Substantive feature work lands on a `<feature>-bootstrap`-style branch off master with a `backup/pre-<feature>` lightweight tag at the branch start, merged `--no-ff` after user review. The `redactor` and `iblogs-bootstrap` branches set the precedent.
- **Specs and plans live at** `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md` and `docs/superpowers/plans/YYYY-MM-DD-<topic>.md` per the brainstorming and writing-plans skill conventions.
## Privacy / fixture rules

docs/superpowers/specs/2026-05-04-pz-deterministic-classifier-design.md

@@ -0,0 +1,246 @@
# PZ deterministic classifier — design spec
> Drafted 2026-05-04. Status: design-approved, awaiting implementation plan.
> Sibling tool to the existing pre-production Qwen analyzer (`pz_error_analysis.py`), which is unaffected by this work.
## Summary
A new deterministic-only Project Zomboid log classifier that lives alongside the existing Qwen-based analyzer in `tools/pz-analyzer/`. Walks redacted `DebugLog-server*.txt` files, extracts errors/warnings, attributes each to a mod where evidence allows, classifies by kind, and emits a structured JSON report. **Zero AI dependency**: this is the artefact that informs the future PHP / iblogs production path.
The patterns it implements are inspired by `paraxaQQ/pzmm`'s `core/inspector.py`: Lua mod-marker attribution, multi-fallback file:line extraction, bidirectional stack collection, cause-chain unwinding, and engine-noise tagging. They are reimplemented from scratch; no code is copied verbatim.
## Why a separate tool, not an edit of `pz_error_analysis.py`
Two artefacts, two purposes:
- `pz_error_analysis.py` (existing, untouched) — pre-production discovery tool. Sends residual log content to Qwen so the developer can see what categories the deterministic side hasn't yet captured.
- `pz_classify.py` (new) — production-bound deterministic classifier. Output is what an iblogs PHP port would eventually emit. Runs in seconds, no API dependency, no PII-going-to-LLM consideration.
Keeping both tools side by side lets the developer compare outputs and treat the LLM's residual output as the "deterministic to-do list."
## Scope
**In scope:**
- Two new files: `tools/pz-analyzer/pz_parser.py` (pure module) and `tools/pz-analyzer/pz_classify.py` (CLI orchestrator).
- Tests under `tools/pz-analyzer/tests/` with synthetic fixtures.
- Operates exclusively on the already-redacted directory produced by `pz_redact_all.sh` (`.scratch/pz/Logs.redacted/`).
**Out of scope:**
- Any modification to `pz_error_analysis.py`, `pz_redact_all.sh`, or PHP codex source.
- Filesystem-based mod-scan reattribution (pzmm's symbol-index, vehicle-index, file-path-ownership reattribution requires an actual mod folder we don't have on the server side).
- iblogs / bosslogs integration. The output schema is designed with that future port in mind, but no PHP code is written here.
- Generic AI tab patterns from pzmm's `core/ai.py`. Explicitly excluded.
## Architecture
```
     redacted .txt files
              |
              v
+---------------------------+
| pz_classify.py            |  argparse · directory walk · aggregate · JSON write
| (orchestrator)            |
+-------------+-------------+
              |
              v
+---------------------------+
| pz_parser.py              |  regexes · parse · classify · sign
| (pure module, no I/O      |
|  beyond reading the path  |
|  it is handed)            |
+---------------------------+
```
Two files inside `tools/pz-analyzer/`:
- **`pz_parser.py`** — stateless. All regex constants, `parse_file(path) -> list[Entry]`, attribution helpers, file:line extractors, cause-chain extractor, signature computation. No `argparse`, no JSON writing, no directory walking. Unit-testable in isolation.
- **`pz_classify.py`** — entry point. CLI args, walks the redacted directory, calls `pz_parser`, aggregates records by signature, writes JSON, prints a one-line stats summary.
The split is deliberate: `pz_parser.py` is the module that eventually wants to be ported to PHP codex (separate spec). Keeping it pure makes that port mechanical and Python-side tests trivial.
## Parser pipeline phases
For each `*DebugLog-server*.txt`, the parser walks lines once and emits records via the following phases.
### 1. Severity-prefix recognition
Regex: `^\s*(ERROR|SEVERE|WARN)\s*[:\s]`. Broader than the existing `pz_error_analysis.py` regex — adds `SEVERE` (Java util-logging convention; appears in some PZ Java exception blocks). `LOG`/`INFO` is ignored at this layer.
### 2. Stack collection — bidirectional
Pzmm's contribution: PZ emits stack frames *before* the ERROR/WARN line as often as after.
- **Pre-stack**: walk up to 25 lines back from the severity line. Stop at another severity line or 8 collected. Only keep the block if at least one line looks stack-shaped (`at `, `[string ...]`, `function:`, `file:`, `.lua` markers).
- **Post-stack**: walk forward up to 25 lines, gated by engine-noise detection. Stop at another severity line or 8 collected.
- Merge deduped, preserving order; cap at 8 frames per record.
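A minimal sketch of the bidirectional walk described above, not the shipped `pz_parser.py` code. The function names, the substring-based stack-shape check, and the reading of "gated by engine-noise detection" as "skip noise lines" are all assumptions made for illustration.

```python
import re

SEVERITY_RE = re.compile(r"^\s*(ERROR|SEVERE|WARN)\s*[:\s]")
STACK_MARKERS = ("at ", "[string ", "function:", "file:", ".lua")
NOISE_MARKERS = ("kahluathread.flusherrormessage", "dumping lua stack trace")
WINDOW = 25      # lines walked in each direction
MAX_FRAMES = 8   # cap per direction and per merged record


def looks_stack_shaped(line: str) -> bool:
    # Loose substring check (assumption); a real parser may anchor these markers.
    return any(marker in line for marker in STACK_MARKERS)


def collect_stack(lines: list[str], hit: int) -> list[str]:
    """Collect frames around lines[hit], which is assumed to be a severity line."""
    pre: list[str] = []
    for i in range(hit - 1, max(hit - WINDOW, 0) - 1, -1):
        if SEVERITY_RE.match(lines[i]) or len(pre) == MAX_FRAMES:
            break
        if lines[i].strip():
            pre.append(lines[i].strip())
    pre.reverse()  # restore file order
    if not any(looks_stack_shaped(frame) for frame in pre):
        pre = []   # nothing stack-shaped: discard the whole pre-block

    post: list[str] = []
    for i in range(hit + 1, min(hit + WINDOW, len(lines) - 1) + 1):
        if SEVERITY_RE.match(lines[i]) or len(post) == MAX_FRAMES:
            break
        text = lines[i].strip()
        # Assumption: engine-noise lines are skipped rather than ending the walk.
        if text and not any(noise in text.lower() for noise in NOISE_MARKERS):
            post.append(text)

    # dict.fromkeys gives an order-preserving dedup before the final cap.
    return list(dict.fromkeys(pre + post))[:MAX_FRAMES]
```

Walking backwards and then reversing keeps the merged frames in file order, so the pre-stack reads naturally ahead of the post-stack in the emitted record.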
### 3. Mod attribution — three buckets
| Bucket | Trigger | Confidence |
|---|---|---|
| `direct` | Line itself matches `Lua\(\(MOD:([^)]+)\)\)` (or the `require("X") failed` shape, or an explicit `needed by <mod>` hint elsewhere in the entry) | `high` |
| `inferred` | No marker on this line, but body is Lua-shaped (see below) *and* a `Lua((MOD:Y))` was emitted within the previous 40 lines | `medium` |
| `unattributed` | Neither of the above | `low`; `mod_id = "__unattributed__"` |
"Lua-shaped" means the body matches at least one of (case-insensitive): `luamanager.getfunctionobject`, `no such function`, `exception thrown`, `runtimeexception`, `illegalstateexception`, or contains the bare token `lua`. This filter prevents inferred attribution from latching onto unrelated severity lines that happened to fall within the lookback window.
`mod_id` derives from the marker's raw name with a `_norm_mod_key` transform: lowercase, strip spaces / apostrophes / hyphens. `mod_name` preserves the human-readable form.
We do **not** attempt pzmm's filesystem-based reattribution.
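The three buckets can be sketched as follows. This is an illustrative reading of the table, not the module's actual code: it covers only the `Lua((MOD:X))` marker (the `require("X") failed` and `needed by <mod>` shapes are omitted), it assumes the nearest preceding marker wins, and it interprets "bare token `lua`" as a word-boundary match. `norm_mod_key`, `is_lua_shaped`, and `attribute` are hypothetical names.

```python
import re

MOD_MARKER_RE = re.compile(r"Lua\(\(MOD:([^)]+)\)\)")
LUA_SHAPED = (
    "luamanager.getfunctionobject", "no such function", "exception thrown",
    "runtimeexception", "illegalstateexception",
)
LOOKBACK = 40  # raw file lines searched for a preceding marker
UNATTRIBUTED = "__unattributed__"


def norm_mod_key(name: str) -> str:
    """Lowercase, then strip spaces, apostrophes, and hyphens."""
    return re.sub(r"[ '\-]", "", name.lower())


def is_lua_shaped(body: str) -> bool:
    lowered = body.lower()
    return any(s in lowered for s in LUA_SHAPED) or bool(re.search(r"\blua\b", lowered))


def attribute(lines: list[str], hit: int) -> tuple[str, str, str]:
    """Return (mod_id, attribution, confidence) for the severity line lines[hit]."""
    direct = MOD_MARKER_RE.search(lines[hit])
    if direct:
        return norm_mod_key(direct.group(1)), "direct", "high"
    if is_lua_shaped(lines[hit]):
        # Assumption: the nearest marker within the lookback window wins.
        for i in range(hit - 1, max(hit - LOOKBACK, 0) - 1, -1):
            marker = MOD_MARKER_RE.search(lines[i])
            if marker:
                return norm_mod_key(marker.group(1)), "inferred", "medium"
    return UNATTRIBUTED, "unattributed", "low"
```

Checking the Lua-shape filter before the lookback is what keeps an unrelated severity line (for example a disk or network error) from inheriting the last mod that happened to log.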
### 4. File:line extraction — five fallbacks
Tried in order against the entry body and stack frames:
1. `at <path>.lua:<n>`
2. `function: ... file: <path>.lua line #<n>` (or `: <n>`)
3. `[string "<path>.lua"]:<n>`
4. quoted path ending in `.lua` / `.txt` / `.xml` / `.json` / `.ini` / `.cfg` / `.bin`
5. unquoted path segment beginning with `media/`, `maps/`, `lua/`, `scripts/`
Returns `(file, line)`; `line=0` if the matched form had no line number.
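One way to express the five fallbacks is an ordered list of compiled patterns. The regexes below are guesses at the shapes listed above rather than the ones in `pz_parser.py`, and the choice to exhaust each fallback across all texts before trying the next is an assumption; `extract_file_line` is a hypothetical name.

```python
import re

FALLBACKS = [
    # 1. at <path>.lua:<n>
    re.compile(r"\bat\s+(?P<file>\S+\.lua):(?P<line>\d+)"),
    # 2. function: ... file: <path>.lua line #<n>  (or ": <n>")
    re.compile(r"function:.*?file:\s*(?P<file>\S+\.lua)\s*(?:line\s*#|:)\s*(?P<line>\d+)"),
    # 3. [string "<path>.lua"]:<n>
    re.compile(r'\[string\s+"(?P<file>[^"]+\.lua)"\]:(?P<line>\d+)'),
    # 4. quoted path with a known extension (no line number)
    re.compile(r"[\"'](?P<file>[^\"']+\.(?:lua|txt|xml|json|ini|cfg|bin))[\"']"),
    # 5. unquoted path segment under a known root (no line number)
    re.compile(r"(?P<file>\b(?:media|maps|lua|scripts)/\S+)"),
]


def extract_file_line(texts: list[str]) -> tuple[str, int]:
    """texts is the entry body followed by its stack frames."""
    for pattern in FALLBACKS:      # fallback order takes priority over text order
        for text in texts:
            match = pattern.search(text)
            if match:
                line = match.groupdict().get("line")
                return match.group("file"), int(line) if line else 0
    return "", 0
```

Patterns 4 and 5 carry no `line` group, which is how the `line=0` case falls out without a special branch.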
### 5. Cause-chain extraction
`Caused by: <X>` chains plus standalone exception lines (`(\w+\.)+\w+(Exception|Error): <msg>`) are normalised to `<ExceptionClass>: <msg>` tokens and joined with ` -> `. Up to 6 chain levels, deduped. Captures both Java exception nesting and Lua-wrapped exception chains.
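A small sketch of the chain builder, under two assumptions that the text above does not settle: `<ExceptionClass>` is taken to mean the short class name (last dotted segment), and `Caused by:` lines are handled by the same exception regex as standalone lines. `cause_chain` is a hypothetical name.

```python
import re

# Same shape as the spec's exception regex, with the class and message captured.
EXCEPTION_RE = re.compile(r"((?:\w+\.)+\w+(?:Exception|Error)):\s*(.*)")
MAX_LEVELS = 6


def cause_chain(lines: list[str]) -> str:
    tokens: list[str] = []
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if not match:
            continue
        short_name = match.group(1).rsplit(".", 1)[-1]  # assumption: drop the package
        token = f"{short_name}: {match.group(2).strip()}"
        if token not in tokens:                          # dedup, keeping first-seen order
            tokens.append(token)
        if len(tokens) == MAX_LEVELS:
            break
    return " -> ".join(tokens)
```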
### 6. Java exception kind detection
DebugLog-server has both Lua and Java exceptions; pzmm targets `console.txt` which is Lua-dominant. Extension here:
- `kind = "java_exception"` when the entry body or stack contains `(\w+\.)+\w+(Exception|Error)` AND no `Lua((MOD:X))` marker is present anywhere in the entry.
- These typically resolve to `mod_id: __unattributed__` because Java code in PZ is engine, not mod. The exception class name becomes part of the message skeleton so similar Java exceptions dedup tightly.
### 7. Engine-noise tagging
`kind = "engine_noise"` when the body contains `kahluathread.flusherrormessage` or `dumping lua stack trace`. These severity-ERROR lines are PZ's own diagnostic chatter about its error reporting, not actual errors. They stay in the output (consumer can filter on `kind`).
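The two `kind` rules from sections 6 and 7 can be sketched together. The spec does not state which rule wins when both apply, nor how the remaining kinds (`lua_runtime`, `require_failed`, `runtime`) are chosen, so this sketch checks engine noise first as an assumption and returns `None` for everything it does not decide. `detect_kind` is a hypothetical name.

```python
import re
from typing import Optional

JAVA_EXCEPTION_RE = re.compile(r"(?:\w+\.)+\w+(?:Exception|Error)")
MOD_MARKER_RE = re.compile(r"Lua\(\(MOD:[^)]+\)\)")
NOISE_MARKERS = ("kahluathread.flusherrormessage", "dumping lua stack trace")


def detect_kind(body: str, stack: list[str]) -> Optional[str]:
    # Engine noise is decided on the body alone, case-insensitively.
    if any(marker in body.lower() for marker in NOISE_MARKERS):
        return "engine_noise"
    # java_exception needs an exception class and no mod marker anywhere in the entry.
    entry_text = "\n".join([body, *stack])
    if JAVA_EXCEPTION_RE.search(entry_text) and not MOD_MARKER_RE.search(entry_text):
        return "java_exception"
    return None  # left to rules this sketch does not cover
```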
### 8. Signature computation
Two-level deterministic identity, both stored on every record:
```
pattern_id = sha256(level + normalized_first_line)[:16]
signature = sha256(pattern_id + mod_id)[:16]
```
Normalization for `pattern_id`:
- Strip session metadata prefix (`General f:N, t:N, st:N,N,N,N>` shape)
- Strip body-prefix severity token (`ERROR:` / `SEVERE:` / `WARN:` / `FATAL:`, case-insensitive) so a body that opens with the severity word still hashes the same as one that doesn't.
- Flatten double- and single-quoted strings to `"<S>"` / `'<S>'`
- Flatten ≥2-digit numeric runs to `<N>`
- Collapse whitespace
- Truncate to 200 chars
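The normalization steps and the two hashes might look like the sketch below. The session-prefix regex is a guess at the `General f:N, t:N, st:N,N,N,N>` shape, UTF-8 encoding and plain string concatenation are assumptions, and the function names are hypothetical. Sixteen hex characters correspond to the 64-bit truncation.

```python
import hashlib
import re

# Assumed shape of the session metadata prefix; the t: field is treated as optional.
SESSION_PREFIX_RE = re.compile(r"^\s*\w+\s+f:\d+(?:,\s*t:\d+)?,?\s*st:[\d,]+>\s*")
SEVERITY_TOKEN_RE = re.compile(r"^\s*(?:ERROR|SEVERE|WARN|FATAL)\s*:\s*", re.IGNORECASE)


def normalize_first_line(line: str) -> str:
    text = SESSION_PREFIX_RE.sub("", line)
    text = SEVERITY_TOKEN_RE.sub("", text)
    text = re.sub(r'"[^"]*"', '"<S>"', text)   # flatten double-quoted strings
    text = re.sub(r"'[^']*'", "'<S>'", text)   # flatten single-quoted strings
    text = re.sub(r"\d{2,}", "<N>", text)      # flatten runs of two or more digits
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text[:200]


def short_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def sign(level: str, first_line: str, mod_id: str) -> tuple[str, str]:
    """Return (pattern_id, signature)."""
    pattern_id = short_hash(level + normalize_first_line(first_line))
    return pattern_id, short_hash(pattern_id + mod_id)
```

Because `signature` is derived from `pattern_id`, two mods hitting the same error shape share a `pattern_id` but never a `signature`.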
The two fields support two consumer views, neither of which requires an LLM:
- **Per-mod view** (signature is the dedup key): one record per `(mod_id, error_shape)` pair.
- **Pattern fan-out view** (group records by `pattern_id`): see all mods that hit the same shape.
### 9. Aggregation
Records dedup on `signature`. On second-and-subsequent occurrences: `occurrence_count++`, `files` set-extends, attribution-confidence promotes (direct beats inferred beats unattributed), stack and `cause_chain` merge.
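A sketch of the merge step, assuming each parsed record is a plain dict. The `log_file` key and the `aggregate` name are hypothetical, and `cause_chain` merging is left out because the merge rule for it is not spelled out here.

```python
ATTRIBUTION_RANK = {"unattributed": 0, "inferred": 1, "direct": 2}
CONFIDENCE = {"unattributed": "low", "inferred": "medium", "direct": "high"}


def aggregate(records: list[dict]) -> dict[str, dict]:
    """Merge per-occurrence records into one aggregate per signature."""
    merged: dict[str, dict] = {}
    for rec in records:
        agg = merged.get(rec["signature"])
        if agg is None:
            merged[rec["signature"]] = {
                "signature": rec["signature"],
                "attribution": rec["attribution"],
                "confidence": CONFIDENCE[rec["attribution"]],
                "occurrence_count": 1,
                "files": {rec["log_file"]},
                "stack": list(rec["stack"]),
            }
            continue
        agg["occurrence_count"] += 1
        agg["files"].add(rec["log_file"])
        # Promote only upwards: direct beats inferred beats unattributed.
        if ATTRIBUTION_RANK[rec["attribution"]] > ATTRIBUTION_RANK[agg["attribution"]]:
            agg["attribution"] = rec["attribution"]
            agg["confidence"] = CONFIDENCE[rec["attribution"]]
        # Order-preserving union of stack frames.
        agg["stack"] = list(dict.fromkeys(agg["stack"] + rec["stack"]))
    return merged
```

Since `mod_id` is part of the signature, promotion here only ever upgrades the evidence level for the same mod; it never moves a record between mods.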
## Output schema
```json
{
  "meta": {
    "input_dir": "/opt/ik-codex/.scratch/pz/Logs.redacted",
    "files_scanned": 6,
    "log_lines_total": 78654,
    "error_lines_total": 30984,
    "unique_signatures": N,
    "unique_patterns": M,
    "redacted": true,
    "started": "ISO8601",
    "finished": "ISO8601"
  },
  "signatures": [
    {
      "signature": "sha256:...",
      "pattern_id": "sha256:...",
      "level": "ERROR",
      "kind": "lua_runtime|require_failed|java_exception|engine_noise|runtime",
      "mod_id": "spongies_clothing",
      "mod_name": "Spongie's Clothing",
      "attribution": "direct|inferred|unattributed",
      "confidence": "high|medium|low",
      "attribution_reason": "...",
      "file": "media/lua/client/X.lua",
      "line": 42,
      "cause_chain": "ExceptionA: msg -> ExceptionB: msg",
      "stack": ["at A.lua:12", "at B.lua:34"],
      "first_seen": {"file": "...", "line": 1234, "timestamp": "26-04-26 17:14:35.128"},
      "occurrence_count": 47,
      "files": ["..."],
      "excerpt": "..."
    }
  ],
  "summary": {
    "errors": N,
    "warnings": N,
    "by_kind": {"lua_runtime": ..., "java_exception": ..., "require_failed": ..., "engine_noise": ..., "runtime": ...},
    "by_attribution": {"direct": ..., "inferred": ..., "unattributed": ...},
    "by_confidence": {"high": ..., "medium": ..., "low": ...},
    "top_mods": [{"mod_id": "...", "mod_name": "...", "occurrence_count": N}, ...]
  }
}
```
Default output path: `/opt/ik-codex/.scratch/pz/classify.json` (gitignored under `.scratch/`).
## CLI
```
pz_classify.py [--input <dir>] [--out <path>] [--quiet]
```
- `--input` defaults to `<repo>/.scratch/pz/Logs.redacted`
- `--out` defaults to `<repo>/.scratch/pz/classify.json`
- `--quiet` suppresses the trailing summary line
No `--limit`, `--resume`, or `--checkpoint-every`. Runs in seconds; nothing to throttle or resume.
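The three flags map directly onto stdlib `argparse`. The sketch below is an assumption about how the orchestrator could be wired, not its actual source: `build_parser`, `write_report`, and the error message text are hypothetical. The write helper keeps parent-directory creation inside the same `try` as the JSON write so an `OSError` from either step yields exit code 1 instead of a traceback.

```python
import argparse
import json
import sys
from pathlib import Path


def build_parser(repo_root: Path) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pz_classify.py")
    parser.add_argument("--input", type=Path,
                        default=repo_root / ".scratch/pz/Logs.redacted",
                        help="directory of redacted DebugLog-server*.txt files")
    parser.add_argument("--out", type=Path,
                        default=repo_root / ".scratch/pz/classify.json",
                        help="path of the JSON report")
    parser.add_argument("--quiet", action="store_true",
                        help="suppress the trailing summary line")
    return parser


def write_report(report: dict, out: Path) -> int:
    """Write the report and return a process exit code."""
    try:
        # Directory creation and the write share one handler on purpose.
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(json.dumps(report, indent=2), encoding="utf-8")
    except OSError as exc:
        print(f"error: could not write {out}: {exc}", file=sys.stderr)
        return 1
    return 0
```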
## Tests
New directory `tools/pz-analyzer/tests/`. Stdlib `unittest`. Three files, ~18 tests total.
- **`test_parser.py`** (~10 tests) — one fixture per scenario in `tests/fixtures/` (synthetic, tracked in git): pure-Lua-attributed, pure-Java-exception, inferred-from-context, unattributed-engine-noise, multi-cause-chain, pre-stack-collection, post-stack-collection, severity-variants, file-line-extraction-fallbacks. All synthetic identifiers (placeholder Steam IDs / mod names) per the existing PHP-side `test/src/Games/ProjectZomboid/fixtures/` convention.
- **`test_attribution.py`** (~5 tests) — three confidence buckets, the 40-line lookback boundary, "needed by X" extraction, and the rejection of inferred attribution when the message isn't Lua-shaped.
- **`test_signatures.py`** (~3 tests) — `pattern_id` stability across formatting variations (whitespace, numeric values, quoted strings) and `signature` uniqueness across mods.
Invocation: `python -m unittest discover tools/pz-analyzer/tests/`. No external deps.
## Verification
End-to-end smoke against the redacted real-data directory:
```
bash /opt/ik-codex/tools/pz-analyzer/pz_redact_all.sh # one-time, already done
python /opt/ik-codex/tools/pz-analyzer/pz_classify.py
```
Expect:
- 6 files scanned, ~30,984 error lines processed.
- A meaningful number of unique signatures and patterns (likely in the low hundreds for signatures; fewer patterns).
- `top_mods` lists the highest-occurrence mods.
- PII audit: no real Steam IDs, IPs, or coordinates in the output JSON (input is already redacted; classifier doesn't introduce PII).
Test invocation: `python -m unittest discover tools/pz-analyzer/tests/` should be all-green.
## Risks and open questions
- **Inferred attribution accuracy.** The 40-line lookback is pzmm's heuristic; it's correct for tightly-paced server bursts but can mis-attribute when an unrelated mod logs in the gap. Inferred records surface as `confidence: medium` so consumers can choose to treat them differently. Acceptable for v1; tunable via a constant in `pz_parser.py`.
- **Pzmm targets `console.txt`, we target `DebugLog-server.txt`.** Format overlap is high (both share `Lua((MOD:X))` markers, Caused-by chains, Java exception shapes), but some patterns may be `console.txt`-specific. Tests use `DebugLog-server`-shaped fixtures only.
- **Future PHP port.** `pz_parser.py` is structured for mechanical translation to a `LuaErrorAnalyser` / `ModAttributionAnalyser` pair under `src/Analyser/ProjectZomboid/` in a separate spec. Output schema chosen to be PHP-codex-compatible (Insight subclasses with typed fields).
- **Licence.** The `paraxaQQ/pzmm` zip we reviewed has no top-level LICENSE, so this spec mandates reimplementing the patterns from scratch rather than copying code. Regex shapes and heuristics are general programming patterns rather than author-specific expression; even so, no code blocks are lifted verbatim.
## Out of scope (explicit)
- Editing `pz_error_analysis.py` or `pz_redact_all.sh`.
- Modifying any file in `/opt/ik-codex/src/`, `/opt/ik-codex/test/`, or `/opt/iblogs/`.
- AI / LLM integration of any kind in the new tool.
- LLM inference at runtime in iblogs / bosslogs production. The Qwen analyzer (`pz_error_analysis.py`) is a developer-only discovery tool used to expand the deterministic ruleset in `pz_parser.py` (and its future PHP port). Production rendering is deterministic-only, forever.
- iblogs front-end rendering of the classification output.
- Filesystem mod-scan reattribution (pzmm's symbol/vehicle indexes).


@@ -0,0 +1,131 @@
<?php
namespace IndifferentKetchup\Codex\Analyser\ProjectZomboid;
use IndifferentKetchup\Codex\Analyser\Analyser;
use IndifferentKetchup\Codex\Analysis\Analysis;
use IndifferentKetchup\Codex\Analysis\AnalysisInterface;
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextProblem;
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextTruncatedInformation;
use IndifferentKetchup\Codex\Log\EntryInterface;
use IndifferentKetchup\Codex\Log\Level;
/**
* Surfaces ERROR or WARNING entries with a sliding context window of
* surrounding entries, so a viewer can see the lead-up and aftermath of
* each event without scanning the full log. PatternAnalyser cannot
* express this because windows span multiple entries; this walks once,
* classifies by Level (already resolved by the parser), and emits one
* ErrorContextProblem per hit.
*
* Stack-trace continuation lines are absorbed into the same Entry as the
* level header that preceded them by PatternParser, so noise filtering
* happens at parse time — windows here count Entries, not raw lines, and
* a stack-trace ERROR contributes exactly one window.
*
* Overlapping windows are clipped rather than duplicated: when two
* error/warning entries fall within CONTEXT_BEFORE + CONTEXT_AFTER of
* each other, the later window's before- and after-ranges are clipped
* to start past the previously emitted range so no Entry appears in two
* context arrays. The hit cap is checked before each emission: the
* first hit found after HIT_CAP problems have been emitted adds an
* ErrorContextTruncatedInformation to the analysis and ends the scan.
*/
class ErrorContextAnalyser extends Analyser
{
/**
* Number of entries preceding a hit captured as leading context.
* Twenty entries is wide enough to surface the immediate precursor
* events (mod load, player join, prior warning) for a server-log
* error without dragging in unrelated activity from minutes earlier.
*/
public const int CONTEXT_BEFORE = 20;
/**
* Number of entries following a hit captured as trailing context.
* Mirrors CONTEXT_BEFORE so windows are symmetric and the maximum
* window size is CONTEXT_BEFORE + 1 (hit) + CONTEXT_AFTER = 41
* entries.
*/
public const int CONTEXT_AFTER = 20;
/**
* Maximum number of hits emitted before truncation. Caps memory and
* output size on logs with cascading errors (e.g. a save-system
* failure that produces an error every tick). A further hit beyond
* the cap adds an ErrorContextTruncatedInformation to the analysis so
* consumers can flag truncation rather than silently dropping later
* hits; a log with exactly HIT_CAP hits is not flagged.
*/
public const int HIT_CAP = 500;
public function analyse(): AnalysisInterface
{
$analysis = new Analysis();
$analysis->setLog($this->log);
$entries = [];
foreach ($this->log as $entry) {
$entries[] = $entry;
}
$count = count($entries);
$hits = 0;
$truncated = false;
$lastEmittedIndex = -1;
for ($i = 0; $i < $count; $i++) {
$type = $this->classify($entries[$i]);
if ($type === null) {
continue;
}
if ($hits >= self::HIT_CAP) {
$truncated = true;
break;
}
$beforeStart = max($lastEmittedIndex + 1, $i - self::CONTEXT_BEFORE);
if ($beforeStart > $i) {
$beforeStart = $i;
}
$afterStart = max($lastEmittedIndex + 1, $i + 1);
$afterEnd = min($count - 1, $i + self::CONTEXT_AFTER);
$afterLength = max(0, $afterEnd - $afterStart + 1);
$analysis->addInsight((new ErrorContextProblem())
->setEntry($entries[$i])
->setType($type)
->setEntryIndex($i + 1)
->setBefore(array_slice($entries, $beforeStart, $i - $beforeStart))
->setAfter(array_slice($entries, $afterStart, $afterLength)));
$hits++;
$lastEmittedIndex = max($lastEmittedIndex, $afterEnd);
}
if ($truncated) {
$analysis->addInsight((new ErrorContextTruncatedInformation())
->setHitCap(self::HIT_CAP));
}
return $analysis;
}
/**
* Classify an entry as 'error', 'warning', or null based on its Level.
* Levels numerically at or below ERROR, i.e. at least as severe
* (EMERGENCY/ALERT/CRITICAL/ERROR), collapse into 'error'; WARNING
* alone maps to 'warning'. Returns null for anything less severe so
* the analyser skips it.
*/
protected function classify(EntryInterface $entry): ?string
{
$level = $entry->getLevel()->asInt();
if ($level <= Level::ERROR->asInt()) {
return 'error';
}
if ($level === Level::WARNING->asInt()) {
return 'warning';
}
return null;
}
}
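
The window clipping in `analyse()` is easiest to check with concrete numbers. The following is a rough Python transliteration of that loop, offered as a sketch for reasoning about the index arithmetic rather than as part of the commit; `context_windows` and `flags` are hypothetical helper names, and entries are reduced to boolean hit flags.

```python
CONTEXT_BEFORE = 20
CONTEXT_AFTER = 20
HIT_CAP = 500


def context_windows(hit_flags, cap=HIT_CAP):
    """Return (windows, truncated) for a list of per-entry hit flags.

    Each window is (entry_index_1_based, before_indices, after_indices),
    where the index lists are 0-based positions into hit_flags.
    """
    count = len(hit_flags)
    windows = []
    truncated = False
    last_emitted = -1  # highest 0-based index covered by an emitted window

    for i, is_hit in enumerate(hit_flags):
        if not is_hit:
            continue
        if len(windows) >= cap:
            truncated = True  # a hit exists beyond the cap
            break
        # Clip the leading range so it starts past the previous window,
        # but never past the hit itself.
        before_start = min(i, max(last_emitted + 1, i - CONTEXT_BEFORE))
        after_start = max(last_emitted + 1, i + 1)
        after_end = min(count - 1, i + CONTEXT_AFTER)
        windows.append((
            i + 1,
            list(range(before_start, i)),
            list(range(after_start, after_end + 1)),  # empty if start > end
        ))
        last_emitted = max(last_emitted, after_end)
    return windows, truncated


def flags(hit_entries, count):
    """Build hit flags from 1-based hit entry numbers (illustration helper)."""
    hits = set(hit_entries)
    return [n in hits for n in range(1, count + 1)]


windows, truncated = context_windows(flags([10, 15], 50))
print([(idx, len(before), len(after)) for idx, before, after in windows], truncated)
```

For hits at entries 10 and 15 in a 50-entry log this yields 9 before / 20 after for the first hit and 0 before / 5 after for the second. Note that the second hit itself sits inside the first window's trailing context; only the before/after arrays are kept disjoint.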


@@ -0,0 +1,130 @@
<?php
namespace IndifferentKetchup\Codex\Analysis\ProjectZomboid;
use IndifferentKetchup\Codex\Analysis\InsightInterface;
use IndifferentKetchup\Codex\Analysis\Problem;
use IndifferentKetchup\Codex\Log\EntryInterface;
/**
* Problem emitted by ErrorContextAnalyser for each ERROR or WARNING entry,
* carrying a sliding window of surrounding entries as before/after
* context. Coalesced by 1-based entryIndex so re-adding the same hit
* never produces duplicate problems.
*/
class ErrorContextProblem extends Problem
{
private string $type = 'error';
private int $entryIndex = 0;
/**
* @var EntryInterface[]
*/
private array $before = [];
/**
* @var EntryInterface[]
*/
private array $after = [];
/**
* @param string $type 'error' or 'warning'
* @return $this
*/
public function setType(string $type): static
{
$this->type = $type;
return $this;
}
/**
* @return string
*/
public function getType(): string
{
return $this->type;
}
/**
* @param int $entryIndex 1-based index of the hit entry within the log
* @return $this
*/
public function setEntryIndex(int $entryIndex): static
{
$this->entryIndex = $entryIndex;
return $this;
}
/**
* @return int 1-based index of the hit entry within the log
*/
public function getEntryIndex(): int
{
return $this->entryIndex;
}
/**
* @param EntryInterface[] $entries
* @return $this
*/
public function setBefore(array $entries): static
{
$this->before = $entries;
return $this;
}
/**
* @return EntryInterface[]
*/
public function getBefore(): array
{
return $this->before;
}
/**
* @param EntryInterface[] $entries
* @return $this
*/
public function setAfter(array $entries): static
{
$this->after = $entries;
return $this;
}
/**
* @return EntryInterface[]
*/
public function getAfter(): array
{
return $this->after;
}
/**
* Convenience accessor returning before-context, hit entry, and
* after-context as a single ordered array of at most
* ErrorContextAnalyser::CONTEXT_BEFORE + 1 + CONTEXT_AFTER = 41
* entries.
*
* @return EntryInterface[]
*/
public function getContext(): array
{
return [...$this->before, $this->getEntry(), ...$this->after];
}
public function getMessage(): string
{
return sprintf(
'%s at entry %d (%d before, %d after)',
strtoupper($this->type),
$this->entryIndex,
count($this->before),
count($this->after)
);
}
public function isEqual(InsightInterface $insight): bool
{
return $insight instanceof self && $insight->getEntryIndex() === $this->entryIndex;
}
}


@@ -0,0 +1,42 @@
<?php
namespace IndifferentKetchup\Codex\Analysis\ProjectZomboid;
use IndifferentKetchup\Codex\Analysis\Information;
use IndifferentKetchup\Codex\Analysis\InsightInterface;
/**
* Emitted by ErrorContextAnalyser at most once, when a hit is found
* after its hit cap has been reached, so downstream consumers can
* surface a "results truncated" notice instead of silently dropping
* subsequent error/warning hits.
*/
class ErrorContextTruncatedInformation extends Information
{
private int $hitCap = 0;
/**
* @param int $hitCap the cap that was hit (mirrors
* ErrorContextAnalyser::HIT_CAP at emission time)
* @return $this
*/
public function setHitCap(int $hitCap): static
{
$this->hitCap = $hitCap;
$this->setLabel('Error context');
$this->setValue(sprintf('truncated after %d hits', $hitCap));
return $this;
}
/**
* @return int
*/
public function getHitCap(): int
{
return $this->hitCap;
}
public function isEqual(InsightInterface $insight): bool
{
return $insight instanceof self;
}
}


@@ -15,7 +15,7 @@ namespace IndifferentKetchup\Codex\Pattern\ProjectZomboid;
*/
class DebugServerPattern
{
- public const string LINE = '/^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+(\w+)\s*:\s+(\S+)\s+f:\d+,\s+t:\d+,\s+st:[\d,]+>\s+.*$/';
+ public const string LINE = '/^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+(\w+)\s*:\s+(\S+)\s+f:\d+(?:,\s+t:\d+)?,?\s+st:[\d,]+>\s+.*$/';
public const string VERSION = '/version=(?<version>\S+) (?<hash>[a-f0-9]{40}) (?<date>\d{4}-\d{2}-\d{2}) (?<time>\d{2}:\d{2}:\d{2})/';
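
As a quick sanity check of the relaxed `LINE` pattern, the sketch below runs the old and new expressions through Python's `re` module, which accepts this syntax unchanged. The B42 line is taken from the B42 fixture added by this commit; the B41 line is a made-up example built from the `f:N, t:N, st:N,N,N,N>` shape described in the commit message, not copied from a real log. The names `LINE_B41_ONLY` and `LINE_RELAXED` are illustrative only.

```python
import re

# Old and new LINE patterns, transcribed from the diff above.
LINE_B41_ONLY = re.compile(
    r"^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+(\w+)\s*:\s+(\S+)"
    r"\s+f:\d+,\s+t:\d+,\s+st:[\d,]+>\s+.*$"
)
LINE_RELAXED = re.compile(
    r"^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+(\w+)\s*:\s+(\S+)"
    r"\s+f:\d+(?:,\s+t:\d+)?,?\s+st:[\d,]+>\s+.*$"
)

# B42 shape (from the fixture) and an assumed B41 shape (constructed).
b42 = ('[16-04-26 00:01:19.200] WARN : Mod f:0 st:48,648,194,378> '
       'ZomboidFileSystem.loadModAndRequired> required mod "absent_mod" not found.')
b41 = ('[16-04-26 00:01:19.200] WARN : Mod f:0, t:1776297679200, st:48,648,194,378> '
       'ZomboidFileSystem.loadModAndRequired> required mod "absent_mod" not found.')
# Stack-trace continuation lines must not match either pattern.
continuation = "java.base/sun.nio.fs.UnixException.asIOException(Unknown Source)"

for name, line in [("b41", b41), ("b42", b42), ("continuation", continuation)]:
    old = LINE_B41_ONLY.match(line)
    new = LINE_RELAXED.match(line)
    # Capture groups are (timestamp, level, prefix).
    print(name, bool(old), bool(new), new.groups() if new else None)
```

Only the relaxed pattern matches the B42 line, while both match the B41 shape, which is the backwards compatibility the commit message claims.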


@@ -7,15 +7,24 @@ use IndifferentKetchup\Codex\Util\RedactorInterface;
/**
* Render-time PII filter for Project Zomboid log content.
*
- * Applies up to three sequential regex passes over the raw log string,
+ * Applies up to four sequential regex passes over the raw log string,
* each controlled by a boolean toggle (all enabled by default):
*
- * 1. Steam ID pass — replaces 17-digit Steam IDs with a placeholder token.
- * 2. Player name pass — replaces player display names with a placeholder
+ * 1. IP address pass — replaces IPv4 addresses (with optional :port
+ * suffix) and IPv6 addresses (full, abbreviated, bracketed, and
+ * IPv4-mapped forms; all with optional :port when bracketed) with
+ * a placeholder token. Pattern-disjoint from the other passes.
+ * 2. Steam ID pass — replaces 17-digit Steam IDs with a placeholder
+ * token.
+ * 3. Player name pass — replaces player display names with a placeholder
* token. This pass anchors on the already-redacted Steam ID token, so
* the ordering Steam ID -> name -> coordinates is mandatory.
- * 3. Coordinates pass — replaces world coordinate triplets with a placeholder
- * token.
+ * 4. Coordinates pass — replaces world coordinate triplets with a
+ * placeholder token.
+ *
+ * Pass 1 runs first by convention, not dependency: it shares no anchors
+ * with passes 2-4 and could run anywhere in the chain without affecting
+ * their output.
*
* All regex passes use the /u flag for Unicode safety.
*
@@ -24,6 +33,29 @@ use IndifferentKetchup\Codex\Util\RedactorInterface;
*/
class ProjectZomboidRedactor implements RedactorInterface
{
+ /** Generic placeholder substituted for every matched IPv4 or IPv6 address (with port suffix consumed when present). */
+ public const string IP_REPLACEMENT = '[REDACTED_IP]';
+ /** Strict IPv4 with valid 0-255 octets and optional :port suffix. Lookarounds reject matches embedded in longer alphanumeric or dotted-decimal tokens; the (?<!\d\.) / (?!\.\d) pair specifically prevents matching inside an N-octet (N>4) sequence like 1.2.3.4.5 while still allowing a trailing sentence period after the IP/port. */
+ public const string IPV4_REGEX = '/'
+ . '(?<![A-Za-z0-9_:])(?<!\d\.)'
+ . '(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
+ . '(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}'
+ . '(?::\d{1,5})?'
+ . '(?![A-Za-z0-9_:])(?!\.\d)'
+ . '/u';
+ /** Coarse IPv6 candidate matcher (bracketed-with-port, or bare 2-7-colon hex form covering full / abbreviated / IPv4-mapped). Each match is validated with filter_var() in the redact() callback so PHP/Java scope ops like Foo::Bar and PZ timestamps like 12:00:00.000 are rejected. Boundary lookarounds mirror the IPv4 regex so trailing sentence periods don't block the match. */
+ public const string IPV6_REGEX = '/'
+ . '(?<![A-Za-z0-9_:])(?<!\d\.)'
+ . '(?:'
+ . '\[(?<bracketed>[0-9a-fA-F:.]+)\](?::\d{1,5})?'
+ . '|'
+ . '(?<bare>(?:[0-9a-fA-F]{0,4}:){2,7}[0-9a-fA-F.]*)'
+ . ')'
+ . '(?![A-Za-z0-9_:])(?!\.\d)'
+ . '/u';
/** Regex matching a 17-digit SteamID64 anchored on the 76561198 universe prefix, with lookaround boundaries that reject embedded occurrences. */
public const string STEAM_ID_REGEX = '/(?<![A-Za-z0-9])76561198\d{9}(?![A-Za-z0-9])/u';
@@ -54,10 +86,23 @@ class ProjectZomboidRedactor implements RedactorInterface
/** Matches integer coordinate triplets enclosed in round parentheses, anchored on a trailing PvP verb to disambiguate from server-metadata triples (pvp.txt Combat:/Safety: shape); only the attacker/first-coord set is redacted per line — the victim coords lack the trailing keyword and are deferred to v2. */
public const string COORDS_PARENTHESISED_REGEX = '/(?<=\()(?<x>\d+),(?<y>\d+),(?<z>-?\d+)(?=\) (?:hit|restore|store|true|false))/u';
+ private bool $redactIpAddresses = true;
private bool $redactSteamIds = true;
private bool $redactPlayerNames = true;
private bool $redactCoordinates = true;
+ /**
+ * Enable or disable the IP address redaction pass (covers IPv4 and IPv6).
+ *
+ * @param bool $on Pass true to enable, false to disable.
+ * @return static
+ */
+ public function redactIpAddresses(bool $on): static
+ {
+ $this->redactIpAddresses = $on;
+ return $this;
+ }
/**
* Enable or disable the Steam ID redaction pass.
*
@@ -97,14 +142,31 @@ class ProjectZomboidRedactor implements RedactorInterface
/**
* Redact PII from the given Project Zomboid log content.
*
- * Passes are applied in the mandatory order: Steam ID -> player name ->
- * coordinates. See class docblock for rationale.
+ * Passes are applied in the order: IP address -> Steam ID -> player
+ * name -> coordinates. The Steam ID -> name -> coordinates ordering
+ * is mandatory (see class docblock); the IP pass is pattern-disjoint
+ * and runs first by convention.
*
* @param string $content Raw log content that may contain PII.
* @return string Content with enabled PII categories replaced by tokens.
*/
public function redact(string $content): string
{
+ if ($this->redactIpAddresses) {
+ $content = preg_replace_callback(
+ self::IPV6_REGEX,
+ static function (array $matches): string {
+ $candidate = ($matches['bracketed'] ?? '') !== ''
+ ? $matches['bracketed']
+ : ($matches['bare'] ?? '');
+ return filter_var($candidate, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false
+ ? self::IP_REPLACEMENT
+ : $matches[0];
+ },
+ $content
+ );
+ $content = preg_replace(self::IPV4_REGEX, self::IP_REPLACEMENT, $content);
+ }
if ($this->redactSteamIds) {
$content = preg_replace(self::STEAM_ID_REGEX, self::STEAM_ID_REPLACEMENT, $content);
}


@@ -0,0 +1,22 @@
[16-04-26 00:00:42.314] LOG : General f:0 st:48,648,157,434> SLF4J(W): No SLF4J providers were found..
[16-04-26 00:00:42.315] LOG : General f:0 st:48,648,157,492> SLF4J(W): Defaulting to no-operation (NOP) logger implementation.
[16-04-26 00:00:42.407] LOG : General f:0 st:48,648,157,584> version=42.17.0 0000000000000000000000000000000000000000 2026-04-20 14:34:44 (ZB) demo=false.
[16-04-26 00:00:42.407] LOG : General f:0 st:48,648,157,585> revision=0000000000000000000000000000000000000000 date=2026-04-20 time=14:34:44 (ZB).
[16-04-26 00:01:19.080] ERROR: General f:0 st:48,648,194,258> DebugFileWatcher.registerDir> Exception thrown
java.nio.file.NoSuchFileException: /placeholder/config/mods at UnixException.translateToIOException(null:-1).
Stack trace:
java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
java.base/sun.nio.fs.UnixException.asIOException(Unknown Source)
java.base/sun.nio.fs.LinuxWatchService$Poller.implRegister(Unknown Source)
java.base/sun.nio.fs.AbstractPoller.processRequests(Unknown Source)
java.base/sun.nio.fs.LinuxWatchService$Poller.run(Unknown Source)
[16-04-26 00:01:19.131] LOG : Mod f:0 st:48,648,194,309> loading example_mod_alpha.
[16-04-26 00:01:19.142] LOG : Mod f:0 st:48,648,194,320> loading example_mod_beta.
[16-04-26 00:01:19.155] LOG : Mod f:0 st:48,648,194,333> loading example_mod_gamma.
[16-04-26 00:01:19.200] WARN : Mod f:0 st:48,648,194,378> ZomboidFileSystem.loadModAndRequired> required mod "absent_mod" not found.
[16-04-26 00:01:45.937] ERROR: WorldGen f:0 st:48,648,221,115> IsoPropertyType.lookupOrDefaultStr> Exception thrown
zombie.core.properties.IsoPropertyType$IsoPropertyTypeNotFoundException: Property Name not found: ladderW at IsoPropertyType.lookup(IsoPropertyType.java:269). Message: Property Name not found: ladderW
at zombie.core.properties.IsoPropertyType.lookup(IsoPropertyType.java:269)
at zombie.iso.IsoChunkData.PostProcessChunk(IsoChunkData.java:512)
[16-04-26 00:02:00.000] LOG : General f:0 st:48,648,235,178> server initialised.
[16-04-26 00:05:00.000] LOG : General f:0 st:48,648,415,178> shutdown requested.


@@ -0,0 +1,128 @@
<?php
namespace IndifferentKetchup\Codex\Test\Tests\Games\ProjectZomboid\Analyser;
use IndifferentKetchup\Codex\Analyser\AnalyserInterface;
use IndifferentKetchup\Codex\Analyser\ProjectZomboid\ErrorContextAnalyser;
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextProblem;
use IndifferentKetchup\Codex\Analysis\ProjectZomboid\ErrorContextTruncatedInformation;
use IndifferentKetchup\Codex\Log\AnalysableLog;
use IndifferentKetchup\Codex\Log\Entry;
use IndifferentKetchup\Codex\Log\Level;
use IndifferentKetchup\Codex\Log\Line;
use PHPUnit\Framework\TestCase;
class ErrorContextAnalyserTest extends TestCase
{
/**
* Build an in-memory AnalysableLog with $count entries; entries whose
* 1-based index is in $errorIndices are tagged Level::ERROR, the rest
* Level::INFO. Anonymous AnalysableLog subclass keeps the fixture
* inline since we exercise the analyser directly via setLog().
*
* @param int[] $errorIndices 1-based entry indices to mark as ERROR
*/
private function makeLog(array $errorIndices, int $count): AnalysableLog
{
$errorSet = array_flip($errorIndices);
$log = new class extends AnalysableLog {
public static function getDefaultAnalyser(): AnalyserInterface
{
return new ErrorContextAnalyser();
}
};
for ($n = 1; $n <= $count; $n++) {
$level = isset($errorSet[$n]) ? Level::ERROR : Level::INFO;
$entry = (new Entry())
->setLevel($level)
->addLine(new Line($n, sprintf('line %d', $n)));
$log->addEntry($entry);
}
return $log;
}
public function testEmitsThreeNonOverlappingWindows(): void
{
$log = $this->makeLog([10, 50, 95], 100);
$analysis = (new ErrorContextAnalyser())->setLog($log)->analyse();
$problems = $analysis->getFilteredInsights(ErrorContextProblem::class);
$this->assertCount(3, $problems);
$this->assertSame(10, $problems[0]->getEntryIndex());
$this->assertSame(50, $problems[1]->getEntryIndex());
$this->assertSame(95, $problems[2]->getEntryIndex());
// First hit (entry 10): 9 entries before (1..9), 20 after (11..30).
$this->assertCount(9, $problems[0]->getBefore());
$this->assertCount(20, $problems[0]->getAfter());
// Second hit (entry 50): clipped to 19 before (31..49), 20 after (51..70).
$this->assertCount(19, $problems[1]->getBefore());
$this->assertCount(20, $problems[1]->getAfter());
// Third hit (entry 95): full 20 before (75..94, no clipping needed), 5 after (96..100, cut off by the end of the log).
$this->assertCount(20, $problems[2]->getBefore());
$this->assertCount(5, $problems[2]->getAfter());
// Total window per hit never exceeds 1 + CONTEXT_BEFORE + CONTEXT_AFTER = 41.
foreach ($problems as $problem) {
$this->assertLessThanOrEqual(ErrorContextAnalyser::CONTEXT_BEFORE, count($problem->getBefore()));
$this->assertLessThanOrEqual(ErrorContextAnalyser::CONTEXT_AFTER, count($problem->getAfter()));
$this->assertLessThanOrEqual(41, count($problem->getContext()));
}
// No entry appears in two problems' context arrays.
$seen = [];
foreach ($problems as $problem) {
foreach ([...$problem->getBefore(), ...$problem->getAfter()] as $entry) {
$id = spl_object_id($entry);
$this->assertArrayNotHasKey($id, $seen, 'Entry duplicated across problem context arrays');
$seen[$id] = true;
}
}
}
public function testMergesAdjacentWindowsWhenWithinContextRange(): void
{
// Errors 5 entries apart; without merge their windows would
// overlap heavily.
$log = $this->makeLog([10, 15], 50);
$analysis = (new ErrorContextAnalyser())->setLog($log)->analyse();
$problems = $analysis->getFilteredInsights(ErrorContextProblem::class);
$this->assertCount(2, $problems);
// First hit: 9 before (1..9), 20 after (11..30). lastEmittedIndex=29 (0-based).
$this->assertCount(9, $problems[0]->getBefore());
$this->assertCount(20, $problems[0]->getAfter());
// Second hit at entry 15 (i=14). beforeStart would land past i, so it is clamped to i and before is empty.
// afterStart=max(30, 15)=30, afterEnd=min(49, 34)=34, so after=entries 31..35
// (5 entries, all unseen).
$this->assertCount(0, $problems[1]->getBefore());
$this->assertCount(5, $problems[1]->getAfter());
// Confirm no entry appears in both problems' context arrays.
$first = [...$problems[0]->getBefore(), ...$problems[0]->getAfter()];
$second = [...$problems[1]->getBefore(), ...$problems[1]->getAfter()];
foreach ($second as $entry) {
$this->assertNotContains($entry, $first, 'Entry duplicated across merged windows');
}
}
public function testTruncatesAtHitCap(): void
{
// 600 consecutive ERROR entries — analyser should cap emission at
// HIT_CAP and add exactly one truncation Information.
$log = $this->makeLog(range(1, 600), 600);
$analysis = (new ErrorContextAnalyser())->setLog($log)->analyse();
$problems = $analysis->getFilteredInsights(ErrorContextProblem::class);
$this->assertCount(ErrorContextAnalyser::HIT_CAP, $problems);
$information = $analysis->getFilteredInsights(ErrorContextTruncatedInformation::class);
$this->assertCount(1, $information);
$this->assertSame(ErrorContextAnalyser::HIT_CAP, $information[0]->getHitCap());
}
}


@@ -6,18 +6,31 @@ use IndifferentKetchup\Codex\Detective\Detective;
use IndifferentKetchup\Codex\Log\File\PathLogFile;
use IndifferentKetchup\Codex\Log\Level;
use IndifferentKetchup\Codex\Log\ProjectZomboid\ProjectZomboidServerLog;
+ use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\TestCase;
class ProjectZomboidServerLogTest extends TestCase
{
- private function fixturePath(): string
+ /**
+ * Both PZ B41 and B42 line shapes must parse identically. B41 (and the
+ * fixture used by every analyser test) emits `f:N, t:N, st:N,N,N,N>`;
+ * B42 (release branch from 2026-04 onward, e.g. build 42.17) drops the
+ * `t:` microsecond field entirely and tightens whitespace to
+ * `f:N st:N,N,N,N>`.
+ */
+ public static function fixtureProvider(): array
{
- return __DIR__ . '/../../../../src/Games/ProjectZomboid/fixtures/debug-server-minimal.txt';
+ $base = __DIR__ . '/../../../../src/Games/ProjectZomboid/fixtures';
+ return [
+ 'pz41-format' => [$base . '/debug-server-minimal.txt'],
+ 'pz42-format' => [$base . '/debug-server-42x-minimal.txt'],
+ ];
}
- public function testParsesEntriesWithLevelAndPrefix(): void
+ #[DataProvider('fixtureProvider')]
+ public function testParsesEntriesWithLevelAndPrefix(string $fixturePath): void
{
- $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($this->fixturePath()));
+ $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($fixturePath));
$log->parse();
$entries = $log->getEntries();
@@ -29,9 +42,10 @@ class ProjectZomboidServerLogTest extends TestCase
$this->assertNotNull($first->getTime());
}
- public function testStackTraceLinesAttachToTriggeringErrorEntry(): void
+ #[DataProvider('fixtureProvider')]
+ public function testStackTraceLinesAttachToTriggeringErrorEntry(string $fixturePath): void
{
- $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($this->fixturePath()));
+ $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($fixturePath));
$log->parse();
$errorEntry = null;
@@ -46,19 +60,21 @@ class ProjectZomboidServerLogTest extends TestCase
$this->assertGreaterThan(1, count($errorEntry->getLines()));
}
- public function testWarnLevelMapsCorrectly(): void
+ #[DataProvider('fixtureProvider')]
+ public function testWarnLevelMapsCorrectly(string $fixturePath): void
{
- $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($this->fixturePath()));
+ $log = (new ProjectZomboidServerLog())->setLogFile(new PathLogFile($fixturePath));
$log->parse();
$warnEntries = array_filter($log->getEntries(), fn($e) => $e->getLevel() === Level::WARNING);
$this->assertNotEmpty($warnEntries);
}
- public function testDetectiveDispatchesByContent(): void
+ #[DataProvider('fixtureProvider')]
+ public function testDetectiveDispatchesByContent(string $fixturePath): void
{
$detective = (new Detective())
- ->setLogFile(new PathLogFile($this->fixturePath()))
+ ->setLogFile(new PathLogFile($fixturePath))
->addPossibleLogClass(ProjectZomboidServerLog::class);
$log = $detective->detect();


@@ -0,0 +1,114 @@
<?php
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
use PHPUnit\Framework\TestCase;
class ProjectZomboidRedactorIpv4Test extends TestCase
{
public function testRedactsBareIpv4(): void
{
$input = 'Connection from 192.168.1.1 closed.';
$expected = 'Connection from [REDACTED_IP] closed.';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($expected, $output);
}
public function testRedactsIpv4WithPortSuffix(): void
{
$input = 'Connected to 10.0.0.42:27015.';
$expected = 'Connected to [REDACTED_IP].';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($expected, $output);
}
public function testRedactsMultipleIpv4OnOneLine(): void
{
$input = 'Peer 192.168.1.10 -> 192.168.1.20 via 10.0.0.1:8080.';
$expected = 'Peer [REDACTED_IP] -> [REDACTED_IP] via [REDACTED_IP].';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($expected, $output);
}
public function testRedactsLoopbackAndBoundaryAddresses(): void
{
$input = implode("\n", [
'127.0.0.1',
'0.0.0.0',
'255.255.255.255',
]);
$expected = implode("\n", [
'[REDACTED_IP]',
'[REDACTED_IP]',
'[REDACTED_IP]',
]);
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($expected, $output);
}
public function testDoesNotRedactOutOfRangeOctets(): void
{
// 999 is not a valid octet under the 0-255 alternation; the address
// must therefore be left untouched.
$input = 'Bogus: 999.999.999.999';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($input, $output);
}
public function testDoesNotRedactInsideLongerDottedSequence(): void
{
// Five dotted segments are not an IPv4 address; the lookarounds must
// reject any partial match inside the longer sequence.
$input = 'Path frag 1.2.3.4.5 should not match.';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($input, $output);
}
public function testDoesNotRedactThreeSegmentBuildNumbers(): void
{
// PZ build numbers are 3-segment (e.g. 41.78.16) and must not match.
$input = 'Build 41.78.16 starting up.';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($input, $output);
}
public function testToggleOffLeavesIpv4Intact(): void
{
$input = 'Connection from 192.168.1.1:27015 closed.';
$output = (new ProjectZomboidRedactor())
->redactIpAddresses(false)
->redact($input);
$this->assertSame($input, $output);
}
public function testIdempotence(): void
{
$input = implode("\n", [
'Connection from 192.168.1.1:27015 closed.',
'Peer 10.0.0.42 -> 10.0.0.43 via 172.16.0.1:8080.',
]);
$redactor = new ProjectZomboidRedactor();
$once = $redactor->redact($input);
$twice = $redactor->redact($once);
$this->assertSame($once, $twice);
}
}
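
The boundary lookarounds in `IPV4_REGEX` carry most of the behaviour these tests exercise, so a small reproduction may help when reviewing them. The sketch below transcribes the pattern into Python's `re` (the syntax is compatible; `re.ASCII` is an assumption made to keep `\d` limited to ASCII digits) and replays a few of the test inputs. `redact_ipv4` is a hypothetical stand-in for the IPv4 half of `redact()`, not the PHP implementation.

```python
import re

IP_REPLACEMENT = "[REDACTED_IP]"

# Transcribed from ProjectZomboidRedactor::IPV4_REGEX.
IPV4_REGEX = re.compile(
    r"(?<![A-Za-z0-9_:])(?<!\d\.)"                         # not glued to a token or "N."
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"            # first octet, 0-255
    r"(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}"   # three more octets
    r"(?::\d{1,5})?"                                       # optional :port
    r"(?![A-Za-z0-9_:])(?!\.\d)",                          # not followed by ".N"
    re.ASCII,
)


def redact_ipv4(text):
    """Replace every strict IPv4 address (and its port, if any) with the token."""
    return IPV4_REGEX.sub(IP_REPLACEMENT, text)


for sample in [
    "Connected to 10.0.0.42:27015.",          # port consumed, sentence period kept
    "Path frag 1.2.3.4.5 should not match.",  # five segments: rejected
    "Build 41.78.16 starting up.",            # three segments: rejected
]:
    print(redact_ipv4(sample))
```

The `(?<!\d\.)` / `(?!\.\d)` pair is what rejects `1.2.3.4.5` from both ends, while a trailing sentence period still passes because it is not followed by a digit.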


@@ -0,0 +1,135 @@
<?php
namespace IndifferentKetchup\Codex\Test\Tests\Util\Redactor;
use IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor;
use PHPUnit\Framework\TestCase;
class ProjectZomboidRedactorIpv6Test extends TestCase
{
public function testRedactsFullIpv6(): void
{
$input = 'Bound 2001:0db8:85a3:0000:0000:8a2e:0370:7334 ok.';
$expected = 'Bound [REDACTED_IP] ok.';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($expected, $output);
}
public function testRedactsAbbreviatedIpv6(): void
{
$input = 'Server peer 2001:db8::1 connected.';
$expected = 'Server peer [REDACTED_IP] connected.';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($expected, $output);
}
public function testRedactsLoopbackIpv6(): void
{
$input = 'localhost ::1 reachable.';
$expected = 'localhost [REDACTED_IP] reachable.';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($expected, $output);
}
public function testRedactsBracketedIpv6WithPort(): void
{
$input = 'Bound to [2001:db8::1]:8080 ok.';
$expected = 'Bound to [REDACTED_IP] ok.';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($expected, $output);
}
public function testRedactsBracketedLoopbackWithPort(): void
{
$input = 'Listening on [::1]:27015.';
$expected = 'Listening on [REDACTED_IP].';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($expected, $output);
}
public function testRedactsIpv4MappedIpv6(): void
{
// IPv4-mapped form must be handled by the IPv6 pass before the IPv4
// pass so the leading "::ffff:" doesn't get orphaned. With the IPv6
// pass first, the whole token collapses into a single placeholder.
$input = 'Mapped ::ffff:192.168.1.1 ok.';
$expected = 'Mapped [REDACTED_IP] ok.';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($expected, $output);
}
public function testDoesNotRedactJavaScopeOperator(): void
{
// Java method references and PHP scope operators look superficially
// like leading-:: IPv6 forms but fail filter_var validation; the
// word-boundary lookbehind also rejects matches that follow letters.
$input = 'Foo::bar called Object::toString.';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($input, $output);
}
public function testDoesNotRedactTimestampShape(): void
{
// PZ log timestamps include hh:mm:ss.v segments which match the coarse
// IPv6 candidate pattern but are rejected by filter_var.
$input = '[16-04-26 12:00:00.000][LOG] startup complete';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($input, $output);
}
public function testDoesNotRedactSteamIdAsIpv6(): void
{
// 17-digit Steam IDs contain no colons, so they can never match the
// IPv6 candidate pattern, but assert explicitly so a future change to
// the IPv6 regex doesn't accidentally collide with the Steam ID pass.
$input = 'Player 76561198111111111 joined.';
$expected = 'Player 76561198000000000 joined.';
$output = (new ProjectZomboidRedactor())->redact($input);
$this->assertSame($expected, $output);
}
public function testToggleOffLeavesIpv6Intact(): void
{
$input = 'Bound to [2001:db8::1]:8080 ok.';
$output = (new ProjectZomboidRedactor())
->redactIpAddresses(false)
->redact($input);
$this->assertSame($input, $output);
}
public function testIdempotence(): void
{
$input = implode("\n", [
'Server peer 2001:db8::1 connected.',
'Listening on [::1]:27015.',
'Mapped ::ffff:192.168.1.1 ok.',
'[16-04-26 12:00:00.000][LOG] startup complete',
]);
$redactor = new ProjectZomboidRedactor();
$once = $redactor->redact($input);
$twice = $redactor->redact($once);
$this->assertSame($once, $twice);
}
}
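
The IPv6 pass relies on a two-stage approach: a deliberately coarse regex finds candidates, and a strict validator decides whether each one is really an address. A minimal Python sketch of that idea follows. It uses the standard-library `ipaddress` module where the PHP code calls `filter_var()`, so validator edge cases may differ between the two; `redact_ipv6` and `_is_ipv6` are hypothetical names, and the named groups are written in Python's `(?P<name>...)` syntax.

```python
import ipaddress
import re

IP_REPLACEMENT = "[REDACTED_IP]"

# Coarse candidate matcher, transcribed from ProjectZomboidRedactor::IPV6_REGEX.
IPV6_CANDIDATE = re.compile(
    r"(?<![A-Za-z0-9_:])(?<!\d\.)"
    r"(?:"
    r"\[(?P<bracketed>[0-9a-fA-F:.]+)\](?::\d{1,5})?"        # [addr] or [addr]:port
    r"|"
    r"(?P<bare>(?:[0-9a-fA-F]{0,4}:){2,7}[0-9a-fA-F.]*)"     # bare 2-7 colon form
    r")"
    r"(?![A-Za-z0-9_:])(?!\.\d)",
    re.ASCII,
)


def _is_ipv6(candidate):
    """Strict check; stands in for filter_var(..., FILTER_FLAG_IPV6)."""
    try:
        ipaddress.IPv6Address(candidate)
        return True
    except ValueError:
        return False


def redact_ipv6(text):
    """Replace validated IPv6 candidates; leave look-alikes untouched."""
    def replace(match):
        candidate = match.group("bracketed") or match.group("bare") or ""
        return IP_REPLACEMENT if _is_ipv6(candidate) else match.group(0)

    return IPV6_CANDIDATE.sub(replace, text)


for sample in [
    "Listening on [::1]:27015.",
    "Mapped ::ffff:192.168.1.1 ok.",
    "[16-04-26 12:00:00.000][LOG] startup complete",  # candidate, fails validation
]:
    print(redact_ipv6(sample))
```

Keeping the regex loose and pushing correctness into the validator avoids encoding every IPv6 abbreviation rule in a single pattern, at the cost of one callback per candidate.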


@@ -0,0 +1,310 @@
#!/usr/bin/env python3
"""
pz_classify.py — Deterministic Project Zomboid log classifier orchestrator.
Walks ``*DebugLog-server*.txt`` files under the redacted-logs directory,
runs the pz_parser pipeline per file, merges records cross-file by their
deterministic ``signature``, and emits the spec-shaped JSON report.
Companion to the existing Qwen-backed discovery tool ``pz_error_analysis.py``
(left untouched). Zero AI dependency, stdlib-only, runs in seconds.
By convention the input is always the redacted directory produced by
``pz_redact_all.sh``; ``meta.redacted`` is therefore hard-coded ``true``.
If the user overrides ``--input`` to a non-redacted source we still emit
``true`` because we have no upstream way to verify redaction status.
Pipeline:
    parser.parse_file        per-file Entry list
    parser.classify_entries  per-file deduped Record list
    _merge_cross_file        global Record list deduped across files
    _build_summary           top-line stats + by_kind / by_attribution / top_mods
Output schema, CLI flags, and aggregation rules are defined in
``docs/superpowers/specs/2026-05-04-pz-deterministic-classifier-design.md``.
"""
from __future__ import annotations
import argparse
import dataclasses
import json
import sys
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path
from pz_parser import (
MAX_CAUSE_CHAIN_LEVELS,
MAX_STACK_FRAMES,
SEVERITY_LEVELS,
Record,
classify_entries,
parse_file,
)
# ---------------------------------------------------------------------------
# Defaults / constants
# ---------------------------------------------------------------------------
_REPO_ROOT = Path(__file__).resolve().parents[2]
DEFAULT_INPUT: Path = _REPO_ROOT / ".scratch" / "pz" / "Logs.redacted"
DEFAULT_OUT: Path = _REPO_ROOT / ".scratch" / "pz" / "classify.json"
#: Filename glob driving the directory walk.
INPUT_GLOB: str = "*DebugLog-server*.txt"
#: Cap on entries in ``summary.top_mods`` (mods ranked by total occurrence count).
TOP_MODS_LIMIT: int = 10
#: Confidence / attribution promotion ladders (higher rank wins on merge).
_CONFIDENCE_RANK: dict[str, int] = {"low": 0, "medium": 1, "high": 2}
_ATTRIBUTION_RANK: dict[str, int] = {
"unattributed": 0,
"inferred": 1,
"direct": 2,
}
#: Levels that count as errors (vs warnings) in the summary.
_ERROR_LEVELS: frozenset[str] = frozenset({"ERROR", "SEVERE", "FATAL"})
# ---------------------------------------------------------------------------
# Cross-file aggregation (spec §9, inter-file equivalent of parser dedup)
# ---------------------------------------------------------------------------
def _merge_cross_file(per_file_records: list[Record]) -> list[Record]:
"""Merge ``Record`` instances across files by ``signature``.
The parser already dedups within a single file. This is the inter-file
equivalent: when the same signature appears in records from multiple
files, sum occurrences, union file lists, promote attribution/confidence,
and merge stack and cause-chain (deduped, capped at parser constants).
First-seen is the earliest by file-then-line; since callers feed records
in sorted file order, the first record we encounter per signature is
already the earliest.
"""
by_signature: dict[str, Record] = {}
for incoming in per_file_records:
existing = by_signature.get(incoming.signature)
if existing is None:
# First occurrence: build a new Record (with fresh stack/files lists) so merging never mutates the caller's objects.
by_signature[incoming.signature] = Record(
signature=incoming.signature,
pattern_id=incoming.pattern_id,
level=incoming.level,
kind=incoming.kind,
mod_id=incoming.mod_id,
mod_name=incoming.mod_name,
attribution=incoming.attribution,
confidence=incoming.confidence,
attribution_reason=incoming.attribution_reason,
file=incoming.file,
line=incoming.line,
cause_chain=incoming.cause_chain,
stack=list(incoming.stack),
first_seen=incoming.first_seen,
occurrence_count=incoming.occurrence_count,
files=list(incoming.files),
excerpt=incoming.excerpt,
)
continue
# Aggregate.
existing.occurrence_count += incoming.occurrence_count
for fname in incoming.files:
if fname not in existing.files:
existing.files.append(fname)
# Promote attribution / confidence / mod_name on stronger evidence.
if _ATTRIBUTION_RANK[incoming.attribution] > _ATTRIBUTION_RANK[existing.attribution]:
existing.attribution = incoming.attribution
existing.attribution_reason = incoming.attribution_reason
if incoming.mod_name:
existing.mod_name = incoming.mod_name
if _CONFIDENCE_RANK[incoming.confidence] > _CONFIDENCE_RANK[existing.confidence]:
existing.confidence = incoming.confidence
# Merge stack frames preserving order, capped.
for frame in incoming.stack:
if frame not in existing.stack and len(existing.stack) < MAX_STACK_FRAMES:
existing.stack.append(frame)
# Merge cause chain (deduped tokens, capped).
if incoming.cause_chain and incoming.cause_chain != existing.cause_chain:
old = existing.cause_chain.split(" -> ") if existing.cause_chain else []
new = incoming.cause_chain.split(" -> ")
merged = list(old)
for tok in new:
if tok and tok not in merged:
merged.append(tok)
existing.cause_chain = " -> ".join(merged[:MAX_CAUSE_CHAIN_LEVELS])
return list(by_signature.values())
# ---------------------------------------------------------------------------
# Summary computation
# ---------------------------------------------------------------------------
def _build_summary(records: list[Record]) -> dict[str, object]:
"""Build the ``summary`` block per spec.
Counts records (signatures), not raw occurrences, except for ``top_mods``
which sums ``occurrence_count`` per mod_id so that volume-driving mods
surface even when they hit the same shape repeatedly.
"""
errors = sum(1 for r in records if r.level in _ERROR_LEVELS)
warnings = sum(1 for r in records if r.level == "WARN")
by_kind = Counter(r.kind for r in records)
by_attribution = Counter(r.attribution for r in records)
by_confidence = Counter(r.confidence for r in records)
# Group by mod_id summing total occurrence_count; preserve any mod_name.
mod_totals: dict[str, int] = {}
mod_names: dict[str, str] = {}
for r in records:
mod_totals[r.mod_id] = mod_totals.get(r.mod_id, 0) + r.occurrence_count
# First non-empty mod_name wins; subsequent records may have empty
# mod_name (e.g. for unattributed) so don't overwrite with "".
if r.mod_name and r.mod_id not in mod_names:
mod_names[r.mod_id] = r.mod_name
top_mods = sorted(
(
{
"mod_id": mod_id,
"mod_name": mod_names.get(mod_id, ""),
"occurrence_count": total,
}
for mod_id, total in mod_totals.items()
),
key=lambda d: d["occurrence_count"],
reverse=True,
)[:TOP_MODS_LIMIT]
return {
"errors": errors,
"warnings": warnings,
"by_kind": dict(by_kind),
"by_attribution": dict(by_attribution),
"by_confidence": dict(by_confidence),
"top_mods": top_mods,
}
# ---------------------------------------------------------------------------
# Driver
# ---------------------------------------------------------------------------
def _run(input_dir: Path, out_path: Path, *, quiet: bool) -> int:
if not input_dir.is_dir():
print(
f"pz_classify: --input directory not found: {input_dir}",
file=sys.stderr,
)
return 2
started = datetime.now(timezone.utc).isoformat(timespec="seconds")
files = sorted(input_dir.glob(INPUT_GLOB))
all_records: list[Record] = []
log_lines_total = 0
error_lines_total = 0
for path in files:
try:
entries = parse_file(path)
except Exception as exc: # noqa: BLE001 — orchestrator must keep going.
print(
f"pz_classify: warning: failed to parse {path.name}: {exc}",
file=sys.stderr,
)
continue
# Body-line totals: every line under every parsed entry contributes
# to log_lines_total; severity-level entries' body lines feed
# error_lines_total. Counted before dedup so it reflects raw volume.
for e in entries:
log_lines_total += len(e.body)
if e.level in SEVERITY_LEVELS:
error_lines_total += len(e.body)
all_records.extend(classify_entries(entries, source_file=path.name))
merged = _merge_cross_file(all_records)
merged.sort(key=lambda r: r.occurrence_count, reverse=True)
finished = datetime.now(timezone.utc).isoformat(timespec="seconds")
unique_patterns = len({r.pattern_id for r in merged})
document: dict[str, object] = {
"meta": {
"input_dir": str(input_dir),
"files_scanned": len(files),
"log_lines_total": log_lines_total,
"error_lines_total": error_lines_total,
"unique_signatures": len(merged),
"unique_patterns": unique_patterns,
"redacted": True,
"started": started,
"finished": finished,
},
"signatures": [dataclasses.asdict(r) for r in merged],
"summary": _build_summary(merged),
}
tmp = out_path.with_suffix(out_path.suffix + ".tmp")
try:
out_path.parent.mkdir(parents=True, exist_ok=True)
with tmp.open("w", encoding="utf-8") as f:
json.dump(document, f, ensure_ascii=False, indent=2)
f.write("\n")
tmp.replace(out_path)
except OSError as exc:
print(f"pz_classify: failed to write {out_path}: {exc}", file=sys.stderr)
# Best-effort cleanup of the temp file.
try:
tmp.unlink()
except OSError:
pass
return 1
if not quiet:
print(
f"pz_classify: {len(files)} file(s), {log_lines_total} log lines, "
f"{error_lines_total} error lines, {len(merged)} records "
f"({unique_patterns} unique patterns) -> {out_path}"
)
return 0
def _parse_args(argv: list[str] | None = None) -> argparse.Namespace:
parser = argparse.ArgumentParser(
prog="pz_classify",
description=(
"Deterministic Project Zomboid log classifier. Walks redacted "
"DebugLog-server*.txt files, classifies errors/warnings, and "
"emits a JSON report."
),
)
parser.add_argument(
"--input",
type=Path,
default=DEFAULT_INPUT,
help=f"Input directory of redacted log files (default: {DEFAULT_INPUT}).",
)
parser.add_argument(
"--out",
type=Path,
default=DEFAULT_OUT,
help=f"Output JSON path (default: {DEFAULT_OUT}).",
)
parser.add_argument(
"--quiet",
action="store_true",
help="Suppress the trailing one-line summary.",
)
return parser.parse_args(argv)
def main(argv: list[str] | None = None) -> int:
args = _parse_args(argv)
return _run(args.input, args.out, quiet=args.quiet)
if __name__ == "__main__":
sys.exit(main())
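
The cross-file merge performed by `_merge_cross_file` can be illustrated in isolation. What follows is a minimal sketch, not the module's code: `MiniRecord` is a hypothetical stand-in carrying only four of `Record`'s fields, `merge_by_signature` is an illustrative name, and the sample signatures and file names are invented. It assumes, as the docstring above states, that records arrive in sorted file order.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Rank ladder mirroring pz_classify.py: the higher rank wins on merge.
ATTRIBUTION_RANK = {"unattributed": 0, "inferred": 1, "direct": 2}


@dataclass
class MiniRecord:
    """Hypothetical, trimmed-down stand-in for pz_parser.Record."""

    signature: str
    attribution: str
    occurrence_count: int = 1
    files: list[str] = field(default_factory=list)


def merge_by_signature(records: list[MiniRecord]) -> list[MiniRecord]:
    """Collapse records sharing a signature: sum counts, union file
    names in first-seen order, and keep the strongest attribution."""
    merged: dict[str, MiniRecord] = {}
    for rec in records:
        seen = merged.get(rec.signature)
        if seen is None:
            # Copy so the caller's objects are never mutated.
            merged[rec.signature] = MiniRecord(
                rec.signature,
                rec.attribution,
                rec.occurrence_count,
                list(rec.files),
            )
            continue
        seen.occurrence_count += rec.occurrence_count
        for name in rec.files:
            if name not in seen.files:
                seen.files.append(name)
        if ATTRIBUTION_RANK[rec.attribution] > ATTRIBUTION_RANK[seen.attribution]:
            seen.attribution = rec.attribution
    return list(merged.values())


# Invented sample input: "sig-a" appears in two files.
sample = [
    MiniRecord("sig-a", "inferred", 3, ["day1.txt"]),
    MiniRecord("sig-b", "unattributed", 1, ["day1.txt"]),
    MiniRecord("sig-a", "direct", 2, ["day2.txt"]),
]
result = merge_by_signature(sample)
```

With this input the two `sig-a` records collapse into one carrying both file names and the `direct` attribution, while the objects in `sample` are left untouched.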

@@ -0,0 +1,467 @@
#!/usr/bin/env python3
"""
pz_error_analysis.py — Qwen-backed Project Zomboid error analyzer.
Walks `*DebugLog-server*.txt` files (DEFAULT_INPUT — already PII-redacted by
pz_redact_all.sh), groups WARN/ERROR/FATAL entries with surrounding context,
deduplicates by signature hash, and asks Qwen to classify each unique
signature into a fixed taxonomy (missing_mod, java_exception, lua_error,
out_of_memory, ...) with a short title / summary / likely_cause /
suggested_fix / confidence.
Standalone: requires Python 3.10+ and the `openai` package
(`pip install 'openai>=1.30'`; quote the specifier so the shell does not
treat `>` as a redirect). Talks to a local OpenAI-compatible endpoint
(default sam-desktop llama-swap on port 8401); override with QWEN_BASE_URL
and QWEN_MODEL env vars.
"""
from __future__ import annotations
import argparse
import datetime as dt
import hashlib
import json
import os
import re
import sys
import time
from pathlib import Path
from typing import Any, Iterator
from openai import OpenAI
_REPO_ROOT = Path(__file__).resolve().parents[2]
DEFAULT_INPUT = _REPO_ROOT / ".scratch" / "pz" / "Logs.redacted"
DEFAULT_OUT = _REPO_ROOT / ".scratch" / "pz" / "analysis.json"
# --- Qwen client (inlined from /opt/analytics/ib_analytics/llm/local_client.py
# so this script has no cross-repo dependency; mirror upstream changes if
# the analytics client API evolves) ---
QWEN_DEFAULT_BASE_URL = "http://100.101.41.16:8401/v1"
QWEN_DEFAULT_MODEL = "qwen3.6-35b-a3b"
SAMPLING_STRUCTURED: dict[str, Any] = {
"temperature": 0.7,
"top_p": 0.80,
"extra_body": {
"top_k": 20,
"presence_penalty": 1.5,
"chat_template_kwargs": {"enable_thinking": False},
},
}
def get_client() -> OpenAI:
return OpenAI(
base_url=os.environ.get("QWEN_BASE_URL", QWEN_DEFAULT_BASE_URL),
api_key="EMPTY",
)
def get_model() -> str:
return os.environ.get("QWEN_MODEL", QWEN_DEFAULT_MODEL)
def structured_call(
tool_schema: dict[str, Any],
messages: list[dict[str, Any]],
*,
sampling: dict[str, Any] = SAMPLING_STRUCTURED,
client: OpenAI | None = None,
model: str | None = None,
max_tokens: int = 4096,
) -> dict[str, Any]:
cli = client or get_client()
mdl = model or get_model()
fn_name = tool_schema["function"]["name"]
kwargs = dict(sampling)
extra_body = dict(kwargs.pop("extra_body", {}))
response = cli.chat.completions.create(
model=mdl,
messages=messages,
tools=[tool_schema],
tool_choice="required",
max_tokens=max_tokens,
extra_body=extra_body,
**kwargs,
)
choice = response.choices[0]
tool_calls = getattr(choice.message, "tool_calls", None) or []
if not tool_calls:
raise ValueError(
f"Qwen did not invoke {fn_name}; finish_reason={choice.finish_reason}, "
f"content={(choice.message.content or '')[:500]}"
)
call = tool_calls[0]
if call.function.name != fn_name:
raise ValueError(
f"Qwen invoked unexpected tool {call.function.name!r}; expected {fn_name!r}"
)
try:
return json.loads(call.function.arguments)
except json.JSONDecodeError as e:
raise ValueError(
f"Malformed tool-call arguments for {fn_name}: {e}; "
f"raw={call.function.arguments[:500]}"
) from e
# --- Parser ---
ENTRY_RE = re.compile(
r"^\[(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
r"(LOG|WARN|ERROR|FATAL)\s*:\s*(.*)"
)
SESSION_META_RE = re.compile(r"^[A-Za-z]+\s+f:\d+,?\s*(?:t:\d+,?\s*)?st:[\d,]+>\s*")
DOUBLE_QUOTED_RE = re.compile(r'"[^"]*"')
SINGLE_QUOTED_RE = re.compile(r"'[^']*'")
NUMERIC_RUN_RE = re.compile(r"\d{2,}")
WS_RUN_RE = re.compile(r"\s+")
CATEGORIES = [
"missing_mod", "mod_conflict", "lua_error", "java_exception",
"out_of_memory", "corrupt_save", "network_error", "load_order",
"performance", "server_crash", "unknown",
]
TOOL_SCHEMA: dict[str, Any] = {
"type": "function",
"function": {
"name": "submit_error_analysis",
"description": (
"Analyse a single Project Zomboid server error block and emit "
"structured insight."
),
"parameters": {
"type": "object",
"properties": {
"category": {"type": "string", "enum": CATEGORIES},
"severity": {"type": "string", "enum": ["problem", "warning", "info"]},
"title": {"type": "string", "description": "One-line headline (<=80 chars)"},
"summary": {"type": "string", "description": "1-3 sentences explaining what happened"},
"likely_cause": {"type": "string", "description": "Most plausible cause given the context"},
"suggested_fix": {"type": "string", "description": "Concrete remediation, server-admin actionable"},
"confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
},
"required": [
"category", "severity", "title", "summary",
"likely_cause", "suggested_fix", "confidence",
],
},
},
}
SYSTEM_PROMPT = """You are a Project Zomboid dedicated server administrator
diagnosing a server log. You receive one error/warning event with surrounding
context (entries marked with `>>>` are the hit; the rest are leading or
trailing context). Classify the event using the submit_error_analysis tool
ONLY — never reply in plain text.
Rules:
- `category` must be one of the enum values; choose `unknown` only if no
other fits.
- `severity`: problem = breaks something users notice; warning = degraded
but functional; info = noteworthy but not failing.
- `title`: at most 80 chars, neutral and specific.
- `suggested_fix`: a concrete admin action ("subscribe to mod X", "increase
-Xmx to 8G", "remove the conflicting mod from Mods= line"), not generic
advice.
- `confidence`: 0.0-1.0; lower it when the evidence is ambiguous.
"""
MAX_PROMPT_CHARS = 4000
def parse_file(path: Path) -> list[dict[str, Any]]:
    """Parse a DebugLog-server file into a list of multi-line entries.
    Continuation lines (lines that don't match ENTRY_RE) append to the
    previous entry, mirroring codex's PatternParser behaviour.
    """
    entries: list[dict[str, Any]] = []
    current: dict[str, Any] | None = None
    with path.open("r", encoding="utf-8", errors="replace") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip("\n")
            m = ENTRY_RE.match(line)
            if m:
                if current is not None:
                    entries.append(current)
                current = {
                    "timestamp": m.group(1),
                    "level": m.group(2),
                    "body": [m.group(3)],
                    "line_start": lineno,
                    "line_end": lineno,
                }
            elif current is not None:
                current["body"].append(line)
                current["line_end"] = lineno
            # else: orphan line at start of file (no preceding entry); ignore.
    if current is not None:
        entries.append(current)
    return entries
def signature_for(level: str, body_lines: list[str]) -> str:
    """Stable signature derived from the first body line only.
    Stack-trace continuations are deliberately ignored: the same logical
    exception can produce slightly different traces (e.g. timing-related
    code paths) but should still collapse to one signature. Quoted strings
    (vehicle names, mod IDs, paths) are flattened to <S>; numeric runs of
    length >= 2 are flattened to <N>; session-metadata prefix
    (`General f:0,t:N,st:N,N,N>`) is stripped.
    """
    first = (body_lines[0] if body_lines else "").strip()
    first = SESSION_META_RE.sub("", first)
    first = DOUBLE_QUOTED_RE.sub('"<S>"', first)
    first = SINGLE_QUOTED_RE.sub("'<S>'", first)
    first = NUMERIC_RUN_RE.sub("<N>", first)
    first = WS_RUN_RE.sub(" ", first)
    first = first[:200]
    h = hashlib.sha256(f"{level}\n{first}".encode("utf-8")).hexdigest()
    return f"sha256:{h[:16]}"
def build_excerpt(
entries: list[dict[str, Any]], hit_idx: int, context: int
) -> str:
"""Render an excerpt centered on entries[hit_idx] with ±context entries."""
start = max(0, hit_idx - context)
end = min(len(entries), hit_idx + context + 1)
lines: list[str] = []
for i in range(start, end):
e = entries[i]
is_hit = i == hit_idx
marker = ">>>" if is_hit else " "
prefix = f'{marker} [{e["timestamp"]}] {e["level"]}: '
body = e["body"]
if is_hit:
for j, body_line in enumerate(body):
lines.append(prefix + body_line if j == 0 else " " + body_line)
else:
first = (body[0] if body else "").strip()[:200]
lines.append(prefix + first)
if len(body) > 1:
lines.append(f' ... (+{len(body) - 1} more lines)')
excerpt = "\n".join(lines)
if len(excerpt) > MAX_PROMPT_CHARS:
excerpt = excerpt[:MAX_PROMPT_CHARS] + "\n... [truncated]"
return excerpt
def iter_warn_or_error(entries: list[dict[str, Any]]) -> Iterator[int]:
for i, e in enumerate(entries):
if e["level"] in ("WARN", "ERROR", "FATAL"):
yield i
def collect_signatures(
input_dir: Path, context: int
) -> tuple[dict[str, dict[str, Any]], dict[str, int]]:
"""Walk DebugLog-server files and collect dedup'd signatures."""
signatures: dict[str, dict[str, Any]] = {}
files_scanned = 0
log_lines_total = 0
error_lines_total = 0
for path in sorted(input_dir.glob("*DebugLog-server*.txt")):
files_scanned += 1
entries = parse_file(path)
log_lines_total += sum(len(e["body"]) for e in entries)
for hit_idx in iter_warn_or_error(entries):
hit = entries[hit_idx]
error_lines_total += len(hit["body"])
sig = signature_for(hit["level"], hit["body"])
occurrence = {
"file": path.name,
"line": hit["line_start"],
"timestamp": hit["timestamp"],
}
if sig not in signatures:
signatures[sig] = {
"signature": sig,
"level": hit["level"],
"first_seen": occurrence,
"occurrence_count": 1,
"files": [path.name],
"excerpt": build_excerpt(entries, hit_idx, context),
}
else:
rec = signatures[sig]
rec["occurrence_count"] += 1
if path.name not in rec["files"]:
rec["files"].append(path.name)
return signatures, {
"files_scanned": files_scanned,
"log_lines_total": log_lines_total,
"error_lines_total": error_lines_total,
}
def call_qwen(client: OpenAI, model: str, sig_rec: dict[str, Any]) -> dict[str, Any]:
user_prompt = (
f'Level: {sig_rec["level"]}\n'
f'First seen: {sig_rec["first_seen"]["file"]} '
f'line {sig_rec["first_seen"]["line"]}\n'
f'Occurrences across this run: {sig_rec["occurrence_count"]} '
f'(across {len(sig_rec["files"])} file(s))\n\n'
f'Log excerpt:\n{sig_rec["excerpt"]}'
)
return structured_call(
TOOL_SCHEMA,
[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_prompt},
],
sampling=SAMPLING_STRUCTURED,
client=client,
model=model,
)
def atomic_write(path: Path, payload: Any) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2, ensure_ascii=False)
    tmp.replace(path)
def load_existing(path: Path) -> dict[str, dict[str, Any]]:
    """Reload signatures previously written to --out.
    Only signatures with an `llm` field count as completed. Bare records
    (left behind when --limit truncated a prior run) get re-attempted on
    resume so progressive analysis converges. Records whose `llm` field
    holds only an `error` (a failed Qwen call) also count as completed and
    are not retried.
    """
    if not path.exists():
        return {}
    try:
        with path.open("r", encoding="utf-8") as f:
            data = json.load(f)
        return {
            s["signature"]: s
            for s in data.get("signatures", [])
            if "signature" in s and "llm" in s
        }
    except Exception:
        return {}
def summarise(analyzed: list[dict[str, Any]]) -> dict[str, Any]:
sev_counts = {"problem": 0, "warning": 0, "info": 0}
by_cat: dict[str, int] = {}
for s in analyzed:
llm = s.get("llm") or {}
sev = llm.get("severity")
cat = llm.get("category")
if sev in sev_counts:
sev_counts[sev] += 1
if cat:
by_cat[cat] = by_cat.get(cat, 0) + 1
return {
"problems": sev_counts["problem"],
"warnings": sev_counts["warning"],
"info": sev_counts["info"],
"by_category": by_cat,
}
def main() -> None:
ap = argparse.ArgumentParser(description=__doc__)
ap.add_argument("--input", type=Path, default=DEFAULT_INPUT)
ap.add_argument("--out", type=Path, default=DEFAULT_OUT)
ap.add_argument("--context", type=int, default=20)
ap.add_argument("--limit", type=int, default=None,
help="Stop after N new signatures analysed.")
ap.add_argument("--resume", action="store_true",
help="Reuse existing analysis from --out if present.")
ap.add_argument("--checkpoint-every", type=int, default=25)
args = ap.parse_args()
if not args.input.is_dir():
print(f"error: {args.input} not a directory", file=sys.stderr)
sys.exit(2)
started = dt.datetime.now(dt.timezone.utc).isoformat(timespec="seconds")
print(f"[init] scanning {args.input}")
signatures, file_stats = collect_signatures(args.input, args.context)
print(
f"[init] {file_stats['files_scanned']} file(s), "
f"{file_stats['log_lines_total']} log lines, "
f"{file_stats['error_lines_total']} error lines, "
f"{len(signatures)} unique signature(s)"
)
existing = load_existing(args.out) if args.resume else {}
if existing:
print(f"[init] {len(existing)} signature(s) already analysed; resuming")
client = get_client()
model = get_model()
print(f"[init] qwen model={model}")
n_new = 0
t0 = time.time()
analyzed: list[dict[str, Any]] = []
# Process in occurrence_count desc so --limit N picks the most-impactful
# signatures rather than whichever happened to scan first.
for sig, rec in sorted(
signatures.items(), key=lambda kv: -kv[1]["occurrence_count"]
):
if sig in existing:
analyzed.append(existing[sig])
continue
if args.limit is not None and n_new >= args.limit:
analyzed.append(rec) # keep raw record so it's not lost on resume
continue
try:
llm = call_qwen(client, model, rec)
rec["llm"] = llm
except Exception as e:
rec["llm"] = {"error": str(e)[:500]}
print(f" [{n_new + 1}] LLM error on {sig}: {e}", file=sys.stderr)
analyzed.append(rec)
n_new += 1
if n_new % args.checkpoint_every == 0:
payload = {
"meta": {
"input_dir": str(args.input),
**file_stats,
"unique_signatures": len(signatures),
"redacted": True,
"qwen_model": model,
"started": started,
"checkpoint_at": dt.datetime.now(dt.timezone.utc).isoformat(timespec="seconds"),
},
"signatures": analyzed,
"summary": summarise(analyzed),
}
atomic_write(args.out, payload)
rate = n_new / max(time.time() - t0, 1e-3)
print(f" [{n_new}] checkpoint @ {rate:.2f} sig/s")
finished = dt.datetime.now(dt.timezone.utc).isoformat(timespec="seconds")
payload = {
"meta": {
"input_dir": str(args.input),
**file_stats,
"unique_signatures": len(signatures),
"redacted": True,
"qwen_model": model,
"started": started,
"finished": finished,
},
"signatures": analyzed,
"summary": summarise(analyzed),
}
atomic_write(args.out, payload)
print(f"[done] {n_new} new, {len(analyzed)} total -> {args.out}")
if __name__ == "__main__":
main()
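
The normalisation in `signature_for` is easiest to see on concrete lines. The sketch below re-declares the same regexes and steps in standalone helpers; `normalise` and `signature` are illustrative names, and the two log bodies are invented samples (one with the `t:` field, one without), not lines from a real log. It is meant to show that lines differing only in quoted names, multi-digit numbers, and the session-metadata prefix hash to one signature.

```python
import hashlib
import re

# Same patterns as pz_error_analysis.py.
SESSION_META_RE = re.compile(r"^[A-Za-z]+\s+f:\d+,?\s*(?:t:\d+,?\s*)?st:[\d,]+>\s*")
DOUBLE_QUOTED_RE = re.compile(r'"[^"]*"')
SINGLE_QUOTED_RE = re.compile(r"'[^']*'")
NUMERIC_RUN_RE = re.compile(r"\d{2,}")
WS_RUN_RE = re.compile(r"\s+")


def normalise(first_line: str) -> str:
    """Flatten the volatile parts of a first body line."""
    text = first_line.strip()
    text = SESSION_META_RE.sub("", text)        # drop "General f:.. st:..> "
    text = DOUBLE_QUOTED_RE.sub('"<S>"', text)  # quoted names -> <S>
    text = SINGLE_QUOTED_RE.sub("'<S>'", text)
    text = NUMERIC_RUN_RE.sub("<N>", text)      # runs of 2+ digits -> <N>
    text = WS_RUN_RE.sub(" ", text)
    return text[:200]


def signature(level: str, first_line: str) -> str:
    """Hash the level plus the normalised line, truncated to 16 hex chars."""
    payload = f"{level}\n{normalise(first_line)}".encode("utf-8")
    return f"sha256:{hashlib.sha256(payload).hexdigest()[:16]}"


# Invented sample bodies: same logical event, different volatile details.
with_t = 'General f:0, t:1713225679080, st:1,2,3,4> mod "VehicleA" failed at tick 1024'
without_t = 'General f:12 st:5,6,7,8> mod "VehicleB" failed at tick 2048'

same = signature("ERROR", with_t) == signature("ERROR", without_t)
```

Because the level is part of the hashed payload, the same text logged at `WARN` and at `ERROR` yields two distinct signatures.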

@@ -0,0 +1,777 @@
"""
pz_parser.py — Deterministic Project Zomboid log parser.
Pure module (no I/O beyond reading the path it is handed). Walks a redacted
DebugLog-server*.txt file, extracts errors/warnings, attributes each to a mod
where evidence allows, classifies by kind, and computes deterministic
signatures. Output records are designed to be `dataclasses.asdict()`-ready
for direct JSON serialisation.
Pipeline phases (per design spec at
docs/superpowers/specs/2026-05-04-pz-deterministic-classifier-design.md):
1. Severity-prefix recognition (ERROR|SEVERE|WARN)
2. Bidirectional stack collection (pre-stack walk back, post-stack walk forward)
3. Mod attribution (direct, inferred, unattributed)
4. File:line extraction (five fallbacks)
5. Cause-chain extraction (Caused by: chains + standalone exception lines)
6. Java exception kind detection
7. Engine-noise tagging
8. Signature computation (pattern_id + signature)
9. Aggregation (dedup on signature)
Style notes mirror sibling tool pz_error_analysis.py: type hints with built-in
generics, `from __future__ import annotations`, regex precompilation as
module-level constants, stdlib-only.
"""
from __future__ import annotations
import hashlib
import pathlib
import re
from dataclasses import dataclass
# ---------------------------------------------------------------------------
# Tunable constants
# ---------------------------------------------------------------------------
#: Lookback window (in raw file lines) for inferred mod attribution.
INFERRED_LOOKBACK_LINES: int = 40
#: Maximum frames retained per record after pre+post stack merge.
MAX_STACK_FRAMES: int = 8
#: Maximum lines walked in each direction during bidirectional stack collection.
STACK_WALK_LINES: int = 25
#: Maximum cause-chain depth retained.
MAX_CAUSE_CHAIN_LEVELS: int = 6
#: Truncation length for the normalised first line that feeds pattern_id.
PATTERN_ID_FIRST_LINE_MAX: int = 200
# ---------------------------------------------------------------------------
# Line-shape regexes (parsing)
# ---------------------------------------------------------------------------
#: PZ DebugLog entry header.
#: Example: ``[16-04-26 00:01:19.080] ERROR: General f:0, t:1, st:1,2,3,4> body``
ENTRY_RE = re.compile(
r"^\[(?P<ts>\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
r"(?P<level>[A-Z]+)\s*:\s*(?P<rest>.*)$"
)
#: Strips the "General f:N, t:N, st:N,N,N,N>" prefix from a body line.
SESSION_META_RE = re.compile(
r"^[A-Za-z][A-Za-z0-9]*\s+f:\d+,?\s*(?:t:\d+,?\s*)?st:[\d,]+>\s*"
)
# ---------------------------------------------------------------------------
# Severity-prefix recognition (phase 1)
# ---------------------------------------------------------------------------
#: Severity tokens that flag a body line as an error/warning event when they
#: appear at the start of body text. Per spec: broader than the existing
#: pz_error_analysis.py regex (adds SEVERE for Java util-logging).
SEVERITY_BODY_RE = re.compile(r"^\s*(ERROR|SEVERE|WARN)\s*[:\s]")
#: Bracketed-level tokens that map to severity events.
SEVERITY_LEVELS: tuple[str, ...] = ("ERROR", "WARN", "SEVERE", "FATAL")
# ---------------------------------------------------------------------------
# Stack-frame recognition (phase 2)
# ---------------------------------------------------------------------------
#: Markers that identify a line as stack-shaped. Used to gate pre/post stack
#: collection so we don't latch onto non-stack continuation text.
STACK_HINT_RE = re.compile(
r"(?:\bat\s+\S+|\[string\s+\"|function:\s|file:\s|\.lua\b)",
re.IGNORECASE,
)
# ---------------------------------------------------------------------------
# Mod attribution (phase 3)
# ---------------------------------------------------------------------------
#: Direct attribution marker: ``Lua((MOD:<name>))``.
LUA_MOD_MARKER_RE = re.compile(r"Lua\(\(MOD:([^)]+)\)\)")
#: Direct attribution: ``require("X") failed`` shape.
REQUIRE_FAILED_RE = re.compile(
r"""require\s*\(\s*["']([^"']+)["']\s*\)\s+failed""",
re.IGNORECASE,
)
#: Direct attribution: explicit ``needed by <mod>`` hint.
NEEDED_BY_RE = re.compile(r"needed\s+by\s+([A-Za-z0-9_'\- ]+?)(?:[,.]|$)", re.IGNORECASE)
#: Patterns that flag a body as "Lua-shaped" — gating filter for inferred
#: attribution. Mirrors the spec's enumeration.
LUA_SHAPED_PATTERNS: tuple[re.Pattern[str], ...] = (
re.compile(r"luamanager\.getfunctionobject", re.IGNORECASE),
re.compile(r"no\s+such\s+function", re.IGNORECASE),
re.compile(r"exception\s+thrown", re.IGNORECASE),
re.compile(r"runtimeexception", re.IGNORECASE),
re.compile(r"illegalstateexception", re.IGNORECASE),
re.compile(r"\blua\b", re.IGNORECASE),
)
# ---------------------------------------------------------------------------
# File:line extraction (phase 4) — five fallbacks tried in order
# ---------------------------------------------------------------------------
#: 1. ``at <path>.lua:<n>`` — typical Lua stack frame.
FILE_LINE_AT_RE = re.compile(r"\bat\s+([^\s:]+\.lua):(\d+)")
#: 2. ``function: ... file: <path>.lua line #<n>`` (or `: <n>`).
FILE_LINE_FUNCTION_RE = re.compile(
r"function:\s*[^,]*?file:\s*([^\s,]+\.lua)\s+line\s*(?:#|:)\s*(\d+)",
re.IGNORECASE,
)
#: 3. ``[string "<path>.lua"]:<n>`` — Lua VM source string.
FILE_LINE_STRING_RE = re.compile(r"""\[string\s+["']([^"']+\.lua)["']\]:(\d+)""")
#: 4. quoted path ending in a known extension; line # optional.
FILE_LINE_QUOTED_RE = re.compile(
r"""["']([^"']+\.(?:lua|txt|xml|json|ini|cfg|bin))["'](?::(\d+))?"""
)
#: 5. unquoted path segment beginning with a recognised root.
FILE_LINE_UNQUOTED_RE = re.compile(
r"\b((?:media|maps|lua|scripts)/[\w./\-]+\.(?:lua|txt|xml|json|ini|cfg|bin))(?::(\d+))?"
)
# ---------------------------------------------------------------------------
# Cause-chain extraction (phase 5)
# ---------------------------------------------------------------------------
#: ``Caused by: <ExceptionClass>: <msg>`` (msg optional).
CAUSED_BY_RE = re.compile(
r"Caused\s+by:\s+((?:\w+\.)+\w+(?:Exception|Error))(?::\s*(.+?))?\s*$",
re.IGNORECASE,
)
#: Standalone Java exception line: ``com.foo.BarException: msg``.
EXCEPTION_LINE_RE = re.compile(
r"((?:\w+\.)+\w+(?:Exception|Error))(?::\s*(.+?))?(?=\s+at\s|\s*$)"
)
# ---------------------------------------------------------------------------
# Engine-noise tagging (phase 7)
# ---------------------------------------------------------------------------
ENGINE_NOISE_PATTERNS: tuple[re.Pattern[str], ...] = (
re.compile(r"kahluathread\.flusherrormessage", re.IGNORECASE),
re.compile(r"dumping\s+lua\s+stack\s+trace", re.IGNORECASE),
)
# ---------------------------------------------------------------------------
# Signature normalisation (phase 8)
# ---------------------------------------------------------------------------
DOUBLE_QUOTED_RE = re.compile(r'"[^"]*"')
SINGLE_QUOTED_RE = re.compile(r"'[^']*'")
NUMERIC_RUN_RE = re.compile(r"\d{2,}")
WS_RUN_RE = re.compile(r"\s+")
#: Strips a leading ``ERROR:`` / ``SEVERE:`` / ``WARN:`` / ``FATAL:`` token
#: from a body line so a body that happens to begin with the severity word
#: hashes to the same pattern_id as the bracketed-only variant. Matches the
#: token plus any colon and trailing whitespace; case-insensitive.
SEVERITY_PREFIX_STRIP_RE = re.compile(
r"^\s*(?:ERROR|SEVERE|WARN|FATAL)\s*[:\s]\s*", re.IGNORECASE
)
# ---------------------------------------------------------------------------
# Dataclasses — match the JSON keys the spec mandates so consumers can
# `dataclasses.asdict(record)` straight to JSON.
# ---------------------------------------------------------------------------
@dataclass
class Entry:
"""One parsed log entry. Continuation lines (TAB-indented or otherwise
non-header lines) are folded into ``body``. Phase-2 stack collection
walks neighbouring entries (not raw lines), so no extra context is
stored here.
"""
timestamp: str
level: str
body: list[str]
line_start: int
line_end: int
@dataclass
class FirstSeen:
"""Provenance for the first occurrence of a deduped record."""
file: str
line: int
timestamp: str
@dataclass
class Record:
"""One classified, deduplicated error/warning record. Field names mirror
the JSON output schema in the spec verbatim — this object is intended to
be `dataclasses.asdict()`-ed straight into the output document.
"""
signature: str
pattern_id: str
level: str
kind: str
mod_id: str
mod_name: str
attribution: str
confidence: str
attribution_reason: str
file: str
line: int
cause_chain: str
stack: list[str]
first_seen: FirstSeen
occurrence_count: int
files: list[str]
excerpt: str
# ---------------------------------------------------------------------------
# Phase 0: file parse
# ---------------------------------------------------------------------------
def parse_file(path: pathlib.Path) -> list[Entry]:
    """Parse a DebugLog-server file into a list of multi-line entries.
    Continuation lines (those not matching ENTRY_RE) append to the previous
    entry's body, mirroring codex's PatternParser behaviour for multi-line
    Java stack traces under an ERROR header.
    """
    entries: list[Entry] = []
    current: Entry | None = None
    with path.open("r", encoding="utf-8", errors="replace") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip("\n")
            m = ENTRY_RE.match(line)
            if m:
                if current is not None:
                    entries.append(current)
                current = Entry(
                    timestamp=m.group("ts"),
                    level=m.group("level"),
                    body=[m.group("rest")],
                    line_start=lineno,
                    line_end=lineno,
                )
            elif current is not None:
                current.body.append(line)
                current.line_end = lineno
            # else: orphan line at start of file (no preceding entry); ignore.
    if current is not None:
        entries.append(current)
    return entries
# ---------------------------------------------------------------------------
# Phase 1: severity-prefix recognition
# ---------------------------------------------------------------------------
def is_severity_entry(entry: Entry) -> bool:
    """True if this entry is an ERROR/WARN/SEVERE/FATAL, either by the
    bracketed level or a leading SEVERE/ERROR/WARN token in the body (after
    stripping the session-meta prefix)."""
    if entry.level in SEVERITY_LEVELS:
        return True
    if entry.body and SEVERITY_BODY_RE.match(_strip_session_meta(entry.body[0])):
        return True
    return False
def effective_level(entry: Entry) -> str:
    """Return the effective severity for an entry. Body-prefix takes
    precedence, which covers the SEVERE-in-body case where bracketed level
    is LOG *and* the case where bracketed level is ERROR but body says SEVERE.
    """
    if entry.body:
        m = SEVERITY_BODY_RE.match(_strip_session_meta(entry.body[0]))
        if m:
            return m.group(1).upper()
    return entry.level
# ---------------------------------------------------------------------------
# Phase 2: bidirectional stack collection
# ---------------------------------------------------------------------------
def _is_stack_shaped(line: str) -> bool:
return bool(STACK_HINT_RE.search(line))
def _strip_session_meta(body_line: str) -> str:
"""Strip the ``General f:N, t:N, st:...> `` session-metadata prefix from
a body's first line so pattern matching can run against the meaningful tail.
"""
return SESSION_META_RE.sub("", body_line)
def _collect_pre_stack(entries: list[Entry], hit_idx: int) -> list[str]:
"""Walk back through prior entries; collect stack-shaped lines from each
entry's body. Stop at the previous severity-flagged entry. Cap collection
at MAX_STACK_FRAMES and at STACK_WALK_LINES of body lines examined.
Per spec, only return the block if at least one line looks stack-shaped.
"""
collected: list[str] = []
lines_examined = 0
for j in range(hit_idx - 1, -1, -1):
prior = entries[j]
# Stop at another severity line (the previous error's boundary).
if is_severity_entry(prior):
break
# Walk this entry's body in reverse; for body[0] the session-meta
# prefix is part of the line — strip it before stack-shape check.
for k in range(len(prior.body) - 1, -1, -1):
line = prior.body[k]
stripped = _strip_session_meta(line) if k == 0 else line
lines_examined += 1
if _is_stack_shaped(stripped):
collected.append(stripped.strip())
if len(collected) >= MAX_STACK_FRAMES:
break
if lines_examined >= STACK_WALK_LINES:
break
if len(collected) >= MAX_STACK_FRAMES or lines_examined >= STACK_WALK_LINES:
break
if not collected:
return []
collected.reverse() # restore source order
return collected
def _collect_post_stack(entries: list[Entry], hit_idx: int) -> list[str]:
"""Look at the entry's own body continuation lines first (stack frames
attached to the ERROR header become continuation lines after parsing),
then walk forward through subsequent entries. Stop at the next severity
entry. Cap at MAX_STACK_FRAMES and at STACK_WALK_LINES of body lines."""
entry = entries[hit_idx]
collected: list[str] = []
lines_examined = 0
# Body continuations (skip body[0] which is the headline itself).
for line in entry.body[1:]:
lines_examined += 1
if _is_stack_shaped(line):
collected.append(line.strip())
if len(collected) >= MAX_STACK_FRAMES:
return collected
if lines_examined >= STACK_WALK_LINES:
return collected
for j in range(hit_idx + 1, len(entries)):
next_entry = entries[j]
if is_severity_entry(next_entry):
break
for k, line in enumerate(next_entry.body):
stripped = _strip_session_meta(line) if k == 0 else line
lines_examined += 1
if _is_stack_shaped(stripped):
collected.append(stripped.strip())
if len(collected) >= MAX_STACK_FRAMES:
return collected
if lines_examined >= STACK_WALK_LINES:
return collected
return collected
def collect_stack(entries: list[Entry], hit_idx: int) -> list[str]:
"""Merge pre + post stack, dedup preserving order, cap at MAX_STACK_FRAMES."""
pre = _collect_pre_stack(entries, hit_idx)
post = _collect_post_stack(entries, hit_idx)
seen: set[str] = set()
merged: list[str] = []
for frame in pre + post:
if frame in seen:
continue
seen.add(frame)
merged.append(frame)
if len(merged) >= MAX_STACK_FRAMES:
break
return merged
# ---------------------------------------------------------------------------
# Phase 3: mod attribution
# ---------------------------------------------------------------------------
def _norm_mod_key(raw_name: str) -> str:
"""Lowercase, strip spaces / apostrophes / hyphens. Used as mod_id."""
s = raw_name.lower()
for ch in (" ", "'", "-"):
s = s.replace(ch, "")
return s
def _entry_text(entry: Entry) -> str:
"""Whole-entry text (all body lines joined with newlines) for marker scanning."""
return "\n".join(entry.body)
def attribute_entry(entry: Entry, prior_lookback_lines: list[str]) -> tuple[str, str, str, str, str]:
"""Determine ``(mod_id, mod_name, attribution, confidence, reason)``.
``prior_lookback_lines`` is the body lines from prior entries that fall
within INFERRED_LOOKBACK_LINES raw-file-line distance from this entry's
start, in source order. The list is scanned in reverse for the nearest
``Lua((MOD:Y))`` marker when inferred attribution is being attempted.
Direct-attribution priority: Lua marker -> needed-by -> require-failed.
Rationale: ``needed by <mod>`` names the dependent mod (more semantically
targeted) and is preferred over ``require("...") failed`` which only names
the missing module path. ``Lua((MOD:...))`` is unambiguous and wins
outright.
"""
text = _entry_text(entry)
# 1. Direct via Lua((MOD:X)) — unambiguous; outranks every other signal.
m = LUA_MOD_MARKER_RE.search(text)
if m:
raw = m.group(1).strip()
return (
_norm_mod_key(raw),
raw,
"direct",
"high",
"Lua((MOD:...)) marker on the entry itself",
)
# 2. Direct via "needed by <mod>"
m = NEEDED_BY_RE.search(text)
if m:
raw = m.group(1).strip().rstrip(".,;")
return (
_norm_mod_key(raw),
raw,
"direct",
"high",
"needed by <mod> hint",
)
# 3. Direct via require("X") failed — attribute to required module name.
m = REQUIRE_FAILED_RE.search(text)
if m:
raw = m.group(1).strip()
# Mod-name first segment (PZ paths often look like Mod/Foo/Bar).
mod_name = raw.split("/")[0] if "/" in raw else raw
return (
_norm_mod_key(mod_name),
mod_name,
"direct",
"high",
'require("...") failed shape',
)
# 4. Inferred — Lua-shaped body + recent Lua((MOD:Y)) within lookback.
if any(p.search(text) for p in LUA_SHAPED_PATTERNS):
for line in reversed(prior_lookback_lines):
mm = LUA_MOD_MARKER_RE.search(line)
if mm:
raw = mm.group(1).strip()
return (
_norm_mod_key(raw),
raw,
"inferred",
"medium",
f"Lua-shaped body; nearest Lua((MOD:{raw})) within "
f"{INFERRED_LOOKBACK_LINES}-line lookback",
)
return (
"__unattributed__",
"",
"unattributed",
"low",
"no marker; body not Lua-shaped or no recent Lua((MOD:...))",
)
# ---------------------------------------------------------------------------
# Phase 4: file:line extraction (five fallbacks, in order)
# ---------------------------------------------------------------------------
def extract_file_line(text: str) -> tuple[str, int]:
"""Run the five fallbacks in order. Returns ``(file, line)`` with line=0
when only a path was matched."""
m = FILE_LINE_AT_RE.search(text)
if m:
return m.group(1), int(m.group(2))
m = FILE_LINE_FUNCTION_RE.search(text)
if m:
return m.group(1), int(m.group(2))
m = FILE_LINE_STRING_RE.search(text)
if m:
return m.group(1), int(m.group(2))
m = FILE_LINE_QUOTED_RE.search(text)
if m:
return m.group(1), int(m.group(2)) if m.group(2) else 0
m = FILE_LINE_UNQUOTED_RE.search(text)
if m:
return m.group(1), int(m.group(2)) if m.group(2) else 0
return "", 0
# ---------------------------------------------------------------------------
# Phase 5: cause-chain extraction
# ---------------------------------------------------------------------------
def extract_cause_chain(text: str) -> str:
"""Return ``ExceptionA: msg -> ExceptionB: msg`` joined chain, deduped,
capped at MAX_CAUSE_CHAIN_LEVELS levels.
"""
tokens: list[str] = []
seen: set[str] = set()
for line in text.splitlines():
cb = CAUSED_BY_RE.search(line)
if cb:
cls = cb.group(1)
msg = cb.group(2) or ""
tok = f"{cls}: {msg.strip()}".rstrip(": ").strip()
if tok not in seen:
seen.add(tok)
tokens.append(tok)
continue
ex = EXCEPTION_LINE_RE.search(line)
if ex:
cls = ex.group(1)
msg = ex.group(2) or ""
tok = f"{cls}: {msg.strip()}".rstrip(": ").strip()
if tok not in seen:
seen.add(tok)
tokens.append(tok)
if len(tokens) >= MAX_CAUSE_CHAIN_LEVELS:
break
return " -> ".join(tokens[:MAX_CAUSE_CHAIN_LEVELS])
# ---------------------------------------------------------------------------
# Phase 6: Java exception kind detection
# ---------------------------------------------------------------------------
JAVA_EXCEPTION_RE = re.compile(r"(?:\w+\.)+\w+(?:Exception|Error)\b")
def detect_kind(entry: Entry, attribution: str, body_text: str) -> str:
"""Determine the ``kind`` field. Order: engine_noise > require_failed >
java_exception > lua_runtime > runtime."""
# Phase 7 short-circuit (engine noise outranks others per spec — engine
# noise is PZ's own diagnostic chatter regardless of class).
if any(p.search(body_text) for p in ENGINE_NOISE_PATTERNS):
return "engine_noise"
if REQUIRE_FAILED_RE.search(body_text):
return "require_failed"
has_java = bool(JAVA_EXCEPTION_RE.search(body_text))
has_lua_marker = bool(LUA_MOD_MARKER_RE.search(body_text))
if has_java and not has_lua_marker:
return "java_exception"
# Lua-attributed runtime / inferred
if has_lua_marker or attribution in ("direct", "inferred"):
return "lua_runtime"
return "runtime"
# ---------------------------------------------------------------------------
# Phase 8: signature computation
# ---------------------------------------------------------------------------
def normalize_first_line(first: str) -> str:
"""Per spec: strip session metadata prefix, strip any leading severity
word (so ``SEVERE: foo`` and ``foo`` produce the same pattern_id when both
are SEVERE-level), flatten quoted strings to ``"<S>"`` / ``'<S>'``, flatten
≥2-digit numeric runs to ``<N>``, collapse whitespace, truncate to 200
chars.
"""
s = first.strip()
s = SESSION_META_RE.sub("", s)
# Strip any leading ERROR:/SEVERE:/WARN:/FATAL: that survived in the body
# — the bracketed level already feeds pattern_id separately, so leaving
# the body-prefix in place would fragment signatures across "body has
# SEVERE: prefix" vs "body has no prefix but bracketed level is SEVERE."
s = SEVERITY_PREFIX_STRIP_RE.sub("", s)
s = DOUBLE_QUOTED_RE.sub('"<S>"', s)
s = SINGLE_QUOTED_RE.sub("'<S>'", s)
s = NUMERIC_RUN_RE.sub("<N>", s)
s = WS_RUN_RE.sub(" ", s)
return s[:PATTERN_ID_FIRST_LINE_MAX]
def compute_pattern_id(level: str, first_line: str) -> str:
"""``sha256`` over ``level`` and the normalized first line (newline-joined), truncated to 16 hex chars and prefixed ``sha256:``.
16 hex chars (64 bits) chosen for JSON readability vs collision-resistance
trade-off; consumers treat as opaque.
"""
norm = normalize_first_line(first_line)
h = hashlib.sha256(f"{level}\n{norm}".encode("utf-8")).hexdigest()
return f"sha256:{h[:16]}"
def compute_signature(pattern_id: str, mod_id: str) -> str:
"""``sha256`` over ``pattern_id`` and ``mod_id`` (newline-joined), truncated to 16 hex chars and prefixed ``sha256:``.
16 hex chars (64 bits) chosen for JSON readability vs collision-resistance
trade-off; consumers treat as opaque.
"""
h = hashlib.sha256(f"{pattern_id}\n{mod_id}".encode("utf-8")).hexdigest()
return f"sha256:{h[:16]}"
# ---------------------------------------------------------------------------
# Aggregation (phase 9) and the public classify_entries entry point
# ---------------------------------------------------------------------------
_CONFIDENCE_RANK: dict[str, int] = {"low": 0, "medium": 1, "high": 2}
_ATTRIBUTION_RANK: dict[str, int] = {
"unattributed": 0,
"inferred": 1,
"direct": 2,
}
def _build_excerpt(entry: Entry, max_chars: int = 1000) -> str:
"""Best-effort one-block excerpt of the entry (header + continuations)."""
lines: list[str] = []
header = f'[{entry.timestamp}] {entry.level}: '
if entry.body:
lines.append(header + entry.body[0])
for cont in entry.body[1:]:
lines.append(cont)
text = "\n".join(lines)
if len(text) > max_chars:
text = text[:max_chars] + "\n... [truncated]"
return text
def _build_lookback_window(entries: list[Entry], hit_idx: int) -> list[str]:
"""Collect body lines from prior entries whose ``line_start`` falls within
INFERRED_LOOKBACK_LINES raw-file-line distance from the current entry.
Spec wording is "within the previous 40 lines", measured in raw file lines
(mirrors pzmm's ``(i - last_mod_line) <= 40``, inclusive of 40). Counting
raw lines means a multi-line entry (e.g., a 5-line Java stack trace) does
not shrink the practical window the way a body-line budget would.
Returned list is in source order (oldest first) so callers can call
``reversed()`` on it.
"""
if hit_idx <= 0:
return []
threshold = entries[hit_idx].line_start - INFERRED_LOOKBACK_LINES
in_window: list[Entry] = []
for j in range(hit_idx - 1, -1, -1):
prior = entries[j]
if prior.line_start < threshold:
break
in_window.append(prior)
# We accumulated newest-first; reverse so we emit in source order.
in_window.reverse()
collected: list[str] = []
for prior in in_window:
collected.extend(prior.body)
return collected
def classify_entries(entries: list[Entry], source_file: str = "") -> list[Record]:
"""Apply phases 1-9 to a parsed-file entry list. Returns one Record per
unique (mod_id, error_shape) pair after dedup on signature.
"""
by_signature: dict[str, Record] = {}
for hit_idx, entry in enumerate(entries):
if not is_severity_entry(entry):
continue
level = effective_level(entry)
body_text = _entry_text(entry)
# Phase 2: stack collection
stack = collect_stack(entries, hit_idx)
# Phase 3: attribution (with INFERRED_LOOKBACK_LINES lookback)
prior_window = _build_lookback_window(entries, hit_idx)
mod_id, mod_name, attribution, confidence, attribution_reason = attribute_entry(
entry, prior_window
)
# Phase 4: file:line extraction (search body + stack frames)
search_text = body_text + "\n" + "\n".join(stack)
file_path, line_no = extract_file_line(search_text)
# Phase 5: cause-chain extraction
cause_chain = extract_cause_chain(search_text)
# Phase 6 & 7: kind detection (engine_noise short-circuits)
kind = detect_kind(entry, attribution, body_text)
# Phase 8: signature computation
pattern_id = compute_pattern_id(level, entry.body[0] if entry.body else "")
signature = compute_signature(pattern_id, mod_id)
# Phase 9: dedup & aggregate
if signature not in by_signature:
by_signature[signature] = Record(
signature=signature,
pattern_id=pattern_id,
level=level,
kind=kind,
mod_id=mod_id,
mod_name=mod_name,
attribution=attribution,
confidence=confidence,
attribution_reason=attribution_reason,
file=file_path,
line=line_no,
cause_chain=cause_chain,
stack=list(stack),
first_seen=FirstSeen(
file=source_file,
line=entry.line_start,
timestamp=entry.timestamp,
),
occurrence_count=1,
files=[source_file] if source_file else [],
excerpt=_build_excerpt(entry),
)
else:
rec = by_signature[signature]
rec.occurrence_count += 1
if source_file and source_file not in rec.files:
rec.files.append(source_file)
# Promote attribution / confidence if this hit is stronger.
if _ATTRIBUTION_RANK[attribution] > _ATTRIBUTION_RANK[rec.attribution]:
rec.attribution = attribution
rec.attribution_reason = attribution_reason
if mod_name:
rec.mod_name = mod_name
if _CONFIDENCE_RANK[confidence] > _CONFIDENCE_RANK[rec.confidence]:
rec.confidence = confidence
# Merge stack frames (preserving order, capped).
for frame in stack:
if frame not in rec.stack and len(rec.stack) < MAX_STACK_FRAMES:
rec.stack.append(frame)
# Extend cause chain if the new hit has additional segments.
if cause_chain and cause_chain != rec.cause_chain:
# Concatenate unseen tokens.
old = rec.cause_chain.split(" -> ") if rec.cause_chain else []
new = cause_chain.split(" -> ")
merged = list(old)
for tok in new:
if tok and tok not in merged:
merged.append(tok)
rec.cause_chain = " -> ".join(merged[:MAX_CAUSE_CHAIN_LEVELS])
return list(by_signature.values())
__all__ = [
"Entry",
"FirstSeen",
"Record",
"parse_file",
"classify_entries",
"is_severity_entry",
"effective_level",
"collect_stack",
"attribute_entry",
"extract_file_line",
"extract_cause_chain",
"detect_kind",
"normalize_first_line",
"compute_pattern_id",
"compute_signature",
"INFERRED_LOOKBACK_LINES",
"MAX_STACK_FRAMES",
"STACK_WALK_LINES",
"MAX_CAUSE_CHAIN_LEVELS",
"SEVERITY_LEVELS",
]
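
A minimal sketch of the two-stage hashing in phase 8, for illustration only. The regexes are simplified stand-ins (assumptions), because the module's own `SESSION_META_RE`, `NUMERIC_RUN_RE` and related patterns are defined outside this excerpt, and the `sketch_*` names are hypothetical. Only the hashing layout (level, newline, normalized line; then pattern_id, newline, mod_id) follows the code above.

```python
import hashlib
import re

# Simplified stand-ins for the module's normalisation regexes (assumptions).
DOUBLE_QUOTED = re.compile(r'"[^"]*"')
NUMERIC_RUN = re.compile(r"\d{2,}")
WS_RUN = re.compile(r"\s+")


def sketch_normalize(first_line: str) -> str:
    """Flatten quoted strings and >=2-digit runs, then collapse whitespace."""
    s = first_line.strip()
    s = DOUBLE_QUOTED.sub('"<S>"', s)
    s = NUMERIC_RUN.sub("<N>", s)
    s = WS_RUN.sub(" ", s)
    return s[:200]


def sketch_pattern_id(level: str, first_line: str) -> str:
    """Hash the level and the normalized line, joined by a newline."""
    payload = f"{level}\n{sketch_normalize(first_line)}".encode("utf-8")
    return f"sha256:{hashlib.sha256(payload).hexdigest()[:16]}"


def sketch_signature(pattern_id: str, mod_id: str) -> str:
    """Hash the pattern id and the mod id, joined by a newline."""
    payload = f"{pattern_id}\n{mod_id}".encode("utf-8")
    return f"sha256:{hashlib.sha256(payload).hexdigest()[:16]}"


# Two messages that differ only in quoted text and numbers share a pattern id.
a = sketch_pattern_id("ERROR", 'failed to load "Foo.lua" after 120 ms')
b = sketch_pattern_id("ERROR", 'failed to load "Bar.lua" after 3456 ms')
```

Because `mod_id` is hashed in the second stage, the same error shape raised by two different mods yields one `pattern_id` but two distinct signatures, which is what the dedup in `classify_entries` keys on.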

@@ -0,0 +1,36 @@
#!/usr/bin/env bash
# One-shot PII redaction over the PZ DebugLog-server files extracted from
# /opt/ik-codex/Logs.zip. Produces /opt/ik-codex/.scratch/pz/Logs.redacted/
# (gitignored alongside the source). Single Docker invocation; the codex
# library's vendor/autoload.php is mounted read-write only because composer's
# image refuses world-readable mounts under -u UID:GID.
#
# Re-runnable: rewrites every output file. For a clean refresh, rm -rf the OUT
# directory first.
set -euo pipefail
IN=/opt/ik-codex/.scratch/pz/Logs
OUT=/opt/ik-codex/.scratch/pz/Logs.redacted
if [ ! -d "$IN" ]; then
echo "error: input directory $IN missing — extract Logs.zip first" >&2
exit 1
fi
mkdir -p "$OUT"
docker run --rm \
--entrypoint php \
-v /opt/ik-codex:/app -w /app \
-v "$IN":/in:ro -v "$OUT":/out \
-u "$(id -u):$(id -g)" \
composer:latest \
-r '
require "vendor/autoload.php";
$r = new IndifferentKetchup\Codex\Util\ProjectZomboid\ProjectZomboidRedactor();
// glob() can return false on error; fall back to an empty list.
$files = glob("/in/*DebugLog-server*.txt") ?: [];
foreach ($files as $f) {
file_put_contents("/out/" . basename($f), $r->redact(file_get_contents($f)));
}
fprintf(STDERR, "redacted %d file(s)\n", count($files));
'

@@ -0,0 +1,7 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:04:00.000] ERROR: General f:0, t:1776297840000, st:48,648,355,178> Lua((MOD:Test Mod Alpha)) wrapper failure
java.lang.RuntimeException: outer wrapper at zombie.Foo(Foo.java:10)
Caused by: java.lang.IllegalStateException: middle layer
Caused by: java.lang.NullPointerException: deepest cause
at zombie.Bar(Bar.java:99)
[16-04-26 00:04:01.000] LOG : General f:0, t:1776297841000, st:48,648,356,178> after.
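
The fixture above has seven raw lines but parses to three entries, because lines that do not open a new bracketed header are folded into the preceding entry. The sketch below reproduces that grouping under stated assumptions: `HEADER_RE` is an approximation of the module's `ENTRY_RE` (not shown in this excerpt), and `group_lines` is a hypothetical helper using plain dicts rather than the real `Entry` class.

```python
import re

# Assumed approximation of ENTRY_RE: "[timestamp] LEVEL: rest".
HEADER_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s*:\s(?P<rest>.*)$")


def group_lines(lines: list[str]) -> list[dict]:
    """Fold continuation lines into the entry opened by the last header line."""
    entries: list[dict] = []
    for lineno, line in enumerate(lines, start=1):
        m = HEADER_RE.match(line)
        if m:
            entries.append({
                "level": m.group("level"),
                "body": [m.group("rest")],
                "line_start": lineno,
                "line_end": lineno,
            })
        elif entries:
            entries[-1]["body"].append(line)
            entries[-1]["line_end"] = lineno
        # Orphan lines before the first header are ignored.
    return entries


sample = [
    "[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.",
    "[16-04-26 00:04:00.000] ERROR: General f:0, t:1776297840000, st:48,648,355,178> Lua((MOD:Test Mod Alpha)) wrapper failure",
    "java.lang.RuntimeException: outer wrapper at zombie.Foo(Foo.java:10)",
    "Caused by: java.lang.IllegalStateException: middle layer",
    "Caused by: java.lang.NullPointerException: deepest cause",
    "at zombie.Bar(Bar.java:99)",
    "[16-04-26 00:04:01.000] LOG : General f:0, t:1776297841000, st:48,648,356,178> after.",
]
grouped = group_lines(sample)
```

The ERROR entry ends up spanning raw lines 2 to 6 with a five-line body, so the `Caused by:` lines are available to cause-chain extraction without any forward walk.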

@@ -0,0 +1,8 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod Alpha)) crash 1
at media/lua/client/A.lua:11
[16-04-26 00:01:01.000] ERROR: General f:0, t:1776297661000, st:48,648,176,178> Lua((MOD:Test Mod Alpha)) crash 1
at media/lua/client/A.lua:11
[16-04-26 00:01:02.000] ERROR: General f:0, t:1776297662000, st:48,648,177,178> Lua((MOD:Test Mod Alpha)) crash 1
at media/lua/client/A.lua:11
[16-04-26 00:01:03.000] LOG : General f:0, t:1776297663000, st:48,648,178,178> ok.

@@ -0,0 +1,4 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:03:00.000] ERROR: General f:0, t:1776297780000, st:48,648,295,178> KahluaThread.flusherrormessage> dumping lua stack trace
at media/lua/client/Foo.lua:1
[16-04-26 00:03:01.000] LOG : General f:0, t:1776297781000, st:48,648,296,178> after.

@@ -0,0 +1,10 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod A)) format1
at media/lua/client/F1.lua:11
[16-04-26 00:01:01.000] ERROR: General f:0, t:1776297661000, st:48,648,176,178> Lua((MOD:Test Mod B)) format2
function: doStuff -- file: media/lua/client/F2.lua line # 22
[16-04-26 00:01:02.000] ERROR: General f:0, t:1776297662000, st:48,648,177,178> Lua((MOD:Test Mod C)) format3
[string "media/lua/client/F3.lua"]:33: bang
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> Lua((MOD:Test Mod D)) format4 about "media/lua/client/F4.lua" failure
[16-04-26 00:01:04.000] ERROR: General f:0, t:1776297664000, st:48,648,179,178> Lua((MOD:Test Mod E)) format5 path media/lua/client/F5.lua mention
[16-04-26 00:01:05.000] LOG : General f:0, t:1776297665000, st:48,648,180,178> ok.

@@ -0,0 +1,7 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Spongies Clothing)) initialised.
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> ordinary log line.
[16-04-26 00:01:02.000] LOG : General f:0, t:1776297662000, st:48,648,177,178> another log line.
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> LuaManager.GetFunctionObject> no such function: doStuff
at media/lua/client/Spongie.lua:7
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.

@@ -0,0 +1,8 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:19.080] ERROR: General f:0, t:1776297679080, st:48,648,194,258> DebugFileWatcher.registerDir> Exception thrown
java.nio.file.NoSuchFileException: /placeholder/config/mods at UnixException.translateToIOException(null:-1).
Stack trace:
at java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
at java.base/sun.nio.fs.UnixException.asIOException(Unknown Source)
at java.base/sun.nio.fs.LinuxWatchService$Poller.implRegister(Unknown Source)
[16-04-26 00:01:19.090] LOG : General f:0, t:1776297679090, st:48,648,194,268> after.

@@ -0,0 +1,45 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod Distant)) initialised.
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> filler 1.
[16-04-26 00:01:02.000] LOG : General f:0, t:1776297662000, st:48,648,177,178> filler 2.
[16-04-26 00:01:03.000] LOG : General f:0, t:1776297663000, st:48,648,178,178> filler 3.
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> filler 4.
[16-04-26 00:01:05.000] LOG : General f:0, t:1776297665000, st:48,648,180,178> filler 5.
[16-04-26 00:01:06.000] LOG : General f:0, t:1776297666000, st:48,648,181,178> filler 6.
[16-04-26 00:01:07.000] LOG : General f:0, t:1776297667000, st:48,648,182,178> filler 7.
[16-04-26 00:01:08.000] LOG : General f:0, t:1776297668000, st:48,648,183,178> filler 8.
[16-04-26 00:01:09.000] LOG : General f:0, t:1776297669000, st:48,648,184,178> filler 9.
[16-04-26 00:01:10.000] LOG : General f:0, t:1776297670000, st:48,648,185,178> filler 10.
[16-04-26 00:01:11.000] LOG : General f:0, t:1776297671000, st:48,648,186,178> filler 11.
[16-04-26 00:01:12.000] LOG : General f:0, t:1776297672000, st:48,648,187,178> filler 12.
[16-04-26 00:01:13.000] LOG : General f:0, t:1776297673000, st:48,648,188,178> filler 13.
[16-04-26 00:01:14.000] LOG : General f:0, t:1776297674000, st:48,648,189,178> filler 14.
[16-04-26 00:01:15.000] LOG : General f:0, t:1776297675000, st:48,648,190,178> filler 15.
[16-04-26 00:01:16.000] LOG : General f:0, t:1776297676000, st:48,648,191,178> filler 16.
[16-04-26 00:01:17.000] LOG : General f:0, t:1776297677000, st:48,648,192,178> filler 17.
[16-04-26 00:01:18.000] LOG : General f:0, t:1776297678000, st:48,648,193,178> filler 18.
[16-04-26 00:01:19.000] LOG : General f:0, t:1776297679000, st:48,648,194,178> filler 19.
[16-04-26 00:01:20.000] LOG : General f:0, t:1776297680000, st:48,648,195,178> filler 20.
[16-04-26 00:01:21.000] LOG : General f:0, t:1776297681000, st:48,648,196,178> filler 21.
[16-04-26 00:01:22.000] LOG : General f:0, t:1776297682000, st:48,648,197,178> filler 22.
[16-04-26 00:01:23.000] LOG : General f:0, t:1776297683000, st:48,648,198,178> filler 23.
[16-04-26 00:01:24.000] LOG : General f:0, t:1776297684000, st:48,648,199,178> filler 24.
[16-04-26 00:01:25.000] LOG : General f:0, t:1776297685000, st:48,648,200,178> filler 25.
[16-04-26 00:01:26.000] LOG : General f:0, t:1776297686000, st:48,648,201,178> filler 26.
[16-04-26 00:01:27.000] LOG : General f:0, t:1776297687000, st:48,648,202,178> filler 27.
[16-04-26 00:01:28.000] LOG : General f:0, t:1776297688000, st:48,648,203,178> filler 28.
[16-04-26 00:01:29.000] LOG : General f:0, t:1776297689000, st:48,648,204,178> filler 29.
[16-04-26 00:01:30.000] LOG : General f:0, t:1776297690000, st:48,648,205,178> filler 30.
[16-04-26 00:01:31.000] LOG : General f:0, t:1776297691000, st:48,648,206,178> filler 31.
[16-04-26 00:01:32.000] LOG : General f:0, t:1776297692000, st:48,648,207,178> filler 32.
[16-04-26 00:01:33.000] LOG : General f:0, t:1776297693000, st:48,648,208,178> filler 33.
[16-04-26 00:01:34.000] LOG : General f:0, t:1776297694000, st:48,648,209,178> filler 34.
[16-04-26 00:01:35.000] LOG : General f:0, t:1776297695000, st:48,648,210,178> filler 35.
[16-04-26 00:01:36.000] LOG : General f:0, t:1776297696000, st:48,648,211,178> filler 36.
[16-04-26 00:01:37.000] LOG : General f:0, t:1776297697000, st:48,648,212,178> filler 37.
[16-04-26 00:01:38.000] LOG : General f:0, t:1776297698000, st:48,648,213,178> filler 38.
[16-04-26 00:01:39.000] LOG : General f:0, t:1776297699000, st:48,648,214,178> filler 39.
[16-04-26 00:01:40.000] LOG : General f:0, t:1776297700000, st:48,648,215,178> filler 40.
[16-04-26 00:01:41.000] LOG : General f:0, t:1776297701000, st:48,648,216,178> filler 41.
[16-04-26 00:01:42.000] ERROR: General f:0, t:1776297702000, st:48,648,217,178> LuaManager.GetFunctionObject> no such function (way past lookback)
[16-04-26 00:01:43.000] LOG : General f:0, t:1776297703000, st:48,648,218,178> ok.
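
In this 45-line fixture the marker sits on raw line 2 and the ERROR on raw line 44, a distance of 42, which is past the lookback. The sketch below restates the window test from `_build_lookback_window` on bare line numbers. The value 40 for `INFERRED_LOOKBACK_LINES` is assumed from the "previous 40 lines" wording in the docstring, and `in_lookback` is a hypothetical helper.

```python
LOOKBACK = 40  # assumed value of INFERRED_LOOKBACK_LINES


def in_lookback(prior_line_start: int, hit_line_start: int, lookback: int = LOOKBACK) -> bool:
    """True when the prior entry starts within `lookback` raw lines of the hit (inclusive)."""
    threshold = hit_line_start - lookback
    return prior_line_start >= threshold


marker_line, error_line = 2, 44
distant = in_lookback(marker_line, error_line)  # distance 42: outside the window
edge = in_lookback(4, error_line)               # distance exactly 40: still inside
```

The comparison is inclusive, so an entry exactly 40 raw lines back still counts, while the fixture's marker at distance 42 does not.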

@@ -0,0 +1,6 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:19.131] LOG : Mod f:0, t:1776297679131, st:48,648,194,309> loading example_mod_alpha.
[16-04-26 00:05:00.000] ERROR: General f:0, t:1776297900000, st:48,648,415,178> Lua((MOD:Test Mod Alpha)) something broke
at media/lua/client/Foo.lua:42
function: doStuff -- file: media/lua/client/Foo.lua line # 42
[16-04-26 00:05:01.000] LOG : General f:0, t:1776297901000, st:48,648,416,178> after the error.

@@ -0,0 +1,3 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> ordinary line.
[16-04-26 00:02:00.000] LOG : General f:0, t:1776297720000, st:48,648,235,178> nothing wrong.

@@ -0,0 +1,5 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Spongies Clothing)) initialised.
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> ordinary log line.
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> Disk full while writing chunk data
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.

@@ -0,0 +1,6 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> Lua((MOD:Test Mod Alpha)) crash now
at media/lua/client/X.lua:11
at media/lua/client/Y.lua:22
[string "media/lua/client/Z.lua"]:33: oops
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.

@@ -0,0 +1,6 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] LOG : General f:0, t:1776297660000, st:48,648,175,178> at media/lua/client/A.lua:11
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> at media/lua/client/B.lua:22
[16-04-26 00:01:02.000] LOG : General f:0, t:1776297662000, st:48,648,177,178> [string "media/lua/client/C.lua"]:33: oops
[16-04-26 00:01:03.000] ERROR: General f:0, t:1776297663000, st:48,648,178,178> Lua((MOD:Test Mod Alpha)) crash
[16-04-26 00:01:04.000] LOG : General f:0, t:1776297664000, st:48,648,179,178> ok.

@@ -0,0 +1,3 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> require("DependencyMod/Foo") failed: needed by Test Mod Alpha
[16-04-26 00:01:01.000] LOG : General f:0, t:1776297661000, st:48,648,176,178> ok.

@@ -0,0 +1,5 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:01:00.000] ERROR: General f:0, t:1776297660000, st:48,648,175,178> ERROR: top-level error message
[16-04-26 00:01:01.000] WARN : General f:0, t:1776297661000, st:48,648,176,178> WARN: top-level warn message
[16-04-26 00:01:02.000] ERROR: General f:0, t:1776297662000, st:48,648,177,178> SEVERE: java-style severe message at zombie.Foo(Foo.java:5)
[16-04-26 00:01:03.000] LOG : General f:0, t:1776297663000, st:48,648,178,178> ok.

@@ -0,0 +1,3 @@
[16-04-26 00:00:42.314] LOG : General f:0, t:1776297642254, st:48,648,157,434> server starting.
[16-04-26 00:02:00.000] WARN : General f:0, t:1776297720000, st:48,648,235,178> ZomboidFileSystem.loadModAndRequired> required mod "absent_mod" not found.
[16-04-26 00:02:01.000] LOG : General f:0, t:1776297721000, st:48,648,236,178> after.

@@ -0,0 +1,225 @@
"""Tests for pz_parser phase 3 — mod attribution."""
from __future__ import annotations
import pathlib
import sys
import unittest
sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[1]))
import pz_parser # noqa: E402
FIXTURE_DIR = pathlib.Path(__file__).resolve().parent / "fixtures"
def fixture(name: str) -> pathlib.Path:
return FIXTURE_DIR / name
class AttributionBucketTests(unittest.TestCase):
"""Three confidence buckets: direct (high), inferred (medium),
unattributed (low)."""
def test_direct_attribution_when_lua_marker_on_entry(self) -> None:
entries = pz_parser.parse_file(fixture("fixture_lua_attributed.txt"))
records = pz_parser.classify_entries(entries, source_file="la.txt")
self.assertEqual(len(records), 1)
rec = records[0]
self.assertEqual(rec.attribution, "direct")
self.assertEqual(rec.confidence, "high")
# mod_id is normalised: lowercase, no spaces / apostrophes / hyphens.
self.assertEqual(rec.mod_id, "testmodalpha")
self.assertEqual(rec.mod_name, "Test Mod Alpha")
def test_inferred_attribution_within_lookback_window(self) -> None:
entries = pz_parser.parse_file(fixture("fixture_inferred.txt"))
records = pz_parser.classify_entries(entries, source_file="in.txt")
self.assertEqual(len(records), 1)
rec = records[0]
self.assertEqual(rec.attribution, "inferred")
self.assertEqual(rec.confidence, "medium")
self.assertEqual(rec.mod_id, "spongiesclothing")
def test_unattributed_when_no_marker_and_not_lua_shaped(self) -> None:
entries = pz_parser.parse_file(fixture("fixture_unattributed.txt"))
records = pz_parser.classify_entries(entries, source_file="ua.txt")
self.assertEqual(len(records), 1)
rec = records[0]
self.assertEqual(rec.attribution, "unattributed")
self.assertEqual(rec.confidence, "low")
self.assertEqual(rec.mod_id, "__unattributed__")
class LookbackBoundaryTests(unittest.TestCase):
"""Phase 3 — 40-line inferred-attribution window boundary."""
def test_lua_marker_beyond_lookback_does_not_attribute(self) -> None:
# Fixture places the Lua((MOD:...)) >40 lines before the ERROR.
entries = pz_parser.parse_file(fixture("fixture_lookback_boundary.txt"))
records = pz_parser.classify_entries(entries, source_file="lb.txt")
self.assertEqual(len(records), 1)
rec = records[0]
# The Lua((MOD:...)) marker is too far back, so the Lua-shaped ERROR stays unattributed.
self.assertEqual(rec.attribution, "unattributed")
self.assertEqual(rec.mod_id, "__unattributed__")
def test_non_lua_shaped_body_rejects_inferred_attribution(self) -> None:
# Recent Lua((MOD:Spongies Clothing)) emitted, but the ERROR body
# ("Disk full while writing chunk data") isn't Lua-shaped.
entries = pz_parser.parse_file(fixture("fixture_non_lua_no_inferred.txt"))
records = pz_parser.classify_entries(entries, source_file="nl.txt")
self.assertEqual(len(records), 1)
rec = records[0]
self.assertEqual(rec.attribution, "unattributed")


class NeededByTests(unittest.TestCase):
    """Phase 3 — direct attribution via "needed by <mod>" hint."""

    def test_needed_by_extracts_dependent_mod(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_require_failed.txt"))
        records = pz_parser.classify_entries(entries, source_file="rf.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        # "needed by Test Mod Alpha" should set the mod to Test Mod Alpha
        # (preferred over the require("...") side which would mention
        # DependencyMod). Either way we want direct/high.
        self.assertEqual(rec.attribution, "direct")
        self.assertEqual(rec.confidence, "high")
        # The "needed by" branch is checked before the require() branch in
        # the priority order; mod_id should reflect Test Mod Alpha.
        self.assertEqual(rec.mod_id, "testmodalpha")


def _make_marker_line(idx: int) -> str:
    """Synthesise a single LOG-level entry containing a Lua((MOD:...)) marker."""
    # Vary timestamps so the bracketed prefix is unique-ish; not strictly
    # required — they only feed Entry.timestamp, not parsing.
    return (
        f"[16-04-26 00:00:{idx:02d}.000] LOG : General f:0, "
        f"t:1776297642{idx:03d}, st:48,648,157,434> "
        "Lua((MOD:Test Mod Alpha)) initialised."
    )


def _make_filler_line(idx: int) -> str:
    """A plain LOG-level entry with no marker; one raw line."""
    return (
        f"[16-04-26 00:01:{idx % 60:02d}.000] LOG : General f:0, "
        f"t:177629760{idx:04d}, st:48,648,200,178> filler entry {idx}."
    )


def _make_error_line() -> str:
    """A Lua-shaped ERROR with no Lua((MOD:...)) marker on the entry itself
    — so attribution must come from the lookback window if it comes at all."""
    return (
        "[16-04-26 00:02:00.000] ERROR: General f:0, "
        "t:1776297900000, st:48,648,300,178> "
        "LuaManager.GetFunctionObject> no such function: doStuff"
    )


class RawLineLookbackTests(unittest.TestCase):
    """Phase 3 — lookback semantics measure raw file lines, not body-line
    budgets. Multi-line entries inside the window must not shrink the
    practical reach."""

    def _write_fixture(self, name: str, lines: list[str]) -> pathlib.Path:
        path = FIXTURE_DIR / name
        path.write_text("\n".join(lines) + "\n")
        return path

    def test_marker_exactly_at_lookback_boundary_attributes(self) -> None:
        # Marker on line 1, ERROR on line 41 -> raw-line distance = 40
        # (inclusive of INFERRED_LOOKBACK_LINES=40 -> still attributed).
        lines = [_make_marker_line(0)]
        for i in range(1, 40):
            lines.append(_make_filler_line(i))
        lines.append(_make_error_line())  # line 41 in the fixture
        path = self._write_fixture("_rawline_at_boundary.txt", lines)
        try:
            entries = pz_parser.parse_file(path)
            self.assertEqual(entries[0].line_start, 1)
            self.assertEqual(entries[-1].line_start, 41)
            records = pz_parser.classify_entries(entries, source_file="b1.txt")
            self.assertEqual(len(records), 1)
            self.assertEqual(records[0].attribution, "inferred")
            self.assertEqual(records[0].mod_id, "testmodalpha")
        finally:
            path.unlink()

    def test_marker_one_line_past_boundary_does_not_attribute(self) -> None:
        # Marker on line 1, ERROR on line 42 -> raw-line distance = 41
        # (just outside INFERRED_LOOKBACK_LINES -> unattributed).
        lines = [_make_marker_line(0)]
        for i in range(1, 41):
            lines.append(_make_filler_line(i))
        lines.append(_make_error_line())  # line 42 in the fixture
        path = self._write_fixture("_rawline_past_boundary.txt", lines)
        try:
            entries = pz_parser.parse_file(path)
            self.assertEqual(entries[0].line_start, 1)
            self.assertEqual(entries[-1].line_start, 42)
            records = pz_parser.classify_entries(entries, source_file="b2.txt")
            self.assertEqual(len(records), 1)
            self.assertEqual(records[0].attribution, "unattributed")
            self.assertEqual(records[0].mod_id, "__unattributed__")
        finally:
            path.unlink()

    def test_multiline_entry_does_not_shrink_practical_lookback(self) -> None:
        """Multi-line entries inside the lookback window do not break
        attribution. (Old body-line-budget and new raw-line-distance semantics
        happen to be equivalent on contiguous PZ entries; this test locks the
        post-fix semantic against future regression to a budget that *would*
        differ — e.g. a body-line cap with a smaller value.)
        """
        # Lay out the file so a multi-line entry sits between marker and ERROR.
        # The marker on line 1 is within 40 raw lines of the ERROR even though
        # the file has a 6-line multi-line entry in between.
        lines = [_make_marker_line(0)]  # raw line 1: marker entry
        # Single-line fillers on raw lines 2..30 (29 entries).
        for i in range(1, 30):
            lines.append(_make_filler_line(i))
        # Multi-line entry: header on raw line 31, 5 continuations on lines
        # 32..36 (Java-stack-trace shape).
        lines.append(
            "[16-04-26 00:01:30.000] LOG : General f:0, "
            "t:1776297930000, st:48,648,200,178> stack trace dump"
        )
        for k in range(5):
            lines.append(f"\tat zombie.SomeClass.method{k}(SomeClass.java:{k + 1})")
        # Single-line fillers on raw lines 37..40 (4 entries).
        for i in range(30, 34):
            lines.append(_make_filler_line(i))
        # ERROR at raw line 41 -> distance from marker is 41 - 1 = 40 ->
        # within window.
        lines.append(_make_error_line())
        path = self._write_fixture("_rawline_multiline.txt", lines)
        try:
            entries = pz_parser.parse_file(path)
            # Sanity-check the layout: first entry at line 1, multi-line entry
            # sits at line 31 with 6 body lines (header + 5 continuations),
            # ERROR at line 41.
            self.assertEqual(entries[0].line_start, 1)
            multi = next(
                e for e in entries
                if e.line_start == 31 and len(e.body) == 6
            )
            self.assertEqual(multi.line_end, 36)
            self.assertEqual(entries[-1].line_start, 41)
            records = pz_parser.classify_entries(entries, source_file="ml.txt")
            self.assertEqual(len(records), 1)
            # Raw-line-distance semantics: the marker on line 1 is 40 raw
            # lines from the ERROR on line 41, so attribution holds. (Old
            # body-line-budget would also pass here on contiguous entries;
            # this assertion locks the post-fix behavior against future
            # regression to a tighter cap.)
            self.assertEqual(records[0].attribution, "inferred")
            self.assertEqual(records[0].mod_id, "testmodalpha")
        finally:
            path.unlink()


if __name__ == "__main__":
    unittest.main()
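
The two rules these attribution tests pin down (mod_id normalisation and the raw-line lookback window) can be sketched in isolation. This is a minimal illustration only: `normalise_mod_id` and `within_lookback` are hypothetical names, not functions confirmed to exist in `pz_parser`, and the constant's value is taken from the test comments above.

```python
import re

# Value quoted in the test comments above; assumed, not read from pz_parser.
INFERRED_LOOKBACK_LINES = 40


def normalise_mod_id(mod_name: str) -> str:
    """Lowercase the name and drop spaces, apostrophes and hyphens."""
    return re.sub(r"[\s'\-]", "", mod_name.lower())


def within_lookback(marker_line: int, error_line: int,
                    lookback: int = INFERRED_LOOKBACK_LINES) -> bool:
    """True when the marker sits at most `lookback` raw lines before the error.

    Distance is measured in raw file lines, so a multi-line entry between the
    marker and the error counts once per physical line.
    """
    distance = error_line - marker_line
    return 0 < distance <= lookback
```

With these definitions a marker on line 1 attributes an ERROR on line 41 (distance 40) but not one on line 42 (distance 41), which is the boundary the two `RawLineLookbackTests` cases exercise.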


@@ -0,0 +1,199 @@
"""Tests for pz_parser parsing pipeline (phases 0-2, 4-7, 9)."""
from __future__ import annotations

import pathlib
import sys
import unittest

# Make the parser module importable when running via `python -m unittest
# discover -s tools/pz-analyzer/tests`.
sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[1]))

import pz_parser  # noqa: E402

FIXTURE_DIR = pathlib.Path(__file__).resolve().parent / "fixtures"


def fixture(name: str) -> pathlib.Path:
    return FIXTURE_DIR / name


class ParseFileTests(unittest.TestCase):
    """Phase 0 — basic line-shape recognition and continuation folding."""

    def test_parse_file_groups_continuations_under_entry(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_java_exception.txt"))
        # 3 bracketed entries; the ERROR has 4 continuation lines.
        self.assertEqual(len(entries), 3)
        error_entry = entries[1]
        self.assertEqual(error_entry.level, "ERROR")
        self.assertGreater(len(error_entry.body), 1)
        # First continuation should be the java exception line.
        self.assertIn("NoSuchFileException", error_entry.body[1])

    def test_parse_file_handles_empty_file(self) -> None:
        self.assertEqual(pz_parser.parse_file(fixture("fixture_empty.txt")), [])

    def test_parse_file_handles_no_errors(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_no_errors.txt"))
        self.assertEqual(len(entries), 3)
        self.assertTrue(all(e.level == "LOG" for e in entries))


class SeverityRecognitionTests(unittest.TestCase):
    """Phase 1 — ERROR / WARN / SEVERE recognition."""

    def test_classify_picks_up_error_warn_and_severe(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_severity_variants.txt"))
        records = pz_parser.classify_entries(entries, source_file="severity.txt")
        levels = sorted({r.level for r in records})
        # Spec accepts ERROR / WARN / SEVERE. The third entry has a bracketed
        # ERROR level but its body starts with "SEVERE:", so effective_level
        # should be SEVERE.
        self.assertIn("ERROR", levels)
        self.assertIn("WARN", levels)
        self.assertIn("SEVERE", levels)

    def test_log_lines_are_ignored(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_no_errors.txt"))
        records = pz_parser.classify_entries(entries, source_file="x.txt")
        self.assertEqual(records, [])


class StackCollectionTests(unittest.TestCase):
    """Phase 2 — bidirectional stack collection."""

    def test_pre_stack_walk_picks_up_preceding_lua_frames(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_pre_stack.txt"))
        # The ERROR entry is the 5th LOG-bracketed line; its predecessors are
        # LOG-bracketed entries whose bodies are stack-shaped lines.
        records = pz_parser.classify_entries(entries, source_file="pre.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        # Pre-stack walk should pick up at least the "at media/lua/.../A.lua:11" frame.
        self.assertTrue(any("A.lua:11" in f for f in rec.stack))

    def test_post_stack_collected_from_entry_body_continuations(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_post_stack.txt"))
        records = pz_parser.classify_entries(entries, source_file="post.txt")
        self.assertEqual(len(records), 1)
        rec = records[0]
        self.assertTrue(any("X.lua:11" in f for f in rec.stack))
        self.assertTrue(any("Y.lua:22" in f for f in rec.stack))
        # Lua [string "..."]:N form preserves quoting in the captured frame.
        self.assertTrue(any("Z.lua" in f and ":33" in f for f in rec.stack))

    def test_stack_capped_at_eight_frames(self) -> None:
        # Synthesise an ERROR with many continuation frames.
        lines = [
            "[16-04-26 00:00:42.314] ERROR: General f:0, t:1, st:1,2,3,4> "
            "Lua((MOD:Test Mod Alpha)) crash"
        ]
        for i in range(20):
            lines.append(f"\tat media/lua/client/F{i}.lua:{i + 1}")
        path = FIXTURE_DIR / "_runtime_stack_cap.txt"
        path.write_text("\n".join(lines) + "\n")
        try:
            entries = pz_parser.parse_file(path)
            records = pz_parser.classify_entries(entries, source_file="cap.txt")
            self.assertEqual(len(records), 1)
            self.assertLessEqual(len(records[0].stack), pz_parser.MAX_STACK_FRAMES)
            # And it should be exactly MAX_STACK_FRAMES given >MAX inputs.
            self.assertEqual(len(records[0].stack), pz_parser.MAX_STACK_FRAMES)
        finally:
            path.unlink()


class FileLineExtractionTests(unittest.TestCase):
    """Phase 4 — five-fallback file:line extraction."""

    def test_each_fallback_form_extracts_path(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_file_line_fallbacks.txt"))
        records = pz_parser.classify_entries(entries, source_file="ff.txt")
        # 5 distinct ERRORs, distinct mods — should produce 5 records.
        files = sorted(r.file for r in records)
        self.assertEqual(
            files,
            sorted([
                "media/lua/client/F1.lua",
                "media/lua/client/F2.lua",
                "media/lua/client/F3.lua",
                "media/lua/client/F4.lua",
                "media/lua/client/F5.lua",
            ]),
        )

    def test_quoted_path_without_line_number_yields_zero(self) -> None:
        # Format 4 fixture line lacks a :NN suffix on the quoted path.
        file_path, line_no = pz_parser.extract_file_line(
            'failure about "media/lua/client/F4.lua" tail'
        )
        self.assertEqual(file_path, "media/lua/client/F4.lua")
        self.assertEqual(line_no, 0)


class CauseChainTests(unittest.TestCase):
    """Phase 5 — Caused-by chain unwinding."""

    def test_caused_by_chain_renders_with_arrow_separator(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_cause_chain.txt"))
        records = pz_parser.classify_entries(entries, source_file="cc.txt")
        self.assertEqual(len(records), 1)
        chain = records[0].cause_chain
        self.assertIn("RuntimeException", chain)
        self.assertIn("IllegalStateException", chain)
        self.assertIn("NullPointerException", chain)
        # Order preserved (outer -> inner).
        idx_runtime = chain.index("RuntimeException")
        idx_illegal = chain.index("IllegalStateException")
        idx_null = chain.index("NullPointerException")
        self.assertLess(idx_runtime, idx_illegal)
        self.assertLess(idx_illegal, idx_null)

    def test_no_cause_chain_when_no_exceptions(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_unattributed.txt"))
        records = pz_parser.classify_entries(entries, source_file="u.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].cause_chain, "")


class KindDetectionTests(unittest.TestCase):
    """Phases 6 & 7 — kind classification."""

    def test_java_exception_kind_when_no_lua_marker(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_java_exception.txt"))
        records = pz_parser.classify_entries(entries, source_file="je.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "java_exception")
        # Java engine errors should resolve to __unattributed__.
        self.assertEqual(records[0].mod_id, "__unattributed__")

    def test_engine_noise_kind_for_kahluathread(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_engine_noise.txt"))
        records = pz_parser.classify_entries(entries, source_file="en.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "engine_noise")

    def test_lua_runtime_kind_for_attributed_lua_error(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_lua_attributed.txt"))
        records = pz_parser.classify_entries(entries, source_file="la.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "lua_runtime")

    def test_require_failed_kind(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_require_failed.txt"))
        records = pz_parser.classify_entries(entries, source_file="rf.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].kind, "require_failed")


class AggregationTests(unittest.TestCase):
    """Phase 9 — dedup, occurrence_count, files-set growth."""

    def test_three_identical_errors_dedup_to_one_record(self) -> None:
        entries = pz_parser.parse_file(fixture("fixture_dedup.txt"))
        records = pz_parser.classify_entries(entries, source_file="dd.txt")
        self.assertEqual(len(records), 1)
        self.assertEqual(records[0].occurrence_count, 3)
        # files list shouldn't duplicate "dd.txt".
        self.assertEqual(records[0].files, ["dd.txt"])


if __name__ == "__main__":
    unittest.main()
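
The Caused-by unwinding that `CauseChainTests` checks can be illustrated with a small standalone sketch. Everything here is an assumption for illustration: `render_cause_chain`, the regular expression, and the `" -> "` separator are hypothetical and may differ from what `pz_parser` actually does; the tests above only require that the exception names appear in outer-to-inner order and that the chain is empty when no exception is present.

```python
import re

# Hypothetical pattern: a fully qualified Java exception class at the start of
# a line, optionally introduced by "Caused by: ". Stack frames ("\tat ...")
# do not match because "at" is followed by a space rather than a dot.
_EXC_RE = re.compile(
    r"^\s*(?:Caused by:\s*)?((?:[\w$]+\.)+[\w$]*(?:Exception|Error))\b"
)


def render_cause_chain(body: list[str], sep: str = " -> ") -> str:
    """Join the simple class names of exceptions found in `body`, outer first."""
    names = []
    for line in body:
        match = _EXC_RE.match(line)
        if match:
            # Keep only the simple class name, e.g. "NullPointerException".
            names.append(match.group(1).rsplit(".", 1)[-1])
    return sep.join(names)


example_body = [
    "java.lang.RuntimeException: outer failure",
    "\tat zombie.SomeClass.method(SomeClass.java:1)",
    "Caused by: java.lang.IllegalStateException: bad state",
    "\tat zombie.Other.call(Other.java:7)",
    "Caused by: java.lang.NullPointerException",
]
print(render_cause_chain(example_body))
```

A body with no exception-shaped lines yields an empty string, matching the expectation in `test_no_cause_chain_when_no_exceptions`.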


@@ -0,0 +1,91 @@
"""Tests for pz_parser phase 8 — signature computation."""
from __future__ import annotations

import pathlib
import sys
import unittest

sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[1]))

import pz_parser  # noqa: E402


class PatternIdStabilityTests(unittest.TestCase):
    """pattern_id should be invariant under formatting variations."""

    def test_pattern_id_collapses_numeric_runs(self) -> None:
        a = pz_parser.compute_pattern_id(
            "ERROR",
            "General f:0, t:1776297642, st:48,648,157,434> failed at offset 12345",
        )
        b = pz_parser.compute_pattern_id(
            "ERROR",
            "General f:0, t:9999999999, st:99,99,99,99> failed at offset 99999",
        )
        self.assertEqual(a, b)

    def test_pattern_id_collapses_quoted_strings_and_whitespace(self) -> None:
        a = pz_parser.compute_pattern_id(
            "ERROR",
            'no such function "doStuff" in module',
        )
        b = pz_parser.compute_pattern_id(
            "ERROR",
            'no such function "fooBarBaz" in module',
        )
        # Only the quoted string differs between the two inputs; flattening
        # it (alongside whitespace collapsing) yields the same pattern_id.
        self.assertEqual(a, b)

    def test_pattern_id_changes_with_level(self) -> None:
        a = pz_parser.compute_pattern_id("ERROR", "exception thrown")
        b = pz_parser.compute_pattern_id("WARN", "exception thrown")
        self.assertNotEqual(a, b)


class SignatureUniquenessTests(unittest.TestCase):
    """signature should fan out across mods sharing a pattern_id."""

    def test_signature_unique_per_mod_for_shared_pattern(self) -> None:
        # Same first line, different mod_ids — different signatures, same pattern_id.
        pat = pz_parser.compute_pattern_id("ERROR", "Lua((MOD:X)) crash")
        sig_a = pz_parser.compute_signature(pat, "spongiesclothing")
        sig_b = pz_parser.compute_signature(pat, "testmodalpha")
        self.assertNotEqual(sig_a, sig_b)
        # Both signatures derive from the same pattern_id (consumer's
        # pattern-fanout view); check that it carries the "sha256:" prefix.
        self.assertEqual(pat[:7], "sha256:")


class SeverityPrefixStripTests(unittest.TestCase):
    """A body line that begins with a literal severity word (``SEVERE:``,
    ``ERROR:``, ``WARN:``, ``FATAL:``) should not fragment pattern_id away
    from the otherwise-identical body that lacks the prefix. The bracketed
    level already feeds pattern_id; the prefix is redundant and varies in
    practice."""

    def test_pattern_id_invariant_under_body_prefix_severe(self) -> None:
        # Same logical error: one line carries ``SEVERE: `` body prefix, the
        # other doesn't. Both classified as SEVERE by their bracketed level.
        with_prefix = pz_parser.compute_pattern_id(
            "SEVERE",
            "SEVERE: foo at zombie.X(File.java:42)",
        )
        without_prefix = pz_parser.compute_pattern_id(
            "SEVERE",
            "foo at zombie.X(File.java:42)",
        )
        self.assertEqual(with_prefix, without_prefix)

    def test_pattern_id_invariant_under_body_prefix_error(self) -> None:
        with_prefix = pz_parser.compute_pattern_id(
            "ERROR",
            "ERROR: doStuff failed in module",
        )
        without_prefix = pz_parser.compute_pattern_id(
            "ERROR",
            "doStuff failed in module",
        )
        self.assertEqual(with_prefix, without_prefix)


if __name__ == "__main__":
    unittest.main()
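
The invariants these signature tests assert (numeric runs, quoted strings, whitespace and a leading severity word do not affect `pattern_id`; level and mod_id do affect the result) can be sketched as follows. This is an assumed normalisation order and hashing layout, not the confirmed body of `pz_parser.compute_pattern_id` or `compute_signature`; the `sketch_*` names and the `|` field separator are hypothetical. Only the `sha256:` prefix is taken from the test above.

```python
import hashlib
import re

# Literal severity words listed in the SeverityPrefixStripTests docstring.
_SEVERITY_PREFIX = re.compile(r"^\s*(?:SEVERE|ERROR|WARN|FATAL):\s*")


def sketch_pattern_id(level: str, first_line: str) -> str:
    """Hash a normalised first line together with its level."""
    text = _SEVERITY_PREFIX.sub("", first_line)   # drop redundant body prefix
    text = re.sub(r'"[^"]*"', '"S"', text)        # flatten quoted strings
    text = re.sub(r"\d+", "N", text)              # collapse numeric runs
    text = re.sub(r"\s+", " ", text).strip()      # collapse whitespace
    digest = hashlib.sha256(f"{level}|{text}".encode("utf-8")).hexdigest()
    return "sha256:" + digest


def sketch_signature(pattern_id: str, mod_id: str) -> str:
    """Fan a shared pattern_id out into one signature per mod."""
    digest = hashlib.sha256(f"{pattern_id}|{mod_id}".encode("utf-8")).hexdigest()
    return "sha256:" + digest
```

Because the level is hashed in alongside the normalised text, the same message logged at `ERROR` and `WARN` produces different pattern ids, while the mod_id only enters at the signature step so that one pattern can be viewed across several mods.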