# Collection Expansion + Live Drain Progress Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Accept Steam Workshop collection URLs in `/api/sort`, expand them server-side via `GetCollectionDetails`, and drive a polling endpoint (`/api/jobs/{job_id}`) that gives the frontend live `cached / queued / draining` counters during cold loads.

**Architecture:** A new `sort_jobs` table tracks asynchronous expansion + drain lifecycles. `/api/sort` becomes polymorphic: all-cached bare wsids return synchronously (unchanged); anything that needs work returns a `job_id`. The frontend polls `GET /api/jobs/{job_id}` every 2.5s and renders phase-specific status strip text. Phase is **derived live** from `download_jobs` counts on every poll - no event log, no leader, restart-resilient by construction.

**Tech Stack:** Postgres (new table + indexes via additive migration), FastAPI (two new routes + background `asyncio.create_task` for expansion), asyncpg parameterized queries (mirroring existing patterns), httpx for `GetCollectionDetails`, vanilla React + Babel-standalone on the frontend (no build step - same as Spec A).

> **Spec dependency:** Read `/opt/sortof/docs/specs/2026-05-01-collection-expansion.md` (270 lines, all decisions locked in §10) before starting. The acceptance criteria in §11 and the test recipes in §12 are what Task 11 verifies.
---
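The polling contract in the architecture summary can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the frontend implementation: `fetch_status` is a hypothetical callable standing in for `GET /api/jobs/{job_id}`, and the convention that it returns `None` for a 404 and a dict with a `phase` key otherwise is illustrative.

```python
import time
from typing import Callable, Optional

TERMINAL_PHASES = {"done", "failed"}


def poll_job(
    fetch_status: Callable[[], Optional[dict]],
    interval_s: float = 2.5,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 1000,
) -> Optional[dict]:
    """Poll until the job reaches a terminal phase or disappears.

    fetch_status is a hypothetical stand-in for GET /api/jobs/{job_id}:
    it returns the decoded JSON body, or None when the server answers 404.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status is None:  # expired or cancelled job
            return None
        if status.get("phase") in TERMINAL_PHASES:
            return status
        sleep(interval_s)  # wait before the next poll
    raise TimeoutError("job did not reach a terminal phase")
```

Injecting `sleep` keeps the loop testable without real delays; the `max_polls` cap is an added safety net, not something the spec requires.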
## File structure
| Path | Action | Responsibility |
|---|---|---|
| `/opt/sortof/init/02_sort_jobs.sql` | **Create** | Schema for fresh deploys (idempotent `CREATE TABLE IF NOT EXISTS`). Identical DDL also applied to the live DB via one-shot `psql` in Task 1. |
| `/opt/sortof/api/parse.py` | Modify | Add `parse_with_collections(text) -> (wsids, collection_ids)`. Reuses the existing wsid extractor; classifies URL-form IDs as candidate collections. |
| `/opt/sortof/api/steam.py` | Modify | Add `async fetch_collection_details(client, ids)` mirroring the existing `fetch_workshop_details` pattern. |
| `/opt/sortof/api/jobs.py` | **Create** | `sort_jobs` row CRUD, phase derivation (the §4 rule executed inside `GET`), live counts SQL, lifespan-startup stale-expansion sweep. |
| `/opt/sortof/api/app.py` | Modify | Polymorphic `/api/sort`; new `GET /api/jobs/{job_id}`; new `DELETE /api/jobs/{job_id}`; lifespan sweep wired in. |
| `/opt/sortof/frontend/sortof-app.jsx` | Modify | Detect `job_id` in `/api/sort` response; `pollJob()` async loop @ 2.5s; phase-specific status-strip text; cancel button; 404 expired-job toast. |
| `/opt/sortof/frontend/index.html` | Modify | CSS for new phase indicators (e.g. `.status-pill.expanding`, `.cancel-btn`). |

No changes to `/opt/sortof/worker/` - drain stays exactly as-is. Collections expand at API time; the resulting wsids flow into `download_jobs` via the existing queueing path.

**Verification fixtures** (referenced throughout):

- All-cached bare wsids: `2169435993;2392709985;2487022075` → MODS_LINE="modoptions;tsarslib;TMC_TrueActions" (canonical sync regression).
- Synthetic collection (no Steam round-trip): direct `INSERT INTO collections` with known children. Used for cache-hit verification.
- Real Steam collection: Task 11 step 2 instructs the implementer to find a public PZ collection URL on `https://steamcommunity.com/workshop/browse/?appid=108600&section=collections` and use its ID. Required for cold-expansion path.
---
## Task 1: Schema migration - `sort_jobs` table
**Files:**
- Create: `/opt/sortof/init/02_sort_jobs.sql`
- One-shot apply to live DB via `docker exec sortof_db psql`
- [ ] **Step 1: Write the schema file**
Create `/opt/sortof/init/02_sort_jobs.sql` with:
```sql
-- Async sort jobs: lifecycle + result for collection expansion + cold drains.
-- Created 2026-05-01 (Spec B+F).
CREATE TABLE IF NOT EXISTS sort_jobs (
job_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
phase TEXT NOT NULL CHECK (phase IN ('expanding','queued','draining','done','failed')),
phase_started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
input_raw TEXT NOT NULL,
collection_ids TEXT[] NOT NULL DEFAULT '{}',
wsids TEXT[],
rules_raw TEXT,
result_json JSONB,
failure_reason TEXT
);
CREATE INDEX IF NOT EXISTS sort_jobs_phase_idx ON sort_jobs (phase);
CREATE INDEX IF NOT EXISTS sort_jobs_updated_idx ON sort_jobs (updated_at);
DROP TRIGGER IF EXISTS sort_jobs_touch ON sort_jobs;
CREATE TRIGGER sort_jobs_touch
BEFORE UPDATE ON sort_jobs
FOR EACH ROW
EXECUTE FUNCTION touch_updated_at();
```
The `touch_updated_at()` function already exists (defined in `init/01_schema.sql` for `download_jobs`).
Note: `init/` is owned by root. Use `sudo tee` to write the file:
```bash
sudo tee /opt/sortof/init/02_sort_jobs.sql > /dev/null <<'SQL'
-- Async sort jobs: lifecycle + result for collection expansion + cold drains.
-- Created 2026-05-01 (Spec B+F).
CREATE TABLE IF NOT EXISTS sort_jobs (
job_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
phase TEXT NOT NULL CHECK (phase IN ('expanding','queued','draining','done','failed')),
phase_started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
input_raw TEXT NOT NULL,
collection_ids TEXT[] NOT NULL DEFAULT '{}',
wsids TEXT[],
rules_raw TEXT,
result_json JSONB,
failure_reason TEXT
);
CREATE INDEX IF NOT EXISTS sort_jobs_phase_idx ON sort_jobs (phase);
CREATE INDEX IF NOT EXISTS sort_jobs_updated_idx ON sort_jobs (updated_at);
DROP TRIGGER IF EXISTS sort_jobs_touch ON sort_jobs;
CREATE TRIGGER sort_jobs_touch
BEFORE UPDATE ON sort_jobs
FOR EACH ROW
EXECUTE FUNCTION touch_updated_at();
SQL
```
- [ ] **Step 2: Apply DDL to the live DB**
```bash
sudo docker exec -i sortof_db psql -U sortof -d sortof < /opt/sortof/init/02_sort_jobs.sql
```
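Expected: one command tag per statement (`CREATE TABLE`, `CREATE INDEX` twice, `DROP TRIGGER`, `CREATE TRIGGER`). On a first run, `DROP TRIGGER IF EXISTS` also emits a `NOTICE` that the trigger does not exist; on a re-run, the `IF NOT EXISTS` statements emit "already exists, skipping" notices instead. Both are fine.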
- [ ] **Step 3: Verify the table exists with the right columns**
```bash
sudo docker exec -i sortof_db psql -U sortof -d sortof -c "\d sort_jobs"
```
Expected: 11 columns matching the schema, 3 indexes (PK + phase_idx + updated_idx), trigger present. The `\d` output should mention `gen_random_uuid()` as the default for `job_id` and the CHECK constraint on `phase`.
- [ ] **Step 4: Smoke insert / select**
```bash
sudo docker exec -i sortof_db psql -U sortof -d sortof -c "
INSERT INTO sort_jobs (phase, input_raw) VALUES ('expanding', 'smoke') RETURNING job_id, phase;
DELETE FROM sort_jobs WHERE input_raw='smoke';"
```
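Expected: one row returned with a UUID and phase='expanding', followed by `DELETE 1`. (A psql older than version 15 prints only the last statement's result when several statements are passed in a single `-c`, so there you would see just `DELETE 1`.)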
- [ ] **Step 5: Checkpoint** - schema is live; foundation for Tasks 4+ ready. No backup needed (DDL is idempotent and the table was empty).
---
## Task 2: Parser extension - `parse_with_collections()`
**Files:**
- Modify: `/opt/sortof/api/parse.py`
- [ ] **Step 1: Backup**
```bash
cp /opt/sortof/api/parse.py /opt/sortof/api/parse.py.bak-$(date +%Y%m%d-%H%M)
```
- [ ] **Step 2: Add the new function and a Steam-URL regex**
Open `/opt/sortof/api/parse.py`. The current file defines only `parse_workshop_input(text)`. Add at the bottom (do **not** modify the existing function - it's still used by `/api/sort` for backwards compat through the polymorphic path):
```python
import re  # harmless duplicate if already imported at the top of parse.py
from typing import List  # likewise

# Steam Workshop URL form: https://steamcommunity.com/{sharedfiles,workshop}/filedetails/?id=NNNNNNN
_STEAM_URL_RE = re.compile(
    r"https?://steamcommunity\.com/(?:sharedfiles|workshop)/filedetails/\?id=(\d{7,12})",
    re.IGNORECASE,
)


def parse_with_collections(text: str) -> tuple[List[str], List[str]]:
    """Split an input blob into bare wsids and candidate collection IDs.

    A "candidate collection" is any 7-12-digit ID that appears inside a
    Steam Workshop URL. Bare numeric IDs in the same blob are treated as
    mod wsids (current behavior). Steam doesn't syntactically distinguish
    collection IDs from mod IDs; the candidate list is sent to
    GetCollectionDetails to confirm. If a candidate isn't actually a
    collection, the caller falls back to treating it as a wsid.

    Returns (wsids, collection_ids), each deduped and in first-seen order.
    """
    if not text:
        return ([], [])

    # 1. Find URL-form IDs FIRST (so they don't get double-counted as bare).
    url_ids: List[str] = []
    seen_url: set[str] = set()
    for m in _STEAM_URL_RE.finditer(text):
        i = m.group(1)
        if i not in seen_url:
            seen_url.add(i)
            url_ids.append(i)

    # 2. Strip the URLs out before extracting bare numbers.
    text_minus_urls = _STEAM_URL_RE.sub("", text)

    # 3. Bare wsids: same regex as parse_workshop_input.
    cleaned = re.sub(
        r"^\s*(WorkshopItems|Mods|Map)\s*=\s*",
        "",
        text_minus_urls,
        flags=re.MULTILINE | re.IGNORECASE,
    )
    bare_ids = re.findall(r"\b\d{7,12}\b", cleaned)
    seen_bare: set[str] = set()
    bare_unique: List[str] = []
    for i in bare_ids:
        if i not in seen_bare and i not in seen_url:
            seen_bare.add(i)
            bare_unique.append(i)

    return (bare_unique, url_ids)
```
(The two `import` lines are a paste-safe stub: repeating an import is harmless in Python. Drop them if a static check complains about duplicate or non-top-of-file imports.)
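One property of the URL pattern is worth checking before relying on it: it requires `?id=` immediately after `filedetails/`, so a URL that carries another query parameter first (for example `?l=english&id=...`) does not match, and its ID is picked up by the bare-number pass instead. The self-contained sketch below reproduces the two regexes from this step in a simplified classifier (`classify` is a hypothetical name used only for illustration):

```python
import re

STEAM_URL_RE = re.compile(
    r"https?://steamcommunity\.com/(?:sharedfiles|workshop)/filedetails/\?id=(\d{7,12})",
    re.IGNORECASE,
)
BARE_ID_RE = re.compile(r"\b\d{7,12}\b")


def classify(text: str) -> tuple[list[str], list[str]]:
    """Return (bare_ids, url_ids), each deduped in first-seen order."""
    url_ids = list(dict.fromkeys(STEAM_URL_RE.findall(text)))
    remainder = STEAM_URL_RE.sub("", text)
    bare_ids = [
        i for i in dict.fromkeys(BARE_ID_RE.findall(remainder))
        if i not in url_ids
    ]
    return bare_ids, url_ids


plain = "https://steamcommunity.com/sharedfiles/filedetails/?id=2200148440"
extra = "https://steamcommunity.com/sharedfiles/filedetails/?l=english&id=2200148440"
print(classify(plain))  # ([], ['2200148440'])
print(classify(extra))  # (['2200148440'], [])
```

Whether the second case matters depends on how users copy collection links; if it does, the pattern would need to tolerate parameters before `id=`.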
- [ ] **Step 3: py_compile**
```bash
/opt/sortof/api/.venv/bin/python -m py_compile /opt/sortof/api/parse.py && echo PY_OK
```
- [ ] **Step 4: Functional smoke test with the venv interpreter**
```bash
cd /opt/sortof/api && .venv/bin/python -c "
from parse import parse_with_collections, parse_workshop_input
# Pure bare wsids - backwards compat.
assert parse_with_collections('2169435993;2392709985') == (['2169435993','2392709985'], [])
# Pure URL.
assert parse_with_collections('https://steamcommunity.com/sharedfiles/filedetails/?id=2200148440') == ([], ['2200148440'])
# Mixed: URL + bare.
assert parse_with_collections('https://steamcommunity.com/sharedfiles/filedetails/?id=2200148440\n2169435993') == (['2169435993'], ['2200148440'])
# Same ID appearing both as URL AND as bare → URL wins, bare side dedups.
assert parse_with_collections('2200148440\nhttps://steamcommunity.com/sharedfiles/filedetails/?id=2200148440') == ([], ['2200148440'])
# Empty.
assert parse_with_collections('') == ([], [])
assert parse_with_collections(None) == ([], [])
# Existing parse_workshop_input still works.
assert parse_workshop_input('2169435993;2392709985') == ['2169435993','2392709985']
print('ALL_OK')
"
```
Expected: `ALL_OK`. Any AssertionError stops the task - fix the regex/dedupe logic before proceeding.
- [ ] **Step 5: Checkpoint** - parser ready for the API to consume.
---
## Task 3: Steam helper - `fetch_collection_details()`
**Files:**
- Modify: `/opt/sortof/api/steam.py`
- [ ] **Step 1: Backup**
```bash
cp /opt/sortof/api/steam.py /opt/sortof/api/steam.py.bak-$(date +%Y%m%d-%H%M)
```
- [ ] **Step 2: Add the helper, mirroring `fetch_workshop_details`**
Open `/opt/sortof/api/steam.py`. The current file has one async helper. Add a sibling at the bottom:
```python
COLLECTION_URL = (
    "https://api.steampowered.com/ISteamRemoteStorage/GetCollectionDetails/v1/"
)


async def fetch_collection_details(
    client: httpx.AsyncClient,
    collection_ids: List[str],
) -> Dict[str, Dict]:
    """Resolve candidate collection IDs to their child wsids.

    Returns a dict keyed by collection_id with shape:
        { "result": int, "children": List[str] }

    Anonymous endpoint; no API key needed. result==1 means valid collection;
    result!=1 means the ID isn't a collection (could be a mod, deleted, or
    private). Caller decides what to do with non-1 results - see Spec B+F
    §10 Q3 "Partial expansion failure" and Q4 "Flakiness".
    """
    if not collection_ids:
        return {}

    # Form-encoded body: a count plus one indexed field per collection ID.
    data: Dict[str, str] = {"collectioncount": str(len(collection_ids))}
    for i, cid in enumerate(collection_ids):
        data[f"publishedfileids[{i}]"] = cid

    r = await client.post(COLLECTION_URL, data=data)
    r.raise_for_status()
    body = r.json()

    out: Dict[str, Dict] = {}
    for item in body.get("response", {}).get("collectiondetails", []) or []:
        cid = item.get("publishedfileid")
        if not cid:
            continue
        out[cid] = {
            "result": int(item.get("result", 0)),
            "children": [
                c.get("publishedfileid", "")
                for c in (item.get("children") or [])
                if c.get("publishedfileid")
            ],
        }
    return out
```
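To make the wire format concrete, the form fields the helper sends and the reduction it applies to the response can be exercised without a network call. The sketch below mirrors the helper's logic, assuming the response has the shape the helper reads (`response.collectiondetails[].children[].publishedfileid`). `build_form`, `reduce_response`, the IDs, and the sample body (including the non-1 result code) are illustrative, not captured Steam output.

```python
from typing import Dict, List


def build_form(collection_ids: List[str]) -> Dict[str, str]:
    """Form fields for GetCollectionDetails: a count plus indexed IDs."""
    form = {"collectioncount": str(len(collection_ids))}
    for i, cid in enumerate(collection_ids):
        form[f"publishedfileids[{i}]"] = cid
    return form


def reduce_response(body: dict) -> Dict[str, dict]:
    """Collapse the response to {collection_id: {result, children}}."""
    out: Dict[str, dict] = {}
    for item in (body.get("response") or {}).get("collectiondetails") or []:
        cid = item.get("publishedfileid")
        if not cid:
            continue
        out[cid] = {
            "result": int(item.get("result", 0)),
            "children": [
                c["publishedfileid"]
                for c in item.get("children") or []
                if c.get("publishedfileid")
            ],
        }
    return out


# Hypothetical sample: one valid collection and one ID that is not a collection.
sample = {"response": {"collectiondetails": [
    {"publishedfileid": "111", "result": 1,
     "children": [{"publishedfileid": "222"}, {"publishedfileid": "333"}]},
    {"publishedfileid": "444", "result": 9},
]}}
print(build_form(["111", "444"]))
print(reduce_response(sample))
```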
- [ ] **Step 3: py_compile + smoke import**
```bash
/opt/sortof/api/.venv/bin/python -m py_compile /opt/sortof/api/steam.py && echo PY_OK
cd /opt/sortof/api && .venv/bin/python -c "import app; from steam import fetch_collection_details; print(fetch_collection_details.__doc__.split(chr(10))[0])"
```
Expected: `PY_OK` and the first line of the helper's docstring.
- [ ] **Step 4: Functional smoke test against real Steam (one collection ID)**
The implementer should pick a known PZ collection - search `https://steamcommunity.com/workshop/browse/?appid=108600&section=collections` for any active collection, copy its ID from the URL bar, and use it here. Substitute below:
```bash
COLL_ID="<paste-real-PZ-collection-id-here>"
curl -sS -X POST 'https://api.steampowered.com/ISteamRemoteStorage/GetCollectionDetails/v1/' \
--data-urlencode 'collectioncount=1' \
--data-urlencode "publishedfileids[0]=$COLL_ID" \
| jq '.response.collectiondetails[0] | {result, n_children: (.children | length)}'
```
Expected: `result: 1`, `n_children > 0`.
Then call our helper through the venv:
```bash
cd /opt/sortof/api && .venv/bin/python -c "
import asyncio, httpx
from steam import fetch_collection_details

async def main():
    async with httpx.AsyncClient(timeout=30.0) as c:
        out = await fetch_collection_details(c, ['$COLL_ID'])
        print(out)

asyncio.run(main())
"
```
Expected: a dict like `{'<COLL_ID>': {'result': 1, 'children': ['<wsid1>', '<wsid2>', ...]}}`.
- [ ] **Step 5: Checkpoint** - Steam helper verified end-to-end.
---
## Task 4: `jobs.py` - CRUD, phase derivation, live counts, lifespan sweep
**Files:**
- Create: `/opt/sortof/api/jobs.py`
- [ ] **Step 1: Write the module**
Create `/opt/sortof/api/jobs.py` with:
```python
"""sort_jobs persistence + phase derivation.

Phase is *derived* on every GET (Spec B+F §4): never stored as the source
of truth except for terminal states. The function `derive_phase` reads
live counts from download_jobs and decides expanding/queued/draining/done.
This makes the system restart-resilient by construction - there is no
event log to replay.
"""
from __future__ import annotations

import json
from typing import Any, Dict, List, Optional, Tuple
from uuid import UUID

import asyncpg


# ── CRUD ────────────────────────────────────────────────────────────────────
async def create_job(
    conn: asyncpg.Connection,
    *,
    input_raw: str,
    collection_ids: List[str],
    wsids: Optional[List[str]],
    rules_raw: Optional[str],
    initial_phase: str,
) -> str:
    """Insert a sort_jobs row and return the job_id (UUID as string).

    initial_phase: 'expanding' if collections still need resolving,
    'queued' if wsids are already resolved at submit time.
    """
    row = await conn.fetchrow(
        """
        INSERT INTO sort_jobs (phase, input_raw, collection_ids, wsids, rules_raw)
        VALUES ($1, $2, $3, $4, $5)
        RETURNING job_id
        """,
        initial_phase, input_raw, collection_ids, wsids, rules_raw,
    )
    return str(row["job_id"])


async def get_job_row(conn: asyncpg.Connection, job_id: str) -> Optional[Dict[str, Any]]:
    """Fetch a sort_jobs row by id. Returns None if not found.

    job_id may be either a string UUID or asyncpg-native UUID.
    """
    try:
        uid = UUID(job_id) if isinstance(job_id, str) else job_id
    except ValueError:
        return None
    row = await conn.fetchrow(
        "SELECT * FROM sort_jobs WHERE job_id = $1",
        uid,
    )
    return dict(row) if row else None


async def update_phase(
    conn: asyncpg.Connection,
    job_id: str,
    phase: str,
    *,
    wsids: Optional[List[str]] = None,
    result_json: Optional[Dict[str, Any]] = None,
    failure_reason: Optional[str] = None,
) -> None:
    """Advance a job's phase. wsids/result_json/failure_reason are optional
    column updates that pair with phase transitions."""
    sets = ["phase = $2", "phase_started_at = now()"]
    params: List[Any] = [UUID(job_id), phase]
    idx = 3
    if wsids is not None:
        sets.append(f"wsids = ${idx}::text[]")
        params.append(wsids)
        idx += 1
    if result_json is not None:
        sets.append(f"result_json = ${idx}::jsonb")
        params.append(json.dumps(result_json))
        idx += 1
    if failure_reason is not None:
        sets.append(f"failure_reason = ${idx}")
        params.append(failure_reason)
        idx += 1
    await conn.execute(
        f"UPDATE sort_jobs SET {', '.join(sets)} WHERE job_id = $1",
        *params,
    )


# ── live counts (Spec B+F §6) ───────────────────────────────────────────────
async def compute_counts(conn: asyncpg.Connection, wsids: List[str]) -> Dict[str, int]:
    """Compute live cached/queued/draining counts for a set of wsids.
    Empty wsids → all zeros."""
    if not wsids:
        return {"cached": 0, "queued": 0, "draining": 0}
    rows = await conn.fetch(
        """
        SELECT
          (SELECT COUNT(DISTINCT mp.workshop_id)
             FROM mod_parsed mp
             JOIN workshop_meta wm ON wm.workshop_id = mp.workshop_id
            WHERE mp.workshop_id = ANY($1::text[])
              AND mp.parsed_at_time_updated = wm.time_updated) AS cached,
          (SELECT COUNT(DISTINCT workshop_id)
             FROM download_jobs
            WHERE workshop_id = ANY($1::text[]) AND status = 'queued') AS queued,
          (SELECT COUNT(DISTINCT workshop_id)
             FROM download_jobs
            WHERE workshop_id = ANY($1::text[]) AND status = 'downloading') AS draining
        """,
        wsids,
    )
    r = rows[0]
    return {"cached": int(r["cached"]), "queued": int(r["queued"]), "draining": int(r["draining"])}


# ── phase derivation (Spec B+F §4) ──────────────────────────────────────────
def derive_phase(
    stored_phase: str,
    wsids: Optional[List[str]],
    counts: Dict[str, int],
) -> str:
    """Decide the live phase from the row's stored phase + current counts.

    Terminal phases (done/failed) are never demoted. Non-terminal phases
    are recomputed from current state.
    """
    if stored_phase in ("done", "failed"):
        return stored_phase
    if wsids is None:
        return "expanding"
    if counts["draining"] > 0:
        return "draining"
    if counts["queued"] > 0:
        return "queued"
    if counts["cached"] >= len(wsids):
        return "done"
    # Transient gap: a row just left 'queued' and hasn't shown up in
    # mod_parsed yet. Most likely just-failed and not yet re-queued.
    return "queued"


# ── stale-expansion sweep (Spec B+F §9) ─────────────────────────────────────
STALE_EXPANSION_SQL = """
UPDATE sort_jobs
   SET phase = 'failed',
       failure_reason = 'expansion timed out',
       updated_at = now()
 WHERE phase = 'expanding'
   AND phase_started_at < now() - interval '10 minutes'
RETURNING job_id;
"""


async def sweep_stale_expansions(conn: asyncpg.Connection) -> int:
    """Run on uvicorn lifespan startup. Returns the number of jobs reaped."""
    rows = await conn.fetch(STALE_EXPANSION_SQL)
    return len(rows)
```
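`update_phase` builds its `SET` clause dynamically, numbering placeholders as optional columns are added. The pure-function sketch below (no database; `build_update` is a hypothetical helper, not part of `jobs.py`) shows the SQL text and parameter list that approach yields:

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_update(
    job_id: str,
    phase: str,
    wsids: Optional[List[str]] = None,
    result_json: Optional[Dict[str, Any]] = None,
    failure_reason: Optional[str] = None,
) -> Tuple[str, List[Any]]:
    """Return (sql, params) for a phase transition with optional columns."""
    sets = ["phase = $2", "phase_started_at = now()"]
    params: List[Any] = [job_id, phase]
    optional_columns = (
        ("wsids", wsids, "::text[]"),
        ("result_json", None if result_json is None else json.dumps(result_json), "::jsonb"),
        ("failure_reason", failure_reason, ""),
    )
    for column, value, cast in optional_columns:
        if value is not None:
            params.append(value)
            # The placeholder index is the new parameter's 1-based position.
            sets.append(f"{column} = ${len(params)}{cast}")
    sql = f"UPDATE sort_jobs SET {', '.join(sets)} WHERE job_id = $1"
    return sql, params


sql, params = build_update("job-1", "failed", failure_reason="expansion timed out")
print(sql)
print(params)
```

Because `None` means "leave unchanged" here, this pattern cannot reset a column back to NULL; that appears acceptable for the transitions the plan describes.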
- [ ] **Step 2: py_compile + smoke import**
```bash
/opt/sortof/api/.venv/bin/python -m py_compile /opt/sortof/api/jobs.py && echo PY_OK
cd /opt/sortof/api && .venv/bin/python -c "import jobs; print(sorted(n for n in dir(jobs) if not n.startswith('_')))"
```
Expected: `PY_OK` followed by a list including `compute_counts`, `create_job`, `derive_phase`, `get_job_row`, `sweep_stale_expansions`, `update_phase`.
- [ ] **Step 3: Phase derivation unit smoke**
Phase derivation is pure (no DB), so it's testable without a connection:
```bash
cd /opt/sortof/api && .venv/bin/python -c "
from jobs import derive_phase
# Terminal preserved.
assert derive_phase('done', ['a'], {'cached':1,'queued':0,'draining':0}) == 'done'
assert derive_phase('failed', ['a'], {'cached':0,'queued':0,'draining':0}) == 'failed'
# wsids null → expanding.
assert derive_phase('expanding', None, {'cached':0,'queued':0,'draining':0}) == 'expanding'
# Active drain.
assert derive_phase('queued', ['a','b'], {'cached':0,'queued':1,'draining':1}) == 'draining'
# Just queued.
assert derive_phase('queued', ['a','b'], {'cached':0,'queued':2,'draining':0}) == 'queued'
# All cached.
assert derive_phase('queued', ['a','b'], {'cached':2,'queued':0,'draining':0}) == 'done'
# Transient gap (between queued exit and parsed entry).
assert derive_phase('queued', ['a','b'], {'cached':1,'queued':0,'draining':0}) == 'queued'
print('PHASE_OK')
"
```
Expected: `PHASE_OK`.
- [ ] **Step 4: Live-counts smoke (DB round-trip)**
```bash
cd /opt/sortof/api && .venv/bin/python -c "
import asyncio, db
from jobs import compute_counts

async def main():
    pool = await db.create_pool()
    async with pool.acquire() as conn:
        c1 = await compute_counts(conn, [])
        assert c1 == {'cached':0,'queued':0,'draining':0}
        # canonical 3-mod test set is fully cached.
        c2 = await compute_counts(conn, ['2169435993','2392709985','2487022075'])
        assert c2['cached'] == 3
    await pool.close()
    print('COUNTS_OK')

asyncio.run(main())
"
```
Expected: `COUNTS_OK`.
- [ ] **Step 5: Checkpoint** - module reusable from `app.py`.
---
## Task 5: Background expansion task
**Files:**
- Create: `/opt/sortof/api/expansion.py`
- [ ] **Step 1: Write the expansion runner**
Create `/opt/sortof/api/expansion.py` with:
```python
"""Background async task: take a freshly-created sort_jobs row in 'expanding'
phase, resolve its collection_ids via Steam, populate wsids[], advance phase
to 'queued' (and drop wsids into download_jobs as needed)."""
from __future__ import annotations

import asyncio
import logging
from typing import Any, Dict, List, Optional

import asyncpg
import httpx

from jobs import update_phase
from steam import fetch_collection_details

log = logging.getLogger("sortof.expansion")

COLLECTION_TTL_SECONDS = 6 * 3600  # Spec B+F §5.3


async def _resolve_collections(
    conn: asyncpg.Connection,
    http: httpx.AsyncClient,
    collection_ids: List[str],
) -> tuple[Dict[str, List[str]], List[str]]:
    """Returns (resolved, unresolvable). resolved maps collection_id ->
    [child_wsids]. unresolvable lists collection_ids that GetCollectionDetails
    couldn't fetch (after one retry)."""
    if not collection_ids:
        return ({}, [])

    # Cache lookup (TTL = 6h via last_fetched_at).
    cache_rows = await conn.fetch(
        """
        SELECT collection_id, child_workshop_ids
          FROM collections
         WHERE collection_id = ANY($1::text[])
           AND last_fetched_at > now() - interval '6 hours'
        """,
        collection_ids,
    )
    resolved: Dict[str, List[str]] = {
        r["collection_id"]: list(r["child_workshop_ids"])
        for r in cache_rows
    }
    miss = [cid for cid in collection_ids if cid not in resolved]
    unresolvable: List[str] = []

    if miss:
        for attempt in (1, 2):
            try:
                api_out = await fetch_collection_details(http, miss)
            except httpx.HTTPError as e:
                log.warning("GetCollectionDetails attempt %d failed: %s", attempt, e)
                if attempt == 1:
                    await asyncio.sleep(2.0)
                    continue
                # Second failure: give up on every cache miss.
                unresolvable = list(miss)
                api_out = {}
            for cid in miss:
                rec = api_out.get(cid)
                if rec is None or rec.get("result") != 1:
                    unresolvable.append(cid)
                    continue
                children = rec.get("children") or []
                resolved[cid] = list(children)
                await conn.execute(
                    """
                    INSERT INTO collections (collection_id, child_workshop_ids, last_fetched_at)
                    VALUES ($1, $2, now())
                    ON CONFLICT (collection_id) DO UPDATE
                       SET child_workshop_ids = EXCLUDED.child_workshop_ids,
                           last_fetched_at = now()
                    """,
                    cid, children,
                )
            break  # success - stop retrying

    # Dedupe (in case retry-on-flake added the same cid twice).
    seen: set[str] = set()
    out_unres: List[str] = []
    for u in unresolvable:
        if u not in seen:
            seen.add(u)
            out_unres.append(u)
    return (resolved, out_unres)


async def run_expansion(
    pool: asyncpg.Pool,
    http: httpx.AsyncClient,
    job_id: str,
    bare_wsids: List[str],
    collection_ids: List[str],
) -> None:
    """Top-level expansion task. Logs and persists; never raises out."""
    try:
        async with pool.acquire() as conn:
            resolved, unresolvable = await _resolve_collections(conn, http, collection_ids)

            # Compose wsids: collections (in input order) + bare wsids, deduped.
            seen: set[str] = set()
            wsids: List[str] = []
            for cid in collection_ids:
                for w in resolved.get(cid, []):
                    if w and w not in seen:
                        seen.add(w)
                        wsids.append(w)
            for w in bare_wsids:
                if w not in seen:
                    seen.add(w)
                    wsids.append(w)

            if not wsids:
                # All collections unresolvable AND no bare wsids. Job dies.
                await update_phase(
                    conn, job_id, "failed",
                    failure_reason="all input collections unresolvable",
                )
                log.info("expansion %s: failed - all collections unresolvable", job_id)
                return

            partial_warnings = [
                {
                    "tag": "collection-partial",
                    "level": "warning",
                    "msg": f"collection {cid} could not be fetched",
                }
                for cid in unresolvable
            ]
            seed_result: Optional[Dict[str, Any]] = (
                {"WARNINGS": partial_warnings} if partial_warnings else None
            )
            await update_phase(
                conn, job_id, "queued",
                wsids=wsids,
                result_json=seed_result,
            )
            log.info(
                "expansion %s: queued (wsids=%d unresolvable=%d)",
                job_id, len(wsids), len(unresolvable),
            )
    except Exception:
        log.exception("expansion %s: crashed", job_id)
        try:
            async with pool.acquire() as conn:
                await update_phase(conn, job_id, "failed", failure_reason="expansion crashed")
        except Exception:
            log.exception("expansion %s: cleanup failed", job_id)
```
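The retry-once behavior can be isolated and tested with a fake fetcher. The sketch below is an alternative arrangement under stated assumptions, not the plan's code: `fetch` is any async callable, `RuntimeError` stands in for `httpx.HTTPError`, caching is omitted, and the delay is a parameter so a test can set it to zero. Separating "fetch with retry" from "classify results" means each ID is classified exactly once, so no dedupe pass is needed.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


async def resolve_with_retry(
    fetch: Callable[[List[str]], Awaitable[Dict[str, dict]]],
    ids: List[str],
    retry_delay_s: float = 2.0,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Fetch once, retry once on error, then classify every id exactly once."""
    api_out: Dict[str, dict] = {}
    for attempt in (1, 2):
        try:
            api_out = await fetch(ids)
            break
        except RuntimeError:  # stand-in for httpx.HTTPError
            if attempt == 1:
                await asyncio.sleep(retry_delay_s)

    resolved: Dict[str, List[str]] = {}
    unresolvable: List[str] = []
    for cid in ids:
        rec = api_out.get(cid)
        if rec is not None and rec.get("result") == 1:
            resolved[cid] = list(rec.get("children") or [])
        else:
            unresolvable.append(cid)
    return resolved, unresolvable
```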
- [ ] **Step 2: py_compile + smoke import**
```bash
/opt/sortof/api/.venv/bin/python -m py_compile /opt/sortof/api/expansion.py && echo PY_OK
cd /opt/sortof/api && .venv/bin/python -c "from expansion import run_expansion; print('IMPORT_OK')"
```
Expected: `PY_OK` and `IMPORT_OK`.
- [ ] **Step 3: End-to-end smoke against the live DB + Steam**
```bash
cd /opt/sortof/api && .venv/bin/python -c "
import asyncio, httpx, db, jobs, expansion

COLL_ID = '<paste-real-PZ-collection-id>'  # same one you used in Task 3 step 4

async def main():
    pool = await db.create_pool()
    async with pool.acquire() as conn:
        # Pre-clear any cached row to test the cold path.
        await conn.execute('DELETE FROM collections WHERE collection_id=\$1', COLL_ID)
        jid = await jobs.create_job(
            conn, input_raw=f'https://steamcommunity.com/sharedfiles/filedetails/?id={COLL_ID}',
            collection_ids=[COLL_ID], wsids=None, rules_raw=None,
            initial_phase='expanding',
        )
    async with httpx.AsyncClient(timeout=30.0) as http:
        await expansion.run_expansion(pool, http, jid, [], [COLL_ID])
    async with pool.acquire() as conn:
        row = await jobs.get_job_row(conn, jid)
        assert row['phase'] == 'queued', row
        assert row['wsids'] is not None and len(row['wsids']) > 0
        # Cleanup.
        await conn.execute('DELETE FROM sort_jobs WHERE job_id=\$1', row['job_id'])
    await pool.close()
    print('EXPANSION_OK')

asyncio.run(main())
"
```
Expected: `EXPANSION_OK`. Substitute `COLL_ID` with the real ID from Task 3.
- [ ] **Step 4: Checkpoint** - expansion runner ready to be triggered from `/api/sort`.
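The ordering rule `run_expansion` applies when composing `wsids` (collection children in input order, then bare wsids, first occurrence wins) can be checked in isolation. A minimal sketch, with `compose_wsids` as a hypothetical name that relies on `dict` preserving insertion order:

```python
from typing import Dict, List


def compose_wsids(
    collection_ids: List[str],
    resolved: Dict[str, List[str]],
    bare_wsids: List[str],
) -> List[str]:
    """Collection children first (in input order), then bare wsids, deduped."""
    ordered = [w for cid in collection_ids for w in resolved.get(cid, []) if w]
    ordered.extend(bare_wsids)
    return list(dict.fromkeys(ordered))  # keeps the first occurrence of each id


print(compose_wsids(["c1", "c2"], {"c1": ["a", "b"], "c2": ["b", "c"]}, ["c", "d"]))
# ['a', 'b', 'c', 'd']
```

An unresolvable collection simply contributes nothing, which is why the job fails only when the composed list ends up empty.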
---
## Task 6: Polymorphic `/api/sort` + lifespan sweep wiring
**Files:**
- Modify: `/opt/sortof/api/app.py`
- [ ] **Step 1: Backup**
```bash
cp /opt/sortof/api/app.py /opt/sortof/api/app.py.bak-$(date +%Y%m%d-%H%M)
```
- [ ] **Step 2: Add new imports**
Find the existing import block (around lines 1-30). Add:
```python
import asyncio
import jobs
import expansion
from parse import parse_with_collections # existing parse_workshop_input import stays
```
- [ ] **Step 3: Wire stale-expansion sweep into lifespan startup**
Find the existing `@asynccontextmanager async def lifespan(app: FastAPI):` block (around line 38). Inside the body, after the pool/http are created and before `yield`, add:
```python
    async with pool.acquire() as conn:
        n_reaped = await jobs.sweep_stale_expansions(conn)
    if n_reaped:
        log.info("lifespan startup: reaped %d stale expansion job(s)", n_reaped)
```
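The sweep's rule (a job still `expanding` ten minutes after `phase_started_at` is treated as orphaned) can be stated as a pure predicate. This sketch mirrors the `WHERE` clause of `STALE_EXPANSION_SQL` for illustration only; `is_stale_expansion` is a hypothetical name, and the database query remains the authority.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=10)  # matches interval '10 minutes' in the SQL


def is_stale_expansion(phase: str, phase_started_at: datetime, now: datetime) -> bool:
    """True when a job has sat in 'expanding' longer than STALE_AFTER."""
    return phase == "expanding" and phase_started_at < now - STALE_AFTER


now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
print(is_stale_expansion("expanding", now - timedelta(minutes=11), now))  # True
print(is_stale_expansion("expanding", now - timedelta(minutes=3), now))   # False
```

As wired in this step, the rule is evaluated only at startup, so an `expanding` job younger than ten minutes at restart is left untouched by this sweep.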
- [ ] **Step 4: Make `/api/sort` polymorphic**
Find the `async def sort_endpoint(...)` function. The current body parses input via `parse_workshop_input`, hits Steam, queues misses, runs `mlos_sort`, returns sync. Replace the parsing line:
```python
input_ids = parse_workshop_input(req.input or "")
```
with:
```python
bare_wsids, collection_ids = parse_with_collections(req.input or "")
input_ids = bare_wsids # used by existing code paths below
```
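Then, immediately after the validation `raise HTTPException(...)` checks (so we still 400 on empty input and 413 on >MAX_IDS), but **before** the Steam metadata fetch, insert a fork. Note that `input_ids` now holds only bare wsids: if the existing empty-input check tests `input_ids` alone, widen it to also consider `collection_ids` (for example `if not input_ids and not collection_ids`), otherwise a collection-only URL is rejected with 400 before it reaches the fork.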
```python
    # ── B+F: route to async job if collections present OR uncached wsids
    #    require drain time ───────────────────────────────────────────────────
    if collection_ids or len(input_ids) > 0:
        # Fast-path probe: are ALL bare wsids already cache-fresh? If so AND
        # there are no collections, fall through to the existing sync path.
        # (Spec B+F §10 Q1: "Bare wsid + all-cached → synchronous".)
        if not collection_ids and input_ids:
            try:
                steam_details = await steam.fetch_workshop_details(
                    request.app.state.http, input_ids,
                )
            except httpx.HTTPError as e:
                log.warning("steam api error: %s", e)
                elapsed_ms = int((time.monotonic() - t0) * 1000)
                log.info(
                    "sort done hits=0 misses=%d status=error ms=%d",
                    len(input_ids), elapsed_ms,
                )
                return _empty_payload(input_ids, "error")

            # Cache check: are all input_ids in mod_parsed and fresh?
            pool = request.app.state.db
            async with pool.acquire() as conn:
                fresh = 0
                for wid in input_ids:
                    d = steam_details.get(wid)
                    if not d or d.get("result") != 1:
                        break  # bail out - there's a non-cacheable id, route to job
                    tu = int(d.get("time_updated", 0))
                    row = await conn.fetchrow(
                        "SELECT 1 FROM mod_parsed "
                        "WHERE workshop_id = $1 AND parsed_at_time_updated = $2 LIMIT 1",
                        wid, tu,
                    )
                    if row is not None:
                        fresh += 1
                    else:
                        break
                if fresh == len(input_ids):
                    # All cache-fresh - sync path. Re-use the existing flow
                    # by NOT routing to a job. Fall through.
                    pass
                else:
                    # Async path.
                    return await _route_to_job(
                        request, conn, req.input or "", req.rules,
                        bare_wsids, collection_ids,
                    )
        elif collection_ids:
            pool = request.app.state.db
            async with pool.acquire() as conn:
                return await _route_to_job(
                    request, conn, req.input or "", req.rules,
                    bare_wsids, collection_ids,
                )
```
This is the routing fork. The fast-path probe lets all-cached bare-wsid input fall through to the existing sync code unchanged. Anything else (uncached wsids OR any collection) returns a job_id.
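The decision the fork makes can be summarized as a pure function, which is handy for reasoning about the three outcomes without a database. This is a sketch under stated assumptions: `choose_route` and its return labels are hypothetical, and `fresh` stands for the set of wsids the cache probe found up to date.

```python
from typing import List, Set


def choose_route(bare_wsids: List[str], collection_ids: List[str], fresh: Set[str]) -> str:
    """Return 'sync', 'job:expanding', or 'job:queued' for validated input."""
    if not bare_wsids and not collection_ids:
        raise ValueError("empty input should have been rejected by validation")
    if collection_ids:
        return "job:expanding"  # any collection forces background expansion
    if all(w in fresh for w in bare_wsids):
        return "sync"  # every bare wsid is cache-fresh
    return "job:queued"  # at least one bare wsid needs a drain


print(choose_route(["1", "2"], [], {"1", "2"}))  # sync
print(choose_route(["1", "2"], [], {"1"}))       # job:queued
print(choose_route(["1"], ["9"], {"1"}))         # job:expanding
```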
- [ ] **Step 5: Add `_route_to_job` helper near the route definition**
Just above the `@app.post("/api/sort")` decorator, insert:
```python
async def _route_to_job(
    request: Request,
    conn,
    input_raw: str,
    rules_raw: Optional[str],
    bare_wsids: List[str],
    collection_ids: List[str],
) -> Dict[str, Any]:
    """Create a sort_jobs row and (if needed) kick off background expansion.
    Returns {status, job_id} for the client to start polling."""
    if collection_ids:
        # Will resolve in the background.
        job_id = await jobs.create_job(
            conn,
            input_raw=input_raw,
            collection_ids=collection_ids,
            wsids=None,
            rules_raw=rules_raw,
            initial_phase="expanding",
        )
        asyncio.create_task(expansion.run_expansion(
            request.app.state.db,
            request.app.state.http,
            job_id,
            bare_wsids,
            collection_ids,
        ))
        return {"status": "expanding", "job_id": job_id}
    else:
        # Bare wsids that include uncached. Kick off cold drain by queueing.
        # We dedupe wsids before storing them on the job (the existing
        # /api/sort flow does this for bare input lists).
        seen: set = set()
        wsids: List[str] = []
        for w in bare_wsids:
            if w not in seen:
                seen.add(w)
                wsids.append(w)
        job_id = await jobs.create_job(
            conn,
            input_raw=input_raw,
            collection_ids=[],
            wsids=wsids,
            rules_raw=rules_raw,
            initial_phase="queued",
        )
        # Queue any wsids not already in download_jobs (mirrors the existing
        # flow at the bottom of sort_endpoint, but we don't need Steam validation
        # here since the GET poll will surface unknowns/non-mods naturally
        # via the counts contract).
        for wid in wsids:
            existing = await conn.fetchval(
                "SELECT 1 FROM download_jobs "
                "WHERE workshop_id = $1 AND status IN ('queued','downloading') LIMIT 1",
                wid,
            )
            if existing is None:
                await conn.execute(
                    "INSERT INTO download_jobs (workshop_id, status) VALUES ($1, 'queued')",
                    wid,
                )
        return {"status": "queued", "job_id": job_id}
```
- [ ] **Step 6: py_compile + smoke import**
```bash
/opt/sortof/api/.venv/bin/python -m py_compile /opt/sortof/api/app.py && echo PY_OK
cd /opt/sortof/api && .venv/bin/python -c "import app" && echo IMPORT_OK
```
- [ ] **Step 7: Restart API**
```bash
sudo systemctl restart sortof-api && sleep 2 && sudo systemctl is-active sortof-api
```
Expected: `active`.
- [ ] **Step 8: Verify the sync fast path is unchanged**
```bash
curl -sS -X POST http://100.114.205.53:8801/api/sort \
-H 'Content-Type: application/json' \
-d '{"input":"2169435993;2392709985;2487022075"}' \
| jq '{status, MODS_LINE, has_job_id: (has("job_id"))}'
```
Expected: `{"status":"success","MODS_LINE":"modoptions;tsarslib;TMC_TrueActions","has_job_id":false}`.
- [ ] **Step 9: Verify the bare-uncached path returns a job_id**
```bash
# First, find a wsid that ISN'T cached. The HellDrinx wsid is classified non_mod,
# so it is a poor test. Either pick a real PZ mod that isn't in mod_parsed yet
# (the implementer has to find one on Steam), or, simpler, temporarily delete a
# cached row to force a miss:
sudo docker exec -i sortof_db psql -U sortof -d sortof -c "
DELETE FROM mod_parsed WHERE workshop_id='2196102849';
DELETE FROM workshop_meta WHERE workshop_id='2196102849';"
curl -sS -X POST http://100.114.205.53:8801/api/sort \
-H 'Content-Type: application/json' \
-d '{"input":"2196102849"}' | jq '{status, has_job_id: (has("job_id")), job_id_preview: (.job_id // "" | .[0:8])}'
```
Expected: `{"status":"queued","has_job_id":true,"job_id_preview":"<8-hex-chars>"}`.
The drain will reprocess `2196102849` (Raven Creek) and re-cache it; that's fine.
- [ ] **Step 10: Verify the collection path returns expanding + job_id**
```bash
COLL_ID="<real-PZ-collection-id>"
curl -sS -X POST http://100.114.205.53:8801/api/sort \
-H 'Content-Type: application/json' \
-d "{\"input\":\"https://steamcommunity.com/sharedfiles/filedetails/?id=$COLL_ID\"}" \
| jq '{status, has_job_id: (has("job_id"))}'
```
Expected: `{"status":"expanding","has_job_id":true}`.
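Steps 8 to 10 exercise the three response shapes of the now-polymorphic `/api/sort`. From a client's point of view the only discriminator is the presence of `job_id`. A small sketch of that branch, assuming a helper named `classify_sort_response` (hypothetical) and a made-up placeholder UUID in the sample body:

```python
from typing import Any, Dict, Optional, Tuple


def classify_sort_response(body: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Return ("async", job_id) when the server handed back a job to poll,
    otherwise ("sync", None) for the all-cached fast path."""
    job_id = body.get("job_id")
    if job_id:
        return "async", str(job_id)
    return "sync", None


# Sync fast path: shape taken from the Step 8 expectation.
sync_body = {"status": "success", "MODS_LINE": "modoptions;tsarslib;TMC_TrueActions"}
# Async path: the job_id value here is a placeholder, not a real job.
async_body = {"status": "expanding", "job_id": "00000000-0000-0000-0000-000000000001"}

sync_kind = classify_sort_response(sync_body)
async_kind = classify_sort_response(async_body)
```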
- [ ] **Step 11: Checkpoint** - `/api/sort` polymorphism live; jobs being created. Polling endpoint is Task 7.
---
## Task 7: `GET` + `DELETE /api/jobs/{job_id}`
**Files:**
- Modify: `/opt/sortof/api/app.py` (add two endpoints near the existing routes)
- [ ] **Step 1: Backup (if more than ~15 minutes since the Task 6 backup)**
```bash
cp /opt/sortof/api/app.py /opt/sortof/api/app.py.bak-$(date +%Y%m%d-%H%M)
```
- [ ] **Step 2: Add the GET endpoint**
Find the end of `sort_endpoint` (the function body ends with `return payload`). Below it, insert:
```python
@app.get("/api/jobs/{job_id}")
async def get_job_endpoint(job_id: str, request: Request) -> Dict[str, Any]:
    pool = request.app.state.db
    async with pool.acquire() as conn:
        row = await jobs.get_job_row(conn, job_id)
        if row is None:
            raise HTTPException(status_code=404, detail="job not found or expired")
        wsids = list(row["wsids"]) if row["wsids"] else None
        counts = await jobs.compute_counts(conn, wsids or [])
        phase = jobs.derive_phase(row["phase"], wsids, counts)
        # If we just transitioned a non-terminal job to 'done', persist the
        # final result for future polls (and for the §3 24h TTL artifact).
        result_json = row["result_json"]
        if phase == "done" and row["phase"] != "done":
            result_json = await _build_result_for_job(conn, wsids, row["rules_raw"])
            await jobs.update_phase(
                conn, job_id, "done", result_json=result_json,
            )
        elif phase != "done" and wsids:
            # Compute a fresh partial result on every poll - cheap, avoids
            # staleness. Don't persist; only `done` writes result_json.
            result_json = await _build_result_for_job(conn, wsids, row["rules_raw"])
        return {
            "job_id": str(row["job_id"]),
            "phase": phase,
            "counts": counts,
            "wsids": wsids,
            "result": result_json,
            "failure_reason": row["failure_reason"],
        }
```
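The GET handler makes a three-way decision on every poll: persist a final result, compute a throwaway partial result, or return what the row already holds. A sketch of just that decision as a pure function, assuming a hypothetical helper name `poll_action` and string labels chosen here for illustration:

```python
def poll_action(stored_phase: str, derived_phase: str, has_wsids: bool) -> str:
    """Mirror the branch order in get_job_endpoint.

    "persist" -> build the final result and write it with phase 'done'
    "partial" -> build a fresh result for this response only
    "stored"  -> return whatever result_json the row already holds
    """
    if derived_phase == "done" and stored_phase != "done":
        return "persist"
    if derived_phase != "done" and has_wsids:
        return "partial"
    return "stored"


first_done_poll = poll_action("queued", "done", True)
later_done_poll = poll_action("done", "done", True)
mid_drain_poll = poll_action("queued", "draining", True)
still_expanding = poll_action("expanding", "expanding", False)
```

Only the first poll that observes `done` writes to `sort_jobs`; every later poll reads the persisted `result_json`.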
- [ ] **Step 3: Add the `_build_result_for_job` helper**
Just above the `_empty_payload` helper (around line 100), insert:
```python
async def _build_result_for_job(
    conn,
    wsids: Optional[List[str]],
    rules_raw: Optional[str],
) -> Dict[str, Any]:
    """Compute the SORTOF_DATA payload from currently-cached mod_parsed rows
    for the given wsids. Used both for partial results during draining and
    for the final result on phase transition to 'done'."""
    if not wsids:
        return _empty_payload([], "success")
    rows = await conn.fetch(
        """
        SELECT mp.workshop_id, mp.mod_id, mp.name, mp.category,
               mp.requirements, mp.load_after, mp.load_before,
               mp.incompatible_mods, mp.load_first, mp.load_last,
               mp.tags, mp.maps
        FROM mod_parsed mp
        JOIN workshop_meta wm ON wm.workshop_id = mp.workshop_id
        WHERE mp.workshop_id = ANY($1::text[])
          AND mp.parsed_at_time_updated = wm.time_updated
        ORDER BY mp.workshop_id, mp.mod_id
        """,
        wsids,
    )
    mods = [_row_to_modinfo(r) for r in rows]
    rules: Dict[str, Any] = {}
    if rules_raw:
        try:
            rules = parse_sorting_rules(rules_raw)
        except Exception:
            log.warning("job result: failed to parse sorting_rules")
    sort_result = sort_mods(mods, rules)
    # One workshop item can yield several mod_parsed rows; collapse to unique ids.
    cached_set = {r["workshop_id"] for r in rows}
    cached_ids = list(cached_set)
    payload = adapters.build_response(
        input_ids=wsids,  # contract: WORKSHOP_ITEMS_LINE = wsids[] at job creation
        hit_ids=cached_ids,
        mods=mods,
        sort_result=sort_result,
        status="success" if len(cached_ids) >= len(wsids) else "partial",
    )
    # Forced override: WORKSHOP_ITEMS_LINE locked to the original wsids[]
    # regardless of which are currently cached (Spec A §8 / Spec B+F §6).
    payload["WORKSHOP_ITEMS_LINE"] = (";".join(wsids) + ";") if wsids else ""
    payload["pending"] = [w for w in wsids if w not in cached_set]
    payload["unknown"] = []  # this endpoint doesn't compute Steam-result-9
    payload["non_mod"] = []  # nor non-mod classification - those are sync-path concerns
    return payload
```
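The last few lines of the helper are the part most worth checking by hand: `status`, `WORKSHOP_ITEMS_LINE` and `pending` all derive from the job's `wsids[]` and the set of currently cached IDs. A standalone sketch of that derivation with no database involved, assuming a hypothetical function name `summarize_job_cache` and made-up workshop IDs:

```python
from typing import Dict, List, Union


def summarize_job_cache(
    wsids: List[str], cached_ids: List[str]
) -> Dict[str, Union[str, List[str]]]:
    """Derive the three cache-dependent payload fields from wsids[]."""
    cached = set(cached_ids)
    return {
        # "success" only once every requested wsid has a cached row.
        "status": "success" if len(cached) >= len(wsids) else "partial",
        # Locked to the original order of wsids[], with a trailing ';'.
        "WORKSHOP_ITEMS_LINE": (";".join(wsids) + ";") if wsids else "",
        # Requested but not cached yet, in request order.
        "pending": [w for w in wsids if w not in cached],
    }


# Made-up IDs: two of three requested items are cached.
example = summarize_job_cache(["111", "222", "333"], ["333", "111"])
```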
- [ ] **Step 4: Add the DELETE endpoint**
Below the GET endpoint, insert:
```python
@app.delete("/api/jobs/{job_id}", status_code=204)
async def delete_job_endpoint(job_id: str, request: Request):
    """Cancel a job. Idempotent: cancelling a terminal job is a no-op 204.
    Does NOT touch download_jobs (Spec B+F §8)."""
    pool = request.app.state.db
    async with pool.acquire() as conn:
        row = await jobs.get_job_row(conn, job_id)
        if row is None:
            raise HTTPException(status_code=404, detail="job not found")
        if row["phase"] not in ("done", "failed"):
            await jobs.update_phase(conn, job_id, "failed", failure_reason="cancelled")
    return None
```
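The idempotency claim in the docstring comes down to one rule: terminal phases are left alone, everything else becomes `failed` / `cancelled`. A sketch of that transition as a pure function, assuming the hypothetical name `apply_cancel`:

```python
from typing import Optional, Tuple

TERMINAL_PHASES = ("done", "failed")


def apply_cancel(phase: str, failure_reason: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return the (phase, failure_reason) a job has after a DELETE."""
    if phase in TERMINAL_PHASES:
        # No-op: a finished or already-failed job keeps its state.
        return phase, failure_reason
    return "failed", "cancelled"


once = apply_cancel("draining", None)
twice = apply_cancel(*once)
finished = apply_cancel("done", None)
```

Applying the function twice gives the same state as applying it once, which is what the re-cancel check in Step 7 observes over HTTP.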
- [ ] **Step 5: py_compile + restart**
```bash
/opt/sortof/api/.venv/bin/python -m py_compile /opt/sortof/api/app.py && echo PY_OK
cd /opt/sortof/api && .venv/bin/python -c "import app" && echo IMPORT_OK
sudo systemctl restart sortof-api && sleep 2 && sudo systemctl is-active sortof-api
```
- [ ] **Step 6: Verify GET on a fresh collection job**
```bash
COLL_ID="<real-PZ-collection-id>"
JOB_RESP=$(curl -sS -X POST http://100.114.205.53:8801/api/sort \
-H 'Content-Type: application/json' \
-d "{\"input\":\"https://steamcommunity.com/sharedfiles/filedetails/?id=$COLL_ID\"}")
JID=$(echo "$JOB_RESP" | jq -r '.job_id')
echo "job_id=$JID"
# First poll, likely expanding
curl -sS http://100.114.205.53:8801/api/jobs/$JID | jq '{phase, counts, has_wsids: (.wsids != null)}'
# Wait for expansion + initial drain
sleep 3
curl -sS http://100.114.205.53:8801/api/jobs/$JID | jq '{phase, counts, n_wsids: (.wsids|length)}'
# 404 on garbage id
curl -sS -o /dev/null -w 'http=%{http_code}\n' http://100.114.205.53:8801/api/jobs/00000000-0000-0000-0000-000000000000
```
Expected: first poll has `phase: "expanding"`; second poll has `phase` in `(queued, draining, done)` with `n_wsids > 0`; the garbage id returns `http=404`.
- [ ] **Step 7: Verify DELETE on an active job**
```bash
# Submit a fresh job so we can cancel it before it drains
JID=$(curl -sS -X POST http://100.114.205.53:8801/api/sort \
-H 'Content-Type: application/json' \
-d "{\"input\":\"https://steamcommunity.com/sharedfiles/filedetails/?id=$COLL_ID\"}" \
| jq -r '.job_id')
# Cancel immediately
curl -sS -o /dev/null -w 'cancel=%{http_code}\n' -X DELETE http://100.114.205.53:8801/api/jobs/$JID
# Idempotent re-cancel
curl -sS -o /dev/null -w 'recancel=%{http_code}\n' -X DELETE http://100.114.205.53:8801/api/jobs/$JID
# Confirm phase
curl -sS http://100.114.205.53:8801/api/jobs/$JID | jq '{phase, failure_reason}'
```
Expected: `cancel=204`, `recancel=204`, `phase: "failed"`, `failure_reason: "cancelled"`.
- [ ] **Step 8: Checkpoint** - backend complete. Frontend wiring is Tasks 8-10.
---
## Task 8: Frontend - detect `job_id`, polling loop, partial-result rendering
**Files:**
- Modify: `/opt/sortof/frontend/sortof-app.jsx`
- [ ] **Step 1: Backup**
```bash
cp /opt/sortof/frontend/sortof-app.jsx /opt/sortof/frontend/sortof-app.jsx.bak-$(date +%Y%m%d-%H%M)
```
- [ ] **Step 2: Add a polling helper near other top-of-file helpers**
Find the `buildModsLine` function (around line 32). Below it (and **above** the `isRadioMode` / `defaultSelectionForBranches` block), add:
```jsx
// Spec B+F: poll a job. pollJobLoop resolves with the last poll outcome
// ({ kind: 'ok', body } on a terminal phase, or { kind: 'expired' | 'error' })
// or with { kind: 'aborted' } once the caller's AbortSignal fires. The caller
// aborts on unmount / cancel-button / new-sort.
const POLL_INTERVAL_MS = 2500;

async function pollJobOnce(jobId) {
  try {
    const res = await fetch(`/api/jobs/${jobId}`);
    if (res.status === 404) return { kind: 'expired' };
    if (!res.ok) return { kind: 'error', status: res.status };
    const json = await res.json();
    return { kind: 'ok', body: json };
  } catch (err) {
    // Network or JSON failure: report it so the loop's promise still resolves.
    return { kind: 'error', status: 0 };
  }
}

function pollJobLoop(jobId, signal, onTick) {
  // Returns a Promise that resolves on terminal phase or AbortSignal.
  return new Promise((resolve) => {
    let timer = null;
    async function tick() {
      if (signal.aborted) { if (timer) clearTimeout(timer); resolve({ kind: 'aborted' }); return; }
      const r = await pollJobOnce(jobId);
      if (signal.aborted) { resolve({ kind: 'aborted' }); return; }
      onTick(r);
      if (r.kind === 'expired' || r.kind === 'error') { resolve(r); return; }
      const phase = r.body.phase;
      if (phase === 'done' || phase === 'failed') { resolve(r); return; }
      timer = setTimeout(tick, POLL_INTERVAL_MS);
    }
    tick();
  });
}
```
- [ ] **Step 3: Add an AbortController ref + cancel-job state in App**
Find the App function's state declarations (search for `const [pzBuild, setPzBuild]`). Below the existing useState/useRef block but above the existing useEffects, add:
```jsx
const pollAbortRef = useRef(null);
const [activeJobId, setActiveJobId] = useState(null);
```
- [ ] **Step 4: Update `onSort` to branch on `job_id`**
Find the `async function onSort()` body. The current body POSTs to `/api/sort` and applies the response. Find the line that receives the response:
```jsx
const json = await res.json();
_liveSortData = json;
```
Insert a branch immediately before `_liveSortData = json`:
```jsx
const json = await res.json();
if (json.job_id) {
  // Async path - start polling and let the loop drive state.
  // Abort any in-flight previous poll.
  if (pollAbortRef.current) { pollAbortRef.current.abort(); }
  const ctrl = new AbortController();
  pollAbortRef.current = ctrl;
  setActiveJobId(json.job_id);
  pollJobLoop(json.job_id, ctrl.signal, (r) => {
    if (r.kind !== 'ok') return;
    const b = r.body;
    if (b.result) {
      _liveSortData = b.result;
      sortContextRef.current = {
        workshopItemsLine: (b.result.WORKSHOP_ITEMS_LINE) || '',
        originalQueued: (b.result.pending || []).length,
        unknown: b.result.unknown || [],
        nonMod: b.result.non_mod || [],
      };
    }
    setProgress(b.phase === 'expanding' ? 5 : Math.min(95, 10 + b.counts.cached));
    setCounts({
      cached: b.counts.cached,
      queued: b.counts.queued,
      parsing: b.counts.draining,
      warnings: ((b.result && b.result.WARNINGS) || []).length,
      unknown: ((b.result && b.result.unknown) || []).length,
      nonMod: ((b.result && b.result.non_mod) || []).length,
    });
    setState(b.phase); // 'expanding' | 'queued' | 'draining' | 'done' | 'failed'
  }).then((final) => {
    // A newer sort or the cancel button aborted this loop; that code path
    // owns activeJobId now, so don't clobber it from the stale loop.
    if (final.kind === 'aborted') return;
    setActiveJobId(null);
    if (final.kind === 'error') {
      // Non-404 HTTP or network failure: don't leave the strip stuck mid-phase.
      setState('error');
      return;
    }
    if (final.kind === 'expired') {
      setState('error');
      _liveSortData = {
        ...(_liveSortData || {}),
        WARNINGS: [
          ...((_liveSortData?.WARNINGS) || []),
          { tag: 'retry', level: 'red', msg: 'this job expired - re-submit' },
        ],
      };
    }
  });
  return;
}
// Sync fast path - existing code follows.
_liveSortData = json;
```
- [ ] **Step 5: Verify served file picks up the new symbols**
```bash
curl -sS http://100.114.205.53:8801/sortof-app.jsx | grep -cE 'pollJobLoop|pollJobOnce|activeJobId|pollAbortRef|/api/jobs/'
```
Expected: ≥ 6.
- [ ] **Step 6: Manual browser smoke (or curl-driven simulation)**
The implementer should open `https://sortof.indifferentketchup.com/` and submit a real PZ collection URL. Expect: status strip text changes from `expanding…` to `queued`/`draining` to `done`. Network tab shows `GET /api/jobs/<uuid>` calls every 2.5s. Output panel populates as mods land in `mod_parsed`.
If headless: replicate by curling `/api/sort` with a collection URL, capturing the `job_id`, and curling `/api/jobs/<id>` repeatedly until `phase` is `done`. Confirm the response's `result` field populates with `MODS_LINE` etc. once draining completes.
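For the headless route, the repeated curl can be scripted. A minimal stdlib-only sketch, assuming the same host as the curl recipes; the names `fetch_job` and `poll_until_terminal` and the `max_polls` cap are hypothetical choices, not part of the plan:

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE_URL = "http://100.114.205.53:8801"  # host used by the curl recipes
TERMINAL = ("done", "failed")


def fetch_job(job_id: str) -> Optional[Dict[str, Any]]:
    """GET /api/jobs/{job_id}; returns None on 404 (unknown or expired)."""
    try:
        with urllib.request.urlopen(f"{BASE_URL}/api/jobs/{job_id}") as resp:
            return json.load(resp)
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return None
        raise


def poll_until_terminal(
    job_id: str,
    fetch: Callable[[str], Optional[Dict[str, Any]]] = fetch_job,
    interval_s: float = 2.5,
    max_polls: int = 240,  # hypothetical safety cap, roughly 10 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, Any]]:
    """Poll until phase is done/failed. Returns the final body, or None if
    the job 404s or never reaches a terminal phase within max_polls."""
    for _ in range(max_polls):
        body = fetch(job_id)
        if body is None or body["phase"] in TERMINAL:
            return body
        sleep(interval_s)
    return None
```

Usage would be `final = poll_until_terminal(jid)` followed by a look at `final["result"]["MODS_LINE"]`; the `fetch` and `sleep` parameters exist so the loop can be exercised without a live server.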
- [ ] **Step 7: Checkpoint** - polling drives `_liveSortData` updates. Phase-specific status strip is Task 9.
---
## Task 9: Frontend - phase-specific status strip
**Files:**
- Modify: `/opt/sortof/frontend/sortof-app.jsx`
- [ ] **Step 1: Backup**
```bash
cp /opt/sortof/frontend/sortof-app.jsx /opt/sortof/frontend/sortof-app.jsx.bak-$(date +%Y%m%d-%H%M)
```
- [ ] **Step 2: Update `StatusStrip` to render phase-specific text**
Find the existing `function StatusStrip({ state, counts, progress })`. The existing function returns either an idle strip or a counts strip based on `state`. Replace its body with:
```jsx
function StatusStrip({ state, counts, progress }) {
  // Idle / terminal states - single pill summary.
  if (state === 'idle' || state === 'success' || state === 'error' || state === 'cold' || state === 'done' || state === 'failed') {
    return (
      <div className="status-strip">
        {/* Every idle/terminal state currently uses the same 'idle' pill class. */}
        <span className="status-pill idle">
          <span className="dot-led"></span>
          {state === 'idle' && 'ready when you are'}
          {state === 'success' && `done. ${counts.cached} mods, ${counts.warnings} warnings`}
          {state === 'done' && `done. ${counts.cached} mods, ${counts.warnings} warnings`}
          {state === 'error' && 'something went sideways'}
          {state === 'failed' && 'job failed'}
          {state === 'cold' && 'cache miss - be patient'}
        </span>
      </div>
    );
  }
  // 'expanding' phase - no useful counts yet.
  if (state === 'expanding') {
    return (
      <div className="status-strip">
        <span className="status-pill expanding">
          <span className="dot-led"></span>expanding collection
        </span>
        <div className="progress-bar"><i style={{ width: `${progress}%` }}></i></div>
      </div>
    );
  }
  // 'queued' or 'draining' - live counts. Existing 'partial'/'loading' too.
  return (
    <div className="status-strip">
      <span className="status-pill cached">
        <span className="dot-led"></span>{counts.cached} cached
      </span>
      <span className="status-pill queued">
        <span className="dot-led"></span>{counts.queued} queued
      </span>
      <span className="status-pill parse">
        <span className="dot-led"></span>{counts.parsing} draining
      </span>
      {counts.unknown > 0 && (
        <span className="status-pill unknown" title="Steam doesn't recognize these IDs (deleted, typo'd, or private)">
          <span className="dot-led"></span>{counts.unknown} unknown
        </span>
      )}
      {counts.nonMod > 0 && (
        <span className="status-pill nonmod" title="Workshop items that aren't loadable mods (collections, art, etc.)">
          <span className="dot-led"></span>{counts.nonMod} non-mod
        </span>
      )}
      <div className="progress-bar"><i style={{ width: `${progress}%` }}></i></div>
    </div>
  );
}
```
- [ ] **Step 3: Verify served**
```bash
curl -sS http://100.114.205.53:8801/sortof-app.jsx | grep -cE 'expanding collection|state === .expanding|state === .draining|state === .done|state === .failed'
```
Expected: ≥ 4.
- [ ] **Step 4: Manual browser smoke**
Submit a collection URL. Expect: the strip starts with `expanding collection`, transitions to live counts, and ends with the `done. N mods, …` summary.

---
## Task 10: Frontend - Cancel button + 404 expired-job handling + CSS
**Files:**
- Modify: `/opt/sortof/frontend/sortof-app.jsx`
- Modify: `/opt/sortof/frontend/index.html`
- [ ] **Step 1: Backup**
```bash
cp /opt/sortof/frontend/sortof-app.jsx /opt/sortof/frontend/sortof-app.jsx.bak-$(date +%Y%m%d-%H%M)
cp /opt/sortof/frontend/index.html /opt/sortof/frontend/index.html.bak-$(date +%Y%m%d-%H%M)
```
- [ ] **Step 2: Render a Cancel button when a job is active**
Find where the Sort button is rendered (search for `sort-btn` and `onClick={onSort}`). It's inside the left column. Below the existing Sort button JSX, add:
```jsx
{activeJobId && (
  <button
    className="cancel-btn"
    onClick={async () => {
      if (pollAbortRef.current) pollAbortRef.current.abort();
      try {
        await fetch(`/api/jobs/${activeJobId}`, { method: 'DELETE' });
      } catch {}
      setActiveJobId(null);
      setState('idle');
      setProgress(0);
    }}
  >cancel</button>
)}
```
- [ ] **Step 3: CSS for `.status-pill.expanding` and `.cancel-btn`**
Open `/opt/sortof/frontend/index.html`. Find the `.status-pill.nonmod` rule (added during the unknown/non-mod feature). Below it, add:
```css
.status-pill.expanding { color: var(--acc-blue); }
.status-pill.expanding .dot-led { background: var(--acc-blue); animation: bl 1.2s ease-in-out infinite; }
.cancel-btn {
  appearance: none;
  width: 100%;
  height: 32px;
  margin-top: 6px;
  border: 1px solid var(--line);
  background: transparent;
  color: var(--fg-2);
  border-radius: var(--radius);
  font-family: var(--mono);
  font-size: 12px;
  cursor: pointer;
  transition: color .12s, border-color .12s, background .12s;
}
.cancel-btn:hover { color: var(--acc-red); border-color: var(--acc-red); background: var(--acc-red-bg); }
```
- [ ] **Step 4: Verify served**
```bash
curl -sS http://100.114.205.53:8801/sortof-app.jsx | grep -c 'cancel-btn'
curl -sS http://100.114.205.53:8801/ | grep -cE '\.status-pill\.expanding|\.cancel-btn'
```
Expected: ≥ 1 (jsx), ≥ 2 (CSS).
- [ ] **Step 5: Manual browser smoke**
Submit a fresh cold collection. While the strip reads `expanding` or `draining`, click cancel. Expect: strip clears, `setActiveJobId(null)` fires, no further GET polls in the network tab.

---
## Task 11: Spec §11 acceptance + §12 test recipes
For each item, document expected vs actual. If any fails, return to the relevant task.
- [ ] **Step 1: §11.1 Sync fast path** - bare wsids, all cached.
```bash
curl -sS -X POST http://100.114.205.53:8801/api/sort \
-H 'Content-Type: application/json' \
-d '{"input":"2169435993;2392709985;2487022075"}' | jq '{has_job_id: (has("job_id")), MODS_LINE}'
```
Expected: `{"has_job_id": false, "MODS_LINE": "modoptions;tsarslib;TMC_TrueActions"}`.
- [ ] **Step 2: §11.2 Async path on uncached bare wsid**
```bash
sudo docker exec -i sortof_db psql -U sortof -d sortof -c "DELETE FROM mod_parsed WHERE workshop_id='2196102849'; DELETE FROM workshop_meta WHERE workshop_id='2196102849';"
curl -sS -X POST http://100.114.205.53:8801/api/sort -H 'Content-Type: application/json' -d '{"input":"2196102849"}' | jq '{status, has_job_id: (has("job_id"))}'
```
Expected: `{"status":"queued","has_job_id":true}`.
- [ ] **Step 3: §11.3 Collection URL → expanding**
```bash
COLL_ID="<paste-real-PZ-collection-id>"
curl -sS -X POST http://100.114.205.53:8801/api/sort \
-H 'Content-Type: application/json' \
-d "{\"input\":\"https://steamcommunity.com/sharedfiles/filedetails/?id=$COLL_ID\"}" \
| jq '{status, has_job_id: (has("job_id"))}'
```
Expected: `{"status":"expanding","has_job_id":true}`.
- [ ] **Step 4: §11.4 GET on bogus job → 404**
```bash
curl -sS -o /dev/null -w 'http=%{http_code}\n' http://100.114.205.53:8801/api/jobs/00000000-0000-0000-0000-000000000000
```
Expected: `http=404`.
- [ ] **Step 5: §11.5 DELETE → idempotent 204**
```bash
JID=$(curl -sS -X POST http://100.114.205.53:8801/api/sort -H 'Content-Type: application/json' -d "{\"input\":\"https://steamcommunity.com/sharedfiles/filedetails/?id=$COLL_ID\"}" | jq -r '.job_id')
curl -sS -o /dev/null -w 'cancel1=%{http_code}\n' -X DELETE http://100.114.205.53:8801/api/jobs/$JID
curl -sS -o /dev/null -w 'cancel2=%{http_code}\n' -X DELETE http://100.114.205.53:8801/api/jobs/$JID
curl -sS http://100.114.205.53:8801/api/jobs/$JID | jq '{phase, failure_reason}'
```
Expected: `cancel1=204`, `cancel2=204`, `{"phase":"failed","failure_reason":"cancelled"}`.
- [ ] **Step 6: §11.6 Steam URL detection + GetCollectionDetails routing**
Already exercised by Step 3 of this task. Confirm via the journal:
```bash
sudo journalctl -u sortof-api --since "2 min ago" | grep -E 'GetCollectionDetails|expansion'
```
Expected: at least one entry naming the collection ID.
- [ ] **Step 7: §11.7 Cache hit on second submit (within 6h)**
Re-submit the same collection URL. Confirm that the response returns quickly and that `journalctl` for the new request does NOT show a fresh `GetCollectionDetails` call (the expansion task's cache hit short-circuits the API call). Implementation note: a second submit creates a new `sort_jobs` row but reuses the cached children.
- [ ] **Step 8: §11.8 Partial-collection failure**
```bash
# Combine real + bogus collection URL
curl -sS -X POST http://100.114.205.53:8801/api/sort \
-H 'Content-Type: application/json' \
-d "{\"input\":\"https://steamcommunity.com/sharedfiles/filedetails/?id=$COLL_ID\nhttps://steamcommunity.com/sharedfiles/filedetails/?id=99999999\"}" | jq '.job_id'
```
Then poll the returned job until `phase=done`, and check `result.WARNINGS`:
```bash
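# Replace <jid> with the job_id printed above (quoted so the shell does not treat < > as redirects).
curl -sS "http://100.114.205.53:8801/api/jobs/<jid>" | jq '.result.WARNINGS[] | select(.tag=="collection-partial")'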
```
Expected: one entry with `msg` mentioning `99999999`.
- [ ] **Step 9: §11.9 All collections fail**
```bash
curl -sS -X POST http://100.114.205.53:8801/api/sort \
-H 'Content-Type: application/json' \
-d '{"input":"https://steamcommunity.com/sharedfiles/filedetails/?id=99999999"}' | jq -r '.job_id' | tee /tmp/jid_test
sleep 3
curl -sS http://100.114.205.53:8801/api/jobs/$(cat /tmp/jid_test) | jq '{phase, failure_reason}'
```
Expected: `{"phase":"failed","failure_reason":"all input collections unresolvable"}`.
- [ ] **Step 10: §11.10 Stale-expansion sweep on restart**
```bash
# Manually create a stale expansion row
# -qAt: quiet, unaligned, tuples-only, so the only output line is the new job_id.
sudo docker exec -i sortof_db psql -qAt -U sortof -d sortof -c "
INSERT INTO sort_jobs (phase, phase_started_at, input_raw, collection_ids)
VALUES ('expanding', now() - interval '15 minutes', 'sweep test', ARRAY['99999999'])
RETURNING job_id;" | xargs -I{} echo "stale_jid={}"
sudo systemctl restart sortof-api && sleep 3
sudo docker exec -i sortof_db psql -U sortof -d sortof -c "
SELECT phase, failure_reason FROM sort_jobs WHERE input_raw='sweep test';"
```
Expected: phase=`failed`, failure_reason=`expansion timed out`.
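The recipe only establishes that a row stuck in `expanding` for 15 minutes is swept on restart. A sketch of the kind of predicate `sweep_stale_expansions` would need, where the 10-minute threshold and the function name `is_stale_expansion` are assumptions made for illustration (the real cutoff lives in Task 4 and the spec):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold; the recipe only shows a 15-minute-old row being swept.
STALE_AFTER = timedelta(minutes=10)


def is_stale_expansion(
    phase: str,
    phase_started_at: datetime,
    now: datetime,
    stale_after: timedelta = STALE_AFTER,
) -> bool:
    """True when a job has sat in 'expanding' longer than stale_after."""
    return phase == "expanding" and (now - phase_started_at) > stale_after


_now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
stale_row = is_stale_expansion("expanding", _now - timedelta(minutes=15), _now)
fresh_row = is_stale_expansion("expanding", _now - timedelta(minutes=1), _now)
draining_row = is_stale_expansion("draining", _now - timedelta(hours=2), _now)
```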
- [ ] **Step 11: §11.11 Counts contract - sum equals total minus non_mod and unknown**
After a cold collection drains, sum the live counts and compare to wsids count:
```bash
JID="<some-active-job>"
curl -sS http://100.114.205.53:8801/api/jobs/$JID | jq '
.counts as $c | .wsids as $w | {
sum: ($c.cached + $c.queued + $c.draining),
total_wsids: ($w | length),
delta: (($w | length) - ($c.cached + $c.queued + $c.draining))
}'
```
Expected: `delta` is the count of wsids that ended up `non_mod` or `unknown` (not in any of the three count buckets) - typically 0 or a small integer.
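The same arithmetic as the `jq` filter, written out so it can be applied to a saved response. This is a sketch under the assumption that the body has the `counts` and `wsids` fields returned by the GET endpoint; `counts_delta` is a hypothetical name and the sample numbers are made up:

```python
from typing import Any, Dict


def counts_delta(body: Dict[str, Any]) -> Dict[str, int]:
    """Compare the three live buckets against the job's wsids[] length."""
    counts = body["counts"]
    wsids = body.get("wsids") or []  # wsids is null while still expanding
    bucketed = counts["cached"] + counts["queued"] + counts["draining"]
    return {
        "sum": bucketed,
        "total_wsids": len(wsids),
        "delta": len(wsids) - bucketed,
    }


# Made-up response: five wsids, one of which fell outside all three buckets.
sample = {
    "counts": {"cached": 3, "queued": 1, "draining": 0},
    "wsids": ["1", "2", "3", "4", "5"],
}
report = counts_delta(sample)
```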
- [ ] **Step 12: §11.12 WORKSHOP_ITEMS_LINE locked at job creation**
After cancellation or partial failure, the final `result.WORKSHOP_ITEMS_LINE` from `/api/jobs/<id>` must equal `wsids[]` joined by `;` (plus the trailing `;` that `_build_result_for_job` appends), regardless of how many landed in `non_mod` / `unknown`. Spot-check by comparing the two.
- [ ] **Step 13: Public-hostname mirror**
```bash
curl -sS -X POST https://sortof.indifferentketchup.com/api/sort \
-H 'Content-Type: application/json' \
-d '{"input":"2169435993;2392709985;2487022075"}' | jq '{status, MODS_LINE}'
```
Expected: success, canonical MODS_LINE. Public side mirrors all backend behavior.
- [ ] **Step 14: Final regression** - re-run the canonical 3-mod sync sort once more and confirm the response shape did not gain a `job_id` field.
---
## Self-review (already applied)
- **Spec coverage:** §1 (overview) - Tasks 1-7 cover backend, 8-10 cover frontend. §2 (API contract) - Tasks 6-7. §3 (schema) - Task 1. §4 (phase machine) - `derive_phase` in Task 4. §5 (Steam expansion) - Tasks 3, 5. §6 (counts contract) - `compute_counts` in Task 4, applied in Task 7. §7 (frontend) - Tasks 8-10. §8 (cancellation) - Task 7 step 4 + Task 10. §9 (restart resilience) - `sweep_stale_expansions` in Task 4 + lifespan wiring in Task 6. §10 (open questions) - locked in spec; plan implements verbatim. §11/§12 - Task 11.
- **Placeholders:** all `<paste-real-…>` markers explicitly call out the implementer-action; no TBDs.
- **Type consistency:** `wsids` is `List[str] | None` everywhere; `counts` is `{cached: int, queued: int, draining: int}` everywhere; `phase` is the locked enum throughout. The `pollJobLoop` callback receives `r.body` in the same shape the GET endpoint returns.
- **No git, by design:** every code-changing task starts with a `cp file file.bak-$(date)` step in lieu of a commit. The schema migration in Task 1 is idempotent (`CREATE TABLE IF NOT EXISTS`) so no rollback file is needed.
- **Restart-vs-no-restart:** backend tasks (1, 4, 5, 6, 7) end with `sudo systemctl restart sortof-api`. Frontend tasks (8, 9, 10) end with `curl-grep` only - `StaticFiles` serves from disk; hard-refresh in browser. Worker is unchanged across the entire feature.