<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Mode C: Batch Analytics Report</title>
  <style>
    :root {
      --bg: #fafbfc;
      --paper: #ffffff;
      --text: #1a1d21;
      --text-muted: #57606a;
      --accent: #0969da;
      --accent-soft: #ddf4ff;
      --border: #d0d7de;
      --table-stripe: #f6f8fa;
      --code-bg: #f0f2f5;
      --font-sans: 'Segoe UI', system-ui, -apple-system, sans-serif;
      --font-mono: ui-monospace, 'Cascadia Code', 'Source Code Pro', monospace;
      --radius: 6px;
      --shadow: 0 1px 3px rgba(0,0,0,.06);
    }

    * { box-sizing: border-box; }
    body {
      margin: 0;
      padding: 2rem 1.5rem 3rem;
      font-family: var(--font-sans);
      font-size: 15px;
      line-height: 1.6;
      color: var(--text);
      background: var(--bg);
    }

    .report {
      max-width: 820px;
      margin: 0 auto;
      background: var(--paper);
      padding: 2.5rem 3rem;
      border-radius: var(--radius);
      box-shadow: var(--shadow);
    }

    .report-header {
      border-bottom: 2px solid var(--border);
      padding-bottom: 1.5rem;
      margin-bottom: 2rem;
    }
    .report-header h1 {
      margin: 0 0 0.5rem;
      font-size: 1.75rem;
      font-weight: 600;
      color: var(--text);
    }
    .report-meta {
      font-size: 0.9rem;
      color: var(--text-muted);
    }
    .report-meta p { margin: 0.25rem 0; }

    h2 {
      font-size: 1.2rem;
      font-weight: 600;
      margin: 2rem 0 1rem;
      padding-bottom: 0.35rem;
      color: var(--text);
      border-bottom: 1px solid var(--border);
    }
    h2:first-of-type { margin-top: 0; }

    p { margin: 0.75rem 0; }
    .narrative, .recommendation { font-style: normal; }
    strong { font-weight: 600; }

    table {
      width: 100%;
      border-collapse: collapse;
      font-size: 0.9rem;
      margin: 1rem 0;
      border-radius: var(--radius);
      overflow: hidden;
      box-shadow: 0 1px 2px rgba(0,0,0,.05);
    }
    th, td {
      padding: 0.6rem 1rem;
      text-align: left;
      border: 1px solid var(--border);
    }
    th {
      background: var(--table-stripe);
      font-weight: 600;
      color: var(--text);
    }
    tr:nth-child(even) { background: var(--table-stripe); }
    tr:hover { background: #eef2f7; }

    pre, code {
      font-family: var(--font-mono);
      font-size: 0.85em;
    }
    pre {
      background: var(--code-bg);
      padding: 1rem 1.25rem;
      border-radius: var(--radius);
      overflow-x: auto;
      margin: 1rem 0;
      border: 1px solid var(--border);
    }
    code { padding: 0.15em 0.4em; background: var(--code-bg); border-radius: 4px; }

    ul { margin: 0.75rem 0; padding-left: 1.5rem; }
    li { margin: 0.35rem 0; }

    hr {
      border: none;
      border-top: 1px solid var(--border);
      margin: 2rem 0;
    }

    @media print {
      body { background: #fff; padding: 0; }
      .report {
        max-width: none;
        box-shadow: none;
        padding: 0;
      }
      h2 { page-break-after: avoid; }
      table { page-break-inside: avoid; }
    }
  </style>
</head>
<body>
  <div class="report">
    <header class="report-header">
      <h1>Mode C: Batch analytics report</h1>
      <div class="report-meta">
        <p><strong>Source:</strong> <code>Discord Ticket Transcripts/Drive2/</code></p>
        <p><strong>Computed:</strong> From transcript HTML (metadata + decoded base64 message payloads).</p>
        <p><strong>Guide:</strong> Part 1 Analysis (Transcript analytics schemas, Broccolini support section).</p>
        <p><strong>Tool:</strong> <code>scripts/batch_transcript_analytics.py</code></p>
      </div>
      <p style="margin-top: 1rem;">Analytics below are <strong>per‑ticket and aggregate</strong> across 722 transcripts. Dimensions that require full Mode A extraction (issue categories, tags, wiki success/failure, intake gaps, frequency/impact, resolution status, email forgotten/misspelled) are noted; tables use parser-derived data where available.</p>
    </header>

    <h2>1. Volume and scope</h2>
    <table>
      <thead><tr><th>Metric</th><th>Value</th></tr></thead>
      <tbody>
        <tr><td>Total tickets</td><td>722</td></tr>
        <tr><td>Transcripts with parse errors</td><td>0</td></tr>
        <tr><td>Tickets with "Ticket closed" / "Transcript saving" in payload</td><td>722 (100%)</td></tr>
        <tr><td>Tickets with claimed channel name (staff claimed)</td><td>671 (93%)</td></tr>
        <tr><td>Tickets with escalation mentioned in text</td><td>1</td></tr>
      </tbody>
    </table>
    <p class="narrative"><strong>Narrative:</strong> The Drive2 batch parses without errors. All 722 tickets show a close/saving event, and 93% have a claimed channel name, indicating that staff claimed most tickets before closure. Escalations are rare in this set: only one ticket mentions one.</p>

    <hr />

    <h2>2. Game detection and game_or_server (heuristic)</h2>
    <p>The decoded form fields and messages of each ticket are scanned as one text buffer for canonical game names and aliases (Part 1 Analysis §5). The <code>game_or_server</code> routing buckets (Valheim | Rust main | MC modded | MC vanilla | Other) require Mode A.</p>
    <table>
      <thead><tr><th>game_detected (heuristic)</th><th>Count</th><th>%</th></tr></thead>
      <tbody>
        <tr><td>Project Zomboid</td><td>257</td><td>35.6</td></tr>
        <tr><td>Minecraft</td><td>179</td><td>24.8</td></tr>
        <tr><td>Satisfactory</td><td>79</td><td>10.9</td></tr>
        <tr><td>Palworld</td><td>46</td><td>6.4</td></tr>
        <tr><td>Not Mentioned</td><td>45</td><td>6.2</td></tr>
        <tr><td>Enshrouded</td><td>18</td><td>2.5</td></tr>
        <tr><td>ARK: Survival Evolved</td><td>18</td><td>2.5</td></tr>
        <tr><td>7 Days to Die</td><td>17</td><td>2.4</td></tr>
        <tr><td>Valheim</td><td>14</td><td>1.9</td></tr>
        <tr><td>DayZ</td><td>10</td><td>1.4</td></tr>
        <tr><td>FiveM</td><td>8</td><td>1.1</td></tr>
        <tr><td>Core Keeper</td><td>6</td><td>0.8</td></tr>
        <tr><td>Vintage Story</td><td>5</td><td>0.7</td></tr>
        <tr><td>Rust</td><td>5</td><td>0.7</td></tr>
        <tr><td>Factorio</td><td>5</td><td>0.7</td></tr>
        <tr><td>Necesse</td><td>4</td><td>0.6</td></tr>
        <tr><td>V Rising</td><td>3</td><td>0.4</td></tr>
        <tr><td>ECO</td><td>3</td><td>0.4</td></tr>
      </tbody>
    </table>
    <p class="narrative"><strong>Narrative:</strong> Project Zomboid and Minecraft dominate; Satisfactory and Palworld are next. About 6% have no game detected from text. Full <code>game_or_server</code> (MC modded vs vanilla, Rust main, etc.) needs per‑ticket Mode A extraction.</p>
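    <p>The alias scan described in this section could look roughly like the sketch below. It is an illustration, not the logic in <code>scripts/batch_transcript_analytics.py</code>: the <code>GAME_ALIASES</code> table, its alias lists, and the <code>detect_game</code> function are hypothetical, and the real canonical names and aliases are defined in Part 1 Analysis §5.</p>
    <pre>
```python
import re

# Hypothetical alias table; the real names and aliases are defined in Part 1 Analysis.
GAME_ALIASES = {
    "Project Zomboid": ["project zomboid", "zomboid", "pz"],
    "Minecraft": ["minecraft", "mc"],
    "7 Days to Die": ["7 days to die", "7dtd"],
}


def detect_game(text):
    """Return the first canonical game whose name or alias appears as a whole word."""
    lowered = text.lower()
    for game, aliases in GAME_ALIASES.items():
        for alias in aliases:
            # Word boundaries keep short aliases such as "mc" from matching inside other words.
            if re.search(r"\b" + re.escape(alias) + r"\b", lowered):
                return game
    return "Not Mentioned"


print(detect_game("My PZ server keeps restarting"))
print(detect_game("billing question"))
```
</pre>
    <p>Matching on word boundaries is one way to limit false positives from short aliases; a ticket that names several games would still be assigned to whichever entry is checked first.</p>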

    <hr />

    <h2>3. Issue categories and tags</h2>
    <p><strong>Issue categories</strong> (Availability, Connectivity, Billing, Data/saves, Configuration/mods) and <strong>TICKET_TAGS</strong> (Server Down, Stuck Restarting, Can't Connect, Server Lag, Billing, Refund Request, Mod Help, Backup Restore, World/Save, Server Config) require Mode A extraction from each transcript. No aggregate table is computed from the batch parser.</p>
    <p class="recommendation"><strong>Recommendation:</strong> Run Mode A on all transcripts (or a sample), then aggregate <code>issue_types</code> and suggested tags into counts and top tags per game.</p>
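    <p>Once Mode A output exists, the aggregation recommended above might be sketched as follows. The record layout is an assumption: <code>issue_types</code> comes from this report, while the <code>suggested_tags</code> and <code>game_detected</code> keys and the sample records are hypothetical and should be replaced with whatever the schema document specifies.</p>
    <pre>
```python
from collections import Counter, defaultdict

# Hypothetical Mode A records; real field names should follow the schema document.
records = [
    {"game_detected": "Minecraft", "issue_types": ["Connectivity"], "suggested_tags": ["Can't Connect"]},
    {"game_detected": "Minecraft", "issue_types": ["Configuration/mods"], "suggested_tags": ["Mod Help", "Server Config"]},
    {"game_detected": "Valheim", "issue_types": ["Availability"], "suggested_tags": ["Server Down"]},
    {"game_detected": "Minecraft", "issue_types": ["Connectivity"], "suggested_tags": ["Can't Connect"]},
]

issue_counts = Counter()
tags_by_game = defaultdict(Counter)
for rec in records:
    # A ticket may carry several issue types and tags, so count each one.
    issue_counts.update(rec["issue_types"])
    tags_by_game[rec["game_detected"]].update(rec["suggested_tags"])

print(issue_counts.most_common())
for game, tags in tags_by_game.items():
    print(game, tags.most_common(2))
```
</pre>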

    <hr />

    <h2>4. Message count and conversation shape</h2>
    <table>
      <thead><tr><th>Messages (header)</th><th>Number of tickets</th></tr></thead>
      <tbody>
        <tr><td>3–6</td><td>151</td></tr>
        <tr><td>7–10</td><td>128</td></tr>
        <tr><td>11–15</td><td>95</td></tr>
        <tr><td>16–22</td><td>88</td></tr>
        <tr><td>23–35</td><td>78</td></tr>
        <tr><td>36–60</td><td>45</td></tr>
        <tr><td>61+</td><td>137</td></tr>
      </tbody>
    </table>
    <p>Summary stats (from header): min 3, max 356 messages per ticket; 462 of 722 tickets (64%) fall in the 3–22 range. Back‑and‑forth turns, duration, and "staff asked for more info repeatedly" require Mode A.</p>
    <p class="narrative"><strong>Narrative:</strong> Conversation length skews short: 279 tickets (39%) have 3–10 messages and 261 (36%) have 11–35. The 61+ bucket is still substantial at 137 tickets (19%), likely complex or multi-step resolutions.</p>
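    <p>The bucketing used in the table above can be reproduced with a small helper. This is a sketch under assumptions: the bucket edges are copied from the table, while <code>bucket_label</code> and the sample counts are hypothetical and not taken from the batch parser.</p>
    <pre>
```python
from collections import Counter

# Bucket edges copied from the table above; None marks the open-ended last bucket.
BUCKETS = [
    (3, 6, "3-6"),
    (7, 10, "7-10"),
    (11, 15, "11-15"),
    (16, 22, "16-22"),
    (23, 35, "23-35"),
    (36, 60, "36-60"),
    (61, None, "61+"),
]


def bucket_label(message_count):
    """Map a header message count to its bucket label."""
    for low, high, label in BUCKETS:
        if message_count >= low and (high is None or high >= message_count):
            return label
    # Counts below the first bucket did not occur in this batch (minimum was 3).
    return "under 3"


# Hypothetical per-ticket message counts, for illustration only.
sample_counts = [3, 9, 14, 22, 61, 356]
distribution = Counter(bucket_label(n) for n in sample_counts)
print(distribution)
```
</pre>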

    <hr />

    <h2>5. Attachments (saved / skipped)</h2>
    <table>
      <thead><tr><th>Attachments saved</th><th>Number of tickets</th></tr></thead>
      <tbody>
        <tr><td>0</td><td>325</td></tr>
        <tr><td>1</td><td>169</td></tr>
        <tr><td>2</td><td>81</td></tr>
        <tr><td>3</td><td>55</td></tr>
        <tr><td>4</td><td>30</td></tr>
        <tr><td>5+</td><td>62</td></tr>
      </tbody>
    </table>
    <table>
      <thead><tr><th>Attachments skipped</th><th>Number of tickets</th></tr></thead>
      <tbody>
        <tr><td>0</td><td>695</td></tr>
        <tr><td>1</td><td>16</td></tr>
        <tr><td>2</td><td>6</td></tr>
        <tr><td>3</td><td>3</td></tr>
        <tr><td>4</td><td>2</td></tr>
      </tbody>
    </table>
    <p class="narrative"><strong>Narrative:</strong> About 55% of tickets (397 of 722) have at least one attachment saved, and 96% (695) have none skipped. Skipped reasons and mentions_screenshots/clips/logs require Mode A.</p>

    <hr />

    <h2>6. Staff involvement (from payload)</h2>
    <p>Staff identified by Broccolini support user IDs (Part 1 Analysis §10.1) appearing in message payloads.</p>
    <table>
      <thead><tr><th>Staff involved (count per ticket)</th><th>Number of tickets</th></tr></thead>
      <tbody>
        <tr><td>0</td><td>152</td></tr>
        <tr><td>1</td><td>545</td></tr>
        <tr><td>2</td><td>25</td></tr>
      </tbody>
    </table>
    <p class="narrative"><strong>Narrative:</strong> Most tickets (545, about 75%) have exactly one staff member in the payload; 152 have no staff ID in messages (e.g. Ticket Tool–only or unclaimed). "Tickets claimed per member" and first‑response time need claim/unclaim message parsing: the channel name gives claim attribution, but full workload per member needs Mode A or claim-event parsing.</p>
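    <p>Counting staff per ticket from a base64 payload could be sketched as below. The payload layout is an assumption made for illustration (a JSON list of messages with a <code>user_id</code> field), and <code>STAFF_IDS</code> and <code>staff_in_payload</code> are hypothetical; the actual transcript format and the staff ID list in Part 1 Analysis §10.1 may differ.</p>
    <pre>
```python
import base64
import json

# Hypothetical staff IDs; the real list is in Part 1 Analysis.
STAFF_IDS = {"111111111111111111", "222222222222222222"}


def staff_in_payload(encoded_payload):
    """Decode a base64 payload and return the staff IDs that authored messages in it."""
    # Assumption: the payload decodes to a JSON list of messages with a "user_id" field.
    messages = json.loads(base64.b64decode(encoded_payload).decode("utf-8"))
    authors = {msg["user_id"] for msg in messages}
    return authors.intersection(STAFF_IDS)


# Build a sample payload the same way, so the round trip can be checked.
sample = base64.b64encode(json.dumps([
    {"user_id": "999999999999999999", "content": "server is down"},
    {"user_id": "111111111111111111", "content": "looking into it"},
]).encode("utf-8"))

print(len(staff_in_payload(sample)))
```
</pre>
    <p>The length of the returned set corresponds to the "staff involved" count per ticket in the table above.</p>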

    <hr />

    <h2>7. User count (participants per ticket)</h2>
    <table>
      <thead><tr><th>User count (header)</th><th>Number of tickets</th></tr></thead>
      <tbody>
        <tr><td>2</td><td>70</td></tr>
        <tr><td>3</td><td>582</td></tr>
        <tr><td>4</td><td>44</td></tr>
        <tr><td>5</td><td>8</td></tr>
        <tr><td>6</td><td>2</td></tr>
      </tbody>
    </table>
    <p class="narrative"><strong>Narrative:</strong> Most tickets (582) have 3 participants (requester + 1 staff + Ticket Tool); 4+ participants suggest multiple staff or extra users in the thread. The rows above total 706, so 16 of the 722 tickets are not represented in this table.</p>

    <hr />

    <h2>8. Wiki usage and wiki‑linked outcomes</h2>
    <p><strong>wiki_articles_posted</strong>, <strong>wiki_solved_issue</strong> (true / false / unclear), and staff‑linked outcomes (user_wanted_broccolini_to_do_it, user_wanted_broccolini_but_walkthrough) require Mode A extraction. No aggregate table from the batch parser.</p>
    <p class="recommendation"><strong>Recommendation:</strong> After Mode A, aggregate: (1) tickets where wiki_solved_issue = true / false / unclear; (2) per support member: wiki posts that solved vs did not, "do it for me" vs walkthrough counts.</p>

    <hr />

    <h2>9. Email analytics</h2>
    <p>In this run the parser did not find an "Account Email" label and an email address in the same decoded block for any ticket. <strong>Email analytics</strong> (email_forgotten, email_misspelled, email_didnt_link, email_corrected) require Mode A extraction from form embeds and message text.</p>

    <hr />

    <h2>10. Frequency / impact distributions</h2>
    <p><strong>frequency</strong> (once | sometimes | every_time | unclear) and <strong>impact</strong> (minor | moderate | severe | blocked | unclear) require inference from transcript wording (Mode A). No aggregate table from the batch parser.</p>

    <hr />

    <h2>11. Resolution patterns</h2>
    <p>From parser: all 722 tickets contain "Ticket closed" or "Transcript saving" in the payload. <strong>status</strong> (resolved | unresolved | escalated | unclear) and <strong>relied_on</strong> (logs | mod_updates | staff_action | other) require Mode A. One ticket mentions escalation in text.</p>
    <p class="narrative"><strong>Narrative:</strong> All transcripts represent closed/saved tickets; resolution outcome and what resolution relied on need per‑ticket extraction.</p>

    <hr />

    <h2>12. Intake gaps</h2>
    <p>Per‑ticket intake_gaps (account_contact, issue_type, reproduction, environment, attachments, priority, rules) each as complete | partial | missing require Mode A. No aggregate table from the batch parser.</p>
    <p class="recommendation"><strong>Recommendation:</strong> After Mode A, report % complete / partial / missing per dimension to target form and template improvements.</p>
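    <p>The per-dimension percentages recommended above might be computed along these lines. The dimension and level names come from this section; the record shape (an <code>intake_gaps</code> dict per ticket), the <code>intake_gap_percentages</code> function, and the sample records are assumptions for illustration.</p>
    <pre>
```python
from collections import Counter

DIMENSIONS = ["account_contact", "issue_type", "reproduction", "environment",
              "attachments", "priority", "rules"]
LEVELS = ["complete", "partial", "missing"]


def intake_gap_percentages(records):
    """Return {dimension: {level: percent}} across all Mode A records."""
    if not records:
        return {}
    table = {}
    for dim in DIMENSIONS:
        counts = Counter(rec["intake_gaps"][dim] for rec in records)
        total = sum(counts.values())
        table[dim] = {level: round(100 * counts[level] / total, 1) for level in LEVELS}
    return table


# Hypothetical Mode A records, for illustration only.
records = [
    {"intake_gaps": dict.fromkeys(DIMENSIONS, "complete")},
    {"intake_gaps": {**dict.fromkeys(DIMENSIONS, "partial"), "attachments": "missing"}},
]

for dim, shares in intake_gap_percentages(records).items():
    print(dim, shares)
```
</pre>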

    <hr />

    <h2>13. Recurring analytics (Broccolini support section)</h2>
    <p>From the batch parser we have:</p>
    <ul>
      <li><strong>Tickets per game_detected (heuristic):</strong> see §2.</li>
      <li><strong>Claimed channel share:</strong> 671/722 (93%).</li>
      <li><strong>Staff involved count per ticket:</strong> see §6.</li>
    </ul>
    <p><strong>Require Mode A or claim parsing:</strong></p>
    <ul>
      <li>Tickets claimed per member (from claim/unclaim messages or channel name).</li>
      <li>First response time, re‑opens, escalations.</li>
      <li>Tag distribution, repeat customers, sentiment toward staff.</li>
      <li>Wiki‑linked outcomes per member (see §8).</li>
    </ul>

    <hr />

    <h2>14. How to reproduce and extend</h2>
    <ol>
      <li><strong>Run batch parser (this report's source):</strong>
        <pre>python3 scripts/batch_transcript_analytics.py "Discord Ticket Transcripts/Drive2"</pre>
        For a subfolder or Drive:
        <pre>python3 scripts/batch_transcript_analytics.py "Discord Ticket Transcripts/Drive"</pre>
      </li>
      <li><strong>Full Mode C tables:</strong> Run Mode A extraction on each transcript (or a sample), collect JSON, then aggregate by issue categories, tags, game_or_server, wiki_solved_issue, intake_gaps, frequency/impact, resolution status, and email analytics. Use Part 1 Analysis and <code>docs/TICKET-ANALYTICS-SCHEMA-PROMPTING.md</code> as the schema source of truth.</li>
    </ol>
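    <p>Step 2 could be sketched as a small aggregation over the collected JSON. The layout assumed here (one JSON object per ticket, one file per ticket in a single directory) and the <code>aggregate_field</code> helper are hypothetical; the field names and allowed values should follow <code>docs/TICKET-ANALYTICS-SCHEMA-PROMPTING.md</code>.</p>
    <pre>
```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def aggregate_field(json_dir, field):
    """Count the values of one scalar field across every *.json file in json_dir."""
    counts = Counter()
    for path in sorted(Path(json_dir).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        # Records that lack the field are counted as "unclear" rather than dropped.
        counts[record.get(field, "unclear")] += 1
    return counts


# Demo with two hypothetical Mode A outputs written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ticket-1.json").write_text(
        json.dumps({"status": "resolved", "impact": "severe"}), encoding="utf-8")
    Path(tmp, "ticket-2.json").write_text(
        json.dumps({"status": "resolved"}), encoding="utf-8")
    status_counts = aggregate_field(tmp, "status")
    impact_counts = aggregate_field(tmp, "impact")

print(status_counts)
print(impact_counts)
```
</pre>
    <p>The same helper can be called once per scalar dimension (status, frequency, impact, wiki_solved_issue); list-valued fields such as issue types would need a variant that counts each element.</p>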
  </div>
</body>
</html>