Close: guard channel delete with pendingDelete so a restart can't orphan it
The button and slash close paths deleted the channel via a bare setTimeout that never set the pendingDelete flag, so a restart in the 5s grace window orphaned the channel (closed in DB, still present in Discord) with no recovery — only the auto-close path used the flag correctly. Extract scheduleTicketChannelDelete() in services/tickets.js: a grace-delayed, queue-routed (enqueueDelete) delete that clears pendingDelete on success. All three close paths now use it. Button/slash set pendingDelete:true and keep discordThreadId populated so resumePendingDeletes() recovers the delete on the next boot. The button path previously nulled discordThreadId before the delete, which made the channel unrecoverable.
This commit is contained in:
73
.scratch/close-hardening/design.md
Normal file
73
.scratch/close-hardening/design.md
Normal file
@@ -0,0 +1,73 @@
|
||||
# Close-hardening: stop mid-close restarts from orphaning channels
|
||||
|
||||
## Problem (observed)
|
||||
Ticket #18 was closed (transcript saved, `status: closed`) but its Discord channel
|
||||
was never deleted — it lingers as an orphan. Root cause: the final channel delete
|
||||
is a deferred `setTimeout`, and a container restart during the delay drops it.
|
||||
|
||||
Evidence: ticket #18 in Mongo is `status: "closed"` with
|
||||
`discordThreadId: "1512204690631430144"` still pointing at the live channel, while
|
||||
properly-closed tickets (#17, #12, #11, #10) are `status: closed` + `discordThreadId: null`.
|
||||
|
||||
## Three underlying defects
|
||||
1. **Deferred delete is cancellable / fragile.**
|
||||
- Button path (`handlers/buttons.js` `runFinalClose:471`):
|
||||
`trackTimeout(setTimeout(() => channel.delete(), 5000))`. On SIGTERM,
|
||||
`handleShutdown` (`broccolini-discord.js:276-279`) `clearTimeout`s every tracked
|
||||
timeout → the delete never fires. A redeploy in the 5 s window orphans the channel.
|
||||
- Slash path (`handlers/commands/close.js` `finalizeForceClose:89-93`): a *plain*
|
||||
`setTimeout` (not tracked) — survives SIGTERM but dies on hard exit/SIGKILL, and
|
||||
there is no reconciliation either way.
|
||||
2. **Inconsistent DB writes between the two paths.**
|
||||
- Button path sets `{ discordThreadId: null, status: 'closed' }` (buttons.js:447-450).
|
||||
- Slash path sets only `{ status: 'closed' }` (close.js:73-76), leaving `discordThreadId`.
|
||||
So an orphan may have `discordThreadId` null OR still-set — no single signal.
|
||||
3. **No reconciliation for "closed but channel still exists."**
|
||||
`reconcileDeletedTicketChannels` only handles the opposite direction (DB open,
|
||||
channel gone). Nothing heals a closed ticket whose channel survived.
|
||||
|
||||
## Goals
|
||||
- A restart at any moment during close must not permanently orphan a channel.
|
||||
- Both close paths leave identical, unambiguous DB state.
|
||||
- A self-healing sweep finishes any delete a restart interrupted.
|
||||
|
||||
## Approach (IMPLEMENTED — uses the existing pendingDelete mechanism)
|
||||
Discovery during implementation: the codebase **already has** the restart-survival
|
||||
machinery — the `pendingDelete` flag (`models.js`) + `resumePendingDeletes(client)`
|
||||
called once from the `ready` handler (`broccolini-discord.js:231`). The **auto-close**
|
||||
path uses it correctly; the **button** and **slash** paths simply did not participate
|
||||
(bare `setTimeout(channel.delete())`, never setting `pendingDelete`). That omission is
|
||||
the entire bug. So the fix is to make all three paths share one guarded delete — NOT to
|
||||
add a new reconcile job.
|
||||
|
||||
1. **Shared helper `scheduleTicketChannelDelete(channel, gmailThreadId)`** in
|
||||
`services/tickets.js`: after a 5 s grace delay, `enqueueDelete(channel)` (queue-routed,
|
||||
honoring Hard Rule #3 — the old bare `channel.delete()` bypassed the queue) then unset
|
||||
`pendingDelete`. Wrapped in `trackTimeout`.
|
||||
2. **Each close path sets `pendingDelete: true` and keeps `discordThreadId` populated**
|
||||
before scheduling, so a restart in the grace window is recovered by
|
||||
`resumePendingDeletes()` (it re-fetches the channel by `discordThreadId` and deletes it).
|
||||
- Button path previously set `discordThreadId: null` *before* the delete — that made the
|
||||
channel unrecoverable on restart. Changed to `{ pendingDelete: true }`, leaving
|
||||
`discordThreadId` set (matches the auto-close contract).
|
||||
3. The grace delay is kept (staff read the close message first); recovery now covers it.
|
||||
|
||||
## Files (DONE)
|
||||
- `services/tickets.js` — added `scheduleTicketChannelDelete()`; auto-close else-branch now
|
||||
calls it; exported.
|
||||
- `handlers/buttons.js` `runFinalClose` — `attemptCloseTransition(..., { pendingDelete: true })`
|
||||
(was `{ discordThreadId: null }`); delete via `scheduleTicketChannelDelete`.
|
||||
- `handlers/commands/close.js` `finalizeForceClose` — same `{ pendingDelete: true }`; delete via
|
||||
`scheduleTicketChannelDelete`.
|
||||
|
||||
## Notes / residual
|
||||
- Pre-existing orphans (e.g. #14) have `pendingDelete: false`, so `resumePendingDeletes`
|
||||
will NOT auto-heal them — they need the one-off manual cleanup (same as #18).
|
||||
- A non-restart `enqueueDelete` failure leaves `pendingDelete: true` until the next boot
|
||||
(resume retries). Same property the auto-close path already had — accepted.
|
||||
- Closed tickets now retain `discordThreadId` (like auto-close already did); nothing queries
|
||||
closed tickets by channel, and the deleted channel id never re-matches a live channel.
|
||||
|
||||
## Out of scope
|
||||
- The in-memory countdown itself (a restart during the *countdown*, before finalize,
|
||||
simply cancels the pending close — acceptable; staff can re-close).
|
||||
@@ -22,7 +22,7 @@ const {
|
||||
} = require('discord.js');
|
||||
const { mongoose } = require('../db-connection');
|
||||
const { CONFIG } = require('../config');
|
||||
const { makeTicketName, resolveCreatorNickname, getOrCreateTicketCategory, cleanupEmptyOverflowCategory, checkTicketCreationRateLimit, toDiscordSafeName, attemptCloseTransition } = require('../services/tickets');
|
||||
const { makeTicketName, resolveCreatorNickname, getOrCreateTicketCategory, cleanupEmptyOverflowCategory, checkTicketCreationRateLimit, toDiscordSafeName, attemptCloseTransition, scheduleTicketChannelDelete } = require('../services/tickets');
|
||||
const { sendTicketClosedEmail } = require('../services/gmail');
|
||||
const { moveThreadToFolder } = require('../services/gmailLabels');
|
||||
const { getTicketActionRow, ticketChannelOverwrites } = require('../utils/ticketComponents');
|
||||
@@ -453,7 +453,9 @@ async function runFinalClose(interaction, ticket, sendEmail = true) {
|
||||
await sendTicketClosedEmail(ticket, closerDisplayName, interaction.user.id);
|
||||
}
|
||||
|
||||
const { transitioned, ticket: closedTicket } = await attemptCloseTransition(ticket.gmailThreadId, { discordThreadId: null }, { welcomeMessageId: '' });
|
||||
// Keep discordThreadId set and mark pendingDelete so a restart during the
|
||||
// grace window before the channel delete is recovered by resumePendingDeletes().
|
||||
const { transitioned, ticket: closedTicket } = await attemptCloseTransition(ticket.gmailThreadId, { pendingDelete: true }, { welcomeMessageId: '' });
|
||||
if (transitioned) {
|
||||
const closerType = isStaff(interaction.member) ? 'staff' : 'user';
|
||||
recordAction(interaction.user.id, 'close', {
|
||||
@@ -482,9 +484,12 @@ async function runFinalClose(interaction, ticket, sendEmail = true) {
|
||||
const parentCatId = ticket.parentCategoryId;
|
||||
const guildRef = interaction.guild;
|
||||
|
||||
// Queue-routed, pendingDelete-guarded delete (shared with auto-close + slash
|
||||
// close) so a mid-close restart can't orphan the channel.
|
||||
scheduleTicketChannelDelete(interaction.channel, ticket.gmailThreadId);
|
||||
|
||||
// Lazy require — same cycle reason as in handleConfirmCloseRequest above.
|
||||
const { trackTimeout } = require('../broccolini-discord');
|
||||
trackTimeout(setTimeout(() => interaction.channel.delete().catch(() => {}), 5000));
|
||||
trackTimeout(setTimeout(() => {
|
||||
if (parentCatId && guildRef) {
|
||||
cleanupEmptyOverflowCategory(guildRef, parentCatId, CONFIG.TICKET_CATEGORY_NAME).catch(() => {});
|
||||
|
||||
@@ -15,7 +15,7 @@ const { logTicketEvent, logError } = require('../../services/debugLog');
|
||||
const { moveThreadToFolder } = require('../../services/gmailLabels');
|
||||
const { pendingCloses } = require('../pendingCloses');
|
||||
const { findTicketForChannel } = require('../sharedHelpers');
|
||||
const { attemptCloseTransition } = require('../../services/tickets');
|
||||
const { attemptCloseTransition, scheduleTicketChannelDelete } = require('../../services/tickets');
|
||||
const { buildTranscriptText, formatDateForTranscript, renderTranscriptHeader } = require('../../services/transcript');
|
||||
const { recordAction } = require('../../services/staffStats');
|
||||
|
||||
@@ -75,7 +75,9 @@ async function finalizeForceClose(channelRef, clientRef, _TicketModel, _recordAc
|
||||
if (!freshTicket || freshTicket.status === 'closed') return;
|
||||
|
||||
try {
|
||||
const { transitioned, ticket: closedTicket } = await attemptCloseTransition(freshTicket.gmailThreadId, {}, { welcomeMessageId: '' }, T);
|
||||
// pendingDelete (with discordThreadId left set) lets resumePendingDeletes()
|
||||
// recover the channel delete if a restart interrupts the grace window.
|
||||
const { transitioned, ticket: closedTicket } = await attemptCloseTransition(freshTicket.gmailThreadId, { pendingDelete: true }, { welcomeMessageId: '' }, T);
|
||||
if (transitioned) {
|
||||
record(closerId ?? 'system', 'close', {
|
||||
ticket: closedTicket,
|
||||
@@ -97,11 +99,9 @@ async function finalizeForceClose(channelRef, clientRef, _TicketModel, _recordAc
|
||||
console.error('Transcript error (force-close):', tErr)
|
||||
);
|
||||
|
||||
setTimeout(() => {
|
||||
channelRef.delete('Ticket force-closed').catch(e =>
|
||||
console.error('Failed to delete channel:', e)
|
||||
);
|
||||
}, 5000);
|
||||
// Queue-routed, pendingDelete-guarded delete (shared with auto-close + button
|
||||
// close) so a mid-close restart can't orphan the channel.
|
||||
scheduleTicketChannelDelete(channelRef, freshTicket.gmailThreadId);
|
||||
} catch (err) {
|
||||
console.error('Force close error:', err);
|
||||
}
|
||||
|
||||
@@ -293,6 +293,32 @@ async function attemptCloseTransition(gmailThreadId, extraSet = {}, extraUnset =
|
||||
return { transitioned, ticket };
|
||||
}
|
||||
|
||||
/**
|
||||
* Schedule the final ticket-channel delete after a short grace period (so staff
|
||||
* read the close message first), routed through the channel queue.
|
||||
*
|
||||
* The delete is guarded by the `pendingDelete` flag: the caller MUST have already
|
||||
* set `pendingDelete: true` on the ticket AND left `discordThreadId` populated, so
|
||||
* that a restart during the grace window is recovered on boot by
|
||||
* resumePendingDeletes() (which re-fetches the channel and deletes it). The flag
|
||||
* is cleared once enqueueDelete resolves; if the doc is gone the unset is a no-op.
|
||||
*
|
||||
* Shared by all three close paths (auto-close, button, slash) so they behave
|
||||
* identically and none can orphan a channel on a mid-close restart.
|
||||
*/
|
||||
function scheduleTicketChannelDelete(channel, gmailThreadId, delayMs = 5000) {
|
||||
// Lazy require — broccolini-discord re-exports trackTimeout; cycle-safe at call time.
|
||||
const { trackTimeout } = require('../broccolini-discord');
|
||||
trackTimeout(setTimeout(() => {
|
||||
enqueueDelete(channel).then(() => {
|
||||
withRetry(() => Ticket.updateOne(
|
||||
{ gmailThreadId },
|
||||
{ $unset: { pendingDelete: '' } }
|
||||
)).catch(() => {});
|
||||
}).catch(() => {});
|
||||
}, delayMs));
|
||||
}
|
||||
|
||||
// --- SCHEDULED CHECKS ---
|
||||
// These accept `client` and optionally `sendTicketClosedEmail` to avoid circular deps.
|
||||
|
||||
@@ -340,16 +366,7 @@ async function checkAutoClose(client, sendTicketClosedEmail, _TicketModel, _reco
|
||||
if (_deps && _deps.scheduleDelete) {
|
||||
_deps.scheduleDelete(channel, ticket);
|
||||
} else {
|
||||
// Lazy require — broccolini-discord re-exports trackTimeout; cycle-safe.
|
||||
const { trackTimeout } = require('../broccolini-discord');
|
||||
trackTimeout(setTimeout(() => {
|
||||
enqueueDelete(channel).then(() => {
|
||||
withRetry(() => Ticket.updateOne(
|
||||
{ gmailThreadId: ticket.gmailThreadId },
|
||||
{ $unset: { pendingDelete: '' } }
|
||||
)).catch(() => {});
|
||||
}).catch(() => {});
|
||||
}, 5000));
|
||||
scheduleTicketChannelDelete(channel, ticket.gmailThreadId);
|
||||
}
|
||||
}
|
||||
} catch (error) {
|
||||
@@ -467,6 +484,7 @@ module.exports = {
|
||||
checkTicketCreationRateLimit,
|
||||
checkTicketLimits,
|
||||
attemptCloseTransition,
|
||||
scheduleTicketChannelDelete,
|
||||
checkAutoClose,
|
||||
checkAutoUnclaim,
|
||||
reconcileDeletedTicketChannels,
|
||||
|
||||
Reference in New Issue
Block a user