v1.1 batch 1: markdown, message actions, tok/s+ctx, AI naming

Four features land together on this branch:

1. Markdown rendering: assistant messages go through react-markdown + remark-gfm. Fenced code blocks render via existing CodeBlock (with copy button); inline `code` is styled inline. User messages stay plain text. No raw HTML (no rehype-raw).

2. Per-message Copy + Regenerate. New endpoint POST /api/sessions/:id/messages/:message_id/regenerate validates the target (404/400/409), atomically deletes the target plus any later messages in the session, inserts a fresh streaming assistant row, and enqueues a normal inference run. The DELETE bound uses a SQL subquery (`created_at >= (SELECT created_at FROM messages WHERE id = $1)`) instead of a JS round-trip so postgres TIMESTAMPTZ µs precision is preserved. Otherwise, sub-ms clock_timestamp() differences between the user row and the assistant row collapsed to the same JS Date, pulling the triggering user message into the >= bound. New `messages_deleted` WS frame so already-connected clients prune the stale tail without needing a full snapshot resend.

3. tok/s + ctx counter. Five new nullable message columns: tokens_used, ctx_used, ctx_max, started_at, finished_at. started_at is set right before the OpenAI call in services/inference.ts (not in the route, not in the frame handler); finished_at + tokens_used + ctx_used + ctx_max are committed in the same UPDATE that flips status to 'complete'. The inference request now opts into stream_options.include_usage so the final chunk carries usage; defensive parsing also picks up timings.n_ctx when llama.cpp emits it (currently absent for our llama-swap models, so ctx_max stays NULL and the UI just shows `<used> ctx`). message_complete frame extended with tokens_used / ctx_used / ctx_max / started_at / finished_at / model. Frontend StatsLine in MessageBubble computes tok/s client-side from the timestamps and renders muted mono text below the body of completed assistant messages.

4. AI chat naming after the first turn. Backend services/auto_name.ts runs via setImmediate after the top-level inference resolves; it checks that there is exactly one completed assistant message and that the session has not been user-renamed (`name IS NULL OR name = '' OR name = 'New session'`), then fires a single non-streaming chat completion with the spec prompt. Qwen3 chat templates emit chain-of-thought into reasoning_content and burn the entire max_tokens budget without producing visible output, so the request includes `chat_template_kwargs: { enable_thinking: false }` and max_tokens=30. Title is trimmed, quote-stripped, "Title:" prefix dropped, and truncated to 60 chars before a guarded UPDATE on sessions.name. New `session_renamed` WS frame propagates to the open session view directly and to the project's session list via a tiny module-scope event bus (apps/web/src/hooks/sessionEvents.ts), kept dumb: one event type, two methods, no library.

Cleanups: dropped the now-unused splitCodeBlocks export from CodeBlock.tsx (react-markdown supersedes it), and added a long-form NOTE in auto_name.ts documenting the enable_thinking + max_tokens pattern for any future Qwen-family non-streaming utility calls (planned: fork-message, agent-routing, web-search summarization).

Schema bootstrap remains idempotent (ADD COLUMN IF NOT EXISTS). Auth, broker, clock_timestamp() conventions, and zod validation all unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
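
Illustration of the title cleanup named in item 4 (trim, strip quotes, drop a "Title:" prefix, truncate to 60 chars). This is a minimal sketch, not the contents of auto_name.ts: `sanitizeTitle` and `MAX_TITLE_LENGTH` are hypothetical names, and the regexes and the exact set of quote characters are assumptions.

```typescript
// Hypothetical helper: the name, constant, and regexes are illustrative
// assumptions, not taken from services/auto_name.ts.
const MAX_TITLE_LENGTH = 60;

function sanitizeTitle(raw: string): string {
  // 1. Trim surrounding whitespace.
  let title = raw.trim();
  // 2. Strip any run of leading or trailing quote characters
  //    (straight, curly, or backtick), then trim again.
  title = title.replace(/^["'“”‘’`]+|["'“”‘’`]+$/g, "").trim();
  // 3. Drop a leading "Title:" label, case-insensitively.
  title = title.replace(/^title:\s*/i, "");
  // 4. Truncate to the maximum length and trim any trailing space
  //    left behind by the cut.
  return title.slice(0, MAX_TITLE_LENGTH).trim();
}
```

With this ordering, an input such as `  "Title: Fix login bug"  ` reduces to `Fix login bug`. An empty result would presumably need to be rejected before the guarded UPDATE; that check is left out of the sketch.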
2026-05-14 22:52:40 +00:00
parent a7f218e182
commit 2464d23bb6
18 changed files with 1559 additions and 94 deletions
--- a/apps/server/src/routes/messages.ts
+++ b/apps/server/src/routes/messages.ts
@@ -8,12 +8,13 @@ const SendBody = z.object({
 });

 interface MessageHandlers {
-  onSend: (sessionId: string, userMessageId: string, assistantMessageId: string) => void;
+  enqueueInference: (sessionId: string, assistantMessageId: string) => void;
  publishUserMessage: (
    sessionId: string,
    userMessageId: string,
    content: string
  ) => void;
+  publishMessagesDeleted: (sessionId: string, messageIds: string[]) => void;
 }

 export function registerMessageRoutes(
@@ -30,7 +31,8 @@ export function registerMessageRoutes(
        return { error: 'session not found' };
      }
      const rows = await sql<Message[]>`
-        SELECT id, session_id, role, content, tool_calls, tool_results, status, last_seq, created_at
+        SELECT id, session_id, role, content, tool_calls, tool_results, status, last_seq,
+               tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at
        FROM messages
        WHERE session_id = ${req.params.id}
        ORDER BY created_at ASC, id ASC
@@ -74,10 +76,66 @@ export function registerMessageRoutes(
        result.user_message_id,
        parsed.data.content
      );
-      handlers.onSend(req.params.id, result.user_message_id, result.assistant_message_id);
+      handlers.enqueueInference(req.params.id, result.assistant_message_id);

      reply.code(202);
      return result;
    }
  );
+
+  app.post<{ Params: { id: string; message_id: string } }>(
+    '/api/sessions/:id/messages/:message_id/regenerate',
+    async (req, reply) => {
+      const { id: sessionId, message_id: targetId } = req.params;
+
+      const target = await sql<{ id: string; role: string; status: string }[]>`
+        SELECT id, role, status
+        FROM messages
+        WHERE session_id = ${sessionId} AND id = ${targetId}
+      `;
+      if (target.length === 0) {
+        reply.code(404);
+        return { error: 'message not found' };
+      }
+      const targetRow = target[0]!;
+      if (targetRow.role !== 'assistant') {
+        reply.code(400);
+        return { error: 'only assistant messages can be regenerated' };
+      }
+      if (targetRow.status === 'streaming') {
+        reply.code(409);
+        return { error: 'message is still streaming' };
+      }
+
+      const { newAssistantId, deletedIds } = await sql.begin(async (tx) => {
+        // Subquery keeps created_at in postgres at TIMESTAMPTZ µs precision.
+        // Round-tripping through JS Date loses sub-ms precision and can pull
+        // earlier rows (e.g. the triggering user message) into the >= bound.
+        const deletedRows = await tx<{ id: string }[]>`
+          DELETE FROM messages
+          WHERE session_id = ${sessionId}
+            AND created_at >= (
+              SELECT created_at FROM messages WHERE id = ${targetId}
+            )
+          RETURNING id
+        `;
+        const [row] = await tx<{ id: string }[]>`
+          INSERT INTO messages (session_id, role, content, status, created_at)
+          VALUES (${sessionId}, 'assistant', '', 'streaming', clock_timestamp())
+          RETURNING id
+        `;
+        await tx`UPDATE sessions SET updated_at = NOW() WHERE id = ${sessionId}`;
+        return {
+          newAssistantId: row!.id,
+          deletedIds: deletedRows.map((r) => r.id),
+        };
+      });
+
+      handlers.publishMessagesDeleted(sessionId, deletedIds);
+      handlers.enqueueInference(sessionId, newAssistantId);
+
+      reply.code(202);
+      return { assistant_message_id: newAssistantId };
+    }
+  );
 }
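
The precision issue described in the regenerate handler's comment can be reproduced without a database. The sketch below models two rows whose `created_at` values differ by 300 µs and compares a bound that went through a JS `Date` (millisecond resolution) against a bound kept at microsecond resolution, as the SQL subquery does. The `Row` type, the helper names, and the timestamps are all hypothetical, and the sketch assumes the driver truncates rather than rounds when building the `Date`.

```typescript
// Hypothetical model of the messages table: created_at kept as integer
// microseconds since the Unix epoch, standing in for TIMESTAMPTZ.
interface Row {
  id: string;
  createdAtMicros: number;
}

// Assumption: the driver hands back a JS Date, which only holds milliseconds.
function toJsDate(micros: number): Date {
  return new Date(Math.floor(micros / 1000));
}

const baseMs = Date.UTC(2026, 4, 14, 22, 52, 40, 123);
const rows: Row[] = [
  { id: "user", createdAtMicros: baseMs * 1000 + 100 },      // ...40.123100
  { id: "assistant", createdAtMicros: baseMs * 1000 + 400 }, // ...40.123400
];
const target = rows[1]!;

// Bound after a JS round-trip: the target's created_at is cut to ...40.123000
// before being sent back as a query parameter.
function idsDeletedViaJsBound(all: Row[], t: Row): string[] {
  const boundMicros = toJsDate(t.createdAtMicros).getTime() * 1000;
  return all.filter((r) => r.createdAtMicros >= boundMicros).map((r) => r.id);
}

// Bound evaluated entirely at microsecond resolution, like the subquery.
function idsDeletedViaSqlBound(all: Row[], t: Row): string[] {
  return all
    .filter((r) => r.createdAtMicros >= t.createdAtMicros)
    .map((r) => r.id);
}

const viaJs = idsDeletedViaJsBound(rows, target);
const viaSql = idsDeletedViaSqlBound(rows, target);
console.log({ viaJs, viaSql });
```

In this model the JS-derived bound also matches the user row, because its microsecond value is still greater than the truncated bound, while the microsecond-resolution bound matches only the assistant row.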