Four features land together on this branch:
1. Markdown rendering — assistant messages go through react-markdown +
remark-gfm. Fenced code blocks render via existing CodeBlock (with copy
button); inline `code` is styled inline. User messages stay plain text.
No raw HTML (no rehype-raw).
2. Per-message Copy + Regenerate. New endpoint
POST /api/sessions/:id/messages/:message_id/regenerate validates the
target (404/400/409), atomically deletes the target plus any later
messages in the session, inserts a fresh streaming assistant row, and
enqueues a normal inference run. The DELETE bound uses a SQL subquery
(`created_at >= (SELECT created_at FROM messages WHERE id = $1)`)
instead of a JS round-trip so postgres TIMESTAMPTZ µs precision is
preserved — otherwise sub-ms clock_timestamp() differences between the
user row and the assistant row collapsed to the same JS Date, pulling
the triggering user message into the >= bound. New `messages_deleted`
WS frame so already-connected clients prune the stale tail without
needing a full snapshot resend.
3. tok/s + ctx counter. Five new nullable message columns: tokens_used,
ctx_used, ctx_max, started_at, finished_at. started_at is set right
before the OpenAI call in services/inference.ts (not in the route, not
in the frame handler); finished_at + tokens_used + ctx_used + ctx_max
are committed in the same UPDATE that flips status to 'complete'. The
inference request now opts into stream_options.include_usage so the
final chunk carries usage; defensive parsing also picks up timings.n_ctx
when llama.cpp emits it (currently absent for our llama-swap models, so
ctx_max stays NULL and the UI just shows `<used> ctx`). message_complete
frame extended with tokens_used / ctx_used / ctx_max / started_at /
finished_at / model. Frontend StatsLine in MessageBubble computes tok/s
client-side from the timestamps and renders muted mono text below the
body of completed assistant messages.
4. AI chat naming after the first turn. Backend services/auto_name.ts
runs via setImmediate after the top-level inference resolves; it
checks that there is exactly one completed assistant message and that
the session has not been user-renamed (`name IS NULL OR name = '' OR
name = 'New session'`), then fires a single non-streaming chat
completion with the spec prompt. Qwen3 chat templates emit chain-of-
thought into reasoning_content and burn the entire max_tokens budget
without producing visible output, so the request includes
`chat_template_kwargs: { enable_thinking: false }` and max_tokens=30.
Title is trimmed, quote-stripped, "Title:" prefix dropped, and
truncated to 60 chars before a guarded UPDATE on sessions.name. New
`session_renamed` WS frame propagates to the open session view
directly and to the project's session list via a tiny module-scope
event bus (apps/web/src/hooks/sessionEvents.ts) — kept dumb: one event
type, two methods, no library.
Cleanups: dropped the now-unused splitCodeBlocks export from CodeBlock.tsx
(react-markdown supersedes it), and added a long-form NOTE in auto_name.ts
documenting the enable_thinking + max_tokens pattern for any future Qwen-
family non-streaming utility calls (planned: fork-message, agent-routing,
web-search summarization).
Schema bootstrap remains idempotent (ADD COLUMN IF NOT EXISTS). Auth,
broker, clock_timestamp() conventions, and zod validation all unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
47 lines
1.5 KiB
TypeScript
47 lines
1.5 KiB
TypeScript
import type { FastifyInstance } from 'fastify';
|
|
import type { Sql } from '../db.js';
|
|
import type { Broker } from '../services/broker.js';
|
|
import type { Message } from '../types/api.js';
|
|
|
|
export function registerWebSocket(
|
|
app: FastifyInstance,
|
|
sql: Sql,
|
|
broker: Broker
|
|
): void {
|
|
app.get<{ Params: { id: string } }>(
|
|
'/api/ws/sessions/:id',
|
|
{ websocket: true },
|
|
async (socket, req) => {
|
|
const sessionId = req.params.id;
|
|
|
|
const session = await sql<{ id: string }[]>`SELECT id FROM sessions WHERE id = ${sessionId}`;
|
|
if (session.length === 0) {
|
|
socket.send(JSON.stringify({ type: 'error', error: 'session not found' }));
|
|
socket.close(1008, 'session not found');
|
|
return;
|
|
}
|
|
|
|
const messages = await sql<Message[]>`
|
|
SELECT id, session_id, role, content, tool_calls, tool_results, status, last_seq,
|
|
tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at
|
|
FROM messages
|
|
WHERE session_id = ${sessionId}
|
|
ORDER BY created_at ASC, id ASC
|
|
`;
|
|
socket.send(JSON.stringify({ type: 'snapshot', messages }));
|
|
|
|
const unsubscribe = broker.subscribe(sessionId, (frame) => {
|
|
if (socket.readyState !== socket.OPEN) return;
|
|
try {
|
|
socket.send(JSON.stringify(frame));
|
|
} catch (err) {
|
|
app.log.warn({ err, sessionId }, 'ws send failed');
|
|
}
|
|
});
|
|
|
|
socket.on('close', () => unsubscribe());
|
|
socket.on('error', () => unsubscribe());
|
|
}
|
|
);
|
|
}
|