## ADDED Requirements

### Requirement: Token budget management

The system SHALL manage LLM context window limits by tracking token usage and triggering summarization when thresholds are exceeded.

#### Scenario: Token threshold exceeded

- **WHEN** cumulative message tokens exceed `max_tokens` configuration
- **THEN** the system SHALL identify messages to summarize starting from oldest
- **THEN** the system SHALL replace summarized messages with a `RunningSummary` object
- **THEN** the system SHALL ensure remaining messages + summary fit within `max_tokens` budget

#### Scenario: Partial token budget allocation

- **WHEN** `max_summary_tokens` is configured (default 256)
- **THEN** the system SHALL reserve `max_summary_tokens` tokens for the summary itself
- **THEN** remaining messages SHALL be trimmed to fit within `max_tokens - max_summary_tokens`

### Requirement: Incremental summarization

The system SHALL support incremental summarization across multiple turns, tracking which messages have already been summarized to avoid redundant work.

#### Scenario: First summarization

- **WHEN** no existing `RunningSummary` exists and token threshold is exceeded
- **THEN** the system SHALL call the LLM with an initial summary prompt
- **THEN** the system SHALL return a `RunningSummary` with `summary`, `summarized_message_ids` set, and `last_summarized_message_id`

#### Scenario: Subsequent summarization (append)

- **WHEN** a `RunningSummary` exists and new messages exceed threshold
- **THEN** the system SHALL call the LLM with the existing summary plus new messages
- **THEN** the system SHALL extend `summarized_message_ids` with newly summarized message IDs
- **THEN** the system SHALL update `last_summarized_message_id`

### Requirement: Context trimming with summarization hook

The system SHALL provide a hook that fires before messages are discarded, allowing the daily tier to capture summarized content.
#### Scenario: Pre-trim flush

- **WHEN** messages are about to be discarded (summarized)
- **THEN** the system SHALL fire a `memory_flush_hook` with the messages being summarized
- **THEN** the hook SHALL queue the messages for async memory extraction
- **THEN** the main thread SHALL NOT block on memory extraction

### Requirement: Token counting with fallback

The system SHALL provide accurate token counting using `tiktoken` when available, with a character-based fallback.

#### Scenario: tiktoken available

- **WHEN** the `tiktoken` package is installed
- **THEN** the system SHALL use `tiktoken.get_encoding("cl100k_base")` for token counting
- **THEN** token counts SHALL be exact for OpenAI models that use the `cl100k_base` encoding and SHALL be treated as approximate for other providers (for example Anthropic), whose tokenizers differ

#### Scenario: tiktoken unavailable

- **WHEN** `tiktoken` is not installed
- **THEN** the system SHALL fall back to character-based estimation: `len(text) // 4`
- **THEN** the system SHALL log a warning about the missing `tiktoken` package

### Requirement: Summarization node for LangGraph

The system SHALL provide a `SummarizationNode` Runnable that integrates into LangGraph state graphs.

#### Scenario: Graph integration

- **WHEN** `SummarizationNode` is added to a LangGraph workflow
- **THEN** it SHALL read messages from `input_messages_key` (default "messages")
- **THEN** it SHALL write updated messages to `output_messages_key` (default "summarized_messages")
- **THEN** it SHALL store `RunningSummary` in `context.running_summary`

#### Scenario: Equality of input/output keys

- **WHEN** `input_messages_key` equals `output_messages_key`
- **THEN** the node SHALL emit a `RemoveMessage(REMOVE_ALL_MESSAGES)` to clear previous state
- **THEN** the node SHALL write the new message list including the summary
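
As a non-normative illustration of the token counting fallback and the `max_tokens - max_summary_tokens` budget described in the earlier requirements, the following sketch shows one way the pieces could fit together. The function names (`estimate_tokens`, `get_token_counter`, `split_for_budget`) and the use of plain strings as messages are assumptions made for the example; only `tiktoken.get_encoding("cl100k_base")` and `len(text) // 4` come from this specification.

```python
import logging

logger = logging.getLogger(__name__)


def estimate_tokens(text):
    """Character-based fallback: roughly four characters per token."""
    return len(text) // 4


def get_token_counter():
    """Return a tiktoken-backed counter if possible, else the fallback."""
    try:
        import tiktoken
    except ImportError:
        logger.warning("tiktoken is not installed; using len(text) // 4 estimation")
        return estimate_tokens
    encoding = tiktoken.get_encoding("cl100k_base")
    return lambda text: len(encoding.encode(text))


def split_for_budget(messages, count, max_tokens, max_summary_tokens=256):
    """Split messages into (to_summarize, kept).

    The newest messages are kept while they fit in
    max_tokens - max_summary_tokens; everything older is summarized.
    """
    budget = max_tokens - max_summary_tokens
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    to_summarize = messages[: len(messages) - len(kept)]
    return to_summarize, kept
```

For example, with three 400-character messages, `max_tokens=500`, and the default `max_summary_tokens=256`, the fallback counter gives 100 tokens per message and a message budget of 244, so the two newest messages are kept and the oldest is handed to summarization.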