## ADDED Requirements

### Requirement: Token budget management
The system SHALL manage LLM context window limits by tracking token usage and triggering summarization when thresholds are exceeded.

#### Scenario: Token threshold exceeded
- **WHEN** cumulative message tokens exceed the configured `max_tokens` limit
- **THEN** the system SHALL identify messages to summarize, starting from the oldest
- **THEN** the system SHALL replace the summarized messages with a `RunningSummary` object
- **THEN** the system SHALL ensure the remaining messages plus the summary fit within the `max_tokens` budget

#### Scenario: Partial token budget allocation
- **WHEN** `max_summary_tokens` is configured (default 256)
- **THEN** the system SHALL reserve `max_summary_tokens` tokens for the summary itself
- **THEN** remaining messages SHALL be trimmed to fit within `max_tokens - max_summary_tokens`
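
The two scenarios above can be sketched as a single split step. This is a minimal illustration, not the required implementation: `Message`, `split_for_summary`, and the character-based `count_tokens` are hypothetical names introduced here, and the policy of keeping the newest messages that fit is one reasonable reading of "starting from oldest".

```python
from dataclasses import dataclass


@dataclass
class Message:  # hypothetical minimal message type
    id: str
    content: str


def count_tokens(text: str) -> int:
    # Rough character-based estimate; non-empty text costs at least 1 token.
    return max(1, len(text) // 4) if text else 0


def split_for_summary(messages, max_tokens, max_summary_tokens=256):
    """Return (to_summarize, to_keep), evicting the oldest messages first."""
    total = sum(count_tokens(m.content) for m in messages)
    if total <= max_tokens:
        return [], list(messages)  # threshold not exceeded; nothing to do

    # Reserve room for the summary itself.
    keep_budget = max_tokens - max_summary_tokens
    kept, used = [], 0
    # Walk from newest to oldest, keeping messages while they still fit.
    for m in reversed(messages):
        cost = count_tokens(m.content)
        if used + cost > keep_budget:
            break
        kept.append(m)
        used += cost
    kept.reverse()
    return list(messages[: len(messages) - len(kept)]), kept
```

Because the kept messages never exceed `max_tokens - max_summary_tokens`, a summary of at most `max_summary_tokens` tokens always fits alongside them.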

### Requirement: Incremental summarization
The system SHALL support incremental summarization across multiple turns, tracking which messages have already been summarized to avoid redundant work.

#### Scenario: First summarization
- **WHEN** no `RunningSummary` exists and the token threshold is exceeded
- **THEN** the system SHALL call the LLM with an initial summary prompt
- **THEN** the system SHALL return a `RunningSummary` containing the `summary` text, the set `summarized_message_ids`, and `last_summarized_message_id`

#### Scenario: Subsequent summarization (append)
- **WHEN** a `RunningSummary` exists and new messages push the token count over the threshold
- **THEN** the system SHALL call the LLM with the existing summary plus new messages
- **THEN** the system SHALL extend `summarized_message_ids` with newly summarized message IDs
- **THEN** the system SHALL update `last_summarized_message_id`
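
A minimal sketch of the first and subsequent summarization scenarios follows. The field names come from the requirement, but the `summarize` function, the prompt wording, the `(id, text)` message shape, and the `llm` callable are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunningSummary:
    summary: str
    summarized_message_ids: set = field(default_factory=set)
    last_summarized_message_id: Optional[str] = None


def summarize(new_messages, existing, llm):
    """new_messages: list of (id, text) pairs; llm: callable prompt -> summary text."""
    already = existing.summarized_message_ids if existing else set()
    # Skip messages that an earlier pass already folded into the summary.
    pending = [(mid, text) for mid, text in new_messages if mid not in already]
    if not pending:
        return existing

    transcript = "\n".join(text for _, text in pending)
    if existing is None:
        # First summarization: initial prompt (hypothetical wording).
        prompt = f"Summarize this conversation:\n{transcript}"
    else:
        # Subsequent summarization: existing summary plus new messages.
        prompt = (
            f"Current summary:\n{existing.summary}\n\n"
            f"Extend it with these new messages:\n{transcript}"
        )
    return RunningSummary(
        summary=llm(prompt),
        summarized_message_ids=already | {mid for mid, _ in pending},
        last_summarized_message_id=pending[-1][0],
    )
```

Returning a new `RunningSummary` instead of mutating the old one keeps earlier state intact if the LLM call fails partway through.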

### Requirement: Context trimming with summarization hook
The system SHALL provide a hook that fires before messages are discarded, allowing the daily tier to capture summarized content.

#### Scenario: Pre-trim flush
- **WHEN** messages are about to be discarded (summarized)
- **THEN** the system SHALL fire a `memory_flush_hook` with the messages being summarized
- **THEN** the hook SHALL queue the messages for async memory extraction
- **THEN** the main thread SHALL NOT block on memory extraction
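
One way to satisfy the non-blocking requirement is an unbounded queue drained by a background worker. The sketch below assumes a thread-based design; `extraction_queue`, `extract_memories`, and the in-memory `extracted` list are hypothetical stand-ins for the real extraction pipeline.

```python
import queue
import threading

extraction_queue = queue.Queue()  # unbounded, so put() does not block
extracted = []  # stand-in for the memory store


def extract_memories(messages):
    # Placeholder for the real (potentially slow) extraction step.
    extracted.append(list(messages))


def _worker():
    while True:
        batch = extraction_queue.get()
        try:
            extract_memories(batch)
        finally:
            extraction_queue.task_done()


# Daemon thread: it does not keep the process alive on shutdown.
threading.Thread(target=_worker, daemon=True).start()


def memory_flush_hook(messages):
    """Queue the messages being summarized and return immediately."""
    # Copy the list so later trimming cannot mutate the queued batch.
    extraction_queue.put(list(messages))
```

The caller invokes `memory_flush_hook(to_summarize)` just before replacing those messages with the summary. Code that needs to wait for extraction, such as a test, can call `extraction_queue.join()`.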

### Requirement: Token counting with fallback
The system SHALL count tokens with `tiktoken` when it is available and fall back to a character-based estimate otherwise.

#### Scenario: tiktoken available
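- **WHEN** the `tiktoken` package is installed
- **THEN** the system SHALL use `tiktoken.get_encoding("cl100k_base")` for token counting
- **THEN** token counts SHALL match OpenAI's `cl100k_base` tokenization and SHALL be treated as an approximation for models that use a different tokenizer, such as Anthropic models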

#### Scenario: tiktoken unavailable
- **WHEN** `tiktoken` is not installed
- **THEN** the system SHALL fall back to character-based estimation: `len(text) // 4`
- **THEN** the system SHALL log a warning that `tiktoken` is missing

### Requirement: Summarization node for LangGraph
The system SHALL provide a `SummarizationNode` Runnable that integrates into LangGraph state graphs.

#### Scenario: Graph integration
- **WHEN** `SummarizationNode` is added to a LangGraph workflow
- **THEN** it SHALL read messages from `input_messages_key` (default `"messages"`)
- **THEN** it SHALL write updated messages to `output_messages_key` (default `"summarized_messages"`)
- **THEN** it SHALL store `RunningSummary` in `context.running_summary`

#### Scenario: Equality of input/output keys
- **WHEN** `input_messages_key` equals `output_messages_key`
- **THEN** the node SHALL emit a `RemoveMessage(REMOVE_ALL_MESSAGES)` to clear previous state
- **THEN** the node SHALL write the new message list including the summary
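
The state update implied by the two scenarios above can be sketched without LangGraph installed. Everything here is illustrative: `build_update` is a hypothetical helper, `REMOVE_ALL` is a plain stand-in for `RemoveMessage(REMOVE_ALL_MESSAGES)`, and reading `context.running_summary` as a `"context"` dictionary in the graph state is an assumption.

```python
# Stand-in for RemoveMessage(REMOVE_ALL_MESSAGES); not the real LangGraph object.
REMOVE_ALL = ("remove", "__all__")


def build_update(
    state,
    summary_message,
    kept_messages,
    running_summary,
    input_messages_key="messages",
    output_messages_key="summarized_messages",
):
    """Build the partial state update a summarization node would return."""
    new_messages = [summary_message, *kept_messages]
    if input_messages_key == output_messages_key:
        # Same channel for input and output: clear the previous contents
        # first, then write the new list including the summary.
        new_messages = [REMOVE_ALL, *new_messages]
    return {
        output_messages_key: new_messages,
        # Preserve other context entries and store the running summary.
        "context": {**state.get("context", {}), "running_summary": running_summary},
    }
```

With distinct keys the original `messages` channel is left untouched and downstream nodes read the trimmed list from `summarized_messages`. With identical keys the removal marker prevents the new list from being appended to the old one.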