Context Window

The maximum amount of text (measured in tokens) that a language model can process at once, including both the input prompt and generated output.

Also known as: Context Length, Token Limit

What is a Context Window?

A context window (or context length) is the maximum number of tokens a language model can process in a single interaction. It encompasses both the input prompt and the model's response, defining the "working memory" of the model.

Context Window Sizes

Historical

  • GPT-2: 1,024 tokens
  • GPT-3: 2,048 tokens (4,096 in later GPT-3.5 models)

Current Generation

  • GPT-4: 8K-128K tokens (8K at launch, 128K with GPT-4 Turbo)
  • Claude: 100K-200K tokens (200K from Claude 2.1 onward)
  • Gemini: up to 1M tokens (Gemini 1.5 Pro)

Token Basics

What is a Token?

  • One token is roughly 4 characters of English text
  • ~1,000 tokens ≈ 750 English words
  • Counts vary by language and by tokenizer (see the sketch below)
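
As a concrete illustration, token counts can be measured with a tokenizer library. The sketch below uses OpenAI's open-source tiktoken; cl100k_base is the encoding used by GPT-4-era OpenAI models, and other model families ship their own tokenizers:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the tokenizer used by GPT-4-era OpenAI models;
# other models use different tokenizers, so counts will differ.
enc = tiktoken.get_encoding("cl100k_base")

text = "A context window is the model's working memory."
tokens = enc.encode(text)

print(len(text), "characters")  # character count
print(len(tokens), "tokens")    # roughly len(text) / 4 for English prose
```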

Token Counting

  • Input tokens (the prompt)
  • Output tokens (the generated response)
  • Input plus output must fit within the window (see the check below)
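
A minimal sketch of the budgeting this implies; the window size and output reservation below are illustrative numbers, not tied to any particular model:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

CONTEXT_WINDOW = 8_192     # illustrative window size, not any specific model's
MAX_OUTPUT_TOKENS = 1_024  # budget reserved for the model's response

def fits_in_window(prompt: str) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    input_tokens = len(enc.encode(prompt))
    return input_tokens + MAX_OUTPUT_TOKENS <= CONTEXT_WINDOW

print(fits_in_window("Summarize the attached report."))  # True for short prompts
```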

Implications

Capabilities

  • Process longer documents in a single pass
  • Supply more supporting context, improving response quality
  • Sustain extended multi-turn conversations without truncation

Limitations

  • Cost scales with the number of tokens processed
  • Latency increases with longer inputs
  • The "lost in the middle" phenomenon: information buried in the middle of a long context is recalled less reliably than information at the start or end

Strategies for Long Content

Chunking: Split long content into smaller pieces that each fit within the window; a minimal chunker is sketched below.
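
A minimal token-based chunker, reusing the tiktoken encoding from the examples above (the 500-token chunk size is an arbitrary example value):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_by_tokens(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into consecutive pieces of at most max_tokens tokens each."""
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

chunks = chunk_by_tokens("a long document ... " * 1000)
print(len(chunks), "chunks")
```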

Summarization: Compress earlier or less important content into shorter summaries so its key points fit in fewer tokens; see the sketch below.
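
One common pattern is hierarchical (map-reduce) summarization. In this hedged sketch, llm_summarize is a hypothetical stand-in for a real model call, and chunk_by_tokens is the helper defined above:

```python
def llm_summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM call; here it just truncates.

    Replace with a request to whatever summarization model you use.
    """
    return text[:200]

def summarize_long_document(document: str) -> str:
    # Map: summarize each chunk independently, so every call fits the window.
    partials = [llm_summarize(chunk) for chunk in chunk_by_tokens(document)]
    # Reduce: summarize the concatenated partial summaries into one result.
    return llm_summarize("\n\n".join(partials))
```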

RAG (Retrieval-Augmented Generation): Retrieve only the passages relevant to the current query and place those, rather than an entire corpus, in the prompt; see the retrieval sketch below.
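
A minimal retrieval sketch using cosine similarity over embeddings; embed is a hypothetical placeholder, and any real embedding model would stand in for it:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in: returns a deterministic but semantically
    meaningless vector. Replace with a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(64)

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    vectors = [embed(c) for c in chunks]  # in practice, precompute and cache
    scores = [
        float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        for v in vectors
    ]
    ranked = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in ranked]
```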

Sliding Window: Process long content sequentially in overlapping segments so information at segment boundaries is not lost; sketched below.
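
A sketch of overlapping sliding windows over a token sequence; the window and overlap sizes are illustrative:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def sliding_windows(text: str, window: int = 500, overlap: int = 100):
    """Yield overlapping token windows so content spanning a boundary is kept."""
    tokens = enc.encode(text)
    stride = window - overlap
    for start in range(0, max(len(tokens) - overlap, 1), stride):
        yield enc.decode(tokens[start : start + window])

windows = list(sliding_windows("a long transcript ... " * 2000))
print(len(windows), "overlapping windows")
```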

Best Practices

  • Use the window efficiently; avoid padding prompts with redundant text
  • Prioritize relevant information, placing the most important content at the start or end of the prompt
  • Weigh token cost against response quality
  • Test behavior across a range of input lengths