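A token is the unit in which an LLM processes text (a fragment of a word), and the context window is the maximum amount of text (measured in tokens) that an LLM can consider at once. Understanding both is important for using LLMs effectively, managing cost, and working within their limits.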
What is a token
TOKEN → the unit in which LLMs process text (not whole words or single characters, but PIECES of words):
→ text is split into tokens (roughly 4 characters or 0.75 words each in English)
→ e.g. 'unbelievable' might be 3 tokens; common words are often 1 token
→ the model processes and generates text token by token
→ input and output are both measured in tokens
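The rule of thumb above can be turned into a quick back-of-the-envelope estimate. The sketch below is only a heuristic for English text, not a real tokenizer: the function name `estimate_tokens` and both constants are illustrative assumptions taken from the rough ratios in these notes, and actual counts depend on the model's tokenizer.

```python
import math

# Rough English averages taken from the notes above (assumptions, not exact values).
CHARS_PER_TOKEN = 4.0
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for English text (hypothetical helper)."""
    if not text.strip():
        return 0
    by_chars = len(text) / CHARS_PER_TOKEN          # ~4 characters per token
    by_words = len(text.split()) / WORDS_PER_TOKEN  # ~0.75 words per token
    # Average the two heuristics and round up so the estimate errs on the high side.
    return math.ceil((by_chars + by_words) / 2)


if __name__ == "__main__":
    print(estimate_tokens("The quick brown fox jumps over the lazy dog."))
```

An estimate like this is useful for budgeting; for exact counts, use the tokenizer that belongs to the specific model.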
Context window
CONTEXT WINDOW → the maximum number of TOKENS an LLM can process at once (input + output):
→ everything the model 'sees' (your prompt + conversation + retrieved context) must FIT
→ ranges from thousands to millions of tokens (varies by model)
→ anything BEYOND the limit cannot be considered by the model (it is truncated or simply does not fit)
→ a hard limit on how much context the model can work with at once
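The "must FIT" rule can be sketched as a simple budget check. This is a minimal illustration under stated assumptions: `fits_in_window` and `trim_oldest` are hypothetical helpers, `count_tokens` is whatever token counter you supply, and dropping the oldest messages first is just one possible truncation strategy, not something these notes prescribe.

```python
from typing import Callable, List


def fits_in_window(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Input and output share the same window, so their sum must fit."""
    return input_tokens + max_output_tokens <= context_window


def trim_oldest(
    messages: List[str],
    count_tokens: Callable[[str], int],
    context_window: int,
    max_output_tokens: int,
) -> List[str]:
    """Keep the most recent messages that fit after reserving room for the output."""
    budget = context_window - max_output_tokens
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that would overflow.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    # Word count stands in for a real tokenizer here (an assumption for the demo).
    history = ["first message here", "second one", "the most recent message"]
    print(trim_oldest(history, lambda m: len(m.split()), context_window=10, max_output_tokens=3))
```

Reserving `max_output_tokens` up front reflects the point above that the window covers input and output together.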
