Tokens are the units LLMs break text into when processing it (pieces of words). The context window is the maximum amount of text, measured in tokens, that an LLM can take into account at one time. Understanding both is essential for using an LLM effectively, managing costs, and working within its limits.
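The following minimal Python sketch shows how a context-window limit might be checked. It assumes token counts are already known (for example, from a tokenizer). It also assumes, as is common for decoder-only LLMs, that prompt tokens and generated tokens count against the same window. The helpers `fits_in_context` and `trim_history` and all numbers are hypothetical and do not come from any provider's API.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    # Hypothetical helper. Assumes prompt and generated output share one window.
    return prompt_tokens + max_output_tokens <= context_window


def trim_history(message_token_counts: list[int], budget: int) -> list[int]:
    # Hypothetical helper: keep the most recent messages whose combined
    # token count stays within the budget, dropping older ones first.
    kept: list[int] = []
    total = 0
    for count in reversed(message_token_counts):
        if total + count > budget:
            break
        kept.append(count)
        total += count
    return list(reversed(kept))


print(fits_in_context(3000, 1000, 4096))          # True: 4000 <= 4096
print(trim_history([500, 1200, 800, 300], 1200))  # [800, 300]
```

Dropping the oldest messages first is one simple policy. Summarizing older turns is another, but that falls outside this sketch.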
What tokens are
TOKEN → the unit LLMs process text in (not words/characters, but PIECES):
→ text is split into tokens (roughly 4 characters, or about 0.75 words, per token in English)
→ e.g. 'unbelievable' might be 3 tokens; common words are often 1 token
→ the model processes and generates token by token
→ both input and output are measured in tokens (the same unit that context limits and usage costs are counted in)
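The toy Python sketch below illustrates splitting a word into pieces, and also turns the ~4-characters rule of thumb into an estimate. The vocabulary `VOCAB` is hypothetical. The greedy longest-match lookup is a simplification, closest in spirit to WordPiece-style matching. It is not the byte-pair-encoding merge procedure that many LLM tokenizers use, so real tokenizers will often split text differently.

```python
import math

# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of pieces from data.
VOCAB = {"un", "believ", "able", "the", "cat", "sat", " "}


def tokenize(text: str, vocab: set[str]) -> list[str]:
    # Greedy longest-match: at each position, take the longest vocab piece that matches.
    tokens: list[str] = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                break
        else:
            piece = text[i]  # no match: fall back to a single character
        tokens.append(piece)
        i += len(piece)
    return tokens


def estimate_tokens(text: str) -> int:
    # Rough English heuristic from the notes: about 4 characters per token.
    return math.ceil(len(text) / 4)


print(tokenize("unbelievable", VOCAB))  # ['un', 'believ', 'able']
print(estimate_tokens("unbelievable"))  # 3
```

The character-based estimate is only a budgeting aid. For exact counts, use the tokenizer that belongs to the specific model.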
