A token is the fundamental unit of text that large language models process. Tokens are fragments of words, whole words, or punctuation marks that the model reads and generates. In English, one token is roughly 3/4 of a word, so 100 words is approximately 130-140 tokens.
LLMs do not process text as characters or words; they use tokens. A tokenizer splits input text into tokens based on patterns learned from training data. Common words like "the" or "hello" are single tokens, while uncommon words are split into multiple tokens ("tokenization" might become "token" + "ization").
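For quick budgeting, a rough estimate is often enough. The sketch below uses the common "~4 characters per token" heuristic for English; it is an assumption for illustration, not a real tokenizer — exact counts require the provider's tokenizer tooling.

```python
def estimate_tokens(text: str) -> int:
    """Ballpark token estimate using the ~4-characters-per-token
    heuristic for English text. Real tokenizers give exact counts;
    this is only useful for rough cost and context budgeting."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("hello"))  # a short common word: about 1 token
print(estimate_tokens("the quick brown fox jumps over the lazy dog"))
```

The heuristic breaks down for code, non-English text, and unusual vocabulary, where real token counts can be much higher.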
Tokens matter for three practical reasons:
1. **Pricing**: LLM APIs charge per token (input + output). More tokens = higher cost.
2. **Context window**: Each model has a maximum token limit for the combined input and output. Exceeding it means truncating context.
3. **Latency**: More output tokens = longer response time, since LLMs generate one token at a time.
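The context-window constraint can be sketched as a simple pre-flight check. The 8,192-token limit below is an assumed example value; real limits vary by model.

```python
MAX_CONTEXT = 8192  # assumed model limit for illustration; varies by model

def fits_context(input_tokens: int, max_output_tokens: int) -> bool:
    """Input plus the tokens reserved for the response must fit
    inside the model's context window."""
    return input_tokens + max_output_tokens <= MAX_CONTEXT

print(fits_context(7000, 1000))  # 8,000 tokens fits
print(fits_context(7500, 1000))  # 8,500 tokens does not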
For a typical customer support interaction: the system prompt uses 200-500 tokens, RAG context uses 500-2,000 tokens, the customer question uses 20-100 tokens, and the AI response uses 100-500 tokens.
In practice, token awareness should be evaluated by what it changes in the support workflow. Ask whether tracking token usage improves answer accuracy, reduces repeated agent work, clarifies handoff decisions, or makes reporting easier. If the answer is only "it sounds modern," the concept is not yet operational.
A concrete example is the token cost of a single support conversation: 400 tokens (system prompt) + 1,200 tokens (RAG context) + 50 tokens (customer question) + 200 tokens (AI response) = 1,850 tokens. At GPT-5 pricing, this costs approximately $0.005 per conversation, so thousands of AI conversations cost dollars, not hundreds of dollars.
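The arithmetic above can be reproduced with a small sketch. The per-1K-token rates below are illustrative assumptions chosen to land near the ~$0.005 figure, not actual GPT-5 prices; input and output are priced separately because providers typically charge different rates for each.

```python
# Assumed illustrative rates (USD per 1,000 tokens) -- not real prices.
INPUT_PRICE_PER_1K = 0.0025
OUTPUT_PRICE_PER_1K = 0.0075

def conversation_cost(system=400, context=1200, question=50, response=200):
    """Cost of one support conversation: prompt-side tokens billed at
    the input rate, the model's reply billed at the output rate."""
    input_tokens = system + context + question    # 1,650 in the example
    output_tokens = response                      # 200 in the example
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + \
           (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

print(f"${conversation_cost():.4f}")  # about half a cent per conversation
```

With these assumed rates the example conversation comes out to $0.0056, consistent with the approximate figure in the text.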
The simplest takeaway: tokens are the basic text units LLMs process, roughly 3/4 of a word in English.
An average English sentence of 15-20 words uses approximately 20-27 tokens. Exact counts vary by vocabulary: common words use fewer tokens, while technical or uncommon words use more. Most LLM providers offer free tokenizer tools to check exact counts.
Tokens provide a balance between character-level processing (too granular, very slow) and word-level processing (too many unique words to handle efficiently). Tokenization reduces the vocabulary to 50,000-100,000 tokens that can represent any text efficiently, including code, numbers, and multiple languages.
LLM APIs charge per 1,000 tokens (input and output separately). Input tokens (your prompt + context) are cheaper than output tokens (the AI response). A typical support conversation costs $0.003-$0.01 in token fees. Platforms like Chatsy bundle token costs into conversation-based pricing for simpler budgeting.
A high-volume support team can reduce its AI costs by 40% by: shortening the system prompt from 800 to 300 tokens, limiting RAG context to the top 3 passages instead of 10, and setting a maximum response length of 200 tokens for simple FAQ answers.
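The effect of these optimizations can be sketched by comparing token budgets before and after. The 150-tokens-per-passage figure and the pre-optimization response length are assumptions for illustration; with these numbers the per-conversation reduction comes out higher than the fleet-wide 40% quoted above, since not every real conversation uses the maximum context.

```python
def total_tokens(system, passages, tokens_per_passage, question, response):
    """Sum the token budget for one support conversation."""
    return system + passages * tokens_per_passage + question + response

# Before: 800-token prompt, 10 RAG passages, longer answers (assumed sizes).
before = total_tokens(system=800, passages=10, tokens_per_passage=150,
                      question=50, response=400)
# After: 300-token prompt, top-3 passages, 200-token cap on FAQ answers.
after = total_tokens(system=300, passages=3, tokens_per_passage=150,
                     question=50, response=200)

savings = 1 - after / before
print(f"{savings:.0%} fewer tokens per conversation with these assumptions")
```

Since cost scales roughly linearly with tokens, the token reduction translates directly into a comparable cost reduction.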
Language matters. English is the most token-efficient language because LLMs are primarily trained on English text. Languages using non-Latin scripts (Chinese, Japanese, Korean, Arabic) can use 2-3x more tokens for the same semantic content, which increases costs for multilingual deployments.