A token is the fundamental unit of text that large language models process. Tokens are fragments of words, whole words, or punctuation marks that the model reads and generates. In English, one token is roughly 3/4 of a word, so 100 words is approximately 130-140 tokens.
LLMs do not process text as characters or words; they use tokens. A tokenizer splits input text into tokens based on patterns learned from training data. Common words like "the" or "hello" are single tokens, while uncommon words are split into multiple tokens ("tokenization" might become "token" + "ization").
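For quick budgeting, a rough estimate is often enough. The sketch below uses the common "~4 characters per token" heuristic for English; it is an assumption for illustration, not a real tokenizer — exact counts require the provider's tokenizer tooling.

```python
def estimate_tokens(text: str) -> int:
    """Ballpark token estimate using the ~4-characters-per-token
    heuristic for English text. Real tokenizers give exact counts;
    this is only useful for rough cost and context budgeting."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("hello"))  # a short common word: about 1 token
print(estimate_tokens("the quick brown fox jumps over the lazy dog"))
```

The heuristic breaks down for code, non-English text, and unusual vocabulary, where real token counts can be much higher.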
Tokens matter for three practical reasons:
1. **Pricing**: LLM APIs charge per token (input + output). More tokens = higher cost.
2. **Context window**: Each model has a maximum token limit for the combined input and output. Exceeding it means truncating context.
3. **Latency**: More output tokens = longer response time, since LLMs generate one token at a time.
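The context-window constraint can be sketched as a simple pre-flight check. The 8,192-token limit below is an assumed example value; real limits vary by model.

```python
MAX_CONTEXT = 8192  # assumed model limit for illustration; varies by model

def fits_context(input_tokens: int, max_output_tokens: int) -> bool:
    """Input plus the tokens reserved for the response must fit
    inside the model's context window."""
    return input_tokens + max_output_tokens <= MAX_CONTEXT

print(fits_context(7000, 1000))  # 8,000 tokens fits
print(fits_context(7500, 1000))  # 8,500 tokens does not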
For a typical customer support interaction: the system prompt uses 200-500 tokens, RAG context uses 500-2,000 tokens, the customer question uses 20-100 tokens, and the AI response uses 100-500 tokens.
In practice, token awareness should be evaluated by what it changes in the support workflow. Ask whether tracking token usage improves answer accuracy, reduces repeated agent work, clarifies handoff decisions, or makes reporting easier. If the answer is only "it sounds modern," the concept is not yet operational.
A concrete example is the token cost of a single support conversation: 400 tokens (system prompt) + 1,200 tokens (RAG context) + 50 tokens (customer question) + 200 tokens (AI response) = 1,850 tokens. At GPT-5 pricing, this costs approximately $0.005 per conversation, so thousands of AI conversations cost dollars, not hundreds of dollars.
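The arithmetic above can be reproduced with a small sketch. The per-1K-token rates below are illustrative assumptions chosen to land near the ~$0.005 figure, not actual GPT-5 prices; input and output are priced separately because providers typically charge different rates for each.

```python
# Assumed illustrative rates (USD per 1,000 tokens) -- not real prices.
INPUT_PRICE_PER_1K = 0.0025
OUTPUT_PRICE_PER_1K = 0.0075

def conversation_cost(system=400, context=1200, question=50, response=200):
    """Cost of one support conversation: prompt-side tokens billed at
    the input rate, the model's reply billed at the output rate."""
    input_tokens = system + context + question    # 1,650 in the example
    output_tokens = response                      # 200 in the example
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + \
           (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

print(f"${conversation_cost():.4f}")  # about half a cent per conversation
```

With these assumed rates the example conversation comes out to $0.0056, consistent with the approximate figure in the text.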
The simplest takeaway: tokens are the basic text units LLMs process, roughly 3/4 of a word in English.
An average English sentence of 15-20 words uses approximately 20-27 tokens. Exact counts vary by vocabulary: common words use fewer tokens, while technical or uncommon words use more. Most LLM providers offer free tokenizer tools to check exact counts.
Tokens provide a balance between character-level processing (too granular, very slow) and word-level processing (too many unique words to handle efficiently). Tokenization reduces the vocabulary to 50,000-100,000 tokens that can represent any text efficiently, including code, numbers, and multiple languages.
LLM APIs charge per 1,000 tokens (input and output separately). Input tokens (your prompt + context) are cheaper than output tokens (the AI response). A typical support conversation costs $0.003-$0.01 in token fees. Platforms like Chatsy bundle token costs into conversation-based pricing for simpler budgeting.
A high-volume support team can reduce its AI costs by 40% by: shortening the system prompt from 800 to 300 tokens, limiting RAG context to the top 3 passages instead of 10, and setting a maximum response length of 200 tokens for simple FAQ answers.
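The effect of these optimizations can be sketched by comparing token budgets before and after. The 150-tokens-per-passage figure and the pre-optimization response length are assumptions for illustration; with these numbers the per-conversation reduction comes out higher than the fleet-wide 40% quoted above, since not every real conversation uses the maximum context.

```python
def total_tokens(system, passages, tokens_per_passage, question, response):
    """Sum the token budget for one support conversation."""
    return system + passages * tokens_per_passage + question + response

# Before: 800-token prompt, 10 RAG passages, longer answers (assumed sizes).
before = total_tokens(system=800, passages=10, tokens_per_passage=150,
                      question=50, response=400)
# After: 300-token prompt, top-3 passages, 200-token cap on FAQ answers.
after = total_tokens(system=300, passages=3, tokens_per_passage=150,
                     question=50, response=200)

savings = 1 - after / before
print(f"{savings:.0%} fewer tokens per conversation with these assumptions")
```

Since cost scales roughly linearly with tokens, the token reduction translates directly into a comparable cost reduction.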
Language matters. English is the most token-efficient language because LLMs are primarily trained on English text. Languages using non-Latin scripts (Chinese, Japanese, Korean, Arabic) can use 2-3x more tokens for the same semantic content, which increases costs for multilingual deployments.