What are Tokens? A deep-dive into one of the key concepts of AI

Tokens are the basic units of text that AI models use to understand and generate language, and they sit at the heart of natural language processing. Imagine you have asked an AI model to find nice hiking routes in the south of Europe for a weekend getaway with friends. Within seconds, you receive an overview of beautiful hiking routes varying in location, difficulty and length. But how does the model know what you asked and what it should look for? This is where the concept of tokens comes in.

Understand how Tokens work

A token is the smallest unit that a large language model such as Claude Sonnet 4, Mistral 3 or ChatGPT uses to understand and process language. Tokens are the building blocks that allow AI systems to break larger pieces of text into smaller units, so the model can analyse language and generate responses more effectively.

During tokenization, the AI model converts longer text into smaller, manageable pieces, most commonly words, subwords or punctuation. Before an AI model processes any input, it divides the text based on spaces, punctuation and other delimiters. You can compare it to slicing a watermelon before eating it: an LLM needs to break content into smaller pieces so it can digest it.

For example, the quote from ice hockey player Wayne Gretzky, ‘You miss 100% of the shots you do not take,’ can be broken down into eleven tokens comprising words and punctuation.
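
As a rough illustration, a naive word-and-punctuation splitter can reproduce this count. This is a deliberate simplification: real models use learned subword schemes such as byte-pair encoding, so their actual splits will differ.

```python
import re

def naive_tokenize(text):
    # Match runs of word characters, or any single punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)

tokens = naive_tokenize("You miss 100% of the shots you do not take")
print(tokens)       # ['You', 'miss', '100', '%', 'of', 'the', 'shots', ...]
print(len(tokens))  # 11
```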

To understand token length, the rule of thumb says:

  • 1 token ≈ 4 characters in English
  • 1 token ≈ ¾ of a word
  • 100 tokens ≈ 75 words
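
These ratios can be turned into a quick back-of-the-envelope estimator. The helper below is a hypothetical sketch based only on the 4-characters-per-token rule of thumb for English; it is not a real tokenizer, and actual counts will vary by model.

```python
def estimate_tokens(text):
    """Rough token estimate: ~4 characters per token in English."""
    return max(1, round(len(text) / 4))

quote = "You miss 100% of the shots you do not take"
print(estimate_tokens(quote))  # close to the true count of 11
```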

This process turns human language into a form the model can work with, enabling it to process inputs and generate answers. We distinguish between input tokens, which make up the human prompt (e.g. ‘Find me hiking routes in the south of Europe’), and output tokens, which make up the answer generated by the AI model (e.g. a list of nine destinations across three countries).

Why do Tokens matter?

Tokens are important to understand in the context of AI for two reasons:

  1. Token limits: AI models can only process a limited number of tokens in one go, known as the “context window”; think of it as the model’s attention span. The context includes the current prompt and past exchanges. Higher token limits mean the model can handle longer inputs and keep context over extended conversations. Limits range from a few thousand tokens for smaller models to hundreds of thousands for the largest. Knowing these limits matters because they affect performance, cost and efficiency; understanding how many tokens a request consumes helps users interact with the model more effectively.
  2. Cost control: Providers of large foundation models such as OpenAI, Anthropic or Mistral charge based on token usage when consumers access their AI services. This lets them track how much their products are used, and they typically charge separately for input (prompt) and output (completion) tokens. The more tokens you feed into the system, the higher the cost, so token limits also help to control spending.
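
As a sketch of how token-based billing adds up, the function below uses made-up placeholder prices, not any provider's actual rates; it only illustrates the common pattern of output tokens costing more than input tokens.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_1k=0.003, output_price_per_1k=0.015):
    """Hypothetical pricing in dollars per 1,000 input/output tokens."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# A 500-token prompt that produces a 1,500-token answer:
print(f"${estimate_cost(500, 1500):.4f}")  # $0.0240
```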

As tokens are key for AI models and their usage, there are a few strategies for managing tokens effectively:

  • Keep prompts concise and don’t mix too many topics in one request
  • Break long conversations into shorter exchanges to avoid hitting limits
  • Use a tokenizer tool to count tokens upfront and estimate costs

What are the challenges of tokens?

Tokens are key to AI and its applications; however, tokenization comes with several challenges.

Ambiguity – Human language is ambiguous and leaves room for interpretation depending on the context. Tokenization alone cannot always resolve this, which can lead to misinterpretation. For example, in the sentence “The kids want to play”, the word “play” is a verb, but in “The kids saw a play” it is a noun; without the right context, the model can confuse the two. This leads to inaccurate results, especially in sentiment analysis or translation.

  • Words with double meanings, such as “cool”, which can describe temperature or approval; without the right context, the wrong sense may be picked up during processing
  • Compound expressions such as “ice cream”, which take on completely different meanings if split into separate tokens
  • Name ambiguity, for example “Apple”, which could refer to the fruit or the company

Language boundaries – Languages such as Chinese and Japanese don’t use spaces between words, which makes it hard for traditional tokenization methods to find word boundaries. For example, ‘hot dog’ is written 热狗 in Chinese, with no space to indicate where 热 (‘hot’) ends and 狗 (‘dog’) begins. A simple tokenization process can get this wrong and produce errors. Tokenization needs therefore differ by language, and the strategy has to be adjusted for languages such as Chinese or Arabic.
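
A whitespace-based split makes the problem concrete: it separates the English phrase cleanly but leaves the Chinese text as a single undivided token, because there is no space to split on. This is a deliberately naive sketch; real multilingual tokenizers use subword or character-level methods instead.

```python
def whitespace_tokenize(text):
    # Split only on whitespace – the "traditional" English-centric approach
    return text.split()

print(whitespace_tokenize("hot dog"))  # ['hot', 'dog'] – two tokens
print(whitespace_tokenize("热狗"))      # ['热狗'] – one token, no boundary found
```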

Special cases – Special characters, numbers and abbreviations do not fit standard tokenization rules, so the AI model needs to handle these situations correctly.

  • Website and email addresses are a single unit, but the model might break them apart, resulting in wrong processing
  • With numbers and symbols, the model needs to decide whether, for example, a phone number is one token or should be split, depending on the context
  • Words like “U.S.A.” or “decision-making” can be handled as one token or split up, again depending on the context
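
One common mitigation is to match known special patterns before falling back to generic word splitting. The regex below is a simplified, hypothetical sketch (far from production-grade) showing how URLs, abbreviations and hyphenated words can be kept intact by trying the more specific patterns first.

```python
import re

# Alternatives are tried in order: more specific patterns come first,
# so "U.S.A." or "decision-making" is matched whole before plain \w+ can
# grab just a fragment of it.
TOKEN_PATTERN = re.compile(
    r"https?://\S+"            # URLs kept as one token
    r"|[\w.+-]+@[\w-]+\.\w+"   # email addresses
    r"|(?:[A-Za-z]\.){2,}"     # dotted abbreviations like U.S.A.
    r"|\w+(?:-\w+)+"           # hyphenated words like decision-making
    r"|\w+"                    # plain words and numbers
    r"|[^\w\s]"                # any other single symbol
)

def tokenize(text):
    return TOKEN_PATTERN.findall(text)

print(tokenize("Visit https://example.com for U.S.A. decision-making tips"))
# ['Visit', 'https://example.com', 'for', 'U.S.A.', 'decision-making', 'tips']
```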

Conclusion

Now you understand the magic behind your simple request about hiking routes in the south of Europe and how the AI model generates its responses. In the end, tokens are the little building blocks that make our conversations with AI possible. They determine how much the model can remember, how much it costs, and how well it understands us, whether we’re asking about hiking routes or quoting Wayne Gretzky. So next time you chat with an AI, remember: every word, every pause, every “cool” counts… token by token. And just like us after a long hike, models can only take in so much before they need a break.