How Many R’s in the Word “Strawberry?”

September 2, 2024 Samuel Mormando

We increasingly rely on large language models (LLMs) for quick answers and insights. These models are powerful, but they have limitations. One well-known failure shows why it’s crucial to verify their output: ask a model how many ‘r’s are in the word “strawberry,” and it may answer incorrectly (the correct answer is three). The task is trivial for humans, but the way AI processes text can lead to surprising errors.

Human vs. AI: How We Understand Text Differently

Human Understanding: When humans read, we process text linearly, recognizing each letter in sequence. For us, “strawberry” is easily understood as a string of characters, making it straightforward to count the letters or recognize patterns.
AI Understanding: AI models, on the other hand, don’t see words the same way. They rely on a process called tokenization, which breaks text into smaller units called tokens. A token can be an entire word, part of a word, or a single character, depending on the vocabulary the model’s tokenizer learned.

The Tokenization Process: A Peek Inside AI’s Mind

Tokenization Explained: Tokenization is how AI deconstructs text into manageable parts. For example, “strawberry” might be tokenized as “straw” and “berry” or even as a single token if the word is frequent in the training data.
Why This Matters: When you ask an AI to count letters, such as ‘r’s in “strawberry,” it doesn’t naturally parse the word at the character level. Instead, it sees larger chunks, like “straw” and “berry,” missing the detailed breakdown humans easily see. It’s like trying to count individual threads in a rope without unravelling it first.
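
To make the idea concrete, here is a minimal sketch of a toy tokenizer. It is an illustration only: the vocabulary, the token IDs, and the function names are hypothetical, and the greedy longest-match rule is a simplification of what production tokenizers do (many use learned schemes such as byte-pair encoding). The point is what the model ends up receiving: a short list of numbers rather than ten separate letters.

```python
# Hypothetical vocabulary and IDs, invented for this illustration.
# Real tokenizers learn tens of thousands of entries from data.
TOY_VOCAB = {"straw": 101, "berry": 102, "time": 103, "keeper": 104}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split `word` greedily into the longest pieces found in `vocab`.

    Characters not covered by any vocabulary entry become
    single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until one matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: fall back to one character.
            tokens.append(word[i])
            i += 1
    return tokens


def to_ids(tokens, vocab=TOY_VOCAB):
    """Map tokens to integer IDs; unknown pieces get -1 in this sketch."""
    return [vocab.get(tok, -1) for tok in tokens]


tokens = toy_tokenize("strawberry")
print(tokens)          # ['straw', 'berry']
print(to_ids(tokens))  # [101, 102]
```

Nothing in `[101, 102]` says how many ‘r’s each piece contains. A model has to have learned that from its training data, which is why letter-level questions can go wrong.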

Compound Words: More Than Just ‘Strawberry’

Challenges with Compound Words: Similar issues arise with compound words like “timekeeper.” While humans see a single word, AI might split it into “time” and “keeper.” This tokenization helps AI understand context but falters when exact details, like letter counts, are needed.
An Example with ‘Timekeeper’: Asking an AI how many ‘e’s are in “timekeeper” can yield inconsistent results (the correct answer is four). The AI might count the ‘e’s separately in “time” and “keeper” and combine them incorrectly, or miscount because of an off-by-one slip, such as numbering positions from 0 in one step and from 1 in another.
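
For comparison, the same count done character by character is unambiguous. The sketch below is an illustration under stated assumptions: the split into “time” and “keeper” is the one described in the text, not the output of any particular model, and the helper names are hypothetical. It shows that per-piece counts add up to the whole-word count when done exactly, and that 0-based and 1-based numbering shift the positions but not the total.

```python
def count_letter(text, letter):
    """Count occurrences of `letter` in `text`, character by character."""
    return sum(1 for ch in text if ch == letter)


def letter_positions(text, letter, start=0):
    """Return positions of `letter`, numbering characters from `start`."""
    return [i for i, ch in enumerate(text, start=start) if ch == letter]


pieces = ["time", "keeper"]  # assumed split, taken from the text
word = "".join(pieces)

per_piece = [count_letter(p, "e") for p in pieces]
print(per_piece)                             # [1, 3]
print(sum(per_piece))                        # 4
print(count_letter(word, "e"))               # 4
print(letter_positions(word, "e"))           # [3, 5, 6, 8]
print(letter_positions(word, "e", start=1))  # [4, 6, 7, 9]
```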

Why Verifying AI Output Is Essential

AI’s Strengths and Limitations: While AI excels at processing vast amounts of information and finding patterns, it doesn’t always get the finer details right. Understanding how AI works, and where it falls short, empowers us to use these tools more effectively.
Practical Takeaway: Always verify information from AI, especially for tasks requiring precise details like counting characters. Trust, but verify, as even the most advanced models can make simple yet significant mistakes.
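
For character counts in particular, verification takes only a few lines. The following is a minimal sketch: the function name is hypothetical, and the claimed value of 2 stands in for an answer an AI might give; it is not a recorded model output. The check relies on Python’s built-in `str.count`.

```python
def verify_letter_count(word, letter, claimed):
    """Compare a claimed count with a direct, case-insensitive count.

    Returns a (matches, actual_count) pair.
    """
    actual = word.lower().count(letter.lower())
    return actual == claimed, actual


# `claimed=2` is a hypothetical AI answer used for illustration.
ok, actual = verify_letter_count("strawberry", "r", claimed=2)
print(ok, actual)  # False 3
```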

Conclusion: AI is an incredible tool, but it’s not infallible. By understanding its strengths and weaknesses, we can better harness its power while avoiding pitfalls. So next time you use AI, whether it’s for counting letters or more complex tasks, remember the humble strawberry and always double-check the details.