Dominik Chrástecký - Blog
New in PHP 8.5: Levenshtein Comparison for UTF-8 Strings

PHP 8.5 adds a new function for calculating the Levenshtein distance between strings — now with proper UTF-8 support.
PHP has long had a levenshtein() function, but it comes with a significant limitation: it works on bytes rather than characters, so it doesn't handle multibyte UTF-8 text correctly.
If you’re not familiar with the Levenshtein distance, it’s a way to measure how different two strings are — by counting the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another.
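Here is a quick illustration with plain ASCII input, where the existing levenshtein() already counts characters correctly. Turning 'kitten' into 'sitting' takes three single-character edits:

```php
<?php
// Plain ASCII: one byte per character, so levenshtein() counts characters correctly.
// kitten -> sitten  (substitute k with s)
// sitten -> sittin  (substitute e with i)
// sittin -> sitting (insert g)
var_dump(levenshtein('kitten', 'sitting')); // int(3)
```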
For example, the following code returns 2 instead of the correct result, 1, because ö is encoded as two bytes in UTF-8 and each byte is counted as a separate character:
var_dump(levenshtein('göthe', 'gothe'));
There are workarounds — such as using a pure PHP implementation or converting strings to a custom single-byte encoding — but they come with downsides, like slower performance or non-standard behavior.
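One of the workarounds mentioned above, a pure PHP implementation, can be sketched as the standard dynamic-programming algorithm run over code points instead of bytes. This is a minimal sketch, not part of PHP: the function name utf8_levenshtein() is hypothetical, and the code assumes the mbstring extension for mb_str_split() (PHP 7.4+).

```php
<?php
/**
 * Levenshtein distance over Unicode code points.
 * Hypothetical userland helper (not a PHP built-in); requires mbstring.
 * A multibyte character such as "ö" counts as one unit instead of two bytes.
 */
function utf8_levenshtein(string $a, string $b): int
{
    $s = mb_str_split($a, 1, 'UTF-8');
    $t = mb_str_split($b, 1, 'UTF-8');
    $m = count($s);
    $n = count($t);

    // $prev[$j] = distance between the first ($i - 1) chars of $s and the first $j chars of $t.
    $prev = range(0, $n);

    for ($i = 1; $i <= $m; $i++) {
        $curr = [$i]; // distance from the first $i chars of $s to the empty string
        for ($j = 1; $j <= $n; $j++) {
            $cost = ($s[$i - 1] === $t[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // deletion
                $curr[$j - 1] + 1,     // insertion
                $prev[$j - 1] + $cost  // substitution (or match when $cost is 0)
            );
        }
        $prev = $curr;
    }

    return $prev[$n];
}

var_dump(utf8_levenshtein('göthe', 'gothe')); // int(1)
```

Because it compares code points rather than graphemes, this sketch still reports a distance of 2 between a precomposed é and an e followed by a combining accent, and as a userland loop it is slower than the native function.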
With the new grapheme_levenshtein() function in PHP 8.5, the same comparison correctly returns 1:
var_dump(grapheme_levenshtein('göthe', 'gothe'));
Grapheme-Based Comparison
What makes this new function especially powerful is that it operates on graphemes, not bytes or code points. For instance, the character é (accented 'e') can be represented in two ways: as a single code point (U+00E9) or as a combination of the letter e (U+0065) and a combining accent (U+0301). In PHP, you can write these as:
$string1 = "\u{00e9}";
$string2 = "\u{0065}\u{0301}";
Even though these strings differ at both the byte and the code-point level, they represent the same grapheme (Unicode treats them as canonically equivalent). The new grapheme_levenshtein() function correctly recognizes this and returns 0, meaning no difference.
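A small sketch that counts the two strings at each level makes the difference visible. It assumes the mbstring and intl extensions are loaded; grapheme_strlen() and the Normalizer class both come from intl.

```php
<?php
// Precomposed é (U+00E9) vs. e + combining acute accent (U+0065 U+0301)
$string1 = "\u{00e9}";
$string2 = "\u{0065}\u{0301}";

var_dump(strlen($string1), strlen($string2));                          // int(2), int(3): bytes
var_dump(mb_strlen($string1, 'UTF-8'), mb_strlen($string2, 'UTF-8'));  // int(1), int(2): code points
var_dump(grapheme_strlen($string1), grapheme_strlen($string2));        // int(1), int(1): graphemes

// Canonical equivalence: NFC normalization turns the decomposed form into the precomposed one.
var_dump(Normalizer::normalize($string2, Normalizer::FORM_C) === $string1); // bool(true)
```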
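This is particularly useful when working with scripts such as Japanese, Chinese, or Korean, where grapheme clusters play a bigger role than in Latin or Cyrillic alphabets; a Korean syllable, for instance, can be written either as a single code point or as a sequence of separate jamo.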
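A similar sketch for Korean, again assuming mbstring and intl are available: the syllable 한 written as one precomposed code point (U+D55C) and as three conjoining jamo (U+1112 U+1161 U+11AB) differs in code points but not in graphemes.

```php
<?php
// "한" precomposed vs. decomposed into leading consonant, vowel, and trailing consonant jamo
$composed   = "\u{D55C}";
$decomposed = "\u{1112}\u{1161}\u{11AB}";

var_dump(mb_strlen($composed, 'UTF-8'), mb_strlen($decomposed, 'UTF-8')); // int(1), int(3): code points
var_dump(grapheme_strlen($composed), grapheme_strlen($decomposed));       // int(1), int(1): graphemes
```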
Just for fun: what do you think the original levenshtein() function will return for the example above?
var_dump(levenshtein("\u{0065}\u{0301}", "\u{00e9}"));