- Spec compliant: Uses up-to-date Unicode data, verified against the official Unicode test suites and fuzzed against Rust's unicode-segmentation library.
- Excellent compatibility: Works in all Luau environments, on or off Roblox.
- Zero dependencies: Doesn't bloat your packages and is easy to review.
- Small bundle size: Compresses the Unicode data and maintains a very small memory footprint.
- Extremely efficient: Carefully optimized for runtime performance.
- Modern Luau: Fully type-checked, runs on the new solver, and takes advantage of modern Luau features.

local segmenter = require("@pkg/segmenter")
-- Grapheme clusters
segmenter.splitGraphemes("a̐éö̲\r\n")
-- { "a̐", "é", "ö̲", "\r\n" }
-- Word segments (with isWordLike flag)
segmenter.words("Hello, world!")
-- {
-- { segment = "Hello", index = 1, isWordLike = true },
-- { segment = ",", index = 6, isWordLike = false },
-- { segment = " ", index = 7, isWordLike = false },
-- { segment = "world", index = 8, isWordLike = true },
-- { segment = "!", index = 13, isWordLike = false },
-- }
-- Sentence boundaries
segmenter.splitSentences("Hello! Next")
-- { "Hello! ", "Next" }
-- Category lookups
segmenter.wordCategory(string.byte("A"))
-- => segmenter.WordCategory.ALetter

Add the dependency to your wally.toml:
[dependencies]
UnicodeSegmentation = "grilme99/unicode-segmentation@1.0.0"

All segment indices are 1-based byte offsets in the original string.
Segment: { segment: string, index: number }
WordSegment: { segment: string, index: number, isWordLike: boolean }

segmenter.graphemes(input: string): { Segment }
segmenter.splitGraphemes(input: string): { string }
segmenter.countGraphemes(input: string): number

segmenter.words(input: string): { WordSegment }
segmenter.splitWords(input: string): { string }
segmenter.countWords(input: string): number

segmenter.sentences(input: string): { Segment }
segmenter.splitSentences(input: string): { string }
segmenter.countSentences(input: string): number

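Because indices are byte offsets rather than character positions, a segment can be recovered from the source string with `string.sub`. The sketch below does not call the library: `segments` is a hand-written copy of the documented result for `"Hello, world!"`, and `sliceSegment` is a hypothetical helper name used only for illustration.

```luau
-- Hand-written stand-in for the documented result of segmenter.words("Hello, world!")
local input = "Hello, world!"
local segments = {
	{ segment = "Hello", index = 1, isWordLike = true },
	{ segment = ",", index = 6, isWordLike = false },
	{ segment = " ", index = 7, isWordLike = false },
	{ segment = "world", index = 8, isWordLike = true },
	{ segment = "!", index = 13, isWordLike = false },
}

-- Hypothetical helper: slice a segment back out of the source using its 1-based byte offset
local function sliceSegment(source, seg)
	return string.sub(source, seg.index, seg.index + #seg.segment - 1)
end

-- Every documented segment round-trips through its byte offset
for _, seg in segments do
	assert(sliceSegment(input, seg) == seg.segment)
end

-- Byte offsets and character counts diverge for non-ASCII text:
-- U+00E9 (precomposed "é") occupies two bytes in UTF-8
local accented = "\u{E9}!"
assert(#accented == 3) -- bytes
assert(utf8.len(accented) == 2) -- code points
```

If indices behave as described, a segment following a multi-byte character starts at a byte offset larger than its character position, so `index` is safe to pass to `string.sub` but should not be treated as a character count.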
Each category lookup returns a numeric enum value; use the matching enum table to interpret it:
segmenter.graphemeCategory(codepoint: number): number
segmenter.wordCategory(codepoint: number): number
segmenter.sentenceCategory(codepoint: number): number

segmenter.GraphemeCategory
segmenter.WordCategory
segmenter.SentenceCategory

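To turn a numeric category back into a readable name, one option is a reverse lookup over the enum table. The sketch below assumes the enum tables are plain name-to-number maps, which the `segmenter.WordCategory.ALetter` usage suggests but this README does not confirm. `categoryName` is a hypothetical helper, and the stand-in table uses made-up numeric values so the example runs without the library.

```luau
-- Stand-in enum table with made-up numeric values (assumption);
-- the real values live in segmenter.WordCategory
local WordCategory = { ALetter = 1, Numeric = 2, Other = 3 }

-- Hypothetical helper: find the enum name whose value matches `value`
local function categoryName(enumTable, value)
	for name, id in enumTable do
		if id == value then
			return name
		end
	end
	return nil
end

assert(categoryName(WordCategory, 1) == "ALetter")
assert(categoryName(WordCategory, 99) == nil)

-- Lookups take a code point, not a byte. string.byte only matches the code point
-- for ASCII input, so utf8.codepoint is the safer choice for arbitrary text.
local codepoint = utf8.codepoint("\u{E9}") -- U+00E9
assert(codepoint == 233)

-- With the library loaded, the pieces would combine roughly as:
-- categoryName(segmenter.WordCategory, segmenter.wordCategory(codepoint))
```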
Run the full test suite:
lute test tests

Unicode® 17.0.0
Unicode® Standard Annex #29 - Revision 47 (2025-08-17)
This library runs in any Luau runtime that supports buffer, utf8, bit32,
and the standard string/table APIs. It is designed to run both in Lute and
on Roblox.
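When targeting an unfamiliar runtime, a quick check for the required libraries can be run before installing the package. This is a minimal sketch and not part of the library; the `candidates` and `missing` names are illustrative.

```luau
-- Each entry pairs a library name with the global it should resolve to;
-- a missing global simply evaluates to nil
local candidates = {
	{ name = "buffer", lib = buffer },
	{ name = "utf8", lib = utf8 },
	{ name = "bit32", lib = bit32 },
	{ name = "string", lib = string },
	{ name = "table", lib = table },
}

local missing = {}
for _, entry in candidates do
	if type(entry.lib) ~= "table" then
		table.insert(missing, entry.name)
	end
end

-- Fails with a readable message listing whatever the runtime lacks
assert(#missing == 0, "missing Luau libraries: " .. table.concat(missing, ", "))
```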
- The Rust Unicode team (@unicode-rs): The implementation is based heavily on Rust's unicode-segmentation library.