feat(v4): add z.script() for unicode script validation#5894
Draft
FucciUnavailable wants to merge 2 commits into colinhacks:main
…t can be extended further to other input
Will eventually close #5804
This PR adds z.script(name), which validates that every character in the string belongs to the given Unicode script, using the \p{Script=...} property escape built into the JS engine.
Expected behavior:
z.script("Arabic").parse("حفلة"); // passes
z.script("Latin").parse("hello"); // passes
z.script("Cyrillic").parse("привет"); // passes
z.string().script("Cyrillic"); // the method form is also available and passes the tests
z.script("NotAScript"); // throws a SyntaxError at schema creation; the JS engine rejects unknown script identifiers
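For reviewers who want to try the underlying mechanism without checking out the branch, here is a minimal sketch in plain TypeScript. The helper names `scriptRegex` and `isInScript` are made up for illustration and are not the code in this PR; the sketch only assumes the check boils down to an anchored `\p{Script=...}` regex with the `u` flag.

```typescript
// Hypothetical helpers for illustration only; not Zod's implementation.

// Builds an anchored regex that requires every character to belong to `name`.
// The "u" flag enables \p{...}; an unknown script name makes the RegExp
// constructor throw a SyntaxError right here, before any input is parsed.
function scriptRegex(name: string): RegExp {
  return new RegExp(`^\\p{Script=${name}}+$`, "u");
}

function isInScript(name: string, input: string): boolean {
  return scriptRegex(name).test(input);
}

console.log(isInScript("Arabic", "حفلة"));        // true
console.log(isInScript("Latin", "hello"));        // true
console.log(isInScript("Cyrillic", "привет"));    // true
console.log(isInScript("Latin", "hello world"));  // false: the space is not Latin

try {
  scriptRegex("NotAScript");
} catch (err) {
  console.log(err instanceof SyntaxError);        // true
}
```

Because the regex is built once when the helper is called, the failure for a bad script name surfaces at construction time rather than on the first parse.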
Notes:
Invalid script names fail fast at schema creation time (not parse time) => the JS engine rejects unknown \p{Script=...} identifiers immediately
Strict by design: spaces, punctuation, and diacritics outside the script's own code points will fail (suitable for validating individual words/tokens, not full sentences) => support can be extended by using Script_Extensions instead of Script, e.g. ^\p{Script_Extensions=Arabic}+$
Script names are the official Unicode identifiers (Old_Persian, Linear_B, etc.) => casing matters
New scripts are supported automatically as Node.js/V8 updates their ICU Unicode tables => nothing to maintain in Zod
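The strictness and casing notes above can be checked with a small sketch. `matchesScript` and `isKnownScript` are hypothetical helpers written for this comment, not part of the PR. The example string is an Arabic word with a fatha (U+064E), which Unicode assigns to the Inherited script but lists under Script_Extensions for Arabic.

```typescript
// Hypothetical helpers for illustration only; not Zod's implementation.
type ScriptProperty = "Script" | "Script_Extensions";

// Tests whether every character of `input` has the given property value.
function matchesScript(property: ScriptProperty, name: string, input: string): boolean {
  return new RegExp(`^\\p{${property}=${name}}+$`, "u").test(input);
}

// Returns false instead of throwing when the engine rejects the script name.
function isKnownScript(name: string): boolean {
  try {
    new RegExp(`\\p{Script=${name}}`, "u");
    return true;
  } catch {
    return false;
  }
}

// meem + fatha (U+064E) + reh + hah + beh + alef
const vocalized = "\u0645\u064E\u0631\u062D\u0628\u0627";
// Two Arabic words separated by an ASCII space
const twoWords = "\u0645\u0631\u062D\u0628\u0627 \u0628\u0643";

console.log(matchesScript("Script", "Arabic", vocalized));            // false
console.log(matchesScript("Script_Extensions", "Arabic", vocalized)); // true
console.log(matchesScript("Script_Extensions", "Arabic", twoWords));  // false

console.log(isKnownScript("Old_Persian")); // true
console.log(isKnownScript("old_persian")); // false: names are case-sensitive
```

Note that Script_Extensions admits shared diacritics but still rejects the ASCII space, so whole sentences would need whitespace allowed separately.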
Motivation:
Making zod even more accessible worldwide! :D
Important:
Named aliases (z.arabic(), z.latin(), etc.) are intentionally left out. Each would be a one-liner in classic/schemas.ts, but picking which of the 150+ Unicode scripts deserve a named helper is an arbitrary call.
z.script("Arabic") is already readable, and the escape hatch is there if we want to ship named aliases later.
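To give a rough idea of what "one-liner" means here, the sketch below layers aliases on top of a generic helper. Everything in it is hypothetical: `script` is a stand-in that returns a plain predicate rather than a Zod schema, and `arabic` / `latin` are not part of this PR.

```typescript
// Stand-in for z.script(); hypothetical, returns a predicate instead of a schema.
const script = (name: string) => {
  const re = new RegExp(`^\\p{Script=${name}}+$`, "u");
  return (input: string): boolean => re.test(input);
};

// Hypothetical named aliases: each is a one-liner over the generic helper.
const arabic = () => script("Arabic");
const latin = () => script("Latin");

console.log(arabic()("حفلة"));   // true
console.log(latin()("hello"));   // true
console.log(latin()("привет"));  // false
```

Since each alias only fixes the script name, shipping them later would not change the behavior of the generic form.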
Reference Issue: #5804