
feat(v4): add z.script() for unicode script validation #5894

Draft
FucciUnavailable wants to merge 2 commits into colinhacks:main from FucciUnavailable:feat/unicode-script-validator-v1


@FucciUnavailable
Contributor

Will eventually close #5804

This PR adds z.script(name), which validates that every character in the string belongs to the given Unicode script, using the JS engine's built-in \p{Script=...} Unicode property escape.
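A minimal sketch of the check described above, in plain TypeScript with no Zod dependency. This is not the PR's implementation: it assumes the pattern is compiled once, up front, and anchored as ^\p{Script=...}+$ (so the empty string is rejected), and `makeScriptCheck` is a hypothetical helper name.

```typescript
// Sketch only (not the PR's code): compile the pattern once, then reuse it.
// Assumption: anchored ^...$ with "+", so "" is rejected.
// `makeScriptCheck` is a hypothetical helper name.
function makeScriptCheck(scriptName: string): (input: string) => boolean {
  // The "u" flag is required for \p{...} property escapes. An unknown
  // scriptName makes this constructor throw a SyntaxError right here.
  const pattern = new RegExp(`^\\p{Script=${scriptName}}+$`, "u");
  return (input: string): boolean => pattern.test(input);
}

const isArabic = makeScriptCheck("Arabic");
const isLatin = makeScriptCheck("Latin");
const isCyrillic = makeScriptCheck("Cyrillic");

console.log(isArabic("حفلة")); // true
console.log(isLatin("hello")); // true
console.log(isCyrillic("привет")); // true
console.log(isCyrillic("hello")); // false
console.log(isLatin("")); // false
```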

Expected behavior:

z.script("Arabic").parse("حفلة"); // passes
z.script("Latin").parse("hello"); // passes
z.script("Cyrillic").parse("привет"); // passes

z.string().script("Cyrillic") // the method form is also available and passes the tests

z.script("NotAScript") // throws a SyntaxError at schema creation; the JS engine rejects unknown script identifiers.

Notes:

  • Invalid script names fail fast at schema creation time (not parse time) => the JS engine rejects unknown \p{Script=...} identifiers as soon as the pattern is compiled

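  • Strict by design: spaces, punctuation, and diacritics that Unicode assigns to Common or Inherited rather than to the script itself will fail, so this suits validating individual words/tokens rather than full sentences => matching could be loosened with Script_Extensions instead of Script (^\p{Script_Extensions=Arabic}+$), which also accepts shared characters such as Arabic diacritics; plain spaces would still fail, since they are Common under both properties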

  • Script names are the official Unicode identifiers (e.g. Old_Persian, Linear_B) => casing matters

  • New scripts are supported automatically as Node.js/V8 updates its ICU Unicode tables => nothing to maintain in Zod
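The first three notes (fail fast, strictness vs. Script_Extensions, case sensitivity) can be checked directly against the engine with plain RegExp objects. This is a sketch that assumes the same anchored ^\p{...}+$ pattern the notes describe; `tryCompile` is a hypothetical helper, and the Arabic samples are written as \u escapes so the diacritics are visible.

```typescript
// Sketch only: plain RegExp objects, no Zod. Assumes an anchored
// ^\p{<property>=<value>}+$ pattern as described in the notes.
// `tryCompile` is a hypothetical helper that returns null instead of throwing.
function tryCompile(property: string, value: string): RegExp | null {
  try {
    // The "u" flag enables \p{...}; unknown property values throw SyntaxError.
    return new RegExp(`^\\p{${property}=${value}}+$`, "u");
  } catch {
    return null;
  }
}

// Fail fast: the unknown name is rejected at construction time.
console.log(tryCompile("Script", "NotAScript")); // null

// Casing matters: only the exact Unicode identifier is accepted.
console.log(tryCompile("Script", "arabic") === null); // true
console.log(tryCompile("Script", "Arabic") === null); // false

// Strict vs. Script_Extensions on a vocalized Arabic word:
// the base letters are Script=Arabic, but the harakat (U+064E, U+0652, U+064B)
// are Script=Inherited, with Script_Extensions that include Arabic.
const vocalized = "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627";
const strict = tryCompile("Script", "Arabic")!;
const loose = tryCompile("Script_Extensions", "Arabic")!;

console.log(strict.test(vocalized)); // false
console.log(loose.test(vocalized)); // true

// A space (U+0020) is Common under both properties, so two words still fail.
const twoWords = "\u0645\u0631\u062D\u0628\u0627 \u0628\u0643";
console.log(loose.test(twoWords)); // false
```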

Motivation:

Making zod even more accessible worldwide! :D

Important:

Named aliases (z.arabic(), z.latin(), etc.) are intentionally left out. Each would be a one-liner in classic/schemas.ts, but picking which of the 150+ Unicode scripts deserve a named helper is an arbitrary call.
z.script("Arabic") is already readable, and named aliases can still be layered on top of it later if we want them.
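To illustrate the "one-liner" point, here is a self-contained sketch in which named aliases are thin wrappers over a generic factory. `script`, `arabic`, `latin`, the `ScriptSchema` interface, and the error message are all hypothetical stand-ins, not Zod's classic/schemas.ts code.

```typescript
// Hypothetical stand-in for a schema with a parse() method; not Zod's API.
interface ScriptSchema {
  parse(input: string): string;
}

// Hypothetical generic factory: the pattern is compiled once, at creation.
function script(name: string): ScriptSchema {
  const pattern = new RegExp(`^\\p{Script=${name}}+$`, "u");
  return {
    parse(input: string): string {
      if (!pattern.test(input)) {
        throw new Error(`Expected only ${name} characters`);
      }
      return input;
    },
  };
}

// Each named alias is just a one-liner over the generic factory.
const arabic = (): ScriptSchema => script("Arabic");
const latin = (): ScriptSchema => script("Latin");

console.log(arabic().parse("حفلة")); // prints: حفلة
console.log(latin().parse("hello")); // prints: hello
```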

Reference Issue: #5804



Development

Successfully merging this pull request may close these issues.

Feature Request: Built-in Unicode / Script Validators
