Conversation
|
I read through all of this and it all seemed pretty reasonable! For your two noted issues:
We already rely on |
|
Cool work! Neat to see your patched version of zod-to-mongo-schema. I tried running this (I know it's just a draft, I'm just excited), and hit a couple errors in short order: Servers pidThe insertion into the Servers collection on startup trips over the type of the The validator believes that this field is a long, but when we perform the insertion it appears that the field is getting serialized as an int. I worked around this by swapping the field definition from MonitorConnectAcks portNumberAttaching the schema for the mediasoup MonitorConnectAcks collection failed: This is erroring out on the Puzzles expectedAnswerCountThat got the server to start up. I tried adding a new puzzle, and inserting the new puzzle object failed validation: Again, numbers getting specified in the mongo schema as I think the numeric types in particular are giving us some repeated grief? I remain excited for this overall, and to that end I think it'd be wise for our test plan for this change to include diffing the new validation schemas against our existing validation schemas and either trying to get them to match or at least exercising insertions for any collection where this patchset generates a different validation schema. |
|
Yeah I observed those as well. I was going to leave a comment and update the description but was too tired to actually write it up. I think there are at least three things happening here:
So far, I've mostly just been trying to get things to a point that the app would run, so haven't thought about the actual testing regimen yet, but I agree that diffing the schemas and at least being confident in the changes seems like the right thing to do. I'm working on a patch for the Zod issue to submit upstream. We could theoretically work around it, but Zod is reasonably active so I'm hoping we don't have to. |
|
Created colinhacks/zod#5700 for the first of those issues. |
|
I asked Claude to dump out the schemas and analyze the differences. Here's what it came up with: Schema Diff Report: Zod 3 (main) vs Zod 4Systematic Changes (all/most collections, superficial)These appear across virtually every collection and are safe:
Concerning Changes
|
| main | zod-4 | |
|---|---|---|
| Schema | {minimum: 0, exclusiveMinimum: true, maximum: 65535, bsonType: "int"} |
{exclusiveMinimum: true, maximum: 65535, bsonType: "int"} |
In JSON Schema draft-04, exclusiveMinimum is a boolean modifier of minimum. Without minimum, exclusiveMinimum: true is meaningless — the port > 0 constraint is lost. This is the Zod z.positive() bug already reported upstream.
answer/guess fields gained constraints (medium risk)
Affected: jr_puzzles.answers[], jr_guesses.guess, jr_bookmark_notifications.answer
| main | zod-4 | |
|---|---|---|
| Schema | {bsonType: "string"} |
{type: "string", minLength: 1, pattern: "^[^a-z]+$"} |
The answer custom type likely uses a .pipe() that Zod 3's JSON Schema converter couldn't serialize but Zod 4's can. This is more correct (answers should be uppercase), but it's a real validation tightening. Existing documents with lowercase answers would fail validation on update.
Email regex changed (medium risk)
Affected: users.emails[].address, jr_folder_perms.googleAccount
| main | zod-4 | |
|---|---|---|
| Pattern | ^[a-zA-Z0-9.!#$%&'*+/=?^_`{⏐}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$ |
^(?!\.)(?!.*\.\.)([A-Za-z0-9_'+\-\.]*)[A-Za-z0-9_+-]@([A-Za-z0-9][A-Za-z0-9\-]*\.)+[A-Za-z]{2,}$ |
Zod 4 ships a different z.string().email() regex. The new one disallows leading/consecutive dots, has a narrower local-part character set, and requires a TLD of 2+ chars. Could reject previously-valid email addresses.
URL patterns lost (low risk)
Affected: jr_puzzles.url, jr_hunts.homepageUrl, jr_chatmessages/jr_chatnotifications image child url
Zod 3 emitted a URL-matching regex from z.string().url(). Zod 4 emits no pattern — just {type: "string"}. This is a validation loosening (won't reject valid documents, but won't catch invalid URLs at the MongoDB level). Likely z.toJSONSchema() doesn't serialize URL checks.
UUID regex updated (low risk)
Affected: All UUID fields across mediasoup collections (consumerId, producerId, routerId, transportId)
| main | zod-4 | |
|---|---|---|
| Version digit | [1-5] |
[1-8] |
| Variant nibble | unconstrained | [89abAB] |
| Max UUID | not allowed | allowed |
More permissive on version range (6-8), slightly more restrictive on variant nibble. All standard RFC 4122 UUIDs have variant bits in [89abAB], so existing data should be fine.
Summary
| Change | Risk | Action needed? |
|---|---|---|
bsonType → type |
None | No |
deleted/updatedAt/tags etc. now required |
None (have defaults) | No |
allOf[not null, bool] → boolean |
None | No |
Enums gain type: "string" |
None | No |
anyOf → oneOf |
None | No |
Flatter allOf nesting |
None | No |
exclusiveMinimum/Maximum: false removed |
None | No |
| UUID regex updated | Low | Monitor, likely fine |
| URL patterns lost | Low | Upstream: z.toJSONSchema() / zod-to-mongo-schema |
| Email regex changed | Medium | Verify existing emails pass new regex |
answer/guess gained constraints |
Medium | Correct behavior, but verify no lowercase data exists |
port lost minimum: 0 |
High | Needs fix (Zod bug already reported) |
|
Couple of notes on that:
|
|
Opened colinhacks/zod#5702 which I believe will fix the discriminated unions. |
86dfe93 to
5040e89
Compare
In truth, this is largely a ground-up reimplementation of our typed model layer that takes advantage of a few years of experience working with the original implementation. Upgrading to Zod 4 is *somewhat* orthogonal. My primary goal with this upgrade was to take advantage of Zod 4's native support for JSON Schemas. There already exists a library (zod-to-mongo-schema) which can utilize Zod's native JSON Schema support to generate MongoDB-compatible JSON Schemas. Given that our needs here are fairly undifferentiated, it doesn't feel like there's significant benefit to maintaining this code ourselves. This allows us to clean up the old code to generate JSON Schemas. As part of this swap, we need to deal with some differences in how zod-to-mongo-schemas represents schemas: * Zod's `z.int()` can technically capture any (53-bit) integer value representable in floating point, so zod-to-mongo-schema represents it as a "long"; we previously used "int". The MongoDB Javascript driver by default will serialize integer values to the BSON "int" type, so this causes conflicts. Switch to `z.int32()` instead to get the desired "int" type in the generated JSON Schema. As always, the most complex element of our typed model layer is handling fields which should be automatically populated. Our previous approach (documented in #1394) was powerful and flexible, but abusing Zod's input vs. output schemas and lying to the type system with transforms broke down with Zod's native JSON Schemas, which refuses to serialize transforms. Instead of trying to modernize that approach, this commit uses two systems to track automated fields. Within the type system, we use Zod's branded types to mark fields which should be auto-populated (effectively as a boolean marker), which we can then use to filter auto-populated fields, making them optional at insertion-time. At runtime, we use schema metadata, which captures when a field should be populated (insert, update, or both) and what value should be used (which has been limited to the specific types of values we actually use). (In bringing in branded types, we do return to our old friend of input/output schemas — auto-populated types are only branded on the input side, which is only used for computing the insert type, since we don't actually want nominal typing for our auto-populated fields.) This has the effect of dropping support for arbitrary transformation functions. In practice, we were only using that function to apply the `answerify` transform to answer fields, but we had previously concluded that answers were already uppercased at input time. In exchange for dropping transforms, we no longer need to analyze the entire update operation to make sure transformations are applied properly, and can instead just rely on MongoDB to enforce that the result matches our schema. This in turn lets us remove a significant amount of code around updates (including the mechanisms around `relaxSchema` and `parseMongoModifierAsync`). In addition to dropping support for transform functions, I also dropped support for the `bypassSchema` option for inserts, updates, and upserts. This wasn't a great abstraction, since it attempted to otherwise preserve the behavior of Meteor Collections' native methods, but had to do some extra work to do that. Instead, users can just reach for rawCollection. This only comes up in migration code anyway, which generally requires a fair amount of abstraction bypass regardless. And finally, I pulled the code into imports/lib/typedModel, rather than leaving it strewn about with actual model declarations.
This is still very much a work in progress, but I wanted to go ahead and push it up so it wasn't living exclusively in my head and on my computer.
Here's what I currently know to be broken on this branch, although at this point the brokenness is likely covering up additional issues:
Zod 4 + zod-to-mongo-schema don't currently support outputting schemas for
z.date()orz .instanceof(Uint8Array<ArrayBuffer>)(which we use for BSON binary data). Like I say in the commit message below, we don't have any differentiated requirements around generating a MongoDB-compatible JSON schema from a Zod schema, so this should be a task that we can outsource to existing open-source software. I'm going to try and work with Zod and zod-to-mongo-schema upstreams to figure out how best to support this. It's possible that I will have to give up on my dream and re-implement schema generation in-house, but I don't want to do it just yetFor the time being, I've worked around this by switching to a fork of zod-to-mongo-schema
The current approach for
Model.autoPopulatedFields()runs into some issues invokingMeteor.userId()in contexts where a user ID isn't available, and callingMeteor.userId()outside of a method or publication throws an exception. (As an easy example, intests/unit/imports/server/Flags.tswe provide a fakecreatedByto try and side-step this, butautoPopulatedFieldsqueries for a value unconditionally)I've fixed this by checking
DDP._Current{Method,Publication}Invocationto make sure we're in a context whereMeteor.userIdshould work.We're hitting an assortment of issues with numeric types. Some of this is zod-to-mongo-schema attempting to be more specific about BSON "int" vs. "long" (instead of just using "number", which is mostly what we've done historically). Some of this is a Zod bug (it doesn't correctly conflict between minimum and exclusiveMinimum or maximum and exclusiveMaximum when encoding a draft-04 JSON schema)
Zod 4 does not emit a regex pattern in JSON Schema when encoding
z.string().url(), which is a regression from our previous behavior. These patterns on Puzzle and Hunt are arguably load-bearing right now, since we don't do any other validation of user-provided URLs (other than ensuring that they're a string). This came up in Reduce noise from logged errors for bad user inputs #2284 and is maybe not ideal behavior, but we should probably tackle those orthogonally.I've fixed this by re-introducing our regex and attaching it to any
z.url()fields.The new approach to discriminated unions is buggy. The Zod 4 schema ends up with something like:
which is unsatisfiable, because each entry rejects every other.
Here's the commit message, which describes the approach:
And in terms of review sequence, I'd recommend something like this:
autoPopulate.tscustomTypes.tsvalidateSchema.tsModel.tsSoftDeletedModel.ts