Skip to content

Cache structs.fields() result for non-generic structs#1005

Open
Siyet wants to merge 3 commits intomainfrom
963-cache-fields
Open

Cache structs.fields() result for non-generic structs#1005
Siyet wants to merge 3 commits intomainfrom
963-cache-fields

Conversation

@Siyet
Copy link
Copy Markdown
Collaborator

@Siyet Siyet commented Apr 9, 2026

Reopening #997 (auto-closed when fork was deleted).

Summary

  • Cache the fields() result tuple on the struct class as __struct_field_info__
  • Avoids expensive get_class_annotations() call and FieldInfo object recreation on every invocation
  • Generic aliases (e.g. MyStruct[int]) are not cached since type parameters affect annotations

Closes #963

Benchmark (1M calls, 3 fields)

Before After
dataclasses.fields() 0.85s 0.85s
msgspec.structs.fields() ~7.5s (20x slower) 0.44s (2x faster)

Store the computed tuple of FieldInfo on the class as
__struct_field_info__ to avoid expensive get_class_annotations()
and FieldInfo recreation on every call. Generic aliases are not
cached since type parameters affect annotations.

Closes #963
Comment thread src/msgspec/structs.py

# Cache for non-generic structs
if not is_generic:
cls.__struct_field_info__ = result
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutating the type as a result of inspection seems wrong to me. Is there no either way we can cache or otherwise close the gap in performance here? Also, does this work if the Struct is configured as frozen?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Let me address your points in order.

Frozen structs. Yes, this works. frozen=True sets tp_setattro = Struct_setattro_frozen on the struct class itself, which only blocks attribute assignment on instances. The cache write cls.__struct_field_info__ = result goes through type(cls).tp_setattro (i.e. StructMetaType's slot, inherited from PyType_Type), which isn't overridden and isn't affected by frozen. That said, you're right that it's not obvious from the diff, I just pushed an explicit test for frozen structs to make the guarantee visible.

Why mutation, and what alternatives I considered. The dominant cost in fields() is _get_class_annotations() (typing.get_type_hints()), which is what makes the difference between 0.44s and 7.5s on the benchmark. I looked at a few alternatives before landing on __struct_field_info__:

  • Module-level WeakKeyDictionary: keeps fields() non-mutating, but adds an extra lookup on every call and another global to reason about. Felt strictly worse.
  • Eager compute in StructMeta_new: fastest in theory, but breaks on forward references, which is exactly why the current implementation is lazy.
  • Caching only hints instead of the full tuple: still requires mutating the class, and we'd still rebuild FieldInfo objects on every call (a meaningful chunk of the remaining cost).

In the end I picked the option with the smallest surface and the biggest win, treating __struct_field_info__ as a conventional cache slot (similar to how CPython uses dunder names for lazily-populated metadata).

On "mutation feels wrong". I hear you and I agree it's not ideal aesthetically. I see two ways forward:

  1. Keep the current approach, document __struct_field_info__ as an internal cache slot, and add the frozen test. Pragmatic, minimal diff.
  2. Port fields() to C as a follow-up PR. StructMetaObject already holds everything we need (struct_encode_fields, struct_offsets, struct_types, struct_defaults), so for non-generic structs we can skip typing.get_type_hints entirely and build the FieldInfo tuple directly. This would likely beat even the cached Python version and fully sidesteps the mutation question.

Personally I lean toward option 2 (port to C): it gives the best perf, removes the class mutation, and fits naturally into the existing StructMetaObject C infrastructure. If you're on board, I'm happy to close this PR and open a new one with the C port. @jcrist, would love to hear your take as well, since this touches StructMeta design.

@Siyet Siyet requested a review from jcrist April 10, 2026 07:32
@Siyet
Copy link
Copy Markdown
Collaborator Author

Siyet commented Apr 10, 2026

The failing build job here is unrelated to this PR: it's the link checker tripping on the Pydantic docs redirect (docs.pydantic.dev/latest/pydantic.dev/docs/validation/...), which is fixed in #1008. Once #1008 lands and this branch is rebased, CI should go green.

@Siyet Siyet requested a review from ofek April 11, 2026 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

msgspec.structs.fields() is 20x slower than dataclasses.fields()

2 participants