Skip to content

[mypyc] Add librt.strings.isidentifier codepoint primitive#21522

Open
VaggelisD wants to merge 2 commits into
python:masterfrom
VaggelisD:librt-strings-isidentifier
Open

[mypyc] Add librt.strings.isidentifier codepoint primitive#21522
VaggelisD wants to merge 2 commits into
python:masterfrom
VaggelisD:librt-strings-isidentifier

Conversation

@VaggelisD
Copy link
Copy Markdown
Contributor

5th PR of #21418

True if a codepoint can start a valid identifier (XID_Start, per
PEP 3131). The ASCII fast path covers `[A-Za-z_]` inline; non-ASCII
codepoints round-trip through PyUnicode_FromOrdinal +
PyUnicode_IsIdentifier so the answer matches str.isidentifier on a
1-character string.

The non-ASCII path is the first allocating helper in this series, so
its body lives out-of-line in codepoint_extra_ops.c (it would otherwise
be emitted as a separate copy in every translation unit that includes
the header). On OOM it swallows the exception via PyErr_Clear() and
returns False, which keeps the function ERR_NEVER. Documented at the
call site so callers don't get a surprising silent failure.

Stack: depends on the librt.strings.isspace primitive.
@VaggelisD VaggelisD force-pushed the librt-strings-isidentifier branch from add1d44 to 283b2f8 Compare May 21, 2026 07:42
@github-actions

This comment has been minimized.

Comment thread mypyc/lib-rt/codepoint_extra_ops.c Outdated
Comment on lines +11 to +16
if (s == NULL) {
// OOM. Swallow and return false to keep the function ERR_NEVER;
// callers expect a defined answer, not a propagated exception.
PyErr_Clear();
return false;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually call CPyError_OutOfMemory() in this case to abort, i think it would make sense here as well.

Comment thread mypyc/build.py Outdated
Comment on lines +57 to +62
ModDesc(
"librt.strings",
["strings/librt_strings.c", "codepoint_extra_ops.c"],
["codepoint_extra_ops.h"],
["strings"],
),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for not catching this earlier but it looks like the files are dependent on each other the wrong way. the _extra_ops files should be internal to mypyc and shouldn't be needed when building librt modules.

so instead of including codepoint_extra_ops in librt_strings we need it the other way around, with the common functionality in librt_strings. like it's done with the other _extra_ops files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, that's on me first!

Copy link
Copy Markdown
Collaborator

@p-sawicki p-sawicki May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! i see there's still #include "codepoint_extra_ops.h" in librt_strings.c and it would be good to clean that up by moving the functions from there to librt_strings.h.

that would make the codepoint_extra_ops files empty so maybe they aren't actually needed and we could just compile the functions statically straight from librt_strings.h? unless you're planning on adding functions later that would be better placed outside of librt_strings.h.

- codepoint_extra_ops.h: include CPy.h and move the isidentifier slow
  path inline into LibRTStrings_IsIdentifier. Aborts via
  CPyError_OutOfMemory on allocation failure (the helper is ERR_NEVER,
  so returning a silently-wrong bool under memory pressure was the wrong
  contract). Matches the pattern in the sibling _extra_ops.h files (all
  bodies static inline, CPy.h included for runtime helpers).
- codepoint_extra_ops.c: reduce to a single-line shim that #includes the
  header. Exists only so SourceDep("codepoint_extra_ops.c") pulls the
  header into user mypyc extensions in include_runtime_files mode.
- build.py / lib-rt/setup.py: drop codepoint_extra_ops.c from the
  librt.strings module sources. The _extra_ops.c files are mypyc-internal
  (linked into user extensions via SourceDep in mypyc/ir/deps.py); the
  librt.strings Python module shouldn't depend on them, matching how
  bytes_extra_ops, str_extra_ops, etc. are organized. librt.strings now
  picks up LibRTStrings_IsIdentifier via #include of the header.
@github-actions
Copy link
Copy Markdown
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants