Skip to content

feat(pii): Base SpanData PII on relay-conventions#5997

Open
loewenheim wants to merge 8 commits into
masterfrom
sebastian/span-data-conventions
Open

feat(pii): Base SpanData PII on relay-conventions#5997
loewenheim wants to merge 8 commits into
masterfrom
sebastian/span-data-conventions

Conversation

@loewenheim
Copy link
Copy Markdown
Contributor

This replaces the default PII value on SpanData with a function that
looks up a field's name (as defined by the metastructure annotation)
in sentry-conventions. If the name isn't found, the function returns
Pii::True to be safe.

It also updates sentry-conventions for getsentry/sentry-conventions@523f1f0.

Additionally, it fixes a very annoying edge case bug in PII string processing: the logic assumes that a null byte can be freely inserted into strings to mark already removed chunks, but some PII regexes do match null bytes. I've corrected one such regex that otherwise would cause a test failure.

Closes INGEST-919.

This replaces the default PII value on `SpanData` with a function that
looks up a field's name (as defined by the `metastructure` annotation)
in `sentry-conventions`. If the name isn't found, the function returns
`Pii::True` to be safe.
@loewenheim loewenheim requested a review from a team as a code owner May 13, 2026 13:19
@loewenheim loewenheim self-assigned this May 13, 2026
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 13, 2026

INGEST-919

Comment thread relay-pii/src/regexes.rs
)
(
[^/\\\r\n]+
[^/\\\r\n\x00]+
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this regex can match a null byte it causes problems with redactions.

"url.full": {
"type": "string",
"value": "https://www.service.io/users/01234-qwerty/settings/98765-adfghj",
"value": "https://www.service.io/users/[user]/settings/98765-adfghj",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This attribute has become pii: "true" in the current conventions version.

"sentry.description": {
"type": "string",
"value": "GET https://www.service.io/users/01234-qwerty/settings/98765-adfghj",
"value": "GET https://www.service.io/users/[user]/settings/98765-adfghj",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This attribute is filled from url.full, hence inherits the redaction.

Copy link
Copy Markdown
Member

@Dav1dde Dav1dde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change with potentially big impact, are you confident we have sufficient test coverage already?

Comment thread tests/integration/test_spansv2.py
Comment thread relay-event-schema/src/protocol/span.rs
Comment thread CHANGELOG.md Outdated
Comment thread relay-event-schema/src/protocol/span.rs
Comment on lines -597 to +601
match self.pii() {
Pii::True => Some(Cow::Borrowed(&PII_TRUE_FIELD_ATTRS)),
Pii::False => None,
Pii::Maybe => Some(Cow::Borrowed(&PII_MAYBE_FIELD_ATTRS)),
match self.attrs().pii {
PiiMode::Static(Pii::True) => Some(Cow::Borrowed(&PII_TRUE_FIELD_ATTRS)),
PiiMode::Static(Pii::False) => None,
PiiMode::Static(Pii::Maybe) => Some(Cow::Borrowed(&PII_MAYBE_FIELD_ATTRS)),
PiiMode::Dynamic(f) => Some(Cow::Owned(DEFAULT_FIELD_ATTRS.pii_dynamic(f))),
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, when deriving the attributes for recursion, we would eagerly resolve a dynamic PII value function into a static value. Now we instead pass the dynamic function through.

Comment thread relay-event-schema/src/protocol/span.rs
/// Other fields in `span.data`.
#[metastructure(
additional_properties,
pii = "true",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional properties lose unconditional PII scrubbing safety net

Medium Severity

The other field (additional properties bag) had its explicit pii = "true" removed and now inherits the struct-level dynamic span_data_pii_from_conventions function. When inner_attrs propagates this dynamic function to child entries, any attribute that exists in sentry-conventions with pii = false will no longer be scrubbed — even for unknown, user-supplied data landing in additional_properties. This is a relaxation of the previous unconditional scrubbing policy for the catch-all bag. Combined with the new PiiMode::Dynamic arm in inner_attrs always returning Some(...), deeply nested values within these entries also inherit the dynamic lookup rather than the previous flat Pii::True.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit e48ff37. Configure here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we have test coverage for these cases, I don't see that as a problem

@loewenheim
Copy link
Copy Markdown
Contributor Author

Oh, and to actually answer this:

This is a change with potentially big impact, are you confident we have sufficient test coverage already?

No, not necessarily 😞. I would probably extend the test_spandata_conventions test with more cases.

Comment thread relay-event-schema/src/protocol/span.rs
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 7fa8c6d. Configure here.

PiiMode::Static(Pii::True) => Some(Cow::Borrowed(&PII_TRUE_FIELD_ATTRS)),
PiiMode::Static(Pii::False) => None,
PiiMode::Static(Pii::Maybe) => Some(Cow::Borrowed(&PII_MAYBE_FIELD_ATTRS)),
PiiMode::Dynamic(f) => Some(Cow::Owned(DEFAULT_FIELD_ATTRS.pii_dynamic(f))),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dynamic PII in inner_attrs over-scrubs nested field values

Medium Severity

The PiiMode::Dynamic(f) arm in inner_attrs passes span_data_pii_from_conventions through to children of complex-typed fields (e.g., gen_ai.input.messages of type Annotated<Value>). When such a field contains a nested object, each sub-key (like "role" or "content") is looked up in conventions via state.keys().next(). Sub-keys not found in conventions get Pii::True, whereas previously (with pii = "maybe" on SpanData) they inherited Pii::Maybe. This upgrades scrubbing from "only specific path selectors" to "generic selectors like $string match," causing unintended data removal in nested structures.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7fa8c6d. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants