docs(uspto): improve documentation of USPTO XML parser security config #3338

Merged: PeterStaar-IBM merged 1 commit into main from fix/uspto-entity-expansion on Apr 21, 2026
ceberam (Member) commented Apr 21, 2026

Summary

This PR clarifies the security configuration of the USPTO XML parser. The analysis shows that the original configuration was already secure, but its security rationale was poorly documented.

Key Security Insight

defusedxml distinguishes between:

  • Entity DECLARATION (defining entities in DTD) - controlled by forbid_* flags
  • Entity RESOLUTION/FETCHING (actually expanding/fetching entities) - controlled by feature_external_* flags

The critical protection comes from feature_external_ges=False, which prevents external entities from being resolved/fetched, regardless of whether they're declared.
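
To illustrate the declaration/resolution split, here is a minimal sketch using only the standard library's xml.sax, where feature_external_ges is a standard feature constant. It is not the parser code touched by this PR and does not exercise defusedxml's forbid_* flags. The helper names (TextCollector, parse_text) and the payload are hypothetical. The entity is declared in the DTD, but with the feature turned off its target is skipped rather than fetched.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects character data so we can inspect what reached the document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(xml_bytes: bytes) -> str:
    """Parse with external general entities disabled and return the text."""
    parser = xml.sax.make_parser()
    # Resolution is switched off: declared external entities are skipped.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


# Hypothetical payload: the entity is DECLARED, but its target is never read.
XXE_DOC = b"""<?xml version="1.0"?>
<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///nonexistent/secret.txt">]>
<r>a&xxe;b</r>"""

text = parse_text(XXE_DOC)
```

The reference &xxe; contributes nothing to the collected text, and no attempt is made to open the (nonexistent) file, which is the behavior the paragraph above relies on.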

Why This Configuration is Secure

  1. XXE attacks are blocked because external entities are never fetched, even though they can be declared
  2. Billion Laughs is mitigated by defusedxml's expansion limits
  3. NDATA entities are safe because they're unparsed entities that don't expand inline
  4. USPTO compatibility is maintained because the format requires DTDs and NDATA entities
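
Point 3 can be sketched with the standard library as well. The document below is a made-up, USPTO-like fragment (the element, entity, and notation names are assumptions, not taken from the real DTD), and NdataInspector is a hypothetical helper. An NDATA entity is reported to the application as a declaration and is referenced by name from an attribute; the parser never expands it inline.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, DTDHandler


class NdataInspector(ContentHandler, DTDHandler):
    """Records unparsed (NDATA) entity declarations and <img> attributes."""

    def __init__(self):
        super().__init__()
        self.unparsed = []
        self.attrs = {}

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        # Called once per NDATA declaration; nothing is fetched or expanded.
        self.unparsed.append((name, systemId, ndata))

    def startElement(self, name, attrs):
        if name == "img":
            self.attrs = dict(attrs.items())


# Hypothetical fragment that needs a DTD with an NDATA entity.
NDATA_DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION TIF SYSTEM "image/tiff">
  <!ENTITY fig1 SYSTEM "fig1.tif" NDATA TIF>
  <!ELEMENT doc (img)>
  <!ELEMENT img EMPTY>
  <!ATTLIST img file ENTITY #REQUIRED>
]>
<doc><img file="fig1"/></doc>"""

inspector = NdataInspector()
parser = xml.sax.make_parser()
parser.setContentHandler(inspector)
parser.setDTDHandler(inspector)
parser.parse(io.BytesIO(NDATA_DOC))
```

After parsing, inspector.unparsed holds the declaration metadata and inspector.attrs still contains the plain entity name, so it is up to the application whether to do anything with fig1.tif.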

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
ceberam self-assigned this Apr 21, 2026
ceberam added the documentation (Improvements or additions to documentation) and xml (issue related to supported schema-specific XML formats) labels Apr 21, 2026
mergify Bot (Contributor) commented Apr 21, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
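
For a quick local check of a title against this rule, the pattern can be tried with Python's re module. This is a sketch that assumes the rule's ~= operator behaves like a regular-expression search; follows_conventional_commit is a hypothetical helper, not part of Mergify.

```python
import re

# Pattern copied from the rule above.
TITLE_RULE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def follows_conventional_commit(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return TITLE_RULE.search(title) is not None


pr_title = "docs(uspto): improve documentation of USPTO XML parser security config"
ok = follows_conventional_commit(pr_title)
rejected = follows_conventional_commit("Improve USPTO parser documentation")
```

The title of this PR matches because it starts with the docs type, an optional scope in parentheses, and a colon; a title without a recognized type prefix does not.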

github-actions (Contributor)

DCO Check Passed

Thanks @ceberam, all your commits are properly signed off. 🎉

codecov Bot commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

PeterStaar-IBM merged commit 09de7f9 into main Apr 21, 2026
27 checks passed
PeterStaar-IBM deleted the fix/uspto-entity-expansion branch April 21, 2026 14:38