|
5 | 5 | The original files can be found in https://bulkdata.uspto.gov. |
6 | 6 |
|
7 | 7 | Security Note: |
8 | | - This module uses defusedxml.sax.make_parser() with customized security settings |
9 | | - to protect against XML External Entity (XXE) attacks while allowing USPTO XML files |
10 | | - to be parsed. In addition, it includes safeguards against entity expansion attacks |
11 | | - and entity nesting depth. USPTO files contain DTD declarations that defusedxml |
12 | | - blocks by default, so we configure the parser with: |
13 | | -
|
14 | | - - feature_external_ges: False (blocks external general entities) |
15 | | - - feature_external_pes: False (blocks external parameter entities) |
16 | | - - forbid_dtd: False (allows DTD declarations in the XML) |
17 | | - - forbid_entities: False (allows entity declarations) |
18 | | - - forbid_external: False (allows external references in declarations) |
19 | | -
|
20 | | - This configuration permits DTD declarations (required for USPTO files) while the |
21 | | - disabled external entity features prevent actual fetching of external resources, |
22 | | - effectively blocking XXE attacks. The parser processes the XML structure without |
23 | | - accessing any external files or URLs. |
| 8 | + This module uses defusedxml.sax.make_parser() with security settings to protect |
| 9 | + against XML External Entity (XXE) attacks and entity expansion attacks (Billion |
| 10 | + Laughs/CWE-776). The parser is configured with: |
| 11 | +
|
| 12 | + - feature_external_ges: False (blocks external general entity resolution) |
| 13 | + - feature_external_pes: False (blocks external parameter entity resolution) |
| 14 | + - forbid_dtd: False (allows DTD declarations required by USPTO XML format) |
| 15 | + - forbid_entities: False (allows entity declarations including NDATA) |
| 16 | + - forbid_external: False (allows SYSTEM declarations in DTD) |
| 17 | +
|
| 18 | + Security Analysis: |
| 19 | + 1. XXE Prevention: While external entities can be declared (forbid_external=False), |
| 20 | + they are never resolved or fetched due to feature_external_ges=False and |
| 21 | + feature_external_pes=False. This prevents XXE attacks. |
| 22 | +
|
| 23 | + 2. Billion Laughs Mitigation: defusedxml's built-in entity expansion limits |
| 24 | + (MAX_ENTITY_EXPANSION=10,000) prevent exponential entity expansion from |
| 25 | + causing memory exhaustion. While not completely blocking entity expansion, |
| 26 | + this limit prevents the worst-case denial-of-service scenarios. |
| 27 | +
|
| 28 | + 3. NDATA Entities: USPTO files use NDATA entities for image references |
| 29 | + (e.g., <!ENTITY img SYSTEM "file.tif" NDATA TIF>). These are unparsed |
| 30 | + entities that don't expand inline and aren't fetched due to the external |
| 31 | + entity resolution being disabled. |
| 32 | +
|
| 33 | + This configuration balances security with USPTO format compatibility. The key |
| 34 | + insight is that defusedxml distinguishes between entity declaration (allowed) |
| 35 | + and entity resolution/fetching (blocked), providing protection while allowing |
| 36 | + the required DTD structure. |
24 | 37 | """ |
25 | 38 |
|
26 | 39 | import html |
|
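The declaration-versus-resolution distinction described in the new docstring can be illustrated with a small sketch. It uses the standard-library `xml.sax` parser as a stand-in so that it runs without third-party packages; the `forbid_dtd`, `forbid_entities`, and `forbid_external` settings are specific to defusedxml and are not shown here. The `TextCollector` handler, the `parse_without_external_entities` helper, and the sample document are hypothetical names made up for illustration, not code from this module.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class TextCollector(ContentHandler):
    """Hypothetical handler that records character data seen by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(xml_bytes):
    """Parse XML that contains a DTD while leaving external entities unresolved."""
    parser = xml.sax.make_parser()
    # Same two SAX features the docstring lists: external general and
    # parameter entities may be declared, but are never fetched.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


# Illustrative input: the DTD declares an external entity pointing at a
# local path (assumed not to exist), and the body references it.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY secret SYSTEM "file:///nonexistent/secret.txt">
]>
<doc>before &secret; after</doc>"""

text = parse_without_external_entities(SAMPLE)
print(repr(text))
```

The document parses without error even though the DTD declares an external entity, and the surrounding text is delivered to the handler while the entity reference itself is skipped rather than fetched. With defusedxml, the same two `setFeature` calls would be combined with the `forbid_*` attributes listed in the docstring.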