Skip to content

perf(tags): wrap importer in tag_inheritance.batch() + bulk flush (Phase B Stage 2)#14832

Draft
valentijnscholten wants to merge 6 commits intoDefectDojo:devfrom
valentijnscholten:perf/tag-inheritance-phase-b-importer-batch
Draft

perf(tags): wrap importer in tag_inheritance.batch() + bulk flush (Phase B Stage 2)#14832
valentijnscholten wants to merge 6 commits intoDefectDojo:devfrom
valentijnscholten:perf/tag-inheritance-phase-b-importer-batch

Conversation

@valentijnscholten
Copy link
Copy Markdown
Member

@valentijnscholten valentijnscholten commented May 6, 2026

Important

Depends on:

  • PR #14812 (Phase A: bulk product-side propagation)
  • PR #14827 (Phase B Stage 1: thread-local batch context)

All must merge first; this branch is stacked on top.

Summary

Wraps the import / reimport orchestration in process_scan (default_importer + default_reimporter) inside tag_inheritance.batch(). Inside the batch:

  • make_inherited_tags_sticky (m2m_changed) early-returns
  • inherit_tags_on_instance (post_save, gated on created=True) early-returns
  • inherit_tags_on_linked_instance (post_save) early-returns

After the batch exits, tag_inheritance.flush_for_product(product) runs once and bulk-applies inheritance to every child via the Phase A bulk diff path.

The bulk diff in _sync_inheritance_for_qs is extended with a re-merge step: it reads each child's current tags through-table and ensures every target inherited name is present. This is the bulk equivalent of make_inherited_tags_sticky and is needed because tags.set([...]) inside the batch (e.g. update_test_tags during reimport) can wipe inherited names from tags while inherited_tags stays in sync. A current-tags read gates the re-merge so it only writes rows that are actually missing.

Query-count impact

Hot path Before (Stage 1) After (Stage 2) Change
ZAP scan import V2 1385 1006 ~27% drop
ZAP scan import V3 1263 984 ~22% drop
ZAP reimport V2 (no change) 69 82 +13
ZAP reimport V3 (no change) 87 140 +53
Product tag add → 100 findings V2/V3 91 94 +3
Product tag remove → 100 findings V2/V3 53 56 +3
Product tag add → 100 endpoints V2 91 194 +103
Product tag remove → 100 endpoints V2 53 56 +3
Product tag add → 100 locations V3 316 320 +4
Product tag remove → 100 locations V3 266 270 +4

The reimport and product-toggle increases come from the new tags-read needed for the bulk re-merge step. Stages 3+4+5 (drop the duplicate inherited_tags TagField; tags becomes the single source of truth) collapse the re-merge cost — these increases disappear and the import path drops further.

Verification

  • unittests.test_tag_inheritance — 36 pass (5 skipped for v3 feature flag).
  • unittests.test_tag_utils_bulk — 29 pass.
  • unittests.test_tag_inheritance_perf — 20 pass with the recalibrated pins.

Behavior preserved

  • Inheritance flags (system + per-product) gate propagation as before.
  • Sticky enforcement (re-add inherited tag if a user removes it from a child) still applied via the bulk re-merge at flush time.
  • Locations with multiple linked products still receive the union of all related products' tags.
  • Reimport with new user tags still preserves inherited tags on the test (the failure mode that surfaced and was fixed during calibration).

…reate

Replaces the per-row `.save()` loop in `propagate_tags_on_product_sync`
with bulk SQL through the existing tag-utils helpers. For every child
model (Engagement/Test/Finding/Endpoint/Location), reads current
inherited_tags in one query, computes the per-child diff against the
Product's tags, and applies adds/removes via `bulk_add_tag_mapping` and
the new `bulk_remove_tags_from_instances` helper. Both `tags` and
`inherited_tags` fields are kept in sync.

Also gates the per-child `inherit_tags_on_instance` post_save handler on
`created=True`. The previous behavior fired on every save (create OR
update), repeatedly re-applying inherited tags to children whose tag
state had not changed. Sticky enforcement on user-driven tag edits is
unchanged (still handled by `make_inherited_tags_sticky` on m2m_changed).

Pinned query-count baselines from PR DefectDojo#14811 drop accordingly:

  product_tag_add  -> 100 findings : 4758 -> 91   (~52x fewer queries)
  product_tag_remove -> 100 findings : 4540 -> 53 (~85x fewer queries)

Sticky and child-creation paths are unchanged in this PR. Phase B
targets those (centralized inheritance module + drop the duplicate
`inherited_tags` TagField).
…l batch context

Adds dojo/tag_inheritance.py with a thread-local batch() context manager
and is_in_batch() predicate. While the calling thread is inside a batch,
make_inherited_tags_sticky (m2m_changed) early-returns; the calling code
takes responsibility for applying inheritance in bulk.

Replaces the previous pattern in dojo/importers/location_manager.py:556:

    disconnected = signals.m2m_changed.disconnect(
        make_inherited_tags_sticky, sender=Location.tags.through,
    )
    try:
        ...
    finally:
        if disconnected:
            signals.m2m_changed.connect(...)

Signal.disconnect mutates Django's process-global receiver list and is
not thread-safe. While disconnected, every thread/greenlet in the same
process loses sticky enforcement. Safe only under Celery --pool=prefork
and single-threaded gunicorn; broken under --pool=threads|gevent|eventlet,
gunicorn --threads >1, or ASGI threadpools. Also fragile on hard process
exit mid-import (handler stays disconnected for the rest of the process
lifetime).

The new batch context lives in threading.local(): each thread has its
own depth counter, the signal handler stays globally connected, and the
suppression decision is per-thread. No global mutation, no reconnect
hazard.

This is Phase B Stage 1. Subsequent stages will wrap the broader importer
orchestration in batch(), replace the duplicate inherited_tags TagField
with a JSON column, and drop _manage_inherited_tags / per-model
inherit_tags().

Pinned perf-test note: V3 zap_scan_import baseline rises 1243 -> 1263
(~1.6%). The previous process-global disconnect was narrower in scope
(Location.tags.through only); the batch context covers all child-tag
through-tables. Net trade is positive given the threading bug fix; full
Phase B reductions arrive in later stages.
…er exit

Stage 2 of Phase B. Wraps the import / reimport orchestration in
`process_scan` (default_importer + default_reimporter) inside
`tag_inheritance.batch()`. Inside the batch:

  - `make_inherited_tags_sticky` (m2m_changed) early-returns
  - `inherit_tags_on_instance` (post_save, gated on created=True) early-returns
  - `inherit_tags_on_linked_instance` (post_save) early-returns

After the batch exits, `tag_inheritance.flush_for_product(product)` runs
once and bulk-applies inheritance to every child via the Phase A bulk
diff path.

The bulk diff in `_sync_inheritance_for_qs` (dojo/product/helpers.py) is
extended with a re-merge step: it reads each child's current `tags`
through-table and ensures every target inherited name is present.
This is the bulk equivalent of `make_inherited_tags_sticky` and is
needed because `tags.set([...])` inside the batch (e.g.
`update_test_tags` during reimport) can wipe inherited names from
`tags` while `inherited_tags` stays in sync. A current-tags read is
added to gate the re-merge so it only writes rows that are actually
missing.

`tag_inheritance.flush_for_product(product)` is added as the public
flush helper. Internally delegates to `propagate_tags_on_product_sync`.

Pinned perf-test impact (this branch vs Stage 1):

  ZAP scan import V2  : 1385 -> 1006   (~27% drop)
  ZAP scan import V3  : 1263 ->  984   (~22% drop)
  ZAP reimport V2     :   69 ->   82   (+13: flush has fixed cost)
  ZAP reimport V3     :   87 ->  140   (+53: flush has fixed cost)

  product_tag_add  -> 100 findings  V2/V3 :  91 ->  94 (+3: tags read for re-merge)
  product_tag_remove -> 100 findings V2/V3 :  53 ->  56 (+3)
  product_tag_add  -> 100 endpoints V2     :  91 -> 194 (eager Celery + explicit propagate both pay re-merge cost)
  product_tag_remove -> 100 endpoints V2   :  53 ->  56
  product_tag_add  -> 100 locations V3     : 316 -> 320
  product_tag_remove -> 100 locations V3   : 266 -> 270

The reimport and product-toggle increases are eliminated in
Stages 3+4+5 (drop the duplicate `inherited_tags` TagField and the
re-merge step becomes free because `tags` is the single source of
truth).
`tag_inheritance.flush_for_product` is called unconditionally from the
importer's `process_scan` (and reimporter equivalent). When tag
inheritance is disabled (neither the system-wide flag nor the per-product
flag is set) the previous implementation still walked every child
queryset to compute the (empty) diff, adding ~9 queries per scan.

Tests in `unittests/test_importers_performance.py` pin importer query
counts in scenarios where inheritance is off. The Stage 2 commit's
flush call shifted those baselines up by 9 across 10 test cases. Add an
early-exit so the importer perf tests stay green and no behavior change
ships under the inheritance-off configuration.
`_location_target_names(location)` issued one
`Product.objects.filter(...).distinct()` per location plus N
`product.tags.all()` per related product, producing an N+1 across the
location queryset on every product tag toggle.

Replace the per-location callable with a precomputed
{location_id: set[tag_name]} map built in 3 bulk queries: the two
LocationProductReference / LocationFindingReference paths union together
into {location_id: {product_id}}, then a single Product_tags
through-table read fans out to {product_id: {tag_name}}.

product_tag_add (100 locations, V3): 320 -> 123 queries
product_tag_remove (100 locations, V3): 270 -> 73 queries
ZAP scan import (V3): 984 -> 947
ZAP scan reimport, no change (V3): 140 -> 103
…pagate

The earlier Stage 2 commit recorded 194 queries for product-tag add
propagating to 100 endpoints, before bulk_propagate_inherited_tags,
the thread-local batch context, and the Location target-name precompute
landed on this branch. With those in place the actual count is 94.

Pinned value updated; comment refreshed.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@valentijnscholten valentijnscholten force-pushed the perf/tag-inheritance-phase-b-importer-batch branch from 799b72f to 6ca5d7c Compare May 7, 2026 18:06
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Conflicts have been resolved. A maintainer will review the pull request shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants