perf(tags): wrap importer in tag_inheritance.batch() + bulk flush (Phase B Stage 2)#14832
Draft
valentijnscholten wants to merge 6 commits intoDefectDojo:devfrom
Draft
Conversation
a3c9e7b to
cfef728
Compare
Maffooch
approved these changes
May 7, 2026
blakeaowens
approved these changes
May 7, 2026
…reate Replaces the per-row `.save()` loop in `propagate_tags_on_product_sync` with bulk SQL through the existing tag-utils helpers. For every child model (Engagement/Test/Finding/Endpoint/Location), reads current inherited_tags in one query, computes the per-child diff against the Product's tags, and applies adds/removes via `bulk_add_tag_mapping` and the new `bulk_remove_tags_from_instances` helper. Both `tags` and `inherited_tags` fields are kept in sync. Also gates the per-child `inherit_tags_on_instance` post_save handler on `created=True`. The previous behavior fired on every save (create OR update), repeatedly re-applying inherited tags to children whose tag state had not changed. Sticky enforcement on user-driven tag edits is unchanged (still handled by `make_inherited_tags_sticky` on m2m_changed). Pinned query-count baselines from PR DefectDojo#14811 drop accordingly: product_tag_add -> 100 findings : 4758 -> 91 (~52x fewer queries) product_tag_remove -> 100 findings : 4540 -> 53 (~85x fewer queries) Sticky and child-creation paths are unchanged in this PR. Phase B targets those (centralized inheritance module + drop the duplicate `inherited_tags` TagField).
…l batch context
Adds dojo/tag_inheritance.py with a thread-local batch() context manager
and is_in_batch() predicate. While the calling thread is inside a batch,
make_inherited_tags_sticky (m2m_changed) early-returns; the calling code
takes responsibility for applying inheritance in bulk.
Replaces the previous pattern in dojo/importers/location_manager.py:556:
disconnected = signals.m2m_changed.disconnect(
make_inherited_tags_sticky, sender=Location.tags.through,
)
try:
...
finally:
if disconnected:
signals.m2m_changed.connect(...)
Signal.disconnect mutates Django's process-global receiver list and is
not thread-safe. While disconnected, every thread/greenlet in the same
process loses sticky enforcement. Safe only under Celery --pool=prefork
and single-threaded gunicorn; broken under --pool=threads|gevent|eventlet,
gunicorn --threads >1, or ASGI threadpools. Also fragile on hard process
exit mid-import (handler stays disconnected for the rest of the process
lifetime).
The new batch context lives in threading.local(): each thread has its
own depth counter, the signal handler stays globally connected, and the
suppression decision is per-thread. No global mutation, no reconnect
hazard.
This is Phase B Stage 1. Subsequent stages will wrap the broader importer
orchestration in batch(), replace the duplicate inherited_tags TagField
with a JSON column, and drop _manage_inherited_tags / per-model
inherit_tags().
Pinned perf-test note: V3 zap_scan_import baseline rises 1243 -> 1263
(~1.6%). The previous process-global disconnect was narrower in scope
(Location.tags.through only); the batch context covers all child-tag
through-tables. Net trade is positive given the threading bug fix; full
Phase B reductions arrive in later stages.
…er exit Stage 2 of Phase B. Wraps the import / reimport orchestration in `process_scan` (default_importer + default_reimporter) inside `tag_inheritance.batch()`. Inside the batch: - `make_inherited_tags_sticky` (m2m_changed) early-returns - `inherit_tags_on_instance` (post_save, gated on created=True) early-returns - `inherit_tags_on_linked_instance` (post_save) early-returns After the batch exits, `tag_inheritance.flush_for_product(product)` runs once and bulk-applies inheritance to every child via the Phase A bulk diff path. The bulk diff in `_sync_inheritance_for_qs` (dojo/product/helpers.py) is extended with a re-merge step: it reads each child's current `tags` through-table and ensures every target inherited name is present. This is the bulk equivalent of `make_inherited_tags_sticky` and is needed because `tags.set([...])` inside the batch (e.g. `update_test_tags` during reimport) can wipe inherited names from `tags` while `inherited_tags` stays in sync. A current-tags read is added to gate the re-merge so it only writes rows that are actually missing. `tag_inheritance.flush_for_product(product)` is added as the public flush helper. Internally delegates to `propagate_tags_on_product_sync`. Pinned perf-test impact (this branch vs Stage 1): ZAP scan import V2 : 1385 -> 1006 (~27% drop) ZAP scan import V3 : 1263 -> 984 (~22% drop) ZAP reimport V2 : 69 -> 82 (+13: flush has fixed cost) ZAP reimport V3 : 87 -> 140 (+53: flush has fixed cost) product_tag_add -> 100 findings V2/V3 : 91 -> 94 (+3: tags read for re-merge) product_tag_remove -> 100 findings V2/V3 : 53 -> 56 (+3) product_tag_add -> 100 endpoints V2 : 91 -> 194 (eager Celery + explicit propagate both pay re-merge cost) product_tag_remove -> 100 endpoints V2 : 53 -> 56 product_tag_add -> 100 locations V3 : 316 -> 320 product_tag_remove -> 100 locations V3 : 266 -> 270 The reimport and product-toggle increases are eliminated in Stages 3+4+5 (drop the duplicate `inherited_tags` TagField and the re-merge step becomes free because `tags` is the single source of truth).
`tag_inheritance.flush_for_product` is called unconditionally from the importer's `process_scan` (and reimporter equivalent). When tag inheritance is disabled (neither the system-wide flag nor the per-product flag is set) the previous implementation still walked every child queryset to compute the (empty) diff, adding ~9 queries per scan. Tests in `unittests/test_importers_performance.py` pin importer query counts in scenarios where inheritance is off. The Stage 2 commit's flush call shifted those baselines up by 9 across 10 test cases. Add an early-exit so the importer perf tests stay green and no behavior change ships under the inheritance-off configuration.
`_location_target_names(location)` issued one
`Product.objects.filter(...).distinct()` per location plus N
`product.tags.all()` per related product, producing an N+1 across the
location queryset on every product tag toggle.
Replace the per-location callable with a precomputed
{location_id: set[tag_name]} map built in 3 bulk queries: the two
LocationProductReference / LocationFindingReference paths union together
into {location_id: {product_id}}, then a single Product_tags
through-table read fans out to {product_id: {tag_name}}.
product_tag_add (100 locations, V3): 320 -> 123 queries
product_tag_remove (100 locations, V3): 270 -> 73 queries
ZAP scan import (V3): 984 -> 947
ZAP scan reimport, no change (V3): 140 -> 103
…pagate The earlier Stage 2 commit recorded 194 queries for product-tag add propagating to 100 endpoints, before bulk_propagate_inherited_tags, the thread-local batch context, and the Location target-name precompute landed on this branch. With those in place the actual count is 94. Pinned value updated; comment refreshed.
Contributor
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
799b72f to
6ca5d7c
Compare
Contributor
|
Conflicts have been resolved. A maintainer will review the pull request shortly. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Important
Depends on:
All must merge first; this branch is stacked on top.
Summary
Wraps the import / reimport orchestration in
process_scan(default_importer + default_reimporter) insidetag_inheritance.batch(). Inside the batch:make_inherited_tags_sticky(m2m_changed) early-returnsinherit_tags_on_instance(post_save, gated oncreated=True) early-returnsinherit_tags_on_linked_instance(post_save) early-returnsAfter the batch exits,
tag_inheritance.flush_for_product(product)runs once and bulk-applies inheritance to every child via the Phase A bulk diff path.The bulk diff in
_sync_inheritance_for_qsis extended with a re-merge step: it reads each child's currenttagsthrough-table and ensures every target inherited name is present. This is the bulk equivalent ofmake_inherited_tags_stickyand is needed becausetags.set([...])inside the batch (e.g.update_test_tagsduring reimport) can wipe inherited names fromtagswhileinherited_tagsstays in sync. A current-tags read gates the re-merge so it only writes rows that are actually missing.Query-count impact
The reimport and product-toggle increases come from the new tags-read needed for the bulk re-merge step. Stages 3+4+5 (drop the duplicate
inherited_tagsTagField;tagsbecomes the single source of truth) collapse the re-merge cost — these increases disappear and the import path drops further.Verification
unittests.test_tag_inheritance— 36 pass (5 skipped for v3 feature flag).unittests.test_tag_utils_bulk— 29 pass.unittests.test_tag_inheritance_perf— 20 pass with the recalibrated pins.Behavior preserved