Race Condition in AppRefTracker: Concurrent Map Iteration and Write #1812

@scarlett-qu


What steps did you take:

  1. Deployed kapp-controller v0.59.1
  2. Cluster scaled to production workloads containing:
    • 1,680+ namespaces
    • Multiple Secrets and ConfigMaps being continuously updated/reconciled across namespaces
  3. Under normal cluster operations with high resource churn, the kapp-controller pod crashed

What happened:
kapp-controller crashes with "fatal error: concurrent map iteration and map write" under high concurrency when managing a large cluster with thousands of namespaces and resources.

Pod logs:

fatal error: concurrent map iteration and map write

goroutine 616 [running]:
internal/runtime/maps.fatal({0x2abdc3c?, 0xc002981ab0?})
        runtime/panic.go:1058 +0x18
internal/runtime/maps.(*Iter).Next(0xc00563f540?)
        internal/runtime/maps/table.go:683 +0x86
carvel.dev/kapp-controller/pkg/reconciler.(*SecretHandler).enqueueAppsForUpdate(0xc00163cd80, {0xc00494dc60, 0x1f}, {0xc00364dd40, 0xd}, {0x2ea2560, 0xc00031e8c0})
        carvel.dev/kapp-controller/pkg/reconciler/secret_handler.go:58 +0x3b1
...

The stack trace points to a map being iterated in SecretHandler.enqueueAppsForUpdate (secret_handler.go:58) while it is presumably being modified by another goroutine. Go maps are not safe for concurrent use. When the runtime detects a write during a "for range" iteration, even a write from a different goroutine, it aborts the whole process with this fatal error. Unlike an ordinary panic, this error cannot be caught with recover(), so the pod crashes.
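
One common way to avoid this class of crash is to guard the map with a sync.RWMutex and have readers iterate over a copy taken under the read lock. The sketch below is illustrative only: `refTracker`, `Add`, `Remove`, and `AppsFor` are hypothetical names, not the actual AppRefTracker API in kapp-controller, and the real fix may look different.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// refTracker is a hypothetical stand-in for a tracker that maps a
// Secret/ConfigMap key to the set of apps referencing it.
type refTracker struct {
	mu   sync.RWMutex
	refs map[string]map[string]struct{}
}

func newRefTracker() *refTracker {
	return &refTracker{refs: make(map[string]map[string]struct{})}
}

// Add records that app references ref. Writers take the exclusive lock.
func (t *refTracker) Add(ref, app string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.refs[ref] == nil {
		t.refs[ref] = make(map[string]struct{})
	}
	t.refs[ref][app] = struct{}{}
}

// Remove drops the reference from app to ref, if present.
func (t *refTracker) Remove(ref, app string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.refs[ref], app) // deleting from a nil map is a no-op
	if len(t.refs[ref]) == 0 {
		delete(t.refs, ref)
	}
}

// AppsFor returns a sorted copy of the apps referencing ref. The map is
// only iterated while the read lock is held; callers range over the
// returned slice, so no goroutine ever iterates the shared map unlocked.
func (t *refTracker) AppsFor(ref string) []string {
	t.mu.RLock()
	defer t.mu.RUnlock()
	apps := make([]string, 0, len(t.refs[ref]))
	for app := range t.refs[ref] {
		apps = append(apps, app)
	}
	sort.Strings(apps)
	return apps
}

func main() {
	t := newRefTracker()
	var wg sync.WaitGroup
	// Concurrent writers and readers; without the lock this pattern can
	// trigger "concurrent map iteration and map write".
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			t.Add("ns/secret", fmt.Sprintf("app-%03d", i))
			for range t.AppsFor("ns/secret") {
				// An enqueue step would go here.
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(len(t.AppsFor("ns/secret")))
}
```

Returning a copy costs an allocation per lookup, but it keeps the lock hold time short and means a handler can take as long as it needs to enqueue apps without blocking writers. Running the existing test suite with `go test -race` would help confirm where the unsynchronized access actually occurs.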

What did you expect:
kapp-controller should safely handle concurrent map access without crashing, especially in large-scale deployments with thousands of resources undergoing continuous reconciliation.


Environment:

  • kapp Controller version (execute kubectl get deployment -n kapp-controller kapp-controller -o yaml and the annotation is kbld.k14s.io/images): v0.59.1
  • Kubernetes version (use kubectl version): v1.32.9


Labels:

  • bug: This issue describes a defect or unexpected behavior
  • carvel-triage: This issue has not yet been reviewed for validity
