Establish Lifecycle Rules for ClusterLoader2 Variables #3973

@serathius


What would you like to be added:

I propose establishing lifecycle rules for ClusterLoader2 (CL2) variables, similar to those for Kubernetes feature gates, particularly for the main load scenario.

Specifically, the proposal is:

  • For the load scenario, start tracking all CL2_ flags that can potentially be set and work on their removal
    (this can be a simple markdown table in the README).
  • Apply Kubernetes feature gate lifecycle rules to the flags, with rules for removal and a requirement that each flag eventually becomes the default:
    • Alpha: newly added and optional. Should be enabled in experimental scalability scenarios (such as resource-size). Must graduate at some point or be removed.
    • Beta: can be enabled in release-blocking scenarios by passing it manually. Should graduate eventually.
    • GA: enabled by default in CL2. The flag itself can be removed entirely at some point.
  • For flags that cannot graduate to the default, or that other SIGs would like to maintain regardless, plan to move them to a separate scenario owned by that SIG.
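
To make the proposed stages concrete, here is a minimal Go sketch of how the rules could be encoded. Everything in it is hypothetical: the `Stage`, `Scenario`, and `Variable` types, the `AllowedIn` and `EnabledByDefault` helpers, and the `CL2_EXAMPLE_FLAG` name do not exist in ClusterLoader2. It also assumes that Alpha variables are not permitted in release-blocking scenarios, which the proposal implies but does not state outright.

```go
package main

import "fmt"

// Stage mirrors the feature-gate-style stages named in the proposal.
type Stage string

const (
	Alpha Stage = "Alpha"
	Beta  Stage = "Beta"
	GA    Stage = "GA"
)

// Scenario is a hypothetical classification of where a test runs.
type Scenario int

const (
	Experimental Scenario = iota
	ReleaseBlocking
)

// Variable is a hypothetical registry entry for one CL2_ variable.
type Variable struct {
	Name  string
	Stage Stage
}

// EnabledByDefault reports whether the behavior is on without passing the variable.
// Per the proposal, only GA variables are flipped to the default.
func EnabledByDefault(v Variable) bool {
	return v.Stage == GA
}

// AllowedIn reports whether the variable's behavior may be active in a scenario.
// Assumption: Alpha is limited to experimental scenarios.
func AllowedIn(v Variable, s Scenario) bool {
	switch v.Stage {
	case Alpha:
		return s == Experimental
	case Beta, GA:
		return true
	default:
		return false // unknown stage: reject rather than guess
	}
}

func main() {
	v := Variable{Name: "CL2_EXAMPLE_FLAG", Stage: Alpha} // hypothetical variable
	fmt.Println(v.Name, "allowed in release-blocking:", AllowedIn(v, ReleaseBlocking))
	fmt.Println(v.Name, "enabled by default:", EnabledByDefault(v))
}
```

A check like `AllowedIn` could run when a job config is validated, so that an Alpha variable set in a release-blocking job is flagged before the test starts.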

Why is this needed:

ClusterLoader2 feature flags are currently used by different SIGs to test a variety of things for different use cases. However, every additional variable multiplies the size of the testing matrix, which a single SIG (SIG Scalability) has to support.

To keep the signal that informs releases simple, there is one main scalability test (the 5k node test), which runs the load scenario. Over the years, all the feature flags were added to this single scenario, but some of them were never used and significantly complicate the test templates.

The deployment manifests have accumulated a large number of interdependent if statements that make it hard to land important improvements, such as the effort to establish "Pod Shape" as a formal scalability envelope (issue #138415). Without lifecycle rules and graduation criteria for these CL2 features, the maintenance burden on the small group of volunteers maintaining this growing testing matrix becomes unsustainable.

/cc @wojtek-t @Qqkyu @mborsz @aojea
/sig scalability

| CL2 Variable | Proposed State | Used / Graduation Status | Notes / Next Steps |
| --- | --- | --- | --- |
| CL2_USE_HOST_NETWORK_PODS | Remove | Unused | Addressed/Removed in PR #3945 |
| CL2_RUN_ON_ARM_NODES | Remove | Unused | Addressed/Removed in PR #3946 |
| CL2_ENABLE_NETWORK_POLICY_ENFORCEMENT_LATENCY_TEST | Move | Used by sig-network | This test is the only one using it. Move out of main load scenario to the sig-network testgrid. |
| CL2_DNS_QPS_PER_CLIENT | TODO | TODO | TODO |
| CL2_DEPLOYMENT_POD_PAYLOAD_SIZE | TODO | TODO | TODO |
| CL2_USE_ADVANCED_DNSTEST | TODO | TODO | TODO |
| CL2_TOLERATION | TODO | TODO | TODO |
| CL2_RUNTIME_CLASS_NAME | TODO | TODO | TODO |
| CL2_ENABLE_PVS | TODO | TODO | TODO |
| CL2_STATEFULSET_POD_PAYLOAD_SIZE | TODO | TODO | TODO |
| CL2_CHECK_IF_PODS_ARE_UPDATED | TODO | TODO | TODO |
| CL2_DISABLE_DAEMONSETS | TODO | TODO | TODO |
| CL2_ENABLE_DNSTESTS | TODO | TODO | TODO |
| CL2_ENABLE_NETWORKPOLICIES | TODO | TODO | TODO |
| CL2_NET_POLICY_ENFORCEMENT_LATENCY_TARGET_LABEL_KEY | TODO | TODO | TODO |
| CL2_NET_POLICY_ENFORCEMENT_LATENCY_TARGET_LABEL_VALUE | TODO | TODO | TODO |
| CL2_NET_POLICY_SERVER_EVERY_NTH_POD | TODO | TODO | TODO |
| CL2_JOB_POD_PAYLOAD_SIZE | TODO | TODO | TODO |
| CL2_DS_SURGE | TODO | TODO | TODO |
| CL2_NET_POLICY_ENFORCEMENT_LATENCY_NODE_LABEL_VALUE | TODO | TODO | TODO |
| CL2_DAEMONSET_POD_PAYLOAD_SIZE | TODO | TODO | |
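
Keeping a table like the one above current is easier with an automated check. Below is a minimal Go sketch, assuming the variables reach a job as `CL2_`-prefixed environment variables. The `Untracked` helper and the `tracked` map are hypothetical names, and the map holds only an illustrative subset of the table.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// tracked is an illustrative subset of the variables listed in the table.
var tracked = map[string]bool{
	"CL2_DNS_QPS_PER_CLIENT": true,
	"CL2_ENABLE_PVS":         true,
	"CL2_DS_SURGE":           true,
}

// Untracked returns, in sorted order, the CL2_-prefixed names found in
// environ ("KEY=VALUE" entries, as returned by os.Environ) that are not
// present in the tracked set.
func Untracked(environ []string, tracked map[string]bool) []string {
	var out []string
	for _, kv := range environ {
		name, _, ok := strings.Cut(kv, "=")
		if !ok || !strings.HasPrefix(name, "CL2_") || tracked[name] {
			continue
		}
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	for _, name := range Untracked(os.Environ(), tracked) {
		fmt.Println("untracked CL2 variable:", name)
	}
}
```

Run in a presubmit, a check like this would surface any variable that a job sets without a corresponding row and lifecycle stage in the table.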


Labels: kind/feature, sig/scalability
