feat: Add observability IAM permissions for RIG cluster execution role #1030

Open
Madhubalasri-B wants to merge 1 commit into awslabs:main from Madhubalasri-B:feat/rig-observability-iam



@Madhubalasri-B
Collaborator

Summary

When observability is enabled on RIG (Restricted Instance Group) clusters, attach APS (Amazon Managed Service for Prometheus) RemoteWrite and CloudWatch Logs permissions to the cluster execution role.

Changes

  • main.tf: Allow the observability module to be created for RIG clusters; pass rig_mode and execution_role_name to the observability module
  • modules/observability/variables.tf: Add rig_mode (bool) and execution_role_name (string) variables
  • modules/observability/iam_roles.tf: Add execution_role_observability inline policy with aps:RemoteWrite and CloudWatch Logs permissions, conditional on rig_mode
  • rig_custom.tfvars: Enable observability module (create_observability_module = true)
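For reviewers less familiar with the module layout, the conditional inline policy described in the Changes list might look roughly like the Terraform below. This is a minimal sketch, not the PR's actual diff. The policy name, the specific CloudWatch Logs actions, and the wildcard `Resource` values are assumptions. The PR only states that `aps:RemoteWrite` and CloudWatch Logs permissions are granted when `rig_mode` is true.

```hcl
# modules/observability/variables.tf (sketch)
variable "rig_mode" {
  description = "Whether the cluster uses Restricted Instance Groups (RIG)"
  type        = bool
  default     = false
}

variable "execution_role_name" {
  description = "Name of the HyperPod cluster execution role to attach the policy to"
  type        = string
  default     = ""
}

# modules/observability/iam_roles.tf (sketch)
resource "aws_iam_role_policy" "execution_role_observability" {
  # Create the inline policy only for RIG clusters; zero instances otherwise
  count = var.rig_mode ? 1 : 0

  name = "execution-role-observability" # hypothetical policy name
  role = var.execution_role_name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid      = "ApsRemoteWrite"
        Effect   = "Allow"
        Action   = ["aps:RemoteWrite"]
        Resource = "*" # assumption: could be scoped to the APS workspace ARN
      },
      {
        Sid    = "CloudWatchLogs"
        Effect = "Allow"
        # Assumed action set; the PR does not list the exact Logs actions
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents",
          "logs:DescribeLogStreams"
        ]
        Resource = "*" # assumption: could be scoped to specific log groups
      }
    ]
  })
}
```

Gating the resource with `count` means non-RIG clusters get zero instances of the policy, so under these assumptions their plans should show no new IAM resources.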

Testing

  • Deployed a full RIG HyperPod EKS cluster via terraform apply with rig_custom.tfvars
  • Verified the inline policy was created on the execution role with correct APS and CloudWatch Logs permissions
  • terraform validate passes

Port CDK commit c4f4067adf70 from SagemakerHyperpodClusterSetupCDK.
When observability is enabled on RIG clusters, attach APS RemoteWrite
and CloudWatch Logs permissions to the cluster execution role.

Changes:
- main.tf: Allow observability module for RIG clusters, pass rig_mode
  and execution_role_name
- modules/observability/variables.tf: Add rig_mode and execution_role_name
  variables
- modules/observability/iam_roles.tf: Add execution_role_observability
  inline policy (conditional on rig_mode)
- rig_custom.tfvars: Enable observability module
Contributor

@apsknight left a comment


LGTM

@bluecrayon52 self-requested a review on March 27, 2026 at 19:34
Contributor

@bluecrayon52 left a comment


In main.tf, you can remove local.create_observability_module and replace it with a direct reference to var.create_observability_module, since we no longer automatically disable this feature based on local.rig_mode.

Please also update README.md to reflect these changes, specifically the section that lists the HyperPod Observability module as unsupported on RIGs, and specify which Advanced Observability Metrics Configurations are supported with RIGs.
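The main.tf simplification requested in the first review point could look something like the sketch below. It assumes that the existing local combined the input variable with a `local.rig_mode` check; the exact original expression is not shown on this page. The `local.execution_role_name` reference is hypothetical and stands in for however main.tf actually resolves the execution role name.

```hcl
# Before (assumed): the local suppressed the module on RIG clusters
# locals {
#   create_observability_module = var.create_observability_module && !local.rig_mode
# }

# After: gate the module directly on the input variable
module "observability" {
  count  = var.create_observability_module ? 1 : 0
  source = "./modules/observability"

  rig_mode            = local.rig_mode
  execution_role_name = local.execution_role_name # hypothetical reference

  # ...other existing module inputs unchanged
}
```

With the local removed, `create_observability_module = true` in rig_custom.tfvars is the only switch that controls whether the module is created for RIG clusters.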
