# Bug Report

## Summary

The `ec2-terminate-by-tag` experiment consistently fails with an "instance is not in stopped state" error during the `WaitForEC2Down` phase, even though the target EC2 instance is confirmed to be in the `stopped` state via the AWS Console.
## Environment

- LitmusChaos version: 3.28 (ChaosCenter on EKS)
- go-runner image: `litmuschaos/go-runner:3.28.0`
- AWS Region: ap-northeast-2
- Target: EC2 instances managed by an Auto Scaling Group (ASG)
- EC2 instances are tagged and run a Go web server connected to RDS PostgreSQL
- Chaos was injected directly into an EC2 instance, not into an EC2 instance provisioned as an EKS node
## Steps to Reproduce

- Set up 2 EC2 instances in an ASG with the tag `ChaosTarget:True`
- Configure the `ec2-terminate-by-tag` experiment with:
  - Total Chaos Duration: 300
  - Chaos Interval: 300
  - Instance Affected Perc: 100 (also tested with 50)
  - Sequence: parallel (also tested with serial)
  - Managed Nodegroup: enable (also tested with disable)
  - Default Health Check: false
  - TIMEOUT: 300
  - DELAY: 10
- Run the experiment
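For completeness, the settings above correspond roughly to the following ChaosEngine env block. This is a sketch rather than the exact manifest applied; the env names follow the LitmusChaos docs for `ec2-terminate-by-tag`, and the tag/region values mirror the setup described above:

```yaml
# Sketch of the experiment env used (names per the ec2-terminate-by-tag docs;
# verify against your chart version before applying)
spec:
  experiments:
    - name: ec2-terminate-by-tag
      spec:
        components:
          env:
            - name: INSTANCE_TAG
              value: "ChaosTarget:True"
            - name: REGION
              value: "ap-northeast-2"
            - name: TOTAL_CHAOS_DURATION
              value: "300"
            - name: CHAOS_INTERVAL
              value: "300"
            - name: INSTANCE_AFFECTED_PERC
              value: "100"
            - name: SEQUENCE
              value: "parallel"
            - name: MANAGED_NODEGROUP
              value: "enable"
```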
## Expected Behavior

The experiment should:

- Stop the target EC2 instance(s)
- Detect the `stopped` state via `WaitForEC2Down`
- Wait for the chaos duration
- Complete successfully (or start the instance back up if Managed Nodegroup is disabled)
## Actual Behavior

- The EC2 instance is successfully stopped (confirmed via the AWS Console showing the `stopped` state)
- The experiment pod logs show it polling for the `stopped` state
- However, `WaitForEC2Down` reports failure with "instance is not in stopped state"
- The experiment ends with `CHAOS_INJECT_ERROR`
## Error Log

```
time="2026-04-28T02:29:58Z" level=info msg="The instance state is stopped"
time="2026-04-28T02:30:00Z" level=info msg="The instance state is stopped"
time="2026-04-28T02:30:02Z" level=info msg="The instance state is stopped"
time="2026-04-28T02:30:04Z" level=info msg="[Probe]: {Actual value: 2}, {Expected value: 0}, {Operator: >=}"
time="2026-04-28T02:30:04Z" level=info msg="The instance state is stopped"
time="2026-04-28T02:30:06Z" level=error msg="Chaos injection failed: could not run chaos in parallel mode\n --- at /litmus-go/chaoslib/litmus/ec2-terminate-by-tag/lib/ec2-terminate-by-tag.go:63 (PrepareEC2TerminateByTag) ---\nCaused by: ec2 instance failed to stop\n --- at /litmus-go/chaoslib/litmus/ec2-terminate-by-tag/lib/ec2-terminate-by-tag.go:188 (injectChaosInParallelMode) ---\nCaused by: {"errorCode":"CHAOS_INJECT_ERROR","reason":"instance is not in stopped state","target":"{EC2 Instance ID: i-xxxxx, Region: ap-northeast-2}"}"
```
## Configurations Tested (all produced the same error)

| Setting | Values Tested |
| --- | --- |
| Sequence | parallel, serial |
| Managed Nodegroup | enable, disable |
| Instance Affected Perc | 50, 100 |
| Total Chaos Duration | 120, 300, 900 |
| Chaos Interval | 30, 60, 300 |
| TIMEOUT (env var) | default, 300 |
| DELAY (env var) | default, 10 |
| Default Health Check | false |
## Additional Context

- When Managed Nodegroup is `disable`, LitmusChaos starts the stopped instance after the chaos duration. This conflicts with the ASG, which has already launched a replacement instance; the ASG then terminates the excess instance.
- When Managed Nodegroup is `enable`, the same `WaitForEC2Down` error occurs.
- The actual chaos injection (the EC2 stop) works correctly; the issue is only in the status verification logic within `WaitForEC2Down`.
- A similar but separate issue exists for `rds-instance-stop`, where the RDS Instance Identifier and Region parameters appear to be swapped internally during serial-mode execution.
- With the ASG's desired, minimum, and maximum capacity all set to 2, I injected chaos into one of the two instances. Even though I can manually confirm that the instance transitioned to the `stopped` state, LitmusChaos fails to verify the successful stop and throws the error "instance is not in stopped state".
## Suspected Root Cause

The `WaitForEC2Down` function may have an internal polling limit or state-check logic that fails when an ASG is involved, possibly because the ASG changes the instance state (e.g., terminates the stopped instance) before `WaitForEC2Down` can confirm the `stopped` state.
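To make the suspected race concrete, here is a minimal, self-contained Go sketch of a `WaitForEC2Down`-style poll. This is not the actual litmus-go code (retry limits and delays are elided, and the state sequences are simulated rather than fetched via DescribeInstances); it only illustrates how a verifier that accepts nothing but `stopped` fails once the ASG moves the instance on to `shutting-down`/`terminated`:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotStopped mirrors the "instance is not in stopped state" failure.
var errNotStopped = errors.New("instance is not in stopped state")

// waitForStopped is a simplified stand-in for WaitForEC2Down: it consumes
// successive state observations (as a DescribeInstances poll would) and
// succeeds only if the instance is observed in "stopped". Any other
// terminal observation, e.g. "terminated" after an ASG scale-in reclaimed
// the instance, fails the check even though the stop itself worked.
func waitForStopped(observations []string) error {
	for _, state := range observations {
		switch state {
		case "stopped":
			return nil // instance reached the expected state
		case "running", "stopping":
			continue // transient states: keep polling
		default:
			// "shutting-down"/"terminated": the ASG has already moved the
			// instance past "stopped", so the check can never succeed.
			return errNotStopped
		}
	}
	return errNotStopped // poll budget exhausted without seeing "stopped"
}

func main() {
	// Without ASG interference the poll converges.
	fmt.Println(waitForStopped([]string{"stopping", "stopped"})) // <nil>
	// With an ASG scale-in racing the poll, the stop succeeds on the AWS
	// side but verification fails, matching the reported behavior.
	fmt.Println(waitForStopped([]string{"stopping", "terminated"}))
}
```

If something like this is the cause, the fix would likely be to treat ASG-driven transitions past `stopped` as success (or to pin verification to the originally targeted instance IDs rather than re-evaluating state late).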