Before submitting an issue, please make sure it hasn't been already addressed by searching through the existing and past issues.
Describe the bug
- When enabling the mix hidden states mode in EAGLE3 training, training crashes with the following autograd error:
[rank1]: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [CUDABFloat16Type [2, 4096, 2880]], which is output 0 of IndexPutBackward0, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
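For context, here is a hedged minimal sketch of this error class in isolation (hypothetical, not the actual repro from the WIP branch): a masked index assignment is recorded by autograd as an IndexPut, and a second inplace write after the tensor has been saved for backward bumps its version counter past what the saved graph expects.

```python
import torch

# Hypothetical minimal sketch of the "modified by an inplace operation" error,
# not the actual ModelOpt/EAGLE3 code path.
x = torch.ones(4, requires_grad=True)
h = x * 2                                  # non-leaf tensor tracked by autograd
mask = torch.tensor([True, False, False, False])
h[mask] = 0.0                              # advanced indexing -> IndexPutBackward0, h version -> 1
loss = (h * h).sum()                       # mul saves h (at version 1) for its backward
h[torch.tensor([1])] = 0.0                 # second inplace write: version -> 2, graph expects 1

err = None
try:
    # Wrapping the forward in torch.autograd.set_detect_anomaly(True), as the
    # error hint suggests, would pinpoint the offending inplace op.
    loss.backward()
except RuntimeError as e:
    err = str(e)
print(err)  # mentions "modified by an inplace operation" and a version mismatch
```

The usual fixes are to clone the tensor before the inplace write (`h = h.clone()`), or to build the result out-of-place (e.g. `torch.where(mask, value, h)`) so the saved version is never invalidated.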
Steps/Code to reproduce bug
Developed on a WIP branch for GPT-OSS EAGLE3 training, with a few local changes that may be related. Filing this issue for tracking in case I cannot solve it myself.
Expected behavior
Who can help?
System information
- Container used (if applicable): ?
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): ?
- CPU architecture (x86_64, aarch64): ?
- GPU name (e.g. H100, A100, L40S): ?
- GPU memory size: ?
- Number of GPUs: ?
- Library versions (if applicable):
- Python: ?
- ModelOpt version or commit hash: ?
- CUDA: ?
- PyTorch: ?
- Transformers: ?
- TensorRT-LLM: ?
- ONNXRuntime: ?
- TensorRT: ?
- Any other details that may help: ?