Possible shaping error on _add_loss_op() in model.py #34

Hello!

I'm running into a reshaping error when using RL and intermediate rewards.

The output of `intermediate_rewards()` is documented as `# list of max_dec_step * (batch_size, k)` (line 241).

This list is then stacked and stored in `self.sampling_discounted_rewards`, which as far as I can tell has shape `(batch_size, k)`.

But then in `_add_loss_op()`, you iterate `k` times and append a slice on each iteration:
```
for _ in range(self._hps.k):
    self._sampled_rewards.append(self.sampling_discounted_rewards[:, :, _]) # shape (max_enc_steps, batch_size)
```

But the index `[:, :, _]` would run into a dimension error, because it needs three dimensions and the shape of `self.sampling_discounted_rewards` is `(batch_size, k)`, which has only two.
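To illustrate what I mean, here is a minimal sketch in NumPy rather than the repo's TensorFlow code. The sizes and the names `rewards_per_step`, `stacked`, and `two_d` are made up for the example and are not taken from `model.py`. It shows that the slice works when the stacked rewards are 3-D and fails when they are 2-D:

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
max_dec_steps, batch_size, k = 5, 4, 3

# A list of max_dec_steps arrays, each of shape (batch_size, k).
rewards_per_step = [np.zeros((batch_size, k)) for _ in range(max_dec_steps)]

# Stacking along a new leading axis gives a 3-D array.
stacked = np.stack(rewards_per_step)
print(stacked.shape)  # (5, 4, 3)

# Slicing the last axis works on the 3-D array.
per_sample = [stacked[:, :, i] for i in range(k)]
print(per_sample[0].shape)  # (5, 4)

# On a 2-D array of shape (batch_size, k), the same slice fails.
two_d = np.zeros((batch_size, k))
try:
    two_d[:, :, 0]
    slice_failed = False
except IndexError as err:
    slice_failed = True
    print("IndexError:", err)
```

So if the stacked tensor really were `(max_dec_step, batch_size, k)`, the loop would be fine; with `(batch_size, k)` it is not.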

Am I missing something here? What should the correct shape (or reshaping) be? Thank you for uploading this code!
