Hello!
I'm running into a reshaping error when using RL and intermediate rewards.
According to the comment on line 241, intermediate_rewards() returns a list of max_dec_step tensors, each of shape (batch_size, k).
This list is then stacked and stored in self.sampling_discounted_rewards, which as far as I can tell has shape (batch_size, k).
But then in _add_loss_op(), you iterate k times and append a slice along a third axis:
    for _ in range(self._hps.k):
        self._sampled_rewards.append(self.sampling_discounted_rewards[:, :, _]) # shape (max_enc_steps, batch_size)
The index [:, :, _] needs a 3-D tensor, so it raises a dimension error when self.sampling_discounted_rewards has shape (batch_size, k).
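To make the shape question concrete, here is a minimal sketch of what I would expect. It uses NumPy rather than the repo's TensorFlow code, and the sizes and variable names (step_rewards, stacked, per_sample, two_d) are made up for illustration. It assumes the list really holds max_dec_step arrays of shape (batch_size, k), as the comment says.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
max_dec_steps, batch_size, k = 5, 4, 3

# Assumed output of intermediate_rewards(): one (batch_size, k) array per decoder step.
step_rewards = [np.zeros((batch_size, k)) for _ in range(max_dec_steps)]

# Stacking along a new leading axis gives (max_dec_steps, batch_size, k).
stacked = np.stack(step_rewards)

# With a 3-D array, slicing the last axis yields one (max_dec_steps, batch_size) array per sample.
per_sample = [stacked[:, :, i] for i in range(k)]

# With a 2-D (batch_size, k) array, the same index has too many dimensions and fails.
two_d = np.zeros((batch_size, k))
try:
    two_d[:, :, 0]
    index_failed = False
except IndexError:
    index_failed = True
```

If the stacked tensor were 3-D like this, the loop in _add_loss_op() would work as written, so the mismatch seems to come from how the rewards are stacked or reduced before they are stored.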
Am I missing something here? What should be the correct shape/reshaping? Thank you for uploading this code!