My batch_size is 64. I pretrain my model for about 50,000 iterations and get a better result than pgen's. Then I turn on the coverage mechanism and train the model for another 2,000 iterations. The coverage loss does not decrease to 0.2, the value mentioned for the pgen model. The final ROUGE-1 score is about 38.90. Are there any tricks for adding the coverage mechanism? How can I get a result similar to the pgen model's?
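For reference, this is the coverage loss I am computing, following the formula in the pointer-generator paper (covloss_t = Σ_i min(a_i^t, c_i^t), where c^t is the sum of attention distributions over all previous decoder steps). This is a minimal pure-Python sketch with a hypothetical function name, not the actual training code:

```python
def coverage_loss(attn_dists):
    """Coverage loss as in See et al. (2017).

    attn_dists: list of decoder-step attention distributions,
    each a list of weights over the source tokens.
    Returns the coverage loss averaged over decoder steps.
    """
    src_len = len(attn_dists[0])
    coverage = [0.0] * src_len  # c^t: running sum of past attention
    step_losses = []
    for a_t in attn_dists:
        # covloss_t = sum_i min(a_i^t, c_i^t)
        step_losses.append(sum(min(a, c) for a, c in zip(a_t, coverage)))
        coverage = [c + a for c, a in zip(coverage, a_t)]
    return sum(step_losses) / len(step_losses)
```

Note the loss is zero at the first step (coverage starts at zero) and grows whenever the decoder re-attends to already-covered source positions, which is what the 0.2 target value is measuring.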