When training is resumed with the train_continue flag, only the model weights are restored; the optimizer state (for example, momentum or adaptive learning-rate statistics) is not. Training therefore does not truly resume from the previous checkpoint: the optimizer restarts from a fresh state, which can cause unstable updates and a noticeable drop in performance after resuming.
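To illustrate the effect, here is a minimal sketch in plain Python. It is not this project's code: the functions, the checkpoint dictionary keys, and the toy objective f(w) = 0.5 * w**2 are all hypothetical, and SGD with momentum stands in for whatever optimizer is actually used. It compares an uninterrupted run with two resumed runs, one restoring the optimizer state and one restoring the weights only.

```python
# Illustrative toy example; none of these names come from the project.

def sgd_momentum_step(w, velocity, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update on f(w) = 0.5 * w**2 (gradient is w)."""
    grad = w
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


def train(w, velocity, steps):
    """Run `steps` updates and return the final weight and optimizer state."""
    for _ in range(steps):
        w, velocity = sgd_momentum_step(w, velocity)
    return w, velocity


# Uninterrupted run: 6 steps starting from w = 1.0 with zero velocity.
w_full, _ = train(1.0, 0.0, 6)

# Interrupted run: stop after 3 steps and save a checkpoint
# that holds both the weights and the optimizer state.
w_ckpt, v_ckpt = train(1.0, 0.0, 3)
checkpoint = {"weights": w_ckpt, "optimizer_state": v_ckpt}

# Resume A: restore weights and optimizer state, then run the remaining 3 steps.
w_resumed_full, _ = train(checkpoint["weights"], checkpoint["optimizer_state"], 3)

# Resume B: restore weights only; the velocity silently restarts at zero.
w_resumed_weights_only, _ = train(checkpoint["weights"], 0.0, 3)

print("uninterrupted:        ", w_full)
print("resumed with state:   ", w_resumed_full)
print("resumed, weights only:", w_resumed_weights_only)
```

In this sketch, only the run that restores the optimizer state reproduces the uninterrupted trajectory; the weights-only run ends at a different value. In a real framework the same idea would mean saving and reloading the optimizer's state alongside the model weights.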