I got an AssertionError: Mask is silently ignored due to the use of a custom kernel when training GPT-2 with examples/pretrain_gpt.sh.
This line leads to the assertion error:
assert mask is None, "Mask is silently ignored due to the use of a custom kernel"
Is this assertion necessary? And is it even correct?
The assertion is at Megatron-DeepSpeed/megatron/model/fused_softmax.py, line 191 in commit 8387ae1.
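For context, here is a minimal sketch of the kind of situation an assertion like this would guard against. It assumes the assertion sits in front of a kernel that applies its own causal mask and never reads the `mask` argument; that is a guess about the intent, not something confirmed from the repository. `causal_softmax` is a hypothetical pure-Python stand-in, not the Megatron-DeepSpeed code.

```python
import math


def causal_softmax(scores, mask=None):
    """Toy stand-in for a fused causal-softmax kernel (hypothetical).

    The causal pattern is built in: row i only sees columns 0..i.
    The `mask` argument is never read, so the assertion fails loudly
    instead of silently dropping a mask the caller supplied.
    """
    assert mask is None, "Mask is silently ignored due to the use of a custom kernel"
    out = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]                 # causal: ignore future positions
        peak = max(visible)                    # subtract the max for stability
        exps = [math.exp(x - peak) for x in visible]
        total = sum(exps)
        # Masked (future) positions get probability 0.
        out.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return out


scores = [[0.0, 1.0, 2.0] for _ in range(3)]

# Without an explicit mask the call goes through.
probs = causal_softmax(scores)

# With an explicit mask the assertion fires, as in the traceback above.
try:
    causal_softmax(scores, mask=[[False, True, True]] * 3)
    raised = False
except AssertionError:
    raised = True
```

If that reading is right, the open question is whether the caller should be passing a mask at all on this code path, or whether the assertion is simply too strict for the mask that pretrain_gpt.sh ends up supplying.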