In your code, you add dropout after relu2, so your order is dropout(relu2(bn2(conv1(relu1(bn1(x)))))).
But in this code, he adds dropout after conv1, so his order is relu2(bn2(dropout(conv1(relu1(bn1(x)))))).
Does the placement matter? How does performance differ between the two orderings? I'm stuck: using the second ordering, I can't reproduce the performance on CIFAR-10 (I only reach 93.2% accuracy).
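To make the comparison concrete, here is a minimal PyTorch sketch of the two orderings. It is not taken from either repository: the class name `PreActBranch`, the `dropout_after` switch, and the 3x3 convolution settings are my own assumptions for illustration. It only shows where dropout sits relative to bn2; it says nothing about which ordering trains better.

```python
import torch
import torch.nn as nn


class PreActBranch(nn.Module):
    """Hypothetical branch: bn1 -> relu1 -> conv1 -> bn2 -> relu2, with
    dropout placed either right after conv1 or right after relu2."""

    def __init__(self, channels: int, p: float, dropout_after: str):
        super().__init__()
        if dropout_after not in ("conv1", "relu2"):
            raise ValueError("dropout_after must be 'conv1' or 'relu2'")
        self.dropout_after = dropout_after
        self.bn1 = nn.BatchNorm2d(channels)
        # Assumed 3x3 convolution that keeps the spatial size unchanged.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.dropout = nn.Dropout(p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(torch.relu(self.bn1(x)))
        if self.dropout_after == "conv1":
            # relu2(bn2(dropout(conv1(relu1(bn1(x))))))
            # bn2 computes its batch statistics on dropped-out activations.
            return torch.relu(self.bn2(self.dropout(out)))
        # dropout(relu2(bn2(conv1(relu1(bn1(x))))))
        # bn2 sees the undropped conv1 output; dropout acts on the result.
        return self.dropout(torch.relu(self.bn2(out)))


if __name__ == "__main__":
    x = torch.randn(8, 16, 32, 32)
    for placement in ("conv1", "relu2"):
        branch = PreActBranch(16, p=0.3, dropout_after=placement)
        print(placement, tuple(branch(x).shape))
```

In `eval()` mode `nn.Dropout` is the identity, so with identical weights and running statistics the two variants compute the same function at test time. The orderings differ only during training, mainly in whether bn2 accumulates its statistics from dropped-out or undropped activations.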