Hi, could you please share the learning rate range and the other hyper-parameter settings used for the zero-shot experiments on the COCO-20i dataset? I am having difficulty reproducing the results shown in the paper.
I use ViT-L/16 as the backbone, and my results are 10 points lower than yours.