GQA: Visual Reasoning in the Real World

GQA dataset
GQA paper
GQA CVPR Workshop

Data structure

    ├── Question Number
        ├── Annotations
        |   ├── answer
        |   ├── full Answer
        |   └── question
        │
        ├── answer
        ├── entailed
        ├── equivalent
        ├── fullAnswer
        ├── groups
        ├── imageId
        ├── isBalanced
        ├── question
        ├── semantic
        ├── semanticStr
        └── types
            ├── detailed
            ├── semantic
            └── structural

answer, imageId, question

Network Architecture

Image-Question Aggregator

Image: Pretrained Tensornets (github)
Question: Pretrained ELMo using tensorflow-hub
Attention model: we just use the attention module (Self-Attention Generative Adversarial Networks paper, Attention github)

Requirements

tensorflow-gpu==1.13.1
numpy==1.16.2
tensorflow-hub==0.4.0
python==3.7.3
cv2==4.0.0
tqdm==4.31.1
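
Reading the question records

The question records listed under Data structure can be read with a few lines of Python. This is a minimal sketch, not the code used in this repository. It assumes the questions are stored as a JSON object keyed by question number, with each entry holding at least the `imageId`, `question`, and `answer` fields. The helper name `extract_fields` and the sample record are hypothetical and only illustrate the layout.

```python
import json


def extract_fields(questions):
    """Collect (question number, imageId, question, answer) from a questions dict.

    Assumes each entry carries the imageId, question, and answer keys shown
    in the data structure tree; the other keys are ignored.
    """
    rows = []
    for question_number, entry in questions.items():
        rows.append(
            (question_number, entry["imageId"], entry["question"], entry["answer"])
        )
    return rows


# Hypothetical record for illustration only, not taken from the dataset.
sample = json.loads(
    """
    {
      "q0001": {
        "imageId": "img0001",
        "question": "What color is the car?",
        "answer": "red",
        "fullAnswer": "The car is red.",
        "isBalanced": true,
        "types": {"detailed": "color", "semantic": "attr", "structural": "query"}
      }
    }
    """
)

rows = extract_fields(sample)
print(rows)
```

For a real questions file, the same helper could be applied to the result of `json.load` on the opened file. The file name depends on which GQA split was downloaded, so it is left out here.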