Poster presented at VizWiz Challenge (CVPR 2022)

28 August 2022

The "VizWiz Grand Challenge" is a workshop at CVPR that aims to support visually impaired people in their everyday lives. The challenge comprises three tasks: Visual Question Answering (including answerability prediction), Visual Grounding, and few-shot object recognition. The VIS team worked on the VQA task.

Fabian Deuser, Konrad Habel, Philipp J. Rösch and Norbert Oswald used an approach that is both simple and elegant. Based on CLIP features (Radford et al., 2021), they built a classifier using an MLP. The classification is supported by an Answer Type Gate. The model is easy to train, since only a few parameters have to be updated. In addition, the single models (without ensembling) already achieve very good results, which is an advantage over many competing models.
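To make the idea in the paragraph above more concrete, here is a minimal sketch of a small classification head on top of fixed feature vectors. It is not the team's implementation: the random vectors merely stand in for CLIP image and question features, and the class name `GatedAnswerHead`, the layer sizes, and the exact form of the gate (a sigmoid mask over the answer vocabulary derived from the predicted answer type) are assumptions made for illustration.

```python
import math
import random


def init_linear(rng, n_in, n_out):
    """Return (weights, bias) for a linear layer; the init scheme is an assumption."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Compute y = W x + b for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def softmax(x):
    peak = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


class GatedAnswerHead:
    """Hypothetical MLP head; only these layers would be trained, the encoder stays frozen."""

    def __init__(self, feat_dim, hidden_dim, num_answers, num_types, seed=0):
        rng = random.Random(seed)
        self.hidden = init_linear(rng, 2 * feat_dim, hidden_dim)
        self.answer = init_linear(rng, hidden_dim, num_answers)
        self.answer_type = init_linear(rng, hidden_dim, num_types)
        # Assumed gate: maps answer-type logits to one mask value per answer.
        self.gate = init_linear(rng, num_types, num_answers)

    def forward(self, image_feat, text_feat):
        # Concatenate the two (precomputed) feature vectors.
        h = relu(linear(image_feat + text_feat, *self.hidden))
        answer_logits = linear(h, *self.answer)
        type_logits = linear(h, *self.answer_type)
        # The gate rescales each answer logit by a value in (0, 1).
        mask = sigmoid(linear(type_logits, *self.gate))
        gated_logits = [a * m for a, m in zip(answer_logits, mask)]
        return softmax(gated_logits), softmax(type_logits)


if __name__ == "__main__":
    rng = random.Random(1)
    image_feat = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for a CLIP image embedding
    text_feat = [rng.gauss(0.0, 1.0) for _ in range(8)]   # stand-in for a CLIP question embedding
    head = GatedAnswerHead(feat_dim=8, hidden_dim=16, num_answers=5, num_types=4)
    answer_probs, type_probs = head.forward(image_feat, text_feat)
    print("most likely answer index:", answer_probs.index(max(answer_probs)))
```

Because the encoder is treated as a fixed feature extractor in this sketch, the only trainable parameters are the four small linear layers, which illustrates why such a model can be trained cheaply.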

Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model
Fabian Deuser, Konrad Habel, Philipp J. Rösch, Norbert Oswald

[arXiv], [VizWiz]