Poster presented at VizWiz Challenge (CVPR 2022)

28 August 2022

The "VizWiz Grand Challenge" is a workshop at CVPR that aims to support visually impaired people in their everyday lives. The challenge comprises three tasks: Visual Question Answering (VQA, including answerability prediction), Visual Grounding, and few-shot object recognition. The VIS team worked on the VQA task.

Fabian Deuser, Konrad Habel, Philipp J. Rösch and Norbert Oswald took a simple yet elegant approach. On top of CLIP features (Radford et al., 2021), they built a classifier from a multilayer perceptron (MLP), supported by an Answer Type Gate. The model is easy to train because only a few parameters have to be updated. In addition, the single models (without ensembling) already achieve very good results, which is an advantage over many competing models.
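To make the idea concrete, here is a minimal NumPy sketch of a small classifier head on top of frozen, precomputed features with an answer-type gate. It is an illustration under stated assumptions, not the authors' implementation: the feature size, the vocabulary size, the number of answer types, the exact form of the gate (a sigmoid mask derived from the answer-type logits and multiplied onto the answer logits) and all names such as `init_linear` and `forward` are hypothetical. Random vectors stand in for the CLIP image and question embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 8     # stand-in for the CLIP embedding size (assumption)
NUM_ANSWERS = 5  # size of a hypothetical answer vocabulary
NUM_TYPES = 4    # hypothetical number of answer types


def init_linear(in_dim, out_dim):
    """Create the weights and bias of one linear layer."""
    return {"W": rng.normal(0.0, 0.02, size=(in_dim, out_dim)),
            "b": np.zeros(out_dim)}


def linear(layer, x):
    return x @ layer["W"] + layer["b"]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    shifted = x - x.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


# Only these small layers would be trained; the feature extractor stays frozen.
params = {
    "answer": init_linear(2 * FEAT_DIM, NUM_ANSWERS),
    "type": init_linear(2 * FEAT_DIM, NUM_TYPES),
    "gate": init_linear(NUM_TYPES, NUM_ANSWERS),
}


def forward(params, image_feat, text_feat):
    """Return answer probabilities and answer-type probabilities."""
    x = np.concatenate([image_feat, text_feat], axis=-1)
    answer_logits = linear(params["answer"], x)
    type_logits = linear(params["type"], x)
    # Assumed gate: the predicted answer type scales each answer logit.
    gate = sigmoid(linear(params["gate"], type_logits))
    return softmax(answer_logits * gate), softmax(type_logits)


# Random placeholders for a batch of two precomputed image/question features.
image_feat = rng.normal(size=(2, FEAT_DIM))
text_feat = rng.normal(size=(2, FEAT_DIM))

answer_probs, type_probs = forward(params, image_feat, text_feat)
num_trainable = sum(p.size for layer in params.values() for p in layer.values())
print(answer_probs.shape, type_probs.shape, num_trainable)
```

In this toy configuration the head has only a few hundred trainable values, which illustrates why training is cheap when the feature extractor is left untouched.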

Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model
Fabian Deuser, Konrad Habel, Philipp J. Rösch, Norbert Oswald

[arXiv], [VizWiz]

Institute of Distributed Intelligent Systems