Paper presented at NAACL 2022 in Seattle

1 July 2022

In visio-linguistic research ("Vision and Language"), a large number of transformer-based models have been published in recent years. Architectures have been refined and models have been pre-trained on increasingly large datasets. However, there is little research into which concepts these models actually learn.

Philipp J. Rösch from VIS and Dr. Jindřich Libovický from Charles University in Prague investigated the influence of positional information about objects in such models and introduced new pre-training strategies. The models were evaluated, among others, on the GQA dataset, which requires correctly answering a textual question about an image. Their work was accepted at the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) in Seattle, USA, and will be published in Findings of NAACL 2022.
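
To illustrate what "positional information of objects" refers to here: region-based vision-language transformers (e.g. LXMERT or UNITER) typically fuse each detected object's visual features with an embedding of its bounding-box coordinates. Below is a minimal PyTorch sketch of this idea; the class name, dimensions, and fusion scheme are illustrative assumptions, not the implementation used in the paper.

```python
import torch
import torch.nn as nn

class ObjectEmbedding(nn.Module):
    """Fuses detector region features with bounding-box positions
    (hypothetical sketch of an LXMERT-style object embedding)."""

    def __init__(self, visual_dim=2048, pos_dim=4, hidden_dim=768):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        # Boxes given as normalized (x1, y1, x2, y2) coordinates.
        self.pos_proj = nn.Linear(pos_dim, hidden_dim)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, features, boxes):
        # features: (batch, num_objects, visual_dim) region features
        # boxes:    (batch, num_objects, 4) normalized box coordinates
        return self.norm(self.visual_proj(features) + self.pos_proj(boxes))

# Example: 36 detected regions per image, a common setup in V&L pre-training.
emb = ObjectEmbedding()
feats = torch.randn(2, 36, 2048)   # dummy visual features
boxes = torch.rand(2, 36, 4)       # dummy normalized boxes
out = emb(feats, boxes)            # (2, 36, 768) position-aware embeddings
```

Probing studies of this kind examine whether such position-aware object embeddings are actually exploited by the model, rather than ignored in favor of purely visual cues.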

Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch, Jindřich Libovický
[ACL], [PDF], [Code]