Paper presented at NAACL 2022 in Seattle

7 October 2022

In visiolinguistic research ("Vision and Language"), a large number of transformer-based models have been published in recent years. Architectures have been improved, and models have been pre-trained on increasingly large datasets. However, little research has examined which concepts these models actually learn.

Philipp J. Rösch from VIS and Dr. Jindřich Libovický from Charles University in Prague investigated the influence of positional information about objects and introduced new pre-training strategies. The models were evaluated on several datasets, including GQA, where a textual question about an image must be answered correctly. Their work was accepted at the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) in Seattle, USA, and will be published in Findings of NAACL 2022.

Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch, Jindřich Libovický
[ACL], [PDF], [Code]
