Paper presented at NATO OR&A 2022 in Copenhagen

10 October 2022

Today, data surrounds us at every moment and is essential for our decision-making. Information superiority in particular can save lives and support this decision-making. However, a targeted search of large amounts of data usually requires fine-tuning the models and thus additional effort. Therefore, we investigate modern multimodal models that search image and video data for soft concepts instead of hard labels, so that vehicles can be found independent of their type. The zero-shot models used in our approach do not require any further training, but they do require a new kind of searching, so-called prompt engineering, in which the model only needs to be given the right query.
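To make the idea of searching by prompt more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a multimodal model that maps text prompts and images into a shared embedding space, which the abstract does not specify. The three-dimensional vectors, the file names, the function names `cosine` and `search`, and the threshold value are all hypothetical stand-ins chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(prompt_embedding, image_embeddings, threshold=0.5):
    """Rank images by similarity to the prompt and keep those above a threshold.

    No training happens here: the only thing that changes between
    searches is the prompt (and therefore its embedding).
    """
    scored = [(name, cosine(prompt_embedding, emb))
              for name, emb in image_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, score) for name, score in scored if score >= threshold]

# Hypothetical toy embeddings; a real system would obtain these
# from the image encoder of a multimodal model.
image_embeddings = {
    "truck.jpg": [0.9, 0.1, 0.0],
    "tank.jpg": [0.8, 0.3, 0.1],
    "tree.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical stand-in for the text embedding of a soft-concept
# prompt such as "a photo of a vehicle".
prompt_embedding = [1.0, 0.2, 0.0]

hits = search(prompt_embedding, image_embeddings)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

In this toy setup both vehicle images score above the threshold regardless of their type, while the unrelated image is filtered out. Changing the wording of the prompt changes only `prompt_embedding`, which is why prompt choice matters in a zero-shot setting.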

For this work by Philipp J. Rösch, Fabian Deuser, Konrad Habel and Prof. Oswald from VIS, two datasets were created. The first is an image dataset with civilian and military vehicles, used to find suitable prompts. The resulting prompts were then tested on a second, in-the-wild video dataset under real-world conditions. The work was accepted at NATO Operations Research and Analysis (NATO OR&A 2022) and was presented in Copenhagen.