Paper accepted in AI & Society journal
14 April 2026
Excited to share our new paper: “Not a statistical accident: framing bias as a semiotic property of image datasets,” now published in AI & Society.
This work is led by PhD researcher Simone Fabrizi, in collaboration with Symeon Papadopoulos and Yiannis Kompatsiaris from CERTH, Greece.
In this paper, we argue that image datasets are not neutral samples of reality but semiotic systems that frame concepts through recurring visual associations, making framing bias an inherent property of datasets.
Key takeaways:
- Bias is not just statistical. It is often attributed to sampling errors, imbalance, or annotation issues. We argue that bias is structural and interpretive: datasets function as meaning-making systems that inevitably encode perspectives.
- Meaning emerges through co-occurrence. Concepts are framed through recurring visual associations (e.g., women + kitchens, Africa + poverty), which create dataset-level narratives that models can learn and reproduce.
- Bias also appears through omission. In our analysis of Visual Genome, leisure activities are heavily represented while everyday labour is largely absent. Despite aiming to represent the “visual world”, the dataset portrays only a narrow slice of reality.
- Framing works in both directions. Individual images contribute to dataset-level narratives, while the dataset context shapes how individual images are interpreted.
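To make the co-occurrence point concrete, here is a minimal sketch of how recurring associations could be surfaced from image-level labels. It is an illustration only and not the method used in the paper: the function name `cooccurrence_pmi`, the toy labels, and the choice of pointwise mutual information (PMI) as the association score are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_pmi(images):
    """Return the PMI (in bits) of every label pair seen together at least once.

    `images` is a list of label lists, one per image (hypothetical input format).
    """
    n = len(images)
    label_counts = Counter()
    pair_counts = Counter()
    for labels in images:
        unique = sorted(set(labels))  # count each label once per image
        label_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    pmi = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = label_counts[a] / n
        p_b = label_counts[b] / n
        # PMI > 0: the pair appears together more often than chance would predict
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


# Toy annotations, invented purely for illustration
toy_images = [
    ["woman", "kitchen"],
    ["woman", "kitchen"],
    ["woman", "office"],
    ["man", "office"],
    ["man", "office"],
    ["man", "kitchen"],
]

scores = cooccurrence_pmi(toy_images)
for pair, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(value, 3))
```

In this toy data every label appears equally often, so the dataset is balanced label by label, yet the pairings are still skewed. Pairs that never occur together do not appear in the result at all, so absences have to be checked separately against the list of concepts one expects to see.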
Implications: Beyond asking “Is my dataset balanced?”, we should also ask:
- What worldview does my dataset encode?
- What stories does my dataset tell?
- Which contexts dominate, and which ones are missing?
No dataset can be completely unbiased. What we need instead is a better understanding of bias, thorough dataset documentation, and context-aware deployment.