Università per Stranieri di Siena. Ricerche del CESIM e dell’Osservatorio dell’italiano diffuso fra stranieri e delle lingue immigrate in Italia

Vol. LIV, 3.2025

The IncluInstIt corpus: initial considerations for tagging gender-inclusive language in Italian

Keywords: gender-inclusive language, neo-morphemes, corpus linguistics, computer-mediated communication
Publication date: 09-03-2026

Abstract

In this paper we introduce IncluInstIT, a novel corpus of Italian gender-inclusive language curated from Instagram, alongside an innovative tagging system tailored to identify inclusive morphological strategies. Unlike existing corpora, IncluInstIT captures emergent gender-inclusive forms (such as universal feminines, ə, u, x, and split forms) used in informal digital communication. Comprising over 4,800 pre-processed posts, the corpus reflects a dynamic spectrum of inclusive expressions across hashtags, offering a diachronic view of evolving gender representations. Here we present an initial annotation scheme, enriched with newly defined gender tags, with the goal of discussing ways in which NLP tools can investigate fairness and inclusivity.

Authors

Irene Caiazzo - Università per Stranieri di Siena

Federica Formato - University of Brighton

Giovanna Maria Dimitri - Università degli Studi di Milano

Liana Tronci - Università per Stranieri di Siena
