Maarten Sap

Risk of Racial Bias in Hate Speech Detection

There are three files:

Founta dataset with dialect extracted


Davidson dataset with dialect extracted


MTurk re-annotations with no priming, dialect priming, or race priming:

Collected with this MTurk template, sap2019risk_mTurkExperiment.csv contains the annotations from our pilot study, with the following columns:


Annotators with Attitudes

Download annotated data: annWithAttitudes.tgz

Qualification (qual) file: annWithAttitudes-Qual.html

Large-scale question: annWithAttitudes-LargeScale.html
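
Since the annotated data ships as a gzip-compressed tarball, it can help to look at what the archive contains before unpacking it. The following is a minimal Python sketch, assuming annWithAttitudes.tgz has already been downloaded to the working directory. The helper name list_archive_members is hypothetical, and nothing here assumes what the archive's members are called.

```python
import tarfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the names of all members in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    # Assumption: the archive was saved next to this script under its original name.
    archive_file = Path("annWithAttitudes.tgz")
    if archive_file.exists():
        for name in list_archive_members(archive_file):
            print(name)
    else:
        print(f"{archive_file} not found; download it first.")
```

Listing the members first shows which files the archive holds without writing anything to disk.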


Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi & Noah A. Smith (2022) Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection. NAACL.