Controllable Debiasing

What is controllable debiasing?

Controllable debiasing is a new formulation of stylistic rewriting that aims to rewrite a given text and correct the implicit and potentially undesirable biases in character portrayals.

What types of portrayal biases?

In our work, we analyze gender bias in portrayal through the lens of connotation frames of power and agency, which capture the power dynamics implied by verbs. In our previous work, we showed that authors attribute significantly less power and less agency to female characters than to male characters. We therefore created PowerTransformer, a model that rewrites text to give a character a different level of agency.
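To make the idea of a verb-level agency lexicon concrete, here is a minimal sketch. The lexicon is a toy stand-in: its four entries are taken from the examples on this page, and the label names (`positive`, `negative`), the `TOY_AGENCY` dictionary, and the `agency_labels` helper are assumptions made for illustration, not the published connotation frames or their format.

```python
# Toy agency lexicon. The entries and label names are illustrative
# assumptions, not the published connotation frame lexicon.
TOY_AGENCY = {
    "pursued": "positive",
    "strutted": "positive",
    "daydreamed": "negative",
    "wandered": "negative",
}


def agency_labels(sentence, lexicon=TOY_AGENCY):
    """Return (word, agency) pairs for every word found in the lexicon."""
    # Naive whitespace tokenization with punctuation stripped; a real
    # pipeline would lemmatize verbs before looking them up.
    words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
    return [(w, lexicon[w]) for w in words if w in lexicon]


if __name__ == "__main__":
    print(agency_labels("Mey daydreamed about being a doctor."))
    print(agency_labels("Mey pursued her dream to be a doctor."))
```

A lookup like this only labels the verbs that are already in a sentence; choosing a replacement verb and keeping the sentence fluent is the rewriting problem that the model addresses.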

How does PowerTransformer work?

Our model is an encoder-decoder based on a pretrained language model. We train it to reconstruct story sentences from which we've removed agency markers (i.e., verbs). We also jointly train it on an out-of-domain paraphrasing task, which teaches the model to rewrite more than just one word. Then, at test time, we incorporate agency information by boosting the probability of words with the desired level of agency. In our paper, we explore how important the different components of this model are, and show that both the joint reconstruction-paraphrasing training and the vocabulary boosting yield significant gains in performance.
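The two agency-specific steps described above, masking agency verbs to build reconstruction inputs and boosting desired-agency words at decoding time, can be sketched as follows. This is a simplified illustration under stated assumptions, not the PowerTransformer implementation: the `<VERB>` mask token, the toy lexicon, the additive bonus `beta`, and all function names are hypothetical, and the paper's exact boosting formula may differ.

```python
import math

# Toy agency lexicon (illustrative assumption, not the published lexicon).
TOY_AGENCY = {
    "pursued": "positive",
    "strutted": "positive",
    "daydreamed": "negative",
    "wandered": "negative",
}

MASK = "<VERB>"  # hypothetical mask token


def mask_agency_verbs(tokens, lexicon):
    """Replace every token found in the agency lexicon with the mask token."""
    return [MASK if t.lower() in lexicon else t for t in tokens]


def softmax(logits):
    """Numerically stable softmax over a {word: logit} dictionary."""
    top = max(logits.values())
    exps = {w: math.exp(v - top) for w, v in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def boost_logits(logits, lexicon, target, beta=2.0):
    """Add `beta` to the logit of each word whose agency label is `target`.

    An additive bonus is an assumption made for this sketch.
    """
    return {
        w: v + (beta if lexicon.get(w) == target else 0.0)
        for w, v in logits.items()
    }


if __name__ == "__main__":
    tokens = "Mey daydreamed about being a doctor".split()
    print(mask_agency_verbs(tokens, TOY_AGENCY))

    # Pretend next-word logits from a decoder over a three-word vocabulary.
    logits = {"pursued": 0.0, "daydreamed": 0.0, "imagined": 0.0}
    print(softmax(logits))
    print(softmax(boost_logits(logits, TOY_AGENCY, "positive")))
```

Because the bonus is applied to the logits before normalization, the boosted distribution still sums to one, and words outside the lexicon remain available to the decoder, only with lower relative probability.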

Does PowerTransformer actually mitigate gender biases?

As a case study for our model, we revisit the movie scripts from our original analyses and rewrite the sentences that describe female characters to give them higher agency. We show that, with PowerTransformer, we can reverse the gender bias found in the original movie scripts and give female characters substantially more agency than male characters. Of course, this is only a pilot study, and automatically rewriting an entire movie should not be done without human supervision.

What are the implications of Controllable Debiasing?

We believe that this task has the potential to help authors when writing stories or movies, by providing alternative portrayals of characters with different connotations or framings. Specifically, a machine-in-the-loop writing system could help authors measure and address biases in their writing using PowerTransformer. This, in turn, could mitigate the negative effects of stereotypical portrayals and could help debunk gender roles.

MTurk templates: [Agency qualification task] [Head-to-head evaluation]

Examples of using connotation frames (Sap et al., 2017) for controllable revisions to portray characters with more agency and power. In the first example, automatically rewriting "Mey daydreamed about being a doctor" as "Mey pursued her dream to be a doctor" portrays Mey with more authority and decisiveness. In the second example, "Ana strutted" implies that she is more active and decisive, compared to "Ana wandered" which portrays her as aimless and passive.

PowerTransformer

Unsupervised Controllable Revision for Biased Language Correction
