Moral Mimicry

Large Language Models Produce Moral Rationalizations Tailored to Political Groups

Gabriel Simmons

University of California, Davis

Background

  • Large language models (LLMs) can generate fluent text
  • LLMs tend to reproduce social biases
  • Moral discourse is important to the social environment
  • Moral intuitions/preferences vary by social group

Do LLMs reproduce the moral biases associated with social groups?

Implications

Risks

  • More effective disinformation
  • Entrenchment of biases
  • Polarization

Potential Benefits

  • Moral reframing, inter-group dialogue and understanding

To avoid the risks and steer towards the benefits, we should understand whether LLMs reproduce moral biases.

An approach to studying morality

Criteria

  • Assume moral phenomena vary across social groups
  • Quantify moral content in text

Solution

Moral Foundations Theory (MFT)

Moral Foundations Theory

Five Foundations (plus a proposed sixth, Liberty/Oppression), grouped into two clusters:

Individualizing

  • Care/Harm
  • Liberty/Oppression
  • Fairness/Cheating

Binding

  • Loyalty/Betrayal
  • Authority/Subversion
  • Sanctity/Degradation

Relative importance varies between political groups

Moral Foundations differ between US Liberals and Conservatives

  • Liberals emphasize the Fairness/Cheating and Care/Harm foundations
  • Conservatives use all five foundations more evenly [2], [3]

Moral Mimicry

General Idea: Large Language Models produce moral content that reflects the moral biases of social groups when conditioned on a social group identity

This Research: LLMs produce moral content that reflects the moral biases of US liberals and conservatives when conditioned on a liberal or conservative political identity

Research Questions

RQ1: Can LLMs produce high-quality moral rationalizations?

RQ3: Do LLMs use moral foundations differently when prompted with different political identities?

RQ6: How does moral mimicry relate to model size?

Methods

Generating text from LLMs via prompting

  • GPT-style models produce text via autoregressive decoding
    • “Based on the text I’ve already seen, what are the most likely words that follow?”
  • Control the model by prompting
    • The prompt provides some words for the model to start with (see the sketch below)
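
A minimal sketch of this prompting workflow, using a small publicly available OPT checkpoint via the Hugging Face transformers library. The checkpoint and decoding settings here are illustrative, not the ones used in the study.

```python
# Minimal sketch: prompted autoregressive generation with a small OPT checkpoint.
# Checkpoint and decoding settings are illustrative, not those used in the study.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # smallest OPT size; larger sizes use the same API
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "As a conservative, here are the moral arguments for why stealing from the store is immoral:"
inputs = tokenizer(prompt, return_tensors="pt")

# The model repeatedly predicts a likely next token given everything seen so far.
output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```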

Prompting to produce moral rationalizations

Goal: Elicit moral “rationalization-like artifacts” from the model.

The prompt consists of three parts embedded in a template:

  • Scenario
  • Identity
  • Stance

Prompting to produce moral rationalizations

Template:

As a {identity}, here are the moral arguments for why {scenario} is {stance}:
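
For concreteness, a sketch of filling this template in Python. The helper name is hypothetical; only the template itself comes from the study.

```python
# Hypothetical helper: fill the three-part prompt template.
def build_prompt(identity: str, scenario: str, stance: str) -> str:
    return f"As a {identity}, here are the moral arguments for why {scenario} is {stance}:"

print(build_prompt("conservative", "stealing from the store", "immoral"))
# -> As a conservative, here are the moral arguments for why stealing from the store is immoral:
```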

Prompting to produce moral rationalizations

Prompt:

As a conservative, here are the moral arguments for why stealing from the store is immoral:

Completion:

  1. Stealing is a violation of the property rights of others.
  2. Stealing is a form of theft, which is a moral wrong.
  3. Stealing deprives others of their rightful possessions.
  4. Stealing is an act of aggression and violence.

Datasets

Moral Stories [4]

Ethics [5]

Social Chemistry 101 [6]

Models

GPT-3

Commercial model family from OpenAI

OPT [7]

Publicly-available model family from Meta AI

  • Transformer-based GPT-style pretrained LLMs
  • Several sizes (# of parameters) for each model
  • Larger model \(\rightarrow\) more capability (usually)
  • Pretrained on large corpora of web text

Measuring Moral Foundational Content in Text

Moral Foundations Dictionaries

Moral Foundations Dictionary v1 [2]

Moral Foundations Dictionary v2 [8]

Extended Moral Foundations Dictionary [9]
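
Dictionary-based scoring is essentially word counting against per-foundation word lists. A minimal sketch of the idea follows; the word lists below are tiny illustrative samples, not the actual dictionary contents (the real MFDs use larger validated lists with stem/wildcard entries, and the eMFD assigns probabilistic word weights).

```python
# Sketch of dictionary-based scoring: count matches of foundation-related words.
# The word lists are illustrative samples only, not the actual MFD contents.
import re
from collections import Counter

MFD_SAMPLE = {
    "care_harm": {"harm", "hurt", "suffer", "protect", "care"},
    "fairness_cheating": {"fair", "unfair", "cheat", "justice", "rights"},
    "loyalty_betrayal": {"loyal", "betray", "traitor", "solidarity"},
    "authority_subversion": {"authority", "obey", "tradition", "law"},
    "sanctity_degradation": {"pure", "sacred", "disgust", "degrade"},
}

def foundation_scores(text: str) -> dict:
    """Proportion of tokens in `text` matching each foundation's word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return {f: sum(counts[w] for w in words) / total for f, words in MFD_SAMPLE.items()}

print(foundation_scores("Stealing is a violation of the property rights of others."))
```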

Recap of the Methods

  1. Prompt the LM with scenario + identity + stance
  2. Measure foundation content in the output using Moral Foundations Dictionaries
  3. Calculate the difference in foundation use between liberal and conservative identities (see the sketch below)
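
A sketch of step 3, assuming a scoring function like the one above. The helper names are hypothetical, and the paper's analysis may summarize differences with other statistics (e.g., effect sizes) rather than raw mean differences.

```python
# Sketch: compare mean foundation scores across the two prompted identities.
from statistics import mean
from typing import Callable, Dict, List

def mean_foundation_use(texts: List[str],
                        score_fn: Callable[[str], Dict[str, float]]) -> Dict[str, float]:
    """Average per-foundation score over a set of completions."""
    scored = [score_fn(t) for t in texts]
    return {f: mean(s[f] for s in scored) for f in scored[0]}

def identity_difference(liberal_texts: List[str],
                        conservative_texts: List[str],
                        score_fn: Callable[[str], Dict[str, float]]) -> Dict[str, float]:
    """Positive values: foundation scored higher for liberal-identity prompts."""
    lib = mean_foundation_use(liberal_texts, score_fn)
    con = mean_foundation_use(conservative_texts, score_fn)
    return {f: lib[f] - con[f] for f in lib}
```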

Research Questions

RQ1: Can LLMs produce rationalization-like artifacts of sufficient quality for us to care?

RQ3: Do LLMs use moral foundations differently when prompted with different political identities? (moral mimicry)

RQ6: How is the moral mimicry capability affected by model size?

Results

RQ1: Can LLMs produce rationalization-like artifacts of sufficient quality for us to care?

Methods

Hand-label 100 examples of completions from GPT-3-175B for three text qualities:

  • Relevance
  • Coherence
  • Stance Agreement

Methods

Separate results by normative and non-normative stance

  • Normative stance: prompt stance agrees with commonsense morality
  • Non-normative stance: prompt stance disagrees with commonsense morality

Results

  • Most completions are relevant, coherent, and agree with the prompted stance
  • Quality is slightly better when GPT-3 is prompted with a normative stance

Relevance, coherence, and stance agreement, split by agreement with commonsense morality

RQ3: Are LLMs moral mimics?

Results

  • Care/Harm increased for liberal prompt
  • Authority/Subversion increased for conservative prompt
  • Modest effect sizes

RQ6: Moral Mimicry increases with model scale

OPT

  • Sanctity/Degradation \(\leftrightarrow\) conservative identity
  • Care/Harm, Fairness/Cheating \(\leftrightarrow\) liberal identity

RQ6: Moral Mimicry increases with model scale

GPT-3

  • Authority/Subversion \(\leftrightarrow\) conservative identity
  • Care/Harm, Fairness/Cheating \(\leftrightarrow\) liberal identity

Thank you

Questions please

References

[1]
J. Haidt, The Righteous Mind: Why Good People are Divided by Politics and Religion. Vintage Books, 2013.
[2]
J. Graham, J. Haidt, and B. A. Nosek, “Liberals and conservatives rely on different sets of moral foundations,” Journal of Personality and Social Psychology, vol. 96, no. 5, pp. 1029–1046, May 2009, doi: 10.1037/a0015141.
[3]
J. A. Frimer, “Do liberals and conservatives use different moral languages? Two replications and six extensions of Graham, Haidt, and Nosek’s (2009) moral text analysis,” Journal of Research in Personality, vol. 84, p. 103906, Feb. 2020, doi: 10.1016/j.jrp.2019.103906.
[4]
D. Emelin, R. Le Bras, J. D. Hwang, M. Forbes, and Y. Choi, “Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov. 2021, pp. 698–718. doi: 10.18653/v1/2021.emnlp-main.54.
[5]
D. Hendrycks et al., “Aligning AI With Shared Human Values.” arXiv, Jul. 2021. doi: 10.48550/arXiv.2008.02275.
[6]
M. Forbes, J. D. Hwang, V. Shwartz, M. Sap, and Y. Choi, “Social Chemistry 101: Learning to Reason about Social and Moral Norms.” arXiv, Aug. 2021. doi: 10.48550/arXiv.2011.00620.
[7]
S. Zhang et al., “OPT: Open Pre-trained Transformer Language Models.” arXiv, Jun. 2022. doi: 10.48550/arXiv.2205.01068.
[8]
J. Frimer, “Moral Foundations Dictionary 2.0,” Apr. 2019, doi: 10.17605/OSF.IO/EZN37.
[9]
F. R. Hopp, J. T. Fisher, D. Cornell, R. Huskey, and R. Weber, “The extended Moral Foundations Dictionary (eMFD): Development and applications of a crowd-sourced approach to extracting moral intuitions from text,” Behavior Research Methods, vol. 53, no. 1, pp. 232–246, Feb. 2021, doi: 10.3758/s13428-020-01433-0.