Cite

Tigges, Curt, et al. Linear Representations of Sentiment in Large Language Models. arXiv:2310.15154, arXiv, 23 Oct. 2023. arXiv.org, http://arxiv.org/abs/2310.15154.

Metadata

Title: Linear Representations of Sentiment in Large Language Models
Authors: Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda
Cite key: tigges2023

Links

Abstract

Sentiment is a pervasive feature in natural language text, yet it is an open question how sentiment is represented within Large Language Models (LLMs). In this study, we reveal that across a range of models, sentiment is represented linearly: a single direction in activation space mostly captures the feature across a range of tasks with one extreme for positive and the other for negative. Through causal interventions, we isolate this direction and show it is causally relevant in both toy tasks and real world datasets such as Stanford Sentiment Treebank. Through this case study we model a thorough investigation of what a single direction means on a broad data distribution. We further uncover the mechanisms that involve this direction, highlighting the roles of a small subset of attention heads and neurons. Finally, we discover a phenomenon which we term the summarization motif: sentiment is not solely represented on emotionally charged words, but is additionally summarized at intermediate positions without inherent sentiment, such as punctuation and names. We show that in Stanford Sentiment Treebank zero-shot classification, 76% of above-chance classification accuracy is lost when ablating the sentiment direction, nearly half of which (36%) is due to ablating the summarized sentiment direction exclusively at comma positions.
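To make the abstract's two central ideas concrete (a single direction in activation space, and ablating that direction), here is a minimal sketch on synthetic data. It is an illustration only, not the paper's code: the activations are random vectors rather than real model activations, the direction is estimated by a simple difference of class means, and ablation is done by projecting the direction out, which may differ from the paper's exact procedure. The function names `mean_difference_direction` and `ablate_direction` are hypothetical.

```python
import numpy as np


def mean_difference_direction(pos_acts, neg_acts):
    """Unit vector from the negative-class mean to the positive-class mean."""
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def ablate_direction(acts, direction):
    """Remove each activation's component along a unit-norm direction."""
    coeffs = acts @ direction  # scalar projection per example
    return acts - np.outer(coeffs, direction)


# Synthetic stand-in for activations (assumption: not real model data).
rng = np.random.default_rng(0)
d_model = 16
true_dir = np.zeros(d_model)
true_dir[0] = 1.0  # the planted "sentiment" axis

pos_acts = rng.normal(size=(200, d_model)) + 3.0 * true_dir
neg_acts = rng.normal(size=(200, d_model)) - 3.0 * true_dir

direction = mean_difference_direction(pos_acts, neg_acts)
ablated_pos = ablate_direction(pos_acts, direction)
ablated_neg = ablate_direction(neg_acts, direction)

# Before ablation the two classes separate along the direction;
# afterwards every projection onto it is numerically zero.
gap_before = (pos_acts @ direction).mean() - (neg_acts @ direction).mean()
gap_after = (ablated_pos @ direction).mean() - (ablated_neg @ direction).mean()
```

In this toy setting the recovered direction lines up closely with the planted axis, and projecting it out removes the class separation along that direction while leaving the orthogonal components untouched.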

Notes

From Obsidian

(As notes and annotations from Zotero are one-way synced, this section includes a link to another note within Obsidian that hosts further notes.)

Linear-Representations-of-Sentiment-in-Large-Language-Models

From Zotero

(one-way sync from Zotero)

Annotations

Highlighting colour codes

  • Note: highlights for quicker reading, or comments prompted by reading the paper that may not be closely related to it
  • External Insight: insights from other works that are mentioned in the paper
  • Question/Critic: questions or comments on the content of the paper
  • Claim: what the paper claims to have found/achieved
  • Finding: new knowledge presented by the paper
  • Important: anything interesting enough (findings, insights, ideas, etc.) that’s worth remembering
