Cite

Venhoff, Constantin, et al. Understanding Reasoning in Thinking Language Models via Steering Vectors. arXiv:2506.18167, arXiv, 24 June 2025. arXiv.org, https://doi.org/10.48550/arXiv.2506.18167.

Metadata

Title: Understanding Reasoning in Thinking Language Models via Steering Vectors

Authors: Constantin Venhoff, Iván Arcuschin, Philip Torr, Arthur Conmy, Neel Nanda

Cite key: venhoff2025Understanding

Links

Online Link

Zotero PDF Link

Abstract

Recent advances in large language models (LLMs) have led to the development of thinking language models that generate extensive internal reasoning chains before producing responses. While these models achieve improved performance, controlling their reasoning processes remains challenging. This work presents a steering approach for thinking LLMs by analyzing and manipulating specific reasoning behaviors in DeepSeek-R1-Distill models. Through a systematic experiment on 500 tasks across 10 diverse categories, we identify several reasoning behaviors exhibited by thinking models, including expressing uncertainty, generating examples for hypothesis validation, and backtracking in reasoning chains. We demonstrate that these behaviors are mediated by linear directions in the model’s activation space and can be controlled using steering vectors. By extracting and applying these vectors, we provide a method to modulate specific aspects of the model’s reasoning process, such as its tendency to backtrack or express uncertainty. Our approach offers practical tools for steering reasoning processes in thinking models in a controlled and interpretable manner. We validate our steering method using three DeepSeek-R1-Distill models, demonstrating consistent control across different model architectures.
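As a reading aid, here is a minimal sketch of the idea that a behavior is "mediated by a linear direction" and can be turned up or down with a steering vector. It is not the authors' code. The difference-of-means extraction, the toy random activations, and the names `difference_of_means`, `steer`, and `alpha` are all assumptions made for illustration; the abstract does not specify how the vectors are extracted or applied.

```python
import numpy as np


def difference_of_means(pos_acts, neg_acts):
    """Hypothetical helper: candidate steering direction.

    pos_acts: activations collected where the behavior occurs, shape (n, d_model)
    neg_acts: activations collected where it does not, shape (m, d_model)
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def steer(hidden, vector, alpha):
    """Shift every token's hidden state along the direction.

    alpha > 0 is meant to promote the behavior, alpha < 0 to suppress it.
    """
    return hidden + alpha * vector


# Toy stand-in for real model activations (assumption: no model is loaded here).
rng = np.random.default_rng(0)
d_model = 8
true_direction = np.zeros(d_model)
true_direction[0] = 1.0

neg = rng.normal(size=(200, d_model))
pos = rng.normal(size=(200, d_model)) + 2.0 * true_direction

v = difference_of_means(pos, neg)

hidden = rng.normal(size=(5, d_model))  # 5 token positions
steered_up = steer(hidden, v, alpha=1.0)
steered_down = steer(hidden, v, alpha=-1.0)
```

In a real setup the activations would come from a chosen layer of the model and the shift would be applied during generation, for example through a forward hook; the layer choice and the scale `alpha` would need tuning.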

Notes

From Obsidian

(As notes and annotations from Zotero are one-way synced, this section includes a link to another note within Obsidian that hosts further notes)

Understanding-Reasoning-in-Thinking-Language-Models-via-Steering-Vectors

From Zotero

(one-way sync from Zotero)

Annotations

Highlighting colour codes

Note: highlights for quicker reading, or comments that arose while reading the paper but may not be closely related to it

External Insight: insights from other works that are mentioned in the paper

Question/Critic: questions or comments on the content of the paper

Claim: what the paper claims to have found/achieved

Finding: new knowledge presented by the paper

Important: anything interesting enough (findings, insights, ideas, etc.) that’s worth remembering
