research

Inspired by

So if we can find the directions for multiple problem-solving strategies, and we know that a problem can be attacked using certain approaches, then we can preload those “problem-solving strategy” directions into the residual stream.

Take competitive programming as an example: if we have the feature directions for dfs, bfs, binary search, dynamic programming, etc., we can add some of them to the residual stream to make the model explore those strategies explicitly.
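As a rough illustration of what "preloading" could look like, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the function name `preload_strategies`, the `scale` parameter, the 4-dimensional toy directions, and the choice to add the same steering vector at every token position. In a real model the direction vectors would have to be found first (the note above takes that as given), and the addition would happen inside the forward pass at some chosen layer rather than on a list of lists.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def preload_strategies(residual, directions, chosen, scale=1.0):
    """Add the chosen strategy directions to every position of the residual stream.

    residual:   list of per-token vectors, shape (seq_len, d_model)
    directions: dict mapping a strategy name to a d_model-sized vector (hypothetical)
    chosen:     names of the strategies we believe apply to the problem
    scale:      steering strength (hypothetical knob, would need tuning)
    """
    steer = [0.0] * len(residual[0])
    for name in chosen:
        unit = normalize(directions[name])
        steer = [s + scale * u for s, u in zip(steer, unit)]
    # Build a new stream instead of mutating the input.
    return [[h + s for h, s in zip(token, steer)] for token in residual]


# Toy directions; real ones would come from the model, not be hand-written.
directions = {
    "dfs": [1.0, 0.0, 0.0, 0.0],
    "bfs": [0.0, 1.0, 0.0, 0.0],
    "binary_search": [0.0, 0.0, 2.0, 0.0],
    "dynamic_programming": [0.0, 0.0, 0.0, 1.0],
}

residual = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
steered = preload_strategies(
    residual, directions, ["binary_search", "dynamic_programming"], scale=4.0
)
```

Normalizing each direction before scaling keeps one strategy from dominating just because its raw vector happens to be longer. Whether that is the right choice, and which layer and positions to steer, are open questions this sketch does not answer.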