- perform self-attention within a sliding window over the sequence (windows may overlap)
- compute a representative K, V pair for each window (e.g., by pooling the window's keys and values)
- a new token attends to its own window directly and to past windows only through their representative K, V
- when the context is even longer, extend this hierarchically: run a sliding window over the window representatives themselves (a window over windows)
- then perform attention from the top-level windows down to the bottom level; a sketch of this scheme follows the list
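
Below is a minimal sketch of one way to read these notes: a single new query token attends to its local window of recent keys/values plus pooled representative K, V of earlier windows, with one extra pooling level when the context is very long. The window size, stride, mean pooling, and the names `window_representatives` / `hierarchical_window_attention` are illustrative assumptions, not a fixed algorithm from these notes.

```python
# Sketch only: single head, single query token, no learned projections.
# Mean pooling as the "representative" and the window/stride values are
# assumptions for illustration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_representatives(K, V, window, stride):
    """Pool each (possibly overlapping) window of keys/values into one
    representative K, V pair. Mean pooling is an arbitrary choice here."""
    reps_k, reps_v = [], []
    for start in range(0, max(len(K) - window + 1, 1), stride):
        sl = slice(start, start + window)
        reps_k.append(K[sl].mean(axis=0))
        reps_v.append(V[sl].mean(axis=0))
    return np.stack(reps_k), np.stack(reps_v)

def hierarchical_window_attention(q, K, V, window=64, stride=32):
    """Attention for one new query token q over past keys K and values V:
    - attends directly to the most recent `window` tokens (its local window),
    - earlier tokens are summarized into per-window representative K, V,
    - if there are still many representatives, a second sliding window pools
      them again ("window over windows") and the token attends to that top
      level instead of every bottom-level representative."""
    d = q.shape[-1]
    local_K, local_V = K[-window:], V[-window:]
    past_K, past_V = K[:-window], V[:-window]

    keys, values = [local_K], [local_V]
    if len(past_K) > 0:
        rep_K, rep_V = window_representatives(past_K, past_V, window, stride)
        if len(rep_K) > window:  # context still long: add one more level
            rep_K, rep_V = window_representatives(rep_K, rep_V, window, stride)
        keys.append(rep_K)
        values.append(rep_V)

    K_all = np.concatenate(keys, axis=0)
    V_all = np.concatenate(values, axis=0)
    scores = (K_all @ q) / np.sqrt(d)  # scaled dot-product attention
    return softmax(scores) @ V_all

# Usage with random data: 5000 past tokens, 16-dim head
rng = np.random.default_rng(0)
d = 16
K = rng.normal(size=(5000, d))
V = rng.normal(size=(5000, d))
q = rng.normal(size=(d,))
out = hierarchical_window_attention(q, K, V)  # shape (d,)
```

In this sketch the per-token cost scales with the window size plus the number of window representatives rather than with the full context length, which is the point of summarizing distant windows.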