Lone's notes



The ability to make analogies indicates intelligence. If it helps you understand things, don't be afraid of making too many analogies.

Mar 02, 2025 · 1 min read

    • thought



