Lone's notes

Recently Updated

Diffusion language modeling with maximum semantic likelihood
Jul 09, 2025
distill from AR LM to diffusion LM
Jul 09, 2025
LLM generation is path finding in activation space, each decoder block's processing is taking a step in said space
Jul 09, 2025
AI resources
Jul 09, 2025
Controlling reasoning duration with activation steering
Jul 09, 2025

why

For 2 n-dimensional unit vectors $x = [x_{1}, x_{2}, \dots, x_{n}]$ and $y = [y_{1}, y_{2}, \dots, y_{n}]$

The $cos in e$ of the angle between them is $\frac{x \cdot y}{∥ x ∥ ∥ y ∥} = \sum_{i = 1}^{n} x_{i} y_{i}$
By the Central Limit Theorem, $\sum_{i = 1}^{n} x_{i} y_{i}$ approximates a normal distribution with mean zero as $n \to \infty$
So as $n \to \infty$ , the $cos in e$ is more likely to be around the mean which is 0, which means the 2 vectors are more likely to be orthogonal

notes

The vector must be sample from normal distribution, not uniform distribution. So one should use np.rand.normal and torch.normal instead of np.rand.random and torch.random
generate random unit vectors
geometry statistics

How
- Draw from a standard distribution for each dimension.
- Do not draw from a uniform distribution.
  - Or will need to apply Box-Muller transform
Implementation
```
import numpy as np
import torch
 
# numpy
v = np.random.normal(0, 1, size)
 
# torch
v = torch.normal(torch.tensor([0.0] * size), torch.tensor([1.0] * size))
```
Why that works
- Considering a 2D space, the probability of a point being at $(x, y)$ is $P (x) P (y)$
- The PDF for the standard distribution is $P (x) = \frac{1}{2 π} e^{- \frac{1}{2} x^{2}}$ , which is roughly in the form of $e^{x^{2}}$
- Thus $P (x) P (y)$ is in the form of $e^{x^{2} + y^{2}}$ , which is a function of the distance to the origin
- So the resulting distribution is radially symmetric around the origin
An intuitive explanation of why uniform sampling don’t work
- Considering a 2D space, uniformly sampling from both dimension results in a square population
- Whereas a random distribution in 2D space should be a circle
- This is called the corner effect
Link to original