PCA, eigenvectors and eigenvalues
From reading the excellent answer at pca - Making sense of principal component analysis, eigenvectors & eigenvalues - Cross Validated
- Principal components are the projections of the data onto the principal axes.
- Principal axes are a series of orthogonal unit vectors that
  - best fit the data (minimize the average squared distance from the data points to the axis)
  - and maximize the variance of the data projected onto that vector
  - these 2 criteria are actually equivalent (by the Pythagorean theorem: for each point, squared distance to the axis + squared projection = squared distance to the centre, which is fixed)
- The eigenvectors of the covariance matrix are the principal axes (and the projections onto them are the principal components). Why?
  - The covariance matrix is symmetric, so it can be diagonalized by choosing a new orthonormal basis formed by its eigenvectors 1
  - In this new basis all the covariances (the off-diagonal entries) are 0, and the eigenvalues (the diagonal entries) are the variances along each basis vector
  - For any unit vector, the variance of the data projected onto it is a weighted average of these eigenvalues (non-negative weights summing to 1)
  - Hence the highest achievable variance is the largest eigenvalue, attained by projecting onto the corresponding eigenvector: that eigenvector is the first principal axis, and the remaining eigenvectors, in order of decreasing eigenvalue, give the other principal axes (see the sketch below)
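A minimal NumPy sketch of this argument (the toy data, seed, and variable names are illustrative assumptions, not from the answer): it checks that the variance of any projection is a weighted average of the eigenvalues, that the Pythagorean identity makes "minimize squared distance" and "maximize projected variance" the same criterion, and that the top eigenvector attains the maximum.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))  # toy 3-D data, 500 points
X = X - X.mean(axis=0)                                    # center the data
n = X.shape[0]

C = X.T @ X / (n - 1)                    # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # symmetric => orthonormal eigenvectors

w = rng.normal(size=3)
w /= np.linalg.norm(w)                   # an arbitrary unit vector

# Variance of the projection onto w ...
proj_var = np.var(X @ w, ddof=1)
# ... is a weighted average of the eigenvalues, with weights (w . v_i)^2 summing to 1
weights = (eigvecs.T @ w) ** 2
assert np.isclose(weights.sum(), 1.0)
assert np.isclose(proj_var, weights @ eigvals)

# Pythagorean theorem per point: projected variance + mean squared distance to the
# axis = total variance, so maximizing one is the same as minimizing the other
resid = X - np.outer(X @ w, w)
assert np.isclose(proj_var + (resid ** 2).sum() / (n - 1), np.trace(C))

# The largest achievable projected variance is the top eigenvalue, attained by
# projecting onto the corresponding eigenvector (the first principal axis)
top_axis = eigvecs[:, -1]                # eigh sorts eigenvalues in ascending order
assert np.isclose(np.var(X @ top_axis, ddof=1), eigvals[-1])
```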
SVD and PCA
From reading another excellent answer from the same author at linear algebra - What is the intuitive relationship between SVD and PCA? - Mathematics Stack Exchange
With $X$ the (centered) data matrix with $n$ rows of observations, the covariance matrix is $C = \frac{X^\top X}{n-1}$
In PCA, the covariance matrix is diagonalized using eigenvectors and eigenvalues: $C = V L V^\top$
where the columns of $V$ are the eigenvectors and the diagonal entries of $L$ are the eigenvalues
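A short sketch of this eigendecomposition route (the toy data and variable names are assumptions): form $C$, diagonalize it as $C = V L V^\top$, and take the projections $XV$ as the principal components.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # toy data: 200 observations, 4 variables
X = X - X.mean(axis=0)                   # PCA assumes centered data
n = X.shape[0]

C = X.T @ X / (n - 1)                    # covariance matrix

eigvals, V = np.linalg.eigh(C)           # C = V L V^T, columns of V are eigenvectors
order = np.argsort(eigvals)[::-1]        # reorder by decreasing eigenvalue (variance)
eigvals, V = eigvals[order], V[:, order]

components = X @ V                       # principal components = projections onto the axes
print(eigvals)                           # variance explained by each principal axis
```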
In SVD, the data matrix is decomposed into $X = U S V^\top$ where $U$ and $V$ are orthogonal matrices ($U^\top U = I$, $V^\top V = I$) and $S$ is a diagonal matrix of singular values.
Then the covariance matrix is $C = \frac{X^\top X}{n-1} = \frac{V S U^\top U S V^\top}{n-1} = V \frac{S^2}{n-1} V^\top$
Thus the eigenvalues of $C$ and the singular values of $X$ are related by $\lambda_i = s_i^2 / (n-1)$, i.e. the singular values of $X$ are the square roots of the (scaled) eigenvalues of $C$
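A quick numerical check of this correspondence (same kind of assumed toy data): the eigenvalues of $C$ equal $s_i^2/(n-1)$ from the SVD of the centered $X$, and $XV = US$ reproduces the principal components.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X = X - X.mean(axis=0)                         # centered data matrix
n = X.shape[0]

# Eigenvalues of the covariance matrix, sorted in decreasing order
C = X.T @ X / (n - 1)
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]

# SVD of the data matrix: X = U S V^T (numpy returns the singular values as a 1-D array s)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# lambda_i = s_i^2 / (n - 1)
assert np.allclose(eigvals, s**2 / (n - 1))

# The principal components X V equal U S
assert np.allclose(X @ Vt.T, U * s)
```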
“In fact, using the SVD to perform PCA makes much better sense numerically than forming the covariance matrix to begin with, since the formation of $X^\top X$ can cause loss of precision.”2
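One way to see that remark concretely (a classic Läuchli-style example, assumed here for illustration rather than taken from the quoted source): with nearly collinear columns, forming $X^\top X$ in double precision rounds away the small eigenvalue, while the SVD of $X$ still resolves the small singular value.

```python
import numpy as np

eps = 1e-8
# Two nearly collinear columns; eps**2 = 1e-16 is below the double-precision resolution of 1.0
X = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

# Forming X^T X rounds 1 + eps**2 to exactly 1.0, so the product looks rank-deficient
XtX = X.T @ X
print(np.linalg.eigvalsh(XtX))              # smallest eigenvalue comes out as 0.0: precision lost

# The SVD of X itself still resolves the small singular value
print(np.linalg.svd(X, compute_uv=False))   # approximately [1.41421356, 1e-08]
```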