theory

  • Variance and bias in machine learning are properties of the learning algorithm (in this case, the model architecture + the training algorithm + the optimizer, etc.), not of the predictions of a single trained model.

  • Training a model is estimating the true parameters of the underlying population using a sample of the population (i.e. the training set) (ref: machine learning - What is meant by Low Bias and High Variance of the Model? - Cross Validated). ==Parameter estimates are random variables==: taking a subset of the population and training a model is making an estimate, so it makes sense to talk about the expectation and variance of your estimates across multiple training runs.

  • The conceptual problem is that we usually don't see these random variables. All we see is a single sample (or subset, i.e. the training data) from our population, a single model, and a single realization of our parameter estimates. (ref: machine learning - What is meant by Low Bias and High Variance of the Model? - Cross Validated)
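
A minimal simulation sketch of this point, assuming a toy linear-Gaussian DGP and NumPy (nothing here comes from the referenced thread): every fresh training sample yields a different slope estimate, and across many runs we can look at the estimator's expectation and spread, even though in practice we only ever observe one of these realizations.

```python
import numpy as np

rng = np.random.default_rng(0)
beta_true = 2.0              # true slope of the population DGP: y = 2*x + noise
n_samples, n_runs = 50, 2000

estimates = []
for _ in range(n_runs):
    # Draw a fresh "training set" (a sample of the population).
    x = rng.normal(size=n_samples)
    y = beta_true * x + rng.normal(size=n_samples)
    # OLS slope estimate for this particular sample (no intercept in the DGP).
    estimates.append((x @ y) / (x @ x))

estimates = np.array(estimates)
# In practice we only ever see ONE of these values; the simulation exposes
# the whole sampling distribution of the estimator.
print(f"mean of estimates: {estimates.mean():.3f}  (true beta = {beta_true})")
print(f"std  of estimates: {estimates.std():.3f}")
```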

  • Overfitting and underfitting are indications of high variance and high bias, respectively:

    • If the model specification is just right, i.e. the model matches the true data generating process (DGP), then the parameter estimates will be unbiased and have minimum variance.
    • If our model uses insufficient predictors (e.g. estimating using $x_1$ alone when the true DGP also involves $x_2$), then the parameter estimates will be biased, because the estimated coefficients will systematically differ from the true coefficients (see the derivation and simulation after this list).
      • Models that are too small/simple are a case of this.
    • If our model is mis-specified with redundant or superfluous predictors, then the variance will be high.
      • If we over-represent the DGP (i.e. the model is more complex/bigger than necessary, e.g. estimating using $x_1$ and $x_2$ when the true DGP only involves $x_1$; i.e. we have superfluous/redundant predictors), then the parameter estimates will be unbiased but have high variance (see the simulation after this list).
        • The more superfluous predictors there are, the higher the variance.
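
For the insufficient-predictors case, the standard omitted-variable-bias result for linear regression spells out where the bias comes from (this derivation is added here for illustration and is not from the referenced thread; $X_1$ is the included predictor block, $X_2$ the omitted one, and the true DGP is $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$ with $\mathbb{E}[\varepsilon \mid X_1, X_2] = 0$):

$$
\hat\beta_1 = (X_1^\top X_1)^{-1} X_1^\top y
            = \beta_1 + (X_1^\top X_1)^{-1} X_1^\top X_2 \beta_2 + (X_1^\top X_1)^{-1} X_1^\top \varepsilon
$$
$$
\mathbb{E}[\hat\beta_1 \mid X_1, X_2] = \beta_1 + \underbrace{(X_1^\top X_1)^{-1} X_1^\top X_2 \beta_2}_{\text{bias: nonzero unless } X_1^\top X_2 = 0 \text{ or } \beta_2 = 0}
$$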
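
And a small simulation sketch of all three cases, under the assumption of a linear-Gaussian DGP with two correlated predictors and plain NumPy least squares (again not from the referenced thread): it refits a correctly specified model, an underspecified one (one predictor dropped), and an overspecified one (redundant noisy copies of $x_1$ added) on many fresh training samples, then compares the bias and spread of the first coefficient's estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, runs = 60, 3000
beta1, beta2 = 1.5, -2.0        # true coefficients of the DGP

est = {"correct": [], "underspecified": [], "overspecified": []}
for _ in range(runs):
    # The true DGP: two correlated predictors plus Gaussian noise.
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

    # Correct specification: exactly the predictors of the DGP.
    X = np.column_stack([x1, x2])
    est["correct"].append(np.linalg.lstsq(X, y, rcond=None)[0][0])

    # Underspecified: x2 omitted -> the estimate of beta1 absorbs part of
    # beta2's effect and is biased.
    est["underspecified"].append(np.linalg.lstsq(x1[:, None], y, rcond=None)[0][0])

    # Overspecified: superfluous predictors correlated with x1 -> the
    # estimate of beta1 stays (roughly) unbiased but its variance grows.
    junk = x1[:, None] + rng.normal(scale=0.5, size=(n, 3))
    X = np.column_stack([x1, x2, junk])
    est["overspecified"].append(np.linalg.lstsq(X, y, rcond=None)[0][0])

for name, vals in est.items():
    vals = np.array(vals)
    print(f"{name:15s} mean = {vals.mean():+.3f} (true {beta1}), std = {vals.std():.3f}")
```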