Given that:
- The Bias-Variance tradeoff suggests that models with high bias tend to have low variance, and models with high variance tend to have low bias.
- Overfitting is a sign of high variance and low bias.
Then, if I have a large amount of computation, I can train many overfitting models and average their predictions to get a low-variance, low-bias ensemble. The key step is that averaging reduces variance without increasing bias, provided the individual models' errors are at least partially decorrelated (e.g., each model is fit on a different bootstrap sample or data subset).
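
A quick way to see why this should work (a standard decomposition; it assumes the $N$ overfit predictors $\hat f_i$ share the same bias, have equal variance $\sigma^2$, and pairwise correlation $\rho$, which is an idealisation rather than anything guaranteed in practice):

$$
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N}\hat f_i(x)\right)
= \rho\,\sigma^{2} + \frac{1-\rho}{N}\,\sigma^{2},
\qquad
\operatorname{Bias}\!\left(\frac{1}{N}\sum_{i=1}^{N}\hat f_i(x)\right)
= \operatorname{Bias}\!\left(\hat f_1(x)\right)
$$

As $N$ grows, the $\tfrac{1-\rho}{N}\sigma^{2}$ term vanishes and only the correlated part $\rho\sigma^{2}$ of the variance remains, while the bias stays at whatever (low) level the individual overfit learners already achieve. This is why the methods below all try to decorrelate the individual learners.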
Perhaps this is why these methods work:
- Ensemble techniques such as bagging and stacking (see the sketch after this list)
- Training large models on randomly sampled batches
  - Different pathways in the model act as different overfitted learners, each fit to a different subset of the data
- This might be considered a learning paradigm distinct from another paradigm, Mixture-of-Distribution learning
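
As a concrete illustration of the first bullet, here is a minimal bagging sketch, assuming scikit-learn and NumPy are available; the synthetic dataset, the number of trees, and the fully grown (unpruned) trees are illustrative choices, not anything the argument above prescribes. Each tree badly overfits its own bootstrap sample, and their predictions are simply averaged.

```python
# Minimal bagging sketch: average many deliberately overfit decision trees.
# Assumes scikit-learn and NumPy; dataset and hyperparameters are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Noisy 1-D regression problem: a fully grown tree will memorise the noise.
X = rng.uniform(0.0, 6.0, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=2000)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def overfit_tree(X, y, seed):
    """Fit an unpruned (deliberately overfitting) tree on a bootstrap sample."""
    idx = np.random.default_rng(seed).integers(0, len(X), size=len(X))
    return DecisionTreeRegressor(random_state=seed).fit(X[idx], y[idx])

# One overfit tree vs. the average of many overfit trees (bagging).
single = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
trees = [overfit_tree(X_train, y_train, seed) for seed in range(100)]
bagged_pred = np.mean([t.predict(X_test) for t in trees], axis=0)

print("single overfit tree test MSE:", mean_squared_error(y_test, single.predict(X_test)))
print("bagged ensemble test MSE:    ", mean_squared_error(y_test, bagged_pred))
```

On a run like this the bagged ensemble's test MSE is typically well below the single tree's: each tree keeps its low bias, and the bootstrap sampling decorrelates the trees enough for the averaging to cancel much of the variance.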