theory

  • Variance and bias in machine learning are properties of the learning algorithm (in this case, the model architecture + the training algorithm + the optimizer, etc.), not of the predictions of a single trained model.

  • Training a model is estimating the true parameters of the underlying population using a sample of the population (i.e. the training set) (ref: machine learning - What is meant by Low Bias and High Variance of the Model? - Cross Validated). ==Parameter estimates are random variables==: taking a subset of the population and training a model is making an estimate, so it makes sense to talk about the expectation and variance of your estimates across multiple training runs.

  • The conceptual problem is that we usually don't see these random variables. All we see is a single sample (or subset, i.e. the training data) from our population, a single model, and a single realization of our parameter estimates. (ref: machine learning - What is meant by Low Bias and High Variance of the Model? - Cross Validated)
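
A minimal simulation sketch of this point, assuming a toy linear-Gaussian DGP and NumPy (nothing here comes from the referenced thread): every fresh training sample yields a different slope estimate, and across many runs we can look at the estimator's expectation and spread, even though in practice we only ever observe one of these realizations.

```python
import numpy as np

rng = np.random.default_rng(0)
beta_true = 2.0              # true slope of the population DGP: y = 2*x + noise
n_samples, n_runs = 50, 2000

estimates = []
for _ in range(n_runs):
    # Draw a fresh "training set" (a sample of the population).
    x = rng.normal(size=n_samples)
    y = beta_true * x + rng.normal(size=n_samples)
    # OLS slope estimate for this particular sample (no intercept in the DGP).
    estimates.append((x @ y) / (x @ x))

estimates = np.array(estimates)
# In practice we only ever see ONE of these values; the simulation exposes
# the whole sampling distribution of the estimator.
print(f"mean of estimates: {estimates.mean():.3f}  (true beta = {beta_true})")
print(f"std  of estimates: {estimates.std():.3f}")
```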

  • Overfitting and underfitting are indications of high variance and high bias, respectively:

    • If the model specification is just right, i.e. the model matches the true data generating process (DGP), then the parameter estimates will be unbiased and have minimum variance.
    • If our model uses insufficient predictors (e.g. estimating using $x_1$ alone when the true DGP also involves $x_2$), then the parameter estimates will be biased, because the estimated coefficients will systematically differ from the true coefficients (see the derivation and simulation after this list).
      • Models that are too small/simple are a case of this.
    • If our model is mis-specified with redundant or superfluous predictors, then the variance will be high.
      • If we over-represent the DGP (i.e. the model is more complex/bigger than necessary, e.g. estimating using $x_1$ and $x_2$ when the true DGP only involves $x_1$; i.e. we have superfluous/redundant predictors), then the parameter estimates will be unbiased but have high variance (see the simulation after this list).
        • The more superfluous predictors there are, the higher the variance.
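
For the insufficient-predictors case, the standard omitted-variable-bias result for linear regression spells out where the bias comes from (this derivation is added here for illustration and is not from the referenced thread; $X_1$ is the included predictor block, $X_2$ the omitted one, and the true DGP is $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$ with $\mathbb{E}[\varepsilon \mid X_1, X_2] = 0$):

$$
\hat\beta_1 = (X_1^\top X_1)^{-1} X_1^\top y
            = \beta_1 + (X_1^\top X_1)^{-1} X_1^\top X_2 \beta_2 + (X_1^\top X_1)^{-1} X_1^\top \varepsilon
$$
$$
\mathbb{E}[\hat\beta_1 \mid X_1, X_2] = \beta_1 + \underbrace{(X_1^\top X_1)^{-1} X_1^\top X_2 \beta_2}_{\text{bias: nonzero unless } X_1^\top X_2 = 0 \text{ or } \beta_2 = 0}
$$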
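
And a small simulation sketch of all three cases, under the assumption of a linear-Gaussian DGP with two correlated predictors and plain NumPy least squares (again not from the referenced thread): it refits a correctly specified model, an underspecified one (one predictor dropped), and an overspecified one (redundant noisy copies of $x_1$ added) on many fresh training samples, then compares the bias and spread of the first coefficient's estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, runs = 60, 3000
beta1, beta2 = 1.5, -2.0        # true coefficients of the DGP

est = {"correct": [], "underspecified": [], "overspecified": []}
for _ in range(runs):
    # The true DGP: two correlated predictors plus Gaussian noise.
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

    # Correct specification: exactly the predictors of the DGP.
    X = np.column_stack([x1, x2])
    est["correct"].append(np.linalg.lstsq(X, y, rcond=None)[0][0])

    # Underspecified: x2 omitted -> the estimate of beta1 absorbs part of
    # beta2's effect and is biased.
    est["underspecified"].append(np.linalg.lstsq(x1[:, None], y, rcond=None)[0][0])

    # Overspecified: superfluous predictors correlated with x1 -> the
    # estimate of beta1 stays (roughly) unbiased but its variance grows.
    junk = x1[:, None] + rng.normal(scale=0.5, size=(n, 3))
    X = np.column_stack([x1, x2, junk])
    est["overspecified"].append(np.linalg.lstsq(X, y, rcond=None)[0][0])

for name, vals in est.items():
    vals = np.array(vals)
    print(f"{name:15s} mean = {vals.mean():+.3f} (true {beta1}), std = {vals.std():.3f}")
```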