- Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. It generally arises when a model is excessively complex, for example when it has too many parameters relative to the number of observations.
- Variance is about the stability of a model in response to new training examples. An algorithm like K-nearest neighbours has low bias (because it doesn't assume anything in particular about the distribution of the data points) but high variance, because its predictions can change easily with the composition of the training set.
- Bias relates to the ability of your model function to approximate the true underlying relationship, so high bias corresponds to underfitting.
- If a model produces a constant output, independent of the training data, its variance is zero but its bias is huge (it is underfitting).
- If a model fits every training point exactly, its bias on the training data is zero but its variance is potentially huge (it is overfitting).
- If we overfit, we will have large variance.
- If we underfit, we will have large bias.
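The KNN point above can be illustrated numerically: resample the training set many times from the same noisy source and watch how much the prediction at a fixed query point jumps around. This is a minimal NumPy sketch; the data source, the query point, and the choice of k = 1 versus k = 9 are illustrative assumptions, not from the original notes.

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_predict(x_train, y_train, x, k):
    # Predict at x by averaging the labels of the k nearest training points.
    idx = np.argsort(np.abs(x_train - x))[:k]
    return y_train[idx].mean()

x_query = 0.5
preds = {1: [], 9: []}  # k=1 (flexible) vs k=9 (more averaging)
for _ in range(200):
    # Resample the training set from the same noisy source each time.
    x_tr = rng.uniform(0.0, 1.0, 20)
    y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0.0, 0.3, 20)
    for k in preds:
        preds[k].append(knn_predict(x_tr, y_tr, x_query, k))

# Variance of the prediction at x_query across resampled training sets.
variance = {k: float(np.var(v)) for k, v in preds.items()}
print(variance)
```

With k = 1 the prediction inherits the noise of a single training point, so it swings with every resample (high variance); averaging over 9 neighbours smooths this out at the cost of some bias.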
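The last four bullets can be made concrete by comparing the two extremes directly: a constant model (degree-0 polynomial, which underfits) against a polynomial flexible enough to interpolate every training point (which overfits). This is a sketch under assumed settings (a sine target, Gaussian noise, degrees 0 and 9); the exact numbers are not from the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # The underlying relationship the models try to recover.
    return np.sin(2 * np.pi * x)

x_test = np.linspace(0.1, 0.9, 50)  # interior grid, away from the edges
n_runs, n_train = 200, 10
degrees = (0, 9)  # constant model vs exact interpolator of 10 points
preds = {d: [] for d in degrees}

for _ in range(n_runs):
    # Draw a fresh noisy training set each run.
    x_tr = rng.uniform(0.0, 1.0, n_train)
    y_tr = true_f(x_tr) + rng.normal(0.0, 0.3, n_train)
    for d in degrees:
        coeffs = np.polyfit(x_tr, y_tr, d)
        preds[d].append(np.polyval(coeffs, x_test))

results = {}
for d, p in preds.items():
    p = np.array(p)
    results[d] = {
        # bias^2: squared gap between the average prediction and the truth
        "bias2": float(np.mean((p.mean(axis=0) - true_f(x_test)) ** 2)),
        # variance: how much predictions fluctuate across training sets
        "variance": float(np.mean(p.var(axis=0))),
    }
    print(d, results[d])
```

The degree-0 model barely moves between training sets (low variance) but cannot follow the sine wave (large bias); the degree-9 model hits every training point yet its predictions fluctuate wildly from one resampled training set to the next (large variance).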