When a model is evaluated on the same data used to fit it (that is, without cross-validation), adding predictors never increases the residual sum of squares: it either reduces it or leaves it unchanged.
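A minimal sketch of this behaviour (NumPy, made-up data; none of the names below come from the source): even a pure-noise predictor cannot raise the in-sample RSS of a least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)          # true model uses only x1
noise = rng.normal(size=n)                  # an irrelevant predictor

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, noise])

print(rss(X_small, y) >= rss(X_big, y))     # True: the larger model's RSS is never higher
```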
For wide classes of linear models, the total sum of squares equals the explained sum of squares plus the residual sum of squares.
In statistics, the residual sum of squares (RSS) is the sum of the squares of the residuals, that is, of the deviations of the observed values from the values predicted by the model.
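As a concrete illustration of this definition and of the decomposition into explained and residual parts noted above, the following sketch fits an ordinary least-squares line with an intercept (one of the settings in which the identity holds) to made-up data; all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0, 10, size=n)
y = 3.0 + 1.5 * x + rng.normal(size=n)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

rss = np.sum((y - fitted) ** 2)          # residual sum of squares
ess = np.sum((fitted - y.mean()) ** 2)   # explained sum of squares
tss = np.sum((y - y.mean()) ** 2)        # total sum of squares

print(np.isclose(tss, ess + rss))        # True for OLS with an intercept
```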
In the setting where one has a set of observations y and an operator matrix H that maps the observations to the fitted values, the residual sum of squares can be written as a quadratic form in y:

RSS = yᵀ(I − H)ᵀ(I − H)y.
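A quick numerical check of this quadratic form, taking H to be the hat (projection) matrix of an ordinary least-squares fit (an assumption for illustration; the design matrix and data are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Hat (projection) matrix of the least-squares fit.
H = X @ np.linalg.solve(X.T @ X, X.T)
I = np.eye(n)

rss_quadratic = y @ (I - H).T @ (I - H) @ y    # RSS as a quadratic form in y
rss_direct = np.sum((y - H @ y) ** 2)          # RSS computed from the residuals

print(np.isclose(rss_quadratic, rss_direct))   # True
```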
If the partitions are not known in advance, the residual sum of squares can be minimized over candidate separation points to choose the optimal ones.
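As a sketch of how that selection might be carried out (an illustrative scan, not a method taken from the source), one can evaluate every candidate separation point and keep the one whose two-segment fit has the smallest total RSS:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, size=80))
# Piecewise-linear data with a true break near x = 6.
y = np.where(x < 6, 1.0 * x, 6.0 + 3.0 * (x - 6)) + rng.normal(scale=0.5, size=x.size)

def segment_rss(xs, ys):
    """RSS of a simple linear fit on one segment."""
    X = np.column_stack([np.ones_like(xs), xs])
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    r = ys - X @ beta
    return float(r @ r)

# Try every interior point as a candidate separation point.
candidates = x[5:-5]
total_rss = [segment_rss(x[x < c], y[x < c]) + segment_rss(x[x >= c], y[x >= c])
             for c in candidates]
best = candidates[int(np.argmin(total_rss))]
print(f"estimated separation point: {best:.2f}")
```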
Hence the residual sum of squares has been completely decomposed into two components.
Note that the residual sum of squares can be further partitioned as the lack-of-fit sum of squares plus the sum of squares due to pure error.
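That partition requires replicated observations at the same predictor values. The sketch below (a straight-line fit to made-up replicated data; all names are illustrative) computes both pieces and checks that they add back up to the RSS:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.repeat(np.arange(1, 6, dtype=float), 4)   # 5 x-levels, 4 replicates each
y = 2.0 + 0.8 * x + 0.3 * x**2 + rng.normal(scale=0.5, size=x.size)

# Straight-line least-squares fit (deliberately misspecified).
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
rss = np.sum((y - fitted) ** 2)

# Pure-error SS: spread of replicates around their own cell mean.
# Lack-of-fit SS: spread of the cell means around the fitted line.
ss_pe = sum(np.sum((y[x == lv] - y[x == lv].mean()) ** 2) for lv in np.unique(x))
ss_lof = sum(np.sum((y[x == lv].mean() - fitted[x == lv]) ** 2) for lv in np.unique(x))

print(np.isclose(rss, ss_lof + ss_pe))   # True: RSS = lack-of-fit SS + pure-error SS
```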
Mallows's Cp addresses the issue of overfitting, in which model selection statistics such as the residual sum of squares always get smaller as more variables are added to a model.
Thus, if we were to select the model giving the smallest residual sum of squares, the model including all variables would always be chosen.
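A common form of the statistic is Cp = RSSp/s² − n + 2p, where s² is estimated from the full model and p is the number of fitted parameters; the exact form used in the source is not shown here, so the sketch below assumes that convention and made-up data:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 100, 6
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
# Only the first two non-constant predictors matter.
y = X_full[:, :3] @ np.array([1.0, 2.0, -1.5]) + rng.normal(size=n)

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

# sigma^2 estimated from the full model, as is conventional for Cp.
p_full = X_full.shape[1]
s2 = rss(X_full, y) / (n - p_full)

for p in range(1, p_full + 1):           # nested models: first p columns
    X = X_full[:, :p]
    cp = rss(X, y) / s2 - n + 2 * p
    print(f"p={p}  RSS={rss(X, y):8.2f}  Cp={cp:6.2f}")
# RSS shrinks at every step; unlike RSS, Cp does not keep rewarding extra variables.
```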
Another convenient form arises if the σ_i are assumed to be identical and the residual sum of squares (RSS) is available.
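The convenient form itself is not reproduced above. One common instance, assuming normal errors with a shared variance, is the maximized log-likelihood written purely in terms of RSS, ln L = −(n/2)(ln(2π) + ln(RSS/n) + 1), from which criteria such as AIC follow; the sketch below computes it on made-up data:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 60, 3                                   # k regression coefficients (incl. intercept)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = float(np.sum((y - X @ beta) ** 2))

# Maximized Gaussian log-likelihood expressed purely through RSS
# (valid when every observation shares the same error variance,
#  whose maximum-likelihood estimate is RSS / n).
loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
aic = 2 * (k + 1) - 2 * loglik                 # +1 counts the estimated variance
print(f"RSS = {rss:.3f}  log-likelihood = {loglik:.3f}  AIC = {aic:.3f}")
```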