The explanation is given as follows:
"In the previous cell we had to set include_bias=False as otherwise we would create a column perfectly correlated to the intercept_ introduced by the LinearRegression."
I am unable to understand this.
This is indeed a bit tricky, and the notebook sweeps the details under the rug. Imagine you have a single feature with 3 samples:
data_train = np.array([
    [1.],
    [2.],
    [3.]])
After polynomial feature expansion with degree=2 and include_bias=True:
data_transformed_array = np.array(
    [[1., 1., 1.],
     [1., 2., 4.],
     [1., 3., 9.]])
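To see this concretely, here is a small sketch reproducing the arrays above with scikit-learn's PolynomialFeatures:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# A single feature with 3 samples, as above
data_train = np.array([
    [1.],
    [2.],
    [3.]])

# Expand to [1, x, x**2]: the first column is the bias (all ones)
poly = PolynomialFeatures(degree=2, include_bias=True)
data_transformed = poly.fit_transform(data_train)
print(data_transformed)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
```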
Your linear model has the following form:
intercept + coeff0 * x0 + coeff1 * x1 + coeff2 * x2
where x0 is the value of the column of ones, x1 is the value of the original feature, and x2 is the value of the original feature squared.
Since x0 is always equal to 1 (by construction), your model is of the form:
intercept + coeff0 + coeff1 * x1 + coeff2 * x2
So intercept and coeff0 play exactly the same role: we can add any constant to intercept and subtract the same constant from coeff0, and the model's predictions stay the same. This is what we mean by "perfectly correlated": the column of ones is redundant with the intercept fitted by LinearRegression.