Solution is longer than exercise

Fox-PF · 6 March 2022 14:21

The section “Dealing with correlation between one-hot encoded features” is missing from the exercise.

glemaitre58 · 6 March 2022 15:28

This is on purpose. It is an additional section for going further, but we don’t want to ask questions about it because some of the concepts were not discussed in the lecture.

Manianis · 20 March 2022 05:55

As I understand it, training a linear regression model without regularization on the columns generated by the OneHotEncoder will lead to computational problems.

Such problems could be avoided either by:

- using regularization
- using the drop="first" argument of the OneHotEncoder to drop the redundant data.

Is that what you are trying to show in this additional part of the solution?
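
To make the redundancy concrete, here is a minimal sketch, assuming scikit-learn and NumPy are installed. The toy data and the helper `design_matrix_rank` are made up for illustration and are not taken from the notebook. It compares the number of columns with the rank of the design matrix (intercept column plus one-hot columns), with and without drop="first":

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical feature with three categories.
X = np.array([["a"], ["b"], ["c"], ["a"], ["b"], ["c"]])


def design_matrix_rank(encoder):
    """Encode X, prepend an intercept column, return (n_columns, rank)."""
    encoded = encoder.fit_transform(X).toarray()
    intercept = np.ones((encoded.shape[0], 1))
    design = np.hstack([intercept, encoded])
    return design.shape[1], int(np.linalg.matrix_rank(design))


# All categories kept: the one-hot columns sum to the intercept column,
# so the design matrix has more columns than its rank.
full_cols, full_rank = design_matrix_rank(OneHotEncoder())

# First category dropped: the remaining columns are linearly independent.
drop_cols, drop_rank = design_matrix_rank(OneHotEncoder(drop="first"))

print("all categories:", full_cols, "columns, rank", full_rank)
print("drop='first':  ", drop_cols, "columns, rank", drop_rank)
```

When the rank is lower than the number of columns, the least-squares solution is not unique, which is where the trouble comes from. Dropping one category removes the redundant column, while regularization keeps every column and instead penalizes large coefficients so that a single solution is selected.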

glemaitre58 · 20 March 2022 18:08

Indeed, this is what we want to show there.

cle_ber · 30 March 2022 10:05

I found this second section in the solution informative, but I almost missed it. I understand it’s meant to be optional, but you may want to consider adding a ‘note’ in the notebook to indicate the presence of this additional content.