Solution is longer than the exercise

The section “Dealing with correlation between one-hot encoded features” is missing from the exercise.

This is on purpose. It is an additional section for going further, but we don’t want to ask questions about it because some of the concepts were not discussed in the lecture.


As I understand it, training a linear regression model without regularization on the columns generated by the OneHotEncoder will lead to computational problems.

Such problems could be avoided either by:

  • using regularization
  • using the drop="first" argument of the OneHotEncoder to drop the redundant column.

Is that what you are trying to show in this additional part of the solution?

Indeed, this is what we want to show there.
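
For readers who want to see the issue concretely, here is a minimal sketch. It is not taken from the notebook: the toy column `X`, the helper `design_matrix`, and the value of `alpha` are made up for illustration. It checks the rank of the design matrix (intercept plus one-hot columns) with and without drop="first", and then the effect of a ridge-style penalty.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy categorical column (hypothetical data, not from the notebook).
X = np.array([["a"], ["b"], ["c"], ["a"], ["b"], ["c"]])


def design_matrix(encoder):
    """One-hot encode X and prepend the intercept column of ones."""
    encoded = encoder.fit_transform(X).toarray()
    return np.hstack([np.ones((encoded.shape[0], 1)), encoded])


full = design_matrix(OneHotEncoder())
dropped = design_matrix(OneHotEncoder(drop="first"))

# With all categories kept, the one-hot columns sum to the intercept
# column, so the matrix has 4 columns but only rank 3.
rank_full = np.linalg.matrix_rank(full)
# Dropping one category removes the redundancy: 3 columns, rank 3.
rank_dropped = np.linalg.matrix_rank(dropped)

print(full.shape, rank_full)        # (6, 4) 3
print(dropped.shape, rank_dropped)  # (6, 3) 3

# A ridge-style penalty adds alpha * I to X^T X, which makes the
# matrix to invert full rank even with the redundant column present.
alpha = 1.0  # arbitrary value chosen for illustration
gram = full.T @ full
rank_gram = np.linalg.matrix_rank(gram)
rank_penalized = np.linalg.matrix_rank(gram + alpha * np.eye(gram.shape[1]))
print(rank_gram, rank_penalized)    # 3 4
```

The rank deficit in the first case means the least-squares solution is not unique, which is where the numerical trouble comes from. Either remedy restores a well-posed problem.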

I found this second section in the solution informative, but I almost missed it. I understand it’s meant to be optional, but you may want to consider adding a ‘note’ in the notebook to indicate the presence of this additional content.