Choice of column

Thank you for making this excellent course available.

In this module, you point out that the features “education-num” and “education” reflect the same underlying information. Keeping both would be useless and could even introduce a bias. So you drop “education-num” and keep “education”.

Can you explain this choice?

It seems to me that “education-num” would have been easier to keep: it can be used directly as a numerical column, and the order of the categories in the “education” column is not totally obvious.

Thank you,

The choice is arbitrary. We made it because we will use this column to present the categorical encoding strategies, and this feature is quite intuitive for that presentation.
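
To make the point from the question concrete, here is a minimal sketch in plain Python. The rows are a toy sample written by hand for illustration, not loaded from the course dataset, and the helpers `is_one_to_one` and `ordinal_encode` are hypothetical names, not part of any library. It checks that the two columns carry the same information, then shows that a naive ordinal encoding based on alphabetical order does not reproduce the order given by “education-num”.

```python
# Toy rows written by hand for illustration; not read from the real dataset.
rows = [
    {"education": "HS-grad", "education-num": 9},
    {"education": "Bachelors", "education-num": 13},
    {"education": "Masters", "education-num": 14},
    {"education": "Doctorate", "education-num": 16},
    {"education": "Bachelors", "education-num": 13},
]


def is_one_to_one(rows, a, b):
    """Return True if each value of column `a` pairs with exactly one
    value of column `b`, and each value of `b` with exactly one of `a`."""
    a_to_b, b_to_a = {}, {}
    for row in rows:
        # setdefault stores the first pairing seen and returns it afterwards,
        # so any later mismatch reveals a conflicting pairing.
        if a_to_b.setdefault(row[a], row[b]) != row[b]:
            return False
        if b_to_a.setdefault(row[b], row[a]) != row[a]:
            return False
    return True


def ordinal_encode(values):
    """Map each category to its rank in sorted (alphabetical) order."""
    categories = sorted(set(values))
    mapping = {category: rank for rank, category in enumerate(categories)}
    return [mapping[value] for value in values], mapping


# The two columns are redundant: one can be recovered from the other.
redundant = is_one_to_one(rows, "education", "education-num")

# Alphabetical ranks ignore the real ordering of education levels.
codes, mapping = ordinal_encode([row["education"] for row in rows])
print(redundant)
print(mapping)
```

In this toy sample, “Doctorate” gets a lower alphabetical rank than “HS-grad”, whereas “education-num” orders them the other way round. This is why the order of the categories has to be handled explicitly if “education” is encoded as integers.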

Thank you very much for your answer!