Why keep education level as text rather than as a number?

I was wondering why you chose to remove the quantitative expression of the level of studies (number of years of study) rather than the qualitative one. The plots show that the numbers and the text labels can be paired (BTW, I'm glad to have had an opportunity to discover seaborn), but isn't the numerical representation a bit richer? The number of school years quantifies a difference in education level (as a number of years), whereas the text labels are just different categories with no a priori hierarchy. Don't the models use that?


I believe they are trying to avoid redundant, perfectly correlated features (multicollinearity) in the model by keeping only one of the two. I first thought the level of schooling might be more informative than the number of years, because different categories could correspond to the same number of years. But after looking at the data, each education category maps to exactly one number of years of education (no overlap anywhere). So yeah, I wonder why they chose one rather than the other.
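For anyone who wants to repeat that check, here is a minimal sketch of one way to test whether two columns are in one-to-one correspondence. The sample rows and the `is_one_to_one` helper are hypothetical and only illustrate the idea; they are not taken from the course material.

```python
from collections import defaultdict

# Hypothetical sample rows: (education label, years of education).
# The values are made up for illustration, not read from the real dataset.
rows = [
    ("Bachelors", 13),
    ("HS-grad", 9),
    ("Masters", 14),
    ("HS-grad", 9),
    ("Bachelors", 13),
    ("Doctorate", 16),
]


def is_one_to_one(pairs):
    """Return True if each label maps to a single number and vice versa."""
    label_to_nums = defaultdict(set)
    num_to_labels = defaultdict(set)
    for label, num in pairs:
        label_to_nums[label].add(num)
        num_to_labels[num].add(label)
    # One-to-one means no label has two numbers and no number has two labels.
    return all(len(v) == 1 for v in label_to_nums.values()) and all(
        len(v) == 1 for v in num_to_labels.values()
    )


print(is_one_to_one(rows))
```

If the function returns True on the full data, the two columns carry the same information and keeping both adds nothing for the model.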

I have the same question as well.

They represent exactly the same data. We could have chosen either feature; the choice is completely arbitrary.

One reason may have been that we found this feature quite useful for explaining the encoding of categories, which is presented later on.
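On the hierarchy question: the text labels only carry an order if the encoding gives them one. As a rough sketch, assuming a hypothetical ordering of a few labels (the `education_order` list and the `ordinal_encode` helper are made up for illustration), an ordinal encoding could look like this:

```python
# Hypothetical category order, from lowest to highest level.
# This list is an assumption for illustration, not the dataset's full set.
education_order = ["HS-grad", "Bachelors", "Masters", "Doctorate"]

# Map each label to its rank in the chosen order.
rank = {label: i for i, label in enumerate(education_order)}


def ordinal_encode(labels):
    """Replace each label by its integer rank in education_order."""
    return [rank[label] for label in labels]


encoded = ordinal_encode(["Masters", "HS-grad", "Doctorate"])
print(encoded)  # [2, 0, 3]
```

In scikit-learn, `OrdinalEncoder` accepts a `categories` parameter that plays the role of `education_order` here, so the text column can be given the same hierarchy as the numeric one.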
