Question 12 - Definition of Missing Value

When question 12 asks about missing values, it does not seem to take into account the unknown values in the categorical fields (for example, the `?` entries visible in the head of the dataframe). Instead, it asks whether there are any null values in the dataset. Is this intentional, or is it part of the definition of the dataset that unknown values are considered to carry some information?

I think we should revise this exercise in the next version of the MOOC to remove this ambiguity. The `read_csv` command we provide keeps `?` as its own category, which makes the processing easier. This is equivalent to reading the dataset with `na_values=["?"]` and then applying a `SimpleImputer` with a constant strategy that fills the missing entries with a single category.
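A minimal sketch of that equivalence, assuming a small hypothetical CSV in place of the exercise's dataset (the column names and rows are invented for illustration) and choosing `"?"` as the imputer's constant so the two results can be compared directly:

```python
import io

import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy CSV standing in for the exercise's dataset; "?" marks unknown values.
csv_text = "workclass,occupation\nPrivate,Sales\n?,?\nState-gov,Tech-support\n"

# Option 1: keep "?" as an ordinary category (no special parsing).
as_category = pd.read_csv(io.StringIO(csv_text))

# Option 2: parse "?" as NaN, then impute every NaN with the constant "?".
with_nan = pd.read_csv(io.StringIO(csv_text), na_values=["?"])
imputer = SimpleImputer(strategy="constant", fill_value="?")
imputed = pd.DataFrame(imputer.fit_transform(with_nan), columns=with_nan.columns)

# Compare the cell values of both approaches as plain Python lists.
same_values = as_category.values.tolist() == imputed.values.tolist()
print(same_values)
```

Any other constant category (for example `"missing"`) would carry the same information; only the label would differ.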

For this version, we can move the "Hint" that is displayed when clicking on the button into the instructions, so that learners do not display the dataframe and get confused by the `?` marker.

We could even remove the two questions about missing data in the next MOOC 🙂

I agree

Yes, I ran `data.isnull().values.any()`, which returns `False`, but then I saw the `?`s in the dataframe, interpreted them as an encoding for missing information, and got the question wrong. Oh well!
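A small sketch of why the two checks disagree, again assuming a hypothetical toy CSV rather than the course's actual file: `isnull()` only detects values that pandas parsed as missing, so a `?` read as a plain string is invisible to it until the marker is counted explicitly or declared through `na_values`.

```python
import io

import pandas as pd

# Hypothetical toy CSV; "?" is the dataset's marker for an unknown value.
csv_text = (
    "workclass,occupation\n"
    "State-gov,Adm-clerical\n"
    "?,?\n"
    "Private,Sales\n"
    "Private,?\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# "?" is read as an ordinary string, so pandas reports no nulls.
has_nulls = data.isnull().values.any()

# Counting the "?" marker explicitly reveals the unknown entries per column.
question_marks = (data == "?").sum()

# Re-reading with na_values=["?"] turns the marker into real missing values.
data_na = pd.read_csv(io.StringIO(csv_text), na_values=["?"])
null_counts = data_na.isnull().sum()

print(has_nulls, question_marks.to_dict(), null_counts.to_dict())
```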

The missing-value questions have been removed in v2.