A couple of comments on question 11

Hi everyone,
The question asks us to compare the cross-validation fold scores of a logistic regression model applied to two different datasets: (1) the numerical features only and (2) the categorical and numerical features together. The part of the question that says “Look at the cross-validation test scores for both models …” made me wonder whether I should also use DummyClassifier, as in the previous question.

For a reason I don’t know, applying fit_transform with OneHotEncoder gave me a “scipy.sparse._csr.csr_matrix”, and I had problems assembling a pd.DataFrame from it.
I’m enjoying this course.
Regards
Jhonny

I guess the wording can be improved, but the rest of the paragraph should make it clear that by “both models” we indeed mean comparing

  1. the model using both numerical and categorical features and
  2. the model using numerical features only.

We can probably rephrase it for the next session of the MOOC.
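For anyone reading later, here is a minimal sketch of the comparison described above: the same logistic regression evaluated once on numerical features only and once on numerical plus categorical features, compared fold by fold. The toy data and the column names (`age`, `hours`, `job`) are made up for illustration and are not the quiz dataset, and `cv=10` is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the quiz dataset.
rng = np.random.default_rng(0)
n = 200
job = rng.choice(["a", "b", "c"], n)
data = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "hours": rng.normal(38, 5, n),
    "job": job,
})
# The target depends on the categorical column, with 10% label noise.
target = ((job == "a") ^ (rng.random(n) < 0.1)).astype(int)

num_cols = ["age", "hours"]
cat_cols = ["job"]

# Model 1: numerical features only.
model_num = make_pipeline(StandardScaler(), LogisticRegression())
cv_num = cross_validate(model_num, data[num_cols], target, cv=10)

# Model 2: numerical and categorical features.
preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ("num", StandardScaler(), num_cols),
])
model_all = make_pipeline(preprocessor, LogisticRegression())
cv_all = cross_validate(model_all, data, target, cv=10)

# Both runs use the same deterministic stratified folds, so the
# test scores can be compared fold by fold.
n_better = int((cv_all["test_score"] > cv_num["test_score"]).sum())
print(f"Model with categorical features is better on {n_better} of 10 folds")
```

No DummyClassifier is involved here: the two models being compared are both logistic regressions, and only the input features differ.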

I am not sure if I correctly understand what you mean. Are you getting an error? Are you running the notebooks locally? Please provide a snippet of code containing the elements to reproduce the error message.

I’m using the Sandbox to solve quizzes. Below is the code I use after uploading the file:
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import OneHotEncoder

cat_colselector = make_column_selector(dtype_include=object)
cat_cols = cat_colselector(data)
onehot_preprocessor = OneHotEncoder(handle_unknown="ignore")
oh_preproc = ColumnTransformer([("oh_pre", onehot_preprocessor, cat_cols)], remainder="passthrough")
data_encoded = oh_preproc.fit_transform(data[cat_cols])

The line below gives me an error:

data_cat = pd.DataFrame(data_encoded, columns = oh_preproc.get_feature_names_out().tolist())
print(type(data_encoded))

I fixed it by using:

import scipy.sparse
data_cat = pd.DataFrame.sparse.from_spmatrix(
    data_encoded, columns=oh_preproc.get_feature_names_out().tolist()
)
Regards
Jhonny
