A couple of comments on question 11

bryson_je · 20 April 2022 12:24

Hi everyone,
Question 11 asks us to compare the cross-validation fold scores of a logistic regression model applied to two different datasets: (1) the numerical features only and (2) the categorical and numerical features together. The part of the question that says “Look at the cross-validation test scores for both models …” made me wonder whether I should also use DummyClassifier, as in the previous question.

For a reason I don’t know, applying fit_transform with OneHotEncoder gave me a “scipy.sparse._csr.csr_matrix”, and I had problems assembling a pd.DataFrame from it.
I’m enjoying this course.
Regards
Jhonny

ArturoAmorQ · 20 April 2022 14:54

I guess the wording can be improved, but the rest of the paragraph should make it clear that by “both models” we indeed mean comparing

the model using both numerical and categorical features and
the model using numerical features only.

We can probably rephrase it for the next session of the MOOC.

Regarding the OneHotEncoder output, I am not sure I correctly understand what you mean. Are you getting an error? Are you running the notebooks locally? Please provide a snippet of code containing the elements needed to reproduce the error message.

bryson_je · 20 April 2022 15:35

I’m using the Sandbox to solve the quizzes. Below is the code I use after loading the file:

from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import OneHotEncoder

cat_colselector = make_column_selector(dtype_include=object)
cat_cols = cat_colselector(data)
onehot_preprocessor = OneHotEncoder(handle_unknown="ignore")
oh_preproc = ColumnTransformer([("oh_pre", onehot_preprocessor, cat_cols)], remainder="passthrough")
data_encoded = oh_preproc.fit_transform(data[cat_cols])

The first line below gives me an error, because data_encoded is a SciPy sparse matrix rather than a NumPy array:

data_cat = pd.DataFrame(data_encoded, columns = oh_preproc.get_feature_names_out().tolist())
print(type(data_encoded))

I fixed it by using:

import scipy.sparse
data_cat = pd.DataFrame.sparse.from_spmatrix(data_encoded,
columns = oh_preproc.get_feature_names_out().tolist())
Regards
Jhonny