Using numerical and categorical variables together

from sklearn.compose import make_column_selector as selector

numerical_columns_selector = selector(dtype_exclude=object)
categorical_columns_selector = selector(dtype_include=object)

numerical_columns = numerical_columns_selector(data)
categorical_columns = categorical_columns_selector(data)

What do you think of including a disclaimer here telling users to check their data before assuming that “objects” are really “categorical”? In my experience, when I load data from a CSV file into pandas, numerical fields are often read in as “object” columns and I need to convert them.

Yes, this is a good idea. We already stated that a feature containing numbers is not necessarily a numerical feature. It would be nice to add a similar warning about categorical features and the object dtype.
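To illustrate the caveat discussed above, here is a minimal sketch rather than the course's own code. The toy DataFrame, its column names, the "?" missing-value marker, and the 0.5 threshold are all assumptions made up for this example. It shows a numerical column stored with the object dtype being picked up by the categorical selector, and one possible way to check for and convert such columns with `pd.to_numeric`.

```python
import pandas as pd
from sklearn.compose import make_column_selector as selector

# Hypothetical toy data: "age" holds numbers stored as strings, with "?" used
# as a missing-value marker, so the whole column has the object dtype.
data = pd.DataFrame(
    {
        "age": pd.Series(["25", "38", "?", "52"], dtype=object),
        "workclass": pd.Series(
            ["Private", "State-gov", "Private", "Self-emp"], dtype=object
        ),
        "hours-per-week": [40, 50, 40, 30],
    }
)

# Selecting on dtype alone treats "age" as categorical.
object_columns = selector(dtype_include=object)(data)
print(object_columns)

# Try to parse each object column as numbers; entries that cannot be parsed
# become NaN. Keep the conversion only if most values parsed (the 0.5
# threshold is an arbitrary choice for this sketch).
for column in object_columns:
    converted = pd.to_numeric(data[column], errors="coerce")
    if converted.notna().mean() > 0.5:
        data[column] = converted

# Re-run the selectors on the cleaned data.
numerical_columns = selector(dtype_exclude=object)(data)
categorical_columns = selector(dtype_include=object)(data)
print(numerical_columns, categorical_columns)
```

An automatic check like this is only a heuristic. Inspecting the columns by hand (for example with `data.dtypes` and `data.head()`) remains the safer way to decide which features are really categorical.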

I added a small cautionary note mentioning the caveats of the object dtype: ENH add a small notes regarding object dtype · INRIA/scikit-learn-mooc@5626df6 · GitHub

One needs to synchronize the notebooks in FUN to see the changes.

How does one sync the notebooks in FUN?
I tried logging out and logging back in.

Yes, my bad. You need to click on File → Reset to original. Under the hood, this fetches the new version of the notebook, and you will see the changes. The side effect is that you will lose your current notebook.

There is some more detail here: How to reset a notebook to its original version? - #2

In case you don’t want to lose your current notebook, you can see the change in our Jupyter Book: Using numerical and categorical variables together — Scikit-learn course
