Q6- Selecting numerical features

marcelamora · 21 March 2022 05:31

Hi,

In Question 6, I think we should only consider the numerical features in the list named “numerical_features” given before Question 5.

Is there a way to use the make_column_selector function and obtain only the subset of columns given in a specific list?

What I ended up doing was dropping the other numeric columns from the dataset, so that when I used make_column_selector like this:

from sklearn.compose import make_column_selector as selector

numerical_columns_selector = selector(dtype_exclude=object)
numerical_columns = numerical_columns_selector(data)

I obtained only the columns I hadn’t dropped, but I guess there should be a better way to do it.

Following a Stack Overflow post (python - How to select only few columns in scikit learn column selector pipeline? - Stack Overflow), I tried this, but it didn’t work:

preprocessor = ColumnTransformer([('one-hot-encoder', categorical_preprocessor, categorical_columns), ('standard_scaler', 'passthrough', numerical_features)])

Thanks in advance!

glemaitre58 · 21 March 2022 08:01

marcelamora:

Is there a way to use the make_column_selector function and obtain only the subset of columns given in a specific list?

What I ended up doing was dropping the other numeric columns from the dataset, so that when I used make_column_selector like this:
numerical_columns_selector = selector(dtype_exclude=object)
numerical_columns = numerical_columns_selector(data)

If the numerical features are indeed stored as numbers (i.e. the columns have a numeric dtype), then this usage of make_column_selector will work.
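
As a rough illustration of that point, here is a minimal sketch on a made-up DataFrame (the column names and values are hypothetical, not the exercise data). It shows that a dtype-based selector returns every non-object column, including ones that only look numerical, and that the result can then be intersected with an explicit list such as numerical_features:

```python
import pandas as pd
from sklearn.compose import make_column_selector as selector

# Hypothetical toy data; "zip_code" is stored as integers but is not
# really a numerical feature.
data = pd.DataFrame({
    "age": [25, 32, 47],
    "hours": [40.0, 35.5, 50.0],
    "zip_code": [75001, 69002, 13003],
    "job": pd.Series(["clerk", "nurse", "pilot"], dtype=object),
})

# Selecting by dtype returns every column that is not of object dtype.
numerical_columns_selector = selector(dtype_exclude=object)
numerical_columns = numerical_columns_selector(data)

# Assumed explicit list, standing in for the one given in the exercise.
numerical_features = ["age", "hours"]

# Keep only the selected columns that also appear in the explicit list.
selected = [col for col in numerical_columns if col in numerical_features]
print(numerical_columns)
print(selected)
```

The selector itself only filters on dtype (and optionally on a name pattern), so restricting it to a given list needs an extra step like the list comprehension above.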

Here, we provide the list of features explicitly so that there is no ambiguity about which features are numerical for the exercise.