Difference between categorical_columns_selector and categorical_columns in ColumnTransformer

What is the difference between using categorical_columns_selector and categorical_columns in ColumnTransformer? Prior to this notebook, it selector inside ColumnTransformer. Please explain.

categorical_columns is a list of the names of the categorical columns to consider.
categorical_columns_selector is one level of abstraction higher: you only define a rule, based on the data type, that decides which columns to consider.

from sklearn.compose import make_column_selector

categorical_columns_selector = make_column_selector(
    dtype_include="object"
)

The line above defines a rule that selects the columns of "object" dtype. It does not select anything yet. To get the actual list of selected column names, you need to call the selector on a dataframe:

categorical_columns = categorical_columns_selector(X)
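
To make this concrete, here is a minimal sketch of calling the selector. The dataframe X below is a made-up toy example (its column names and values are assumptions, not the notebook's data), and the text columns are cast to object explicitly so the result does not depend on the pandas default string dtype.

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical toy dataframe, only for illustration.
X = pd.DataFrame(
    {
        "age": [25, 38, 52],
        "workclass": ["Private", "State-gov", "Private"],
        "education": ["HS-grad", "Bachelors", "Masters"],
    }
).astype({"workclass": object, "education": object})

# The selector is only a rule: "pick the columns whose dtype is object".
categorical_columns_selector = make_column_selector(dtype_include="object")

# Calling the rule on a dataframe returns a plain list of column names.
categorical_columns = categorical_columns_selector(X)
print(categorical_columns)
```

With this toy dataframe, the numerical column "age" is left out and only the two object columns are returned.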

ColumnTransformer accepts either a list of column names or a callable that returns such a list. categorical_columns is the list itself, while categorical_columns_selector is a callable that returns the list when it is given the dataframe.
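
As an illustration, the sketch below passes both forms to ColumnTransformer and checks that they produce the same output. The toy dataframe, the choice of OneHotEncoder, and the names ct_list and ct_callable are assumptions made for this example; sparse_threshold=0 is set only so that the outputs are dense arrays that are easy to compare.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy dataframe, only for illustration.
X = pd.DataFrame(
    {
        "age": [25, 38, 52],
        "workclass": ["Private", "State-gov", "Private"],
        "education": ["HS-grad", "Bachelors", "Masters"],
    }
).astype({"workclass": object, "education": object})

categorical_columns_selector = make_column_selector(dtype_include="object")
categorical_columns = categorical_columns_selector(X)

# Option 1: give ColumnTransformer the explicit list of column names.
ct_list = ColumnTransformer(
    [("cat", OneHotEncoder(), categorical_columns)],
    sparse_threshold=0,  # always return a dense array
)

# Option 2: give it the callable; it is called on the dataframe during fit.
ct_callable = ColumnTransformer(
    [("cat", OneHotEncoder(), categorical_columns_selector)],
    sparse_threshold=0,
)

out_list = ct_list.fit_transform(X)
out_callable = ct_callable.fit_transform(X)

# Both specifications select the same columns, so the outputs match.
print(np.array_equal(out_list, out_callable))
```

One practical difference: the list is fixed at the moment you compute it, whereas the callable is evaluated on whatever dataframe is passed to fit, so it adapts if the columns change.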

Got it, so ColumnTransformer can accept either categorical_columns_selector or categorical_columns.
Right?

Yes, that is it.

Thank you so much, @glemaitre58. Have a great day!