Pipeline or make_pipeline?

echidne · 11 June 2021 15:42

Hi dear teachers,

In modules 1 and 2 you used make_pipeline to build pipelines, whereas in module 3 you use Pipeline in the lesson "Hyperparameter tuning by grid-search".

Did you have a specific reason to use Pipeline here instead of make_pipeline?
From a practical point of view, in which cases would we need to use one method rather than the other to build a pipeline?

Thanks for your answers

glemaitre58 · 11 June 2021 16:55

Both are legit. make_pipeline does not require you to name each step of the pipeline, because scikit-learn generates each name from the lowercased class name. Let's be explicit:

model = make_pipeline(StandardScaler(), RidgeCV())

Calling model.get_params() will show you this: the first step in the pipeline is named "standardscaler" and the second step "ridgecv".
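A minimal runnable sketch of the snippet above, with the imports it needs, listing the generated step names and checking that they appear as keys (and as prefixes of nested parameters) in get_params():

```python
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(), RidgeCV())

# Step names are generated from the lowercased class names
step_names = [name for name, _ in model.steps]
print(step_names)  # ['standardscaler', 'ridgecv']

# get_params() exposes each step by name, and nested parameters as "<step>__<param>"
params = model.get_params()
print("standardscaler" in params, "ridgecv__alphas" in params)  # True True
```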

Using Pipeline allows us to define our own names (to make them shorter or more explicit):

model = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", RidgeCV()),
])

You can use model.get_params() again to see the difference: this time, the step names are the ones we defined ourselves.
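A small sketch of what changes with custom names: the nested parameters of RidgeCV are now prefixed with "regressor" instead of "ridgecv", and that same prefix is what set_params (or a grid-search parameter grid) expects. The alpha values passed to set_params are arbitrary, chosen only for illustration.

```python
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

model = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", RidgeCV()),
])

step_names = [name for name, _ in model.steps]
print(step_names)  # ['scaler', 'regressor']

# Nested parameters now use our own names as prefix
params = model.get_params()
print("regressor__alphas" in params, "ridgecv__alphas" in params)  # True False

# The same prefix is used to change a parameter of a given step
model.set_params(regressor__alphas=(0.01, 0.1, 1.0))
print(model.named_steps["regressor"].alphas)  # (0.01, 0.1, 1.0)
```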

The same pattern applies to ColumnTransformer and make_column_transformer.
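A sketch of the same naming behaviour for column transformers. The column names "age" and "city" and the step names "numerical" and "categorical" are hypothetical; nothing is fitted here, we only inspect the names stored in the transformers attribute.

```python
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Automatic names derived from the lowercased class names
auto_ct = make_column_transformer(
    (StandardScaler(), ["age"]),
    (OneHotEncoder(), ["city"]),
)
auto_names = [name for name, _, _ in auto_ct.transformers]
print(auto_names)  # ['standardscaler', 'onehotencoder']

# Explicit names chosen by us
named_ct = ColumnTransformer([
    ("numerical", StandardScaler(), ["age"]),
    ("categorical", OneHotEncoder(), ["city"]),
])
named_names = [name for name, _, _ in named_ct.transformers]
print(named_names)  # ['numerical', 'categorical']
```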

To summarize: make_pipeline lets you write less code, but you might not like the automatic naming; Pipeline lets you customize the names of the steps.
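Since the question comes from the grid-search lesson, here is a sketch of where the step names show up in practice: in the keys of the parameter grid. It assumes Ridge (not RidgeCV, which already tunes alpha internally) and a small synthetic dataset from make_regression, both chosen only for illustration. The two searches fit the same model on the same splits; only the parameter names differ.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic dataset, only for illustration
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)
alphas = [0.1, 1.0, 10.0]

# Automatic names: the Ridge step is called "ridge"
auto_model = make_pipeline(StandardScaler(), Ridge())
auto_search = GridSearchCV(auto_model, param_grid={"ridge__alpha": alphas}, cv=3)
auto_search.fit(X, y)

# Custom names: the same parameter is addressed as "regressor__alpha"
named_model = Pipeline([("scaler", StandardScaler()), ("regressor", Ridge())])
named_search = GridSearchCV(named_model, param_grid={"regressor__alpha": alphas}, cv=3)
named_search.fit(X, y)

print(auto_search.best_params_)
print(named_search.best_params_)
```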

glemaitre58 · 11 June 2021 16:56

Here, the automatic names would most probably be "pipeline-1" and "pipeline-2", so we chose more explicit names.
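A sketch of why such names appear: when two steps share the same class name, make_pipeline appends a numeric suffix to tell them apart. The model below is hypothetical (two single-step sub-pipelines followed by RidgeCV), built only to show the generated names and how they leak into nested parameter names.

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical model whose first two steps are themselves pipelines
model = make_pipeline(
    make_pipeline(SimpleImputer()),
    make_pipeline(StandardScaler()),
    RidgeCV(),
)

# Both sub-pipelines are Pipeline instances, so the duplicated
# "pipeline" name gets a numeric suffix
step_names = [name for name, _ in model.steps]
print(step_names)  # ['pipeline-1', 'pipeline-2', 'ridgecv']

# Nested parameter names then become rather opaque
print("pipeline-1__simpleimputer__strategy" in model.get_params())  # True
```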

echidne · 11 June 2021 18:25

Thanks, it’s clearer now.