Both are legitimate. make_pipeline does not require you to name each step in the pipeline, because scikit-learn generates the names from the lowercased class names. Let’s be explicit:
model = make_pipeline(StandardScaler(), RidgeCV())
Calling model.get_params() shows these generated names: the first step in the pipeline is called "standardscaler" and the second step "ridgecv".
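As a quick check of the generated names, here is a minimal sketch. It reads the names from model.steps and shows that the same names prefix the nested parameters returned by get_params(); the variable names step_names and nested_keys are only for illustration.

```python
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(), RidgeCV())

# Each entry of `steps` is a (name, estimator) pair; the names are generated
# from the lowercased class names.
step_names = [name for name, _ in model.steps]
print(step_names)

# Nested parameters are exposed as "<step name>__<parameter name>".
nested_keys = [key for key in model.get_params() if "__" in key]
print(nested_keys)
```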
Using Pipeline lets us define our own names (to make them shorter or more explicit).
model = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", RidgeCV()),
])
Calling model.get_params() again shows the difference: the steps now carry the names we defined ourselves.
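One place where the custom names matter is when setting nested parameters, since the step name becomes the prefix. The sketch below assumes the "scaler" and "regressor" names from above; the alphas values are arbitrary and chosen only for illustration.

```python
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

model = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", RidgeCV()),
])

custom_names = [name for name, _ in model.steps]
print(custom_names)

# The custom step name is the prefix for nested parameters
# (illustrative values, not a recommendation).
model.set_params(regressor__alphas=[0.1, 1.0, 10.0])
alphas = model.named_steps["regressor"].alphas
print(alphas)
```

With make_pipeline, the same call would be written with the generated prefix instead, i.e. ridgecv__alphas.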
The same pattern applies to ColumnTransformer and make_column_transformer.
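To illustrate the parallel, here is a small sketch comparing the two. The column names ("age", "income", "city") and the transformer names "num" and "cat" are hypothetical, picked only for this example.

```python
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names, for illustration only.
numeric_columns = ["age", "income"]
categorical_columns = ["city"]

# Names are generated from the lowercased class names.
auto_named = make_column_transformer(
    (StandardScaler(), numeric_columns),
    (OneHotEncoder(), categorical_columns),
)

# Names are chosen by us.
custom_named = ColumnTransformer([
    ("num", StandardScaler(), numeric_columns),
    ("cat", OneHotEncoder(), categorical_columns),
])

# Each entry of `transformers` is a (name, transformer, columns) triple.
auto_names = [name for name, _, _ in auto_named.transformers]
custom_names = [name for name, _, _ in custom_named.transformers]
print(auto_names)
print(custom_names)
```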
To summarize, make_pipeline lets you write less code, but you might not like the generated names; in that case, Pipeline lets you choose the step names yourself.