`fetch_california_housing` is a scikit-learn function that downloads the California housing dataset from the internet. It is specialized for this dataset only. Setting `as_frame=True` makes it return pandas objects (a dataframe for the data), while `return_X_y=True` makes it return two variables, `X` and `y`, which are the data and the target respectively. Without `return_X_y=True`, a dictionary-like `Bunch` object is returned instead; it contains the data and the target, plus some additional metadata.
`pd.read_csv` reads any CSV file, given the path to the file. The resulting dataframe then needs to be split by hand into the `X` and `y` variables.
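To make the effect of the two parameters concrete, here is a minimal sketch. It uses `load_diabetes` as a stand-in (my assumption, chosen only because it ships with scikit-learn and needs no download); `fetch_california_housing` accepts the same `as_frame` and `return_X_y` parameters.

```python
from sklearn.datasets import load_diabetes

# Stand-in loader (assumption): load_diabetes is bundled with scikit-learn,
# so no internet access is needed. fetch_california_housing takes the same
# as_frame / return_X_y parameters.

# With return_X_y=True we get two objects: the data and the target.
X, y = load_diabetes(as_frame=True, return_X_y=True)
print(type(X).__name__, type(y).__name__)

# Without return_X_y we get a dictionary-like Bunch with extra metadata.
bunch = load_diabetes(as_frame=True)
print(sorted(bunch.keys()))
```

The `Bunch` exposes the data and target under the `"data"` and `"target"` keys, and the dataset description under `"DESCR"`.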
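As a rough sketch of that manual split: the tiny in-memory CSV below, its values, and the choice of `MedHouseVal` as the target column are made up for illustration. With a real file you would pass its path to `pd.read_csv` instead of the `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical toy CSV kept in memory so the example is self-contained;
# with a real file, pass its path to pd.read_csv instead.
csv_file = io.StringIO(
    "MedInc,HouseAge,MedHouseVal\n"
    "8.3,41,4.5\n"
    "7.2,21,3.6\n"
    "5.6,52,3.4\n"
)
df = pd.read_csv(csv_file)

# Split the dataframe: every column except the target goes into X.
target_name = "MedHouseVal"  # assumed name of the target column
X = df.drop(columns=[target_name])
y = df[target_name]

print(X.shape, y.shape)
```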
When using Pipeline, you define the steps with a specific name:
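```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

model = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])
```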
So here, we choose the names of the steps ourselves (`"scaler"` and `"classifier"`). With `make_pipeline`, we don't provide any names: each one is generated automatically from the lowercased name of the class:
```python
from sklearn.pipeline import make_pipeline
model = make_pipeline(StandardScaler(), LogisticRegression())
```
In this case, the `StandardScaler` step will be called `"standardscaler"` and the `LogisticRegression` step will be called `"logisticregression"`.
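If you want to check the names yourself, a small sketch like the following should do. The variable names `named` and `auto` are only for illustration. The step name also matters in practice because it is the prefix used to reach a step's parameters through `set_params`.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline with names we chose ourselves.
named = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])
# Pipeline with names generated from the class names.
auto = make_pipeline(StandardScaler(), LogisticRegression())

named_step_names = [name for name, _ in named.steps]
auto_step_names = [name for name, _ in auto.steps]
print(named_step_names)  # ['scaler', 'classifier']
print(auto_step_names)   # ['standardscaler', 'logisticregression']

# The step name is the prefix for "<step>__<parameter>".
named.set_params(classifier__C=0.1)
auto.set_params(logisticregression__C=0.1)
```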
It means that an internal cross-validation takes place when calling `fit`. Passing such an estimator to `cross_validate` therefore results in two levels of cross-validation: an inner cross-validation done by the model itself and an outer cross-validation done by `cross_validate`.
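A possible illustration of the two levels, assuming the estimator in question behaves like `RidgeCV` (my choice for the example; the synthetic data, the `alphas` grid and the fold counts are arbitrary): each of the 5 outer fits runs its own 3-fold inner cross-validation to pick `alpha`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_validate

# Synthetic data, only for illustration.
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# Inner cross-validation: RidgeCV picks alpha with 3 folds inside fit.
alphas = [0.1, 1.0, 10.0]
model = RidgeCV(alphas=alphas, cv=3)

# Outer cross-validation: cross_validate fits the model 5 times and
# scores each fit on data the inner loop never saw.
cv_results = cross_validate(model, X, y, cv=5, return_estimator=True)

# One selected alpha per outer fold; they are not guaranteed to agree.
selected_alphas = [est.alpha_ for est in cv_results["estimator"]]
print(cv_results["test_score"])
print(selected_alphas)
```

The outer scores then estimate how well the whole procedure (including the choice of `alpha`) generalizes, rather than the performance of one fixed `alpha`.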