Question 3: Evaluate the pipeline

Hi

I tried to run the exam code, but I get an error on the code that is shown as the answer.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
model = Pipeline(steps=[
    ("preprocessor", StandardScaler()),
    ("classifier", KNeighborsClassifier(n_neighbors=5)),
])

from sklearn.model_selection import cross_validate

model.set_params(preprocessor=StandardScaler(), classifier__n_neighbors=5)
cv_results_ss_5 = cross_validate(
    model, data, target, cv=10, scoring="balanced_accuracy"
)
cv_results_ss_5["test_score"].mean(), cv_results_ss_5["test_score"].std()

I get this error: ValueError: could not convert string to float: ' Private'

Has anyone else run into this error?

I am afraid I can't reproduce the error. Are you sure you first ran the provided snippet to load the penguins dataset?

import pandas as pd

penguins = pd.read_csv("../datasets/penguins.csv")

columns = ["Body Mass (g)", "Flipper Length (mm)", "Culmen Length (mm)"]
target_name = "Species"

# Remove lines with missing values for the columns of interest
penguins_non_missing = penguins[columns + [target_name]].dropna()

data = penguins_non_missing[columns]
target = penguins_non_missing[target_name]

To me it sounds like the data you are using contains a categorical variable (a string column) that StandardScaler cannot handle, since it only works on numerical values.
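The value ' Private' does not appear in any of the penguins columns selected above, so my guess is that `data` is still left over from another notebook or an earlier cell. A quick way to check is to list the non-numeric columns of `data` before running the pipeline. The snippet below is only a sketch: the small DataFrame and its column names ("age", "workclass") are made up to stand in for whatever your `data` currently holds.

```python
import pandas as pd

# Hypothetical stand-in for a `data` frame that still contains a string column.
# The column names and values here are invented for illustration only.
data = pd.DataFrame({
    "age": [25, 38, 52],
    "workclass": [" Private", " Private", " State-gov"],
})

# Any column listed here is not numeric and will make StandardScaler fail.
non_numeric_columns = data.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_columns)
```

If you run the last two lines on your own `data` after loading the penguins snippet, the list should be empty, because the three selected columns are all numerical measurements.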