Questions 7 - 8 - 9 M2 final Quiz

fhi62 · 30 May 2021 20:37

Nothing happens when I run my code. Does it run entirely online, or does it rely on some local process that may not work on my computer?

from sklearn.model_selection import validation_curve

max_depth = [1, 2, 5, 10, 20, 50, 100, 200, 5005]

train_scores, test_scores = validation_curve(
    model, data, target, param_name="n_neighbors", param_range=max_depth,
    cv=5, scoring="balanced_accuracy", n_jobs=2)
train_errors, test_errors = -train_scores, -test_scores

import matplotlib.pyplot as plt
plt.plot(max_depth, train_errors.mean(axis=1), label="Training error")
plt.plot(max_depth, test_errors.mean(axis=1), label="Testing error")
plt.legend()

plt.xlabel("Number of N_neighbors")
plt.ylabel("Balanced accuracy")
_ = plt.title("Effect on Train & Test score")

from sklearn.model_selection import cross_validate
cv_results = cross_validate(model, data, target, scoring="balanced_accuracy", cv=5)
cv_results = pd.DataFrame(cv_results)
cv_results
scores = cv_result["test_score"]
print("The balanced cross-validation accuracy is: "
      f"{scores.mean():.3f} +/- {scores.std():.3f}")

glemaitre58 · 31 May 2021 08:25

Can you be more explicit? Taking your code and adding some code to load the dataset and create the model, I am able to see the validation curve.

cv_result is not defined; it should be cv_results.

Using this parameter (param_name="n_neighbors") tells me that you did not use a pipeline with a StandardScaler, while the exercise requires one:

Create a scikit-learn pipeline (using sklearn.pipeline.make_pipeline) where a StandardScaler will be used to scale the data followed by a KNeighborsClassifier. Use the default hyperparameters.
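
As a minimal sketch of what the exercise asks for, the pipeline and the corrected cross-validation call might look like the following. The dataset here is a synthetic stand-in built with make_classification (an assumption, since the thread does not show how the quiz data is loaded), so the printed scores will not match the quiz.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data standing in for the quiz dataset.
data, target = make_classification(n_samples=200, n_features=4, random_state=0)

# Scale the features, then classify, both with default hyperparameters.
model = make_pipeline(StandardScaler(), KNeighborsClassifier())

# make_pipeline names each step after its lowercased class name, so a
# hyperparameter inside the pipeline is addressed as "<step>__<parameter>".
param_name = "kneighborsclassifier__n_neighbors"
print(param_name in model.get_params())

cv_results = cross_validate(
    model, data, target, scoring="balanced_accuracy", cv=5)
cv_results = pd.DataFrame(cv_results)

# Note the variable name: cv_results, not cv_result.
scores = cv_results["test_score"]
print("The balanced cross-validation accuracy is: "
      f"{scores.mean():.3f} +/- {scores.std():.3f}")
```

With a pipeline, passing the bare name "n_neighbors" to validation_curve would fail because the pipeline itself has no such parameter; the prefixed name above is the one to use.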

glemaitre58 · 31 May 2021 08:41

After investigation, it could be due to n_jobs=2. Sometimes you get an explicit MemoryError, but sometimes Jupyter just kills the kernel for this reason and no error is shown. This might be what you experienced here. You can set n_jobs=1 to avoid this issue.
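
For illustration, a validation curve computed without parallel workers could be sketched as below. The synthetic dataset and the particular list of n_neighbors values are assumptions made so the snippet runs on its own; they are not the quiz's data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data standing in for the quiz dataset.
data, target = make_classification(n_samples=200, n_features=4, random_state=0)
model = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Each value must not exceed the number of samples in a training fold
# (160 here, with 200 samples and cv=5).
n_neighbors = [1, 5, 11, 21, 51]

train_scores, test_scores = validation_curve(
    model, data, target,
    param_name="kneighborsclassifier__n_neighbors",
    param_range=n_neighbors,
    cv=5, scoring="balanced_accuracy",
    n_jobs=1)  # single process: no extra workers, so less memory pressure

# One row per parameter value, one column per cross-validation fold.
for k, train, test in zip(n_neighbors,
                          train_scores.mean(axis=1),
                          test_scores.mean(axis=1)):
    print(f"n_neighbors={k:>3}: train={train:.3f}, test={test:.3f}")
```

Since balanced_accuracy is a score where higher is better, the values are reported directly here rather than negated into "errors".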

fhi62 · 31 May 2021 09:08

Solved. Thanks.