Q7 - Taking too long to run validation_curve on data

Hello,

Is there a way to speed up cross-validation over a parameter range? I’m using n_jobs=-1 to speed things up, but the cell is still running after more than 30 minutes.

Hmm, weird. Maybe it is related to "Very slow cross validation and accuracy reproducibility" (post #7 by lesteve)?

I’ll try the proposed solution. It might be related.

For the record, the full solution to this wrap-up quiz runs in less than two seconds for me inside the FUN Jupyter.

Hmm, I stopped and restarted the Jupyter Server. It looks like it did not help.

It worked after restarting again. :slight_smile:

Great, the good old “turn it off and on again”, but multiple times :wink:

Hi! I hope you managed to solve it. In my case, I solved the same issue by naming the parameter to tune correctly (the valid names are listed by get_params()).
Oddly enough, when the name is wrong, the notebook keeps running rather than raising an error.

Same for me. This was the solution: param_name="kneighborsclassifier__n_neighbors"

For some reason, when the name is wrong, it keeps running forever instead of raising an error.
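
To illustrate the two replies above, here is a minimal sketch of how one might look up the valid name before calling validation_curve. It assumes a pipeline built with make_pipeline; StandardScaler is only a stand-in preprocessor and is not necessarily what the quiz uses.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: StandardScaler stands in for whatever preprocessing the quiz uses.
model = make_pipeline(StandardScaler(), KNeighborsClassifier())

# make_pipeline names each step after its lowercased class name, and nested
# parameters are addressed as "<step name>__<parameter name>".
for name in model.get_params():
    if "n_neighbors" in name:
        print(name)  # prints "kneighborsclassifier__n_neighbors"

# Cheap guard to run before a long cross-validation: fail fast on a bad name.
param_name = "kneighborsclassifier__n_neighbors"
assert param_name in model.get_params(), f"unknown parameter: {param_name}"
```

Checking the name up front like this avoids depending on the error message, which is what appeared to get lost in the cases described above.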

Thanks for the report, I can indeed reproduce it. Looking in a bit more detail, this is due to an IPython bug: https://github.com/ipython/ipython/issues/12467.

We should probably use IPython < 8 for the MOOC until this IPython bug is fixed.
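
If you run the notebooks locally, a quick way to see which side of that version boundary you are on could look like the sketch below. The helper name ipython_major_version is hypothetical, and the "8 or later" threshold is taken only from the suggestion above, not from the IPython issue itself.

```python
from importlib.metadata import PackageNotFoundError, version


def ipython_major_version():
    """Return the installed IPython major version, or None if IPython is absent."""
    try:
        return int(version("ipython").split(".")[0])
    except PackageNotFoundError:
        return None


major = ipython_major_version()
if major is None:
    print("IPython is not installed in this environment")
elif major >= 8:
    # Threshold from the thread: IPython < 8 is suggested until the bug is fixed.
    print(f"IPython {major}.x: in the range the thread suggests avoiding")
else:
    print(f"IPython {major}.x: below 8")
```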

This has now been done on the FUN Jupyter notebooks, you may need to restart your server by doing something along these lines:

Can you share the block of code for using validation_curve?

I am doing it like this, but I get an error saying that it takes 3 arguments while I am passing 5.

from sklearn.model_selection import validation_curve

neighbors_range = [1, 2, 5, 10, 20, 50, 100, 200, 500]
model = make_pipeline(numerical_processor, KNeighborsClassifier())
train_scores, test_scores = validation_curve(
    KNeighborsClassifier(),
    data,
    target,
    "n_neighbors",
    neighbors_range,
)
train_scores_mean = train_scores.mean()
test_scores_mean = test_scores.mean()
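
A likely explanation for that error: in recent scikit-learn releases, validation_curve accepts only the estimator, the data and the target positionally, so param_name and param_range have to be passed as keywords. The snippet above also builds the pipeline `model` but passes a bare KNeighborsClassifier() instead. The sketch below shows one way the call could look. It runs on synthetic data from make_classification, uses StandardScaler as a stand-in for `numerical_processor`, and shortens the parameter range to fit the small synthetic dataset; all three are assumptions for illustration, not the quiz setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the quiz data (assumption, not the MOOC dataset).
data, target = make_classification(n_samples=300, n_features=5, random_state=0)

# StandardScaler stands in for the thread's `numerical_processor` (assumption).
model = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Kept below the training-fold size (240 samples with cv=5 on 300 samples).
neighbors_range = [1, 2, 5, 10, 20, 50, 100]

train_scores, test_scores = validation_curve(
    model,
    data,
    target,
    param_name="kneighborsclassifier__n_neighbors",  # step name + "__" + parameter
    param_range=neighbors_range,
    cv=5,
    n_jobs=-1,
)

# Scores have shape (n_param_values, n_folds); average over folds (axis=1)
# to get one mean score per value of n_neighbors.
train_scores_mean = train_scores.mean(axis=1)
test_scores_mean = test_scores.mean(axis=1)
```

Note that calling .mean() without an axis, as in the snippet above, collapses everything to a single number, so the per-parameter curve is lost.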