Q7 - Taking too long to run validation_curve on data

Geosphere · 23 February 2022 15:46

Hello,

Is there a way to speed up cross-validation over a parameter range? I'm using n_jobs=-1 to speed it up, but the cell is still running after more than 30 minutes.

lesteve · 23 February 2022 16:40

Hmmm, weird. Maybe it is related to "Very slow cross validation and accuracy reproducibility" (post #7 by lesteve)?

Geosphere · 23 February 2022 16:49

I’ll try the proposed solution. It might be related.

lesteve · 23 February 2022 16:50

For the record, running the full solution of this wrap-up quiz takes less than two seconds for me inside the FUN Jupyter.

Geosphere · 23 February 2022 16:53

Hmm, I stopped and restarted the Jupyter Server. It looks like it did not help.

Geosphere · 23 February 2022 17:03

It worked after restarting again.

lesteve · 23 February 2022 17:04

Great, the good old "turn it off and on again", but multiple times.

AHC2022 · 28 February 2022 19:08

Hi! I hope you managed to solve it. In my case, I solved the same issue by giving the correct name for the parameter to tune (the names are listed by get_params()).
Oddly enough, when the name is wrong, the notebook keeps running rather than raising an error.

jarkendar · 3 March 2022 20:20

Same for me, this was the solution: param_name="kneighborsclassifier__n_neighbors"

For some reason, when the name is wrong, it keeps running forever instead of raising an error.
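
For anyone landing here later, this is a minimal sketch of how the exact name can be looked up with get_params(). It assumes a pipeline built with make_pipeline; the StandardScaler is only a stand-in for the quiz's own preprocessor.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# StandardScaler is an assumption standing in for the quiz's preprocessor.
model = make_pipeline(StandardScaler(), KNeighborsClassifier())

# make_pipeline names each step after its lowercased class name, and nested
# parameters are addressed as "<step name>__<parameter name>".
param_names = sorted(model.get_params().keys())
for name in param_names:
    if "n_neighbors" in name:
        print(name)  # prints kneighborsclassifier__n_neighbors
```

The bare name "n_neighbors" is not in that list, which is why it is not a valid param_name for the pipeline.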

lesteve · 7 March 2022 14:36

Thanks for the report, I can indeed reproduce it. Looking into it in a bit more detail, this is due to an IPython bug: https://github.com/ipython/ipython/issues/12467.

We should probably use IPython < 8 for the MOOC until this IPython bug is fixed.
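
To check which IPython version a given kernel is running, something along these lines should work. The helper name is made up for illustration, and the threshold of 8 is simply the one mentioned above.

```python
def is_before_ipython_8(version: str) -> bool:
    """Return True when the major version number is lower than 8."""
    major = int(version.split(".")[0])
    return major < 8


try:
    import IPython

    # IPython.__version__ is the installed version as a string.
    print(IPython.__version__, is_before_ipython_8(IPython.__version__))
except ImportError:
    print("IPython is not installed in this environment")
```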

lesteve · 8 March 2022 09:21

This has now been done on the FUN Jupyter notebooks. You may need to restart your server by doing something along these lines:

go to https://cloud-mooc.inria.fr/hub/home
click on Stop Server
click on Start Server
reload the FUN page with the Jupyter notebook

anam_zahra · 5 April 2022 15:08

Can you share the block of code for using validation_curve?

I am doing it like this, but I get an error saying that it takes 3 arguments and I am passing 5.

neighbors_range = [1, 2, 5, 10, 20, 50, 100, 200, 500]
model = make_pipeline(numerical_processor, KNeighborsClassifier())
from sklearn.model_selection import validation_curve
train_scores, test_scores = validation_curve(
    KNeighborsClassifier(),
    data,
    target,
    "n_neighbors",
    neighbors_range)
train_scores_mean = train_scores.mean()
test_scores_mean = test_scores.mean()
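
A possible fix, as a sketch rather than the official quiz solution. The "takes 3 arguments" error most likely comes from param_name and param_range being keyword-only in recent scikit-learn versions, so they have to be passed by name. The sketch also passes the pipeline instead of a bare KNeighborsClassifier, using the prefixed parameter name mentioned earlier in this thread, and averages the scores per parameter value. The synthetic data and the StandardScaler are assumptions standing in for the quiz's data, target and numerical_processor.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data and StandardScaler are assumptions standing in for the
# quiz's `data`, `target` and `numerical_processor`.
data, target = make_classification(n_samples=1000, n_features=10, random_state=0)
model = make_pipeline(StandardScaler(), KNeighborsClassifier())

neighbors_range = [1, 2, 5, 10, 20, 50, 100, 200, 500]
train_scores, test_scores = validation_curve(
    model,  # pass the pipeline, not a bare KNeighborsClassifier
    data,
    target,
    param_name="kneighborsclassifier__n_neighbors",  # must be a keyword argument
    param_range=neighbors_range,  # must be a keyword argument
    cv=5,
)

# Scores have shape (n_param_values, n_cv_folds); average over the folds
# to get one mean score per value of n_neighbors.
train_scores_mean = train_scores.mean(axis=1)
test_scores_mean = test_scores.mean(axis=1)
```

Calling .mean() without axis=1 would collapse everything into a single number, which hides how the score changes with n_neighbors.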