Potentially misleading comment on computational cost of BaggingRegressor

In the “Introductory example to ensemble models” exercise, you say that with BaggingRegressor:

the computational cost is reduced in comparison to seeking the optimal hyperparameters.

But on the server this is not obvious:


As you can see, on your server the CPU time and wall time are actually a little worse for BaggingRegressor.
But if I run it locally on a Linux kernel (Ubuntu under WSL2):
[screenshot: CPU and wall times for the grid search and BaggingRegressor on Linux]

So locally there is a 6× improvement in CPU time and a 2× improvement in wall time.
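
For context, here is a minimal sketch of the kind of comparison being timed; the dataset (California housing) and the parameter values below are assumptions, not the exact notebook code:

```python
import time

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = fetch_california_housing(return_X_y=True)

# Bagging with the default DecisionTreeRegressor base estimator.
bagging = BaggingRegressor(n_estimators=20, n_jobs=2)

# Grid search over a single decision tree's hyperparameters.
grid_search = GridSearchCV(
    DecisionTreeRegressor(),
    param_grid={"max_depth": [3, 5, 8, None]},
    cv=5,
    n_jobs=2,
)

for name, model in [("bagging", bagging), ("grid search", grid_search)]:
    start = time.perf_counter()
    model.fit(X, y)
    print(f"{name:12s} wall time: {time.perf_counter() - start:.2f} s")
```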

P.S.: I had to use Linux, since %%time does not report CPU times on Windows.
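
In case it is useful, here is a small sketch of a cross-platform way to get CPU time from plain Python instead of %%time; time.process_time only measures the current process, so n_jobs=1 is used to keep the numbers meaningful:

```python
# Cross-platform CPU and wall time measurement without %%time.
# Note: time.process_time() only counts the current process, so work done
# in worker processes spawned via n_jobs is not included.
import time

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import BaggingRegressor

X, y = fetch_california_housing(return_X_y=True)

cpu_start, wall_start = time.process_time(), time.perf_counter()
BaggingRegressor(n_estimators=20, n_jobs=1).fit(X, y)
print(f"CPU time:  {time.process_time() - cpu_start:.2f} s")
print(f"Wall time: {time.perf_counter() - wall_start:.2f} s")
```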

Yep, we wrote the notebook on a laptop with 4 cores.
@ogrisel How many cores do we provide on the server? It might be only 2.
And the fit time is rather small as well. It would be best to test something that takes at least ~100 ms.

@ogrisel How many cores do we provide on the server? It might be only 2.

import joblib; joblib.cpu_count() says 4.
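
For what it's worth, a few ways to cross-check how many cores are actually usable in the notebook environment (standard library and joblib calls only; on Linux the affinity mask also reflects restrictions a container may impose):

```python
import os
import joblib

# Logical cores as seen by joblib and by the OS.
print("joblib.cpu_count():", joblib.cpu_count())
print("os.cpu_count():    ", os.cpu_count())

# On Linux, the CPU affinity mask reflects any cpuset restriction applied
# to the process (e.g. by a container), which may be smaller than cpu_count.
if hasattr(os, "sched_getaffinity"):
    print("usable cores (affinity):", len(os.sched_getaffinity(0)))
```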

We should probably look at it for v2 if the comment is not up-to-date.

Side-comment: in the notebooks we have removed n_jobs=-1 in favor of n_jobs=2.
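
Related to that, joblib can show what n_jobs=-1 would resolve to on a given machine, which is one reason a fixed n_jobs=2 gives more comparable timings across machines (a small illustration, not notebook code):

```python
import joblib

# n_jobs=-1 means "use all detected cores", so the resolved value depends
# on the machine running the notebook; n_jobs=2 is the same everywhere.
print(joblib.effective_n_jobs(-1))  # e.g. 4 on a 4-core laptop
print(joblib.effective_n_jobs(2))   # always 2
```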

On the .github.io it seems fine:
https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_introduction.html

Worth another look for v3 to see whether the problem still exists on the JupyterHub …