Q2: none of the answers seems to be correct?

miwojc · 27 June 2021 06:35

For question 2, I ran GridSearchCV inside cross_validate (nested cross-validation) on a DecisionTreeRegressor in a pipeline with a SimpleImputer. The optimal max_depth values I got ranged from 5 to 15 (although this changes when I repeat the experiment). Of the possible answers, 5-8 seems the closest, but none of the available answers seems correct. Am I doing something wrong?

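A minimal sketch of the setup described above (GridSearchCV over max_depth, wrapped in cross_validate), not the original poster's code. It assumes a synthetic regression dataset with about 5% of the values set to NaN in place of the quiz data, which is not shown in this thread; the fold counts (5 inner, 10 outer) and the random_state values are also assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the quiz data (assumption): 300 samples, 5 features,
# with roughly 5% of the values replaced by NaN so the imputer has work to do.
rng = np.random.RandomState(0)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X[rng.uniform(size=X.shape) < 0.05] = np.nan

# make_pipeline names each step after its lowercased class name, which is
# where the "decisiontreeregressor__" prefix in the grid comes from.
model = make_pipeline(SimpleImputer(), DecisionTreeRegressor(random_state=0))
param_grid = {"decisiontreeregressor__max_depth": np.arange(1, 16)}  # 1..15
search = GridSearchCV(model, param_grid=param_grid, cv=5)

# Outer loop of the nested cross-validation: each of the 10 folds refits the
# whole grid search, and return_estimator=True keeps every fitted search.
cv_results = cross_validate(search, X, y, cv=10, return_estimator=True)

best_depths = [est.best_params_["decisiontreeregressor__max_depth"]
               for est in cv_results["estimator"]]
print(best_depths)
```

Fixing random_state on the tree makes a single run reproducible; without it, ties between equally good splits can be broken differently from one run to the next.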

ThomasLoock · 27 June 2021 09:49

Hi, I used this snippet to visualize the optimal range.

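import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_context("talk")

# cv_results comes from cross_validate(..., return_estimator=True) run on the
# GridSearchCV; each outer fold keeps its fitted search object.
max_depth = [estimator.best_params_["decisiontreeregressor__max_depth"]
             for estimator in cv_results["estimator"]]

max_depth = pd.Series(max_depth, name="max depth")
sns.swarmplot(x=max_depth)
plt.show()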

I had to run the code several times to get a single outlier like yours. If you look at the plots you get, you will know which range is optimal (no spoilers, please).
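As a text-only complement to the plot, the winning max_depth values can also be tallied per outer fold. This is a small sketch; the helper name summarize_best_depths is hypothetical, and the example input is made up for illustration, not a result from the quiz.

```python
import pandas as pd


def summarize_best_depths(best_depths):
    """Count how often each max_depth value was selected across outer folds.

    Returns a Series indexed by max_depth (sorted ascending) whose values are
    the number of outer folds that picked that depth.
    """
    counts = pd.Series(best_depths, name="max depth").value_counts()
    return counts.sort_index()


# Hypothetical values for illustration only, not results from the exercise.
example = [2, 3, 3, 4, 3, 10]
print(summarize_best_depths(example))
```

A lone value far from the others (10 in this made-up input) shows up as a count of 1, which is the same "outlier" you would see as an isolated point in the swarm plot.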

miwojc · 27 June 2021 10:11

Thank you. I like the graph; it is much nicer to look at than just a list.

Interestingly, I get the ‘outlier’ every single time: its value is different each time, but it is always there. Therefore none of the answers seems correct.

Am I missing something in my analysis?
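One way to check whether such an outlier is a stable feature of the data or an artifact of one particular split is to repeat the nested cross-validation with differently shuffled outer folds. The thread does not say why the results varied between runs; in this sketch the variation is introduced explicitly through the KFold random_state. The synthetic data, the fold counts, and the helper name best_depths_for_seed are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the quiz data (assumption).
rng = np.random.RandomState(0)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X[rng.uniform(size=X.shape) < 0.05] = np.nan

model = make_pipeline(SimpleImputer(), DecisionTreeRegressor(random_state=0))
param_grid = {"decisiontreeregressor__max_depth": np.arange(1, 16)}


def best_depths_for_seed(seed, n_outer=5, n_inner=3):
    """Run one nested CV with shuffled outer folds; return the chosen depths."""
    search = GridSearchCV(model, param_grid=param_grid, cv=n_inner)
    outer_cv = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    results = cross_validate(search, X, y, cv=outer_cv, return_estimator=True)
    return [int(est.best_params_["decisiontreeregressor__max_depth"])
            for est in results["estimator"]]


# Each seed partitions the data differently, so the depth picked in each
# outer fold (including any outlier) can move from one run to the next.
runs = {seed: best_depths_for_seed(seed) for seed in range(3)}
for seed, depths in runs.items():
    print(seed, sorted(depths))
```

If an isolated depth appears in every run but at a different value, the bulk of the distribution is the more informative part when reading off a range.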

miwojc · 27 June 2021 10:27

Are you talking about an analysis like this one, based on which there is indeed a correct answer?


miwojc · 27 June 2021 10:52

One more thing to add: the solution generates the max_depth values with the NumPy function

np.arange(1, 15)

However, this generates an array from 1 to 14, while the exercise asks for max_depth from 1 to 15, which would be:

np.arange(1, 16)
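A quick check of the off-by-one: np.arange, like Python's built-in range, excludes the stop value, so covering 1 to 15 inclusive needs a stop of 16.

```python
import numpy as np

# np.arange excludes the stop value, just like the built-in range().
too_short = np.arange(1, 15)
correct = np.arange(1, 16)

print(too_short.min(), too_short.max(), too_short.size)  # 1 14 14
print(correct.min(), correct.max(), correct.size)        # 1 15 15

# range() follows the same half-open convention, so it also needs 16.
assert list(range(1, 16)) == correct.tolist()
```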

miwojc · 27 June 2021 10:56

Now reading through the solution for Q2: it contains the answer for Q3. Perhaps it needs to be corrected?

glemaitre58 · 27 June 2021 15:08

We could provide the range function to avoid the issue.

lesteve · 28 January 2022 16:25

I think it has been done.