For question 2, I ran GridSearchCV inside cross_validate on a pipeline made of SimpleImputer and DecisionTreeRegressor. The optimal max_depth I got ranged from 5 to 15 (although this changes when I repeat the experiment). Of the possible answers, 5-8 seems closest, but none of the available answers seems correct. Am I doing something wrong?
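For readers following along, here is a minimal sketch of the kind of setup described above: a grid search over max_depth nested inside an outer cross-validation. The synthetic dataset, the fold counts, and the variable names are assumptions for illustration only; the exercise uses its own data and may use different settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption: the exercise uses a different dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# make_pipeline names each step after its lowercased class name,
# hence the "decisiontreeregressor__" prefix in the parameter grid.
model = make_pipeline(SimpleImputer(), DecisionTreeRegressor(random_state=0))
param_grid = {"decisiontreeregressor__max_depth": np.arange(1, 16)}

# Inner loop: grid search picks max_depth. Outer loop: cross_validate
# refits the whole search on each training fold.
search = GridSearchCV(model, param_grid=param_grid, cv=5)
cv_results = cross_validate(search, X, y, cv=5, return_estimator=True)

# One selected max_depth per outer fold.
best_depths = [
    int(est.best_params_["decisiontreeregressor__max_depth"])
    for est in cv_results["estimator"]
]
print(best_depths)
```

Because each outer fold runs its own search on a different training subset, the selected depth can differ from fold to fold and from run to run, which is consistent with the variation reported here.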
Hi, I used this snippet to visualize the optimal range.
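import pandas as pd
import seaborn as sns
sns.set_context("talk")
# cv_results comes from cross_validate(..., return_estimator=True)
# Collect the best max_depth found by the grid search on each outer fold
max_depth = [
    estimator.best_params_["decisiontreeregressor__max_depth"]
    for estimator in cv_results["estimator"]
]
max_depth = pd.Series(max_depth, name="max depth")
sns.swarmplot(max_depth)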
I had to run the code several times to get one outlier as you did. If you look at the plots you get, you can tell which range is optimal. (No spoilers, please.)
Thank you. I like the graph, it is much nicer to look at than just a list…
Interesting, I get the ‘outlier’ every single time. Its value is different each time, but it is always there. Therefore none of the answers seems correct.
Am I missing something in my analysis?
One more thing to add: the solution uses a NumPy function to generate the max_depth values:
np.arange(1, 15)
but this generates an array from 1 to 14, whereas the exercise asks for a max depth from 1 to 15, which should be:
np.arange(1, 16)
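A quick check of the half-open interval behaviour of np.arange, where the stop value is excluded. The variable names are only for illustration:

```python
import numpy as np

# np.arange(start, stop) excludes stop, like Python's built-in range.
depths_short = np.arange(1, 15)  # ends at 14
depths_full = np.arange(1, 16)   # ends at 15

print(depths_short[-1], len(depths_short))  # 14 14
print(depths_full[-1], len(depths_full))    # 15 15
```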
I am now reading through the solution for Q2, and it contains the answer for Q3. Perhaps this needs to be corrected?
We could provide the range function to avoid the issue.
I think it has been done.