M5 wrap-up quiz question 2

Wisty · 17 April 2022 15:15

Is there an error in the answer to M5 wrap-up quiz question 2?

I answered the question correctly, but I also checked the answer. Because it differed from what I had done, I copied and pasted the code given in the answer into the notebook to check the result.
The proposed code actually raises an error that I do not understand. Could you help?

Code snippet proposed in the answer:

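import numpy as np
from sklearn.model_selection import GridSearchCV, cross_validate

# `tree`, `data_numerical` and `target` are defined in earlier cells of the quiz notebook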

params = {"max_depth": np.arange(1, 16)}
search = GridSearchCV(tree, params, cv=10)
cv_results_tree_optimal_depth = cross_validate(
    search, data_numerical, target, cv=10, return_estimator=True, n_jobs=2,
)

for search_cv in cv_results_tree_optimal_depth["estimator"]:
    print(search_cv.best_params_)

Error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_174/3789612302.py in <cell line: 11>()
     10 
     11 for search_cv in cv_results_tree_optimal_depth["estimator"]:
---> 12     print(search_cv.best_params_)

AttributeError: 'GridSearchCV' object has no attribute 'best_params_'
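For context: in scikit-learn, attributes ending with an underscore, such as `best_params_`, only exist once `fit` has completed, so this error most likely means the `GridSearchCV` object being printed was never successfully fitted. A minimal sketch, using iris as a stand-in dataset (an assumption, not the quiz data):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the quiz dataset is not reproduced here.
data, target = load_iris(return_X_y=True)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": np.arange(1, 16)},
    cv=10,
)

# Before fit: fitted attributes such as best_params_ do not exist yet.
print(hasattr(search, "best_params_"))  # False

search.fit(data, target)

# After a successful fit, best_params_ is available.
print(hasattr(search, "best_params_"))  # True
```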

Thanks in advance for your help!

glemaitre58 · 19 April 2022 11:47

That’s weird. I reused the same snippet of code with a dummy dataset (iris) and it works.

import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

data_numerical, target = load_iris(return_X_y=True)

tree = DecisionTreeClassifier()
params = {"max_depth": np.arange(1, 16)}
search = GridSearchCV(tree, params, cv=10)
cv_results_tree_optimal_depth = cross_validate(
    search, data_numerical, target, cv=10, return_estimator=True, n_jobs=2,
)

for search_cv in cv_results_tree_optimal_depth["estimator"]:
    print(search_cv.best_params_)

{'max_depth': 3}
{'max_depth': 4}
{'max_depth': 4}
{'max_depth': 4}
{'max_depth': 3}
{'max_depth': 5}
{'max_depth': 5}
{'max_depth': 9}
{'max_depth': 3}
{'max_depth': 3}

Could you check the state of your intermediate variables again?
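One possible way to follow this suggestion is to inspect the inputs and to let `cross_validate` raise any fitting error instead of hiding it. By default, `cross_validate` uses `error_score=np.nan`: if the fit fails in a fold, it warns, records NaN as that fold's score and, with `return_estimator=True`, can return an unfitted estimator for that fold, which would then lack `best_params_`. This is only a hypothesis; the thread does not confirm the cause. The sketch uses iris as a stand-in for the notebook's `tree`, `data_numerical` and `target` (assumption):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replace with the notebook's own variables.
data_numerical, target = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

# Inspect the intermediate variables before running the search.
print(type(tree).__name__, np.shape(data_numerical), np.shape(target))
print("NaN in data:", np.isnan(np.asarray(data_numerical, dtype=float)).any())

params = {"max_depth": np.arange(1, 16)}
search = GridSearchCV(tree, params, cv=10)

# error_score="raise" surfaces the original exception of a failing fit
# instead of recording NaN and returning an unfitted estimator.
cv_results = cross_validate(
    search, data_numerical, target, cv=10,
    return_estimator=True, error_score="raise",
)

# Each returned estimator should be fitted and expose best_params_.
fitted = [hasattr(est, "best_params_") for est in cv_results["estimator"]]
print(fitted)
```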

basilerichard · 20 April 2022 14:52

I got the same issue.

ArturoAmorQ · 20 April 2022 15:12

@Wisty, @basilerichard do you get the same error if you run the snippet of code provided by @glemaitre58 in the comment above?

Could you please provide a minimal code example that reproduces the error?

Wisty · 23 April 2022 05:36

Hi,

Thank you, and sorry for the delay in replying: I can only work on the course at weekends.
Would it be OK if I sent you the notebook downloaded from the quiz?

Best regards,
Wisty

Wisty · 23 April 2022 05:42

Hi,

I tried the snippet provided above in the same notebook and I don’t get any error (I just changed the variable names, just in case):

import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

data_num, targ = load_iris(return_X_y=True)

arbre = DecisionTreeClassifier()
params = {"max_depth": np.arange(1, 16)}
search = GridSearchCV(arbre, params, cv=10)
cv_results_tree_optimal_depth = cross_validate(
    search, data_num, targ, cv=10, return_estimator=True, n_jobs=2,
)

for search_cv in cv_results_tree_optimal_depth["estimator"]:
    print(search_cv.best_params_)

{'max_depth': 3}
{'max_depth': 6}
{'max_depth': 10}
{'max_depth': 2}
{'max_depth': 3}
{'max_depth': 13}
{'max_depth': 6}
{'max_depth': 4}
{'max_depth': 3}
{'max_depth': 3}

Perditax · 8 May 2022 10:41

I did not answer that question correctly because the text before the first question says: “use sklearn.linear_model.LinearRegression and sklearn.tree.DecisionTreeRegressor to create the models. Use the default parameters for both models.”

It turns out the solution does not use the default parameters for DecisionTreeRegressor: it sets random_state=0.

If you don’t set it, running the same code several times can give very different results.

I suggest rephrasing the text to ask people to use random_state=0 for DecisionTreeRegressor.
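To illustrate the point about `random_state`, here is a minimal sketch that runs the nested search twice with `DecisionTreeRegressor(random_state=0)` and compares the selected depths. It uses a synthetic dataset from `make_regression` as a stand-in for the quiz data (assumption), and `best_depths` is a hypothetical helper. Fixing `random_state` fixes the tree's internal feature permutation, so both runs pick the same `max_depth` in every outer fold; with the default `random_state=None`, ties between equally good splits may be broken differently from run to run.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the quiz data (assumption).
data, target = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=0
)


def best_depths(random_state):
    """Hypothetical helper: best max_depth found in each outer CV fold."""
    search = GridSearchCV(
        DecisionTreeRegressor(random_state=random_state),
        {"max_depth": np.arange(1, 16)},
        cv=10,
    )
    cv_results = cross_validate(
        search, data, target, cv=10, return_estimator=True
    )
    return [est.best_params_["max_depth"] for est in cv_results["estimator"]]


# With a fixed random_state, two runs select the same depths fold by fold.
first_run = best_depths(random_state=0)
second_run = best_depths(random_state=0)
print(first_run == second_run)  # True
```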

ArturoAmorQ · 10 May 2022 12:42

We’ll look for a question with less variance for the next session. Thanks for your feedback.

DanielFinol · 15 May 2022 23:33

Thank you for this comment. Setting random_state was crucial.