M5 wrap-up quiz question 2

Is there an error in the answer to M5 wrap-up quiz question 2?

I answered the question correctly, but I also checked the answer. Because it differed from what I had done, I copied and pasted the code from the answer into the notebook to check the result.
The proposed code raises an error that I do not understand. Could you help?

Code snippet proposed in the answer:

import numpy as np
from sklearn.model_selection import GridSearchCV

params = {"max_depth": np.arange(1, 16)}
search = GridSearchCV(tree, params, cv=10)
cv_results_tree_optimal_depth = cross_validate(
    search, data_numerical, target, cv=10, return_estimator=True, n_jobs=2,
)

for search_cv in cv_results_tree_optimal_depth["estimator"]:
    print(search_cv.best_params_)

Error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_174/3789612302.py in <cell line: 11>()
     10 
     11 for search_cv in cv_results_tree_optimal_depth["estimator"]:
---> 12     print(search_cv.best_params_)

AttributeError: 'GridSearchCV' object has no attribute 'best_params_'

Thanks in advance for your help!

That’s weird. I reused the same snippet of code with a toy dataset and it works.

import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

data_numerical, target = load_iris(return_X_y=True)

tree = DecisionTreeClassifier()
params = {"max_depth": np.arange(1, 16)}
search = GridSearchCV(tree, params, cv=10)
cv_results_tree_optimal_depth = cross_validate(
    search, data_numerical, target, cv=10, return_estimator=True, n_jobs=2,
)

for search_cv in cv_results_tree_optimal_depth["estimator"]:
    print(search_cv.best_params_)

{'max_depth': 3}
{'max_depth': 4}
{'max_depth': 4}
{'max_depth': 4}
{'max_depth': 3}
{'max_depth': 5}
{'max_depth': 5}
{'max_depth': 9}
{'max_depth': 3}
{'max_depth': 3}

Could you check the state of your intermediate variables again?
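
One way to inspect that state is sketched below. It is only an illustration on the iris data with a reduced parameter grid (both are assumptions, not the quiz setup), and it does not claim to identify the cause of the error reported above. It shows a behaviour worth ruling out: cross_validate fits clones of the search, so the `search` object passed in stays unfitted and has no best_params_, while the estimators returned under "estimator" do.

```python
import sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Toy data and a small grid, chosen only to keep the check fast (assumptions).
X, y = load_iris(return_X_y=True)
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), {"max_depth": [1, 2, 3]}, cv=5
)

# cross_validate clones `search` for every fold; the original is never fitted.
results = cross_validate(search, X, y, cv=5, return_estimator=True)

print("scikit-learn version:", sklearn.__version__)

# The template object has no fitted attributes such as best_params_.
template_is_fitted = hasattr(search, "best_params_")
print("template fitted:", template_is_fitted)

# Each estimator returned by cross_validate is a fitted GridSearchCV.
folds_are_fitted = [hasattr(est, "best_params_") for est in results["estimator"]]
print("per-fold fitted:", folds_are_fitted)
```

If the loop in a notebook iterates over a list of unfitted searches (for example, a variable overwritten by an earlier cell), the per-fold check would show False and accessing best_params_ would raise the same AttributeError.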

I got the same issue.

@Wisty, @basilerichard do you get the same error if you run the snippet of code provided by @glemaitre58 in the comment above?

Could you please provide a minimal code example that reproduces the error?

Hi,

Thank you, and sorry for the delay in replying: I can only work on the course on weekends.
Is it OK if I send you the notebook downloaded from the quiz?

Best regards,
Wisty

Hi,

I tried the snippet provided above in the same notebook and I don’t get any error.
(I only changed the variable names, just in case.)

import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

data_num, targ = load_iris(return_X_y=True)

arbre = DecisionTreeClassifier()
params = {"max_depth": np.arange(1, 16)}
search = GridSearchCV(arbre, params, cv=10)
cv_results_tree_optimal_depth = cross_validate(
    search, data_num, targ, cv=10, return_estimator=True, n_jobs=2,
)

for search_cv in cv_results_tree_optimal_depth["estimator"]:
    print(search_cv.best_params_)

{'max_depth': 3}
{'max_depth': 6}
{'max_depth': 10}
{'max_depth': 2}
{'max_depth': 3}
{'max_depth': 13}
{'max_depth': 6}
{'max_depth': 4}
{'max_depth': 3}
{'max_depth': 3}

I did not answer that question correctly because the text before the first question says “use sklearn.linear_model.LinearRegression and sklearn.tree.DecisionTreeRegressor to create the models. Use the default parameters for both models.”

It turns out the solution does not use the default parameters for DecisionTreeRegressor. Instead, it uses random_state=0.

Without a fixed random_state, running the same code several times can give very different results.

I suggest rephrasing the text so that people use random_state=0 for DecisionTreeRegressor.
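
The effect of fixing the seed can be sketched as follows. This is a minimal illustration on synthetic data from make_regression (an assumption, not the quiz dataset), and the helper best_depths is hypothetical. With random_state=0 the selected depths are identical from one run to the next; with the default random_state=None they are free to change between runs, although they will not necessarily do so on every dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the quiz data (an assumption, not the course dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)


def best_depths(random_state):
    """Return the max_depth selected in each outer fold of a nested CV."""
    tree = DecisionTreeRegressor(random_state=random_state)
    search = GridSearchCV(tree, {"max_depth": np.arange(1, 16)}, cv=5)
    results = cross_validate(search, X, y, cv=5, return_estimator=True)
    return [int(est.best_params_["max_depth"]) for est in results["estimator"]]


# Seeded runs: the two lists are identical.
seeded_first = best_depths(random_state=0)
seeded_second = best_depths(random_state=0)
print("seeded runs match:", seeded_first == seeded_second)

# Unseeded runs (the default): the two lists may or may not match.
print("unseeded run 1:", best_depths(random_state=None))
print("unseeded run 2:", best_depths(random_state=None))
```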

We’ll look for a question with less variance for the next session. Thanks for your feedback.

Thank you for this comment. That random setting was crucial.
