Best max_leaf_nodes differs from the previous exercise

Hi,

I noticed that in the best combination of parameters found by grid_search, the max_leaf_nodes value is 30, whereas it was 10 in the previous exercise. I think that the remark ‘The accuracy and the best parameters of the grid-searched pipeline are similar to the ones we found in the previous exercise, where we searched the best parameters “by hand” through a double for loop.’ should be adjusted.

Hi,

I agree that, as currently worded, the claim that the results are similar is not obvious, since max_leaf_nodes=30 is indeed higher than max_leaf_nodes=10.

But it is also true that the impact of max_leaf_nodes is rather small for that given range of values once learning_rate=0.1 is fixed:

   learning_rate  max_leaf_nodes  mean_test_score  std_test_score  rank_test_score
3            0.1              25         0.868827        0.001068                1
2            0.1              20         0.868690        0.000550                2
1            0.1              15         0.868281        0.000358                3
4            0.1              30         0.868063        0.000850                4
0            0.1              10         0.866425        0.000359                5

In this sense, max_leaf_nodes=10 and max_leaf_nodes=30 are “similar”: they are of the same order of magnitude and have nearly the same impact on the score.
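As a rough check, the numbers in the table above can be compared directly. This is only a minimal sketch: the rows are copied by hand from the table, and the variable names (rows, scores, spread, gap_10_30) are chosen for illustration and do not come from the exercise.

```python
# Rows copied by hand from the table above, all with learning_rate=0.1:
# (max_leaf_nodes, mean_test_score, std_test_score)
rows = [
    (25, 0.868827, 0.001068),
    (20, 0.868690, 0.000550),
    (15, 0.868281, 0.000358),
    (30, 0.868063, 0.000850),
    (10, 0.866425, 0.000359),
]

# Map each max_leaf_nodes value to its mean test score.
scores = {leaves: mean for leaves, mean, _ in rows}

# Best and worst settings within this table only.
best = max(scores, key=scores.get)
worst = min(scores, key=scores.get)

# Gap between the best and worst mean test scores.
spread = scores[best] - scores[worst]

# Gap between the two values discussed in this thread.
gap_10_30 = scores[30] - scores[10]

print(f"best={best}, worst={worst}, spread={spread:.6f}")
print(f"score(30) - score(10) = {gap_10_30:.6f}")
```

Both gaps are small in absolute terms, a fraction of a percentage point of accuracy, which is the sense in which the two settings behave similarly here.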

In any case, any suggested wording to make this clearer is welcome.
