Best max_leaf_nodes differs from the previous exercise

MSavel · 27 November 2022 16:48

Hi,

I notice that in the best combination of parameters found by grid_search, the max_leaf_nodes value is 30, whereas it was 10 in the previous exercise. I think that the remark ‘The accuracy and the best parameters of the grid-searched pipeline are similar to the ones we found in the previous exercise, where we searched the best parameters “by hand” through a double for loop.’ should be adjusted.

ArturoAmorQ · 28 November 2022 11:01

Hi,

I agree that our statement that the results are similar is not obvious at first sight, since max_leaf_nodes=30 is indeed higher than max_leaf_nodes=10.

But it is also true that the impact of max_leaf_nodes is rather small over that range of values once learning_rate=0.1 is fixed (the mean test score changes by only about 0.002 between the best and the worst setting):

   learning_rate  max_leaf_nodes  mean_test_score  std_test_score  rank_test_score
3            0.1              25         0.868827        0.001068                1
2            0.1              20         0.868690        0.000550                2
1            0.1              15         0.868281        0.000358                3
4            0.1              30         0.868063        0.000850                4
0            0.1              10         0.866425        0.000359                5

In this sense, the values of max_leaf_nodes between 10 and 30 are “similar”, as in “same order of magnitude” and “same impact”.
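
One way to make “same impact” concrete is to compare the gap in mean test score between settings with the standard deviation across folds. The following is only a minimal sketch: it uses the numbers copied from the table above, and the names `scores` and `gap_to_best` are illustrative, not part of the exercise.

```python
# Mean and standard deviation of the test score for learning_rate=0.1,
# copied by hand from the table above (keys are max_leaf_nodes values).
scores = {
    10: (0.866425, 0.000359),
    15: (0.868281, 0.000358),
    20: (0.868690, 0.000550),
    25: (0.868827, 0.001068),
    30: (0.868063, 0.000850),
}

# Settings with the highest and lowest mean test score.
best = max(scores, key=lambda k: scores[k][0])
worst = min(scores, key=lambda k: scores[k][0])

# Largest difference in mean test score over the whole range.
spread = scores[best][0] - scores[worst][0]


def gap_to_best(max_leaf_nodes):
    """Difference in mean test score between the best setting and this one."""
    return scores[best][0] - scores[max_leaf_nodes][0]


print(f"best max_leaf_nodes: {best}, worst: {worst}")
print(f"spread of mean_test_score: {spread:.6f}")
for k in sorted(scores):
    mean, std = scores[k]
    print(f"max_leaf_nodes={k}: gap to best={gap_to_best(k):.6f}, std={std:.6f}")
```

With these numbers, the whole range spans roughly 0.0024 in mean test score, and the gap between max_leaf_nodes=30 and the top-ranked setting is smaller than the standard deviation of the top-ranked setting itself.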

In any case, any suggested wording to make this clearer is welcome.