How to work with the dictionary from cross_validate

Hi,
I tried to make a parallel plot to inspect the effect of the hyperparameter values, as an exercise from the notebook “Evaluation and hyperparameter tuning”, just after the command cv_results = cross_validate(model_grid_search, ...).
However, when I use the command cv_results[column_results] as in the previous notebook, I get the error message: unhashable type: ‘list’.
If I try to extract the parameter values using cv_results['estimator']['classifier__learning_rate'], I get the error message: list indices must be integers or slices, not str.
Any help handling the dictionary obtained after cv_results = cross_validate(model_grid_search, ...) would be welcome.
Thanks
Stéphane

Dear pedagogical team, since this is the last day the forum is open, I was hoping you would be willing to answer the above question before it closes. I take this opportunity to thank you all for this enlightening MOOC. Best regards. Stéphane

If you run a cell with just

cv_results['estimator']

you will get a list whose length equals the number of folds of the outer cross-validation. A list can be accessed with integer indices, for instance

cv_results['estimator'][0]

will output the estimator from the first fold of the outer cross-validation, in this case a GridSearchCV. Then you can access all the attributes and methods of that estimator directly, for example:

cv_results['estimator'][0].best_params_

outputs the best parameters found by the GridSearchCV in the first fold of the outer cross-validation. This is a dictionary, which can now be accessed with the respective keys:

cv_results['estimator'][0].best_params_['classifier__learning_rate']

Notice that instead of manually iterating through folds, the last cell of the “Evaluation and hyperparameter tuning” notebook uses a for loop:

for cv_fold, estimator_in_fold in enumerate(cv_results["estimator"]):
    print(
        f"Best hyperparameters for fold #{cv_fold + 1}:\n"
        f"{estimator_in_fold.best_params_}"
    )
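
As a side note, all of the above assumes that cross_validate was called with return_estimator=True; otherwise the "estimator" key is not present in cv_results. A minimal sketch of that setup (the pipeline, parameter grid and variable names are assumptions roughly matching the notebook) would be:

from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.pipeline import Pipeline

# Sketch only: the notebook also includes a preprocessing step, and `data` and
# `target` are assumed to be already loaded.
model = Pipeline([("classifier", HistGradientBoostingClassifier())])
param_grid = {
    "classifier__learning_rate": (0.01, 0.1, 1),
    "classifier__max_leaf_nodes": (3, 10, 30),
}
model_grid_search = GridSearchCV(model, param_grid=param_grid, cv=4)

# return_estimator=True is what makes cv_results["estimator"] available
cv_results = cross_validate(
    model_grid_search, data, target, cv=5, return_estimator=True
)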

I hope that answers the question.

Thank you, it is very helpful. Just one last thing: I’d like to visualize all the results from the grid search using a parallel plot. In other words, I am not interested in the best parameters only, but in all the test scores from all the combinations of parameters mapped by the grid search. How can I do that?

One way to do it is

import plotly.express as px
from collections import defaultdict

cv_results_to_plot = defaultdict(list)

for estimator in cv_results["estimator"]:
    # each estimator is the GridSearchCV fitted in one outer fold; its
    # cv_results_ holds the inner mean test score of each parameter combination
    inner_cv_results = estimator.cv_results_
    for param_idx, params in enumerate(inner_cv_results["params"]):
        cv_results_to_plot["learning_rate"].append(params["classifier__learning_rate"])
        cv_results_to_plot["max_leaf_nodes"].append(params["classifier__max_leaf_nodes"])
        cv_results_to_plot["mean_test_score"].append(inner_cv_results["mean_test_score"][param_idx])

fig = px.parallel_coordinates(
    cv_results_to_plot,
    color="mean_test_score",
    dimensions=["learning_rate", "max_leaf_nodes", "mean_test_score"],
    color_continuous_scale=px.colors.diverging.Tealrose,
)
fig.show()
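
If you prefer working with dataframes, a roughly equivalent sketch (assuming the same cv_results as above) builds the table with pandas, since each estimator.cv_results_ is a dict of arrays with one "param_<name>" column per hyperparameter:

import pandas as pd

frames = []
for estimator in cv_results["estimator"]:
    inner_cv_results = pd.DataFrame(estimator.cv_results_)
    frames.append(
        inner_cv_results[
            [
                "param_classifier__learning_rate",
                "param_classifier__max_leaf_nodes",
                "mean_test_score",
            ]
        ]
    )
all_inner_results = pd.concat(frames, ignore_index=True)
# all_inner_results can then be passed to px.parallel_coordinates as above,
# using the "param_..." column names as dimensions.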

Many thanks, you have been very helpful.
Cheers
Stéphane

You could also access the test scores of each inner cross-validation split of the grid search as follows:

for cv_fold, estimator_in_fold in enumerate(cv_results["estimator"]):
    print(f"Outer CV fold {cv_fold}")
    # n_splits_ is the number of inner cross-validation splits of the grid search
    for i in range(estimator_in_fold.n_splits_):
        split_key = f"split{i}_test_score"
        print(f"{estimator_in_fold.cv_results_[split_key]}")

As mentioned in the notebooks, best_params_ could be a different combination of hyperparameters in each outer CV fold. In that case, we can deploy all the models/estimators found by the outer cross-validation loop and make them vote to get the final predictions, which makes sense too.
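
A hypothetical sketch of that voting idea (X_new stands for new samples to predict and is not defined in this thread) could be:

import pandas as pd

# Each GridSearchCV was refit on the training data of its outer fold, so every
# one of them can predict on the new samples.
predictions = pd.DataFrame(
    {cv_fold: estimator.predict(X_new)
     for cv_fold, estimator in enumerate(cv_results["estimator"])}
)
# Majority vote: the most frequent predicted class across the estimators.
final_predictions = predictions.mode(axis=1)[0]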