Learning curve: code to indicate the number of samples in the training set

chunault · 9 December 2022 18:05

Hi,

In Exercise M2.01, when computing the learning curve, which part of the code makes the number of samples in the training set vary between 100 and 700 (as can be seen in the figure showing the learning curve)?

Thank you in advance for your answer.

Claudine

ArturoAmorQ · 12 December 2022 08:50

It comes from defining train_sizes = np.linspace(0.1, 1, num=10), which gives

train_sizes = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

We then pass train_sizes as an argument to the learning_curve function:

results = learning_curve(
    model, data, target, train_sizes=train_sizes,
    cv=cv, n_jobs=2)

where the values are interpreted as fractions of the largest training set that cv allows, i.e. the samples in data left over after each split sets aside its test portion.
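
To make the arithmetic concrete, here is a rough sketch in plain Python of how such fractions turn into absolute training set sizes. The numbers are hypothetical and not taken from the exercise: I assume 1000 samples in data and a cv that holds out 200 of them in each split. Truncating to an integer is also an assumption of this sketch, not a statement about scikit-learn's internals.

```python
# Hypothetical numbers, not taken from the exercise.
n_samples = 1000   # assumed total number of rows in data
n_test = 200       # assumed number of rows cv holds out in each split

# Largest training set available in any single split.
n_max_train = n_samples - n_test

# Same nominal fractions as np.linspace(0.1, 1, num=10), built without NumPy.
train_fractions = [i / 10 for i in range(1, 11)]

# Convert each fraction to an absolute number of training samples.
# Truncating with int() is an assumption of this sketch.
train_sizes_abs = [int(f * n_max_train) for f in train_fractions]

# Smallest and largest training set sizes on the learning curve's x-axis.
print(train_sizes_abs[0], train_sizes_abs[-1])
```

In the real call, the first array returned by learning_curve holds these absolute sizes, so printing results[0] shows the values on the x-axis of the figure.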