learning_curve with train_size=1

In the notebook on learning_curve, the last train_size is equal to 1.

My understanding was that this meant training on all the samples, and testing on… no samples? How can we get a test_score if we test on no samples?

Obviously I am getting something wrong here; please help me spot what it is.

The learning_curve utility first performs the CV splits and only then subsamples the resulting training sets.

So the test sets (one per CV iteration) have a fixed size for all points of the curve.
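
For instance (a minimal sketch, assuming a hypothetical toy dataset from make_classification and LogisticRegression as the estimator), you can check that with train_sizes=[1.0] the test scores are still computed on the held-out CV folds:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, learning_curve

# Hypothetical toy problem: 100 samples, binary classification.
X, y = make_classification(n_samples=100, random_state=0)

# train_sizes=[1.0] means "100% of each CV training split",
# not "100% of the whole dataset, leaving nothing to test on".
train_sizes_abs, train_scores, test_scores = learning_curve(
    LogisticRegression(), X, y,
    train_sizes=[1.0],
    cv=KFold(n_splits=2),
)

print(train_sizes_abs)    # [50]: each 2-fold training split has 50 samples
print(test_scores.shape)  # (1, 2): one train_size, scored on two test folds
```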


OK, so I understand now that train_size has no relation to cv,
but there's still something that I don't quite get.

Imagine I run learning_curve with train_sizes=[1.0] and cv=KFold(n_splits=2),
so in total we run fit() + score() twice, right?
Can you please make explicit what the training and testing sets will be for each of these 2 runs?
Does the 100% apply to the whole dataset, or to the 50% that was tagged as test data?

Thanks!

Here is the procedure step by step (a runnable sketch follows the list):

  • KFold(n_splits=2) partitions X into X_a and X_b, and y into y_a and y_b, of equal size (X.shape[0] // 2).

  • For the first CV iteration:

    • define X_train = X_a, X_test = X_b, y_train = y_a, y_test = y_b;
    • for each train_size in train_sizes:
      • randomly subsample X_train and y_train down to train_size samples;
      • fit a model on the subsample and score it on the full (X_test, y_test);
      • record the score value for that train_size.
  • For the second CV iteration:

    • define X_train = X_b, X_test = X_a, y_train = y_b, y_test = y_a;
    • for each train_size in train_sizes:
      • randomly subsample X_train and y_train down to train_size samples;
      • fit a model on the subsample and score it on the full (X_test, y_test);
      • record the score value for that train_size.
  • Finally, for each train_size in train_sizes:

    • compute the average of all the scores (across CV iterations);
    • plot the point with the average score on the learning curve.
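
Here is that logic written out by hand (a sketch under assumptions: a toy make_classification dataset, LogisticRegression as the estimator, and random subsampling as described above; the real learning_curve has more machinery, so this illustrates the logic rather than the actual implementation):

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=100, random_state=0)
estimator = LogisticRegression()
train_sizes = [0.5, 1.0]  # fractions of each CV training split
rng = np.random.default_rng(0)

scores = {size: [] for size in train_sizes}

for train_idx, test_idx in KFold(n_splits=2).split(X):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    for size in train_sizes:
        # Subsample the training split only; the test split stays fixed.
        n = int(size * len(train_idx))
        sub = rng.choice(len(train_idx), size=n, replace=False)
        model = clone(estimator).fit(X_train[sub], y_train[sub])
        scores[size].append(model.score(X_test, y_test))

# Average across CV iterations: one point per train_size on the curve.
for size, vals in scores.items():
    print(size, np.mean(vals))
```

With train_sizes=[1.0] this runs fit() + score() exactly twice, each time training on one 50-sample half and scoring on the other, which answers the question above: the 100% applies to the training half, never to the test half.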

If you want more details, have a look at the source code.


OK, now I get it (slap on the forehead).
Thanks for spelling it out for us!