Select the range of folds where the former has a better test score than the latter:

The last question is not posed correctly. It asks how often the model with just numerical features is better, but the answers describe the opposite.

Let’s compare the model using all features with the model consisting of only numerical features. Select the range of folds where the former has a better test score than the latter:

In this paragraph, “the former” refers to the model with all the features, while “the latter” refers to the model with numerical features only. It can then be rephrased as:

Select the range of folds where the model with all the features has a better test score than the model with numerical features only.

This reading does correspond to the answers. For instance, option a) means that the model with all the features almost never performed better than the numerical-only model, i.e., it is “substantially worse”.
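To make the direction of the comparison concrete, here is a minimal sketch of the count the question asks for. The per-fold scores are made up for illustration, and the names `scores_all_features`, `scores_numerical_only` and `count_folds_former_better` are hypothetical, not taken from the exercise; in practice the two lists would hold the per-fold test scores of each model from cross-validation.

```python
# Hypothetical per-fold test scores (invented values, 10 folds).
scores_all_features = [0.85, 0.86, 0.84, 0.87, 0.85, 0.86, 0.85, 0.84, 0.86, 0.85]
scores_numerical_only = [0.80, 0.81, 0.85, 0.80, 0.79, 0.81, 0.80, 0.82, 0.81, 0.80]


def count_folds_former_better(former_scores, latter_scores):
    """Count the folds where the former model scores strictly higher."""
    if len(former_scores) != len(latter_scores):
        raise ValueError("Both models must be scored on the same folds")
    return sum(f > l for f, l in zip(former_scores, latter_scores))


# "The former" is the all-features model, "the latter" the numerical-only one.
n_better = count_folds_former_better(scores_all_features, scores_numerical_only)
print(f"All-features model is better on {n_better} of {len(scores_all_features)} folds")
```

The resulting count is what gets matched against the ranges in the answer options. Swapping the two arguments would answer the other question, i.e. how often the numerical-only model is better.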