Question 4

jesusjl · 24 February 2022 23:05

Could you please elaborate on the explanation of Question 4? I don’t get why the presented scenario causes an increased or steady train error.

Thanks

ArturoAmorQ · 25 February 2022 11:10

I cannot provide a very detailed answer without spoiling the question for other users, but think about this:

If you add new samples, you may (or may not) introduce variables that affect the data generation and were not previously captured by the model trained on the smaller dataset.

One example would be data first collected with tools that are not very precise, with new data captured at a better resolution afterwards. New phenomena are likely to appear.

rcortes100 · 8 March 2022 07:27

I hope this comment won’t spoil the answer.

Picture the problem you want to solve as estimating the average size of tuna fish. So you catch about 10 tuna, measure their size, and calculate the mean and the standard error. But it turns out that, by chance, you caught tuna of very uniform size that day (will the standard error be low or high?). Now, you keep catching tuna every day until you have 100, and your sample is more diverse (more tuna of different sizes than before). How does the standard error compare to before?

jesusjl · 10 March 2022 11:36

Hi

Sorry for the delayed response. Thank you both for the examples provided; they helped me grasp the concept much better.

The standard error will increase as there is more variability in the sampled data.
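
For anyone who wants to check the tuna analogy numerically, here is a minimal sketch. Everything in it is an assumption made for illustration: the lengths (in cm) are simulated from normal distributions, the seed is arbitrary, and the helper names `standard_error` and `train_mse_of_mean_model` are hypothetical. The "model" is simply one that always predicts the sample mean, so its training error is the mean squared deviation from that mean.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical tuna lengths in cm: a lucky first catch of similar-sized fish...
first_catch = [rng.gauss(100, 1) for _ in range(10)]
# ...followed by 90 more fish from a much more varied population.
all_catches = first_catch + [rng.gauss(100, 15) for _ in range(90)]


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def train_mse_of_mean_model(sample):
    """Training MSE of a model that always predicts the sample mean."""
    prediction = statistics.mean(sample)
    return sum((x - prediction) ** 2 for x in sample) / len(sample)


for name, sample in [("10 fish", first_catch), ("100 fish", all_catches)]:
    print(
        f"{name}: standard error = {standard_error(sample):.2f}, "
        f"train MSE = {train_mse_of_mean_model(sample):.2f}"
    )
```

With these assumed numbers, both the standard error and the training error come out larger for the 100-fish sample. Note that the standard error also shrinks with the square root of the sample size, so it only goes up here because the spread of the new fish grows faster than that. The training error of the fixed model, on the other hand, rises directly with the extra variability it cannot capture.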