Calling Data and Target from the Census

At the end of lecture one, when target_test and data_test are used to check the accuracy, the code does not look any different from the original data and target that were used to train the model. So why does it even give a different answer?

I am not sure I understand. We are using two distinct sets to train and test:

model.fit(data_train, target_train)
model.score(data_test, target_test)

Could you be more explicit about which output you want to compare with?

At first we define:
target = adult_census[target_name]
data = adult_census.drop(columns=[target_name, ])
and we go on with the task.

To check accuracy, we define target_test and data_test exactly like above. Is it supposed to be this way? I thought the data used for testing must be different from the data used for training.
Or have I terribly misunderstood the whole concept?

We do a train-test split in between:

data_train, data_test, target_train, target_test = train_test_split(
    data, target)

This means that we have two independent sets: a training set and a testing set. During fit, we never use information from the testing set; we keep it only for scoring.
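If it helps to see this for yourself, here is a minimal sketch that checks that the two sets do not share any row. It assumes a small synthetic dataset instead of the census file, and the names `indices`, `idx_train`, `idx_test`, and the choice of `LogisticRegression` are only for illustration, not what the lecture uses:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the census data (an assumption, not the real dataset).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
target = (data[:, 0] + data[:, 1] > 0).astype(int)

# Split the row indices along with data and target so we can track
# which original rows end up in which set.
indices = np.arange(len(data))
(data_train, data_test,
 target_train, target_test,
 idx_train, idx_test) = train_test_split(data, target, indices, random_state=0)

# No row appears in both sets.
shared_rows = set(idx_train) & set(idx_test)
print(len(shared_rows))               # 0
print(len(idx_train), len(idx_test))  # 150 50 (default test size is 25%)

# Fit only on the training set, then score on each set separately.
model = LogisticRegression()
model.fit(data_train, target_train)
train_accuracy = model.score(data_train, target_train)
test_accuracy = model.score(data_test, target_test)
print(train_accuracy, test_accuracy)
```

The two scores usually differ because the model has already seen the training rows, while the testing rows are new to it.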


Thanks! I think I’ll get the hang of it as we go on.