Hello, I’m absolutely lost about how to use LeaveOneGroupOut as a cross validation strategy for cross_validate.
Here’s my code ending with an error:
# (imports I'm using)
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_validate

# I assume the result of the previous question is not here for nothing
unique_ride_dates = np.unique(cycling.index.date)
# No idea what I am doing here, but it seems to match the instructions
group, unique = pd.factorize(unique_ride_dates)
# create cv
from sklearn.model_selection import LeaveOneGroupOut
cv = LeaveOneGroupOut()
# Reusing the previous model with the (proper?) parameter for LeaveOneGroupOut()
cv_results_linear = cross_validate(
    linear_model, data, target,
    cv=cv, groups=group,
    scoring='neg_mean_absolute_error',
    return_estimator=True,
    return_train_score=True
)
# And here's the error:
ValueError: Found input variables with inconsistent numbers of samples: [38254, 38254, 4]
I understand the error: my data and target have 30k+ samples each, whereas the group array has a length of 4 (values ranging from 0 to 3). But that's what the previous instructions were asking for (I think?). So I guess my error stems from my misunderstanding of LOGO.
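For what it's worth, my current reading is that cross_validate expects groups to be a per-sample array (one label per row of data), not one label per unique date. A minimal sketch of that idea, using a toy stand-in for the cycling DataFrame (I'm assuming the real one has a datetime-like index):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the cycling DataFrame: 4 days, 3 samples per day
index = pd.date_range("2021-01-01", periods=12, freq="8h")
cycling = pd.DataFrame({"power": np.arange(12)}, index=index)

# Factorize the date of EVERY sample, not only the unique dates:
# this yields one group label per row, which matches len(data).
groups, unique_dates = pd.factorize(cycling.index.date)

print(len(groups) == len(cycling))  # True: one label per sample
print(len(unique_dates))            # 4 distinct days -> 4 CV splits
```

If that's right, the length mismatch disappears because groups grows with the number of rows rather than the number of days.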
I have already checked the documentation of LOGO, but I wasn't able to make sense of it: LOGO doesn't take any parameter, it just has 2 methods that I tried to use, but my code ends with the same error anyway.
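From what I can tell, those two methods take groups as an argument rather than the constructor. For example, get_n_splits reports how many splits LOGO would produce for a given groups array (toy values here, just to illustrate the signature):

```python
from sklearn.model_selection import LeaveOneGroupOut

cv = LeaveOneGroupOut()
# get_n_splits ignores X and y; only the groups argument matters
n_splits = cv.get_n_splits(groups=[0, 0, 1, 1, 2, 2])
print(n_splits)  # 3 distinct groups -> 3 splits
```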
The examples in the docs don't feature LOGO as a cross_validate strategy. The only instance where it does feature as a cross_validate strategy is an example from the course, where the value passed to the groups parameter (an array of 100+ dates labelled by year quarter) doesn't look like the length-4 group array from Q7 which, as I understand it, is required by the exercise instructions.
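To check my understanding, here is a minimal self-contained sketch of LOGO used with cross_validate on toy data (not the course dataset): the groups array is as long as X, and the number of splits equals the number of distinct group labels, which would explain why the course example passes 100+ labels.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_validate

rng = np.random.RandomState(0)
X = rng.randn(12, 2)                      # 12 samples, 2 features
y = X @ np.array([1.0, -2.0]) + rng.randn(12) * 0.1
groups = np.repeat([0, 1, 2, 3], 3)       # one label PER SAMPLE: 4 "days", 3 samples each

cv_results = cross_validate(
    LinearRegression(), X, y,
    cv=LeaveOneGroupOut(), groups=groups,
    scoring="neg_mean_absolute_error",
)
print(len(cv_results["test_score"]))      # 4 splits: one per left-out group
```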