Why do you use np.ravel?

nickprock · 22 June 2021 15:03

In the notebook M4.01, in the MAE (or MSE) function you calculate the error as

errors = np.ravel(true_values) - np.ravel(predictions)

because true_values has shape (342,) and predictions has shape (342, 1), which follows from how data and target are selected:

data, target = penguins[[feature_name]], penguins[target_name]

Why don’t you use the pandas Series penguins[feature_name] instead?
I assumed it was to reinforce the concepts:
data → DataFrame
target → vector
or are there other reasons?
Thanks in advance.

glemaitre58 · 22 June 2021 15:48

It is to make sure that we get 1D vectors. We applied it blindly to both variables, even though it has no effect on a vector that already has shape (342,).
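
A minimal sketch of why the shapes matter, using small made-up arrays rather than the penguins data (the values are illustrative assumptions). Subtracting a (n, 1) array from a (n,) array triggers NumPy broadcasting and yields a (n, n) matrix instead of n element-wise errors, which is presumably what flattening both inputs guards against:

```python
import numpy as np

# Illustrative values only; the notebook uses 342 penguin samples.
true_values = np.array([1.0, 2.0, 3.0])        # shape (3,)
predictions = np.array([[1.5], [2.0], [2.0]])  # shape (3, 1)

# Without flattening, broadcasting pairs every target with every prediction.
naive_errors = true_values - predictions
print(naive_errors.shape)  # (3, 3)

# Flattening both inputs gives one error per sample.
errors = np.ravel(true_values) - np.ravel(predictions)
print(errors.shape)  # (3,)
```

Calling np.ravel on an array that is already 1D simply returns it with the same shape, so applying it to both variables is harmless.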

glemaitre58 · 22 June 2021 15:50

Up to version 1.0, the output of a scikit-learn model cannot be a pandas Series or DataFrame.
Once scikit-learn properly handles pandas Series and DataFrames, we will probably do as suggested.

Alvin19 · 26 June 2021 02:50

May I confirm that ravel is used to get an array, which is shown in parentheses ( ), and not a vector/matrix, which is shown in square brackets [ ]?

I ask because this seems to contradict this article:
https://numpy.org/doc/stable/reference/generated/numpy.ravel.html

Another question I have about Exercise M4.01: when computing the MAE, why do we not divide by n (the total number of variables) to get the error?

glemaitre58 · 27 June 2021 13:14

np.mean already divides by the number of samples (if that is what you mean by variables).
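
As a rough check, here is a sketch of an MAE function in the spirit of the notebook's. The function name mae_sketch and the sample values are assumptions made for illustration, not the notebook's actual code. It compares np.mean against an explicit sum divided by n:

```python
import numpy as np

def mae_sketch(true_values, predictions):
    """Hypothetical helper: mean absolute error over flattened inputs."""
    errors = np.ravel(true_values) - np.ravel(predictions)
    # np.mean sums the absolute errors and divides by their count.
    return np.mean(np.abs(errors))

# Illustrative values only.
true_values = np.array([1.0, 2.0, 4.0])
predictions = np.array([[2.0], [2.0], [1.0]])

mae = mae_sketch(true_values, predictions)

# Same computation with the division by n written out explicitly.
n = len(true_values)
manual = np.sum(np.abs(np.ravel(true_values) - np.ravel(predictions))) / n

print(np.isclose(mae, manual))  # True
```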