Rephrasing and typos

Hello,

In the Encoding of categorical variables notebook, in the Handling categorical data section, the last sentence of the Identify categorical variables paragraph is unclear because it uses “because” twice:

Because in this notebook we will use "education" because it represents the original data.

I guess the first one should be dropped.

In the Encoding ordinal categories paragraph, I think there is a missing ‘s’:

However, be careful when applying this encoding strategy: using this integer representation leads downstream predictive models to assume that the values are ordered (0 < 1 < 2 < 3… for instance).

Same thing in the following Encoding nominal categories paragraph, plus a missing ‘n’ in “downstream”:

OneHotEncoder is an alternative encoder that prevents the downstream models to make a false assumption about the ordering of categories.
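
For context on the two sentences quoted above, here is a minimal plain-Python sketch of the difference between the two encodings. It is only an illustration and not scikit-learn's implementation: the sample values and the variable names (`education`, `categories`, `ordinal`, `one_hot`) are made up for this example.

```python
# Illustrative only: plain-Python stand-ins for the two encodings,
# not what OrdinalEncoder / OneHotEncoder do internally.
education = ["HS-grad", "Bachelors", "Masters", "HS-grad"]  # made-up sample values

# Sorted unique categories found in the column.
categories = sorted(set(education))

# Ordinal-style encoding: one integer per category.
# A downstream model may read these integers as ordered (0 < 1 < 2).
ordinal = [categories.index(value) for value in education]

# One-hot-style encoding: one 0/1 column per category,
# so no ordering between categories is implied.
one_hot = [[int(value == category) for category in categories] for value in education]

print(categories)  # ['Bachelors', 'HS-grad', 'Masters']
print(ordinal)     # [1, 0, 2, 1]
print(one_hot)     # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

In the integer version, "Bachelors" ends up below "HS-grad" only because of alphabetical sorting, which is the kind of false ordering the notebook warns about.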

In a later code cell,
print(f"The dataset encoded contains {data_encoded.shape[1]} features")
should be
print(f"The encoded dataset contains {data_encoded.shape[1]} features")

In the Evaluate our predictive pipeline paragraph, shouldn’t

We see that the Holand-Netherlands category is occuring rarely.

be

We see that the Holand-Netherlands category is rarely occurring.

A little later:

In scikit-learn, there is two solutions to bypass this issue:

should be

In scikit-learn, there are two solutions to bypass this issue:

In the note following the creation of the pipeline:

Here, we need to increase the number of maximum iterations to obtain a fully converged LogisticRegression

And, in the same note,

Contrary to numerical features, the one-hot encoded categorical features do not suffer from large variations and therefore increasing max_iter is the right thing to do.

Fixed, thanks!