Non-linearity

You mentioned that when non-linear data is passed to a linear model, the model still assumes a linear relationship, and you gave 3 ways to solve this. My main interest is the second one, engineering richer data. We had data of shape (100, 1), and after engineering we had (100, 3), which was made possible by raising the feature column to the powers 2 and 3 (data ** 2, data ** 3). Now my question is: if we have many features, like 6 or 7, how do we identify which feature(s) to use in the engineering? Also, what other methods can we use for feature engineering? Must it always be raising to a power (**)?
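The (100, 1) to (100, 3) step can be reproduced by hand with NumPy and compared with scikit-learn's PolynomialFeatures. A minimal sketch, assuming a random single feature named `data` (hypothetical, not the course dataset):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
# Hypothetical single feature with 100 samples, shape (100, 1)
data = rng.uniform(-3, 3, size=(100, 1))

# Manual feature engineering: stack the original column with its square and cube
manual = np.hstack([data, data ** 2, data ** 3])

# Same expansion with PolynomialFeatures; include_bias=False drops the constant column
poly = PolynomialFeatures(degree=3, include_bias=False)
automatic = poly.fit_transform(data)

print(manual.shape, automatic.shape)   # (100, 3) (100, 3)
print(np.allclose(manual, automatic))  # True
```

The transformer is convenient once there are several features, since it generates all powers (and products) up to the chosen degree without writing each column by hand.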

You can have a look at the following transformers:

These transformers are designed to create non-linear features. PolynomialFeatures can also create interactions between features (products of different columns).

Thanks @glemaitre58. From the transformers you shared, I understand that they can be used to create new features. While going through the documentation, I saw SplineTransformer(degree=2, n_knots=3). Could you please explain what degree and n_knots mean, given that the dataset had 6 rows and 1 column before applying this transformer?

Also, for poly = PolynomialFeatures(2), can you explain what the 2 does, given that the dataset had 3 rows and 2 columns before applying this transformer?

I have read the documentation, but I don't seem to understand this part. Please help explain it.

There is no general rule on how to engineer new features. You may need expert (domain) knowledge to know which features are meaningful to transform and how to transform them. In the wrap-up quiz of Module 7, you will find a case where Newton's second law is used to enrich the model.
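As an illustration of domain-driven engineering (a hypothetical, synthetic setup, not the actual Module 7 quiz data): if a target follows F = m * a, a linear model on mass and acceleration alone cannot represent the product, but adding m * a as a new feature makes the relationship exactly linear.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_samples = 200
# Hypothetical physical quantities (synthetic, noise-free)
mass = rng.uniform(1.0, 10.0, size=n_samples)          # kg
acceleration = rng.uniform(0.0, 5.0, size=n_samples)   # m/s^2
force = mass * acceleration                             # Newton's second law, F = m * a

# Raw features only: the model must approximate a product with a plane
X_raw = np.column_stack([mass, acceleration])
r2_raw = LinearRegression().fit(X_raw, force).score(X_raw, force)

# Enriched features: add the physically meaningful product m * a
X_enriched = np.column_stack([mass, acceleration, mass * acceleration])
r2_enriched = LinearRegression().fit(X_enriched, force).score(X_enriched, force)

print(f"R^2 raw features:     {r2_raw:.3f}")
print(f"R^2 with m * a added: {r2_enriched:.3f}")
```

The engineered feature here is a product, not a power: the choice comes from knowing the physics, not from a generic rule.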

Try running this snippet:

```python
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(2)
poly.get_params()
```

You should get:

```
{'degree': 2, 'include_bias': True, 'interaction_only': False, 'order': 'C'}
```

This means that the number 2 is passed as the argument to the parameter "degree".
Additionally, you can run

```python
poly.fit_transform([[2]])
```

which will output

```
array([[1., 2., 4.]])
```

to see the effect of the degree: the columns are 2**0 = 1 (the bias), 2**1 = 2 and 2**2 = 4. You can also set include_bias=False and see what happens.
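A short sketch of both variations: include_bias=False drops the constant column, and PolynomialFeatures(2) applied to a 3 x 2 array (X = np.arange(6).reshape(3, 2), the same input used in the scikit-learn documentation example) produces 6 columns: 1, a, b, a^2, a*b, b^2.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Single sample with a single feature, without the bias column
print(PolynomialFeatures(2, include_bias=False).fit_transform([[2]]))
# [[2. 4.]]

# 3 rows, 2 columns: [[0, 1], [2, 3], [4, 5]]
X = np.arange(6).reshape(3, 2)
poly = PolynomialFeatures(2)
print(poly.fit_transform(X))
# columns: 1, a, b, a^2, a*b, b^2
# [[ 1.  0.  1.  0.  0.  1.]
#  [ 1.  2.  3.  4.  6.  9.]
#  [ 1.  4.  5. 16. 20. 25.]]
```

So the 2 does not set the number of output columns directly; it is the maximum total degree, and every combination of the input columns up to that degree becomes a column.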

Maybe a good starting point is to explain that a spline is a piecewise polynomial function. The degree parameter controls the degree of each polynomial piece, and the knots are the points where the pieces meet; their number is controlled by the n_knots parameter.

It is a little clearer now.