Q about iterations?

What exactly is the role of iterations in the model fitting procedure? What is the actual process that causes the number of iterations to be higher for non-scaled data than for scaled data? Thanks!

For some models, the optimal model parameters are found by minimizing a so-called loss function. Different optimization algorithms can be used for this minimization, and some of them are iterative: they start from an initial guess and improve it step by step. Gradient descent is such an algorithm.

The number of iterations reported by the procedure is the number of steps the optimization algorithm needed to converge, i.e. to reach parameters it considers close enough to the minimum of the loss function.
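To make the idea of "iterations" concrete, here is a minimal sketch of gradient descent on a toy one-parameter loss, `(w - 3) ** 2`. The function name `gradient_descent`, the learning rate, and the tolerance are all assumptions chosen for illustration, not what the library in the exercise actually does internally.

```python
def gradient_descent(grad, w0, lr=0.1, tol=1e-8, max_iter=10_000):
    """Repeat small steps against the gradient until the steps become tiny."""
    w = w0
    for n_iter in range(1, max_iter + 1):
        step = lr * grad(w)   # move proportionally to the slope of the loss
        w -= step
        if abs(step) < tol:   # converged: the parameter barely changes anymore
            return w, n_iter
    return w, max_iter        # stopped without converging


# Toy loss (w - 3) ** 2, whose gradient is 2 * (w - 3); the minimum is at w = 3.
w_opt, n_iter = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(f"w = {w_opt:.6f} reached after {n_iter} iterations")
```

The value returned as `n_iter` plays the same role as the iteration count reported after fitting: it is how many update steps were taken before the stopping criterion was met.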

The larger number of iterations in the exercise is indeed because the internal optimization algorithm relies on derivatives (i.e. gradients). The gradient is dominated by the features with a larger range of values, so the optimization will “bounce around” more before converging.
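The effect can be reproduced with a small, self-contained sketch. It fits a least-squares model by plain gradient descent on a toy dataset where one feature has a range 20 times larger than the other, first on the raw features and then on standardized ones. Everything here is an assumption made for illustration: the helper names `fit_gd` and `standardize`, the toy data, and the step-size rule, which is only safe because the two toy features are uncorrelated. The solver used in the exercise is likely more sophisticated.

```python
import statistics


def fit_gd(X, y, tol=1e-8, max_iter=100_000):
    """Least-squares fit by gradient descent; returns (weights, iterations used)."""
    n, d = len(X), len(X[0])
    # The step size is limited by the feature with the largest values.
    # Assumption: features are uncorrelated, so this simple rule is stable.
    lr = 0.5 / max(sum(row[j] ** 2 for row in X) / n for j in range(d))
    w = [0.0] * d
    for n_iter in range(1, max_iter + 1):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                     for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n
                for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
        if max(abs(lr * gj) for gj in grad) < tol:  # updates became negligible
            return w, n_iter
    return w, max_iter


def standardize(X):
    """Center each column and divide it by its standard deviation."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]


# Toy data: the first feature spans [-1, 1], the second spans [-20, 20].
X_raw = [[x1, x2] for x1 in (-1.0, 0.0, 1.0) for x2 in (-20.0, 0.0, 20.0)]
y = [2.0 * x1 + 0.5 * x2 for x1, x2 in X_raw]

w_raw, n_raw = fit_gd(X_raw, y)
w_scaled, n_scaled = fit_gd(standardize(X_raw), y)
print(f"unscaled: {n_raw} iterations, scaled: {n_scaled} iterations")
```

On the raw data, the step size has to stay small enough for the large-range feature, so the weight of the small-range feature only creeps towards its optimum and many more iterations are needed. After standardization both features contribute to the gradient on the same scale, and a single step size suits both.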


(This image was part of the following lecture note)
