Many numerical solvers used to train machine learning models can only converge if the underlying numerical optimization problem is well-behaved (also known as "not ill-conditioned"). Explaining the details of numerical optimization is beyond the scope of this MOOC, but in practice one way to avoid such numerical problems is to make sure that the input features of a machine learning model such as logistic regression are approximately on the same range on average. This means that if feature "a" is on a scale of -1 to 1, having feature "b" on a scale of 0 to 10 is perfectly fine, but if it is on a scale from 0 to 100000 then you might run into numerical problems preventing the optimizer from converging.
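Here is a minimal sketch of the situation on a synthetic dataset (the variable names and the target definition are purely illustrative): one feature lies in [-1, 1] while the other lies in [0, 100000], which can make the default solver struggle and emit a ConvergenceWarning.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples = 1_000
a = rng.uniform(-1, 1, size=n_samples)       # well-scaled feature
b = rng.uniform(0, 100_000, size=n_samples)  # badly-scaled feature
X = np.column_stack([a, b])
y = (a + b / 100_000 > 0.5).astype(int)      # arbitrary illustrative target

# With the raw features, the solver may hit its iteration limit
# (default max_iter=100) and raise a ConvergenceWarning.
model = LogisticRegression(max_iter=100)
model.fit(X, y)
```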
The message in the scikit-learn convergence warning suggests using a preprocessor (such as StandardScaler) to scale the features and avoid this problem, as sketched below.
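A sketch of that fix, assuming X and y are the feature matrix and target from the previous snippet: scale the features with StandardScaler before fitting the logistic regression, here chained in a Pipeline.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Scaling puts all features on a comparable range, which typically
# lets the solver converge without raising a ConvergenceWarning.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
```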
Another way to make it easier to converge is to decrease the value of the parameter C (which will increase regularization), but this can have a strong impact on the cross-validation performance of the model, as will be explained later in the module on linear models.
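As an illustrative sketch (again assuming X and y from above, and with an arbitrary value of C), lowering C below its default of 1.0 increases regularization and can ease convergence, at the possible cost of predictive performance.

```python
from sklearn.linear_model import LogisticRegression

# Smaller C means stronger regularization; the exact value would need
# to be tuned, e.g. with cross-validation.
model = LogisticRegression(C=0.01)
model.fit(X, y)
```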
If you really want to know the mathematical details and have a background in linear algebra and numerical methods, you can learn more about the numerical optimization problem at hand by watching videos 15.1 to 15.7 of the YouTube playlist starting with "(ML 15.1) Newton's method (for optimization) - intuition" and reading about ill-conditioned optimization problems.