Exercise M4.04

miwojc · 20 June 2021 10:27

When I execute the code for exercise M4.04 (I also checked the solution, and I just pulled from GitHub to make sure I have the latest notebooks), I get repeated coefficients for the repeated features. The results are different from those in the course website notebook.
local:

website course solution:

echidne · 20 June 2021 14:27

Hi @miwojc ,

I also did the exercise locally and I did not see any difference from the solution on the website.
The graph and the array of coefficients you obtained locally look like the result of the Ridge model (as you can see at the bottom of the solution). Are you sure you did not confuse the two models in your notebook?

My local results for the linear_regression model:

and for the Ridge one :

glemaitre58 · 20 June 2021 17:31

@miwojc we wanted to point out a numerical issue, and it might be that this behaviour is only reproducible on some operating systems. Are you using Linux? If you are, then it is a bit surprising, and we should look at the low-level library used by SciPy and check for differences there.
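
For comparing setups, a minimal sketch of how the relevant environment details could be collected is shown here. The helper name `describe_environment` is hypothetical; `np.show_config()` prints the BLAS/LAPACK libraries NumPy was built against, and `scipy.show_config()` should give the equivalent report for SciPy.

```python
import platform

import numpy as np


def describe_environment():
    """Collect basic platform details that can influence numerical results."""
    return {
        "platform": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "numpy": np.__version__,
    }


env = describe_environment()
for key, value in env.items():
    print(f"{key}: {value}")

# Print the BLAS/LAPACK build information of the installed NumPy.
np.show_config()
```

Posting this output from each machine makes it easier to see whether two setups link against different linear algebra libraries.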

miwojc · 20 June 2021 19:05

I use JupyterLab and Windows Subsystem for Linux 2 to run the notebooks.

Here’s the system information:

powershell:
❯ wsl -l -v
  NAME            STATE      VERSION
* Ubuntu-20.04    Running    2

miwojc · 20 June 2021 19:08

I have pulled from GitHub to make sure I am on the latest notebooks. Then I restarted and ran all cells to make sure I didn’t confuse the order of the cells. However, the results are identical for linear regression and ridge, and they are different from those on the course website.

glemaitre58 · 20 June 2021 20:26

OK, so it should be linked to what I mentioned earlier. Basically, having correlated features will lead to numerical imprecision due to the matrix inversion when computing the solution of the linear regression. On your system, it seems that the numerical error does not occur.
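
To illustrate the point, here is a small NumPy-only sketch. It is not the notebook's code: the data are synthetic, the intercept is omitted, and ridge is computed from its closed form rather than with scikit-learn's `Ridge`. It shows that repeating features makes the matrix to invert singular, while adding a ridge penalty makes it invertible again and gives equal coefficients to each repeated pair.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 3

# Synthetic regression problem (an assumption, not the exercise's dataset).
X = rng.normal(size=(n_samples, n_features))
true_coef = np.array([1.0, -2.0, 0.5])
y = X @ true_coef + 0.1 * rng.normal(size=n_samples)

# Repeat every feature once, so each column has an exact copy.
X_dup = np.hstack([X, X])

# The normal equations require inverting X.T @ X.
gram = X.T @ X
gram_dup = X_dup.T @ X_dup

cond_original = np.linalg.cond(gram)
cond_duplicated = np.linalg.cond(gram_dup)
rank_duplicated = np.linalg.matrix_rank(X_dup)

print(f"condition number, original features: {cond_original:.2e}")
print(f"condition number, repeated features: {cond_duplicated:.2e}")
print(f"rank of repeated design matrix: {rank_duplicated} of {X_dup.shape[1]}")

# Closed-form ridge without intercept: (X.T X + alpha I) w = X.T y.
alpha = 1.0
ridge_coef = np.linalg.solve(
    gram_dup + alpha * np.eye(2 * n_features), X_dup.T @ y
)
print("ridge coefficients, first copy: ", ridge_coef[:n_features])
print("ridge coefficients, second copy:", ridge_coef[n_features:])
```

Without the penalty, the least-squares solution is not unique, so the coefficients that come out depend on how the solver and the underlying linear algebra library handle the near-zero singular values.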

@ogrisel Do you have any idea how WSL behaves with regard to the SciPy low-level implementation?

miwojc · 21 June 2021 04:08

Oh, I see. Thank you for looking into that.

miwojc · 21 June 2021 05:20

I have re-run it on Windows and the result is different from what I got on Linux, but also different from what is on the online course website:

glemaitre58 · 21 June 2021 06:57

So it is going sideways as well, in a different way. It would be nice if we could find a way to make it reproducible. At least we know it is on FUN.

ogrisel · 21 June 2021 08:52

I don’t think we can expect to get a robust reproducibility for numerical instability related behaviors, especially in a cross-platform and cross-scipy-version manner.

lesteve · 28 January 2022 16:55

Not much we can do about this, I think. Removing the prio-nice-to-hav tag.