Add formatting to decimal output

In the section "Fitting a scikit-learn model on numerical data", sub-section "Preprocessing for numerical features", can we add this option to round the numbers in the output of the .describe() function?

pd.options.display.float_format = "{:,.3f}".format
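
For anyone who wants to try this locally without changing the pandas defaults for the whole session, here is a minimal sketch. It assumes a small made-up DataFrame (the column names are borrowed from the notebook, the values are not) and uses pd.option_context so that the format only applies inside the with block.

```python
import pandas as pd

# Made-up values for illustration; only the column names come from the notebook.
df = pd.DataFrame(
    {
        "age": [17.0, 28.0, 37.0, 48.0, 90.0],
        "capital-gain": [0.0, 0.0, 0.0, 0.0, 99999.0],
    }
)

# The float format is active only inside this block and is restored afterwards.
with pd.option_context("display.float_format", "{:,.3f}".format):
    formatted = df.describe().to_string()

print(formatted)
```

Outside the with block, .describe() is displayed with the pandas defaults again.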

I am not in favour of adding this code. The reason is that we made an effort to remove any unnecessary code that tweaks pandas, matplotlib, NumPy behaviour because it could be intimidating for beginners.

In materials that require a strong knowledge of NumPy / pandas, I would certainly take the advice.

Got it.
I’ve personally found it difficult to read numbers when they are output in scientific notation like that.
But, I understand the reasoning here.

Another possibility would be to use an rc file, so that no code has to be written in the notebook.
For this MOOC session it might be difficult but we should certainly look at this option for the next session.
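
To make the idea concrete: as far as I know, pandas does not read an rc file the way matplotlib reads matplotlibrc, so this is only one possible interpretation. IPython runs any Python file placed in a profile's startup directory, so the option could live there instead of in the notebook. The file name below is hypothetical.

```python
# Hypothetical file: ~/.ipython/profile_default/startup/00-pandas-display.py
# IPython executes files in this directory when a session (or kernel) starts.
import pandas as pd

# Same format string as proposed above: thousands separator, 3 decimals.
pd.set_option("display.float_format", "{:,.3f}".format)
```

The trade-off is that the setting becomes invisible to someone reading the notebook, which may or may not be what we want for beginners.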

Not sure what an rc file is. But, I think that if I can’t easily know what the count is from the .describe() function, then it’s not so useful. Even though it is the default, I cannot read those numbers and make sense of them. :confused:

Reference (documentation here).

[Two screenshots: the default output and the formatted output]

Uhm there is something weird here still. Executing the notebook on FUN without tweaking anything, I am getting the following output by default.

                age  capital-gain  capital-loss  hours-per-week
count  36631.000000  36631.000000  36631.000000    36631.000000
mean      38.642352   1087.077721     89.665311       40.431247
std       13.725748   7522.692939    407.110175       12.423952
min       17.000000      0.000000      0.000000        1.000000
25%       28.000000      0.000000      0.000000       40.000000
50%       37.000000      0.000000      0.000000       40.000000
75%       48.000000      0.000000      0.000000       45.000000
max       90.000000  99999.000000   4356.000000       99.000000

@reshama Did you change anything in the notebook that could explain the change?

No, when I first ran the notebook, it gave me scientific notation.
Not sure why we are getting a different view, by default.

I assume I can only change my own notebook, right? I shouldn’t be able to change the main notebook in the MOOC.

From the FUN interface, you are indeed only modifying your own notebook. We are committing changes here: INRIA/scikit-learn-mooc on GitHub.

The python_scripts folder indeed contains all the lectures. Apart from the quizzes, the changes we make there take effect on FUN once you revert your notebook to the original.

In section “Preprocessing for Numerical Features”, after resetting the notebook to the original and running it from the very top cell:

For data_train.describe(), I see this:

For data_train_scaled.describe(), I see this:

I wonder why the formatting differs between the two code cells.

Because after scaling, the mean is very close to zero but not exactly zero, due to numerical error. So pandas switches to scientific notation to be able to show us the small numbers (e.g. -1.2…e-16).
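
A minimal sketch of that behaviour, using illustrative numbers rather than the notebook's actual data: one frame holds ordinary-sized values, the other holds a tiny non-zero "mean" next to a unit "std", roughly as one would expect after standardization.

```python
import pandas as pd

rows = ["mean", "std", "max"]

# Illustrative values only, not the real output of the notebook.
raw_like = pd.DataFrame({"age": [38.642352, 13.725748, 90.0]}, index=rows)
scaled_like = pd.DataFrame({"age": [-1.2e-16, 1.0, 2.5]}, index=rows)

raw_text = raw_like.to_string()
scaled_text = scaled_like.to_string()

print(raw_text)     # displayed in fixed-point notation
print(scaled_text)  # the tiny value makes pandas use exponent notation
```

With fixed-point display at the default precision, the tiny value would be shown as zero, which is presumably why pandas falls back to exponents for that column.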

Tracked in issue #556, "Add formatting to decimal output", in INRIA/scikit-learn-mooc on GitHub.