Issue with M1.01 Question 2 (spoiler alert)

I’m not sure why answer a) “used pandas to manipulate data” is marked as correct for question 2. In my view, “manipulating” data means changing it, but we did not do this. If we had dropped “education-num” as suggested, then I would agree that we are manipulating data.
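For reference, dropping the column would look roughly like the sketch below. The tiny DataFrame is a hypothetical stand-in for the course's census data (column names chosen to resemble it, values invented for illustration). Note that `drop` returns a new DataFrame, so the original data is not changed unless you reassign it.

```python
import pandas as pd

# Hypothetical miniature stand-in for the census data; values are made up.
adult_census = pd.DataFrame({
    "age": [25, 38, 28],
    "education": ["11th", "HS-grad", "Assoc-acdm"],
    "education-num": [7, 9, 12],
    "class": ["<=50K", "<=50K", ">50K"],
})

# drop() returns a new DataFrame; adult_census itself keeps all its columns.
reduced = adult_census.drop(columns="education-num")
print(reduced.columns.tolist())  # ['age', 'education', 'class']
```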

I would argue that opening a dataset, reading columns, counting values, etc. are data manipulation.


Apparently the definition of “data manipulation” is broad. It includes sorting, filtering, removing or merging columns, but it also includes applying formulas and functions (counting unique values or computing the median or mean value of a column).
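To make that broad definition concrete, here is a minimal pandas sketch of each kind of operation mentioned (sorting, filtering, merging, and computing statistics). The data and the `labels` lookup table are hypothetical, made up purely for illustration. None of these calls modify `df` in place.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "age": [25, 38, 28, 44],
    "hours-per-week": [40, 50, 40, 40],
    "class": ["<=50K", "<=50K", ">50K", ">50K"],
})

# Sorting and filtering: each returns a new object.
by_age = df.sort_values("age")
older = df[df["age"] > 30]

# Merging with another (hypothetical) table on a shared key column.
labels = pd.DataFrame({"class": ["<=50K", ">50K"], "high_income": [False, True]})
merged = df.merge(labels, on="class")

# Applying functions to columns: counting unique values, median, mean.
n_unique = df["class"].nunique()
median_age = df["age"].median()
mean_hours = df["hours-per-week"].mean()
print(n_unique, median_age, mean_hours)  # 2 33.0 42.5
```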

Do you think it would be clearer if “computing statistics” were part of the answer?


Hi,

I would say “pandas to explore data”. :slightly_smiling_face:

That said, it is also true that we manipulate any object when we explore it.

We just used three libraries: pandas, seaborn and matplotlib. The latter two are for visualization.


We used a limited number of records for the pair plots, so we essentially split the dataset in two, and used one for the plotting. That would fall under the umbrella term “data manipulation”, if you ask me.
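A sketch of what that split looks like in pandas, assuming positional slicing of the first rows (the data and the value of `n_samples_to_plot` here are hypothetical). The subset is what would then be handed to seaborn, e.g. `sns.pairplot(data=subset)`; that call is left as a comment so the sketch runs without seaborn installed.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"age": range(10), "hours-per-week": range(10, 20)})

n_samples_to_plot = 4  # hypothetical; a real dataset would use a larger number

# Positional slicing splits the rows into a plotted part and the remainder.
subset = df.iloc[:n_samples_to_plot]
rest = df.iloc[n_samples_to_plot:]

# subset could then be passed to seaborn, e.g. sns.pairplot(data=subset)
print(len(subset), len(rest))  # 4 6
```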


This question is probably not that interesting anyway. Maybe we should just drop it for the next session of the MOOC.


It seems that “manipulate” is open to interpretation, so if you have a particular interpretation in mind (or that is commonly understood in the field), then you may need to define your terms :grinning:

I agree that “computing statistics” or “exploring” (as suggested by @qdpham) would be clearer.


That’s why we get two tries…

I got a bit confused as well.
I misinterpreted the verb “manipulate” and did not consider this option a correct answer.

As I understand it, in data science the word “manipulation” refers to almost any interaction with the data (datasets), not only its strict dictionary sense of transforming it. The meaning comes from the context.

Data science = mathematical skills, computer science and business (understanding the context of the problems)

I think data manipulation does not imply changing or transforming data. In my opinion, it is a broad term for doing anything with data. If you want to be technical, the actual dataset is in the CSV file. You are loading data, converting from strings to float or integer values, and so on, whatever happens under the hood of pandas. So, for me, “data manipulation” is used correctly here.
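To illustrate the point about conversions happening under the hood: the CSV file is plain text, and `read_csv` infers a type for each column while loading it. The sketch below uses a small in-memory CSV string (hypothetical contents) via `io.StringIO` instead of the course's file, and also shows an explicit conversion with `astype`.

```python
import io
import pandas as pd

# Hypothetical CSV contents; in the course the data comes from a file on disk.
csv_text = "age,education,hours-per-week\n25,11th,40\n38,HS-grad,50\n"

# read_csv parses the text and infers a dtype per column
# (integers for the numeric columns, strings for "education").
df = pd.read_csv(io.StringIO(csv_text))

# An explicit conversion is also just one call away.
hours = df["hours-per-week"].astype(float)
print(df.dtypes)
print(hours.tolist())  # [40.0, 50.0]
```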