Problem loading the data in the first exercise

mab66 · 19 May 2021 15:31

Hello,

I cannot load the penguins data.
My command is:
pinguin = pd.read_csv("scikit-learn-mooc/penguins.csv at master · INRIA/scikit-learn-mooc · GitHub")

I get a parser error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 81, saw 2

Thank you for your help.

Mirzon · 19 May 2021 15:35

Hello @mab66,

You need to use a relative path. The data is already on the server, so there is no need to download it:

df = pd.read_csv("../datasets/penguins_classification.csv")

mab66 · 19 May 2021 15:40

Thank you

ThomasLoock · 19 May 2021 16:07

Hi mab66, you are using the wrong path to the file on GitHub.
You need to use the link to the raw file.

Try:
df = pd.read_csv('https://raw.githubusercontent.com/INRIA/scikit-learn-mooc/master/datasets/penguins.csv')
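
For reference, a GitHub page link of the form github.com/<owner>/<repo>/blob/<branch>/<path> returns an HTML page rather than the CSV itself, which is a likely cause of the tokenizing error above. Here is a minimal sketch of turning such a link into its raw equivalent. The helper name to_raw_url is hypothetical, and the example page URL is an assumption, since the original link was not preserved in the first post.

```python
from urllib.parse import urlsplit


def to_raw_url(page_url: str) -> str:
    """Convert a GitHub 'blob' page URL to its raw.githubusercontent.com form.

    Hypothetical helper: it only handles the common layout
    https://github.com/<owner>/<repo>/blob/<branch>/<path>.
    """
    parts = urlsplit(page_url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"Not a GitHub blob URL: {page_url}")
    # Drop the "blob" segment and keep owner, repo, branch and file path.
    owner, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


# Assumed page URL (the original link was lost in the post above).
page = "https://github.com/INRIA/scikit-learn-mooc/blob/master/datasets/penguins.csv"
raw = to_raw_url(page)
print(raw)
```

The resulting string can then be passed to pd.read_csv in place of the page link.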

msspock8581 · 19 May 2021 23:00

Thank you. That did the trick!

EDIT:
Wait…the exercise called for penguins_classification.csv. So using the “raw” path, the file would be here:

https://raw.githubusercontent.com/INRIA/scikit-learn-mooc/master/datasets/penguins_classification.csv

msspock8581 · 19 May 2021 23:24

I don’t know why, but the relative path that Mirzon suggested didn’t work for me. Am I missing something?

mab66 · 20 May 2021 06:01

It works now with the raw file, thank you.

Mirzon · 20 May 2021 07:33

The relative path only works if you are using the notebooks embedded in the FUN MOOC course. If you are using your own Jupyter notebook or running Python another way, the file may not be in the same place relative to your working directory.

To list available files, you could execute something like:

from glob import glob

glob("../**", recursive=True)
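
One way to make the same notebook work both inside and outside the course environment is to check whether the relative path exists and fall back to the raw URL otherwise. This is only a sketch: the function name pick_source is hypothetical, and the path and URL are the ones quoted earlier in this thread.

```python
from pathlib import Path

LOCAL_PATH = "../datasets/penguins_classification.csv"
RAW_URL = (
    "https://raw.githubusercontent.com/INRIA/scikit-learn-mooc/"
    "master/datasets/penguins_classification.csv"
)


def pick_source(local_path: str = LOCAL_PATH, remote_url: str = RAW_URL) -> str:
    """Return the local path if the file exists there, else the remote URL.

    Hypothetical helper. Relative paths are resolved against the current
    working directory, which differs between environments.
    """
    if Path(local_path).is_file():
        return local_path
    return remote_url


source = pick_source()
print("Working directory:", Path.cwd())
print("Loading from:", source)
# Then, with pandas imported as pd: df = pd.read_csv(source)
```

Printing the working directory also shows where "../" actually points, which helps explain why the relative path fails outside the course notebooks.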