Problem loading data in the first exercise

Hello,

I cannot load the penguin data.
My command is:
pinguin = pd.read_csv("scikit-learn-mooc/penguins.csv at master · INRIA/scikit-learn-mooc · GitHub")

I get a parser error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 81, saw 2

Thanks for your help.


Hello @mab66,

You need to use a relative path. The data is already on the server, so there is no need to download it:

df = pd.read_csv("../datasets/penguins_classification.csv")

Thank you


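Hi mab66, you are using the wrong path to the file on GitHub: that address points to the web page that displays the file, not to the file itself, so pandas receives HTML and fails to parse it as CSV.
You need to use the link to the raw file.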

Try
df = pd.read_csv("https://raw.githubusercontent.com/INRIA/scikit-learn-mooc/master/datasets/penguins.csv")
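A minimal sketch of why the first attempt failed, assuming the github.com page address returns an HTML page rather than the CSV text: pandas splits each line on commas, takes the first line as the header, and stops at the first later line that has more fields than the header. The HTML and CSV snippets below are made up for illustration; they are not the real page or dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for the HTML served by a GitHub file page:
# the first line has no comma, so pandas expects 1 field per line.
fake_html = "<!DOCTYPE html>\n<html>\n<p>Culmen Length (mm), Species</p>\n"

error_message = None
try:
    pd.read_csv(io.StringIO(fake_html))
except pd.errors.ParserError as err:
    # Same kind of error as in the question: "Expected 1 fields in line ..., saw 2"
    error_message = str(err)
    print("ParserError:", error_message)

# The raw URL serves plain CSV text instead, which parses normally.
fake_csv = "Culmen Length (mm),Species\n39.1,Adelie\n"
df = pd.read_csv(io.StringIO(fake_csv))
print(df.shape)
```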


Thank you. That did the trick!

EDIT:
Wait… the exercise called for penguins_classification.csv. So using the "raw" path, the file would be here:

https://raw.githubusercontent.com/INRIA/scikit-learn-mooc/master/datasets/penguins_classification.csv
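If you start from the regular GitHub page address, the raw address can usually be derived by swapping the host for raw.githubusercontent.com and dropping the /blob/ segment. A small sketch, assuming the usual https://github.com/<owner>/<repo>/blob/<branch>/<path> layout; the helper name github_blob_to_raw is hypothetical, and the page address is assumed to follow that layout:

```python
from urllib.parse import urlparse


def github_blob_to_raw(url: str) -> str:
    """Hypothetical helper: turn a github.com file page URL into its raw URL.

    Assumes the layout https://github.com/<owner>/<repo>/blob/<branch>/<path>.
    """
    parsed = urlparse(url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    parts = parsed.path.strip("/").split("/")
    # Expect: owner, repo, "blob", branch, then the file path segments.
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub file page URL: {url}")
    owner, repo, _, branch, *file_path = parts
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, branch, *file_path])


page = "https://github.com/INRIA/scikit-learn-mooc/blob/master/datasets/penguins_classification.csv"
print(github_blob_to_raw(page))
```

This simple split assumes the branch name contains no "/"; for anything fancier, the "Raw" button on the GitHub page gives the correct link directly.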

I don’t know why, but the relative path that Mirzon suggested didn’t work for me. Am I missing something?

It works now with the raw file, thank you.

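The relative path only works if you are using the notebooks embedded in the FUN MOOC course, where the datasets are stored in ../datasets relative to the notebooks. If you are using your own Jupyter notebook or running Python another way, the relative path is resolved from a different working directory, so the file may not be there.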
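A quick way to see where a relative path actually points: it is resolved against the current working directory of the Python process (in Jupyter, usually the folder the notebook was opened from), not against the location of any dataset. This is a minimal sketch; it only prints information and does not read the file.

```python
import os
from pathlib import Path

# The relative path used on the FUN MOOC server.
relative = Path("../datasets/penguins_classification.csv")

# Relative paths are interpreted from the current working directory.
print("Current working directory:", os.getcwd())
print("pandas would look for:", relative.resolve())
print("File exists here:", relative.exists())
```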

To list available files, you could execute something like:

from glob import glob

glob("../**", recursive=True)
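If the same notebook has to run both on the course server and on your own machine, one option is to try the local relative path first and fall back to the raw GitHub URL. A sketch, assuming the two locations given in this thread; load_penguins is a hypothetical helper, not part of the course material:

```python
from pathlib import Path

import pandas as pd

# Assumed locations: the relative path used on the FUN MOOC server and the
# raw GitHub URL given earlier in this thread.
LOCAL_PATH = Path("../datasets/penguins_classification.csv")
RAW_URL = (
    "https://raw.githubusercontent.com/INRIA/scikit-learn-mooc/"
    "master/datasets/penguins_classification.csv"
)


def load_penguins(local_path=LOCAL_PATH, url=RAW_URL):
    """Hypothetical helper: read the local copy if present, else download it."""
    local_path = Path(local_path)
    if local_path.is_file():
        return pd.read_csv(local_path)
    # Fallback needs internet access.
    return pd.read_csv(url)
```

Usage: penguins = load_penguins(). On the course server this reads the local file; elsewhere it downloads the CSV from GitHub.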