Pair_Plot

Hi everyone!

I am having trouble understanding pairplot. Can anyone explain how to read one?

Here is the seaborn.pairplot documentation:

https://seaborn.pydata.org/generated/seaborn.pairplot.html

As the name suggests, a pairplot works with pairs of features (columns) of the dataset. Pick 2 features and you can plot the data in 2D, with one feature on the x-axis and the other on the y-axis. Plotting every sample of the dataset on these axes gives what we usually call a scatter plot.

A pairplot is the collection of scatter plots drawn for all possible combinations of feature pairs. So for a dataset made of 4 features, you get a grid of 4 x 4 = 16 plots.

In addition, some of these combinations pair a feature with itself. They sit on the diagonal of the pairplot. Since such a pair involves only a single feature, we usually show a histogram of that feature instead of a scatter plot.
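
To make the counting concrete, here is a minimal sketch that enumerates the cells of such a grid using only the standard library. The four feature names are made up for illustration and are not taken from the exercise.

```python
from itertools import product

# Hypothetical feature names, chosen only for illustration.
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Every cell of the pairplot grid corresponds to one ordered pair of features.
cells = list(product(features, repeat=2))

# Diagonal cells pair a feature with itself: they show a single-feature distribution.
diagonal = [(row, col) for row, col in cells if row == col]

# Off-diagonal cells pair two different features: they show a scatter plot.
off_diagonal = [(row, col) for row, col in cells if row != col]

print(len(cells), len(diagonal), len(off_diagonal))  # 16 4 12
```

So out of the 16 cells, 4 are on the diagonal and 12 are scatter plots. Each pair of distinct features appears twice, once with the axes swapped.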

Thank you, @glemaitre58 and @everfree.
So, basically, it indicates the similarity between two or more features, am I right?

I don’t think “similarity” is the right term. I would say the relationship between 2 particular features ignoring all others.

Do you mean the correlation between two particular features?

In the context of this exercise, we are more interested in visualizing whether the different groups can be well isolated from one another using only 2 features, or whether the different classes (represented with different colors) overlap and therefore prevent us from drawing dividing lines between the species, as explained in the solution.

In addition to @ogrisel’s explanation: correlation is a statistical measure that quantifies the relationship between two features. Here, this is purely a visualization.
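
To show the difference, here is a small sketch that computes a correlation as a number, which a pairplot does not do. The data is synthetic and the variable names are made up; np.corrcoef returns the Pearson correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=200)
# A feature that depends almost linearly on x (illustrative only).
y_linear = 2.0 * x + rng.normal(scale=0.1, size=200)
# A feature drawn independently of x.
y_noise = rng.normal(size=200)

# Entry [0, 1] of the 2 x 2 correlation matrix is the Pearson coefficient.
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_noise = np.corrcoef(x, y_noise)[0, 1]

print(f"linear: {r_linear:.3f}, independent: {r_noise:.3f}")
```

The first coefficient should be close to 1 and the second close to 0. In a scatter plot the same two situations would show up as a tight diagonal band versus a shapeless cloud.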

Alright, thank you @ogrisel and @glemaitre58 :sunny:

I also have some difficulty understanding documentation in general and pairplot in particular.
If I understand correctly, each point in a plot represents the values on the two axes for one specific record.
If so, how should the marginal plots along the diagonal be interpreted? They must follow another logic; otherwise, as x and y represent the same value, they could only draw a line at 45 degrees…
Besides, I couldn’t find a way to print the top value of each bar on the graph. Could someone help?

The diagonal shows the distribution of a single feature.

They must follow another logic; otherwise, as x and y represent the same value, they could only draw a line at 45 degrees…

This is exactly why plotting a histogram for this feature is more relevant there.
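
On the question of printing the value on top of each bar: one possible approach, sketched here on a standalone matplotlib histogram rather than on the pairplot diagonal itself, is Axes.bar_label (available in matplotlib 3.4 and later). The data is synthetic, and whether the same call works unchanged on seaborn's diagonal axes is not checked here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=100)  # synthetic single feature

fig, ax = plt.subplots()

# ax.hist returns the bar heights, the bin edges, and the container of drawn bars.
counts, bin_edges, bars = ax.hist(values, bins=8)

# Write each bar's height just above the bar.
labels = ax.bar_label(bars)

print(counts)
```

Each label is the count of samples that fall into that bin, which is what the height of a histogram bar represents.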

Thank you. It’s clearer now.

Hello @glemaitre58 @ogrisel ,
For the hue parameter, why isn’t it accepting target_features=["Species"]? Without the brackets it works. Why?

Passing a list would mean that you want to encode several variables, so seaborn’s API only accepts a single string there.
