Calculating the mutual information between two random vectors returns the same value

I know this may be off topic, but I’d like to try my luck and ask the scikit-learn experts here. I want to calculate the mutual information between two NumPy vectors:

>>> from sklearn.metrics.cluster import mutual_info_score
>>> import numpy as np

>>> a, b = np.random.rand(10), np.random.rand(10)
>>> mutual_info_score(a, b)
1.6094379124341005

>>> a, b = np.random.rand(10), np.random.rand(10)
>>> mutual_info_score(a, b)
1.6094379124341005

As you can see, although I updated a and b, it returned the same value. Then I tried another example:

>>> a = np.array([167.52523295,  73.2904335 ,  98.61953303, 152.17297007,
...               211.01341451, 327.72296346, 356.60500081,  43.9371432 ,
...               119.09474284, 125.20180842])

>>> b = np.array([280.9287028 , 131.76304983, 176.0277832 , 188.56630096,
...               229.09811401, 228.47200012, 617.67000122,  52.7211511 ,
...               125.95361582, 148.55247447])

>>> mutual_info_score(a, b)
2.302585092994046

>>> a = np.array([ 6.71381009,  1.43607653,  3.78729242, -4.75706796, -3.81281173,
...                3.23440092, 10.84495625, -0.19646145,  4.09724507, -0.13858104])

>>> b = np.array([ 4.25330873,  3.02197642, -3.2833848 ,  0.41855662, -3.74693531,
...                0.7674982 , 11.36459148,  0.64636462,  0.51817262,  1.65318943])

>>> mutual_info_score(a, b)
2.302585092994046

Why? Look at how different those numbers are. Why does it return the same value? More importantly, how do I calculate the MI between two vectors?

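Notice that you are passing continuous values to a clustering metric, which expects the labels to be integers (as in cluster #1, cluster #2, etc.). Every distinct float is treated as its own label, so when all n values in each vector are distinct the score is log(n) regardless of the values themselves; for n = 10 that is ln(10) ≈ 2.3026, which is what your last two examples returned.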

Try instead:

from sklearn.metrics.cluster import mutual_info_score
import numpy as np

a, b = np.random.randint(0, 2, 10), np.random.randint(0, 2, 10)
mutual_info_score(a, b)

a, b = np.random.randint(0, 2, 10), np.random.randint(0, 2, 10)
mutual_info_score(a, b)

In that case you will generally obtain different numbers each time you run the cell.
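To see why the original calls kept returning the same number, here is a minimal sketch. It assumes that all sampled floats are distinct (practically certain for random draws) and uses a fixed seed only for reproducibility; the variable names are illustrative. Recent scikit-learn versions may also emit a warning about continuous labels here.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 10

# Two unrelated continuous vectors; each float becomes its own "cluster label".
a, b = rng.random(n), rng.random(n)

# With n singleton labels on each side, the contingency table is a
# permutation matrix and the score collapses to log(n) for any values.
mi = mutual_info_score(a, b)
print(mi, np.log(n))
```

Both printed values should agree, which shows that the score here depends only on the number of samples, not on the data.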

how do I calculate the MI between two vectors?

Maybe you can find some useful information on the following example:

I didn’t know that this function is specific to clustering! I want to calculate the MI between pairs of vectors and then use the result as the input to a clustering step. So I guess I’m doing it wrong, since scikit-learn provides this MI for the sake of clustering.

Then maybe you want to take a look at mutual_info_regression (in sklearn.feature_selection), which estimates MI for continuous variables, and the "Comparison of F-test and mutual information" example.
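As a rough sketch of how that could look for two continuous vectors: mutual_info_regression expects a 2-D feature matrix, so a single vector has to be reshaped into one column. The synthetic data, sample size, and variable names below are assumptions made for illustration. The estimate is nearest-neighbour based, so with only 10 samples, as in the question, it is likely to be quite noisy.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.normal(size=500)

# One vector that depends strongly on x, and one that is independent of it.
y_dep = x + 0.1 * rng.normal(size=500)
y_ind = rng.normal(size=500)

# X must have shape (n_samples, n_features); the result has one entry per feature.
X = x.reshape(-1, 1)
mi_dep = mutual_info_regression(X, y_dep, random_state=0)[0]
mi_ind = mutual_info_regression(X, y_ind, random_state=0)[0]

print(mi_dep, mi_ind)  # estimates in nats; the dependent pair should score higher
```

Unlike mutual_info_score, the result now changes with the actual relationship between the vectors: it should be clearly positive for the dependent pair and close to zero for the independent one.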

Thank you so much! It’s great!!