Jaccard algorithm

Tram Ho

Hi everybody. It's been a long time. In this article I will write about an algorithm that is used quite a lot in AI and ML topics: the Jaccard algorithm (known in English as the Jaccard Index). Tagging this article as ML is a bit of a stretch, but I did not know what else to tag it with. Please forgive me.


The Jaccard Index, also known as the ratio between the intersection and the union of two sets (it can be abbreviated further, but …), was proposed by the Swiss botanist Paul Jaccard. As mentioned above, it measures the similarity of 2 sets as the ratio of the size of their intersection to the size of their union, calculated as follows:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

If set $A$ and set $B$ are both empty, then by default $J(A, B) = 1$. The value of $J(A, B)$ always lies between 0 and 1.

The Jaccard distance measures the dissimilarity between 2 sets. It is the complement of the Jaccard index, calculated by:

$$d_J(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$
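As a quick worked example of both formulas, assume two small illustrative sets that are not part of the original article, $A = \{1, 2, 3\}$ and $B = \{2, 3, 4\}$. Their intersection is $\{2, 3\}$ and their union is $\{1, 2, 3, 4\}$, so:

$$J(A, B) = \frac{|\{2, 3\}|}{|\{1, 2, 3, 4\}|} = \frac{2}{4} = 0.5, \qquad d_J(A, B) = 1 - 0.5 = 0.5$$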


So what does this pile of mathematical formulas mean? Take a look at the following image:

In the image we have 2 areas: the green area is the area of the real object (called the result), and the red area is the predicted area. A good prediction needs to locate the object well. As the prediction improves, the common area of the two regions gradually increases while their total (union) area decreases. So the larger the common area, the higher the precision, and the larger J (A, B) becomes: the numerator grows while the denominator shrinks. The fraction reaches its maximum of 1 when the green and red areas coincide, and its minimum of 0 when the two regions do not intersect at all.

The explanation is a bit confusing, but the above is how Jaccard is applied in object detection. In addition, Jaccard can also be applied in a recommendation system. Suppose there are 3 lists of actors:


A = {Haruka Kudo, Okuyama Kazusa, Noa Tsurushima},

B = {Okuyama Kazusa, Noa Tsurushima, Chika Osaki},

C = {Haruka Kudo, Sakurako Okubo, Hiroe Igeta}. We have:

Pair | Intersection | Union
A, B | A∩B = {Okuyama Kazusa, Noa Tsurushima} | A∪B = {Haruka Kudo, Okuyama Kazusa, Noa Tsurushima, Chika Osaki}
B, C | B∩C = {} | B∪C = {Okuyama Kazusa, Noa Tsurushima, Chika Osaki, Haruka Kudo, Sakurako Okubo, Hiroe Igeta}
A, C | A∩C = {Haruka Kudo} | A∪C = {Okuyama Kazusa, Noa Tsurushima, Haruka Kudo, Sakurako Okubo, Hiroe Igeta}

It is easy to see that $J(A, B) = 2/4 = 0.5$, the highest of the three pairs, since $J(A, C) = 1/5 = 0.2$ and $J(B, C) = 0/6 = 0$.


Implementation in code 1

After the bit of math theory above (and a few minutes for you to look up some of the names above 🤣🤣), let's see how Jaccard is implemented in code. We will return to the two examples above, but in reverse order. The 3 lists above will be converted into 3 arrays A, B, C. Our task is to implement the algorithm and report the pair with the highest similarity.

The first step is implementing the algorithm. This is quite simple (ready-made Jaccard functions already exist for Python, but we will implement it from scratch)
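The author's original code is not reproduced here, so the following is only a minimal sketch of what a from-scratch implementation might look like. The function name `jaccard` and the choice to accept any iterable are assumptions; the handling of two empty inputs follows the convention $J(A, B) = 1$ stated earlier.

```python
def jaccard(a, b):
    """Jaccard index of two collections, treated as sets (illustrative sketch)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    # Convention from the definition: two empty sets have index 1.
    if not union:
        return 1.0
    # |A ∩ B| / |A ∪ B|
    return len(set_a & set_b) / len(union)
```

Converting the inputs with `set()` means duplicates in a list are ignored, which matches the set-based definition of the index.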

With the function above in place, we test it on 2 lists as follows:
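A test along these lines might look like the sketch below. A compact version of the function is repeated so the snippet runs on its own; the names `jaccard`, `A`, `B` and `score_ab` are assumptions chosen to match the lists above, not the author's original code.

```python
def jaccard(a, b):
    # Compact form: |A ∩ B| / |A ∪ B|, with 1.0 for two empty inputs.
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


A = ["Haruka Kudo", "Okuyama Kazusa", "Noa Tsurushima"]
B = ["Okuyama Kazusa", "Noa Tsurushima", "Chika Osaki"]

score_ab = jaccard(A, B)
print(score_ab)  # 0.5, the same as the hand calculation 2/4
```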

And this is the result. It matches the value given by the formula.

The next step is also simple: a few conditional statements are enough. For now, we use the following quick-and-dirty code
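As a hedged sketch of this step, the snippet below picks the most similar pair among the 3 lists. Note that it swaps the hand-written conditionals for `itertools.combinations` and `max`, so it is not the author's code; the names `lists`, `scores` and `best_pair` are assumptions for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    # Compact form: |A ∩ B| / |A ∪ B|, with 1.0 for two empty inputs.
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


lists = {
    "A": ["Haruka Kudo", "Okuyama Kazusa", "Noa Tsurushima"],
    "B": ["Okuyama Kazusa", "Noa Tsurushima", "Chika Osaki"],
    "C": ["Haruka Kudo", "Sakurako Okubo", "Hiroe Igeta"],
}

# Score every unordered pair of list names: (A, B), (A, C), (B, C).
scores = {
    (x, y): jaccard(lists[x], lists[y])
    for x, y in combinations(lists, 2)
}

# The pair with the highest Jaccard index is the most similar one.
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])  # ('A', 'B') 0.5
```

Because every pair is scored in one dictionary, the same code works unchanged when more lists are added to `lists`.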

The result is that A and B are printed. It does not look very nice, but never mind. At this point we have already built a tiny recommendation system with 3 elements. Generalizing it to N elements is left to you.

Implementation in code 2

Now let's go back to the object detection problem.

First of all, we import the necessary libraries

With each object stored as a box as above, the algorithm is written as follows:
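The box version of the algorithm can be sketched in plain Python, with no libraries needed for the calculation itself. This is an illustrative sketch under stated assumptions rather than the author's code: each box is assumed to be a tuple `(x1, y1, x2, y2)` with the top-left corner first, coordinates are treated as continuous (no `+ 1` pixel correction), and the names `iou`, `ground_truth` and `prediction` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])

    # Clamp at 0 so boxes that do not overlap give an empty intersection.
    inter = max(0, x_right - x_left) * max(0, y_bottom - y_top)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # |A ∪ B| = |A| + |B| - |A ∩ B|
    union = area_a + area_b - inter

    # Assumption: two zero-area boxes follow the empty-set convention J = 1.
    return inter / union if union > 0 else 1.0


ground_truth = (50, 50, 150, 150)   # hypothetical "real object" box
prediction = (100, 100, 200, 200)   # hypothetical predicted box
print(iou(ground_truth, prediction))  # 2500 / 17500, about 0.143
```

Identical boxes give 1.0 and boxes that do not touch give 0.0, which matches the maximum and minimum described for the image above.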

For the demo, I used a plain white background image and drew 2 boxes on it. This is for illustration only

This is the result obtained


In this article, I introduced one of the algorithms used in AI / ML and dug a little into the mathematics behind AI. Here is the link to my source code: https://github.com/BlazingRockStorm/Jaccard_Index_Algorithm


https://en.wikipedia.org/wiki/Jaccard_index
https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/


Source : Viblo