ML From Scratch: Naive Bayes Classification Algorithm


Hello everyone. In this Machine Learning From Scratch series, we will implement basic machine learning algorithms ourselves to better understand how these algorithms really work.

1. Theory of Bayes' theorem

P(A|B) = P(B|A) × P(A) / P(B)

The above is the conditional probability formula, used when the probability of an event depends on another event that has already occurred. For example, the probability that you wash the dishes in general is different from the probability that you wash the dishes when your mother is at home, and different again from the probability when your mother is not at home. To match the formula, let A be the event that you wash the dishes and B be the event that your mother is at home. The formula then gives the probability that you wash the dishes given that your mother is at home, and it depends on:

  • P(B): the probability that your mother is at home
  • P(B|A): the probability that your mother is at home given that you wash the dishes
  • P(A): the probability that you wash the dishes
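To make the formula concrete, here is a quick numeric check in Python. The probabilities are made up purely for illustration; they are not from the original article.

```python
# Made-up probabilities for the dishwashing example (illustrative only).
p_a = 0.5          # P(A): you wash the dishes
p_b = 0.4          # P(B): your mother is at home
p_b_given_a = 0.6  # P(B|A): mother at home, given that you wash the dishes

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.75
```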

P(B_i | A) = P(A | B_i) × P(B_i) / P(A)

And here is Bayes' theorem as it is used in the classification algorithm. When using this formula for classification, we assume that the attributes are independent of one another (it is just an assumption, and one that rarely holds exactly in practice). The events that the input data belongs to each of the n classes are considered a complete system (B_1, B_2, ..., B_n), so the denominator can be expanded with the law of total probability:

P(A) = P(A | B_1) × P(B_1) + P(A | B_2) × P(B_2) + ... + P(A | B_n) × P(B_n)

2. Example

[Table: five training rows with attribute values and class labels C_1/C_2 — not preserved in this copy]

With the above data, we need to predict the label C_1 or C_2 for a data point X (A_1 = 1, ...). First, the priors and the likelihoods:

  • P(C_1) = 3/5 = 0.6
  • P(C_2) = 2/5 = 0.4
  • P(X | C_1) = P(A_1 = 1 | C_1) × P(A_2 | C_1) × ... — by the independence assumption, the likelihood factorizes into a product of per-attribute probabilities within class C_1
  • P(X | C_2) = P(A_1 = 1 | C_2) × P(A_2 | C_2) × ... — the same product, computed within class C_2

Finally, the two most important probabilities to compare (following Bayes' formula):

  • P(C_1 | X) = P(X | C_1) × P(C_1) / P(X)
  • P(C_2 | X) = P(X | C_2) × P(C_2) / P(X)

So the data point X (A_1 = 1, ...) is assigned to whichever class gives the larger posterior probability.

Note: to classify, we only need to compute and compare the numerators, because the denominator P(X) is the same for every class. However, here I still compute the posteriors in full to show that the events of the data belonging to each class form a complete system, so their sum equals 1.

3. Python code

OK, now that we understand the theory, we will implement the Naive Bayes classification algorithm in Python. First, we initialize the data as in the table above.
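The original table is only preserved as an image, so the values below are a hypothetical stand-in, chosen so that the class priors match the worked example (three rows labeled C1, two labeled C2):

```python
import numpy as np

# Hypothetical training data (the original table is not preserved):
# three binary attributes A1, A2, A3 per row, and a class label C1 or C2.
X_train = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
])
y_train = np.array(["C1", "C1", "C1", "C2", "C2"])
```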

First is the function for the class probabilities P(C_1) and P(C_2).
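A minimal sketch of such a prior function, assuming the X_train/y_train arrays above:

```python
def class_prior(y_train, cls):
    """P(C): the fraction of training rows labeled with class `cls`."""
    return np.count_nonzero(y_train == cls) / len(y_train)

# With the data above: class_prior(y_train, "C1") -> 0.6
#                      class_prior(y_train, "C2") -> 0.4
```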

Next is the function that calculates the probability of the attribute set conditioned on each class, i.e. P(X | C_1) and P(X | C_2).
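Here is a sketch of that function, assuming the arrays defined above; it follows the count / divide / multiply steps the next paragraph describes:

```python
def conditional_prob(X_train, y_train, x, cls):
    """P(X|C): product of per-attribute probabilities within class `cls`."""
    rows = X_train[y_train == cls]  # rows belonging to the class being considered
    n = len(rows)
    # For every attribute, count how many class rows share x's value,
    # turn the counts into probabilities, then multiply them together.
    counts = np.count_nonzero(rows == x, axis=0)
    return np.prod(counts / n)
```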

This function returns the conditional probability of the input record for a class present in the data set. First, we select the rows that belong to the class currently being considered. Then we count how often each attribute value of the record occurs among those rows (np.count_nonzero), turn the counts into probabilities by dividing by the number of rows (/ n), and take their product to get the result (np.prod).

And finally, we build a function to predict the class of a data point.
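A sketch of the prediction function, built on the two helpers above:

```python
def predict(X_train, y_train, x):
    """Return the class with the largest posterior P(C|X), plus all posteriors."""
    classes = np.unique(y_train)
    # Numerators of Bayes' formula: P(X|C) * P(C) for every class.
    scores = np.array([
        conditional_prob(X_train, y_train, x, c) * class_prior(y_train, c)
        for c in classes
    ])
    # Dividing by the sum implements the complete-system denominator P(X),
    # so the posteriors add up to 1.
    posteriors = scores / scores.sum()
    return classes[np.argmax(posteriors)], posteriors
```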

This function calculates the probability that the input data point belongs to each class using the formula above, and from those posteriors decides which class the data point belongs to.

And here is everything put together:
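A short driver using the hypothetical data and the functions sketched above (the query point is also made up):

```python
x_new = np.array([1, 0, 1])  # hypothetical data point to classify
label, posteriors = predict(X_train, y_train, x_new)
print("posteriors:", dict(zip(np.unique(y_train), posteriors)))
print("predicted class:", label)
```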

4. Conclusion

In this article, we have walked through how the Naive Bayes algorithm works in a classification problem. However, to build a complete classification program, many things still need to be optimized and handled: for example, the running speed of the code, and special cases such as a conditional attribute probability equal to 0, which (with the formula above) makes the final probability 0 as well; techniques such as Laplace smoothing are a common way to handle this. Finally, thank you for reading the article, and remember to upvote if you found it useful.


Source: Viblo