Hello everyone! In this Machine Learning From Scratch series, we will implement basic machine learning algorithms ourselves to better understand how these algorithms work.
1. Theory of Bayes' theorem
Bayes' theorem relates the conditional probabilities of two events A and B:

P(A | B) = P(B | A) · P(A) / P(B)

For example, let A be the event "you wash the dishes" and B the event "your mother is at home":

- P(B): the probability that your mother is at home
- P(B | A): the probability that your mother is at home given that you wash the dishes
- P(A): the probability that you wash the dishes
- P(A | B): the probability that you wash the dishes given that your mother is at home
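To make the formula concrete, here is a quick numeric sketch. The values 0.5, 0.6, and 0.9 below are purely illustrative, not taken from any data:

```python
# Made-up illustrative probabilities
p_a = 0.5          # P(A): you wash the dishes
p_b = 0.6          # P(B): your mother is at home
p_b_given_a = 0.9  # P(B|A): mother at home, given you wash the dishes

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # ~0.75
```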
And here is how Bayes' theorem is used in the classification algorithm. When using this formula for classification, we assume that the classification attributes are independent of each other (it is just an assumption, and one that rarely holds exactly in practice). The events that the input data X belongs to each of the n classes form a complete system (B_1, B_2, ..., B_n), so:

P(B_i | X) = P(X | B_i) · P(B_i) / (P(X | B_1) · P(B_1) + ... + P(X | B_n) · P(B_n))

and the posterior probabilities over all n classes sum to 1.
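The role of the complete system can be sketched in a few lines of Python. The priors and likelihoods below are illustrative numbers; the point is only that the posteriors always sum to 1:

```python
# Illustrative priors P(B_i) and likelihoods P(X | B_i) for n = 2 classes
priors = [0.6, 0.4]
likelihoods = [1 / 9, 1 / 4]

# Law of total probability: P(X) = sum_i P(X | B_i) * P(B_i)
evidence = sum(l * p for l, p in zip(likelihoods, priors))

# Posteriors P(B_i | X); because {B_1, ..., B_n} is a complete system,
# they always sum to 1
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
print(posteriors, sum(posteriors))
```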
2. Example
Suppose we have the following training data with two attributes A_1 and A_2 and two classes C_1 and C_2:

| A_1 | A_2 | Class |
|-----|-----|-------|
| 1   | 0   | C_1   |
| 0   | 0   | C_1   |
| 2   | 1   | C_2   |
| 1   | 2   | C_2   |
| 0   | 1   | C_1   |

With the above data, we need to predict the label C_1 or C_2 for the data point X(A_1 = 1, A_2 = 1). First we calculate:
- P(C_1) = 3/5 = 0.6
- P(C_2) = 2/5 = 0.4
- P(X | C_1) = P(A_1 = 1 | C_1) · P(A_2 = 1 | C_1) = (1/3) · (1/3) = 1/9 ≈ 0.111
- P(X | C_2) = P(A_1 = 1 | C_2) · P(A_2 = 1 | C_2) = (1/2) · (1/2) = 1/4 = 0.25
Finally, the two most important probabilities to compare, following Bayes' formula:

- P(C_1 | X) = P(X | C_1) · P(C_1) / (P(X | C_1) · P(C_1) + P(X | C_2) · P(C_2)) = (1/9 · 0.6) / (1/9 · 0.6 + 1/4 · 0.4) = 0.4
- P(C_2 | X) = P(X | C_2) · P(C_2) / (P(X | C_1) · P(C_1) + P(X | C_2) · P(C_2)) = (1/4 · 0.4) / (1/9 · 0.6 + 1/4 · 0.4) = 0.6
So the data point X(A_1 = 1, A_2 = 1) is assigned the label C_2, because P(C_2 | X) > P(C_1 | X).
Note: to classify, we only need to calculate and compare the numerators, because the denominator is the same for every class. However, here I still calculate the full posteriors to show that the events of the data belonging to each class form a complete system, so their sum equals 1.
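As a sanity check on the arithmetic above, the posteriors can be recomputed with exact fractions; `Fraction` is from Python's standard library:

```python
from fractions import Fraction

# Priors and likelihoods from the worked example, kept exact
p_c1, p_c2 = Fraction(3, 5), Fraction(2, 5)
p_x_c1, p_x_c2 = Fraction(1, 9), Fraction(1, 4)

# Shared denominator of Bayes' formula
evidence = p_x_c1 * p_c1 + p_x_c2 * p_c2

print(p_x_c1 * p_c1 / evidence)  # 2/5
print(p_x_c2 * p_c2 / evidence)  # 3/5
```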
3. Python code
OK, now that we understand the theory, we will implement the Naive Bayes classification algorithm in Python. First, we initialize the data from the table above.
```python
import numpy as np

X = np.array([
    [1, 0],
    [0, 0],
    [2, 1],
    [1, 2],
    [0, 1],
])
y = np.array(["C1", "C1", "C2", "C2", "C1"])
```
First is the function that computes the class prior probabilities P(C_1) and P(C_2).
```python
def get_class_prob(y: np.ndarray):
    prob = {}
    n = len(y)
    for c in np.unique(y):
        prob[c] = np.count_nonzero(y == c) / n
    return prob

get_class_prob(y)
# output: {'C1': 0.6, 'C2': 0.4}
```
Next is the function that computes the probability of the attribute set conditioned on each class, i.e. P(X | C_1) and P(X | C_2).
```python
def get_condition_prob(X: np.ndarray, y: np.ndarray, record: np.ndarray):
    prob = {}
    for c in np.unique(y):
        # Get all records with class Ci
        class_records = X[y == c]
        n = class_records.shape[0]
        # Probability of the data point in class Ci, using the product rule
        prob[c] = np.prod(np.count_nonzero(class_records == record, axis=0) / n)
    return prob

input = np.array([1, 1])
get_condition_prob(X, y, input)
# output: {'C1': 0.1111111111111111, 'C2': 0.25}
```
This function returns the conditional probability of the input record for each class present in the data set. First, `class_records = X[y == c]` selects the rows belonging to the class currently being processed. Then `np.count_nonzero(class_records == record, axis=0)` counts how many times each attribute value of the record occurs within that class, dividing by `n` turns the counts into probabilities, and `np.prod` multiplies them together to get the result.
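To see what happens inside that expression, here is a step-by-step expansion for class C1 with the data from this article (the intermediate variable names are mine, for illustration only):

```python
import numpy as np

X = np.array([[1, 0], [0, 0], [2, 1], [1, 2], [0, 1]])
y = np.array(["C1", "C1", "C2", "C2", "C1"])
record = np.array([1, 1])

class_records = X[y == "C1"]                # [[1, 0], [0, 0], [0, 1]]
matches = class_records == record           # element-wise comparison
counts = np.count_nonzero(matches, axis=0)  # occurrences per attribute: [1, 1]
probs = counts / class_records.shape[0]     # per-attribute probabilities: [1/3, 1/3]
print(np.prod(probs))                       # 1/9, i.e. 0.111...
```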
And finally, we build a function that predicts the class of a data point.
```python
def predict(X: np.ndarray, y: np.ndarray, record: np.ndarray):
    class_prob = get_class_prob(y)
    condition_prob = get_condition_prob(X, y, record)
    # Evidence: the shared denominator of Bayes' formula
    evidence = np.sum([class_prob[c] * condition_prob[c] for c in class_prob])
    prob = {}
    for c in np.unique(y):
        prob[c] = class_prob[c] * condition_prob[c] / evidence
    return prob

input = np.array([1, 1])
predict(X, y, input)
# output: {'C1': 0.39999999999999997, 'C2': 0.6}
```
This function calculates the probability that the input data point belongs to each class based on the formula above, from which we can decide which class the data point belongs to.
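Since `predict` returns a dictionary of posteriors, picking the final label is just an argmax over it. `predict_label` below is a small helper name of my own, not part of the article's code:

```python
def predict_label(prob: dict) -> str:
    # Return the class whose posterior probability is highest
    return max(prob, key=prob.get)

print(predict_label({"C1": 0.4, "C2": 0.6}))  # C2
```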
And here is the full code:
```python
import numpy as np


def get_class_prob(y: np.ndarray):
    prob = {}
    n = len(y)
    for c in np.unique(y):
        prob[c] = np.count_nonzero(y == c) / n
    return prob


def get_condition_prob(X: np.ndarray, y: np.ndarray, record: np.ndarray):
    prob = {}
    for c in np.unique(y):
        # Get all records with class Ci
        class_records = X[y == c]
        n = class_records.shape[0]
        # Probability of the data point in class Ci, using the product rule
        prob[c] = np.prod(np.count_nonzero(class_records == record, axis=0) / n)
    return prob


def predict(X: np.ndarray, y: np.ndarray, record: np.ndarray):
    class_prob = get_class_prob(y)
    condition_prob = get_condition_prob(X, y, record)
    # Evidence: the shared denominator of Bayes' formula
    evidence = np.sum([class_prob[c] * condition_prob[c] for c in class_prob])
    prob = {}
    for c in np.unique(y):
        prob[c] = class_prob[c] * condition_prob[c] / evidence
    return prob


if __name__ == "__main__":
    X = np.array([
        [1, 0],
        [0, 0],
        [2, 1],
        [1, 2],
        [0, 1],
    ])
    y = np.array(["C1", "C1", "C2", "C2", "C1"])

    input = np.array([1, 1])
    result = predict(X, y, input)
    print(result)
```
4. Conclusion
In this article, we have walked through how the Naive Bayes algorithm works in the classification problem. However, to build a complete classification program, quite a few things still need to be optimized and refined, such as improving the running speed of the code and handling special cases. For example, with the formula above, if the conditional probability of any class attribute equals 0, the final probability will also be 0; you can refer to the solutions here. Finally, thank you for reading the article, and remember to Upvote for me if you find it useful.
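One common fix for that zero-probability case is Laplace (add-alpha) smoothing. The sketch below is my own variant of `get_condition_prob`, under the assumption that every attribute is categorical; the function name and the `alpha` parameter are mine:

```python
import numpy as np


def get_condition_prob_smoothed(X: np.ndarray, y: np.ndarray,
                                record: np.ndarray, alpha: float = 1.0):
    """Like get_condition_prob, but with Laplace (add-alpha) smoothing."""
    # Number of distinct values observed for each attribute
    n_values = np.array([len(np.unique(X[:, j])) for j in range(X.shape[1])])
    prob = {}
    for c in np.unique(y):
        class_records = X[y == c]
        n = class_records.shape[0]
        counts = np.count_nonzero(class_records == record, axis=0)
        # Smoothed estimate: never exactly 0, even for unseen attribute values
        prob[c] = np.prod((counts + alpha) / (n + alpha * n_values))
    return prob


X = np.array([[1, 0], [0, 0], [2, 1], [1, 2], [0, 1]])
y = np.array(["C1", "C1", "C2", "C2", "C1"])

# The value pair [2, 2] never occurs with class C1, yet its smoothed
# probability stays above 0
print(get_condition_prob_smoothed(X, y, np.array([2, 2])))
```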