Hello everyone! In this Machine Learning From Scratch series, we will implement basic machine learning algorithms ourselves to better understand how these algorithms work.
1. Theory of Bayes' theorem
Bayes' theorem relates the conditional probabilities of two events A and B:

P(A | B) = P(B | A) · P(A) / P(B)

For example, let A be the event "you wash the dishes" and B the event "your mother is at home":

- P(B): the probability that your mother is at home
- P(B | A): the probability that your mother is at home given that you wash the dishes
- P(A): the probability that you wash the dishes
- P(A | B): the probability that you wash the dishes given that your mother is at home
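To make the formula concrete, here is a quick numeric sketch. The values 0.5, 0.6, and 0.9 below are purely illustrative, not taken from any data:

```python
# Made-up illustrative probabilities
p_a = 0.5          # P(A): you wash the dishes
p_b = 0.6          # P(B): your mother is at home
p_b_given_a = 0.9  # P(B|A): mother at home, given you wash the dishes

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # ~0.75
```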
And here is how Bayes' theorem is used in the classification algorithm. When using this formula for classification, we assume that the classification attributes are independent of each other (it is just an assumption, and one that rarely holds exactly in practice). The events that the input data X belongs to each of the n classes form a complete system (B_1, B_2, ..., B_n), so:

P(B_i | X) = P(X | B_i) · P(B_i) / (P(X | B_1) · P(B_1) + ... + P(X | B_n) · P(B_n))

and the posterior probabilities over all n classes sum to 1.
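The role of the complete system can be sketched in a few lines of Python. The priors and likelihoods below are illustrative numbers; the point is only that the posteriors always sum to 1:

```python
# Illustrative priors P(B_i) and likelihoods P(X | B_i) for n = 2 classes
priors = [0.6, 0.4]
likelihoods = [1 / 9, 1 / 4]

# Law of total probability: P(X) = sum_i P(X | B_i) * P(B_i)
evidence = sum(l * p for l, p in zip(likelihoods, priors))

# Posteriors P(B_i | X); because {B_1, ..., B_n} is a complete system,
# they always sum to 1
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
print(posteriors, sum(posteriors))
```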
2. Example
Suppose we have the following training data with two attributes A_1 and A_2 and two classes C_1 and C_2:

| A_1 | A_2 | Class |
|-----|-----|-------|
| 1   | 0   | C_1   |
| 0   | 0   | C_1   |
| 2   | 1   | C_2   |
| 1   | 2   | C_2   |
| 0   | 1   | C_1   |

With the above data, we need to predict the label C_1 or C_2 for the data point X(A_1 = 1, A_2 = 1). First we calculate:
- P(C_1) = 3/5 = 0.6
- P(C_2) = 2/5 = 0.4
- P(X | C_1) = P(A_1 = 1 | C_1) · P(A_2 = 1 | C_1) = (1/3) · (1/3) = 1/9 ≈ 0.111
- P(X | C_2) = P(A_1 = 1 | C_2) · P(A_2 = 1 | C_2) = (1/2) · (1/2) = 1/4 = 0.25
Finally, the two most important probabilities to compare, following Bayes' formula:

- P(C_1 | X) = P(X | C_1) · P(C_1) / (P(X | C_1) · P(C_1) + P(X | C_2) · P(C_2)) = (1/9 · 0.6) / (1/9 · 0.6 + 1/4 · 0.4) = 0.4
- P(C_2 | X) = P(X | C_2) · P(C_2) / (P(X | C_1) · P(C_1) + P(X | C_2) · P(C_2)) = (1/4 · 0.4) / (1/9 · 0.6 + 1/4 · 0.4) = 0.6
So the data point X(A_1 = 1, A_2 = 1) is assigned the label C_2, because P(C_2 | X) > P(C_1 | X).
Note: to classify, we only need to calculate and compare the numerators, because the denominator is the same for every class. However, here I still calculate the full posteriors to show that the events of the data belonging to each class form a complete system, so their sum equals 1.
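As a sanity check on the arithmetic above, the posteriors can be recomputed with exact fractions; `Fraction` is from Python's standard library:

```python
from fractions import Fraction

# Priors and likelihoods from the worked example, kept exact
p_c1, p_c2 = Fraction(3, 5), Fraction(2, 5)
p_x_c1, p_x_c2 = Fraction(1, 9), Fraction(1, 4)

# Shared denominator of Bayes' formula
evidence = p_x_c1 * p_c1 + p_x_c2 * p_c2

print(p_x_c1 * p_c1 / evidence)  # 2/5
print(p_x_c2 * p_c2 / evidence)  # 3/5
```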
3. Python code
OK, now that we understand the theory, we will implement the Naive Bayes classification algorithm in Python. First, we initialize the data from the table above.
```python
import numpy as np

X = np.array([
    [1, 0],
    [0, 0],
    [2, 1],
    [1, 2],
    [0, 1],
])
y = np.array(["C1", "C1", "C2", "C2", "C1"])
```
First is the function that computes the class prior probabilities P(C_1) and P(C_2).
```python
def get_class_prob(y: np.ndarray):
    prob = {}
    n = len(y)
    for c in np.unique(y):
        prob[c] = np.count_nonzero(y == c) / n
    return prob

get_class_prob(y)
# output: {'C1': 0.6, 'C2': 0.4}
```
Next is the function that computes the probability of the attribute set conditioned on each class, i.e. P(X | C_1) and P(X | C_2).
```python
def get_condition_prob(X: np.ndarray, y: np.ndarray, record: np.ndarray):
    prob = {}
    for c in np.unique(y):
        # Get all records with class Ci
        class_records = X[y == c]
        n = class_records.shape[0]
        # Probability of the data point in class Ci, using the product rule
        prob[c] = np.prod(np.count_nonzero(class_records == record, axis=0) / n)
    return prob

input = np.array([1, 1])
get_condition_prob(X, y, input)
# output: {'C1': 0.1111111111111111, 'C2': 0.25}
```
This function returns the conditional probability of the input record for each class present in the data set. First, `class_records = X[y == c]` selects the rows belonging to the class currently being processed. Then `np.count_nonzero(class_records == record, axis=0)` counts how many times each attribute value of the record occurs within that class, dividing by `n` turns the counts into probabilities, and `np.prod` multiplies them together to get the result.
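To see what happens inside that expression, here is a step-by-step expansion for class C1 with the data from this article (the intermediate variable names are mine, for illustration only):

```python
import numpy as np

X = np.array([[1, 0], [0, 0], [2, 1], [1, 2], [0, 1]])
y = np.array(["C1", "C1", "C2", "C2", "C1"])
record = np.array([1, 1])

class_records = X[y == "C1"]                # [[1, 0], [0, 0], [0, 1]]
matches = class_records == record           # element-wise comparison
counts = np.count_nonzero(matches, axis=0)  # occurrences per attribute: [1, 1]
probs = counts / class_records.shape[0]     # per-attribute probabilities: [1/3, 1/3]
print(np.prod(probs))                       # 1/9, i.e. 0.111...
```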
And finally, we build a function that predicts the class of a data point.
```python
def predict(X: np.ndarray, y: np.ndarray, record: np.ndarray):
    class_prob = get_class_prob(y)
    condition_prob = get_condition_prob(X, y, record)
    # Evidence: the shared denominator of Bayes' formula
    evidence = np.sum([class_prob[c] * condition_prob[c] for c in class_prob])
    prob = {}
    for c in np.unique(y):
        prob[c] = class_prob[c] * condition_prob[c] / evidence
    return prob

input = np.array([1, 1])
predict(X, y, input)
# output: {'C1': 0.39999999999999997, 'C2': 0.6}
```
This function calculates the probability that the input data point belongs to each class based on the formula above, from which we can decide which class the data point belongs to.
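Since `predict` returns a dictionary of posteriors, picking the final label is just an argmax over it. `predict_label` below is a small helper name of my own, not part of the article's code:

```python
def predict_label(prob: dict) -> str:
    # Return the class whose posterior probability is highest
    return max(prob, key=prob.get)

print(predict_label({"C1": 0.4, "C2": 0.6}))  # C2
```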
And here is the full code:
```python
import numpy as np


def get_class_prob(y: np.ndarray):
    prob = {}
    n = len(y)
    for c in np.unique(y):
        prob[c] = np.count_nonzero(y == c) / n
    return prob


def get_condition_prob(X: np.ndarray, y: np.ndarray, record: np.ndarray):
    prob = {}
    for c in np.unique(y):
        # Get all records with class Ci
        class_records = X[y == c]
        n = class_records.shape[0]
        # Probability of the data point in class Ci, using the product rule
        prob[c] = np.prod(np.count_nonzero(class_records == record, axis=0) / n)
    return prob


def predict(X: np.ndarray, y: np.ndarray, record: np.ndarray):
    class_prob = get_class_prob(y)
    condition_prob = get_condition_prob(X, y, record)
    # Evidence: the shared denominator of Bayes' formula
    evidence = np.sum([class_prob[c] * condition_prob[c] for c in class_prob])
    prob = {}
    for c in np.unique(y):
        prob[c] = class_prob[c] * condition_prob[c] / evidence
    return prob


if __name__ == "__main__":
    X = np.array([
        [1, 0],
        [0, 0],
        [2, 1],
        [1, 2],
        [0, 1],
    ])
    y = np.array(["C1", "C1", "C2", "C2", "C1"])

    input = np.array([1, 1])
    result = predict(X, y, input)
    print(result)
```
4. Conclusion
In this article, we have walked through how the Naive Bayes algorithm works in the classification problem. However, to build a complete classification program, quite a few things still need to be optimized and refined, such as improving the running speed of the code and handling special cases. For example, with the formula above, if the conditional probability of any class attribute equals 0, the final probability will also be 0; you can refer to the solutions here. Finally, thank you for reading the article, and remember to Upvote for me if you find it useful.
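One common fix for that zero-probability case is Laplace (add-alpha) smoothing. The sketch below is my own variant of `get_condition_prob`, under the assumption that every attribute is categorical; the function name and the `alpha` parameter are mine:

```python
import numpy as np


def get_condition_prob_smoothed(X: np.ndarray, y: np.ndarray,
                                record: np.ndarray, alpha: float = 1.0):
    """Like get_condition_prob, but with Laplace (add-alpha) smoothing."""
    # Number of distinct values observed for each attribute
    n_values = np.array([len(np.unique(X[:, j])) for j in range(X.shape[1])])
    prob = {}
    for c in np.unique(y):
        class_records = X[y == c]
        n = class_records.shape[0]
        counts = np.count_nonzero(class_records == record, axis=0)
        # Smoothed estimate: never exactly 0, even for unseen attribute values
        prob[c] = np.prod((counts + alpha) / (n + alpha * n_values))
    return prob


X = np.array([[1, 0], [0, 0], [2, 1], [1, 2], [0, 1]])
y = np.array(["C1", "C1", "C2", "C2", "C1"])

# The value pair [2, 2] never occurs with class C1, yet its smoothed
# probability stays above 0
print(get_condition_prob_smoothed(X, y, np.array([2, 2])))
```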