Naive Bayes is a probabilistic machine learning algorithm based on Bayes' theorem and used in a wide variety of classification tasks. In this article, we will walk through the Naive Bayes algorithm and the fundamental mathematics behind it, so that no step is left unexplained.

In simple words, Naive Bayes is built on the fundamentals of probability and is mainly used for classification problems.

Fundamental Mathematics

In this section we will go through the math behind the algorithm, which is frequently asked about in interviews.

We will unfold the facts one by one.

Conditional probability is the probability of an event occurring given that another event has already occurred.

Suppose there are two events A and B, and event B has already happened. The probability of A given B is then

P(A|B) = P(A∩B)/P(B)

where P(B) ≠ 0 and

P(A∩B) is the probability that both events A and B occur.


Q). There are two dice, D1 and D2. What is the probability of getting 2 on D1 given that D1+D2 ≤ 7?

→ There are 36 equally likely combinations of rolled values of the two dice.

P(A) = P(D1=2) = 1/6

P(B) = P(D1+D2≤7) = 21/36 (21 of the 36 combinations have a sum of at most 7)

P(A∩B) = P(D1=2 ∩ D1+D2≤7) = 5/36 (D1 = 2 and D2 is any of 1 to 5)

So the equation can be written as

P(A|B) = P(D1=2|D1+D2≤7) = (5/36)/(21/36)

P(A|B) = 5/21
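
The same numbers can be checked by brute force. The following is a minimal sketch that enumerates all 36 outcomes and counts the favourable ones; the helper name `prob` is our own and is used only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (d1, d2) of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate over (d1, d2)."""
    favourable = sum(1 for d1, d2 in outcomes if event(d1, d2))
    return Fraction(favourable, len(outcomes))

p_b = prob(lambda d1, d2: d1 + d2 <= 7)                    # P(B)
p_a_and_b = prob(lambda d1, d2: d1 == 2 and d1 + d2 <= 7)  # P(A∩B)
p_a_given_b = p_a_and_b / p_b                              # P(A|B)

print(p_b, p_a_and_b, p_a_given_b)  # 7/12 5/36 5/21
```

Note that `Fraction` reduces 21/36 to 7/12 automatically; the final result, 5/21, matches the calculation by hand.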

Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

P(A|B) = (P(B|A)*P(A))/P(B)

P(A|B) = posterior

P(A) = prior

P(B|A) = likelihood

P(B) = evidence

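For example, if the risk of developing health problems is known to increase with age, Bayes' theorem allows the risk to an individual of a known age to be assessed more accurately by conditioning on their age, rather than assuming that the individual is typical of the population as a whole.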


Bayes' theorem follows directly from the definition of conditional probability:

P(A|B) = P(A∩B)/P(B)    {1}

Because set intersection is commutative,

P(A∩B) = P(B∩A)    {2}

Applying the same definition to P(B|A) gives

P(B|A) = P(B∩A)/P(A)

P(B∩A) = P(B|A)*P(A)    {3}

Rewriting equation {1} using equations {2} and {3}

P(A|B) = P(B∩A)/P(B)

P(A|B) = (P(B|A)*P(A))/P(B)
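
As a quick sanity check, we can plug the two-dice example into Bayes' theorem. This is only a sketch with the values worked out by hand: the likelihood P(B|A) is 5/6, because once D1 = 2 the sum stays at most 7 whenever D2 is any of 1 to 5.

```python
from fractions import Fraction

p_a = Fraction(1, 6)          # prior: P(D1 = 2)
p_b = Fraction(21, 36)        # evidence: P(D1 + D2 <= 7)
p_b_given_a = Fraction(5, 6)  # likelihood: P(D1 + D2 <= 7 | D1 = 2)

# Bayes' theorem: posterior = likelihood * prior / evidence
posterior = p_b_given_a * p_a / p_b
print(posterior)  # 5/21
```

The posterior agrees with the value obtained earlier from the definition of conditional probability.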

Mathematics Behind Naive Bayes

Naive Bayes is a conditional probability model.

An instance to be classified is represented by a vector X = {x1,x2,x3……,xn} of n features (independent variables). Applying Bayes' theorem,

P(Ck|X) = (P(X|Ck)*P(Ck))/P(X)

for each of the k possible outcomes or classes Ck.

If k = 2, it is binary classification.

The numerator is the joint probability of Ck and X, which can be written as

P(X|Ck)*P(Ck) = P(Ck∩X) = P(Ck,X)

P(Ck,X) = P(Ck,X1,X2,X3……,Xn)

In machine learning, capital letters are used to denote random variables, so in this article x1 is written as X1 and, similarly, xn is written as Xn.

The joint probability P(Ck,X1,X2……,Xn) can be rewritten as follows, using the chain rule, that is, repeated application of the definition of conditional probability:

P(Ck,X1,X2……,Xn) = P(X1,X2,X3……,Xn,Ck)

= P(X1|X2,X3……,Xn,Ck) P(X2,X3……,Xn,Ck)

= P(X1|X2,X3……,Xn,Ck) P(X2|X3……,Xn,Ck) P(X3……,Xn,Ck)

= P(X1|X2,X3……,Xn,Ck) P(X2|X3……,Xn,Ck)….. P(Xn-1|Xn,Ck) P(Xn|Ck) P(Ck)

Now the “naive” conditional independence assumption comes into play. According to this assumption, every feature is conditionally independent of every other feature given the class Ck, so that P(Xi|Xi+1,……,Xn,Ck) = P(Xi|Ck).

So the equation can be written as

P(Ck,X1,X2……,Xn) = P(X1|Ck) P(X2|Ck)….. P(Xn-1|Ck) P(Xn|Ck) P(Ck)

Substituting this into Bayes' theorem gives

P(Ck|X) = (P(Ck)*(P(X1|Ck) P(X2|Ck)….. P(Xn-1|Ck) P(Xn|Ck)))/P(X)

P(X) does not depend on the class Ck, so it is a constant once the values of the feature variables are known.
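
To see why a constant P(X) can be ignored when comparing classes, here is a small sketch. The three scores are made-up numbers standing in for P(Ck)*P(X1|Ck)…P(Xn|Ck); by the law of total probability, P(X) under the model is the sum of these scores over all classes.

```python
# Hypothetical unnormalised scores P(Ck) * P(X1|Ck) * ... * P(Xn|Ck) for three classes.
scores = {"C1": 0.012, "C2": 0.030, "C3": 0.018}

# P(X) under the model is the sum of the scores over all classes.
evidence = sum(scores.values())
posteriors = {label: s / evidence for label, s in scores.items()}

# Dividing every score by the same constant keeps the ranking of the classes.
best_by_score = max(scores, key=scores.get)
best_by_posterior = max(posteriors, key=posteriors.get)
print(posteriors, best_by_score == best_by_posterior)
```

Dividing by P(X) turns the scores into probabilities that sum to 1, but the class with the largest score is the same before and after.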

Among P(C1|X) to P(Ck|X), suppose P(Ci|X) is the largest value; then Ci is the class that the model predicts for the given X.
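
Putting the pieces together, the following is a minimal from-scratch sketch of the decision rule for categorical features. The toy dataset, the feature meanings and the function names `score` and `predict` are all hypothetical and chosen only for illustration. Probabilities are estimated as simple relative frequencies, with no smoothing, so a feature value never seen with a class gives that class a score of 0.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each row is (features, class label).
# Features are (outlook, windy); the label says whether a game was played.
data = [
    (("sunny", "no"), "play"),
    (("sunny", "yes"), "stay"),
    (("rainy", "yes"), "stay"),
    (("overcast", "no"), "play"),
    (("rainy", "no"), "play"),
    (("sunny", "no"), "play"),
]

class_counts = Counter(label for _, label in data)

# feature_counts[(i, label)][value] = rows of class `label` whose i-th feature equals `value`
feature_counts = defaultdict(Counter)
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

def score(features, label):
    """Unnormalised posterior: P(Ck) times the product of P(Xi|Ck)."""
    result = class_counts[label] / len(data)  # prior P(Ck)
    for i, value in enumerate(features):
        # likelihood P(Xi = value | Ck), estimated as a relative frequency
        result *= feature_counts[(i, label)][value] / class_counts[label]
    return result

def predict(features):
    """Return the class with the largest score; P(X) is the same for every class."""
    return max(class_counts, key=lambda label: score(features, label))

x = ("sunny", "no")
print({label: score(x, label) for label in class_counts})
print(predict(x))  # play
```

For x = ("sunny", "no"), the score for "play" is (4/6)*(2/4)*(4/4) = 1/3, while the score for "stay" is 0 because no "stay" row has windy = "no", so the model predicts "play".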

