
Naïve Bayes from Concept to Completion

Apr 7, 2021 · 5 min read

Understanding Naive Bayes in a nutshell

If we keep it simple 😜: machine learning models digest numeric data very easily, so one may wonder what happens with text data. How does a model train on text data? A widely known field of Artificial Intelligence called Natural Language Processing (NLP) has its origin in that very question.

In this article we will try to understand the Naive Bayes algorithm end to end and implement it in a machine learning model.

This article has three phases:
* Bayes theorem
* Understanding Naive Bayes algorithm
* Implementation of Naive Bayes in a machine learning model

Without further ado, let's start this journey.

Bayes Theorem

Before we try to understand Naive Bayes we need to touch on Bayes' theorem. Around two and a half centuries ago, Reverend Thomas Bayes described the probability of an event occurring given knowledge of a prior event; this concept is called conditional probability.

Suppose we have an event A and an event B. Then:

P(A|B) = P(B|A) × P(A) / P(B)

So Bayes' theorem states that the probability of A, given that B has already occurred, equals the probability of B given that A has already occurred, multiplied by the probability of A, all divided by the probability of B.
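To make the formula concrete, here is a tiny Python calculation. The 20%/60%/25% figures below are purely made up for illustration, not taken from any dataset:

```python
# Toy Bayes-rule calculation with made-up numbers (illustration only).
# Suppose 20% of emails are spam (P(A)), the word "free" appears in 60%
# of spam emails (P(B|A)), and "free" appears in 25% of all emails (P(B)).
p_spam = 0.20              # P(A): prior probability of spam
p_free_given_spam = 0.60   # P(B|A): likelihood of "free" given spam
p_free = 0.25              # P(B): overall probability of "free"

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(p_spam_given_free)   # 0.48 -> 48% chance an email with "free" is spam
```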

Understanding Naive Bayes algorithm

Now let's see how Bayes' theorem is used to derive the Naive Bayes algorithm that will power our machine learning model.

From the machine learning perspective, if we talk about events, our independent events are all the feature columns except the output label (i.e. f1, f2, f3, …, fn), and the dependent event is the output feature (i.e. y). Substituting into Bayes' theorem with event A ~ y and event B ~ f1, f2, f3, …, fn yields:

P(y|f1, f2, …, fn) = P(f1, f2, …, fn|y) × P(y) / P(f1, f2, …, fn)

The "naive" part is the assumption that the features are independent given the class, which lets us factor the likelihood into a simple product:

P(y|f1, f2, …, fn) ∝ P(y) × P(f1|y) × P(f2|y) × … × P(fn|y)

🤦‍♂️ what does it mean bruh…?
Speaking in machine-learning-prediction language: the predicted outcome for an instance is the class y that maximizes (argmax) the product of the prior P(y) and the probabilities of the independent features in that instance:

ŷ = argmax over y of P(y) × ∏ P(fi|y)
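As a minimal sketch of this decision rule (the `priors` and `likelihoods` dictionaries here are hypothetical structures for illustration, not part of any library), note that we sum log-probabilities rather than multiplying raw probabilities, to avoid numeric underflow:

```python
import math

def predict(priors, likelihoods, features):
    """Naive Bayes decision rule: pick the class y maximizing
    P(y) * prod_i P(f_i | y), computed in log-space to avoid underflow.
    `priors` maps class -> P(y); `likelihoods` maps (feature, class) -> P(f|y).
    Both structures are hypothetical, for illustration only."""
    best_class, best_score = None, -math.inf
    for y, p_y in priors.items():
        score = math.log(p_y) + sum(math.log(likelihoods[(f, y)]) for f in features)
        if score > best_score:
            best_class, best_score = y, score
    return best_class
```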

Diving in… how does this work in the real world?

Suppose we have text data and we want to train a model on it to predict future outputs. So Naive Bayes…

Task 1: Converts the whole dataset into a frequency table, which tells us the frequency of each word.
Task 2: Creates a likelihood table to find each feature's probability.
Task 3: Finally, the outcome for an instance is the class with the highest probability among all values of the dependent feature.
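Here is a toy sketch of Tasks 1 and 2 on a made-up four-email corpus (the documents and labels are invented for illustration; add-one smoothing is a common choice, not something prescribed above):

```python
from collections import Counter

# Toy corpus (hypothetical), labeled spam (1) / not spam (0).
docs = [("win free money", 1), ("meeting at noon", 0),
        ("free lunch meeting", 0), ("win big win", 1)]

# Task 1: frequency table - word counts per class.
freq = {0: Counter(), 1: Counter()}
for text, label in docs:
    freq[label].update(text.split())

# Task 2: likelihood table - P(word | class) with Laplace (add-one) smoothing.
vocab = {w for text, _ in docs for w in text.split()}
likelihood = {
    (w, c): (freq[c][w] + 1) / (sum(freq[c].values()) + len(vocab))
    for w in vocab for c in (0, 1)
}
print(likelihood[("win", 1)], likelihood[("win", 0)])  # "win" is likelier under spam
```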

Implementation of Naive Bayes in a machine learning model


Having understood the Naive Bayes algorithm, let's see the implementation of the same.

Problem Statement: On a daily basis we get many emails, and many of them are spam. If we can develop a model that tells us whether an email is spam or not, we can ignore or delete spam automatically.

So we have data containing the email content and a label telling us whether each email is spam or not.

0-not spam and 1-spam
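A loading sketch might look like the following. The file name `emails.csv` and the column names `text` and `label` are assumptions about the dataset, so adjust them to whatever your data actually uses:

```python
import pandas as pd

# Hypothetical file and column names - adjust to your dataset.
df = pd.read_csv("emails.csv")  # columns: "text" (email body), "label" (0/1)
print(df.head())
```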

Let's check whether we have enough data to train our model.

So we know that the dataset is imbalanced, but it's not that bad: with roughly 150 words per email across 400 emails, we have about 60K words to train our model. So we are in decent shape.
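A quick way to inspect the class balance (again assuming the hypothetical `label` column from above):

```python
# Class balance check; column name "label" is an assumption.
print(df["label"].value_counts())                 # raw counts per class
print(df["label"].value_counts(normalize=True))   # proportions per class
```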

Now let's think about how we can get a numeric representation of the email content. Here the "Bag of Words" concept comes in: we count each word's occurrences in the email. scikit-learn already provides CountVectorizer for this, which handles Task 1 on our list. Alternatively, we can use the TF-IDF mechanism to weight the tokens.
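A sketch of this vectorization step, continuing with the assumed `text`/`label` columns (the 80/20 split and fixed random seed are conventional choices, not requirements):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hold out a test set; column names "text"/"label" are assumptions as above.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

# Bag of Words: each email becomes a vector of word counts (Task 1).
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_text)  # learn vocabulary on train only
X_test = vectorizer.transform(X_test_text)        # reuse the same vocabulary
```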

Let's train our model

To train our model we have created training data that is already vectorized using CountVectorizer. Now we will train our model using the Naive Bayes algorithm.
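A minimal training sketch, continuing from the vectorized data above. MultinomialNB is scikit-learn's Naive Bayes variant suited to word-count features:

```python
from sklearn.naive_bayes import MultinomialNB

# MultinomialNB fits the word-count features produced by CountVectorizer.
model = MultinomialNB()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```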

Moving further, if we check our accuracy we get approx. 98% ✌
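The accuracy can be checked on the held-out test set like so; the ~98% figure above is from this article's run, and your exact number will vary with the data and the split:

```python
from sklearn.metrics import accuracy_score

# Fraction of test emails classified correctly.
print(accuracy_score(y_test, predictions))
```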

Winding-up: We have seen how Naive Bayes helps us train a model in real scenarios. A key point to remember before using Naive Bayes is that the features must be independent, because correlated features hurt its performance. The good thing about the Naive Bayes algorithm is that it works well with large amounts of data, doesn't require feature scaling, and copes with null values too.
