Bayes' theorem and Naive Bayes are foundational techniques in machine learning. In this series I am going to dig into Naive Bayes and put it into practice, using Python to detect email spam. I am splitting the material into several posts because it covers both theory and practice.
List of posts
We are going to cover these points:
- Preparation and introduction (this post)
- Naive Bayes by example
- Scrubbing natural language text
- The Naive Bayes classifier
- Writing Naive Bayes from scratch
- Using the Scikit-learn library
1. Download the training data
We are going to use a sample dataset as our training data. You can download it from here.
After you download the package file, uncompress it. It contains two folders, one called
spam and the second is
We will assume in the next posts that you uncompressed the file under the folder called
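Once the archive is uncompressed, reading the two folders into labeled samples could look roughly like the sketch below. It is only an illustration, not the code used later in this series: the function name load_labeled_emails, the 1/0 labels, and the UTF-8 decoding are assumptions of mine, and the folder paths are deliberately left as arguments so that you pass in wherever you uncompressed the data.

```python
from pathlib import Path


def load_labeled_emails(folders):
    """Read every file in each folder and pair its text with that folder's label.

    `folders` maps a directory path to a label of your choosing
    (hypothetical convention: 1 for spam, 0 for everything else).
    """
    samples = []
    for folder, label in folders.items():
        # Sort so the order of samples is the same on every run.
        for path in sorted(Path(folder).iterdir()):
            if path.is_file():
                # Emails often contain stray bytes; replace them instead of crashing.
                text = path.read_text(encoding="utf-8", errors="replace")
                samples.append((text, label))
    return samples
```

Calling it with a dictionary that maps your two folder paths to 1 and 0 returns a list of (text, label) pairs, which is a convenient shape for the cleaning and training steps in the later posts.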
2. Download the required packages
We are going to use Python, and more specifically the Natural Language Toolkit (NLTK), to clean the data.
Make sure the nltk library is installed (for example with pip install nltk), then download the accessory data that will help our work:
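import nltk

# Corpus of common English first names
nltk.download('names')
# WordNet lexical database (needed by NLTK's WordNet lemmatizer)
nltk.download('wordnet')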
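Before running the download calls, it can help to confirm that nltk is actually importable in the Python environment you are using. The following is a minimal sketch that relies only on the standard library; the helper name is_installed is hypothetical and not part of NLTK.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the import system."""
    # find_spec returns None when no top-level module with this name is found.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("nltk"):
        print("nltk is available")
    else:
        print("nltk is missing; install it first, e.g. with: pip install nltk")
```

The check only locates the package and does not import it, so it stays cheap and has no side effects.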