Linear Kernel: Why is it recommended for text classification?

The Support Vector Machine can be viewed as a kernel machine. As a result, you can change its behavior by using a different kernel function.

The most popular kernel functions are:

  • the linear kernel
  • the polynomial kernel
  • the RBF (Gaussian) kernel
  • the string kernel
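
To make the first three kernels concrete, here is a minimal sketch of them as plain C# functions over dense vectors. The class name `Kernels` is hypothetical, and the parameter names `gamma`, `coef0` and `degree` are borrowed from LIBSVM's conventions as an assumption. The string kernel is left out because it operates on character sequences rather than numeric vectors.

```csharp
using System;

static class Kernels
{
    // Linear kernel: the plain dot product <x, z>.
    public static double Linear(double[] x, double[] z)
    {
        if (x.Length != z.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++) sum += x[i] * z[i];
        return sum;
    }

    // Polynomial kernel: (gamma * <x, z> + coef0)^degree.
    public static double Polynomial(double[] x, double[] z,
                                    double gamma, double coef0, int degree)
    {
        return Math.Pow(gamma * Linear(x, z) + coef0, degree);
    }

    // RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2).
    public static double Rbf(double[] x, double[] z, double gamma)
    {
        if (x.Length != z.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double squaredDistance = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            double d = x[i] - z[i];
            squaredDistance += d * d;
        }
        return Math.Exp(-gamma * squaredDistance);
    }
}

static class KernelDemo
{
    static void Main()
    {
        double[] x = { 1.0, 2.0, 3.0 };
        double[] z = { 4.0, 5.0, 6.0 };
        Console.WriteLine(Kernels.Linear(x, z));                  // 1*4 + 2*5 + 3*6 = 32
        Console.WriteLine(Kernels.Polynomial(x, z, 1.0, 1.0, 2)); // (32 + 1)^2 = 1089
        Console.WriteLine(Kernels.Rbf(x, x, 0.5));                // identical vectors give 1
    }
}
```

The linear kernel has no parameter of its own to tune, which is part of what makes it the simplest of the three.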

The linear kernel is often recommended for text classification.

It is interesting to note that:

The original optimal hyperplane algorithm proposed by Vapnik in 1963 was a linear classifier [1].

It was only about 30 years later that the kernel trick was introduced.

If it is the simpler algorithm, why is the linear kernel recommended for text classification?

Text is often linearly separable

Most text classification problems are linearly separable [2].

Linear kernel works well with linearly separable data
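
As a small illustration of what "linearly separable" means for text, the sketch below turns four made-up documents into bag-of-words vectors and looks for a separating hyperplane. It deliberately uses a perceptron rather than an SVM, because the perceptron is the shortest algorithm that stops exactly when the data is separated by a hyperplane. The corpus, the labels and every name in the code are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinearSeparabilityDemo
{
    // Collect the distinct words of the corpus, in order of first appearance.
    public static List<string> BuildVocabulary(string[] docs)
    {
        return docs.SelectMany(d => d.Split(' ')).Distinct().ToList();
    }

    // Bag of words: one dimension per vocabulary word, holding its count.
    public static double[] ToVector(string doc, List<string> vocabulary)
    {
        var v = new double[vocabulary.Count];
        foreach (var word in doc.Split(' '))
        {
            int index = vocabulary.IndexOf(word);
            if (index >= 0) v[index] += 1.0; // words outside the vocabulary are ignored
        }
        return v;
    }

    // Linear decision function: w . x + b. Its sign is the predicted class.
    public static double Score(double[] w, double b, double[] x)
    {
        double s = b;
        for (int i = 0; i < x.Length; i++) s += w[i] * x[i];
        return s;
    }

    // Perceptron training. Returns true as soon as a full pass over the data
    // makes no mistake, i.e. the hyperplane (w, b) separates the two classes.
    public static bool Train(double[][] xs, int[] ys, double[] w, ref double b, int maxEpochs)
    {
        for (int epoch = 0; epoch < maxEpochs; epoch++)
        {
            int mistakes = 0;
            for (int n = 0; n < xs.Length; n++)
            {
                if (ys[n] * Score(w, b, xs[n]) <= 0)
                {
                    // Misclassified: move the hyperplane towards the example.
                    for (int i = 0; i < w.Length; i++) w[i] += ys[n] * xs[n][i];
                    b += ys[n];
                    mistakes++;
                }
            }
            if (mistakes == 0) return true;
        }
        return false;
    }

    static void Main()
    {
        string[] docs =
        {
            "cheap pills buy now", "buy cheap watches",   // class +1
            "meeting at noon", "project meeting notes"    // class -1
        };
        int[] labels = { 1, 1, -1, -1 };

        var vocabulary = BuildVocabulary(docs);
        var xs = docs.Select(d => ToVector(d, vocabulary)).ToArray();
        var w = new double[vocabulary.Count];
        double b = 0.0;

        bool separated = Train(xs, labels, w, ref b, 100);
        Console.WriteLine("Separated by a hyperplane: " + separated);
    }
}
```

Text vectors have one dimension per vocabulary word, so even a tiny corpus lives in a high-dimensional space. With that many dimensions a separating hyperplane is often easy to find, and a more complex kernel has little to add.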

How to classify text using SVM in C#

SVM Tutorial: Classify text in C#

In this tutorial I will show you how to classify text with SVM in C#.

The main steps to classify text in C# are:

  1. Create a new project
  2. Install the SVM package with NuGet
  3. Prepare the data
  4. Read the data
  5. Generate a problem
  6. Train the model
  7. Predict
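
Steps 3 to 5 come down to turning raw text into labeled numeric vectors. The sketch below shows one way this could look without any library: it builds a vocabulary and writes each document as a line in the sparse text format used by LIBSVM ("label index:value", with 1-based indices in ascending order). It does not use the libsvm.net API, and the class name, method names and sample sentences are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class ProblemSketch
{
    // Lowercase the text and split it on spaces and basic punctuation.
    public static string[] Tokenize(string text)
    {
        return text.ToLowerInvariant().Split(
            new[] { ' ', ',', '.', '!', '?' },
            StringSplitOptions.RemoveEmptyEntries);
    }

    // Distinct words of the training texts, in order of first appearance.
    public static List<string> BuildVocabulary(IEnumerable<string> texts)
    {
        return texts.SelectMany(t => Tokenize(t)).Distinct().ToList();
    }

    // One line in LIBSVM's sparse text format: "<label> <index>:<value> ...".
    // Feature indices are 1-based and ascending; zero counts are omitted.
    public static string ToLibSvmLine(int label, string text, List<string> vocabulary)
    {
        var counts = new SortedDictionary<int, int>();
        foreach (var word in Tokenize(text))
        {
            int index = vocabulary.IndexOf(word);
            if (index < 0) continue; // words unseen during training are skipped
            int current;
            counts.TryGetValue(index + 1, out current);
            counts[index + 1] = current + 1;
        }

        var parts = new List<string> { label.ToString(CultureInfo.InvariantCulture) };
        parts.AddRange(counts.Select(kv =>
            kv.Key.ToString(CultureInfo.InvariantCulture) + ":" +
            kv.Value.ToString(CultureInfo.InvariantCulture)));
        return string.Join(" ", parts);
    }

    static void Main()
    {
        string[] texts = { "Buy cheap pills now!", "Meeting at noon." };
        int[] labels = { 1, -1 };

        var vocabulary = BuildVocabulary(texts);
        for (int i = 0; i < texts.Length; i++)
        {
            Console.WriteLine(ToLibSvmLine(labels[i], texts[i], vocabulary));
        }
        // Prints:
        // 1 1:1 2:1 3:1 4:1
        // -1 5:1 6:1 7:1
    }
}
```

The vocabulary is built once from the training texts and reused for prediction, so a given word always maps to the same feature index.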

Step 1: Create the Project

Create a new Console application.

Step 2: Install the SVM package with NuGet

In the Solution Explorer, right-click on "References" and click on "Manage NuGet Packages..."

Select "Online" and in the search box type "SVM".

You should now see the libsvm.net package. Click on Install, and that's it!

There are several libsvm implementations in C#. We will use libsvm.net because it is the most up to date and can be installed easily via NuGet.
