How to classify text using SVM in C#

SVM Tutorial: Classify text in C#

In this tutorial I will show you how to classify text with SVM in C#.

The main steps to classify text in C# are:

  1. Create a new project
  2. Install the SVM package with Nuget
  3. Prepare the data
  4. Read the data
  5. Generate a problem
  6. Train the model
  7. Predict

Step 1: Create the Project

Create a new Console application.


Step 2: Install the SVM package with NuGet

In Solution Explorer, right-click "References" and click "Manage NuGet Packages..."


Select "Online" and in the search box type "SVM".


You should now see the libsvm.net package. Click Install, and that's it!

There are several libsvm implementations in C#. We will use libsvm.net because it is the most up to date and it is easy to install via NuGet.

Step 3: Prepare the data

Every time you want to classify text, you will need to prepare your data. As this is a language-agnostic process, I created a separate page for it: How to prepare your data for text classification? Check it out before reading the rest of this SVM tutorial!

Step 4: Read the data

The training data is saved as a CSV file with two columns: Text (the sentence) and IsSunny (its class).
It can easily be read in C#.
To do this we will use another NuGet package called CsvReader.

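            // Uses the CsvReader package plus System.Linq and System.Globalization
            const string dataFilePath = @"D:\sunnyData.csv";
            var dataTable = DataTable.New.ReadCsv(dataFilePath);

            // x: the raw sentences, y: their classes (-1 = rainy, +1 = sunny)
            List<string> x = dataTable.Rows.Select(row => row["Text"]).ToList();
            double[] y = dataTable.Rows.Select(row => double.Parse(row["IsSunny"], CultureInfo.InvariantCulture))
                                       .ToArray();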

We have loaded all the sentences into the x variable, and their classes (-1 or +1) into the y variable.
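If you would rather not add a second package for this step, a minimal sketch using only the .NET base class library could look like the following. It assumes a header row `Text,IsSunny` and one sentence per line. `ParseLabeledLines` is a hypothetical helper, not part of CsvReader or libsvm.net, and it does not handle quoted CSV fields.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: parses "Text,IsSunny" lines into sentences (x) and labels (y).
// Splitting on the LAST comma lets the text contain commas, but quoted fields are not handled.
static (List<string> X, double[] Y) ParseLabeledLines(IEnumerable<string> lines)
{
    var x = new List<string>();
    var y = new List<double>();
    foreach (var line in lines.Skip(1)) // skip the header row
    {
        if (string.IsNullOrWhiteSpace(line)) continue;
        int comma = line.LastIndexOf(',');
        if (comma < 0) throw new FormatException($"Missing label in line: {line}");
        x.Add(line.Substring(0, comma));
        y.Add(double.Parse(line.Substring(comma + 1), CultureInfo.InvariantCulture));
    }
    return (x, y.ToArray());
}

// In the tutorial the input would be File.ReadLines(@"D:\sunnyData.csv");
// a small in-memory sample keeps this example self-contained.
var sample = new[] { "Text,IsSunny", "It is sunny today,1", "It is raining again,-1" };
var (sentences, labels) = ParseLabeledLines(sample);
Console.WriteLine($"{sentences.Count} sentences, labels: {string.Join(" ", labels)}");
```

For real-world CSV files with quoted fields or embedded line breaks, a dedicated CSV package such as the one used above is the safer choice.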

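The following code generates the vocabulary, i.e. the sorted list of distinct words found in the sentences. GetWords is a helper that splits a sentence into words; its implementation is not shown in this tutorial.

            // Every distinct word of every sentence, in alphabetical order
            var vocabulary = x.SelectMany(GetWords).Distinct().OrderBy(word => word).ToList();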
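Since GetWords is not shown, here is a minimal sketch of what it could look like, assuming a simple tokenizer that lower-cases the sentence and keeps runs of letters and digits. This `GetWords` is hypothetical; the tutorial's own version may differ (for example by removing stop words or punctuation differently).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical GetWords: lower-case the sentence and keep runs of letters/digits.
static IEnumerable<string> GetWords(string sentence) =>
    Regex.Matches(sentence.ToLowerInvariant(), @"[\p{L}\p{N}]+")
         .Cast<Match>()
         .Select(m => m.Value);

var x = new List<string> { "It is sunny today", "Is it raining? It is." };

// Same pipeline as the tutorial: all words, duplicates removed, sorted.
// StringComparer.Ordinal makes the order independent of the current culture.
var vocabulary = x.SelectMany(GetWords)
                  .Distinct()
                  .OrderBy(word => word, StringComparer.Ordinal)
                  .ToList();

Console.WriteLine(string.Join(", ", vocabulary)); // is, it, raining, sunny, today
```

Lower-casing matters here: without it, "It" and "it" would become two separate vocabulary entries.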

Step 5: Generate a problem

Using the data, we can now generate an svm_problem.

This is an in-memory representation of the document-term matrix.

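            // Turns the sentences and labels into an svm_problem:
            // one sparse row of the document-term matrix per sentence
            var problemBuilder = new TextClassificationProblemBuilder();
            var problem = problemBuilder.CreateProblem(x, y, vocabulary);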
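Conceptually, each sentence becomes one sparse row of the document-term matrix: for every vocabulary word that occurs in the sentence, a pair made of the word's index and its count. That is the information an svm_node carries (an index and a value). The sketch below uses plain tuples instead of svm_node so it runs without libsvm.net. `Tokenize` and `ToSparseRow` are hypothetical helpers and may not match what TextClassificationProblemBuilder actually does (it could, for instance, use binary values instead of counts).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tokenizer for this sketch: lower-case and split on spaces and punctuation.
static IEnumerable<string> Tokenize(string sentence) =>
    sentence.ToLowerInvariant()
            .Split(new[] { ' ', ',', '.', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);

// Hypothetical helper: turns a sentence into a sparse document-term row.
// Each entry is (Index, Value) = (1-based vocabulary position, word count).
// Words that are not in the vocabulary are ignored.
static (int Index, double Value)[] ToSparseRow(string sentence, IReadOnlyList<string> vocab)
{
    // In real code, build this lookup once instead of once per sentence.
    var position = new Dictionary<string, int>();
    for (int i = 0; i < vocab.Count; i++)
        position[vocab[i]] = i + 1; // libsvm data files use 1-based feature indices

    return Tokenize(sentence)
        .Where(position.ContainsKey)
        .GroupBy(word => position[word])
        .OrderBy(g => g.Key) // libsvm expects indices in ascending order
        .Select(g => (g.Key, (double)g.Count()))
        .ToArray();
}

var vocabulary = new List<string> { "is", "it", "raining", "sunny", "today" };
var row = ToSparseRow("It is sunny, it is!", vocabulary);
Console.WriteLine(string.Join(" ", row.Select(n => $"{n.Index}:{n.Value}"))); // 1:2 2:2 4:1
```

Only the non-zero entries are stored, which is why this representation stays small even when the vocabulary contains thousands of words.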

Step 6: Create and train an SVM model

            // C is the penalty parameter: higher values penalize training errors more heavily
            const int C = 1;
            var model = new C_SVC(problem, KernelHelper.LinearKernel(), C);

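The C_SVC constructor immediately calls the Train() method, so the model is ready to use as soon as the object is created.
We use a linear kernel because linear kernels generally work well with text data, where each document is a high-dimensional, sparse vector of word counts.
The value of C is fixed for now, but it should be tuned (for example with cross-validation) to get better results.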
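One common way to tune C, recommended in the libsvm authors' practical guide, is a grid search over exponentially growing values (for example 2^-5, 2^-3, ..., 2^15), keeping the value with the best cross-validation accuracy. The sketch below shows only the search loop. `CandidateCs` and `GridSearch` are hypothetical helpers, and `evaluate` is a placeholder you would implement by training a C_SVC on each fold; the scoring function in the usage line is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Exponentially spaced candidates: 2^-5, 2^-3, ..., 2^15 (11 values).
static IEnumerable<double> CandidateCs() =>
    Enumerable.Range(0, 11).Select(k => Math.Pow(2, -5 + 2 * k));

// Returns the candidate with the highest score. `evaluate` is a placeholder:
// in practice it would run k-fold cross-validation of a C_SVC trained with that C.
static (double BestC, double BestScore) GridSearch(IEnumerable<double> candidates, Func<double, double> evaluate)
{
    double bestC = double.NaN, bestScore = double.NegativeInfinity;
    foreach (var c in candidates)
    {
        double score = evaluate(c);
        if (score > bestScore) { bestC = c; bestScore = score; }
    }
    return (bestC, bestScore);
}

// Made-up scoring function for illustration only: it peaks at C = 2 (2^1).
var (chosenC, _) = GridSearch(CandidateCs(), candidate => 10 - Math.Abs(Math.Log(candidate, 2) - 1));
Console.WriteLine($"Best C = {chosenC}"); // Best C = 2
```

A coarse grid like this is usually followed by a finer search around the best value.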

Step 7: Predict

Once the model is trained, it can be used to make predictions. The main method for that is Predict, which takes an array of svm_node as a parameter and returns the predicted class.

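            // Map the numeric class labels back to human-readable names
            var predictionDictionary = new Dictionary<int, string> { { -1, "Rainy" }, { 1, "Sunny" } };
            while (true)
            {
                string userInput = Console.ReadLine();
                // Stop at end of input or when the user types "quit" (without predicting it)
                if (userInput == null || userInput == "quit")
                    break;

                // Convert the sentence into the same sparse representation used for training
                var newX = TextClassificationProblemBuilder.CreateNode(userInput, vocabulary);

                // Predict returns the class label as a double (-1 or +1 here)
                var predictedY = model.Predict(newX);
                Console.WriteLine("The prediction is {0}", predictionDictionary[(int)predictedY]);
                Console.WriteLine(new string('=', 50));
            }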
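For a two-class model, libsvm computes a decision value of the form sum over support vectors of (alpha_i * y_i) * K(x_i, x), minus a bias rho. With a linear kernel this collapses to w · x - rho, where w = sum of alpha_i * y_i * x_i. The sketch below evaluates that value on sparse vectors. The weights and rho are made-up numbers, and treating a positive value as class +1 is an assumption of this sketch (the actual sign-to-label mapping depends on libsvm's internal label order). `SparseDot` and `PredictLinear` are hypothetical helpers, not libsvm.net API.

```csharp
using System;

// Dot product of two sparse vectors stored as (Index, Value) pairs sorted by Index.
static double SparseDot((int Index, double Value)[] a, (int Index, double Value)[] b)
{
    double sum = 0;
    int i = 0, j = 0;
    while (i < a.Length && j < b.Length)
    {
        if (a[i].Index == b[j].Index) { sum += a[i].Value * b[j].Value; i++; j++; }
        else if (a[i].Index < b[j].Index) i++;
        else j++;
    }
    return sum;
}

// Linear SVM decision: f(x) = w . x - rho. This sketch assumes f(x) > 0 means class +1.
static int PredictLinear((int Index, double Value)[] w, double rho, (int Index, double Value)[] x) =>
    SparseDot(w, x) - rho > 0 ? 1 : -1;

// Made-up weights for a 5-word vocabulary: 1 = "is", 2 = "it", 3 = "raining", 4 = "sunny", 5 = "today".
var weights = new (int Index, double Value)[] { (3, -1.5), (4, 2.0) };
double bias = 0.1;

var sunny = new (int Index, double Value)[] { (1, 1), (2, 1), (4, 1) }; // "it is sunny"
var rainy = new (int Index, double Value)[] { (1, 1), (2, 1), (3, 1) }; // "it is raining"

Console.WriteLine(PredictLinear(weights, bias, sunny)); // 1
Console.WriteLine(PredictLinear(weights, bias, rainy)); // -1
```

This is also why linear models are cheap at prediction time: only the words present in the new sentence contribute to the dot product.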

Summary of this SVM Tutorial

Congratulations! You have trained an SVM model and used it to make predictions on unknown data.

If you are interested in learning how to classify text with other languages, you can read:

51 thoughts on “How to classify text using SVM in C#”

  1. What all directives and assembly reference do we need to add so as to run it in Visual Studios 2013 using c#. I am very much new to machine learning and visual studios as well.

  2. I get an exception when I try to run your solution downloaded from github:

    The type initializer for 'libsvm.svm' threw an exception.

  3. Hello Alexander,

    first of all, thank you very much - there is not really much of information available about SVM in C#. I got one question - in your code, you're always training your svm at the beginning. Is it also possible to store the training and just load it before doing a prediction? I want to design an email-classificator and I dont want to train the svm before each classification and I don't think, it's necessary.

    Thank you in advance,
    Jan

  4. Thanks for the great tutorial. Just one question on the text preparation: In your page you use a matrix that is based on the words + number of times the word appears. Is there a way to use SVM in scenarios where the word order is important? Could I use a matrix that always uses a count of 1, but has the same word appear multiple times within the matrix?
    thx

  5. Hi Alexandre,
    I was wondering if there are 10 classes, would you label each sample's label 1 to 10?, or do you need to create 10 models with the one-to-all approach?
    Thanks very much
    Ben

