How to classify text using SVM in C#

SVM Tutorial : Classify text in C#

In this tutorial I will show you how to classify text with SVM in C#.

The main steps to classify text in C# are:

  1. Create a new project
  2. Install the SVM package with Nuget
  3. Prepare the data
  4. Read the data
  5. Generate a problem
  6. Train the model
  7. Predict

Step 1: Create the Project

Create a new Console application.

SVM Tutorial Csharp

Step 2: Install the SVM package with NuGet

In the solution explorer, right click on "References" and click on "Manage NuGet Packages..."

svm tutorial csharp

Select "Online" and in the search box type "SVM".

svm tutorial csharp 3

You should now see the libsvm.net package. Click on Install, and that's it !

There are several libsvm implementations in C#. We will use libsvm.net because it is the more up to date and it is easily downloadable via NuGet.

Step 3: Prepare the data

Every time you want to classify text, you will need to prepare your data. As this is a language agnostic process I created a different page for it :   How to prepare your data for text classification ?   Check it out before reading the remaining of this svm tutorial !

Step 4: Read the data

The document-term matrix is saved as a CSV file.
It can easily be read in C#.
To do this we will use another Nuget package called CsvReader.

            const string dataFilePath = @"D:\sunnyData.csv";
            var dataTable = DataTable.New.ReadCsv(dataFilePath);
            List<string> x = dataTable.Rows.Select(row => row["Text"]).ToList();
            double[] y = dataTable.Rows.Select(row => double.Parse(row["IsSunny"]))
                                       .ToArray();

We have loaded all the sentences in the x variable, and all the class (-1 or +1) in the y variable.

The following code generate the vocabulary:

var vocabulary = x.SelectMany(GetWords).Distinct().OrderBy(word => word).ToList();

Step 5: Generate a problem

Using the data, we are now able to generate a svm_problem.

This is an in-memory representation of the document-term matrix.

            var problemBuilder = new TextClassificationProblemBuilder();
            var problem = problemBuilder.CreateProblem(x, y, vocabulary.ToList());

 Step 6: Create and train a SVM model

const int C = 1;
var model = new C_SVC(problem, KernelHelper.LinearKernel(), C);

When the C_SVC object constructor is called, it immediately calls the Train() method.
We use a linear kernel because they are particularly good with textual data.
The C value is constant for now, but should be optimized for better results.

Step 7: Predict

Once the model is trained, it can be used to make predictions. The main method for that is the Predict method which takes an array of svm_node as a parameter.

            string userInput;
            _predictionDictionary = new Dictionary<int, string> { { -1, "Rainy" }, { 1, "Sunny" } };
            do
            {
                userInput = Console.ReadLine();
                var newX = TextClassificationProblemBuilder.CreateNode(userInput, vocabulary);

                var predictedY = model.Predict(newX);
                Console.WriteLine("The prediction is {0}", _predictionDictionary[(int)predictedY]);
                Console.WriteLine(new string('=', 50));
            } while (userInput != "quit");

Summary of this SVM Tutorial

Congratulations ! You have trained a SVM model and used it to make prediction on unknown data.

If you are interested by learning how to classify text with other languages you can read:

I am passionate about machine learning and Support Vector Machine. When I am not writing this blog, you can find me on Kaggle participating in some competition.

32 thoughts on “How to classify text using SVM in C#

  1. Marcelo Calbucci

    Hi Alexandre, I'm quite inexperienced with text classifiers and I'm looking for something super simple so I can pass a set of text documents (all belong to the same subject matter) to train the system and then pass another text document to get a a probability that it belongs to the same subject matter. It doesn't seem that this is what SVM can do very easily. Do you know a simple solution for that? (preferably C#/.NET). thank you.

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello Marcelo. You can use SVM to predict probabilities. In libsvm.net you just need to give the value 1 to the probabilities parameter of the C_SVC class, then you will be able to call the PredictProbabilities method.

      Reply
      1. jim

        It would be cool to have a simple example project where you divide documents into categories like "spam" and "no spam". Like Marcelo Calbucci I am inexpierienced too with libsvm. Especially the data preparation for libsvm looks complicated to me.

        Reply
      2. akhilesh juyal

        Hello Alexandre KOWALCZYK sir,I am a student and I found your blog quite useful.
        I am currently working on a project where we need to find the missing letter from a given string,example-->th*n as then.Can you help me ,how svm can solve this problem.
        Thankyou.

        Reply
        1. Alexandre KOWALCZYK Post author

          If for each word with a missing letter you assign a class number corresponding to the missing letter (1 to 26) then it is a multi-class classification problem. Then you use SVM in the one-vs-all approach to predict which is the missing letter.

          Reply
      3. Ibrar

        Hi Alexandre KOWALCZYK

        I have read the whole article and it was very useful to me as i am going to work on project for detection of languages using SVM algorithm.

        In this article only one thing i want to know is that: What would be the User Input at the end. I want to know about the format of input

        Reply
          1. Alexandre KOWALCZYK Post author

            I am not sure to understand the question. Are you asking if you should keep the punctuation?

  2. Alexandre BRUN

    Hi Alexandre, first thank you for the tutorial.

    Can we find the different class/methods like "TextClassificationProblemBuilder" or "GetWord" somewhere or do we have to create them ?

    I'm a beginner in C# and that's probably gonna take me an entire day to sort that out 😀

    Thanks.

    Reply
    1. Alexandre KOWALCZYK Post author

      Thanks for your comment Alexandre. I just found that I did not share the source code for this article. You can find it on GitHub. Note: Be sure to change the path in the Program.cs file to use your directory structure. I hope this will help you getting started.

      Reply
    1. alexandrekow

      No I don't know a tutorial for multi-class classification. There is not much to say about it as it is usually performed using the one-vs-all approach. There is an option in libsvm to indicate that you want to perform multi class classification.

      Reply
      1. BRI

        Alex, how can I use it for testing numeric classification, you have shown here text classification. My data is like 0.196078431 and 0.039215686, I need to classify in which category it falls.

        Reply
        1. Alexandre KOWALCZYK Post author

          Well it is even simpler than with text. You don't need to transform the text into numbers. Just put them in your x vector.

          Reply
    1. Alexandre KOWALCZYK Post author

      Well I am afraid is question is too broad. I would probably first research papers on the internet to see if someone did that before. Then I would probably try using Convolutional Neural Networks as it is currently the state of the art for image classification. You have to pick the best tool for the task at hand, and I would not recommend SVM for this one. If you are forced to stick with C# I would say take a look at the Accord.Net library.

      Reply
      1. Rajni Jain

        Hi Alexandre,
        My .csv file contains more than one column (you have only column here as 'text' in your csv file), so in my case the 'List x' will have all the columns except the class column. Could you please give any hint how do i achieve this?.

        Thanks,
        Rajni

        Reply
        1. Alexandre KOWALCZYK Post author

          Hello Rajni. It does not really matter how many column you have. At the end keep in mind that the text column is transformed into a vector X. So if you have several columns you just have to find a way to transform them in a vector X which describes the problem at hand.
          Regards,
          Alexandre

          Reply
  3. rogernazir

    Hey I have XML file for gender detection that trained on SVM. I want to run this XML file do i need to make a code of SVM and after give path of that XML in SVM code?

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello. You would need to extract the data from your xml file to create a x and y vector then call the CreateProblem method as in the article. To do SVM with C# nowadays I recommend Accord.Net, they have a lot of documentation that should help you.

      Reply
  4. S Apoorva Nandini

    I actually want to train an SVM for Gesture recognition .. So my Data set will be a matrix of numbers and the no. of classes will be more than 2. Can I use the above described procedure to perform it?

    Reply
  5. abdallah

    hi alexander
    i need simple implemention for SVM if you have
    to use it in my GP in data mining
    thnx 🙂

    Reply
  6. Anish Deva

    Hi Alexander
    The tutorial was very helpful.
    I'd like to know if this works for other languages like arabic, tamil.
    Thankyou

    Reply
  7. Shani

    Please suggest me an article from which i can understand SVM for Sentiment Analysis from Start to End.... Kindly

    Reply

Leave a Reply