Support Vector Regression with R

In this article I will show how to use R to perform a Support Vector Regression.
We will first do a simple linear regression, then move to the Support Vector Regression so that you can see how the two behave with the same data.

A simple data set

To begin with we will use this simple data set:

A simple data set in Excel

I just put some data in Excel. I prefer that over using an existing well-known data set because the purpose of the article is not the data, but the models we will use.

As you can see there seems to be some kind of relation between our two variables X and Y, and it looks like we could fit a line which would pass near each point.

Let's do that in R !

Step 1: Simple linear regression in R

Here is the same data in CSV format, I saved it in a file regression.csv :

A simple data set in CSV

We can now use R to display the data and fit a line:

# Load the data from the csv file
dataDirectory <- "D:/" # put your own folder here
data <- read.csv(paste(dataDirectory, 'regression.csv', sep=""), header = TRUE)

# Plot the data
plot(data, pch=16)

# Create a linear regression model
model <- lm(Y ~ X, data)

# Add the fitted line
abline(model)

The code above displays the following graph:

The linear regression with our simple data set

Step 2: How good is our regression ?

In order to be able to compare the linear regression with the support vector regression we first need a way to measure how good it is.

To do that we will change our code a little to visualize each prediction made by our model:

dataDirectory <- "D:/"
data <- read.csv(paste(dataDirectory, 'regression.csv', sep=""), header = TRUE)

plot(data, pch=16)
model <- lm(Y ~ X , data)

# make a prediction for each X
predictedY <- predict(model, data)

# display the predictions
points(data$X, predictedY, col = "blue", pch=4)

This produces the following graph:

linear model prediction

For each data point X_i the model makes a prediction \hat{Y}_i displayed as a blue cross on the graph. The only difference with the previous graph is that the dots are not connected with each other.

In order to measure how good our model is we will compute how much error it makes.

We can compare each Y_i value with the associated predicted value \hat{Y}_i and see how far away they are with a simple difference.

Note that the expression \hat{Y}_i - Y_i is the error. If we make a perfect prediction, \hat{Y}_i will be equal to Y_i and the error will be zero.

If we do this for each data point, square each error and take the mean of the squares, we get the Mean Squared Error (MSE)

MSE = \frac{1}{n}\sum\limits_{i=1}^n (\hat{Y}_i - Y_i)^2

A common way to measure error in machine learning is to use the Root Mean Squared Error (RMSE) so we will use it instead.

To compute the RMSE we take the square root of the MSE:

RMSE = \sqrt{MSE}
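As a quick numeric illustration of the two formulas, here is a tiny sketch with made-up values (the four points below are hypothetical and are not taken from regression.csv):

```r
# Hypothetical observed and predicted values, for illustration only
toyActual    <- c(10, 20, 30, 40)
toyPredicted <- c(13, 17, 33, 37)

# Error for each point: predicted minus observed
toyError <- toyPredicted - toyActual   # 3 -3 3 -3

# MSE: mean of the squared errors
toyMSE <- mean(toyError^2)             # 9

# RMSE: square root of the MSE
toyRMSE <- sqrt(toyMSE)                # 3

print(c(MSE = toyMSE, RMSE = toyRMSE))
```

The RMSE is expressed in the same unit as Y, which makes it easier to interpret than the MSE.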

Using R we can come up with the following code to compute the RMSE:

rmse <- function(error)
{
  sqrt(mean(error^2))
}

error <- model$residuals  # same as data$Y - predictedY
predictionRMSE <- rmse(error)   # 5.703778

We know now that the RMSE of our linear regression model is 5.70. Let's try to improve it with SVR !

Step 3: Support Vector Regression

In order to create an SVR model with R you will need the package e1071. Be sure to install it and to add the library(e1071) line at the start of your file.

Below is the code to make predictions with Support Vector Regression:

  model <- svm(Y ~ X , data)

  predictedY <- predict(model, data)

  points(data$X, predictedY, col = "red", pch=4)

As you can see it looks a lot like the linear regression code. Note that we called the svm function (not svr !). This is because the same function is also used for classification with a Support Vector Machine. It automatically performs classification if the response variable is categorical (a factor in R) and regression otherwise.
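To make this behaviour concrete, here is a minimal sketch on made-up data (the toy data frame and its Label column are assumptions for illustration, they are not part of regression.csv). The same svm call returns numeric predictions for a numeric response and factor predictions for a factor response:

```r
library(e1071)

# Hypothetical toy data, for illustration only
set.seed(1)
toy <- data.frame(X = 1:20)
toy$Y <- 2 * toy$X + rnorm(20)                          # numeric response
toy$Label <- factor(ifelse(toy$X > 10, "high", "low"))  # factor response

# Numeric response: svm() fits a regression model
regModel <- svm(Y ~ X, data = toy)

# Factor response: svm() fits a classifier
clsModel <- svm(Label ~ X, data = toy)

# The predictions reflect the kind of model that was fitted
regPred <- predict(regModel, toy)   # numeric vector
clsPred <- predict(clsModel, toy)   # factor
print(class(regPred))
print(class(clsPred))
```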

The code draws the following graph:

Support Vector Regression Predictions

This time the predictions are closer to the real values ! Let's compute the RMSE of our support vector regression model.

  # /!\ this time  model$residuals  is not the same as data$Y - predictedY
  # so we compute the error like this
  error <- data$Y - predictedY
  svrPredictionRMSE <- rmse(error)  # 3.157061

As expected the RMSE is better: it is now 3.16, compared to 5.70 before.

But can we do better ?

Step 4: Tuning your support vector regression model

In order to improve the performance of the support vector regression we will need to select the best parameters for the model.

In our previous example we performed an epsilon-regression. We did not set any value for epsilon ( \epsilon ), so it took its default value of 0.1. There is also a cost parameter which we can change to avoid overfitting.

The process of choosing these parameters is called hyperparameter optimization, or model selection.

The standard way of doing it is a grid search. It means we will train a lot of models for the different pairs of \epsilon and cost values, and choose the best one.

  # perform a grid search
  tuneResult <- tune(svm, Y ~ X,  data = data,
                ranges = list(epsilon = seq(0,1,0.1), cost = 2^(2:9))
  )
  # Draw the tuning graph
  plot(tuneResult)

There are two important points in the code above:

  •  we use the tune method to train models with \epsilon = 0, 0.1, 0.2, ... ,1  and cost = 2^2, 2^3, 2^4, ... ,2^9, which means it will train 88 models (11 values of \epsilon times 8 values of cost; it can take a long time)
  • tuneResult reports the MSE, so don't forget to convert it to RMSE before comparing the value to our previous model.
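Both points can be checked with a few lines of base R. This is only a sketch: exampleMSE is a made-up number, and with a real search you would take the square root of tuneResult$best.performance instead.

```r
# Same ranges as in the grid search
ranges <- list(epsilon = seq(0, 1, 0.1), cost = 2^(2:9))

# Every (epsilon, cost) pair that will be evaluated
grid <- expand.grid(ranges)
nrow(grid)   # 11 * 8 = 88

# Converting a mean squared error into an RMSE.
# exampleMSE is hypothetical; with a real result use
# sqrt(tuneResult$best.performance)
exampleMSE  <- 9
exampleRMSE <- sqrt(exampleMSE)
print(exampleRMSE)
```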

The last line plots the result of the grid search:

support-vector regression performance 1


On this graph we can see that the darker the region is the better our model is (because the RMSE is closer to zero in darker regions).

This means we can try another grid search in a narrower range. We will try \epsilon values between 0 and 0.2. The cost value does not seem to have an effect for the moment, so we will keep it as it is to see if it changes.

  tuneResult <- tune(svm, Y ~ X,  data = data,
                     ranges = list(epsilon = seq(0,0.2,0.01), cost = 2^(2:9))
  )
  plot(tuneResult)


We trained 168 different models with this small piece of code.

As we zoomed in on the dark region, we can see that there are several darker patches.
From the graph you can see that models with C between 200 and 300 and \epsilon between 0.08 and 0.09 have less error.

support vector regression performance 2

Fortunately for us, we don't have to select the best model with our eyes: R allows us to get it very easily and use it to make predictions.

  tunedModel <- tuneResult$best.model
  tunedModelY <- predict(tunedModel, data) 

  error <- data$Y - tunedModelY  

  # this value can be different on your computer
  # because the tune method  randomly shuffles the data
  tunedModelRMSE <- rmse(error)  # 2.219642  

We improved the RMSE of our support vector regression model again !

If we want we can visualize both our models. The first SVR model is in red, and the tuned SVR model is in blue on the graph below :

support vector regression comparison

I hope you enjoyed this introduction on Support Vector Regression with R.
You can get the source code of this tutorial. Each step has its own file.

If you want to learn more about Support Vector Machines, you can now read this article:
An overview of Support Vector Machines

188 thoughts on “Support Vector Regression with R”

  1. Jose

    Good stuff. How would this behave if for example, I wanted to predict some more X variables that are not in the training set? Is this useful in those instances? - In that case, how?

    Many thanks

    1. Alexandre KOWALCZYK Post author

      You just need to use the predict method with two parameters: the trained model and your new data. This will give you the predicted values. This is useful because that is our original goal, we want to predict unseen data.
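A minimal sketch of this, on made-up data (the train data frame and the new X values are assumptions for illustration). The new data must be a data frame whose column name matches the predictor used in the formula:

```r
library(e1071)

# Hypothetical training data, for illustration only
set.seed(1)
train <- data.frame(X = 1:20)
train$Y <- 3 * sqrt(train$X) + rnorm(20, sd = 0.5)

svrModel <- svm(Y ~ X, data = train)

# New X values that were not in the training set
newData <- data.frame(X = c(2.5, 7.5, 12.5))

# Two parameters: the trained model and the new data
newPredictions <- predict(svrModel, newData)
print(newPredictions)
```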

      1. Joshua Dunn

        I have tried predicting unseen data but it always seems to underestimate the effect of it. For example, with temperature as my x-variable, if my SVR has not seen temperatures below zero degrees C (ie minus 2 degrees C) it effectively predicts them as it would zero. Would you be able to tell me what this is called or point me in a direction to solve this? Regards

        1. Alexandre KOWALCZYK Post author

          To me it looks like you are overfitting your model to your training data. You should try to increase the weight of the regularization parameter (or use regularization if you were not).

      2. Md. Moyazzem Hossain


        Thank you very much. Actually I want to predict the future value of univariate time series by SVM. I have used the library e1071. I am able to predict the value over the study period but i want to forecast the future value.

        1. Quinn

          You should try to implement a time series model like look at the output of the AR; could make an MA model or a combination ARMA (or ARIMA for most seasonal data). These models are meant to have predictive power will predict the next however-many-you-want data point. As far as I know this is the best practice unless you are trying to gather model inputs. (ie summar(svm_model)) for a simulation. If not too late, try the 'timeSeries' package in R.

      3. PARISA

        Thank you for your excellent description.
        I created an SVM model using my data and computed the RMSE. How can I compute the coefficient of determination of the model? Also, How can I define the method of model validation?
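For the first question, the coefficient of determination can be computed directly from the observed and predicted values. The sketch below uses the usual definition R² = 1 - SS_res / SS_tot; the rSquared helper and the four values are hypothetical, for illustration only:

```r
# Coefficient of determination from observed and predicted values
rSquared <- function(actual, predicted) {
  ssRes <- sum((actual - predicted)^2)      # residual sum of squares
  ssTot <- sum((actual - mean(actual))^2)   # total sum of squares
  1 - ssRes / ssTot
}

# Hypothetical values, for illustration only
toyActual    <- c(10, 20, 30, 40)
toyPredicted <- c(13, 17, 33, 37)

r2 <- rSquared(toyActual, toyPredicted)
print(r2)   # 1 - 36 / 500 = 0.928
```

With a fitted model, the same helper could be called as rSquared(data$Y, predict(model, data)).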

  2. Liz

    "we use the tune method to train models with ϵ=0,0.1,0.2,...,1 and cost = 2^2,2^3,2^4,...,2^9 which means it will train 88 models (it can take a long time)"

    Hello. Can you explain how the number 88 is calculated? Thank you.

    1. Alexandre KOWALCZYK Post author

      There are 11 values of epsilon and 8 values of cost. We can associate each epsilon with the 8 cost values to create 8 couples. As there are 11 epsilons, there are 11\times8 = 88 couples.

  3. Fakhrul Agustriwan

    Hello Mr. Kowalczyk.
    This tutorial is very helpful. Actually i am trying to forecast the future value of a time-series data by using SVR method, but i am quite confused how to perform it in R. Could you explain the steps on how to do it?

    Thank you 🙂

    1. Alexandre KOWALCZYK Post author

      Thanks for your comment. Unfortunately I have never used SVR to forecast timeseries. However I found this question and one of the answer is pointing to this article. As suggested in the answer you will need to transform the classification problem to a regression one but this might be a good starting point for you.

  4. loic

    As I understand, SVM implemented in R uses the Radial Basis Kernel by default. Therefore, there is another parameter (called gamma). How do you deal with this one? I think you should fit it also.
    One article mentionned to take the median of pairwise distances between the learning points. (After the scaling process)

  5. loic

    Ok thanks for your reply.
    Using tune.svm I noticed that this function is very very long (around 3 seconds per configuration of parameters for 1000 observations of 7 variables).
    Surprisingly if you use svm(..., cross = 10) you can get the cross validation error for less than 0.5 second on the same data. So, I concluded that tune.svm was very badly coded, do you have any idea about this issue?
    Therefore I coded my own parameters tuning function using svm(...,cross=10).

    Also, I have found several papers that use a BFGS optimization algorithm (on a log2 scale) instead of grid search. I tried this, it turned out to be very efficient.

    1. Alexandre KOWALCZYK Post author

      When you are using svm(..., cross = 10) you are performing a 10-fold cross-validation on the training data. This is not the same as doing a grid search. If the method tune.svm is so slow, it is not because it is poorly coded, but because it trains one svm model per combination of hyperparameters. So if you want to try gamma=0.1,0.01 and C=1,10,100 for instance it will train 6 different svm models ([0.1,1][0.1,10][0.1,100] [0.01,1][0.01,10][0.01,100]). In other words, it will try each couple in the cartesian product of the gamma set with the C set. If you try it for 10 values of gamma and 10 values of C, it will train 100 models, which should indeed be much slower than training only 10 models.

  6. loic

    That's not what I meant. I am aware of that of course. But actually, I made grid search "by hand" with a loop on 10x10 values of gamma and C using svm(...,cross = 10). Therefore I called 100 times svm and then keep the minimum cv error. The overall time it took was something like 10 times less than calling once tune.svm() on a 10x10 grid.
    That was what made me think this function was poorly coded or it might use sofisticated techniques I am not aware of.

    I've been trying to find the reason in vain.

    Actually, I am a bit doubtful about the results of svm(..., cross = 10), it seems that it does not compute the sv error on a stochastic way and the results are only one decimal digit precise which is weird comparing to tune.svm()

    1. Alexandre KOWALCZYK Post author

      I can't really help you more without seeing your code. Maybe you can ask on stackoverflow or cross validated if you want to dig deeper and understand what happens in your particular case. Feel free to post the link here afterward and I'll take a look.

    2. Renan

      Hi loic,
      I am very interested on your code by hand. Because I have a lot of data to train and it takes a very very long time. Could send me this part ?
      Thanks a lot

  7. Spartan

    Great tutorial for svm, clearly defining its function as a classifier or a regressor, thanks Alexandre.

  8. HAP

    Thank you for this valuable post. If I have more than one X variable including some dummy variables can I fit the SVR for that case?

  9. Ilgaz

    I read your blog posts. I am not very clear about how to forecast future values of time series using SVR. I looks to me that SVR fits a model using training set. But how about using predict() to predict future values ( n.ahead values ) in R? I couldnt find this feature so far..

    Sincerely, Ilgaz

  10. Aseel

    First of all, thanks for the very helpful tutorial. I'm using R 3.2.1, but svm doesn't work correctly. On step 3, when I'm running this: model <- svm(Y ~ X , data), the error is :

    Error in predict(ret, xhold, decision.values = TRUE) :
    unused argument (decision.values = TRUE)

    Can you please help me?


    1. Alexandre KOWALCZYK Post author

      Hello. I don't have much of an idea about this one. You might want to take a look at this answer and try the provided solution. Otherwise I would advise you to try the code on another machine to see if it works, and if it does, try to replicate the environment on your machine. Best regards.

      1. Aseel

        I really appreciate your replay. I think the problem is there is same function name in two packages, for example predict() function in both ".GlobalEnv" and "package:stats" packages.

        I will try to figure out how to solve that.

        Thanks a lot,

        1. Aseel

          I've found that I have function with the same name with predict. So, simply, I've copied my function to another name and remove predict function. That was making the confusion.

          Thanks again for helping me.

  11. Pingback: Support Vector Regression in R | logicalerrors

  12. Espartaco

    Hi Alexandre. Thank you so much for all the information, I have a few questions.
    1. Can I use any kínd of variables in a SVM ? Continuous, categorical?
    2. If I am using a SVM to classify two groups, is there a way to get a probability of assignment to each group?
    3. How do you validate that the SVM is a good model?

    1. Alexandre KOWALCZYK Post author

      1. Yes. For continuous data it is called SVR and SVM for categorical data
      2. Yes. Most frameworks provide a "predict probabilities" method to do so
      3. You use a score to measure the quality of your model, if you want to learn more I recommend you this book.
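For point 2, e1071 exposes this through the probability argument of svm and predict. Here is a small sketch on a two-class subset of the built-in iris data (the choice of data set is only for illustration):

```r
library(e1071)

# Two-class subset of the built-in iris data
twoClass <- droplevels(subset(iris, Species != "setosa"))

# probability = TRUE asks svm() to also fit a probability model
clsModel <- svm(Species ~ ., data = twoClass, probability = TRUE)

# Request the class probabilities at prediction time
pred  <- predict(clsModel, twoClass, probability = TRUE)
probs <- attr(pred, "probabilities")   # one column per class

print(head(probs))
```

Each row of probs holds the estimated probability of belonging to each of the two groups.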

  13. Danny

    Hi Alexandre,

    Thanks for such a comprehensive tutorial. Much appreciated. I am trying to SVR for predicting time series. As mentioned in your post, tune() shuffles the data. Is there any option or way to not to shuffle the data?

    1. Alexandre KOWALCZYK Post author

      Hello. You can specify a tunecontrol parameter to specify the behavior of the tune method. I think tune.control(sampling = "fix") might suit your need.

      1. tahir

        thank you very much, but when I use lssvm I get this message

        (Using automatic sigma estimation (sigest) for RBF or laplace kernel
        Error in if (n != dim(y)[1]) stop("Labels y and data x dont match") :
        argument is of length zero)

        1. Alexandre KOWALCZYK Post author

          Hello tahir.

          Sorry I can't help you more without a reproductible example.
          The best place for you to ask your question is I hope you will find plenty of help there.



      2. Ely

        Hi Alexandre,

        Thank you for script on SVM. I have a few questions:

        1. What are some advantages of the LSSVM over SVM and ANN (artificial neural network)? Does the LSSVM have a tendency to overfit?

        2. Do you have an R script to perform an LSSVM? How to tune LSSVM and cross-validate it.

  14. Ankit


    I am trying to use SVM for classification setting. All the variables/Attributes in the dataset are qualitative . Will SVM work in this case? or do I need to convert qualitative variables in quantitative before using it?


  15. Subhasri

    Hi... Thank you for the superb article.. I've been reading about SVMs since a few days now. I have a doubt.... We normalize data and then give that data as input to SVM; can SVM be used for actual normalization? and how do we determine a kernel function?

    1. Alexandre KOWALCZYK Post author

      The goal of SVM is not normalization, it is classification (or regression in the case of SVR). If your question was "How to select a kernel", this link might help you.

  16. Minerva

    Hello Sir. Is it possible to calculate the AIC of the SVM regression model (the way you would for linear regression) ? If yes, how?

    1. Alexandre KOWALCZYK Post author

      Hello. I never used any package using Cuda or OpenCl with R. For Cuda I am using python. Maybe you can find some informations here.

  17. Scott

    Hi Alexandre,
    First of all -- excellent tutorial! Thanks for putting the time into this.
    I'm wondering if you can help answer a question. I have a 125 independent observations of spectral 200 wavelength NIR data for a random set of samples, my X matrix. For each I have an independent scalar value of the concentration of a certain analyte, essentially my Y vector.

    My goal is to find the minimal set of important wavelengths that correlate best to Y,
    I've used PLS techniques combined with wavelength selection methods to find a subset of wavelengths. Even with the full set of 200 wavelengths and 20 latent vectors I get a nominal fit to my Y concentration data. But of course, when you fit a PLS model, you hope to find a few PLS factors that explain most of the variation in both predictors and responses. Now, the regression coefficient profile (loadings) gives a direct indication of which predictors are most useful for predicting the dependent variable.

    OK, on to my question. Once one finds a reasonable SVR regression fit, how does one extract which wavelengths/variables were weighted more highly than others? I like the use of SVR over PLS since, with the right kernel choice, it can incorporate potential nonlinearities in my data. But all the SVR packages I've looked at seem to lack the fit and weighting assessment plots that are found in PLS.

    Can you offer any thoughts/suggestions on this?

    1. Alexandre KOWALCZYK Post author

      Hi Scott.

      Thanks for your comment, I am glad people find my articles helpful.

      Unfortunately there is no built-in way to retrieve the relative importance of the variables for SVR. You can try to remove one predictor and see how it impacts the performance of the SVR (but you will have to do it for each predictor, which can take a long time).

      I found this paper which might help you, it even has a video.

      I hope it helps you.

  18. Sharda Tripathi

    Hi Alexandre,

    Your tutorial is very informative and easy to understand. Keep up the good work. I however have a more conceptual question from SVR, not related to SVR implementation in R. I have performed support vector regression on a time series. The residuals (i.e actual value-predicted value) shows strong auto correlation.The auto correlation plot of residuals has a damped sinusoidal nature. I have read in literature that fitted model is not good if such is the case. I have tried transformations like first difference  and log of time series,still the result is same.

    Does this auto correlation imply that my model is not good.If so, what can I do to get rid of it.

    Any suggestion would be of great help 🙂


    1. Alexandre KOWALCZYK Post author

      Hello Sharda. Thanks for your comment. Indeed this autocorrelation implies that your model is not perfect. You should think about your problem and see if you can add another independent variable. Adding it might remove the autocorrelation. If it does not work, you can try other techniques like the Cochrane-Orcutt Method or the AR(1) Method as described in this chapter. Regards.

  19. Amir

    Hi, Thank you for sharing. I am trying to perform a multi classification in credit rating allocation for a bank, my data set involves a set of financial ratios, I have tried several techniques to perform the classification , however, the models output just like flipping a coin !!!

    I wonder what i should do . any hint?. I can send you my data set if required and highly need help since its my master thesis as well.

    1. Alexandre KOWALCZYK Post author

      Hello Amir. The first question I would ask myself is "Is what I am trying possible?" If so, "Has somebody else done it? With which results?" "What is the state of the art?" That's why I often start by looking for papers, and I advise you to do so. If your models perform poorly, maybe your data is not clean. Did you preprocess it correctly? Did you use standardization or normalization? When in doubt, one approach might be to try your model on another, simpler dataset on which such models usually perform well. If your model does not work well on it, it might be a programming or data preparation mistake. I hope this helps you.

  20. sajal

    Thanks Alexandre for nice and useful post. Is it possible to obtain the model equations directly using SVR (preferably the best fitted one) to apply in another platform for calculation, for example in MS Excel based on the fitted models?

    If happened so, could you please explain a bit. Thanks in advance.

  21. Rachel

    Hi Alexandre. How to apply SVM for univariate time series data to classify into 2 ccategories (either normal or outlier) ?

    1. Alexandre KOWALCZYK Post author

      Hi Rachel. Sorry but this is a pretty broad question which would need a specific article to answer. You can go on this site to post such questions, but don't forget to do your own research before. Best regards.

  22. lichenyu

    I met the problem same as loic refers.
    In fact, that is caused by the default setting of the function tune.svm(), which will perform a 10 cross-validation.
    And for svm(), there is no cross-validation by default.
    So, when using tune(), it may take around 10 times as expected if not considering this issue.
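This can be checked on a small grid. The sketch below uses made-up data and a 2 x 2 grid with 5 folds instead of the default 10 (the toy data frame and the parameter values are assumptions for illustration). The number of models trained during the search is the number of parameter combinations times the number of folds:

```r
library(e1071)

# Hypothetical toy data, for illustration only
set.seed(42)
toy <- data.frame(X = 1:30)
toy$Y <- 10 * sin(toy$X / 5) + rnorm(30)

# A small 2 x 2 grid, evaluated with 5-fold cross-validation
folds <- 5
res <- tune(svm, Y ~ X, data = toy,
            ranges = list(epsilon = c(0.1, 0.2), cost = c(1, 10)),
            tunecontrol = tune.control(sampling = "cross", cross = folds))

combos    <- nrow(res$performances)   # one row per parameter combination
totalFits <- combos * folds           # models trained across all folds
print(c(combinations = combos, fits = totalFits))
```

With the default of 10 folds, a 10 x 10 grid therefore means about 1000 fits rather than 100.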

  23. Adhi

    Thank you for the tutorial.
    I am curious, how do the manual calculation in SVR until got the function and prediction value?

  24. Shahriar SHAMILUULU

    Hi Alexandre,

    Thank you for tutorial and I have a question below.

    How we can get an equation for the model generated by svr, i.e., intercept, coefficient for x and R2, because when I try to see a summary there is nothing like that.

    Thank you.

    1. Lulu

      Hello! I have the same problem, but Alexandre's answer didn't help. How did you solve it?

  25. sumana

    good tutorial
    can you please tell me how svm is used to tell whether the dataset is linear or nonlinear

    1. Alexandre KOWALCZYK Post author

      Well you can't use SVM to know if data is linear or non linear. However if you achieve a very good score with a SVM and a linear kernel it is most likely that the data is linearly separable.

  26. dmitrio

    Thanks for your tutorial.

    i'm forecasting time series data with 4 predictor (t-4,t-3,t-2,t-1) to predict 't' data.
    There is a rule of minimum training sample to build SVR model ?


  27. bahtiyar

    Hi alexander,

    I try to implement SVR in my prediction time series. I use univariate data for prediction..
    format data that i use is

    x-1, x-2, x-3, x-n -> [x+1]

    x+1 : target value
    x-1,x-2,x-3,x-n : atribute value

    I use libsvm (e1071) in R to help calculate the prediction and i got high error value..
    Must I scale the data to [0,1] or [-1,+1] as a classification problem. . If there must be scale, I didn't find parameter to set it in R. In manual lib 'e1071', I just found parameter that the data scale or not.

    The parameter like this
    svm(....parameter...... , scale = FALSE)

    Any sugestion for this...?


  28. William

    Thank you very much, Mr. KOWALCZYK! Thanks to your lm(y,x,data) function I was able to successfully plot a regression line! I had tried lm(y,x) before but I kept getting the error "Error in if (noInt) { : argument is of length zero". Thanks again!

  29. Pietro

    Hi! Great tutorial! Please can you help me on these two things:

    1) I have to use SVR in order to predict future values of energy consumption. My input is the day of the week and the output is the correspondent energy consumption value. How can I encode this input information?

    2) I have one month energy data: How do I divide the whole set into Training and Set? How do I use the test set in order to validate the model?

    Thanks a lot!

    1. Alexandre KOWALCZYK Post author

      Hello Pietro.
      1) For the day of the week you could use a number for each day (0 to 6), but this is not so good because it implies an order between the days, so instead you should one-hot-encode it. For this you can use the OneHotEncoder provided by sklearn.
      2) You can watch this video which explains everything 🙂
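OneHotEncoder is a Python tool. In R, a comparable encoding can be sketched with a factor and model.matrix from base R (the energy data frame and its values below are made up for illustration):

```r
# Hypothetical week of observations, for illustration only
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
energy <- data.frame(
  day = factor(days, levels = days),
  consumption = c(31, 29, 30, 32, 35, 22, 20)   # made-up values
)

# One indicator column per day; the "- 1" drops the intercept so that
# all seven levels get their own column
oneHot <- model.matrix(~ day - 1, data = energy)
print(oneHot)
```

Each row contains a single 1 in the column of its day and 0 everywhere else, so no artificial order between the days is introduced.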

  30. Alex


    this is a very useful tutorial. Thanks. 😉

    I am wondering how you can extract out the coefficients of the SVM regression, just like the coefficients in the linear regression.

    Thanks in advance.

    1. Alexandre KOWALCZYK Post author

      You can use the coefs property of the svm object which is returned after the training.
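As an illustration, here is a sketch for the simplest case: a linear kernel with scaling turned off, on made-up data. Under these assumptions the slope and intercept can be rebuilt from the coefs, SV and rho fields of the fitted object and compared with predict. With the default radial kernel there is no such single slope, and with scaling enabled the values refer to the scaled variables.

```r
library(e1071)

# Hypothetical toy data, for illustration only
set.seed(1)
toy <- data.frame(X = 1:20)
toy$Y <- 2 * toy$X + 1 + rnorm(20)

# Linear kernel and no scaling, so the coefficients are expressed
# in the original units of X and Y
linModel <- svm(Y ~ X, data = toy, kernel = "linear", scale = FALSE)

# Weight (slope) and intercept rebuilt from the support vectors
w <- as.numeric(t(linModel$coefs) %*% linModel$SV)
b <- -linModel$rho

# Prediction by hand, compared with predict()
manualY <- w * toy$X + b
maxDiff <- max(abs(manualY - predict(linModel, toy)))
print(c(slope = w, intercept = b, maxDiff = maxDiff))
```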

  31. LT


    This is a great tutorial.

    Just a question. After the SVM is trained, can we do a hand calculation, like we can do in a simple model, to predict a value for a new variable set.


    1. Alexandre KOWALCZYK Post author

      You can use the trained model to make a new prediction. I don't get what you mean by "hand calculation". If you want to do it by hand on paper it would be tedious.

  32. Vishesh Sahni

    Hello. It's a great tutorial so thanks for putting it here. I've my modelled my data and obtained a graph. I want to predict the next value. How do I do it? Thanks a lot.

    1. Alexandre KOWALCZYK Post author

      Hello. Thank you for your comment. Unfortunately your question is way too broad. Maybe you can find a dedicated forum or a teacher to help you with this matter. Regards.

  33. Chimezie

    Hi Alexandre,

    Can you please explain the Dispersion term in the SVM tuning process ! What does Dispersion stands for???


    1. Alexandre KOWALCZYK Post author

      Hello. I don't see what you mean. I don't see any dispersion term in the e1071. Could you clarify?

  34. RC

    Thanks for this, very helpful. I know with SVM only cannot usually figure out what the features are that led to the good prediction model but I was wondering if there is a way to extract the features which are crucial in generating the predictedY with SVR? Essentially, I want to use SVR for feature selection. I am getting pretty descent error (0.31) for my model and I'd like to know which features have the highest weights enabling this? Any help would be greatly appreciated. Thank you.

  35. Pingback: Get ready for R/Finance 2016 – Mubashir Qasim

  36. LJ

    Hi Alexander,

    Thanks for this, very helpful. I'm trying to test different parameterization SVM in prediction problems using epsilon-svr and nu-svr. In what range do I should test the parameters ε, ν (nu) and C?

    1. Alexandre KOWALCZYK Post author

      You should test them using grid search. The particular value of the parameters differ greatly between problems so you just have to do a grid search first and then try to narrow the range until you find values which give you satisfaction.

  37. ram

    Hi Alex,
    Really great article!

    I have few questions here, how that epsilon and cost is related here to the model.
    And where have you used the kernel part in the above calculation?

    and how does the kernel impact in processing of the model?

    1. Alexandre KOWALCZYK Post author

      We did not specify the kernel parameter when we created the svm so the kernel is "radial" by default. (See documentation). You will understand how epsilon and C affect the model by reading this article. Best regards

      1. Ruaa

        thanks alot for your great tutorial, it has helped me alot, but i am wondering i have a data set that's to predict electric load forecastig, my question should i normalize the data set first or automatically normalized by svm, my second question about the first point to tune the paremeters, how can i choose it

  38. Emre

    Hi Alex,

    It is a solid tutorial. Thank you very much.

    I have a question and need to find the answer asap.

    I need perform v-svm which has additional parameter "v" . Can you help me modify the svm code to obtain v-svm code.

    And, I am curious about how I can see the whole code of SVM in R. Is there any way to step in the function SVM?

    Thank you very much.

  39. SA

    Great Tutorial!!!!!!
    how to find 95% confidence interval for non linear regression? I don't you can use lm right??

    1. Rafael

      Hey "SA"!
      Did you find how to estimate the 95% confidence interval for SVM in regression?
      Please, tell me.

  40. Jean-Pierre GERBIER

    Very interesting tutorial, thanks a lot
    If I am not too late .... I don't understand why whith exactly the same data set and same code snippets, I get a different result at step 3 : i get predictedY
    1 2 3 4 5 6 7
    7.667638 6.323641 5.578090 5.453718 6.066055 7.625997 10.367069
    8 9 10 11 12 13 14
    14.423447 19.718063 25.922760 32.519648 38.941073 44.724122 49.608321
    15 16 17 18 19 20
    53.536553 56.570848 58.777275 60.144779 60.578090 59.961279
    Thanks if you have time to help me

  41. Jean-Pierre GERBIER

    Sorry and sorry Alexandre ... I made a mistake ... absolutly sorry and many thanks again for your great tutorial

  42. Weiwei Liu

    Thanks for your tutorial. I have one question that has bothered me for a long time. I don't know if you are familiar with the caret package; I want to know the difference between the train() function in the caret package and the tune() function in the e1071 package. They are both training functions for SVM, but why do I get different results when I use the same data? For example, if I use tune():
    obj <- tune(svm, y ~ x, data = df,
    ranges = list(gamma = 2^(-2:2), cost = 2^(2:9), epsilon = seq(0, 1, 0.1)),
    tunecontrol = tune.control(sampling = "cross", cross = 10))
    I get the best parameters for cost, gamma and epsilon, but if I use train()
    I get C with the corresponding MSE and Rsquared. Besides that, I also get gamma in the output of summary(df.svm), and the result differs from the result of tune() (for example, C).
    So I want to ask which one I should use.
    I hope I am understood; if not, please let me know. Thanks.

    1. Alexandre KOWALCZYK Post author

      Hello Weiwei, my guess is that the internal routine of tune uses some kind of randomness to perform the cross validation. Also, if your data set is small, the examples picked to be in one set can make the result change considerably. If you wish to have a more detailed answer, posting a question on stackoverflow might help.
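
      If randomness in the cross-validation split is the cause, fixing R's random seed before tuning should make runs repeatable. Here is a minimal sketch in base R, assuming the folds are drawn with R's random number generator; `assign_folds` is a hypothetical helper written for illustration and is not part of e1071 or caret:

```r
# Hypothetical helper: randomly assign n examples to k cross-validation folds.
assign_folds <- function(n, k) {
  sample(rep(seq_len(k), length.out = n))
}

# Same seed, same fold assignment.
set.seed(123)
folds_a <- assign_folds(20, 10)
set.seed(123)
folds_b <- assign_folds(20, 10)
print(identical(folds_a, folds_b))

# A different seed will usually give a different split.
set.seed(456)
folds_c <- assign_folds(20, 10)
print(identical(folds_a, folds_c))
```

      Under that assumption, calling set.seed() with the same value right before each tuning call would be the way to apply this.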

  43. Harshith

    Hello, I am trying to predict an unknown future variable using this method and I get an error

    Error in model.frame.default(formula = wb1$new_col ~ y + x1 + x2 + x3 + :
    invalid type (NULL) for variable 'wb1$new_col'
    new_col is the new column of values which I need to predict and wb1 is the data frame. I am trying to build an svm regression model for that formula. Can you please help me?

    The code looks like this
    svmModel<-svm(formula = wb1$new_col~ y+x1+x2+x3+x4+x5+x6+x7+x8+x9, data = training, kernel = "radial", cost = 32, gamma = 0.1,scale = FALSE)

    1. Alexandre KOWALCZYK Post author

      It looks to me that some value in new_col is NULL. Try replacing all NULL values by a number before running the code. If it works, check your data and your loading procedure to find where the null value comes from.

      1. Harshith

        Thanks a lot for the reply.
        When I predict on the test set, the predicted values are those of the training data. How can I solve this? The code looks like this.

        svmModel<-svm(formula = y ~x1+x2+x3+x4+x5+x6+x8, data = training, kernel = "radial", cost = 32,epsilon=0,C=0.1, gamma = 0.1,scale = FALSE)

        pred <- predict(svmModel, newdata = testing[,-42])

        I get the training values as answers for these. Can you please help me?

    1. zied

      Hello again, Just to clarify my previous message.
      I already followed your link
      and I tried
      But I got 21 coefficients. How can I use them to build the equation. I expected 13 (12 for each variable and the intercept).

      I have another question: I have the multivariable model, is it possible to apply a non-linear kernel?
      You showed in the tutorial how to get RMSE; is it possible to get R2?

  44. prasun

    Hi, I am using SVM for a classification problem. Do I need to create dummy variables for categorical variables before passing them to SVM, or will it handle them on its own?

    1. Alexandre KOWALCZYK Post author

      Yes it is recommended that you create dummy variables to encode categorical variables.
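
      One way to build such dummy variables in base R is sketched below. The data frame and its columns `color` and `size` are made up for illustration:

```r
# Toy data (made up): one categorical column and one numeric column.
df <- data.frame(
  color = factor(c("red", "green", "blue", "green")),
  size  = c(1.0, 2.5, 3.0, 4.2)
)

# model.matrix() expands the factor into 0/1 indicator columns.
# The "- 1" drops the intercept so that every level gets its own column.
dummies <- model.matrix(~ color - 1, data = df)

# Combine the numeric column with the indicator columns.
encoded <- cbind(df["size"], dummies)
print(encoded)
```

      Each row has exactly one indicator set to 1, which is what one-hot encoding means.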

  45. Maomao

    Hi Alex,
    Many thanks for your sharing. I got one question,
    there is some missing information in several of my X variables. How could I impute or deal with these missing values? Thanks.

    1. Alexandre KOWALCZYK Post author

      It is common to replace the missing value by the mean, but you can also replace it by the most frequent value or the median. There is some information about how to do it in Python on this page.
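
      The same idea can be sketched in R. The data frame below is made up, and `impute_with` is a hypothetical helper written for this example:

```r
# Toy data (made up) with missing values in both columns.
df <- data.frame(
  X1 = c(1, NA, 3, 4),
  X2 = c(10, 20, NA, 40)
)

# Hypothetical helper: replace NA entries of a numeric vector by a summary
# statistic (mean by default) computed on the observed values.
impute_with <- function(v, stat = mean) {
  v[is.na(v)] <- stat(v, na.rm = TRUE)
  v
}

imputed_mean   <- as.data.frame(lapply(df, impute_with))
imputed_median <- as.data.frame(lapply(df, impute_with, stat = median))

print(imputed_mean)
print(imputed_median)
```

      If you split your data into training and test sets, a common precaution is to compute the mean or median on the training set only and reuse it for the test set.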

      1. Maomao

        Alex, thanks for your answer, but how could I get the 95% CI and P of AUC by SVM model???

  46. SAM

    Hi Alexandre,

    Thanks for the good example showing us how to use SVR with GS

    Could you further show us how to use the particle swarm optimization to optimize the parameters?

    1. Alexandre KOWALCZYK Post author

      Hello. Thank you for your comment. I never used particle swarm optimization, so I do not plan to write an article on the subject for the moment 😉

  47. Alvin

    Hi Alexandre,
    Thank you very much for your post. I would like to ask, how to perform multiple linear regression using support vector regression? Do you have any post on this or any other website that you know shows how this can be done using R? Thanks

    1. Alexandre KOWALCZYK Post author

      If you use a support vector machine you will be performing support vector regression, not multiple linear regression. You can give a vector as input to perform multivariate support vector regression if you wish.
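
      A minimal sketch of what giving a vector as input can look like with the e1071 package used in this tutorial. The data frame, the column names X1 and X2, and the coefficients used to simulate Y are all made up for illustration:

```r
library(e1071)

# Simulated data (made up): Y depends on two input variables plus noise.
set.seed(42)
df <- data.frame(X1 = runif(50), X2 = runif(50))
df$Y <- 3 * df$X1 - 2 * df$X2 + rnorm(50, sd = 0.1)

# List every input variable on the right-hand side of the formula
# (Y ~ . would use all remaining columns of the data frame).
model <- svm(Y ~ X1 + X2, data = df)

# Predictions and RMSE, computed as in the tutorial.
predictedY <- predict(model, df)
rmse <- sqrt(mean((df$Y - predictedY)^2))
print(rmse)
```

      The only change compared to the one-variable case is the formula: each extra column of the data frame becomes one more dimension of the input vector.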

      1. Jun

        Thank you for your post. I learned a lot from the tutorial. I just wonder how to perform multivariate svm regression, too. Even though you explained how to do that, I cannot understand how does it work in real programming. If you don't mind, would you please give me an example? I hope you show that method through the R code. Thank you.

  48. Sadiq Ahmad

    Currently we are working on a research paper in which we have conducted a psychological experiment to get a data set. After that we applied multiple regression to find the relation between the dependent variable and the independent variables. Our model was significant because the Sig. value was less than .05, and we found a good relation between the dependent and independent variables.

    Now my idea is to develop a new algorithm which will have different mathematical equations, and all these equations will be based on that regression analysis. For example, if the regression analysis shows that humidity has a strong relation with rain, then we will say that "Humidity is directly proportional to rain".

    So my question is: do we have formal mathematical techniques or any software tool which can provide different equations according to the regression analysis?


    We will manually draw equations from that regression analyses.

  49. lj

    Hi Alexander,

    I have some doubts. The kernel functions, with the exception of linear, also have a cost parameter (C)? How can I perform grid search setting the cost parameter of the function and the kernel? You can separate them?

  50. noviyanti sagala

    Hi Alexander,

    How do we specify the training and testing datasets? I can't see that you used different datasets.

  51. abhishek bansal

    Great work! How do we find the coefficient of determination in SVR? Is there a command for that?

  52. Mostafa

    Hi Alexandre. Many thanks for your valuable tutorial. You mentioned that SVR also works when X is multidimensional. Could you please let me know how I can load the multi dimensional X so that it runs with the following code:
    model <- svm(Y ~ X , data)
    predictedY <- predict(model, data)

    Many thanks

    Kind regards


  53. harsha

    Hi Alexandre,

    can i extend this regression tool for spatial modelling. presently i am using random forest for spatial modelling. I tried using with cubist but with not much success. As per my knowledge random Forest can easily handle both continuous and categorical variables at the same time, is it possible with SVM as well??

    1. Alexandre KOWALCZYK Post author

      Yes, I think so. You just need to one-hot-encode the categorical variables.

  54. Gaurav

    How do we visualize both our models? You said that the first SVR model is in red, and the tuned SVR model is in blue on the graph. How do we plot it?

  55. Jack

    Hello Alex,

    Have you ever tried to use Amibroker for building and testing an SVM?
    Anyone can do some research in Excel, however Amibroker is pretty fast while working on data arrays and its formula language is very much C - like. Visualising effectiveness of set parameters in 3D is also possible.

    Thanks for a brilliant tutorial !

    1. Alexandre KOWALCZYK Post author

      Well, that is very unfortunate. Keep in mind that SVR is not the solution to every regression problem. Moreover, you should try to use machine learning to predict things for which you believe there is an underlying (unknown) relation. Maybe the relation between currency pairs is too random and cannot be predicted, or there is no relation, or it keeps changing.

  56. Kaustubh

    I am using this method for forecasting. I have assumed a linear model with 6 variables. Thus it has 6 parameters. This method is forecasting the final output. I want to know the values of the 6 parameters. That means I want to find the model.

  57. Stelios

    Hello Alex,

    I was wondering if you could develop (using your toolbox) a Support Vector Regression model based on a Gaussian RBF function, in which you need to choose C, γ and ε.

    Kind Regards,


    1. Alexandre KOWALCZYK Post author

      Hello Stelios. I do not see why it would not be possible. You need to look for the documentation of the R package to do so.

  58. Samantha

    Hello Alexandre,
    Thanks for this good tuto.
    I would like to know how can I reproduce the predictions with the output given by R?
    I have to do it with Excel (VBA) using the model parameters fitted by R.


    1. Alexandre KOWALCZYK Post author

      Hi Samantha,
      I have never used Excel to do SVR so I am sorry I cannot help you on this matter.

  59. Samantha

    Ok thanks.
    But maybe you know how R calculates the predictions with the parameters of the model?
    I tried with a linear kernel but I couldn't find the predictions given by predict.svm....

  60. Anastasiya

    Hi, Alexandre!

    I've performed SVM and tuned the parameters (gamma and cost) by doing a grid search with 5-fold cross validation. But I also came across, in an article, another option for finding this optimal combination by using some performance metric. So what I would like to do is to find an optimal pair of gamma and cost which results in the highest cross-validation area under the receiver operating curve (AUC). Do you have any idea how it can be implemented?

    1. Alexandre KOWALCZYK Post author

      Hello Anastasiya,

      The most common approach for tuning SVM is indeed grid search like you did. If you wish to find the best value, you can try doing this with a smaller grid search around the value which seems the best. There are also other more complicated techniques, so if you really wish to find the optimal value it may be good to take a look at them. In section 3.2 of their guide, the libsvm authors say that the other methods are not really "better" as they depend on some heuristics or approximations. It may be worth a shot to look for papers on the subject and try some other methods. If you do, I would be interested in knowing your results.


  61. Chinmaya

    Hi Alexandre,

    Thanks for such a nice write-up.

    I've seen examples where different powers of 10 are used for Cost; here you have used powers of 2. My question is whether it's significant to use powers of 2 or 10; or we can literally supply any list of values ?

    Is there any thumb rule regarding the range of Cost ?

    1. Alexandre KOWALCZYK Post author

      It does not really matter whether you use powers of 2 or powers of 10. The rule of thumb is that when you perform a "grid" search you make the grid smaller and smaller. For instance, you can try values between 10^0 and 10^5, and then you see that the best one is 10^3, so now you can perform a smaller grid search, between 500 and 1500 with increments of 100. If the best one is 800, you can try another search between 650 and 950 with increments of 50. In the end, doing a search that is too precise is often not worth the time, which is why you can be completely fine with the first value of 10^3. But if you really want to find the best C (and have the time), then refining your grid like that is the way to go. The same logic applies if you have more than just one parameter to find: you need to find a set of parameters among all the possible combinations. Note that the grid search method is an empirical method and that there are other ways to find the best parameter.
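
      The refinement described above can be sketched in a few lines of base R. Here `cv_error` is a made-up stand-in for the cross-validation error of a model trained with a given cost (its minimum is placed at 800 purely for illustration); in practice you would replace it with a real tuning run:

```r
# Made-up stand-in for "cross-validation error as a function of cost".
# Its minimum is placed at cost = 800 purely for illustration.
cv_error <- function(cost) (log10(cost) - log10(800))^2

# Hypothetical helper: return the grid value with the lowest error.
best_of <- function(grid) grid[which.min(sapply(grid, cv_error))]

# 1. Coarse search over powers of 10.
coarse <- 10^(0:5)
best_coarse <- best_of(coarse)

# 2. Finer search around the coarse winner.
fine <- seq(500, 1500, by = 100)
best_fine <- best_of(fine)

# 3. Even finer search around the new winner.
finer <- seq(650, 950, by = 50)
best_finer <- best_of(finer)

print(c(coarse = best_coarse, fine = best_fine, finer = best_finer))
```

      Each pass reuses the same procedure on a narrower range, which is all that "making the grid smaller and smaller" means.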

  62. Sarah

    Thank you for your valuable information. I have few questions

    1- what machine learning algorithm can be applied for text classification such as tweets from Twitter with best accuracy and easiest implementation?

    2- what programming language can I use to get a web based system with ML algorithm embedded in it? I'm thinking currently of .NET but I don't know if I can use the classifier there

    3- using analytical tool such as AlchemyAPI which is based on deep learning algorithm can be enough for text classification or I need to apply algorithm such as SVM ?

    1. Alexandre KOWALCZYK Post author

      Hello Sarah,

      1. Basically, any machine learning algorithm which can deal with text data. There is no single algorithm better than all the others; you have to test by yourself on your specific case.
      2. In .NET you can use Accord.Net which is a pretty good framework, however, you can also create websites in Python and use scikit-learn, and in a lot of other languages too.
      3. Using this API might be a good idea if you are not very inclined towards programming. Once again it depends what you want to do, and what it can do.


  66. JEW DAS

    I have more than one independent component (i.e X is more than 1 variable) but one dependent component (Y is one). Then how to do this multiple regression?

    1. Alexandre KOWALCZYK Post author


      Sorry but your question is too broad. As a first step I suggest you try to find out whether there are some papers on the subject.

  67. anjana

    Your papers are really superb, they help me so much. But one thing: how could I download the dataset which was used in RStudio? Please let me get that dataset.

    1. Alexandre KOWALCZYK Post author

      Hello. You can download the dataset and the code with this link. Thanks for pointing out that the link was broken.

      1. mitesh

        can you please help me find out how svm calculates probability when we use predict function on svm trained model. Please let me know the formula for the same to manually verify the probability.

        Although when i used predict on svm it produces the probability which gives more than one and less than zero as well in the output.

          1. mitesh

            Thanks Arun I am using following link as reference :

            Although I’ve a few queries :

            Is there a formula that will calculate the predict function’s output on an svm model with radial and polynomial kernels (the way we use Y = B0 + B1X1 + B2X2 + e in logistic regression and then put it in the logit function)?
            I am using the tune.svm function and the summary of the model gives me the best performance value, but this value changes with the same cost and gamma every time I run the code. How is this value of best performance calculated?
            Why does tune.svm not give stable cost and gamma values? I am setting cost=seq(from=1,to=100,by=5) and gamma=seq(from=0.0005,to=0.05,by=0.005). Is there any other way to get a stable value?
            Lastly, with reference to the link of analytic vidhya in the beginning, I use the isoreg technique, but the output of fits.isoreg gives only 13 unique levels out of 772 observations. This gives me an error when I use the cut function in R for binning the probability values. (Error in cut.default(temp9$prob1, breaks = quantile(temp9$prob1, probs = seq(0, : 'breaks' are not unique) Can you please help me out?

  68. Fergus

    Thanks for the very clear tutorial on SVM - very helpful introduction.

    Can you explain in Step 4 when you first perform the grid search why epsilon is returned as 1e-04 even though it was set to cycle between 0 and 1 in increments of 0.1? Also, why did you ignore this value of epsilon and instead choose to zoom in between 0 and 0.2?

    (BTW, towards the end of Step 4 I think there is a typo when the text says “From the graph you can see that models with C between 200 and 300 and ϵ between 0.8 and 0.9 have less error.” Should the range of epsilon values here be 0.08 and 0.09?)

    Thank you.

    1. Alexandre KOWALCZYK Post author

      Hello Fergus. I corrected the two problems. I think both were typos. Thanks a lot for your comment.

  69. Maliha Ashraf

    Excellent tutorial. Can you tell how to extract the function which is modeled by SVM here?


    How to calculate the Lagrangian multipliers in case of nonlinear regression in support vector regression in r software (package =e1071).
    For prediction problems.?

  71. Lokeshwar

    Hi Kowalczyk,

    I just read your article on SVM . Before reading the article I have no knowledge of SVM . But now I have much more idea on it than I thought of . Its very clear , useful and informative .
    A ' very very big thanks ' to you.

  72. Lou

    Hallo and thanks a lot for the tutorial:

    Is it possible to calculate the AICc to evaluate the SVR model? I performed an eps-regression.
    I tried to calculate the AICc using the MuMIn package but I receive an error message:

    Error in UseMethod("logLik") :
    no applicable method for 'logLik' applied to an object of class "c('double', 'numeric')"


  73. Olivia

    Thanks so much for this article, really helpful. Now I have a question:
    I have a data set called "vitd"; I divided it into "training" and "testing" with a flag created in SAS.
    I have successfully imported the "vitd" from SAS to R, and built the svm in "training" data.
    The problem is I can not get the predicted value in "testing" data.
    I can only get the predicted value in "training" data.
    Here is the code for getting the predicted value in "testing" data, but still gave the predicted value in "training" data.
    To make it more clear, I could only get the predicted value in "training" data no matter what I changed my code.
    Could you give me some information about this? Thanks so much in advance.

  74. rawua

    Do you have an idea, please, of how we can apply a technique such as PSO to optimize SVM parameters? Thank you.

  75. Omar Alnyme

    Can I get intercept and coefficients for input variables like LM?
    Hope to hear from you soon.

  76. Omar Alnyme

    Please answer me :).
    How do we perform evaluation on test data. if test data has 4500 observation and the SVR model was built using 8760 observation.
    I need to get MAE on test data.

    Please assist me on this.


  77. Ro

    Hello, thanks a lot. One question: can the first tuning parameters you use be used for any problem, or do I have to change the grid? Thanks again.

  78. tasnim

    Hello Alexandre

    Thanks for nice tutorial, I have one concern, how to calculate training error and validation error to justify over fitting in your model.

  79. bibhuti

    That's a pretty good explanation. Nowadays people use only packages without knowing the math behind them. But my concern is whether you can code SVM regression or any algorithm in detail (without using any package, as in R); that would really attract many viewers to your blog and be really helpful.

  80. edward


    I have been studying and using the SVM. I work in a financial institution, and one of my responsibilities is to forecast the quarterly budget for some variables, in this case non-performing-loans stock reduction. It happens that I trained an SVM model that fits my historic data very well.
    My question is: OK, you have this model, then what?! How could I use this trained model to predict future values? (In this case the next quarter!)

    1. Alexandre KOWALCZYK Post author

      Your problem is not with SVM but with any machine learning model you could use. If your model fits your data and you make the assumption that it correctly represents an underlying unknown relation, then you input new data and use the result as the prediction.
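
      In R, "inputting new data" means passing a data frame of new inputs to predict() through its newdata argument. The sketch below uses lm from base R as a stand-in so that it runs without extra packages; a model fitted with svm takes new inputs through the same argument. The quarterly figures are made up for illustration:

```r
# Made-up history: one value per quarter.
history <- data.frame(
  quarter = 1:8,
  value   = c(120, 115, 112, 108, 101, 99, 94, 90)
)

# Stand-in model (lm instead of svm, to stay within base R).
model <- lm(value ~ quarter, data = history)

# New input: the next quarter. The column name must match the one
# used when the model was fitted.
next_quarter <- data.frame(quarter = 9)

forecast <- predict(model, newdata = next_quarter)
print(forecast)
```

      The forecast is only as good as the assumption that the relation seen in the historic data still holds for the new input.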

  81. Enrico

    Hi Alexandre,
    very nice explanation, thank you very much!
    Just a couple of questions (I am using a different dataset than yours):

    1. when i predict the new data (my test set in my case) using the command:
    tunedModelY <- predict(tunedModel, data) --in my case data = test_set
    I get this error:
    "Error in scale.default(newdata[, object$scaled, drop = FALSE], center = object$x.scale$"scaled:center", :
    length of 'center' must equal the number of columns of 'x'"
    However, if I use the normal svm formula including the new tuned parameters manually, the command works (see code below):
    tuned_regressor = svm(formula = Y ~ .,
    data = training_set,
    type = "eps-regression",
    kernel = "radial",
    cost = 256,
    epsilon = 0)
    y_pred_best = predict(tuned_regressor, newdata = test_set)
    Do you have any idea why?

    2. Is it ever possible for the not tuned model to return a smaller RMSE than the tuned model? That's what happened in my case.

    Thank you very much for taking the time to answer this.


    1. Alexandre KOWALCZYK Post author

      Hello Enrico,
      For 1. I can't give you help it really depends on your data.
      For 2. Yes it is possible; that means that tuning did not improve the model. You can try to increase the range of the tuned hyperparameters and see if it finds a better model.

Comments are closed.