Support Vector Regression with R

In this article I will show how to use R to perform Support Vector Regression (SVR).
We will first do a simple linear regression, then move on to Support Vector Regression, so that you can see how the two behave with the same data.

A simple data set

To begin with we will use this simple data set:

A simple data set in Excel

I just put some data in Excel. I prefer that over using an existing well-known data set because the purpose of this article is not the data, but the models we will use.

As you can see there seems to be some kind of relation between our two variables X and Y, and it looks like we could fit a line that would pass near each point.

Let's do that in R!

Step 1: Simple linear regression in R

Here is the same data in CSV format; I saved it in a file named regression.csv:

A simple data set in CSV

We can now use R to display the data and fit a line:

# Load the data from the csv file
dataDirectory <- "D:/" # put your own folder here
data <- read.csv(paste(dataDirectory, 'regression.csv', sep=""), header = TRUE)

# Plot the data
plot(data, pch=16)

# Create a linear regression model
model <- lm(Y ~ X, data)

# Add the fitted line
abline(model)

The code above displays the following graph:

The linear regression with our simple data set

Step 2: How good is our regression?

In order to compare the linear regression with the support vector regression, we first need a way to measure how good it is.

To do that we will change our code a little bit to visualize each prediction made by our model:

dataDirectory <- "D:/"
data <- read.csv(paste(dataDirectory, 'regression.csv', sep=""), header = TRUE)

plot(data, pch=16)
model <- lm(Y ~ X , data)

# make a prediction for each X
predictedY <- predict(model, data)

# display the predictions
points(data$X, predictedY, col = "blue", pch=4)

This produces the following graph:

Linear model prediction

For each data point X_i the model makes a prediction \hat{Y}_i, displayed as a blue cross on the graph. The only difference from the previous graph is that the dots are not connected to each other.

In order to measure how good our model is, we will compute how much error it makes.

We can compare each Y_i value with the associated predicted value \hat{Y}_i and see how far away they are with a simple difference.

Note that the expression \hat{Y}_i - Y_i is the error: if we make a perfect prediction, \hat{Y}_i will be equal to Y_i and the error will be zero.

If we square each of these errors, sum them over all data points, and take the mean, we get the Mean Squared Error (MSE):

MSE = \frac{1}{n}\sum\limits_{i=1}^n (\hat{Y}_i - Y_i)^2

A common way to measure error in machine learning is the Root Mean Squared Error (RMSE), so we will use it instead.

To compute the RMSE we simply take the square root of the MSE:

RMSE = \sqrt{MSE}
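Before writing a helper function, here is a tiny worked example with made-up numbers (not from our data set), just to show what the two quantities look like:

Y    <- c(3, 5, 8)          # true values
Yhat <- c(2.5, 5, 9)        # predictions
mean((Yhat - Y)^2)          # MSE  = (0.25 + 0 + 1) / 3 = 0.4166667
sqrt(mean((Yhat - Y)^2))    # RMSE = 0.6454972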

Using R we can come up with the following code to compute the RMSE:

rmse <- function(error)
{
  sqrt(mean(error^2))
}

error <- model$residuals  # same as data$Y - predictedY
predictionRMSE <- rmse(error)   # 5.703778

We now know that the RMSE of our linear regression model is 5.70. Let's try to improve on it with SVR!

Step 3: Support Vector Regression

In order to create an SVR model with R you will need the package e1071. So be sure to install it and to add the library(e1071) line at the start of your file.

Below is the code to make predictions with Support Vector Regression:

  model <- svm(Y ~ X , data)

  predictedY <- predict(model, data)

  points(data$X, predictedY, col = "red", pch=4)

As you can see it looks a lot like the linear regression code. Note that we called the svm function (not svr!) because this function can also be used for classification with a Support Vector Machine. The function automatically chooses SVM classification if it detects that the data is categorical (if the variable is a factor in R).
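If you prefer to be explicit rather than rely on this automatic detection, svm also accepts a type argument. A small optional sketch (not needed for the rest of the article):

  # explicit regression on our numeric response
  model <- svm(Y ~ X , data, type = "eps-regression")

  # explicit classification on a factor response (using the built-in iris data set)
  # classifier <- svm(Species ~ ., iris, type = "C-classification")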

The code draws the following graph:

Support Vector Regression Predictions

This time the predictions are closer to the real values! Let's compute the RMSE of our support vector regression model.

  # /!\ this time model$residuals is not the same as data$Y - predictedY
  # so we compute the error like this:
  error <- data$Y - predictedY
  svrPredictionRMSE <- rmse(error)  # 3.157061

As expected the RMSE is better: it is now 3.15, compared to 5.70 before.

But can we do better ?

Step 4: Tuning your support vector regression model

In order to improve the performance of the support vector regression we will need to select the best parameters for the model.

In our previous example we performed an epsilon-regression, but we did not set any value for epsilon ( \epsilon ), so it took its default value of 0.1. There is also a cost parameter which we can change to avoid overfitting.
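For reference, here is what the previous call looks like with those defaults written out explicitly (epsilon = 0.1 and cost = 1 are the e1071 defaults, so this sketch trains the same model as before):

  model <- svm(Y ~ X , data, epsilon = 0.1, cost = 1)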

The process of choosing these parameters is called hyperparameter optimization, or model selection.

The standard way of doing it is a grid search: we train a lot of models for different couples of \epsilon and cost, and choose the best one.

  # perform a grid search
  tuneResult <- tune(svm, Y ~ X,  data = data,
                ranges = list(epsilon = seq(0,1,0.1), cost = 2^(2:9))
  )
  print(tuneResult)
  # best performance: MSE = 8.371412, RMSE = 2.89 epsilon 1e-04 cost 4
  # Draw the tuning graph
  plot(tuneResult)

There are two important points in the code above:

  • we use the tune method to train models with \epsilon = 0, 0.1, 0.2, ..., 1 and cost = 2^2, 2^3, 2^4, ..., 2^9, which means it will train 88 models (it can take a long time)
  • the tuneResult returns the MSE; don't forget to convert it to RMSE before comparing the value to our previous model (see the short sketch below)
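For example, the best MSE can be read from the tune result and converted like this (a small sketch using the tuneResult object created above):

  sqrt(tuneResult$best.performance)   # RMSE of the best model, about 2.89 here
  tuneResult$best.parameters          # the epsilon and cost values that achieved it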

The last line plots the result of the grid search:

support-vector regression performance 1

 

On this graph we can see that the darker the region, the better our model is (because the RMSE is closer to zero in darker regions).

This means we can try another grid search in a narrower range: we will try \epsilon values between 0 and 0.2. The cost value does not seem to have an effect for the moment, so we will keep it as it is and see whether that changes.

  tuneResult <- tune(svm, Y ~ X,  data = data,
                     ranges = list(epsilon = seq(0,0.2,0.01), cost = 2^(2:9))
  ) 

  print(tuneResult)
  plot(tuneResult)

We trained 168 different models with this small piece of code.

As we zoomed in on the dark region, we can see that there are several darker patches.
From the graph you can see that models with cost between 200 and 300 and \epsilon between 0.08 and 0.09 have less error.

support vector regression performance 2

Fortunately, we don't have to select the best model with our eyes: R allows us to get it very easily and use it to make predictions.

  tunedModel <- tuneResult$best.model
  tunedModelY <- predict(tunedModel, data) 

  error <- data$Y - tunedModelY  

  # this value can be different on your computer
  # because the tune method  randomly shuffles the data
  tunedModelRMSE <- rmse(error)  # 2.219642  

We improved the RMSE of our support vector regression model once again!

If we want, we can visualize both our models. The first SVR model is in red, and the tuned SVR model is in blue on the graph below:
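One way to draw this comparison yourself (a sketch reusing the objects created above: predictedY holds the first SVR predictions, tunedModelY the tuned ones):

  plot(data, pch=16)
  points(data$X, predictedY, col = "red", pch=4)
  points(data$X, tunedModelY, col = "blue", pch=4)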

support vector regression comparison

I hope you enjoyed this introduction to Support Vector Regression with R.
You can download the source code of this tutorial. Each step has its own file.

I am passionate about machine learning and Support Vector Machine. When I am not writing this blog, you can find me on Kaggle participating in some competition.

142 thoughts on “Support Vector Regression with R”

  1. Jose

    Good stuff. How would this behave if for example, I wanted to predict some more X variables that are not in the training set? Is this useful in those instances? - In that case, how?

    Many thanks

    Reply
    1. Alexandre KOWALCZYK Post author

      You just need to use the predict method with two parameters: the trained model and your new data. This will give you the predicted values. This is useful because that is our original goal: we want to predict unseen data.
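      For instance, something like this (newX is a made-up data frame; the column name just has to match the one used for training):

      newX <- data.frame(X = c(21, 22, 23))
      predict(model, newX)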

      Reply
      1. Joshua Dunn

        I have tried predicting unseen data but it always seems to underestimate the effect of it. For example, with temperature as my x-variable, if my SVR has not seen temperatures below zero degrees C (ie minus 2 degrees C) it effectively predicts them as it would zero. Would you be able to tell me what this is called or point me in a direction to solve this? Regards

        Reply
        1. Alexandre KOWALCZYK Post author

          To me it looks like you are overfitting your model on your training data. What you should try is to increase the weight of the regularization parameter (or use regularization if you were not).

          Reply
      2. Md. Moyazzem Hossain

        Dear

        Thank you very much. Actually I want to predict the future value of univariate time series by SVM. I have used the library e1071. I am able to predict the value over the study period but i want to forecast the future value.

        Reply
  2. Liz

    "we use the tune method to train models with ϵ=0,0.1,0.2,...,1 and cost = 22,23,24,...,29 which means it will train 88 models (it can take a long time)"

    Hello. Can you explain how the number 88 is calculated? Thank you.

    Reply
    1. Alexandre KOWALCZYK Post author

      There are 11 values of epsilon and 8 values for the cost. We can associate each epsilon with the 8 cost values to create 8 couples. As there are 11 epsilons, there are 11\times8 = 88 couples.

      Reply
  3. Fakhrul Agustriwan

    Hello Mr. Kowalczyk.
    This tutorial is very helpful. Actually i am trying to forecast the future value of a time-series data by using SVR method, but i am quite confused how to perform it in R. Could you explain the steps on how to do it?

    Thank you 🙂

    Reply
    1. Alexandre KOWALCZYK Post author

      Thanks for your comment. Unfortunately I have never used SVR to forecast time series. However I found this question, and one of the answers points to this article. As suggested in the answer you will need to transform the forecasting problem into a regression one, but this might be a good starting point for you.

      Reply
  4. loic

    As I understand, SVM implemented in R uses the Radial Basis Kernel by default. Therefore, there is another parameter (called gamma). How do you deal with this one? I think you should fit it also.
    One article mentioned taking the median of pairwise distances between the learning points (after the scaling process).

    Reply
  5. loic

    Ok thanks for your reply.
    Using tune.svm I noticed that this function is very very long (around 3 seconds per configuration of parameters for 1000 observations of 7 variables).
    Surprisingly if you use svm(..., cross = 10) you can get the cross validation error for less than 0.5 second on the same data. So, I concluded that tune.svm was very badly coded, do you have any idea about this issue?
    Therefore I coded my own parameters tuning function using svm(...,cross=10).

    Also, I have found several papers that use a BFGS optimization algorithm (on a log2 scale) instead of grid search. I tried this, it turned out to be very efficient.

    Reply
    1. Alexandre KOWALCZYK Post author

      When you are using svm(..., cross = 10) you are performing a 10-fold cross-validation on the training data. This is not the same as doing a grid search. If the tune.svm method is slow, it is not because it is poorly coded, but because it trains one svm model per combination of hyperparameters. So if you want to try gamma=0.1,0.01 and C=1,10,100 for instance, it will train 6 different svm models ([0.1,1] [0.1,10] [0.1,100] [0.01,1] [0.01,10] [0.01,100]). In other words, it will try each couple in the Cartesian product of the gamma set with the C set. If you try it for 10 values of gamma and 10 values of C, it will train 100 models, which should indeed be much slower than training only 10 models.

      Reply
  6. loic

    That's not what I meant. I am aware of that of course. But actually, I made grid search "by hand" with a loop on 10x10 values of gamma and C using svm(...,cross = 10). Therefore I called 100 times svm and then keep the minimum cv error. The overall time it took was something like 10 times less than calling once tune.svm() on a 10x10 grid.
    That was what made me think this function was poorly coded, or that it might use sophisticated techniques I am not aware of.

    I've been trying to find the reason in vain.

    Actually, I am a bit doubtful about the results of svm(..., cross = 10), it seems that it does not compute the sv error on a stochastic way and the results are only one decimal digit precise which is weird comparing to tune.svm()

    Reply
    1. Alexandre KOWALCZYK Post author

      I can't really help you more without seeing your code. Maybe you can ask on stackoverflow or cross validated if you want to dig deeper and understand what happens in your particular case. Feel free to post the link here afterward and I'll take a look.

      Reply
    2. Renan

      Hi loic,
      I am very interested on your code by hand. Because I have a lot of data to train and it takes a very very long time. Could send me this part ?
      Thanks a lot
      Renan

      Reply
  7. HAP

    Thank you for this valuable post. If I have more than one X variable including some dummy variables can I fit the SVR for that case?

    Reply
  8. Ilgaz

    Hello,
    I read your blog posts. I am not very clear about how to forecast future values of a time series using SVR. It looks to me that SVR fits a model using the training set. But how about using predict() to predict future values (n.ahead values) in R? I couldn't find this feature so far.

    Sincerely, Ilgaz

    Reply
  9. Aseel

    First of all, thanks for the very helpful tutorial. I'm using R 3.2.1, but svm doesn't work correctly. On step 3, when I'm running this: model <- svm(Y ~ X , data), the error is :

    Error in predict(ret, xhold, decision.values = TRUE) :
    unused argument (decision.values = TRUE)

    Can you please help me?

    Thanks,

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello. I don't have many ideas about this one. You might want to take a look at this answer and try the provided solution. Otherwise I would advise you to try the code on another machine to see if it works, and if it does, try to replicate that environment on your machine. Best regards.

      Reply
      1. Aseel

        I really appreciate your reply. I think the problem is that the same function name exists in two packages, for example the predict() function in both ".GlobalEnv" and "package:stats".

        I will try to figure out how to solve that.

        Thanks a lot,

        Reply
        1. Aseel

          I've found that I have function with the same name with predict. So, simply, I've copied my function to another name and remove predict function. That was making the confusion.

          Thanks again for helping me.

          Reply
  10. Pingback: Support Vector Regression in R | logicalerrors

  11. Espartaco

    Hi Alexandre. Thank you so much for all the information, I have a few questions.
    1. Can I use any kínd of variables in a SVM ? Continuous, categorical?
    2. If I am using a SVM to classify two groups, is there a way to get a probability of assignment to each group?
    3. How do you validate that the SVM is a good model?

    Reply
    1. Alexandre KOWALCZYK Post author

      1. Yes. For continuous data it is called SVR and SVM for categorical data
      2. Yes. Most frameworks provide a "predict probabilities" method to do so
      3. You use a score to measure the quality of your model; if you want to learn more, I recommend this book.

      Reply
  12. Danny

    Hi Alexandre,

    Thanks for such a comprehensive tutorial. Much appreciated. I am trying to SVR for predicting time series. As mentioned in your post, tune() shuffles the data. Is there any option or way to not to shuffle the data?

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello. You can pass a tunecontrol parameter to control the behavior of the tune method. I think tune.control(sampling = "fix") might suit your need.
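      Something along these lines (a sketch reusing the grid from the article, with a fixed training/validation split instead of shuffled cross-validation):

      tuneResult <- tune(svm, Y ~ X, data = data,
                         ranges = list(epsilon = seq(0, 0.2, 0.01), cost = 2^(2:9)),
                         tunecontrol = tune.control(sampling = "fix"))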

      Reply
      1. tahir

        thank you very much, but when I use lssvm I get this message

        (Using automatic sigma estimation (sigest) for RBF or laplace kernel
        Error in if (n != dim(y)[1]) stop("Labels y and data x dont match") :
        argument is of length zero)

        Reply
      2. Ely

        Hi Alexandre,

        Thank you for script on SVM. I have a few questions:

        1. What are some advantages of the LSSVM over SVM and ANN (artificial neural network)? Does the LSSVM have a tendency to overfit?

        2. Do you have an R script to perform an LSSVM? How to tune LSSVM and cross-validate it.

        Reply
  13. Ankit

    Hello,

    I am trying to use SVM for classification setting. All the variables/Attributes in the dataset are qualitative . Will SVM work in this case? or do I need to convert qualitative variables in quantitative before using it?

    Thanks
    Ankit

    Reply
  14. Subhasri

    Hi... Thank you for the superb article.. I've been reading about SVMs since a few days now. I have a doubt.... We normalize data and then give that data as input to SVM; can SVM be used for actual normalization? and how do we determine a kernel function?

    Reply
    1. Alexandre KOWALCZYK Post author

      The goal of SVM is not normalization, it is classification (or regression in the case of SVR). If your question was "How to select a kernel", this link might help you.

      Reply
  15. Minerva

    Hello Sir. Is it possible to calculate the AIC of the SVM regression model (the way you would for linear regression) ? If yes, how?

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello. I never used any package using Cuda or OpenCl with R. For Cuda I am using python. Maybe you can find some informations here.

      Reply
  16. Scott

    Hi Alexandre,
    First of all -- excellent tutorial! Thanks for putting the time into this.
    I'm wondering if you can help answer a question. I have a 125 independent observations of spectral 200 wavelength NIR data for a random set of samples, my X matrix. For each I have an independent scalar value of the concentration of a certain analyte, essentially my Y vector.

    My goal is to find the minimal set of important wavelengths that correlate best to Y,
    I've used PLS techniques combined with wavelength selection methods to find a subset of wavelengths. Even with the full set of 200 wavelengths and 20 latent vectors I get a nominal fit to my Y concentration data. But of course, when you fit a PLS model, you hope to find a few PLS factors that explain most of the variation in both predictors and responses. Now, the regression coefficient profile (loadings) gives a direct indication of which predictors are most useful for predicting the dependent variable.

    OK, on to my question. Once one finds a reasonable SVR regression fit, how does one extract which wavelengths/variables were weighted more highly than others? I like the use of SVR over PLS since, with the right kernel choice, it can incorporate potential nonlinearities in my data. But all the SVR packages I've looked at seem to lack the fit and weighting assessment plots that are found in PLS.

    Can you offer any thoughts/suggestions on this?
    Thanks!!
    -Scott

    Reply
    1. Alexandre KOWALCZYK Post author

      Hi Scott.

      Thanks for your comment, I am glad people find my articles helpful.

      Unfortunately there is no built-in way to retrieve the relative importance of variables for SVR. You can try removing one predictor at a time and see how it impacts the performance of the SVR (but you will have to do it for each predictor, which can take a long time).
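      A rough sketch of that drop-one idea (my own illustration, assuming a data frame with a Y column and several predictor columns, and reusing the rmse helper from the article):

      baseModel  <- svm(Y ~ ., data)
      baseRMSE   <- rmse(data$Y - predict(baseModel, data))
      predictors <- setdiff(names(data), "Y")
      importance <- sapply(predictors, function(p) {
        reduced <- svm(reformulate(setdiff(predictors, p), response = "Y"), data)
        rmse(data$Y - predict(reduced, data)) - baseRMSE   # error increase without p
      })
      sort(importance, decreasing = TRUE)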

      I found this paper which might help you, it even has a video.

      I hope it helps you.

      Reply
  17. Sharda Tripathi

    Hi Alexandre,

    Your tutorial is very informative and easy to understand. Keep up the good work. I however have a more conceptual question from SVR, not related to SVR implementation in R. I have performed support vector regression on a time series. The residuals (i.e actual value-predicted value) shows strong auto correlation.The auto correlation plot of residuals has a damped sinusoidal nature. I have read in literature that fitted model is not good if such is the case. I have tried transformations like first difference  and log of time series,still the result is same.

    Does this auto correlation imply that my model is not good.If so, what can I do to get rid of it.

    Any suggestion would be of great help 🙂

    Thanks
    Sharda

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello Sharda. Thanks for your comment. Indeed this autocorrelation implies that your model is not perfect. You should think about your problem and see if you can add another independent variable. Adding it might remove the autocorrelation. If it does not work, you can try other techniques like the Cochrane-Orcutt method or the AR(1) method as described in this chapter. Regards.

      Reply
  18. Amir

    Hi, Thank you for sharing. I am trying to perform a multi classification in credit rating allocation for a bank, my data set involves a set of financial ratios, I have tried several techniques to perform the classification , however, the models output just like flipping a coin !!!

    I wonder what i should do . any hint?. I can send you my data set if required and highly need help since its my master thesis as well.

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello Amir. The first question I would ask myself is "Is what I am trying possible?" If so, "Has somebody else already done that? With which results? What is the state of the art?" That's why I often start by looking for papers, and I advise you to do so. If your models perform poorly, maybe your data is not clean. Did you preprocess it correctly? Did you use standardization or normalization? When in doubt, one approach is to try your model on another, simpler dataset on which such a model usually performs well. If it does not work well there either, it might be a programming or data preparation mistake. I hope this helps you.

      Reply
  19. sajal

    Thanks Alexandre for nice and useful post. Is it possible to obtain the model equations directly using SVR (preferably the best fitted one) to apply in another platform for calculation, for example in MS Excel based on the fitted models?

    If happened so, could you please explain a bit. Thanks in advance.

    Reply
  20. Rachel

    Hi Alexandre. How to apply SVM for univariate time series data to classify into 2 ccategories (either normal or outlier) ?

    Reply
    1. Alexandre KOWALCZYK Post author

      Hi Rachel. Sorry but this is a pretty broad question which would need a specific article to answer. You can go on this site to post such questions, but don't forget to do your own research before. Best regards.

      Reply
  21. lichenyu

    I met the problem same as loic refers.
    In fact, that is caused by the default setting of the function tune.svm(), which will perform a 10 cross-validation.
    And for svm(), there is no cross-validation by default.
    So, when using tune(), it may take around 10 times as expected if not considering this issue.

    Reply
  22. Adhi

    Thank you for the tutorial.
    I am curious, how do the manual calculation in SVR until got the function and prediction value?

    Reply
  23. Shahriar SHAMILUULU

    Hi Alexandre,

    Thank you for tutorial and I have a question below.

    How we can get an equation for the model generated by svr, i.e., intercept, coefficient for x and R2, because when I try to see a summary there is nothing like that.

    Thank you.

    Reply
    1. Alexandre KOWALCZYK Post author

      Well you can't use SVM to know if data is linear or non linear. However if you achieve a very good score with a SVM and a linear kernel it is most likely that the data is linearly separable.

      Reply
  24. dmitrio

    Thanks for your tutorial.

    i'm forecasting time series data with 4 predictor (t-4,t-3,t-2,t-1) to predict 't' data.
    There is a rule of minimum training sample to build SVR model ?

    Thanks

    Reply
  25. bahtiyar

    Hi alexander,

    I try to implement SVR in my prediction time series. I use univariate data for prediction..
    format data that i use is

    x-1, x-2, x-3, x-n -> [x+1]

    x+1 : target value
    x-1,x-2,x-3,x-n : atribute value

    I use libsvm (e1071) in R to help calculate the prediction and i got high error value..
    Must I scale the data to [0,1] or [-1,+1] as a classification problem. . If there must be scale, I didn't find parameter to set it in R. In manual lib 'e1071', I just found parameter that the data scale or not.

    The parameter like this
    svm(....parameter...... , scale = FALSE)

    Any sugestion for this...?
    Thanks..

    -bahtiyar

    Reply
  26. William

    Thank you very much, Mr. KOWALCZYK! Thanks to your lm(y,x,data) function I was able to successfully plot a regression line! I had tried lm(y,x) before but I kept getting the error "Error in if (noInt) { : argument is of length zero". Thanks again!

    Reply
  27. Pietro

    Hi! Great tutorial! Please can you help me on these two things:

    1) I have to use SVR in order to predict future values of energy consumption. My input is the day of the week and the output is the correspondent energy consumption value. How can I encode this input information?

    2) I have one month energy data: How do I divide the whole set into Training and Set? How do I use the test set in order to validate the model?

    Thanks a lot!

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello Pietro.
      1) For the day of the week you could use a number for each day (0 to 6), but this is not ideal because it imposes an order between the numbers, so instead you should one-hot-encode it. For this you can use the OneHotEncoder provided by sklearn (see the R sketch at the end of this reply).
      2) You can watch this video which explains everything 🙂
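      Since this tutorial uses R, here is a hypothetical equivalent of that one-hot encoding with base R's model.matrix (the data frame and column names are made up):

      days <- data.frame(day = factor(c("Mon", "Tue", "Sun")), consumption = c(120, 115, 98))
      dummies <- model.matrix(~ day - 1, days)   # one 0/1 column per observed day
      cbind(as.data.frame(dummies), consumption = days$consumption)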

      Reply
  28. Alex

    hello,

    this is a very useful tutorial. Thanks. 😉

    I am wondering how you can extract out the coefficients of the SVM regression, just like the coefficients in the linear regression.

    Thanks in advance.

    Reply
    1. Alexandre KOWALCZYK Post author

      You can use the coefs property of the svm object which is returned after the training.
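      For example (a sketch with a linear kernel; note that svm scales the data by default, so these quantities live in the scaled space unless you pass scale = FALSE):

      model <- svm(Y ~ X , data, kernel = "linear")
      model$coefs                        # coefficients of the support vectors
      w <- t(model$coefs) %*% model$SV   # weight vector (linear kernel only)
      b <- -model$rho                    # intercept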

      Reply
  29. LT

    Hi,

    This is a great tutorial.

    Just a question. After the SVM is trained, can we do a hand calculation, like we can do in a simple model, to predict a value for a new variable set.

    Thanks.

    Reply
    1. Alexandre KOWALCZYK Post author

      You can use the trained model to make a new prediction. I don't get what you mean by "hand calculation". If you want to do it by hand on paper it would be tedious.

      Reply
  30. Vishesh Sahni

    Hello. It's a great tutorial so thanks for putting it here. I've my modelled my data and obtained a graph. I want to predict the next value. How do I do it? Thanks a lot.

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello. Thank you for your comment. Unfortunately your question is way too broad. Maybe you can find a dedicated forum or a teacher to help you with this matter. Regards.

      Reply
  31. Chimezie

    Hi Alexandre,

    Can you please explain the Dispersion term in the SVM tuning process ! What does Dispersion stands for???

    Regards

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello. I don't see what you mean. I don't see any dispersion term in the e1071. Could you clarify?

      Reply
  32. RC

    Thanks for this, very helpful. I know with SVM only cannot usually figure out what the features are that led to the good prediction model but I was wondering if there is a way to extract the features which are crucial in generating the predictedY with SVR? Essentially, I want to use SVR for feature selection. I am getting pretty descent error (0.31) for my model and I'd like to know which features have the highest weights enabling this? Any help would be greatly appreciated. Thank you.

    Reply
  33. Pingback: Get ready for R/Finance 2016 – Mubashir Qasim

  34. LJ

    Hi Alexander,

    Thanks for this, very helpful. I'm trying to test different parameterization SVM in prediction problems using epsilon-svr and nu-svr. In what range do I should test the parameters ε, ν (nu) and C?

    Reply
    1. Alexandre KOWALCZYK Post author

      You should test them using a grid search. The particular values of the parameters differ greatly between problems, so you just have to do a grid search first and then narrow the range until you find values which give you satisfactory results.

      Reply
  35. ram

    Hi Alex,
    Really great article!

    I have few questions here, how that epsilon and cost is related here to the model.
    And where have you used the kernel part in the above calculation?

    and how does the kernel impact in processing of the model?

    Reply
    1. Alexandre KOWALCZYK Post author

      We did not specify the kernel parameter when we created the svm so the kernel is "radial" by default. (See documentation). You will understand how epsilon and C affect the model by reading this article. Best regards

      Reply
      1. Ruaa

        thanks alot for your great tutorial, it has helped me alot, but i am wondering i have a data set that's to predict electric load forecastig, my question should i normalize the data set first or automatically normalized by svm, my second question about the first point to tune the paremeters, how can i choose it

        Reply
  36. Emre

    Hi Alex,

    It is a solid tutorial. Thank you very much.

    I have a question and need to find the answer asap.

    I need perform v-svm which has additional parameter "v" . Can you help me modify the svm code to obtain v-svm code.

    And, I am curious about how I can see the whole code of SVM in R. Is there any way to step in the function SVM?

    Thank you very much.

    Reply
  37. SA

    Great Tutorial!!!!!!
    how to find 95% confidence interval for non linear regression? I don't you can use lm right??

    Reply
  38. Jean-Pierre GERBIER

    Very interesting tutorial, thanks a lot
    If I am not too late .... I don't understand why whith exactly the same data set and same code snippets, I get a different result at step 3 : i get predictedY
    1 2 3 4 5 6 7
    7.667638 6.323641 5.578090 5.453718 6.066055 7.625997 10.367069
    8 9 10 11 12 13 14
    14.423447 19.718063 25.922760 32.519648 38.941073 44.724122 49.608321
    15 16 17 18 19 20
    53.536553 56.570848 58.777275 60.144779 60.578090 59.961279
    Thanks if you have time to help me
    Jean-Pierre

    Reply
  39. Jean-Pierre GERBIER

    Sorry and sorry Alexandre ... I made a mistake ... absolutly sorry and many thanks again for your great tutorial
    Jean-Pierre

    Reply
  40. Weiwei Liu

    Hi,Alex:
    thanks for you tutor,i has one question which has been a long time.I don't known if you are familar with caret packages,i want to know the difference between function train() in caret package and tune() in e1071 package.they are all the training function about the SVM,but why is I use the same data i get the difference result,such as if i use the tune()"
    obj<- tune(svm,y~x, data = df,
    ranges = list(gamma = 2^(-2:2), cost = 2^(2:9),epsilon = seq(0,1,0.1)),
    tunecontrol = tune.control(sampling = "cross",cross=10))"
    i get the best parameter about cost gamma and epsilon,but is I use the train()
    "ctrl<-trainControl(method="repeatedcv",number=10,repeats=5,
    search="grid")
    df.svm<-train(y~x,data=df,method="svmRadial",trControl=ctrl)"
    i get the C with the crossponding MSE and Rsquared.besides that i also get the gamma in the line of summary(df.svm) and the result is different with the result with tune(),(such as C)
    so I want to ask which should i choose to use.
    i hope i will be unserstood,in not please let me known. thanks

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello Weiwei, my guess is that the internal routine of tune uses some kind of randomness to perform the cross-validation. Also, if your data set is small, the examples picked for each fold can make the result change considerably. If you wish to have a more detailed answer, posting a question on Stack Overflow might help.

      Reply
  41. Harshith

    Hello, iam trying to predict an unknown future variable using this method and i get an error

    Error in model.frame.default (formula = $ wb1 new_col ~ y + x1 + x2 + x3 +:
    invalid type (NULL) for the variable 'wb1 $ new_col
    new col is the new column of values which i Need to predict and wb1 is the dataframe. I'am trying to build svm Regression model for that formula. Can you please help me?

    The code Looks like this
    svmModel<-svm(formula = wb1$new_col~ y+x1+x2+x3+x4+x5+x6+x7+x8+x9, data = training, kernel = "radial", cost = 32, gamma = 0.1,scale = FALSE)

    Reply
    1. Alexandre KOWALCZYK Post author

      It looks to me that some value in new_col is NULL. Try replacing all NULL values with a number before running the code. If it works, check your data and your loading procedure to find where the null value comes from.

      Reply
      1. Harshith

        Thanks a lot for the reply.
        When i predict on the test set, the predicted values are that of Training data. how can i solve this? The code Looks like this.

        svmModel<-svm(formula = y ~x1+x2+x3+x4+x5+x6+x8, data = training, kernel = "radial", cost = 32,epsilon=0,C=0.1, gamma = 0.1,scale = FALSE)

        pred <- predict(svmModel, newdata = testing[,-42]).

        I get Training value answers for These. Can you please help me?

        Reply
    1. zied

      Hello again, Just to clarify my previous message.
      I followed already you link
      https://stat.ethz.ch/pipermail/r-help/2009-August/399845.html
      and I tried
      model$coefs
      But I got 21 coefficients. How can I use them to build the equation. I expected 13 (12 for each variable and the intercept).

      I have another question, I have the multivariable model, is it to possible to apply a non linear kernel?
      You showedin the tutorial how to get RMSE, is it possible to get R2?
      Thanks,
      Zied

      Reply
  42. prasun

    Hi, I am using SVM for classification problem. Do i need to create dummy variables for categorical variables before passing to SVM or it will handle on it's own.

    Reply
    1. Alexandre KOWALCZYK Post author

      Yes it is recommended that you create dummy variables to encode categorical variables.

      Reply
  43. Maomao

    Hi Alex,
    Many thanks for your sharing. I got one question,
    there are existing some missing information in my multiple X variables, How could I impute or deal with these missing values?? Thanks.

    Reply
    1. Alexandre KOWALCZYK Post author

      It is common to replace a missing value with the mean, but you can also replace it with the most frequent value or the median. There is some information about how to do it in Python on this page.
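      In R, a minimal sketch of mean imputation could look like this (df and x are made-up names):

      df$x[is.na(df$x)] <- mean(df$x, na.rm = TRUE)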

      Reply
  44. SAM

    Hi Alexandre,

    Thanks for the good example showing us how to use SVR with GS

    Could you further show us how to use the particle swarm optimization to optimize the parameters?

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello. Thank you for your comment. I never used particle swarm optimization, so I do not plan to write an article on the subject for the moment 😉

      Reply
  45. Alvin

    Hi Alexandre,
    Thank you very much for your post. I would like to ask, how to perform multiple linear regression using support vector regression? Do you have any post on this or any other website that you know shows how this can be done using R? Thanks

    Reply
    1. Alexandre KOWALCZYK Post author

      If you use a support vector machine you will be performing support vector regression, not multiple linear regression. You can give a vector as input to perform multivariate support vector regression if you wish.

      Reply
  46. Sadiq Ahmad

    Currently we are working on a research paper in which we have conducted psychological experiment to get data-set. After that we have applied Multiple regression to find the relation among dependent variable and independent variables. our model was significant because Sig value was less the .05 and we found a good relation among dependent and independent variables.

    Now my idea is, to develop new algorithm which will have different mathematical equations and all these equations will based on that regression analyses. For example if regression analysis shows that humidity have strong relation with rain. then we will say that "Humidity is directly proportional to rain".

    So my question is, did we have formal mathematical techniques or any software tool which can provide different equations according with regression analysis.

    Or

    We will manually draw equations from that regression analyses.

    Reply
  47. lj

    Hi Alexander,

    I have some doubts. The kernel functions, with the exception of linear, also have a cost parameter (C)? How can I perform grid search setting the cost parameter of the function and the kernel? You can separate them?

    Reply
  48. noviyanti sagala

    Hi Alexander,

    How do we specify the training and testing dataset? I can't see you used different dataset.

    Reply
  49. Mostafa

    Hi Alexandre. Many thanks for your valuable tutorial. You mentioned that SVR also works when X is multidimensional. Could you please let me know how I can load the multi dimensional X so that it runs with the following code:
    model <- svm(Y ~ X , data)
    predictedY <- predict(model, data)

    Many thanks

    Kind regards

    Mostafa

    Reply
  50. harsha

    Hi Alexandre,

    can i extend this regression tool for spatial modelling. presently i am using random forest for spatial modelling. I tried using with cubist but with not much success. As per my knowledge random Forest can easily handle both continuous and categorical variables at the same time, is it possible with SVM as well??

    Reply
  51. Gaurav

    How to visualize both our models ?. You said that The first SVR model is in red, and the tuned SVR model is in blue on the graph. How to plot it ?

    Reply
  52. Jack

    Hello Alex,

    Have you ever tried to use Amibroker for buidling and testing a SVM ?
    Anyone can do some research in Excel, however Amibroker is pretty fast while working on data arrays and its formula language is very much C - like. Visualising effectiveness of set parameters in 3D is also possible.

    Thanks for a brilliant tutorial !

    Reply
    1. Alexandre KOWALCZYK Post author

      Well, that is very unfortunate. Keep in mind that SVR is not the solution to every regression problem. Moreover, you should try to use machine learning to predict things for which you believe there is an underlying (unknown) relation. Maybe the relation between currency pairs is too random and cannot be predicted, or there is no relation, or it keeps changing.

      Reply
  53. Kaustubh

    I am using this method for forecasting. I have assumed a linear model with 6 variables. Thus it has 6 parameters. This method is forecasting the final output. I want to know the values of the 6 parameters. That means I want to find the model.

    Reply
  54. Stelios

    Hello Alex,

    I was wondering if you could develop (using your toolboox) a Support Vector Regression model based on a Gaussian- RBF functions in which you need to choose C,γ and ε.

    Kind Regards,

    Stelios

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello Stelios. I do not see why it would not be possible. You need to look for the documentation of the R package to do so.

      Reply
  55. Samantha

    Hello Alexandre,
    Thanks for this good tuto.
    I would like to know how can I reproduce the predictions with the output given by R?
    I have to do it with Excel (VBA) using the model parameters fitted by R.

    Thanks.

    Reply
  56. Samantha

    Ok thanks.
    But maybe you know how R calculate the predictions with the parameters of the model?
    I tried with a linear kernel but I couldn't find the predictions given by predict.svm....

    Reply
  57. Anastasiya

    Hi, Alexandre!

    I've performed SVM and tuned the parameters (gamma and cost) by doing grid search with 5 cross validation. But I also came across in an article that there is another option of finding this optimal combination by implementing some performance metrics. So what I would like to do is to find an optimal pair of gamma and cost which results in the highest cross-validation area under the receiver operating curve (AUC). Do you have any idear how it can be implemented?

    Reply
    1. Alexandre KOWALCZYK Post author

      Hello Anastasiya,

      The most common approach for tuning SVM is indeed grid search like you did. If you wish to find the best value, you can try doing this with a smaller grid search around the value which seems the best. There are also other more complicated techniques, so if you really wish to find the optimal value it may be good to take a look at them. In the section 3.2 of their guide, the libsvm authors say that the other methods are not really "better" as they depend on some heuristics or approximations. It may be worth the shot to try looking for paper on the subject and try some other methods. If you do, I would be interested in knowing your results.

      Regards,

      Reply
  58. Chinmaya

    Hi Alexandre,

    Thanks for such a nice write-up.

    I've seen examples where different powers of 10 are used for Cost; here you have used powers of 2. My question is whether it's significant to use powers of 2 or 10; or we can literally supply any list of values ?

    Is there any thumb rule regarding the range of Cost ?

    Reply
    1. Alexandre KOWALCZYK Post author

      It does not really matter whether you use powers of 2 or powers of 10. The rule of thumb is that when you perform a "grid" search you make the grid smaller and smaller. For instance, you can try values between 10^0 and 10^5, and then you see that the best one is 10^3, so now you can perform a smaller grid search between 500 and 1500 with increments of 100. If the best one is 800, you can try another search between 650 and 950 with increments of 50. In the end, doing a search that is too precise is often not worth the time, which is why you can be completely fine with the first value of 10^3. But if you really want to find the best C (and have the time), then refining your grid like that is the way to go. The same logic applies if you have more than one parameter to find: you need to find a set of parameters among all the possible combinations. Note that grid search is an empirical method and there are other ways to find the best parameters.
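      A sketch of that narrowing process with e1071 (the cost values are illustrative only):

      coarse <- tune(svm, Y ~ X, data = data, ranges = list(cost = 10^(0:5)))
      coarse$best.parameters   # suppose this reports cost = 1000
      finer  <- tune(svm, Y ~ X, data = data, ranges = list(cost = seq(500, 1500, by = 100)))
      finer$best.parameters    # refine again only if the extra precision is worth the time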

      Reply
