In this article I will show how to use R to perform a Support Vector Regression.

We will first do a simple linear regression, then move to the Support Vector Regression so that you can see how the two behave with the same data.

## A simple data set

To begin with we will use this simple data set:

I just put some data in Excel. I prefer that over using an existing well-known dataset because this article is not about the data but about the models we will use.

As you can see, there seems to be some kind of relation between our two variables X and Y, and it looks like we could fit a line that would pass near each point.

Let's do that in R!

## Step 1: Simple linear regression in R

Here is the same data in CSV format; I saved it in a file called regression.csv:
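If you do not have the original file, the following minimal sketch writes a hypothetical stand-in for regression.csv so that the rest of the code can run. The 20 rows are synthetic (a noisy, slightly curved trend generated with a fixed seed), not the article's data, so every number you get from them will differ from the values quoted in this article. Because `tempdir()` returns a folder without a trailing slash, the path is built with `file.path()` here rather than `paste()`.

```r
# Hypothetical stand-in for regression.csv: 20 synthetic points, NOT the article's data
set.seed(42)
X <- 1:20
Y <- round(0.15 * X^2 + 2 * sin(X) + rnorm(20, sd = 2), 2)

# tempdir() has no trailing slash, so build the path with file.path()
dataDirectory <- tempdir()
csvPath <- file.path(dataDirectory, "regression.csv")
write.csv(data.frame(X = X, Y = Y), csvPath, row.names = FALSE)

# Read it back the same way the tutorial does
data <- read.csv(csvPath, header = TRUE)
str(data)
```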

We can now use R to display the data and fit a line:

```r
# Load the data from the csv file
dataDirectory <- "D:/" # put your own folder here
data <- read.csv(paste(dataDirectory, 'regression.csv', sep=""), header = TRUE)

# Plot the data
plot(data, pch=16)

# Create a linear regression model
model <- lm(Y ~ X, data)

# Add the fitted line
abline(model)
```

The code above displays the following graph:

## Step 2: How good is our regression?

In order to compare the linear regression with the support vector regression, we first need a way to measure how good it is.

To do that, we will modify our code slightly to visualize each prediction made by our model:

```r
dataDirectory <- "D:/"
data <- read.csv(paste(dataDirectory, 'regression.csv', sep=""), header = TRUE)

plot(data, pch=16)
model <- lm(Y ~ X , data)

# make a prediction for each X
predictedY <- predict(model, data)

# display the predictions
points(data$X, predictedY, col = "blue", pch=4)
```

This produces the following graph:

For each data point, the model makes a prediction, displayed as a blue cross on the graph. The only difference from the previous graph is that the predictions are shown as separate points instead of being connected by a line.

In order to measure how good our model is, we will compute how much error it makes.

We can compare each value $Y_i$ with the associated predicted value $\hat{Y}_i$ and see how far apart they are with a simple difference.

Note that the expression $Y_i - \hat{Y}_i$ is the error: if we make a perfect prediction, $\hat{Y}_i$ will be equal to $Y_i$ and the error will be zero.

If we do this for each data point, square each error and sum them, we get the sum of squared errors; if we take the mean instead, we get the Mean Squared Error (MSE): $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$.

A common way to measure error in machine learning is to use the Root Mean Squared Error (RMSE), so we will use it instead.

To compute the RMSE, we take the square root of the MSE: $\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}$.
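A quick worked example with made-up numbers shows why the errors are squared before averaging. Suppose three predictions miss by 1, −2 and 3. Simply summing the raw errors gives 1 − 2 + 3 = 2, so positive and negative misses partly cancel, whereas squaring keeps every miss positive:

$$\text{MSE} = \frac{1^2 + (-2)^2 + 3^2}{3} = \frac{14}{3} \approx 4.667, \qquad \text{RMSE} = \sqrt{\frac{14}{3}} \approx 2.160$$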

Using R, we can come up with the following code to compute the RMSE:

```r
rmse <- function(error)
{
  sqrt(mean(error^2))
}

error <- model$residuals  # same as data$Y - predictedY
predictionRMSE <- rmse(error)   # 5.703778
```
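To convince yourself that the comment in that code holds for `lm`, here is a small check on hypothetical synthetic data (not the article's file): the residuals stored in the model equal the observed values minus the predictions, so both give the same RMSE.

```r
# Hypothetical synthetic data standing in for regression.csv
set.seed(42)
data <- data.frame(X = 1:20)
data$Y <- 0.15 * data$X^2 + rnorm(20, sd = 2)

rmse <- function(error) {
  sqrt(mean(error^2))
}

model <- lm(Y ~ X, data)
predictedY <- predict(model, data)

# For lm, residuals are observed minus fitted values
residualError <- unname(model$residuals)
manualError <- unname(data$Y - predictedY)

print(all.equal(residualError, manualError))  # TRUE
print(rmse(residualError))
```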

We now know that the RMSE of our linear regression model is 5.70. Let's try to improve it with SVR!

## Step 3: Support Vector Regression

In order to create an SVR model with R, you will need the package **e1071**. So be sure to install it and to add the **library(e1071)** line at the start of your file.
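A minimal way to do both steps, assuming you have internet access for the one-time installation from CRAN:

```r
# Install e1071 from CRAN only if it is missing, then load it
if (!requireNamespace("e1071", quietly = TRUE)) {
  install.packages("e1071")
}
library(e1071)
```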

Below is the code to make predictions with Support Vector Regression:

```r
library(e1071)

model <- svm(Y ~ X , data)

predictedY <- predict(model, data)

points(data$X, predictedY, col = "red", pch=4)
```

As you can see, it looks a lot like the linear regression code. Note that we called the **svm** function (not **svr**!). This is because this function can also be used to make classifications with Support Vector Machines. The function will automatically perform classification if it detects that the response is categorical (if the variable is a factor in R).
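This automatic choice can be observed directly. The sketch uses hypothetical synthetic data for the numeric case and R's built-in `iris` data (whose `Species` column is a factor) for the classification case. `type = "eps-regression"` is the e1071 name of the default regression type, so passing it explicitly should give the same fit; printing a model with `print()` also shows its `SVM-Type`.

```r
library(e1071)

# Hypothetical synthetic data with a numeric response
set.seed(42)
data <- data.frame(X = 1:20)
data$Y <- 0.15 * data$X^2 + rnorm(20, sd = 2)

# Numeric response: regression (eps-regression), predictions are numbers
svrModel <- svm(Y ~ X, data)
regPred <- predict(svrModel, data)

# Factor response (iris$Species): classification, predictions are class labels
svcModel <- svm(Species ~ ., data = iris)
classPred <- predict(svcModel, iris)

# Requesting the regression type explicitly gives the same model
svrExplicit <- svm(Y ~ X, data, type = "eps-regression")

print(class(regPred))    # "numeric"
print(class(classPred))  # "factor"
```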

The SVR code from this step draws the following graph:

This time the predictions are closer to the real values! Let's compute the RMSE of our support vector regression model.

```r
# /!\ this time model$residuals is not the same as data$Y - predictedY
# so we compute the error like this
error <- data$Y - predictedY
svrPredictionRMSE <- rmse(error)  # 3.157061
```

As expected, the RMSE is better: it is now 3.15 compared to 5.70 before.

But can we do better?

## Step 4: Tuning your support vector regression model

In order to improve the performance of the support vector regression we will need to select the best parameters for the model.

In our previous example, we performed an epsilon-regression; we did not set any value for **epsilon** ($\epsilon$), so it took the default value of 0.1. There is also a **cost** parameter, which we can change to avoid overfitting.
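You can confirm which values were used, since the object returned by `svm()` stores its hyperparameters. A small sketch on hypothetical synthetic data; the override values 0.2 and 16 are arbitrary examples:

```r
library(e1071)

# Hypothetical synthetic data standing in for regression.csv
set.seed(42)
data <- data.frame(X = 1:20)
data$Y <- 0.15 * data$X^2 + rnorm(20, sd = 2)

# No epsilon or cost given: the defaults are used
model <- svm(Y ~ X, data)
print(model$epsilon)  # 0.1
print(model$cost)     # 1

# Overriding them with (arbitrary) explicit values
model2 <- svm(Y ~ X, data, epsilon = 0.2, cost = 16)
```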

The process of choosing these parameters is called hyperparameter optimization, or **model selection**.

The standard way of doing it is a **grid search**. It means we will train many models for the different pairs of $\epsilon$ and cost, and choose the best one.

```r
# perform a grid search
tuneResult <- tune(svm, Y ~ X,  data = data,
              ranges = list(epsilon = seq(0,1,0.1), cost = 2^(2:9))
)
print(tuneResult)

# Draw the tuning graph
plot(tuneResult)
```

There are two important points in the code above:

- we use the tune method to train models with $\epsilon = 0, 0.1, 0.2, \ldots, 1$ and cost $= 2^2, 2^3, 2^4, \ldots, 2^9$, which means it will train 88 models (it can take a long time)
- tuneResult reports the MSE; don't forget to convert it to RMSE before comparing the value to our previous model.
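Both points can be checked in a few lines. The sketch below, on hypothetical synthetic data standing in for regression.csv, counts the grid with `expand.grid()` and converts the best mean squared error reported by `tune()` into an RMSE. Keep in mind that `tune()` reports a cross-validated error (10-fold by default), whereas the RMSE values computed earlier in this article were measured on the training data, so the two are not strictly comparable.

```r
library(e1071)

# Hypothetical synthetic data standing in for regression.csv
set.seed(42)
data <- data.frame(X = 1:20)
data$Y <- 0.15 * data$X^2 + rnorm(20, sd = 2)

epsilonValues <- seq(0, 1, 0.1)  # 11 values
costValues <- 2^(2:9)            # 8 values: 4, 8, ..., 512

# One model per (epsilon, cost) pair
gridSize <- nrow(expand.grid(epsilon = epsilonValues, cost = costValues))
print(gridSize)  # 88

tuneResult <- tune(svm, Y ~ X, data = data,
                   ranges = list(epsilon = epsilonValues, cost = costValues))

# best.performance is a mean squared error; take the square root for RMSE
bestMSE <- tuneResult$best.performance
bestRMSE <- sqrt(bestMSE)
print(tuneResult$best.parameters)
print(bestRMSE)
```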

The last line plots the result of the grid search:

On this graph we can see that **the darker the region is, the better our model is** (because the error is closer to zero in darker regions).

This means we can try another grid search in a narrower range: we will try $\epsilon$ values between 0 and 0.2. It does not look like the cost value is having an effect for the moment, so we will keep it as it is to see if it changes.

```r
tuneResult <- tune(svm, Y ~ X,  data = data,
                   ranges = list(epsilon = seq(0,0.2,0.01), cost = 2^(2:9))
)
print(tuneResult)
plot(tuneResult)
```

We trained 168 different models with this small piece of code.

As we zoomed in on the dark region, we can see that there are several darker patches.

From the graph, you can see that models with C between 200 and 300 and $\epsilon$ between 0.08 and 0.09 have less error.

Fortunately, we don't have to select the best model by eye: R allows us to get it very easily and use it to make predictions.

```r
tunedModel <- tuneResult$best.model
tunedModelY <- predict(tunedModel, data)

error <- data$Y - tunedModelY

# this value can be different on your computer
# because the tune method randomly shuffles the data
tunedModelRMSE <- rmse(error)  # 2.219642
```
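Because of that shuffling, a simple way to make the tuning reproducible is to set R's random seed right before calling `tune()`. A sketch on hypothetical synthetic data; `runTune` is a hypothetical helper and the seed value 123 is arbitrary:

```r
library(e1071)

# Hypothetical synthetic data standing in for regression.csv
set.seed(42)
data <- data.frame(X = 1:20)
data$Y <- 0.15 * data$X^2 + rnorm(20, sd = 2)

# Run the grid search with a fixed seed, since tune() draws random folds
runTune <- function(seed) {
  set.seed(seed)
  tune(svm, Y ~ X, data = data,
       ranges = list(epsilon = seq(0, 0.2, 0.01), cost = 2^(2:9)))
}

first <- runTune(123)
second <- runTune(123)

# Same seed, same folds: the selected parameters match
print(first$best.parameters)
print(identical(first$best.parameters, second$best.parameters))
```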

We improved the RMSE of our support vector regression model again!

If we want, we can visualize both our models. The first SVR model is in red, and the tuned SVR model is in blue on the graph below:
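One way to draw this comparison, sketched on hypothetical synthetic data, with the colors following the description above (the legend is an optional addition):

```r
library(e1071)

# Hypothetical synthetic data standing in for regression.csv
set.seed(42)
data <- data.frame(X = 1:20)
data$Y <- 0.15 * data$X^2 + rnorm(20, sd = 2)

# First SVR model with default parameters
model <- svm(Y ~ X, data)
predictedY <- predict(model, data)

# Tuned SVR model from the narrower grid search
set.seed(123)
tuneResult <- tune(svm, Y ~ X, data = data,
                   ranges = list(epsilon = seq(0, 0.2, 0.01), cost = 2^(2:9)))
tunedModelY <- predict(tuneResult$best.model, data)

# Data as black dots, first model in red, tuned model in blue
plot(data, pch = 16)
points(data$X, predictedY, col = "red", pch = 4)
points(data$X, tunedModelY, col = "blue", pch = 4)
legend("topleft", legend = c("SVR", "tuned SVR"),
       col = c("red", "blue"), pch = 4)
```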

I hope you enjoyed this introduction on Support Vector Regression with R.

You can get the source code of this tutorial. Each step has its own file.

If you want to learn more about Support Vector Machines, you can now read this article:

**An overview of Support Vector Machines**

Jose: Good stuff. How would this behave if, for example, I wanted to predict some more X values that are not in the training set? Is this useful in those instances? If so, how?

Many thanks

Alexandre KOWALCZYK (post author): You just need to use the predict method with two parameters: the trained model and your new data. This will give you the predicted values. This is useful because that is our original goal: we want to predict unseen data.

Joshua Dunn: I have tried predicting unseen data, but it always seems to underestimate the effect. For example, with temperature as my x-variable, if my SVR has not seen temperatures below zero degrees C (i.e. minus 2 degrees C), it effectively predicts them as it would zero. Would you be able to tell me what this is called or point me in a direction to solve this? Regards

Alexandre KOWALCZYK (post author): To me it looks like you are overfitting your model to your training data. What you should try is to increase the weight of the regularization parameter (or use regularization if you were not).
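For example, with a model trained as in Step 3, predicting at new X values only requires a data frame with a column named like the predictor in the formula. The training data and the new X values below are hypothetical. X = 25 lies outside the training range (1 to 20); with the default radial kernel, such extrapolated predictions tend to level off rather than follow the trend.

```r
library(e1071)

# Hypothetical training data (X from 1 to 20)
set.seed(42)
data <- data.frame(X = 1:20)
data$Y <- 0.15 * data$X^2 + rnorm(20, sd = 2)

model <- svm(Y ~ X, data)

# New, unseen inputs: the column must be named X, like in the formula
newData <- data.frame(X = c(2.5, 10.5, 25))
newY <- predict(model, newData)
print(newY)
```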

Md. Moyazzem Hossain: Dear

Thank you very much. Actually, I want to predict future values of a univariate time series with SVM. I have used the library e1071. I am able to predict values over the study period, but I want to forecast future values.

lina: What is the software in which you do the programming? Thanks

Alexandre KOWALCZYK (post author): In this case it is RStudio, which can be downloaded here.

Liz: "we use the tune method to train models with $\epsilon = 0, 0.1, 0.2, \ldots, 1$ and cost $= 2^2, 2^3, 2^4, \ldots, 2^9$ which means it will train 88 models (it can take a long time)"

Hello. Can you explain how the number 88 is calculated? Thank you.

Alexandre KOWALCZYK (post author): There are 11 values of epsilon and 8 values of cost. We can associate each epsilon with the 8 cost values to create 8 pairs. As there are 11 epsilons, there are 11 × 8 = 88 pairs.

Fakhrul Agustriwan: Hello Mr. Kowalczyk.

This tutorial is very helpful. Actually, I am trying to forecast future values of time-series data using the SVR method, but I am quite confused about how to do it in R. Could you explain the steps?

Thank you 🙂

Alexandre KOWALCZYK (post author): Thanks for your comment. Unfortunately, I have never used SVR to forecast time series. However, I found this question, and one of the answers points to this article. As suggested in the answer, you will need to transform the classification problem into a regression one, but this might be a good starting point for you.

loic: As I understand it, the SVM implemented in R uses the radial basis kernel by default. Therefore, there is another parameter (called gamma). How do you deal with this one? I think you should fit it too.

One article mentioned taking the median of pairwise distances between the training points (after the scaling process).

Alexandre KOWALCZYK (post author): You just need to add the gamma parameter to the tune function. There is an example in the e1071 package documentation:

`obj <- tune.svm(Species~., data = iris, gamma = 2^(-1:1), cost = 2^(2:4))`

loic: OK, thanks for your reply.

Using tune.svm, I noticed that this function is very, very slow (around 3 seconds per parameter configuration for 1000 observations of 7 variables).

Surprisingly, if you use svm(..., cross = 10), you can get the cross-validation error in less than 0.5 seconds on the same data. So I concluded that tune.svm was very badly coded. Do you have any idea about this issue?

Therefore, I coded my own parameter tuning function using svm(..., cross = 10).

Also, I have found several papers that use a BFGS optimization algorithm (on a log2 scale) instead of grid search. I tried this, and it turned out to be very efficient.

Alexandre KOWALCZYK (post author): When you are using svm(..., cross = 10), you are performing a 10-fold cross-validation on the training data. This is not the same as doing a grid search. If the method tune.svm is so slow, it is not because it is poorly coded, but because it trains one svm model per combination of hyperparameters. So if you want to try gamma = 0.1, 0.01 and C = 1, 10, 100, for instance, it will train 6 different svm models ([0.1,1] [0.1,10] [0.1,100] [0.01,1] [0.01,10] [0.01,100]). In other words, it will try each pair in the Cartesian product of the gamma set with the C set. If you try it with 10 values of gamma and 10 values of C, it will train 100 models, which should indeed be much slower than training only 10 models.

loic: That's not what I meant. I am aware of that, of course. But actually, I did a grid search "by hand" with a loop over 10x10 values of gamma and C using svm(..., cross = 10). Therefore, I called svm 100 times and then kept the minimum CV error. The overall time it took was something like 10 times less than calling tune.svm() once on a 10x10 grid.

That is what made me think this function was poorly coded, or that it might use sophisticated techniques I am not aware of.

I've tried to find the reason, in vain.

Actually, I am a bit doubtful about the results of svm(..., cross = 10): it seems that it does not compute the CV error in a stochastic way, and the results are only precise to one decimal digit, which is weird compared to tune.svm().

Alexandre KOWALCZYK (post author): I can't really help you more without seeing your code. Maybe you can ask on Stack Overflow or Cross Validated if you want to dig deeper and understand what happens in your particular case. Feel free to post the link here afterward and I'll take a look.

Renan: Hi loic,

I am very interested in your hand-written code, because I have a lot of data to train and it takes a very, very long time. Could you send me this part?

Thanks a lot

Renan

Spartan: Great tutorial on svm, clearly defining its function as a classifier or a regressor. Thanks, Alexandre.

HAP: Thank you for this valuable post. If I have more than one X variable, including some dummy variables, can I fit an SVR in that case?

Alexandre KOWALCZYK (post author): Yes, you can. SVR also works when X is multidimensional.

HAP: Thanks. I'll try.

Ilgaz: Hello,

I read your blog posts. I am not very clear about how to forecast future values of a time series using SVR. It looks to me like SVR fits a model using the training set. But what about using predict() to predict future values (n.ahead values) in R? I couldn't find this feature so far.

Sincerely, Ilgaz

Alexandre KOWALCZYK (post author): I think you should take a look at the kernlab package, as suggested in this Stack Exchange answer.

Aseel: First of all, thanks for the very helpful tutorial. I'm using R 3.2.1, but svm doesn't work correctly. In Step 3, when I run this: model <- svm(Y ~ X , data), the error is:

`Error in predict(ret, xhold, decision.values = TRUE) : unused argument (decision.values = TRUE)`

Can you please help me?

Thanks,

Alexandre KOWALCZYK (post author): Hello. I don't have much of an idea about this one. You might want to take a look at this answer and try the provided solution. Otherwise, I would advise you to try the code on another machine to see if it works, and if it does, try to replicate that environment on your machine. Best regards.

Aseel: I really appreciate your reply. I think the problem is that the same function name exists in two places, for example the predict() function in both ".GlobalEnv" and "package:stats".

I will try to figure out how to solve that.

Thanks a lot,

Aseel: I found that I had a function with the same name, predict. So I simply copied my function to another name and removed my predict function. That was causing the confusion.

Thanks again for helping me.


Espartaco: Hi Alexandre. Thank you so much for all the information. I have a few questions.

1. Can I use any kind of variable in an SVM? Continuous, categorical?

2. If I am using an SVM to classify two groups, is there a way to get a probability of assignment to each group?

3. How do you validate that the SVM is a good model?

Alexandre KOWALCZYK (post author):

1. Yes. For continuous data it is called SVR, and SVM for categorical data.

2. Yes. Most frameworks provide a "predict probabilities" method to do so.

3. You use a score to measure the quality of your model. If you want to learn more, I recommend this book.

Danny: Hi Alexandre,

Thanks for such a comprehensive tutorial. Much appreciated. I am trying to use SVR to predict time series. As mentioned in your post, tune() shuffles the data. Is there any option or way to not shuffle the data?

Alexandre KOWALCZYK (post author): Hello. You can specify a tunecontrol parameter to control the behavior of the tune method. I think `tune.control(sampling = "fix")` might suit your needs.

Danny: Yes. Worked perfectly! 🙂

tahir: Hi Alexandre, I am asking about using LS-SVM for regression. Are there any R packages that support it?

Alexandre KOWALCZYK (post author): Hello, you can use the function lssvm available in the kernlab package.

tahir: Thank you very much, but when I use lssvm I get this message:

`Using automatic sigma estimation (sigest) for RBF or laplace kernel`

`Error in if (n != dim(y)[1]) stop("Labels y and data x dont match") : argument is of length zero`

Alexandre KOWALCZYK (post author): Hello tahir.

Sorry, I can't help you more without a reproducible example.

The best place for you to ask your question is http://www.stackoverflow.com. I hope you will find plenty of help there.

Regards,

Alexandre

Ely: Hi Alexandre,

Thank you for the script on SVM. I have a few questions:

1. What are some advantages of LSSVM over SVM and ANN (artificial neural networks)? Does LSSVM have a tendency to overfit?

2. Do you have an R script to perform LSSVM? How do I tune LSSVM and cross-validate it?

Alexandre KOWALCZYK (post author): Hello, I have never tried LSSVM. You can check this paper comparing LSSVM and SVM. Best regards,

Ankit: Hello,

I am trying to use SVM in a classification setting. All the variables/attributes in the dataset are qualitative. Will SVM work in this case, or do I need to convert the qualitative variables into quantitative ones before using it?

Thanks

Ankit

Alexandre KOWALCZYK (post author): It really depends on the level of measurement of your dependent variable. If your attributes are ordinal, you can treat them as numbers. However, if that is not the case, you can do some additional transformations as shown in this answer.

Subhasri: Hi... Thank you for the superb article. I've been reading about SVMs for a few days now. I have a question: we normalize data and then give that data as input to the SVM; can an SVM be used for the normalization itself? And how do we choose a kernel function?

Alexandre KOWALCZYK (post author): The goal of SVM is not normalization, it is classification (or regression in the case of SVR). If your question was "How to select a kernel", this link might help you.

Minerva: Hello Sir. Is it possible to calculate the AIC of an SVM regression model (the way you would for linear regression)? If yes, how?

Alexandre KOWALCZYK (post author): I think it might be possible. Check out this paper.

Al: Hi Alexandre, is there any R package using CUDA or OpenCL for SVR estimation?

Alexandre KOWALCZYK (post author): Hello. I have never used any package with CUDA or OpenCL in R. For CUDA I use Python. Maybe you can find some information here.

Hitech: Could you give me source code for finding the optimal hyperplane?

Scott: Hi Alexandre,

First of all, excellent tutorial! Thanks for putting the time into this.

I'm wondering if you can help answer a question. I have 125 independent observations of 200-wavelength NIR spectral data for a random set of samples: my X matrix. For each, I have an independent scalar value of the concentration of a certain analyte: essentially my Y vector.

My goal is to find the minimal set of important wavelengths that correlate best with Y.

I've used PLS techniques combined with wavelength selection methods to find a subset of wavelengths. Even with the full set of 200 wavelengths and 20 latent vectors, I get a nominal fit to my Y concentration data. But of course, when you fit a PLS model, you hope to find a few PLS factors that explain most of the variation in both predictors and responses. The regression coefficient profile (loadings) then gives a direct indication of which predictors are most useful for predicting the dependent variable.

OK, on to my question. Once one finds a reasonable SVR regression fit, how does one extract which wavelengths/variables were weighted more highly than others? I like SVR over PLS since, with the right kernel choice, it can capture potential nonlinearities in my data. But all the SVR packages I've looked at seem to lack the fit and weighting assessment plots found in PLS.

Can you offer any thoughts/suggestions on this?

Thanks!!

-Scott

Alexandre KOWALCZYK (post author): Hi Scott.

Thanks for your comment. I am glad people find my articles helpful.

Unfortunately, there is no built-in way to retrieve the relative importance of variables for SVR. You can try removing one predictor and see how it impacts the performance of the SVR (but you will have to do it for each predictor, which can take a long time).

I found this paper which might help you, it even has a video.

I hope it helps you.
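The drop-one-predictor idea can be sketched as follows. Everything here is an illustrative assumption: the data are synthetic (x1 strongly affects y, x2 weakly, x3 not at all), `cvRMSE` is a hypothetical helper, and the error is measured with a simple 5-fold cross-validation so that removing a predictor is judged on held-out data rather than on the training set. A larger increase in RMSE suggests a more important predictor.

```r
library(e1071)

# Hypothetical data: y depends strongly on x1, weakly on x2, not on x3
set.seed(7)
n <- 60
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 5 * d$x1 + sin(3 * d$x2) + rnorm(n, sd = 0.1)

rmse <- function(error) sqrt(mean(error^2))

# Hypothetical helper: 5-fold cross-validated RMSE of an SVR with default settings
cvRMSE <- function(formula, data, folds = 5) {
  set.seed(1)  # identical folds for every call, so comparisons are fair
  foldId <- sample(rep(seq_len(folds), length.out = nrow(data)))
  errors <- numeric(0)
  for (k in seq_len(folds)) {
    fit <- svm(formula, data = data[foldId != k, ])
    heldOut <- data[foldId == k, ]
    errors <- c(errors, heldOut$y - predict(fit, heldOut))
  }
  rmse(errors)
}

predictors <- c("x1", "x2", "x3")
baseline <- cvRMSE(reformulate(predictors, response = "y"), d)

# Increase in RMSE when each predictor is left out
increase <- sapply(predictors, function(p) {
  cvRMSE(reformulate(setdiff(predictors, p), response = "y"), d) - baseline
})
print(sort(increase, decreasing = TRUE))
```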

Ghada: Amazing!!

Great people can explain complicated topics in a simple way 🙂

Sharda Tripathi: Hi Alexandre,

Your tutorial is very informative and easy to understand. Keep up the good work. However, I have a more conceptual question about SVR, not related to the SVR implementation in R. I have performed support vector regression on a time series. The residuals (i.e. actual value minus predicted value) show strong autocorrelation. The autocorrelation plot of the residuals has a damped sinusoidal shape. I have read in the literature that the fitted model is not good if this is the case. I have tried transformations like first differences and the log of the time series, but the result is still the same.

Does this autocorrelation imply that my model is not good? If so, what can I do to get rid of it?

Any suggestion would be of great help 🙂

Thanks

Sharda

Alexandre KOWALCZYK (post author): Hello Sharda. Thanks for your comment. Indeed, this autocorrelation implies that your model is not perfect. You should think about your problem and see if you can add another independent variable. Adding it might remove the autocorrelation. If it does not work, you can try other techniques like the Cochrane-Orcutt method or the AR(1) method as described in this chapter. Regards.

Sharda Tripathi: Thank you so much for your response. I am trying what you suggested.

Amir: Hi, thank you for sharing. I am trying to perform multi-class classification for credit rating allocation at a bank; my dataset involves a set of financial ratios. I have tried several techniques to perform the classification; however, the models' output is just like flipping a coin!

I wonder what I should do. Any hints? I can send you my dataset if required; I really need help since it's my master's thesis as well.

Alexandre KOWALCZYK (post author): Hello Amir. The first question I would ask myself is "Is what I am trying possible?" If so, "Has somebody else done this? With which results?" "What is the state of the art?" That's why I often start by looking for papers, and I advise you to do so. If your models perform poorly, maybe your data is not clean. Did you preprocess it correctly? Did you use standardization or normalization? When in doubt, one approach might be to try your model on another, simpler dataset on which such models usually perform well. If your model does not work well on it, it might be a programming or data preparation mistake. I hope this helps you.

sajal: Thanks, Alexandre, for the nice and useful post. Is it possible to obtain the model equations directly from SVR (preferably the best-fitted one) to apply them on another platform for calculation, for example in MS Excel?

If so, could you please explain a bit? Thanks in advance.

Alexandre KOWALCZYK (post author): Hello sajal. You can recover w and b from the support vectors as explained in this forum post. I hope this helps. 🙂
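For a linear kernel, this recovery can be sketched in a few lines with e1071. In the fitted object, `coefs` holds the support-vector coefficients, `SV` the support vectors and `rho` the negative intercept, so w = t(coefs) %*% SV and b = -rho. This sketch assumes `kernel = "linear"` and `scale = FALSE` (with the default internal scaling, w and b would apply to the scaled data instead), and the data are hypothetical. With a non-linear kernel such as the default radial one, there is no explicit w to export.

```r
library(e1071)

# Hypothetical data with two predictors and a known linear relation
set.seed(3)
X <- matrix(rnorm(60), ncol = 2)
y <- 3 * X[, 1] - 2 * X[, 2] + 1 + rnorm(30, sd = 0.1)

# Linear kernel, no internal scaling: w and b then apply to the raw inputs
model <- svm(x = X, y = y, kernel = "linear", scale = FALSE)

w <- t(model$coefs) %*% model$SV  # 1 x 2 weight vector
b <- -model$rho                   # intercept

# Prediction rebuilt by hand, e.g. to reproduce it in a spreadsheet
manual <- as.numeric(X %*% t(w) + b)
viaPredict <- as.numeric(predict(model, X))
print(w)
print(b)
```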

Rachel: Hi Alexandre. How do I apply SVM to univariate time-series data to classify it into 2 categories (either normal or outlier)?

Alexandre KOWALCZYK (post author): Hi Rachel. Sorry, but this is a pretty broad question which would need a specific article to answer. You can go to this site to post such questions, but don't forget to do your own research first. Best regards.

sajal: Many thanks for your prompt response and valuable suggestion. Happy New Year. I wish you the best of luck.

Raj Anand: Excellent tutorial. Thanks.

lichenyu: I met the same problem that loic refers to.

In fact, it is caused by the default setting of the function tune.svm(), which performs a 10-fold cross-validation.

And for svm(), there is no cross-validation by default.

So, when using tune(), it may take around 10 times longer than expected if you do not take this into account.
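If tuning time is a concern, the number of folds can be lowered through `tune.control()`. A sketch on hypothetical synthetic data; 3 folds is an arbitrary choice that means 3 model fits per (epsilon, cost) pair instead of 10, at the price of a noisier error estimate:

```r
library(e1071)

# Hypothetical synthetic data standing in for regression.csv
set.seed(42)
data <- data.frame(X = 1:20)
data$Y <- 0.15 * data$X^2 + rnorm(20, sd = 2)

# 3-fold instead of the default 10-fold cross-validation
set.seed(1)
fastTune <- tune(svm, Y ~ X, data = data,
                 ranges = list(epsilon = seq(0, 1, 0.1), cost = 2^(2:9)),
                 tunecontrol = tune.control(sampling = "cross", cross = 3))
print(fastTune$best.parameters)
```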

Adhi: Thank you for the tutorial.

I am curious: how do you do the calculation in SVR by hand, up to obtaining the function and the predicted value?

Alexandre KOWALCZYK (post author): You can check out this paper.

Shahriar SHAMILUULU: Hi Alexandre,

Thank you for the tutorial. I have a question below.

How can we get an equation for the model generated by SVR, i.e., the intercept, the coefficient for x and R2? When I look at the summary, there is nothing like that.

Thank you.

Alexandre KOWALCZYK (post author): Hello. I think this answer might help you. Best regards.

Lulu: Hello! I have the same problem, but Alexandre's answer didn't help. How did you solve it?

sumana: Good tutorial.

Can you please tell me how SVM is used to tell whether a dataset is linear or nonlinear?

Alexandre KOWALCZYK (post author): Well, you can't use SVM to know if data is linear or nonlinear. However, if you achieve a very good score with an SVM and a linear kernel, it is most likely that the data is linearly separable.

dmitrio: Thanks for your tutorial.

I'm forecasting time-series data with 4 predictors (t-4, t-3, t-2, t-1) to predict the value at 't'.

Is there a rule for the minimum number of training samples needed to build an SVR model?

Thanks

Alexandre KOWALCZYK (post author): Not that I am aware of.

bahtiyar: Hi Alexander,

I am trying to implement SVR for time-series prediction. I use univariate data for prediction.

The data format I use is

x-1, x-2, x-3, x-n -> [x+1]

x+1 : target value

x-1, x-2, x-3, x-n : attribute values

I use libsvm (e1071) in R to help calculate the prediction, and I got a high error value.

Must I scale the data to [0,1] or [-1,+1] as in a classification problem? If scaling is required, I didn't find a parameter to set it in R. In the 'e1071' manual, I only found a parameter for whether the data is scaled or not.

The parameter is like this:

svm(....parameter...... , scale = FALSE)

Any suggestions for this?

Thanks..

-bahtiyar

Alexandre KOWALCZYK (post author): In the e1071 documentation (https://cran.r-project.org/web/packages/e1071/e1071.pdf) we can see that scale is "A logical vector indicating the variables to be scaled", so you can specify which variables to scale. Note that by default, data are scaled internally (both x and y variables) to zero mean and unit variance. Maybe this paper can help you get better results.

William: Thank you very much, Mr. KOWALCZYK! Thanks to your lm(y,x,data) function I was able to successfully plot a regression line! I had tried lm(y,x) before, but I kept getting the error "Error in if (noInt) { : argument is of length zero". Thanks again!

Alexandre KOWALCZYK (post author): You're welcome 😉

bahtiyar: Thank you for the help :) Let me try your suggestion.

Pietro: Hi! Great tutorial! Could you please help me with these two things:

1) I have to use SVR to predict future values of energy consumption. My input is the day of the week and the output is the corresponding energy consumption value. How can I encode this input information?

2) I have one month of energy data: how do I divide the whole set into training and test sets? How do I use the test set to validate the model?

Thanks a lot!

Alexandre KOWALCZYK (post author): Hello Pietro.

1) For the day of the week you could use a number for each day (0 to 6), but this is not ideal because it imposes an order between the numbers, so instead you should one-hot encode it. For this you can use the OneHotEncoder provided by sklearn.

2) You can watch this video, which explains everything 🙂

akash: Hi! I'm also trying to do the same, i.e. predict future values, but I am not able to predict unseen data. Can you share your code? My mail is joshi.akash123@gmail.com

Alex: Hello,

This is a very useful tutorial. Thanks. 😉

I am wondering how you can extract the coefficients of the SVM regression, just like the coefficients in linear regression.

Thanks in advance.

Alexandre KOWALCZYK (post author): You can use the coefs property of the svm object, which is returned after training.

LT: Hi,

This is a great tutorial.

Just a question: after the SVM is trained, can we do a hand calculation, like we can with a simple model, to predict a value for a new set of variables?

Thanks.

Alexandre KOWALCZYK (post author): You can use the trained model to make a new prediction. I don't get what you mean by "hand calculation". Doing it by hand on paper would be tedious.

Vishesh Sahni: Hello. It's a great tutorial, so thanks for putting it here. I've modelled my data and obtained a graph. I want to predict the next value. How do I do it? Thanks a lot.

Alexandre KOWALCZYK (post author): Hello. Thank you for your comment. Unfortunately, your question is way too broad. Maybe you can find a dedicated forum or a teacher to help you with this matter. Regards.

Chimezie: Hi Alexandre,

Can you please explain the dispersion term in the SVM tuning process? What does dispersion stand for?

Regards

Alexandre KOWALCZYK (post author): Hello. I don't see what you mean. I don't see any dispersion term in e1071. Could you clarify?

Vedang Lokegaonkar: Hello,

Can we use SVM when the predictors are categorical?

Alexandre KOWALCZYK (post author): Yes, you can. You might want to preprocess your data though.

RC: Thanks for this, very helpful. I know that with SVM you usually cannot figure out which features led to a good prediction model, but I was wondering if there is a way to extract the features that are crucial in generating predictedY with SVR. Essentially, I want to use SVR for feature selection. I am getting a pretty decent error (0.31) for my model, and I'd like to know which features have the highest weights. Any help would be greatly appreciated. Thank you.

Alexandre KOWALCZYK (post author): I have never done this with SVR. Maybe this paper can help you 🙂


LJ: Hi Alexander,

Thanks for this, very helpful. I'm trying to test different SVM parameterizations in prediction problems using epsilon-SVR and nu-SVR. In what range should I test the parameters ε, ν (nu) and C?

Alexandre KOWALCZYK (post author): You should test them using grid search. Suitable parameter values differ greatly between problems, so you just have to do a grid search first and then narrow the range until you find values which give you satisfaction.

ram: Hi Alex,

Really great article!

I have a few questions: how are epsilon and cost related to the model here?

And where did you use the kernel in the calculation above?

And how does the kernel affect the model?

Alexandre KOWALCZYK (post author): We did not specify the kernel parameter when we created the svm, so the kernel is "radial" by default (see the documentation). You will understand how epsilon and C affect the model by reading this article. Best regards

Ruaa: Thanks a lot for your great tutorial; it has helped me a lot. I have a dataset for electric load forecasting. My first question: should I normalize the dataset first, or is it normalized automatically by svm? My second question is about the initial ranges used to tune the parameters: how can I choose them?

Emre: Hi Alex,

It is a solid tutorial. Thank you very much.

I have a question and need to find the answer asap.

I need to perform v-svm, which has an additional parameter "v". Can you help me modify the svm code to obtain v-svm code?

Also, I am curious about how I can see the whole code of SVM in R. Is there any way to step into the svm function?

Thank you very much.

Weiwei Liu: The functions tune() and tune.control() in the e1071 package may be useful for you.

If you want to learn how to use svm, the package documentation should be read carefully: https://cran.r-project.org/web/packages/e1071/e1071.pdf

SA: Great tutorial!!!!!!

How do I find a 95% confidence interval for nonlinear regression? I don't think you can use lm, right?

Jean-Pierre GERBIER: Very interesting tutorial, thanks a lot.

If I am not too late... I don't understand why, with exactly the same dataset and the same code snippets, I get a different result at Step 3. I get predictedY:

1 2 3 4 5 6 7

7.667638 6.323641 5.578090 5.453718 6.066055 7.625997 10.367069

8 9 10 11 12 13 14

14.423447 19.718063 25.922760 32.519648 38.941073 44.724122 49.608321

15 16 17 18 19 20

53.536553 56.570848 58.777275 60.144779 60.578090 59.961279

Thanks if you have time to help me

Jean-Pierre

Jean-Pierre GERBIER: Sorry, Alexandre... I made a mistake. My apologies, and many thanks again for your great tutorial.

Jean-Pierre

Weiwei Liu: Hi Alex,

Thanks for your tutorial. I have had one question for a long time. I don't know if you are familiar with the caret package, but I want to know the difference between the function train() in caret and tune() in e1071. Both are training functions for SVM, so why do I get different results with the same data? For example, if I use tune():

```r
library(e1071)
obj <- tune(svm, y ~ x, data = df,
            ranges = list(gamma = 2^(-2:2), cost = 2^(2:9), epsilon = seq(0, 1, 0.1)),
            tunecontrol = tune.control(sampling = "cross", cross = 10))
```

I get the best values for cost, gamma and epsilon. But if I use train():

```r
library(caret)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                     search = "grid")
df.svm <- train(y ~ x, data = df, method = "svmRadial", trControl = ctrl)
```

I get C with the corresponding MSE and Rsquared. I also get gamma in the output of summary(df.svm), and the results (such as C) differ from those of tune().

So which one should I use?

I hope I have made myself understood; if not, please let me know. Thanks.

Alexandre KOWALCZYK (post author): Hello Weiwei, my guess is that tune's internal routine uses some randomness to split the data for cross-validation. Also, if your data set is small, the choice of which examples end up in each fold can change the result considerably. For a more detailed answer, posting a question on Stack Overflow might help.
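Here is a sketch of that explanation. `tune()` assigns examples to cross-validation folds at random, so fixing R's random seed before each call should make the selected parameters repeatable. The data and the grid are assumptions for illustration.

```r
library(e1071)

# Hypothetical toy data
set.seed(4)
data <- data.frame(X = 1:25)
data$Y <- 0.5 * data$X + 3 * sin(data$X) + rnorm(25)

run_tune <- function(seed) {
  set.seed(seed)  # fixes the random assignment of examples to CV folds
  tune(svm, Y ~ X, data = data,
       ranges = list(cost = 2^(0:4), epsilon = c(0.1, 0.5)))
}

first  <- run_tune(42)
second <- run_tune(42)
print(identical(first$best.parameters, second$best.parameters))
```

Different seeds can still select different parameters, especially on small data sets. The seed makes the result reproducible, but it does not make the selection itself more reliable.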

Harshith: Hello, I am trying to predict an unknown future variable using this method and I get an error:

```
Error in model.frame.default(formula = wb1$new_col ~ y + x1 + x2 + x3 +  :
  invalid type (NULL) for variable 'wb1$new_col'
```

new_col is the new column of values which I need to predict, and wb1 is the data frame. I am trying to build an SVM regression model for that formula. Can you please help me?

The code looks like this:

```r
svmModel <- svm(formula = wb1$new_col ~ y + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9, data = training, kernel = "radial", cost = 32, gamma = 0.1, scale = FALSE)
```

Alexandre KOWALCZYK (post author): It looks to me like some value in new_col is NULL. Try replacing all NULL values with a number before running the code. If that works, check your data and your loading procedure to find where the null value comes from.
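Another thing worth checking in the code above is the formula itself. It uses `wb1$new_col` while passing `data = training`, so the response is not taken from `training`. The error reports that the expression `wb1$new_col` evaluated to NULL, which is what R returns when a data frame has no column with that exact name.

A minimal sketch of the usual pattern follows: bare column names in the formula, plus a separate train/test split. The data frame and its columns (`x1`, `x2`, `new_col`) are hypothetical stand-ins.

```r
library(e1071)

# Hypothetical stand-in for wb1: two predictors and the target column new_col
set.seed(5)
wb1 <- data.frame(x1 = runif(40), x2 = runif(40))
wb1$new_col <- 3 * wb1$x1 - 2 * wb1$x2 + rnorm(40, sd = 0.1)

# Random train/test split of the rows
train_rows <- sample(nrow(wb1), 30)
training <- wb1[train_rows, ]
testing  <- wb1[-train_rows, ]

# Bare column names: svm() looks them up in `data`
svmModel <- svm(new_col ~ x1 + x2, data = training, kernel = "radial")

# predict() looks up x1 and x2 in `newdata`; the target column is not needed there
pred <- predict(svmModel, newdata = testing)
print(head(pred))
```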

Harshith: Thanks a lot for the reply.

When I predict on the test set, the predicted values are those of the training data. How can I solve this? The code looks like this:

```r
svmModel <- svm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x8, data = training, kernel = "radial", cost = 32, epsilon = 0, C = 0.1, gamma = 0.1, scale = FALSE)
pred <- predict(svmModel, newdata = testing[, -42])
```

I get the training values as answers. Can you please help me?

zied: Hello,

How can I extract the equation of the regression model after running SVM in R?

Is it always linear, or can it be nonlinear?

I would like to find something like:

Y = a·X1 + b·X2 + c·X3 + d·X4 + e

Thank you,

Zied

How can I extract the equation of the regression model after running SVM in R? - ResearchGate. Available from: https://www.researchgate.net/post/How_can_I_extract_the_equation_of_the_regression_model_after_running_SVM_in_R [accessed Jun 16, 2016].

zied: Hello again, just to clarify my previous message.

I already followed your link:

https://stat.ethz.ch/pipermail/r-help/2009-August/399845.html

and I tried:

model$coefs

But I got 21 coefficients. How can I use them to build the equation? I expected 13 (one for each of my 12 variables, plus the intercept).

Another question: with a multivariable model, is it possible to apply a nonlinear kernel?

You showed in the tutorial how to get the RMSE. Is it possible to get R²?

Thanks,

Zied
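For a linear kernel, the per-support-vector coefficients in `model$coefs` can be collapsed into one weight per variable. With e1071 (libsvm), a regression prediction for x is sum_i coefs_i * K(SV_i, x) - rho. With the linear kernel K(u, v) = u·v, this becomes w·x + b, where w = sum_i coefs_i * SV_i and b = -rho.

This also explains the 21 coefficients: there is one per support vector, not one per variable. For a nonlinear kernel such as the default radial one, no such finite linear equation exists.

The sketch below assumes toy data with two predictors and `scale = FALSE`. With the default `scale = TRUE`, w and b would apply to standardised variables instead of the original ones.

```r
library(e1071)

# Hypothetical toy data with two predictors and a roughly linear relation
set.seed(6)
data <- data.frame(X1 = runif(30), X2 = runif(30))
data$Y <- 2 * data$X1 - data$X2 + 0.5 + rnorm(30, sd = 0.05)

# Linear kernel; scale = FALSE keeps SV and coefs on the original scale
model <- svm(Y ~ X1 + X2, data, kernel = "linear", scale = FALSE)

# model$coefs holds one value per support vector, not one per variable
print(c(n_coefs = length(model$coefs), n_support_vectors = nrow(model$SV)))

# Collapse them into one slope per variable and an intercept
w <- t(model$coefs) %*% model$SV  # 1 x 2 matrix: slopes for X1 and X2
b <- -model$rho                   # intercept
print(w)
print(b)

# The explicit equation Y = w1 * X1 + w2 * X2 + b reproduces predict()
manual <- as.vector(as.matrix(data[, c("X1", "X2")]) %*% t(w) + b)
print(all.equal(manual, as.vector(predict(model, data)), tolerance = 1e-6))
```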

prasun: Hi, I am using SVM for a classification problem. Do I need to create dummy variables for categorical variables before passing them to the SVM, or will it handle them on its own?

Alexandre KOWALCZYK (post author): Yes, it is recommended that you create dummy variables to encode categorical variables.
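Here is a minimal sketch of this encoding (one-hot encoding) in base R with `model.matrix()`. The data frame and its columns are hypothetical. The result is a purely numeric matrix, which is the form `svm(x = ..., y = ...)` expects.

```r
# Hypothetical data with one numeric and one categorical predictor
df <- data.frame(
  size  = c(1.2, 3.4, 2.2, 0.8),
  color = factor(c("red", "green", "blue", "green"))
)

# "- 1" removes the intercept, so every level of color gets its own 0/1 column
dummies <- model.matrix(~ color - 1, data = df)

# Numeric predictor matrix: the original numeric column plus the indicators
X <- cbind(size = df$size, dummies)
print(X)
```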

Maomao: Hi Alex,

Many thanks for sharing. I have one question:

Some information is missing in several of my X variables. How can I impute or otherwise deal with these missing values? Thanks.

Alexandre KOWALCZYK (post author): It is common to replace a missing value with the mean, but you can also replace it with the most frequent value or the median. There is some information about how to do it in Python on this page.
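The page mentioned above covers Python. An equivalent sketch in R, assuming a hypothetical data frame, replaces missing numeric values with the column mean (or median) and missing factor values with the most frequent level. In practice, these replacement values are usually computed on the training data only and then reused for the test data.

```r
# Hypothetical data frame with missing values in every column
df <- data.frame(
  X1 = c(1.0, NA, 3.0, 4.0),
  X2 = c(10, 20, NA, 20),
  X3 = factor(c("a", "b", NA, "b"))
)

# Most frequent level of a factor (ties go to the first level)
most_frequent <- function(x) names(which.max(table(x)))

# Replace NAs column by column: numeric_fun for numbers, the mode for factors
impute <- function(df, numeric_fun = mean) {
  for (col in names(df)) {
    na_rows <- is.na(df[[col]])
    if (!any(na_rows)) next
    if (is.numeric(df[[col]])) {
      df[[col]][na_rows] <- numeric_fun(df[[col]], na.rm = TRUE)
    } else {
      df[[col]][na_rows] <- most_frequent(df[[col]])
    }
  }
  df
}

imputed_mean   <- impute(df)                        # mean for numeric columns
imputed_median <- impute(df, numeric_fun = median)  # median instead
print(imputed_mean)
print(imputed_median)
```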

Maomao: Alex, thanks for your answer. How can I get the 95% CI and the p-value of the AUC for an SVM model?

SAM: Hi Alexandre,

Thanks for the good example showing us how to use SVR with grid search (GS).

Could you also show us how to use particle swarm optimization to optimize the parameters?

Alexandre KOWALCZYK (post author): Hello. Thank you for your comment. I have never used particle swarm optimization, so I do not plan to write an article on the subject for the moment 😉

Alvin: Hi Alexandre,

Thank you very much for your post. I would like to ask how to perform multiple linear regression using support vector regression. Do you have a post on this, or do you know any other website that shows how this can be done in R? Thanks

Alexandre KOWALCZYK (post author): If you use a support vector machine, you will be performing support vector regression, not multiple linear regression. You can give a vector as input to perform multivariate support vector regression if you wish.

Jun: Thank you for your post. I learned a lot from the tutorial. I also wonder how to perform multivariate SVM regression. Even though you explained how to do it, I cannot understand how it works in actual code. If you don't mind, could you give me an example in R? Thank you.

Dev: Great tutorial. Thanks a lot!

Sadiq Ahmad: We are currently working on a research paper for which we conducted a psychological experiment to obtain a data set. We then applied multiple regression to find the relation between the dependent variable and the independent variables. Our model was significant because the Sig. value was less than .05, and we found a good relation between the dependent and independent variables.

My idea now is to develop a new algorithm made of different mathematical equations, all based on that regression analysis. For example, if the regression analysis shows that humidity has a strong relation with rain, then we will say that "Humidity is directly proportional to rain".

So my question is: are there formal mathematical techniques or software tools that can derive such equations from a regression analysis, or do we have to derive the equations manually?

lj: Hi Alexandre,

I have some doubts. Do the kernel functions other than linear also have a cost parameter (C)? How can I perform a grid search that sets both the cost parameter and the kernel's own parameters? Can they be separated?

noviyanti sagala: Hi Alexandre,

How do we specify the training and testing data sets? I can't see that you used a different data set.

abhishek bansal: Great work! How do we find the coefficient of determination for SVR? What is the command for that?
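On the R² questions above: R² can be computed from the observed and predicted values, just like the RMSE in the tutorial. Here is a minimal sketch in base R. The small vectors are made-up numbers that show the arithmetic. With the tutorial's variables, the call would presumably be `r_squared(data$Y, predictedY)`.

```r
# Root mean squared error, as used in the tutorial to compare models
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot
r_squared <- function(actual, predicted) {
  ss_res <- sum((actual - predicted)^2)     # residual sum of squares
  ss_tot <- sum((actual - mean(actual))^2)  # total sum of squares around the mean
  1 - ss_res / ss_tot
}

# Made-up numbers to show the arithmetic
actual    <- c(3, 5, 7, 9)
predicted <- c(2.5, 5.5, 7, 9.5)
print(rmse(actual, predicted))       # 0.4330127
print(r_squared(actual, predicted))  # 0.9625
```

For nonlinear models this definition can differ from `cor(actual, predicted)^2`, and it can even be negative, so it is worth stating which definition is reported.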

Mostafa: Hi Alexandre. Many thanks for your valuable tutorial. You mentioned that SVR also works when X is multidimensional. Could you please tell me how I can load a multidimensional X so that it runs with the following code:

```r
model <- svm(Y ~ X, data)
predictedY <- predict(model, data)
```

Many thanks

Kind regards

Mostafa

harsha: Hi Alexandre,

Can I extend this regression tool to spatial modelling? At present I am using random forest for spatial modelling. I tried Cubist, but without much success. As far as I know, random forest can easily handle both continuous and categorical variables at the same time. Is that possible with SVM as well?

Alexandre KOWALCZYK (post author): Yes, I think so. You just need to one-hot-encode the categorical variables.

Gaurav: How do we visualize both models? You said that the first SVR model is in red and the tuned SVR model is in blue on the graph. How do I plot that?

Jagadeesha: Thanks a lot for this post. It helped me a lot.

Jack: Hello Alex,

Have you ever tried using Amibroker for building and testing an SVM?

Anyone can do some research in Excel; however, Amibroker is pretty fast when working on data arrays, and its formula language is very C-like. Visualising the effectiveness of parameter sets in 3D is also possible.

Thanks for a brilliant tutorial!

Alexandre KOWALCZYK (post author): Hello Jack. No, I have never tried this.

ma-E: Hi Alex, I am forecasting y/y currency returns, but I always get wrong forecast values.

Alexandre KOWALCZYK (post author): Well, that is very unfortunate. Keep in mind that SVR is not the solution to every regression problem. Moreover, you should use machine learning to predict things for which you believe there is an underlying (unknown) relation. Maybe the relation between currency pairs is too random to be predicted, or there is no relation, or it keeps changing.

Kaustubh: I am using this method for forecasting. I have assumed a linear model with 6 variables, so it has 6 parameters. The method forecasts the final output, but I want to know the values of the 6 parameters; that is, I want to find the model itself.

Stelios: Hello Alex,

I was wondering if you could develop (using your toolbox) a Support Vector Regression model based on Gaussian RBF functions, in which you need to choose C, γ and ε.

Kind Regards,

Stelios

Alexandre KOWALCZYK (post author): Hello Stelios. I do not see why it would not be possible. You need to look at the documentation of the R package to do so.

Samantha: Hello Alexandre,

Thanks for this good tutorial.

I would like to know how I can reproduce the predictions from the output given by R.

I have to do it with Excel (VBA) using the model parameters fitted by R.

Thanks.

Alexandre KOWALCZYK (post author): Hi Samantha,

I have never used Excel to do SVR so I am sorry I cannot help you on this matter.

Samantha: OK, thanks.

But maybe you know how R calculates the predictions from the parameters of the model?

I tried with a linear kernel, but I couldn't reproduce the predictions given by predict.svm.
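Here is a sketch of how `predict()` arrives at an SVR prediction; the same formula could then be ported to Excel/VBA. For libsvm-based e1071 models, the prediction for x is sum_i coefs_i * K(SV_i, x) - rho. It uses `model$SV`, `model$coefs`, `model$rho` and, for the radial kernel, `model$gamma`. For a linear kernel, K(u, v) is simply the dot product u·v.

The toy data are assumptions, and the model is fit with `scale = FALSE`. With the default `scale = TRUE`, x would first have to be standardised with the centre and scale stored in the model, and the result transformed back to the original scale of Y.

```r
library(e1071)

# Hypothetical toy data
set.seed(10)
data <- data.frame(X = seq(0, 10, length.out = 40))
data$Y <- sin(data$X) + rnorm(40, sd = 0.1)

# scale = FALSE keeps X and Y on their original scale, so no un-scaling is needed
model <- svm(Y ~ X, data, kernel = "radial", scale = FALSE)

# Radial kernel: K(u, v) = exp(-gamma * ||u - v||^2)
radial_kernel <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

# Prediction = sum over support vectors of coef * kernel, minus rho
manual_predict <- function(model, newx) {
  apply(newx, 1, function(x) {
    k <- apply(model$SV, 1, radial_kernel, v = x, gamma = model$gamma)
    sum(model$coefs * k) - model$rho
  })
}

newx   <- as.matrix(data[, "X", drop = FALSE])
manual <- manual_predict(model, newx)
print(all.equal(unname(manual), unname(as.vector(predict(model, data))), tolerance = 1e-6))
```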

Anastasiya: Hi, Alexandre!

I've performed SVM and tuned the parameters (gamma and cost) with a grid search using 5-fold cross-validation. But I also came across an article describing another way of finding this optimal combination, based on a different performance metric. What I would like to do is find the pair of gamma and cost that gives the highest cross-validated area under the receiver operating characteristic curve (AUC). Do you have any idea how this can be implemented?

Alexandre KOWALCZYK (post author): Hello Anastasiya,

The most common approach for tuning an SVM is indeed grid search, as you did. If you wish to find the best value, you can try a smaller grid search around the value which seems best. There are also other, more complicated techniques, so if you really wish to find the optimal value it may be good to take a look at them. In section 3.2 of their guide, the libsvm authors say that the other methods are not really "better", as they depend on heuristics or approximations. It may be worth a shot to look for papers on the subject and try some other methods. If you do, I would be interested in knowing your results.

Regards,

Chinmaya: Hi Alexandre,

Thanks for such a nice write-up.

I've seen examples where different powers of 10 are used for Cost; here you have used powers of 2. My question is whether it matters if we use powers of 2 or 10, or whether we can supply any list of values.

Is there a rule of thumb regarding the range of Cost?

Alexandre KOWALCZYK (post author): It does not really matter whether you use powers of 2 or powers of 10. The rule of thumb is that when you perform a grid search, you make the grid smaller and smaller. For instance, you can try values between 10^0 and 10^5 and see that the best one is 10^3. You can then perform a smaller grid search between 500 and 1500 in increments of 100. If the best one is 800, you can try another search between 650 and 950 in increments of 50. In the end, a very precise search is often not worth the time, which is why you can be completely fine with the first value of 10^3. But if you really want to find the best C (and have the time), refining your grid like that is the way to go. The same logic applies if you have more than one parameter to find: you then need to find a set of parameters among all the possible combinations. Note that grid search is an empirical method and that there are other ways to find the best parameters.

Sarah: Thank you for your valuable information. I have a few questions:

1. What machine learning algorithm can be applied to text classification (such as classifying tweets from Twitter) with the best accuracy and the easiest implementation?

2. What programming language can I use to build a web-based system with an ML algorithm embedded in it? I am currently thinking of .NET, but I don't know if I can use the classifier there.

3. Is an analytical tool such as AlchemyAPI, which is based on deep learning, enough for text classification, or do I need to apply an algorithm such as SVM?

Alexandre KOWALCZYK (post author): Hello Sarah,

1. Basically, any machine learning algorithm which can deal with text data. No single algorithm is better than all the others; you have to test them yourself on your specific case.

2. In .NET you can use Accord.NET, which is a pretty good framework. However, you can also create websites in Python and use scikit-learn, or in many other languages.

3. Using this API might be a good idea if you are not very inclined towards programming. Once again, it depends on what you want to do and what the API can do.

Regards,


JEW DAS: Hi,

I have more than one independent variable (i.e. X is more than one variable) but one dependent variable (Y). How do I do this multiple regression?

Alexandre KOWALCZYK (post author): You can use a matrix X and it works the same way (except that you cannot visualize it).
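Here is a minimal sketch with three predictors, assuming toy data. With the formula interface, list the columns, or use `.` for all of them. With the matrix interface, pass a numeric matrix and a response vector. The default radial kernel is used throughout, so the model can be nonlinear in the predictors even with several of them.

```r
library(e1071)

# Hypothetical toy data: three predictors, one response
set.seed(11)
n <- 50
data <- data.frame(X1 = runif(n), X2 = runif(n), X3 = runif(n))
data$Y <- 3 * data$X1 + sin(6 * data$X2) - data$X3 + rnorm(n, sd = 0.1)

# Formula interface: name the predictors, or use "." for all other columns
model_named <- svm(Y ~ X1 + X2 + X3, data)
model_dot   <- svm(Y ~ ., data)

# Matrix interface: numeric matrix of predictors plus a response vector
X <- as.matrix(data[, c("X1", "X2", "X3")])
model_matrix <- svm(x = X, y = data$Y)

predictedY <- predict(model_matrix, X)
rmse <- sqrt(mean((data$Y - predictedY)^2))
print(rmse)
```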

Wen: Thank you very much! I did not expect that SVM could also solve regression problems!

This post helped me a lot!

Prantik: Hi, I want to use SVR on text. Can you please tell me how to proceed?

Alexandre KOWALCZYK (post author): Hello,

Sorry, but your question is too broad. As a first step, I suggest looking for papers on the subject.

anjana: Your papers are really superb and help me so much. One thing: how can I download the data set that was used in RStudio? Please let me get that data set.

Alexandre KOWALCZYK (post author): Hello. You can download the dataset and the code with this link. Thanks for pointing out that the link was broken.

mitesh: Hi,

Can you please help me find out how svm calculates probabilities when we use the predict function on a trained svm model? Please let me know the formula so that I can verify the probabilities manually.

Also, when I used predict on the svm, the output contained "probabilities" greater than one and less than zero.

Alexandre KOWALCZYK (post author): Hello,

You need to use Platt scaling.
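Here is a sketch of probability estimates in e1071, using the built-in `iris` data. Probabilities are only available for classification (a factor response). They must be requested both when training (`probability = TRUE`) and when predicting.

libsvm then fits a sigmoid to decision values obtained by an internal cross-validation (its variant of Platt scaling). For more than two classes, it combines the pairwise estimates. If the response is numeric, for example coded 0/1, `svm()` performs regression instead, and its outputs are not probabilities. That would be consistent with values below 0 or above 1.

```r
library(e1071)

# Factor response -> classification; probability = TRUE also fits the
# probability model (libsvm uses an internal cross-validation for this)
model <- svm(Species ~ ., data = iris, probability = TRUE)

# Probabilities must also be requested at prediction time
pred  <- predict(model, iris[c(1, 51, 101), ], probability = TRUE)
probs <- attr(pred, "probabilities")  # one column per class; each row sums to 1
print(probs)
```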

mitesh: Thanks Arun. I am using the following link as a reference: https://www.analyticsvidhya.com/...

I still have a few queries:

Is there a formula that reproduces the output of the predict function for an svm model with radial and polynomial kernels (the way we use Y = B0 + B1X1 + B2X2 + e in logistic regression and then pass it through the logistic function)?

I am using the tune.svm function, and the summary of the model gives me the best performance value, but this value changes with the same cost and gamma every time I run the code. How is this best performance value calculated?

Why does tune.svm not give stable cost and gamma values? I am setting cost = seq(from=1, to=100, by=5) and gamma = seq(from=0.0005, to=0.05, by=0.005). Is there any way to get stable values?

Lastly, with reference to the Analytics Vidhya link above, I use the isoreg technique, but the output of fits.isoreg gives only 13 unique levels out of 772 observations. This gives me an error when I use the cut function in R to bin the probability values (Error in cut.default(temp9$prob1, breaks = quantile(temp9$prob1, probs = seq(0, : 'breaks' are not unique). Can you please help me out?

Fergus: Thanks for the very clear tutorial on SVM; it is a very helpful introduction.

Can you explain why, in Step 4, when you first perform the grid search, epsilon is returned as 1e-04 even though it was set to cycle between 0 and 1 in increments of 0.1? Also, why did you ignore this value of epsilon and instead choose to zoom in between 0 and 0.2?

(BTW, towards the end of Step 4 I think there is a typo where the text says “From the graph you can see that models with C between 200 and 300 and ϵ between 0.8 and 0.9 have less error.” Should the range of epsilon values here be 0.08 to 0.09?)

Thank you.

Alexandre KOWALCZYK (post author): Hello Fergus. I corrected the two problems. I think both were typos. Thanks a lot for your comment.

Maliha Ashraf: Excellent tutorial. Can you tell me how to extract the function modeled by the SVM here?

BIBHUTI BHUSAN SAHOO: How do I calculate the Lagrange multipliers for nonlinear support vector regression in R (package e1071), for prediction problems?

Lokeshwar: Hi Kowalczyk,

I just read your article on SVM. Before reading it I had no knowledge of SVM, but now I understand it much better than I expected. It is very clear, useful and informative.

A very, very big thanks to you.

Lou: Hello, and thanks a lot for the tutorial.

Is it possible to calculate the AICc to evaluate the SVR model? I performed an eps-regression.

I tried to calculate the AICc using the MuMIn package, but I receive an error message:

Error in UseMethod("logLik") :

no applicable method for 'logLik' applied to an object of class "c('double', 'numeric')"

thanks.