# Support Vector Regression with R

In this article I will show how to use R to perform support vector regression.
We will first do a simple linear regression, then move on to support vector regression, so that you can see how the two behave with the same data.

## A simple data set

To begin with we will use this simple data set:

I just put some data in Excel. I prefer that to using an existing well-known data set, because the purpose of the article is not the data itself but the models we will use.

As you can see, there seems to be some kind of relation between our two variables X and Y, and it looks like we could fit a line which would pass near each point.

Let's do that in R!

## Step 1: Simple linear regression in R

Here is the same data in CSV format; I saved it in a file called regression.csv:

We can now use R to display the data and fit a line:

```r
# Load the data from the csv file
data <- read.csv("regression.csv")

# Plot the data
plot(data, pch=16)

# Create a linear regression model
model <- lm(Y ~ X, data)

# Add the fitted line to the plot
abline(model)
```


The code above displays the following graph:
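As a quick aside, the line drawn by abline is just $Y = a + bX$, and both coefficients can be read directly from the model object. A short base-R illustration on toy values (the article's actual regression.csv data is not reproduced in the text, so these numbers are made up):

```r
# toy data standing in for regression.csv
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

model <- lm(y ~ x)

# coef() returns a named vector with the intercept and the slope
coef(model)  # (Intercept) = 0.14, x (slope) = 1.96
```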

## Step 2: How good is our regression?

In order to compare the linear regression with the support vector regression, we first need a way to measure how good it is.

To do that, we will change our code a little bit to visualize each prediction made by our model:

```r
dataDirectory <- "D:/"
data <- read.csv(paste(dataDirectory, "regression.csv", sep = ""))

plot(data, pch=16)
model <- lm(Y ~ X, data)

# make a prediction for each X
predictedY <- predict(model, data)

# display the predictions
points(data$X, predictedY, col = "blue", pch=4)
```

This produces the following graph:

For each data point $X_i$ the model makes a prediction $\hat{Y}_i$, displayed as a blue cross on the graph. The only difference from the previous graph is that the dots are not connected to each other.

In order to measure how good our model is, we will compute how much error it makes. We can compare each $Y_i$ value with the associated predicted value $\hat{Y}_i$ and see how far apart they are with a simple difference. Note that the expression $\hat{Y}_i - Y_i$ is the error: if we make a perfect prediction, $\hat{Y}_i$ will be equal to $Y_i$ and the error will be zero.

If we square the error for each data point, sum them, and take the mean, we get the Mean Squared Error (MSE):

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(\hat{Y}_i - Y_i)^2$$

A common way to measure error in machine learning is the Root Mean Squared Error (RMSE), so we will use it instead. To compute the RMSE we just take the square root of the MSE:

$$RMSE = \sqrt{MSE}$$

Using R we can come up with the following code to compute the RMSE:

```r
rmse <- function(error)
{
  sqrt(mean(error^2))
}

error <- model$residuals  # same as data$Y - predictedY
predictionRMSE <- rmse(error)  # 5.703778
```

We now know that the RMSE of our linear regression model is 5.70. Let's try to improve it with SVR!

## Step 3: Support Vector Regression

In order to create an SVR model with R you will need the package e1071. So be sure to install it and to add the library(e1071) line at the start of your file.

Below is the code to make predictions with support vector regression:

```r
library(e1071)

model <- svm(Y ~ X, data)
predictedY <- predict(model, data)

points(data$X, predictedY, col = "red", pch=4)
```



As you can see, it looks a lot like the linear regression code. Note that we called the svm function (not svr!). That is because this function can also be used to perform classification with a Support Vector Machine: it automatically chooses classification if it detects that the response is categorical (if the variable is a factor in R).

The code draws the following graph:

This time the predictions are closer to the real values! Let's compute the RMSE of our support vector regression model.

```r
# /!\ this time svrModel$residuals is not the same as data$Y - predictedY
# so we compute the error like this
error <- data$Y - predictedY
svrPredictionRMSE <- rmse(error)  # 3.157061
```

As expected, the RMSE is better: it is now 3.15, compared to 5.70 before. But can we do better?

## Step 4: Tuning your support vector regression model

In order to improve the performance of the support vector regression, we will need to select the best parameters for the model.

In our previous example we performed an epsilon-regression: we did not set any value for epsilon ($\epsilon$), so it took its default value of 0.1. There is also a cost parameter which we can change to avoid overfitting.

The process of choosing these parameters is called hyperparameter optimization, or model selection. The standard way of doing it is a grid search: we train a lot of models for different couples of $\epsilon$ and cost, and choose the best one.

```r
# perform a grid search
tuneResult <- tune(svm, Y ~ X, data = data,
                   ranges = list(epsilon = seq(0, 1, 0.1), cost = 2^(2:9))
)
print(tuneResult)
# best performance: MSE = 8.371412, RMSE = 2.89
# epsilon 1e-04, cost 4

# Draw the tuning graph
plot(tuneResult)
```

There are two important points in the code above:

- we use the tune method to train models with $\epsilon = 0, 0.1, 0.2, \dots, 1$ and cost $= 2^2, 2^3, 2^4, \dots, 2^9$, which means it will train 88 models (it can take a long time)
- tuneResult returns the MSE; don't forget to convert it to RMSE before comparing the value with our previous model

The last line plots the result of the grid search:

On this graph we can see that the darker the region is, the better our model is (because the RMSE is closer to zero in darker regions).

This means we can try another grid search in a narrower range: we will try $\epsilon$ values between 0 and 0.2. It does not look like the cost value has an effect for the moment, so we will keep it as it is and see if that changes.
```r
tuneResult <- tune(svm, Y ~ X, data = data,
                   ranges = list(epsilon = seq(0, 0.2, 0.01), cost = 2^(2:9))
)
print(tuneResult)
plot(tuneResult)
```

We trained 168 different models with this small piece of code. As we zoomed in on the dark region, we can see that there are several darker patches. From the graph you can see that models with a cost between 200 and 300 and an $\epsilon$ between 0.08 and 0.09 have less error.

Luckily for us, we don't have to select the best model by eye: R allows us to get it very easily and use it to make predictions.

```r
tunedModel <- tuneResult$best.model
tunedModelY <- predict(tunedModel, data)

error <- data$Y - tunedModelY

# this value can be different on your computer
# because the tune method randomly shuffles the data
tunedModelRMSE <- rmse(error)  # 2.219642
```

We improved the RMSE of our support vector regression model again! If we want, we can visualize both of our models: the first SVR model is in red, and the tuned SVR model is in blue on the graph below.

I hope you enjoyed this introduction to Support Vector Regression with R. You can download the source code of this tutorial; each step has its own file.

## 153 thoughts on “Support Vector Regression with R”

1. Jose

Good stuff. How would this behave if, for example, I wanted to predict some more X variables that are not in the training set? Is this useful in those instances? If so, how? Many thanks

1. Alexandre KOWALCZYK Post author

You just need to use the predict method with two parameters: the trained model and your new data. This will give you the predicted values. This is useful because that is our original goal: we want to predict unseen data.

1. Joshua Dunn

I have tried predicting unseen data, but it always seems to underestimate the effect of it. For example, with temperature as my x-variable, if my SVR has not seen temperatures below zero degrees C (i.e. minus 2 degrees C), it effectively predicts them as it would zero. Would you be able to tell me what this is called, or point me in a direction to solve this? Regards

1. Alexandre KOWALCZYK Post author

It looks to me like you are overfitting your model to your training data. What you should try is to increase the weight of the regularization parameter (or use regularization if you were not).

2. Md. Moyazzem Hossain

Thank you very much. Actually I want to predict the future value of a univariate time series with SVM. I have used the library e1071. I am able to predict the value over the study period, but I want to forecast future values.

2. Liz

"we use the tune method to train models with ϵ = 0, 0.1, 0.2, ..., 1 and cost = 2^2, 2^3, 2^4, ..., 2^9 which means it will train 88 models (it can take a long time)"

Hello. Can you explain how the number 88 is calculated? Thank you.

1. Alexandre KOWALCZYK Post author

There are 11 values of epsilon and 8 values for the cost. We can associate each epsilon with the 8 cost values to create 8 couples. As there are 11 epsilons, there are $11 \times 8 = 88$ couples.

3. Fakhrul Agustriwan

Hello Mr. Kowalczyk. This tutorial is very helpful. Actually I am trying to forecast the future value of time-series data using the SVR method, but I am quite confused about how to perform it in R. Could you explain the steps on how to do it? Thank you 🙂

1. Alexandre KOWALCZYK Post author

Thanks for your comment. Unfortunately I have never used SVR to forecast time series. However, I found this question, and one of the answers points to this article. As suggested in the answer, you will need to transform the classification problem into a regression one, but this might be a good starting point for you.

4. loic

As I understand it, the SVM implemented in R uses the Radial Basis kernel by default. Therefore, there is another parameter (called gamma). How do you deal with this one? I think you should fit it also. One article mentioned taking the median of pairwise distances between the learning points (after the scaling process).

1. Alexandre KOWALCZYK Post author

You just need to add the gamma parameter in the tune function. There is an example in the e1071 package documentation:

obj <- tune.svm(Species~., data = iris, gamma = 2^(-1:1), cost = 2^(2:4))

5. loic

Ok, thanks for your reply. Using tune.svm I noticed that this function is very, very slow (around 3 seconds per configuration of parameters for 1000 observations of 7 variables). Surprisingly, if you use svm(..., cross = 10) you can get the cross-validation error in less than 0.5 seconds on the same data.
So I concluded that tune.svm was very badly coded; do you have any idea about this issue? In the end I coded my own parameter-tuning function using svm(..., cross=10). Also, I have found several papers that use a BFGS optimization algorithm (on a log2 scale) instead of grid search. I tried this, and it turned out to be very efficient.

1. Alexandre KOWALCZYK Post author

When you are using svm(..., cross = 10) you are performing a 10-fold cross-validation on the training data. This is not the same as doing a grid search. If the method tune.svm is so slow, it is not because it is poorly coded, but because it trains one svm model per combination of hyperparameters. So if you want to try gamma = 0.1, 0.01 and C = 1, 10, 100, for instance, it will train 6 different svm models ([0.1,1] [0.1,10] [0.1,100] [0.01,1] [0.01,10] [0.01,100]). In other words, it will try each couple in the cartesian product of the gamma set with the C set. If you try 10 values of gamma and 10 values of C, it will train 100 models, which should indeed be much slower than training only 10 models.

6. loic

That's not what I meant; I am aware of that, of course. But actually, I did the grid search "by hand" with a loop over 10x10 values of gamma and C using svm(..., cross = 10). So I called svm 100 times and then kept the minimum CV error. The overall time it took was something like 10 times less than calling tune.svm() once on a 10x10 grid. That is what made me think this function was poorly coded, or that it might use sophisticated techniques I am not aware of. I've been trying to find the reason, in vain. Actually, I am a bit doubtful about the results of svm(..., cross = 10): it seems that it does not compute the CV error in a stochastic way, and the results are only precise to one decimal digit, which is weird compared to tune.svm().

1. Alexandre KOWALCZYK Post author

I can't really help you more without seeing your code.
Maybe you can ask on Stack Overflow or Cross Validated if you want to dig deeper and understand what happens in your particular case. Feel free to post the link here afterward and I'll take a look.

2. Renan

Hi loic, I am very interested in your hand-coded version, because I have a lot of data to train and it takes a very, very long time. Could you send me this part? Thanks a lot, Renan

7. Spartan

Great tutorial for svm, clearly defining its function as a classifier or a regressor. Thanks, Alexandre.

8. HAP

Thank you for this valuable post. If I have more than one X variable, including some dummy variables, can I fit the SVR in that case?

9. Ilgaz

Hello, I read your blog posts. I am not very clear about how to forecast future values of a time series using SVR. It looks to me like SVR fits a model using the training set. But what about using predict() to predict future values (n.ahead values) in R? I couldn't find this feature so far. Sincerely, Ilgaz

10. Aseel

First of all, thanks for the very helpful tutorial. I'm using R 3.2.1, but svm doesn't work correctly. On step 3, when I'm running model <- svm(Y ~ X, data), the error is: Error in predict(ret, xhold, decision.values = TRUE) : unused argument (decision.values = TRUE). Can you please help me? Thanks,

1. Alexandre KOWALCZYK Post author

Hello. I don't have a lot of ideas about this one. You might want to take a look at this answer and try the provided solution. Otherwise I would advise you to try the code on another machine to see if it works, and if it does, try to replicate that environment on your machine. Best regards.

1. Aseel

I really appreciate your reply. I think the problem is that the same function name exists in two packages, for example the predict() function in both ".GlobalEnv" and "package:stats". I will try to figure out how to solve that. Thanks a lot,

1. Aseel

I've found that I had a function of my own with the same name as predict. So, simply, I've copied my function to another name and removed my predict function.
That was causing the confusion. Thanks again for helping me.

11. Espartaco

Hi Alexandre. Thank you so much for all the information. I have a few questions. 1. Can I use any kind of variables in an SVM? Continuous, categorical? 2. If I am using an SVM to classify two groups, is there a way to get a probability of assignment to each group? 3. How do you validate that the SVM is a good model?

1. Alexandre KOWALCZYK Post author

1. Yes. For continuous data it is called SVR, and SVM for categorical data. 2. Yes. Most frameworks provide a "predict probabilities" method to do so. 3. You use a score to measure the quality of your model; if you want to learn more, I recommend you this book.

12. Danny

Hi Alexandre, thanks for such a comprehensive tutorial. Much appreciated. I am trying to use SVR for predicting time series. As mentioned in your post, tune() shuffles the data. Is there any option or way not to shuffle the data?

1. Alexandre KOWALCZYK Post author

Hello. You can pass a tunecontrol parameter to specify the behavior of the tune method. I think tune.control(sampling = "fix") might suit your need.

1. tahir

Thank you very much, but when I use lssvm I get this message: (Using automatic sigma estimation (sigest) for RBF or laplace kernel Error in if (n != dim(y)[1]) stop("Labels y and data x dont match") : argument is of length zero)

1. Alexandre KOWALCZYK Post author

Hello tahir. Sorry, I can't help you more without a reproducible example. The best place to ask your question is http://www.stackoverflow.com. I hope you will find plenty of help there. Regards, Alexandre

2. Ely

Hi Alexandre, thank you for the script on SVM. I have a few questions: 1. What are some advantages of the LSSVM over SVM and ANN (artificial neural network)? Does the LSSVM have a tendency to overfit? 2. Do you have an R script to perform an LSSVM? How do you tune an LSSVM and cross-validate it?

13. Ankit

Hello, I am trying to use SVM in a classification setting.
All the variables/attributes in the dataset are qualitative. Will SVM work in this case, or do I need to convert the qualitative variables into quantitative ones before using it? Thanks, Ankit

14. Subhasri

Hi... Thank you for the superb article. I've been reading about SVMs for a few days now, and I have a doubt: we normalize data and then give that data as input to the SVM; can SVM be used for the actual normalization? And how do we determine a kernel function?

1. Alexandre KOWALCZYK Post author

The goal of SVM is not normalization, it is classification (or regression in the case of SVR). If your question was "How to select a kernel", this link might help you.

15. Minerva

Hello Sir. Is it possible to calculate the AIC of the SVM regression model (the way you would for linear regression)? If yes, how?

1. Alexandre KOWALCZYK Post author

Hello. I never used any package using Cuda or OpenCl with R. For Cuda I am using Python. Maybe you can find some information here.

16. Scott

Hi Alexandre, first of all -- excellent tutorial! Thanks for putting the time into this. I'm wondering if you can help answer a question. I have 125 independent observations of 200-wavelength NIR spectral data for a random set of samples: my X matrix. For each I have an independent scalar value of the concentration of a certain analyte, essentially my Y vector. My goal is to find the minimal set of important wavelengths that correlate best to Y. I've used PLS techniques combined with wavelength-selection methods to find a subset of wavelengths. Even with the full set of 200 wavelengths and 20 latent vectors I get a nominal fit to my Y concentration data. But of course, when you fit a PLS model, you hope to find a few PLS factors that explain most of the variation in both predictors and responses. Now, the regression coefficient profile (loadings) gives a direct indication of which predictors are most useful for predicting the dependent variable. OK, on to my question.
Once one finds a reasonable SVR fit, how does one extract which wavelengths/variables were weighted more highly than others? I like the use of SVR over PLS since, with the right kernel choice, it can incorporate potential nonlinearities in my data. But all the SVR packages I've looked at seem to lack the fit and weighting assessment plots that are found in PLS. Can you offer any thoughts/suggestions on this? Thanks!! -Scott

1. Alexandre KOWALCZYK Post author

Hi Scott. Thanks for your comment, I am glad people find my articles helpful. Unfortunately there is no built-in way to retrieve the relative importance of variables for SVR. You can try to remove one predictor and see how it impacts the performance of the SVR (but you will have to do it for each predictor, which can take a long time). I found this paper which might help you; it even has a video. I hope it helps you.

17. Sharda Tripathi

Hi Alexandre, your tutorial is very informative and easy to understand. Keep up the good work. I do, however, have a more conceptual question about SVR, not related to the SVR implementation in R. I have performed support vector regression on a time series. The residuals (i.e. actual value - predicted value) show strong autocorrelation. The autocorrelation plot of the residuals has a damped sinusoidal shape. I have read in the literature that the fitted model is not good if such is the case. I have tried transformations like the first difference and the log of the time series; still the result is the same. Does this autocorrelation imply that my model is not good? If so, what can I do to get rid of it? Any suggestion would be of great help 🙂 Thanks, Sharda

1. Alexandre KOWALCZYK Post author

Hello Sharda. Thanks for your comment. Indeed, this autocorrelation implies that your model is not perfect. You should think about your problem and see if you can add another independent variable. Adding it might remove the autocorrelation.
If it does not work, you can try other techniques like the Cochrane-Orcutt method or the AR(1) method as described in this chapter. Regards.

18. Amir

Hi, thank you for sharing. I am trying to perform multi-class classification for credit rating allocation at a bank. My data set involves a set of financial ratios. I have tried several techniques to perform the classification; however, the models' output is just like flipping a coin! I wonder what I should do. Any hint? I can send you my data set if required, and I really need help since this is also my master's thesis.

1. Alexandre KOWALCZYK Post author

Hello Amir. The first question I would ask myself is "Is what I am trying possible?", and if so, "Has somebody else done this? With which results? What is the state of the art?" That's why I often start by looking for papers, and I advise you to do so. If your models perform poorly, maybe your data is not clean. Did you preprocess it correctly? Did you use standardization or normalization? When in doubt, one approach might be to try your model on another, simpler dataset on which such a model usually performs well. If your model does not work well on it, it might be a programming or data-preparation mistake. I hope this helps you.

19. sajal

Thanks, Alexandre, for the nice and useful post. Is it possible to obtain the model equations directly from SVR (preferably the best fitted one) to apply on another platform, for example in MS Excel, based on the fitted models? If so, could you please explain a bit? Thanks in advance.

20. Rachel

Hi Alexandre. How do I apply SVM to univariate time series data to classify it into two categories (either normal or outlier)?

1. Alexandre KOWALCZYK Post author

Hi Rachel. Sorry, but this is a pretty broad question which would need a specific article to answer. You can go on this site to post such questions, but don't forget to do your own research first. Best regards.

21. lichenyu

I met the same problem loic refers to.
In fact, it is caused by the default setting of the function tune.svm(), which performs a 10-fold cross-validation, while svm() does no cross-validation by default. So tune() may take around 10 times longer than expected if you do not account for this.

22. Adhi

Thank you for the tutorial. I am curious: how do you do the manual calculation in SVR to get the function and the prediction value?

23. Shahriar SHAMILUULU

Hi Alexandre, thank you for the tutorial. I have a question: how can we get an equation for the model generated by SVR, i.e., the intercept, the coefficient for x, and R2? When I look at the summary, there is nothing like that. Thank you.

24. sumana

Good tutorial. Can you please tell me how SVM can be used to tell whether a dataset is linear or nonlinear?

1. Alexandre KOWALCZYK Post author

Well, you can't use SVM to know if data is linear or non-linear. However, if you achieve a very good score with an SVM and a linear kernel, it is most likely that the data is linearly separable.

25. dmitrio

Thanks for your tutorial. I'm forecasting time series data with 4 predictors (t-4, t-3, t-2, t-1) to predict the value at time t. Is there a rule for the minimum number of training samples needed to build an SVR model? Thanks

26. bahtiyar

Hi Alexandre, I am trying to implement SVR for time-series prediction, using univariate data. The data format I use is x-1, x-2, x-3, ..., x-n -> [x+1], where x+1 is the target value and x-1, x-2, x-3, ..., x-n are the attribute values. I use libsvm (e1071) in R to compute the prediction, and I get a high error value. Must I scale the data to [0,1] or [-1,+1] as in a classification problem? If so, I didn't find a parameter to set the range in R. In the e1071 manual, I only found a parameter to choose whether the data is scaled or not, like this: svm(...parameters..., scale = FALSE). Any suggestion for this? Thanks, bahtiyar

27. William

Thank you very much, Mr. KOWALCZYK! Thanks to your lm(Y ~ X, data) call I was able to successfully plot a regression line!
I had tried lm(y, x) before, but I kept getting the error "Error in if (noInt) { : argument is of length zero". Thanks again!

28. Pietro

Hi! Great tutorial! Can you please help me with these two things? 1) I have to use SVR in order to predict future values of energy consumption. My input is the day of the week and the output is the corresponding energy consumption value. How can I encode this input information? 2) I have one month of energy data: how do I divide the whole set into a training and a test set? How do I use the test set to validate the model? Thanks a lot!

1. Alexandre KOWALCZYK Post author

Hello Pietro. 1) For the day of the week you could use a number for each day (0 to 7), but this is not so good because there is an order between the numbers, so instead you should one-hot-encode it. For this you can use the OneHotEncoder provided by sklearn. 2) You can watch this video which explains everything 🙂

29. Alex

Hello, this is a very useful tutorial. Thanks. 😉 I am wondering how you can extract the coefficients of the SVM regression, just like the coefficients in a linear regression. Thanks in advance.

1. Alexandre KOWALCZYK Post author

You can use the coefs property of the svm object which is returned after training.

30. LT

Hi, this is a great tutorial. Just a question: after the SVM is trained, can we do a hand calculation, like we can with a simple model, to predict a value for a new variable set? Thanks.

1. Alexandre KOWALCZYK Post author

You can use the trained model to make a new prediction. I don't get what you mean by "hand calculation". If you want to do it by hand on paper, it would be tedious.

31. Vishesh Sahni

Hello. It's a great tutorial, so thanks for putting it here. I've modelled my data and obtained a graph. I want to predict the next value. How do I do it? Thanks a lot.

1. Alexandre KOWALCZYK Post author

Hello. Thank you for your comment. Unfortunately, your question is way too broad.
Maybe you can find a dedicated forum or a teacher to help you with this matter. Regards.

32. Chimezie

Hi Alexandre, can you please explain the Dispersion term in the SVM tuning process? What does Dispersion stand for? Regards

1. Alexandre KOWALCZYK Post author

Hello. I don't see what you mean. I don't see any dispersion term in e1071. Could you clarify?

33. RC

Thanks for this, very helpful. I know that with an SVM alone you usually cannot figure out which features led to the good prediction model, but I was wondering if there is a way to extract the features which are crucial in generating the predictedY with SVR? Essentially, I want to use SVR for feature selection. I am getting a pretty decent error (0.31) for my model, and I'd like to know which features have the highest weights enabling this. Any help would be greatly appreciated. Thank you.

34. LJ

Hi Alexandre, thanks for this, very helpful. I'm trying to test different SVM parameterizations in prediction problems using epsilon-svr and nu-svr. In what range should I test the parameters $\epsilon$, $\nu$ (nu) and C?

1. Alexandre KOWALCZYK Post author

You should test them using grid search. The particular values of the parameters differ greatly between problems, so you just have to do a grid search first and then try to narrow the range until you find values which give you satisfaction.

35. ram

Hi Alex, really great article! I have a few questions: how are epsilon and cost related to the model here? Where have you used the kernel in the above calculation? And how does the kernel impact the processing of the model?

1. Alexandre KOWALCZYK Post author

We did not specify the kernel parameter when we created the svm, so the kernel is "radial" by default (see the documentation). You will understand how epsilon and C affect the model by reading this article. Best regards

1.
Ruaa

Thanks a lot for your great tutorial; it has helped me a lot. I am working on a data set to predict electric load forecasting. My first question: should I normalize the data set first, or is it automatically normalized by svm? My second question is about tuning the parameters: how can I choose the starting point?

36. Emre

Hi Alex, it is a solid tutorial, thank you very much. I have a question and need to find the answer ASAP. I need to perform nu-SVM, which has the additional parameter "nu". Can you help me modify the svm code to obtain nu-svm code? Also, I am curious about how I can see the whole code of SVM in R. Is there any way to step into the svm function? Thank you very much.

37. SA

Great tutorial! How do I find a 95% confidence interval for a non-linear regression? I don't think you can use lm, right?

38. Jean-Pierre GERBIER

Very interesting tutorial, thanks a lot. If I am not too late... I don't understand why, with exactly the same data set and the same code snippets, I get a different result at step 3. I get:

```
predictedY
        1         2         3         4         5         6         7
 7.667638  6.323641  5.578090  5.453718  6.066055  7.625997 10.367069
        8         9        10        11        12        13        14
14.423447 19.718063 25.922760 32.519648 38.941073 44.724122 49.608321
       15        16        17        18        19        20
53.536553 56.570848 58.777275 60.144779 60.578090 59.961279
```

Thanks if you have time to help me. Jean-Pierre

39. Jean-Pierre GERBIER

Sorry, Alexandre... I made a mistake... absolutely sorry, and many thanks again for your great tutorial. Jean-Pierre

40.
Weiwei Liu

Hi Alex, thanks for your tutorial. I have one question which has been bothering me for a long time. I don't know if you are familiar with the caret package. I want to know the difference between the function train() in the caret package and tune() in the e1071 package. They are both training functions for the SVM, but when I use the same data I get different results. For example, if I use tune():

obj <- tune(svm, y~x, data = df, ranges = list(gamma = 2^(-2:2), cost = 2^(2:9), epsilon = seq(0,1,0.1)), tunecontrol = tune.control(sampling = "cross", cross=10))

I get the best parameters for cost, gamma and epsilon. But if I use train():

ctrl <- trainControl(method="repeatedcv", number=10, repeats=5, search="grid")
df.svm <- train(y~x, data=df, method="svmRadial", trControl=ctrl)

I get C with the corresponding MSE and Rsquared. Besides that, I also get gamma in the summary(df.svm) output, and the result is different from the result of tune() (such as C). So I want to ask which one I should choose. I hope I am understood; if not, please let me know. Thanks

1. Alexandre KOWALCZYK Post author

Hello Weiwei, my guess is that the internal routine of tune uses some kind of randomness to perform the cross-validation. Also, if your data set is small, the examples picked for each fold can make the result change considerably. If you wish to have a more detailed answer, posting a question on stackoverflow might help.

41. Harshith

Hello, I am trying to predict an unknown future variable using this method and I get the error:

Error in model.frame.default(formula = wb1$new_col ~ y + x1 + x2 + x3 + : invalid type (NULL) for the variable 'wb1$new_col'

new_col is the new column of values which I need to predict, and wb1 is the dataframe. I am trying to build an SVM regression model with that formula. Can you please help me? The code looks like this:

svmModel <- svm(formula = wb1$new_col ~ y+x1+x2+x3+x4+x5+x6+x7+x8+x9, data = training, kernel = "radial", cost = 32, gamma = 0.1, scale = FALSE)

1. Alexandre KOWALCZYK Post author

It looks to me that some value in new_col is NULL. Try replacing all NULL values with a number before running the code. If it works, check your data and your loading procedure to find where the null value comes from.

1. Harshith

Thanks a lot for the reply.
When I predict on the test set, the predicted values are those of the training data. How can I solve this? The code looks like this:

svmModel<-svm(formula = y ~x1+x2+x3+x4+x5+x6+x8, data = training, kernel = "radial", cost = 32,epsilon=0,C=0.1, gamma = 0.1,scale = FALSE)

pred <- predict(svmModel, newdata = testing[,-42]).
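A question that recurs in this thread is how to make predictions on rows the model was not trained on. A minimal, self-contained sketch (the column names x1, x2, y, the sizes, and all values here are made up, not Harshith's data):

```r
library(e1071)
set.seed(42)

# made-up data: 50 rows, one response y and two predictors
df <- data.frame(x1 = runif(50), x2 = runif(50))
df$y <- 2 * df$x1 + df$x2 + rnorm(50, sd = 0.05)

# hold out the last 10 rows as a test set
training <- df[1:40, ]
testing  <- df[41:50, ]

svmModel <- svm(y ~ x1 + x2, data = training)

# newdata must be a data frame containing the predictor columns
# named in the formula; predictions are made for the test rows only
pred <- predict(svmModel, newdata = testing)
length(pred)  # 10
```

The key point is that predict returns one value per row of newdata, so passing the test set (rather than the training set) is all that is needed.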

1. zied

Hello again, Just to clarify my previous message.
https://stat.ethz.ch/pipermail/r-help/2009-August/399845.html
and I tried
model\$coefs
But I got 21 coefficients. How can I use them to build the equation? I expected 13 (one for each of the 12 variables, plus the intercept).

I have another question: with a multivariable model, is it possible to apply a non-linear kernel?
You showed in the tutorial how to get the RMSE; is it possible to get R2?
Thanks,
Zied

42. prasun

Hi, I am using SVM for a classification problem. Do I need to create dummy variables for categorical variables before passing them to the SVM, or will it handle them on its own?

1. Alexandre KOWALCZYK Post author

Yes, it is recommended that you create dummy variables to encode categorical variables.
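
In R this encoding is often done with model.matrix, which expands a factor column into 0/1 dummy columns. A minimal sketch (the data frame and its columns are hypothetical, not from the tutorial):

```r
library(e1071)

# hypothetical toy data with one categorical predictor
df <- data.frame(y = factor(c("a", "a", "b", "b")),
                 x = c(1, 2, 3, 4),
                 color = factor(c("red", "blue", "red", "green")))

# model.matrix expands 'color' into dummy columns automatically
X <- model.matrix(~ x + color, df)[, -1]  # drop the intercept column

# fit the SVM on the numeric design matrix
svmModel <- svm(X, df$y, kernel = "radial")
```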

43. Maomao

Hi Alex,
Many thanks for your sharing. I have one question:
there is some missing information in my multiple X variables. How could I impute or deal with these missing values? Thanks.

1. Alexandre KOWALCZYK Post author

It is common to replace a missing value with the mean, but you can also replace it with the most frequent value or the median. There is some information about how to do it in Python on this page.
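
For example, mean imputation of a numeric column could look like this in R (the data frame and column name are hypothetical):

```r
# hypothetical data frame with missing values in x1
df <- data.frame(x1 = c(1.2, NA, 3.4, NA, 5.0))

# replace each NA by the mean of the observed values
df$x1[is.na(df$x1)] <- mean(df$x1, na.rm = TRUE)

# median imputation works the same way:
# df$x1[is.na(df$x1)] <- median(df$x1, na.rm = TRUE)
```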

1. Maomao

Alex, thanks for your answer, but how could I get the 95% CI and p-value of the AUC from an SVM model?

44. SAM

Hi Alexandre,

Thanks for the good example showing us how to use SVR with GS

Could you further show us how to use the particle swarm optimization to optimize the parameters?

1. Alexandre KOWALCZYK Post author

Hello. Thank you for your comment. I have never used particle swarm optimization, so I do not plan to write an article on the subject for the moment 😉

45. Alvin

Hi Alexandre,
Thank you very much for your post. I would like to ask: how do I perform multiple linear regression using support vector regression? Do you have any post on this, or do you know any other website that shows how this can be done using R? Thanks

1. Alexandre KOWALCZYK Post author

If you use a support vector machine you will be performing support vector regression, not multiple linear regression. You can give a vector as input to perform multivariate support vector regression if you wish.
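
With the e1071 formula interface, giving several inputs just means listing several predictors (the column names below are hypothetical, not from the tutorial's data set):

```r
library(e1071)

# Y, X1, X2, X3 are assumed to be columns of 'data';
# the formula lists every predictor explicitly
model <- svm(Y ~ X1 + X2 + X3, data)

# or use all remaining columns of 'data' as predictors:
model <- svm(Y ~ ., data)

predictedY <- predict(model, data)
```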

Currently we are working on a research paper for which we conducted a psychological experiment to obtain a data set. We then applied multiple regression to find the relation between the dependent variable and the independent variables. Our model was significant because the Sig. value was less than .05, and we found a good relation between the dependent and independent variables.

Now my idea is to develop a new algorithm which will have different mathematical equations, and all these equations will be based on that regression analysis. For example, if the regression analysis shows that humidity has a strong relation with rain, then we will say that "humidity is directly proportional to rain".

So my question is: do we have formal mathematical techniques or any software tool which can provide different equations according to a regression analysis?

Or

Will we manually derive the equations from that regression analysis?

47. lj

Hi Alexander,

I have some doubts. Do the kernel functions, with the exception of the linear one, also have a cost parameter (C)? How can I perform a grid search over both the cost parameter and the kernel parameters? Can you separate them?

48. noviyanti sagala

Hi Alexander,

How do we specify the training and testing data sets? I can't see that you used different data sets.

49. abhishek bansal

Great work! How do we find the coefficient of determination in SVR? Is there a command for that?

50. Mostafa

Hi Alexandre. Many thanks for your valuable tutorial. You mentioned that SVR also works when X is multidimensional. Could you please let me know how I can load a multidimensional X so that it runs with the following code:
model <- svm(Y ~ X , data)
predictedY <- predict(model, data)

Many thanks

Kind regards

Mostafa

51. harsha

Hi Alexandre,

Can I extend this regression tool to spatial modelling? Presently I am using random forest for spatial modelling. I tried Cubist but without much success. As far as I know, random forest can easily handle both continuous and categorical variables at the same time; is that possible with SVM as well?

1. Alexandre KOWALCZYK Post author

Yes, I think so. You just need to one-hot-encode the categorical variables.

52. Gaurav

How do we visualize both models? You said that the first SVR model is in red and the tuned SVR model is in blue on the graph. How do we plot that?

53. Jack

Hello Alex,

Have you ever tried to use AmiBroker for building and testing an SVM?
Anyone can do some research in Excel; however, AmiBroker is pretty fast when working on data arrays, and its formula language is very much C-like. Visualising the effectiveness of a set of parameters in 3D is also possible.

Thanks for a brilliant tutorial !

1. Alexandre KOWALCZYK Post author

Well, that is very unfortunate. Keep in mind that SVR is not the solution to every regression problem. Moreover, you should try to use machine learning to predict things for which you believe there is an underlying (unknown) relation. Maybe the relation between currency pairs is too random and cannot be predicted, or there is no relation, or it keeps changing.

54. Kaustubh

I am using this method for forecasting. I have assumed a linear model with 6 variables, so it has 6 parameters. This method forecasts the final output, but I want to know the values of the 6 parameters; that is, I want to find the model.

55. Stelios

Hello Alex,

I was wondering if you could develop (using your toolbox) a Support Vector Regression model based on a Gaussian RBF kernel, in which you need to choose C, γ and ε.

Kind Regards,

Stelios

1. Alexandre KOWALCZYK Post author

Hello Stelios. I do not see why it would not be possible. You need to look at the documentation of the R package to do so.
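
With e1071, those three parameters map directly onto arguments of svm(). A minimal sketch using the tutorial's Y ~ X data (the parameter values are placeholders, not recommendations):

```r
library(e1071)

# Gaussian (RBF) kernel SVR with explicit C, gamma and epsilon
model <- svm(Y ~ X, data,
             type    = "eps-regression",
             kernel  = "radial",
             cost    = 10,    # C: penalty for errors outside the tube
             gamma   = 0.5,   # RBF kernel width
             epsilon = 0.1)   # width of the epsilon-insensitive tube
```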

56. Samantha

Hello Alexandre,
Thanks for this good tutorial.
I would like to know how I can reproduce the predictions from the output given by R.
I have to do it in Excel (VBA) using the model parameters fitted by R.

Thanks.

1. Alexandre KOWALCZYK Post author

Hi Samantha,
I have never used Excel to do SVR so I am sorry I cannot help you on this matter.

57. Samantha

Ok thanks.
But maybe you know how R calculates the predictions from the parameters of the model?
I tried with a linear kernel but I couldn't reproduce the predictions given by predict.svm...

58. Anastasiya

Hi, Alexandre!

I've performed SVM and tuned the parameters (gamma and cost) by doing a grid search with 5-fold cross validation. But I also came across an article saying there is another option for finding this optimal combination by using some performance metric. So what I would like to do is find the optimal pair of gamma and cost which results in the highest cross-validated area under the receiver operating characteristic curve (AUC). Do you have any idea how this can be implemented?

1. Alexandre KOWALCZYK Post author

Hello Anastasiya,

The most common approach for tuning an SVM is indeed a grid search like you did. If you wish to find the best value, you can try a smaller grid search around the value which seems the best. There are also other, more complicated techniques, so if you really wish to find the optimal value it may be good to take a look at them. In section 3.2 of their guide, the libsvm authors say that the other methods are not really "better" as they depend on some heuristics or approximations. It may be worth a shot to look for papers on the subject and try some other methods. If you do, I would be interested in knowing your results.

Regards,

59. Chinmaya

Hi Alexandre,

Thanks for such a nice write-up.

I've seen examples where different powers of 10 are used for the cost; here you have used powers of 2. My question is whether it is significant to use powers of 2 or 10, or can we literally supply any list of values?

Is there any rule of thumb regarding the range of the cost?

1. Alexandre KOWALCZYK Post author

It does not really matter whether you use powers of 2 or powers of 10. The rule of thumb is that when you perform a "grid" search you make the grid smaller and smaller. For instance, you can try values between 10^0 and 10^5, and then you see that the best one is 10^3; so now you can perform a smaller grid search, between 500 and 1500 with increments of 100. If the best one is 800, you can try another search between 650 and 950 with increments of 50. In the end, doing a search that is too precise is often not worth the time, which is why you can be completely fine with the first value of 10^3. But if you really want to find the best C (and have the time), then refining your grid like that is the way to go. The same logic applies if you have more than one parameter to find: you need to find a set of parameters among all the possible combinations. Note that grid search is an empirical method and that there are other ways to find the best parameters.
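
A two-stage refinement like this could be sketched with tune() from e1071 on the tutorial's Y ~ X data (the grid values are illustrative):

```r
library(e1071)

# coarse grid: powers of 10
coarse <- tune(svm, Y ~ X, data = data,
               ranges = list(cost = 10^(0:5)))
coarse$best.parameters  # suppose the best cost is around 10^3

# finer grid around the coarse winner
fine <- tune(svm, Y ~ X, data = data,
             ranges = list(cost = seq(500, 1500, by = 100)))
fine$best.parameters
```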

60. Sarah

Thank you for your valuable information. I have a few questions.

1. What machine learning algorithm can be applied for text classification, such as tweets from Twitter, with the best accuracy and easiest implementation?

2. What programming language can I use to build a web-based system with an ML algorithm embedded in it? I'm currently thinking of .NET but I don't know if I can use the classifier there.

3. Is using an analytical tool such as AlchemyAPI, which is based on a deep learning algorithm, enough for text classification, or do I need to apply an algorithm such as SVM?

1. Alexandre KOWALCZYK Post author

Hello Sarah,

1. Basically, any machine learning algorithm which can deal with text data. There is no single algorithm better than all the others; you have to test by yourself on your specific case.
2. In .NET you can use Accord.NET, which is a pretty good framework; however, you can also create websites in Python and use scikit-learn, and in a lot of other languages too.
3. Using this API might be a good idea if you are not very inclined towards programming. Once again, it depends on what you want to do and what it can do.

Regards,

61. JEW DAS

Hi,
I have more than one independent variable (i.e. X is more than one variable) but one dependent variable (Y). How do I do this multiple regression?

1. Alexandre KOWALCZYK Post author

Hello,

Sorry, but your question is too broad. As a first step I suggest you try to find out whether there are papers on the subject.