Shweta Sehgal Dec 26, 2018

In our previous article, ‘Beginners’ Guide to SVM’, we discussed the concepts and basics one needs to know to learn Support Vector Machines.

Now let’s talk about how to implement it in R, walking through a simple hands-on example below.

The data set we use here is called **iris**. It ships with base R, so no download is needed, and it is widely used for practising classification problems. Note that it has three classes, so this is a multi-class rather than a binary classification problem.

Saving the values of the predictor variables in x

**x <- subset(iris, select = -Species)**

Saving the values of the dependent variable **Species** in **y**

**y <- iris$Species**
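As a quick sanity check (a base-R sketch of the two assignments above), confirm the shapes and class labels before modelling:

```r
# iris ships with base R: 150 observations of 4 numeric
# predictors plus the 3-level factor Species
x <- subset(iris, select = -Species)  # predictor columns only
y <- iris$Species                     # response factor

dim(x)     # 150 rows, 4 columns
levels(y)  # "setosa" "versicolor" "virginica"
table(y)   # 50 observations per class
```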

The package **e1071** should be installed first with **install.packages("e1071")**.

We load the package before using the functions it provides.

**library(e1071)**

Saving the svm model in **svm_model**

**svm_model <- svm(Species ~ ., data = iris)**

The summary of the model shows the default values of the svm parameters.

**summary(svm_model)**

Now we predict the values of y with the function **predict()** and save the result in **pred1**.

**pred1 <- predict(svm_model, x)**

The **confusionMatrix()** function from the **caret** package compares the predicted values with the original values to measure the accuracy of the model. From the summary below, we have achieved an accuracy of **97.33 %**, with misclassifications in both **versicolor** and **virginica**.

**library(caret)**

**confusionMatrix(pred1, y)**
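If you prefer to avoid the caret dependency, the same comparison can be sketched in base R with **table()**, rebuilding the model and predictions from the steps above:

```r
library(e1071)

x <- subset(iris, select = -Species)
y <- iris$Species

# fit with default parameters and predict on the training data
svm_model <- svm(Species ~ ., data = iris)
pred1 <- predict(svm_model, x)

# cross-tabulate predicted vs actual labels
tab <- table(predicted = pred1, actual = y)
print(tab)

# overall accuracy: correct predictions lie on the diagonal
accuracy <- sum(diag(tab)) / sum(tab)
print(accuracy)  # should match the ~97 % reported above
```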

**PARAMETER TUNING**

**TYPE**

SVM models can be classified into four distinct groups:

- C-classification
- nu-classification
- eps-regression
- nu-regression

The **C** and **nu** regularization parameters impose a penalty on the errors made while separating the classes, which helps improve the accuracy of the output. **C** ranges over **(0, infinity)**, which can make it hard to estimate and use. **nu** is a reparameterization that operates between **0 and 1**, and its advantage is that it gives direct control over the number of support vectors.
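The type is chosen through the **type** argument of **svm()**. A minimal sketch contrasting the two classification types on iris (the values cost = 1 and nu = 0.1 are illustrative, not recommendations):

```r
library(e1071)

# default: C-classification, with cost (C) in (0, Inf)
m_c <- svm(Species ~ ., data = iris,
           type = "C-classification", cost = 1)

# nu-classification: nu in (0, 1] upper-bounds the fraction of
# margin errors and lower-bounds the fraction of support vectors
m_nu <- svm(Species ~ ., data = iris,
            type = "nu-classification", nu = 0.1)

m_c$tot.nSV   # total number of support vectors in each model
m_nu$tot.nSV
```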

**Kernel**

The kernel parameter selects the type of decision boundary used to separate the data. The ‘linear’ kernel uses a linear hyperplane, while the ‘radial’ and ‘polynomial’ kernels produce non-linear decision boundaries.

The best way to determine the kernel is to check which one works well for the data. The linear kernel will work fine if the dataset is linearly separable; if it is not, a linear kernel will lead to a high number of misclassifications.

The radial kernel’s decision boundary is also linear, but only in a transformed space: the radial kernel creates **non-linear combinations of features** to lift the samples into a **higher-dimensional feature space**, where a linear decision boundary can separate the classes.

Another question is: what if a linear and a non-linear kernel work equally well on a dataset?

In that case we choose the simpler linear SVM: firstly, the linear kernel gives a simpler (parametric) model; secondly, the complexity of the radial kernel grows with the size of the training set, so it is more expensive to train.
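The comparison above can be sketched by fitting both kernels and measuring training accuracy (a proper comparison would use held-out data or cross-validation; this is just to show the mechanics):

```r
library(e1071)

# training accuracy of a fitted svm on the full iris data
acc <- function(model) {
  mean(predict(model, iris) == iris$Species)
}

m_linear <- svm(Species ~ ., data = iris, kernel = "linear")
m_radial <- svm(Species ~ ., data = iris, kernel = "radial")

acc(m_linear)
acc(m_radial)
```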

**Gamma**

Gamma is a parameter of the non-linear kernels. The higher the gamma value, the harder the model tries to fit the training data exactly. When gamma is **very small**, the model cannot capture the complexity of the data: the region of influence of any selected support vector includes the whole training set, even points far from the separating boundary. When gamma is **too large**, the radius of the area of influence of each support vector shrinks until it includes little beyond the support vector itself, and the model overfits the training data. Choosing gamma at an intermediate level is therefore desirable.
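A quick sketch of this effect: the same radial model fit with a very small, a moderate, and a very large gamma (the gamma values here are arbitrary illustrations; note that training accuracy will flatter large gamma precisely because it overfits):

```r
library(e1071)

for (g in c(0.001, 0.1, 100)) {
  m <- svm(Species ~ ., data = iris, kernel = "radial", gamma = g)
  train_acc <- mean(predict(m, iris) == iris$Species)
  cat("gamma =", g, " training accuracy =", round(train_acc, 3), "\n")
}
```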

**Cost**

Cost controls the trade-off between the width of the margin and the accuracy of the model on the training data; it tells the SVM how much misclassification to avoid. For large values of cost, there will be a smaller-margin hyperplane that does a better job of classifying all the training points correctly. A smaller value of cost leads to a larger-margin hyperplane that misclassifies more points.
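One observable consequence of this trade-off is the support-vector count stored on the fitted model. A sketch over a few illustrative cost values (on easy data, larger cost typically permits fewer margin violations and so fewer support vectors):

```r
library(e1071)

for (C in c(0.01, 1, 100)) {
  m <- svm(Species ~ ., data = iris, kernel = "radial", cost = C)
  cat("cost =", C, " support vectors =", m$tot.nSV, "\n")
}
```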

**Degree**

Degree is a parameter used when the kernel is set to ‘polynomial’. It is the degree of the polynomial used to find the decision boundary that splits the data.
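For example, a cubic polynomial kernel (degree = 3 is also e1071’s default when kernel = "polynomial", so it is spelled out here only for illustration):

```r
library(e1071)

# polynomial kernel; degree controls the order of the polynomial
m_poly <- svm(Species ~ ., data = iris,
              kernel = "polynomial", degree = 3)
summary(m_poly)
```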

Now we use the function **tune()** to search the given parameter ranges for the best values for this dataset.

**svm_tune <- tune(svm, train.x = x, train.y = y,**

**kernel = "radial", ranges = list(cost = c(0.1, 1, 10, 100, 1000), gamma = c(0.1, 1, 10, 100)))**

Printing **svm_tune** shows that the best value for **cost** is **10** and for **gamma** is **0.1**.

**print(svm_tune)**
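The object returned by **tune()** also exposes the winning configuration and an already-refitted model directly, so you don’t have to retype the parameters by hand. A sketch re-running the tuning (with a smaller grid here, purely for speed):

```r
library(e1071)

x <- subset(iris, select = -Species)
y <- iris$Species

svm_tune <- tune(svm, train.x = x, train.y = y, kernel = "radial",
                 ranges = list(cost = c(0.1, 1, 10), gamma = c(0.1, 1)))

svm_tune$best.parameters   # data frame with the winning cost/gamma
svm_tune$best.performance  # its cross-validation error
best_model <- svm_tune$best.model  # svm refit with those values
```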

We save the tuned model in **svm_model_after_tune**

**svm_model_after_tune <- svm(Species ~ ., data = iris, kernel = "radial", cost = 10, gamma = 0.1)**

Predicting the values of y with the function **predict()** and saving the result in **pred2**.

**pred2 <- predict(svm_model_after_tune, x)**

From the updated confusion matrix and model accuracy below, it is clear that the model has improved after a bit of parameter tuning. The accuracy is now **98 %** and the misclassification is confined to **virginica**.

**confusionMatrix(pred2, y)**

Parameter tuning does not end here; the **svm()** function has many more options. For more information, see the links below:

- https://www.rdocumentation.org/packages/e1071/versions/1.7-0/topics/svm
- https://www.rdocumentation.org/packages/e1071/versions/1.7-0/topics/tune

You can also train an SVM through the **caret** package. Here is the link to the SVM models available in **caret**:

http://topepo.github.io/caret/train-models-by-tag.html#support-vector-machines
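A minimal caret sketch of the same task (method "svmRadial" uses the kernlab backend, so kernlab must also be installed; there the parameter sigma plays gamma’s role, and the grid size is just illustrative):

```r
library(caret)

set.seed(1)  # cross-validation folds are random
fit <- train(Species ~ ., data = iris, method = "svmRadial",
             trControl = trainControl(method = "cv", number = 5),
             tuneLength = 3)  # try 3 values of the cost parameter
print(fit)
```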

SVM is a very useful technique for **non-linear** and **high-dimensional data**, but it has some drawbacks. Firstly, it is **complex**, and people often prefer simpler, more easily interpretable models. Secondly, the **correct choice of kernel parameters is crucial** for obtaining good results, which means extensive experimentation with the parameters is needed before relying on the model. Furthermore, a tuned model may give excellent classification accuracy on problem A but poor classification accuracy on problem B.

*This article has been contributed by our student Saptarshi Mukherjee.*
