SVM - Understanding the math - Part 2

This is Part 2 of my series of tutorials about the math behind Support Vector Machines.
If you did not read the previous article, you might want to start the series at the beginning by reading this article: an overview of Support Vector Machine.

In the first part, we saw what the aim of the SVM is: to find the hyperplane which maximizes the margin.

But how do we calculate this margin?

SVM = Support VECTOR Machine

In Support Vector Machine, there is the word vector.
That means it is important to understand vectors well and how to use them.

Here is a short summary of what we will see today:

  • What is a vector?
    • its norm
    • its direction
  • How to add and subtract vectors?
  • What is the dot product?
  • How to project a vector onto another?

Once we have all these tools in our toolbox, we will then see:

  • What is the equation of the hyperplane?
  • How to compute the margin?

What is a vector?

If we define a point A (3,4) in \mathbb{R}^2 we can plot it like this.

Figure 1: a point

Definition: Any point x = (x_1, x_2), x\neq0, in \mathbb{R}^2 specifies a vector in the plane, namely the vector starting at the origin and ending at x.

This definition means that there exists a vector between the origin and A.


Figure 2 - a vector

If we say that the point at the origin is the point O (0,0) then the vector above is the vector \vec{OA}. We could also give it an arbitrary name such as  \mathbf{u}.

Note: You may notice that we write vectors either with an arrow on top of them or in bold. In the rest of this text I will use the arrow when there are two letters, like \vec{OA}, and the bold notation otherwise.

Ok so now we know that there is a vector, but we still don't know what a vector IS.

Definition: A vector is an object that has both a magnitude and a direction.

We will now look at these two concepts.

1) The magnitude

The magnitude or length of a vector x is written \|x\| and is called its norm.
For our vector \vec{OA}, \|OA\| is the length of the segment OA.


Figure 3

From Figure 3 we can easily calculate the distance OA using Pythagoras' theorem:

OA^2 = OB^2 + AB^2

OA^2 = 3^2 + 4^2

OA^2 = 25

OA = \sqrt{25}

\|OA\| =OA=5
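The same computation can be checked numerically. This is only a minimal sketch in plain Python; the names OA and norm_OA are mine and are chosen for illustration.

```python
import math

# Point A(3, 4) from Figure 1, viewed as the vector OA starting at the origin.
OA = (3, 4)

# Pythagoras' theorem: the norm is the square root of the sum of the squares.
norm_OA = math.sqrt(OA[0] ** 2 + OA[1] ** 2)

print(norm_OA)  # 5.0
```

The standard library also provides math.hypot(3, 4), which computes the same length in one call.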

2) The direction

The direction is the second component of a vector.

Definition : The direction of a vector \mathbf{u} (u_1,u_2) is the vector  \mathbf{w}(\frac{u_1}{\|u\|}, \frac{u_2}{\|u\|})

Where do the coordinates of \mathbf{w} come from?

Understanding the definition

To find the direction of a vector, we need to use its angles.


Figure 4 - direction of a vector

Figure 4 displays the vector \mathbf{u} (u_1,u_2) with u_1=3 and u_2=4

We could say that :

Naive definition 1: The direction of the vector \mathbf{u} is defined by the angle \theta with respect to the horizontal axis, and with the angle \alpha with respect to the vertical axis.

This is tedious. Instead of that we will use the cosine of the angles.

In a right triangle, the cosine of an angle \beta is defined by :

cos(\beta) = \frac{adjacent}{hypotenuse}

In Figure 4 we can see that we can form two right triangles, and in both cases the adjacent side will be on one of the axes. This means that the definition of the cosine implicitly contains the axis related to an angle. We can rephrase our naive definition to :

Naive definition 2: The direction of the vector \mathbf{u} is defined by the cosine of the angle \theta and the cosine of the angle \alpha.

Now if we look at their values :

cos(\theta) = \frac{u_1}{\|u\|}

cos(\alpha) = \frac{u_2}{\|u\|}

Hence the original definition of the vector \mathbf{w}. That's why its coordinates are also called direction cosines.

Computing the direction vector

We will now compute the direction of the vector \mathbf{u} from Figure 4:

cos(\theta)=\frac{u_1}{\|u\|}=\frac{3}{5} =0.6

cos(\alpha)=\frac{u_2}{\|u\|}=\frac{4}{5} =0.8

The direction of \mathbf{u}(3,4) is the vector \mathbf{w}(0.6,0.8)

If we draw this vector we get Figure 5:

Figure 5: the direction of u

We can see that \mathbf{w} has indeed the same look as \mathbf{u} except it is smaller. Something interesting about direction vectors like \mathbf{w} is that their norm is equal to 1. That's why we often call them unit vectors.
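To make the definition concrete, here is a small sketch that computes the direction of \mathbf{u}(3,4) and checks that its norm is 1. The helper name direction is my own and is not part of any library.

```python
import math

def direction(u):
    """Return the direction (unit vector) of a non-zero 2D vector u."""
    norm = math.hypot(u[0], u[1])
    return (u[0] / norm, u[1] / norm)

u = (3, 4)
w = direction(u)

print(w)                       # (0.6, 0.8)
print(math.hypot(w[0], w[1]))  # the norm of w, equal to 1 up to rounding
```

The norm is always 1 because dividing every coordinate by \|u\| divides the length by \|u\| as well.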

The sum of two vectors


Figure 6: two vectors u and v

Given two vectors \mathbf{u} (u_1, u_2) and \mathbf{v} (v_1, v_2) then :

\mathbf{u}+\mathbf{v}= (u_1+v_1, u_2+v_2)

Which means that adding two vectors gives us a third vector whose coordinates are the sum of the coordinates of the original vectors.

You can convince yourself with the example below:


Figure 7: the sum of two vectors

The difference between two vectors

The difference works the same way :

\mathbf{u}-\mathbf{v}= (u_1-v_1, u_2-v_2)


Figure 8: the difference of two vectors

Since the subtraction is not commutative, we can also consider the other case:

\mathbf{v}-\mathbf{u}= (v_1-u_1, v_2-u_2)


Figure 9: the difference v-u
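The sum and the two differences can be sketched in a few lines of Python. The values of u and v below are hypothetical: they are chosen for illustration and are not read off the figures.

```python
def add(u, v):
    """Coordinate-wise sum of two 2D vectors."""
    return (u[0] + v[0], u[1] + v[1])

def sub(u, v):
    """Coordinate-wise difference u - v of two 2D vectors."""
    return (u[0] - v[0], u[1] - v[1])

# Hypothetical example values, not taken from the figures.
u = (3, 1)
v = (2, 5)

print(add(u, v))  # (5, 6)
print(sub(u, v))  # (1, -4)
print(sub(v, u))  # (-1, 4)
```

Note that u - v and v - u have the same norm and point in opposite directions.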

The last two pictures describe the "true" vectors generated by the difference of \mathbf{u} and \mathbf{v}.

However, since a vector has a magnitude and a direction, we often consider that parallel translates of a given vector (vectors with the same magnitude and direction but with a different origin) are the same vector, just drawn in a different place in space.

So don't be surprised if you meet the following :


Figure 10: another way to view the difference v-u



Figure 11: another way to view the difference u-v

If you do the math, it looks wrong, because the end of the vector \mathbf{u-v} is not at the right point, but it is a convenient way of thinking about vectors which you'll encounter often.

The dot product

One very important notion for understanding SVM is the dot product.

Definition: Geometrically, it is the product of the Euclidean magnitudes of the two vectors and the cosine of the angle between them.

Which means if we have two vectors \mathbf{x} and \mathbf{y} and there is an angle \theta  (theta) between them, their dot product is :

 \mathbf{x} \cdot \mathbf{y} = \|x\| \|y\|cos(\theta)

Why ?

To understand let's look at the problem geometrically.

Figure 12

The definition mentions cos(\theta), so let's see what it is.

By definition we know that in a right-angled triangle:

cos(\theta) = \frac{adjacent}{hypotenuse}

In our example, we don't have a right-angled triangle.

However if we take a different look at Figure 12 we can find two right-angled triangles formed by each vector with the horizontal axis.

Figure 13

Figure 14

So now we can view our original schema like this:

Figure 15

We can see that

 \theta = \beta - \alpha

So computing cos(\theta) is like computing cos(\beta - \alpha)

There is a special formula called the difference identity for cosine which says that:

cos(\beta - \alpha) = cos(\beta)cos(\alpha) + sin(\beta)sin(\alpha)

(if you want you can read  the demonstration here)
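If you prefer a quick numerical sanity check to a proof, the identity can be tested on any pair of angles. The two angles below are arbitrary values I picked, in radians.

```python
import math

# Arbitrary angles in radians, chosen only for this check.
beta, alpha = 1.0, 0.3

lhs = math.cos(beta - alpha)
rhs = math.cos(beta) * math.cos(alpha) + math.sin(beta) * math.sin(alpha)

# Both sides agree up to floating point rounding.
print(math.isclose(lhs, rhs))  # True
```

This does not prove the identity, it only shows that both sides agree for these particular values.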

Let's use this formula!

 cos(\beta) =\frac{adjacent}{hypotenuse} =\frac{x_1}{\|x\|}

 sin(\beta) =\frac{opposite}{hypotenuse} =\frac{x_2}{\|x\|}

 cos(\alpha) =\frac{adjacent}{hypotenuse} =\frac{y_1}{\|y\|}

 sin(\alpha) =\frac{opposite}{hypotenuse} = \frac{y_2}{\|y\|}

So if we replace each term

cos(\theta) = cos(\beta - \alpha) = cos(\beta)cos(\alpha) + sin(\beta)sin(\alpha)

cos(\theta) = \frac{x_1}{\|x\|}\frac{y_1}{\|y\|}+ \frac{x_2}{\|x\|}\frac{y_2}{\|y\|}

cos(\theta) = \frac{x_1y_1 + x_2y_2}{\|x\|\|y\|}

If we multiply both sides by \|x\|\|y\| we get:

\|x\|\|y\|cos(\theta) = x_1y_1 + x_2y_2

Which is the same as :

\|x\|\|y\|cos(\theta) = \mathbf{x} \cdot \mathbf{y}

We just found the geometric definition of the dot product ! 

Finally, from the last two equations we can see that :

\mathbf{x} \cdot \mathbf{y} =x_1y_1 + x_2y_2 = \sum_{i=1}^{2}(x_iy_i)

This is the algebraic definition of the dot product !
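Both definitions can be compared numerically. The sketch below is only an illustration: the vectors x and y are values I chose, and the angles are recovered with math.atan2 as the angle each vector makes with the horizontal axis.

```python
import math

def dot(x, y):
    """Algebraic dot product: sum of the coordinate-wise products."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(x, x))

# Example vectors chosen for illustration.
x = (3, 5)
y = (8, 2)

# beta and alpha are the angles of x and y with the horizontal axis.
beta = math.atan2(x[1], x[0])
alpha = math.atan2(y[1], y[0])
theta = beta - alpha

algebraic = dot(x, y)                            # 3*8 + 5*2
geometric = norm(x) * norm(y) * math.cos(theta)  # ||x|| ||y|| cos(theta)

print(algebraic)                           # 34
print(math.isclose(algebraic, geometric))  # True
```

In practice the algebraic form is the one we compute with, because it needs no angle at all.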

 A few words on notation

The dot product gets its name from the dot we write between the two vectors.
Talking about the dot product \mathbf{x} \cdot \mathbf{y} is the same as talking about

  • the inner product \langle x,y \rangle (in linear algebra)
  • the scalar product, because we take the product of two vectors and it returns a scalar (a real number)

The orthogonal projection of a vector

Given two vectors \mathbf{x} and \mathbf{y}, we would like to find the orthogonal projection of \mathbf{x} onto \mathbf{y}.

Figure 16

To do this we project the vector \mathbf{x} onto \mathbf{y}


Figure 17

This gives us the vector \mathbf{z}

Figure 18 : z is the projection of x onto y

By definition :

cos(\theta)= \frac{\|z\|}{\|x\|}

\|z\| = \|x\|cos(\theta)

We saw in the section about the dot product that

 cos(\theta) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|x\|\|y\|}

So we replace cos(\theta) in our equation:

\|z\|=\|x\|\frac{\mathbf{x} \cdot \mathbf{y}}{\|x\|\|y\|}

\|z\|=\frac{\mathbf{x} \cdot \mathbf{y}}{\|y\|}

If we define the vector \mathbf{u} as the direction of \mathbf{y} then

\mathbf{u} = \frac{\mathbf{y}}{\|y\|}

and

\|z\|=\mathbf{u} \cdot \mathbf{x}

We now have a simple way to compute the norm of the vector \mathbf{z}.
Since this vector is in the same direction as \mathbf{y} it has the direction \mathbf{u}:

\mathbf{u} = \frac{\mathbf{z}}{\|z\|}

\mathbf{z} = \|z\|\mathbf{u}

And we can say :

The vector \mathbf{z} = (\mathbf{u} \cdot \mathbf{x})\mathbf{u} is the orthogonal projection of \mathbf{x} onto \mathbf{y}.

Why are we interested in the orthogonal projection ? Well in our example, it allows us to compute the distance between \mathbf{x} and the line which goes through \mathbf{y}.


Figure 19

We see that this distance is \|x-z\|

\|x-z\| = \sqrt{(3-4)^2 + (5-1)^2}=\sqrt{17}
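Here is a sketch of the whole projection in Python. The computation above uses \mathbf{x}=(3,5) and \mathbf{z}=(4,1); the value \mathbf{y}=(8,2) below is an assumption on my part, and any vector pointing in the same direction would give the same \mathbf{z}. The helper names dot and project are mine.

```python
import math

def dot(x, y):
    """Dot product of two vectors of the same length."""
    return sum(xi * yi for xi, yi in zip(x, y))

def project(x, y):
    """Orthogonal projection of x onto the line spanned by y."""
    norm_y = math.sqrt(dot(y, y))
    u = tuple(yi / norm_y for yi in y)    # direction of y
    scale = dot(u, x)                     # u . x
    return tuple(scale * ui for ui in u)  # z = (u . x) u

x = (3, 5)
y = (8, 2)  # assumed value, see the note above

z = project(x, y)
distance = math.sqrt(sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

print(z)         # approximately (4.0, 1.0)
print(distance)  # approximately sqrt(17), about 4.123
```

One caveat: u . x is negative when the angle between the two vectors is larger than 90 degrees, so in general the norm of \mathbf{z} is the absolute value of u . x.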

The SVM hyperplane

Understanding the equation of the hyperplane

You probably learnt that an equation of a line is : y = ax + b. However when reading about hyperplanes, you will often find that the equation of a hyperplane is defined by :

\mathbf{w}^T\mathbf{x} = 0

How do these two forms relate ?
In the hyperplane equation you can see that the names of the variables are in bold, which means that they are vectors ! Moreover, \mathbf{w}^T\mathbf{x} is how we compute the inner product of two vectors, and if you recall, the inner product is just another name for the dot product !

Note that

 y = ax + b

is the same thing as

y - ax - b= 0

Given two vectors  \mathbf{w}\begin{pmatrix}-b\\-a\\1\end{pmatrix} and \mathbf{x}\begin{pmatrix}1\\x\\y\end{pmatrix}

\mathbf{w}^T\mathbf{x} = -b\times (1) + (-a)\times x + 1 \times y

\mathbf{w}^T\mathbf{x} = y - ax - b

The two equations are just different ways of expressing the same thing.
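A short sketch can confirm that both forms give the same number. The slope a = 2 and the intercept b = 1 are hypothetical values I picked for the example, and line_value is a helper name of my own.

```python
def dot(w, x):
    """Dot product, i.e. w^T x for two column vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical line y = 2x + 1, chosen for illustration.
a, b = 2.0, 1.0
w = (-b, -a, 1.0)

def line_value(x, y):
    """Evaluate w^T x with the augmented vector (1, x, y); equals y - a*x - b."""
    return dot(w, (1.0, x, y))

print(line_value(3.0, 7.0))  # 0.0, the point (3, 7) is on the line
print(line_value(3.0, 9.0))  # 2.0, the point (3, 9) is above the line
```

A point is on the line exactly when the dot product is zero, which is what the hyperplane equation says.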

It is interesting to note that w_0 is -b, which means that this value determines the intersection of the line with the vertical axis.

Why do we use the hyperplane equation \mathbf{w}^T\mathbf{x} instead of  y = ax + b ?

For two reasons:

  • it is easier to work in more than two dimensions with this notation,
  • the vector \mathbf{w} will always be normal to the hyperplane (Note: I received a lot of questions about this last remark. \mathbf{w} will always be normal because we use this vector to define the hyperplane, so by definition it will be normal. As you can see on this page, when we define a hyperplane, we suppose that we have a vector that is orthogonal to the hyperplane)

And this last property will come in handy to compute the distance from a point to the hyperplane.

Compute the distance from a point to the hyperplane

In Figure 20 we have a hyperplane, which separates two groups of data.

Figure 20

To simplify this example, we have set w_0 = 0.

As you can see in Figure 20, the equation of the hyperplane is :

x_2 = -2x_1

which is equivalent to

\mathbf{w}^T\mathbf{x} = 0

with \mathbf{w}\begin{pmatrix}2 \\1\end{pmatrix}  and \mathbf{x} \begin{pmatrix}x_1 \\ x_2\end{pmatrix}

Note that the vector \mathbf{w} is shown in Figure 20. (w is not a data point)

We would like to compute the distance between the point A(3,4) and the hyperplane.

This is the distance between A and its projection onto the hyperplane

Figure 21

We can view the point A as a vector from the origin to A.
If we project it onto the normal vector \mathbf{w}

Figure 22 : projection of a onto w

We get the vector \mathbf{p}

Figure 23: p is the projection of a onto w

Our goal is to find the distance between the point A(3,4) and the hyperplane.
We can see in Figure 23 that this distance is the same thing as \|p\|.
Let's compute this value.

We start with two vectors, \mathbf{w}=(2,1) which is normal to the hyperplane, and \mathbf{a} = (3,4) which is the vector between the origin and A.

\|w\|=\sqrt{2^2+1^2}=\sqrt{5}

Let the vector \mathbf{u} be the direction of \mathbf{w}

\mathbf{u} = (\frac{2}{\sqrt{5}},\frac{1}{\sqrt{5}})

\mathbf{p} is the orthogonal projection of \mathbf{a} onto \mathbf{w} so :

\mathbf{p} = (\mathbf{u} \cdot \mathbf{a})\mathbf{u}

\mathbf{p} = ( 3 \times \frac{2}{\sqrt{5}} + 4 \times \frac{1}{\sqrt{5}}) \mathbf{u}

\mathbf{p} = (\frac{6}{\sqrt{5}} + \frac{4}{\sqrt{5}})\mathbf{u}

\mathbf{p} = \frac{10}{\sqrt{5}}\mathbf{u}

\mathbf{p} = (\frac{10}{\sqrt{5}}\times\frac{2}{\sqrt{5}},\frac{10}{\sqrt{5}}\times\frac{1}{\sqrt{5}})

\mathbf{p} = (\frac{20}{5},\frac{10}{5})

\mathbf{p} = (4,2)

\|p\| =\sqrt{4^2+2^2} = 2\sqrt{5}
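The steps above can be reproduced in a few lines of Python. This is a sketch using the values of this example, \mathbf{w}=(2,1) and \mathbf{a}=(3,4); the variable names are mine.

```python
import math

w = (2, 1)  # normal vector of the hyperplane x_2 = -2 x_1
a = (3, 4)  # the point A viewed as a vector from the origin

norm_w = math.hypot(w[0], w[1])        # sqrt(5)
u = (w[0] / norm_w, w[1] / norm_w)     # direction of w

scale = u[0] * a[0] + u[1] * a[1]      # u . a = 10 / sqrt(5)
p = (scale * u[0], scale * u[1])       # p = (u . a) u
norm_p = math.hypot(p[0], p[1])

# Shortcut that skips p entirely: |a . w| / ||w||
shortcut = abs(a[0] * w[0] + a[1] * w[1]) / norm_w

print(p)         # approximately (4.0, 2.0)
print(norm_p)    # approximately 2 * sqrt(5), about 4.472
print(shortcut)  # same value as norm_p up to rounding
```

The shortcut gives the distance directly, but building \mathbf{p} makes it easier to see in the figure what this distance is.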

Compute the margin of the hyperplane

Now that we have the distance \|p\| between A and the hyperplane, the margin is defined by :

margin = 2\|p\| = 4\sqrt{5}
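As a sketch, the margin of this example can be computed with a small helper. The function name distance_to_hyperplane is hypothetical, it only handles hyperplanes passing through the origin (w_0 = 0), and it relies on the simplification used here that the margin is twice the distance of A to the hyperplane.

```python
import math

def distance_to_hyperplane(point, w):
    """Distance from a point to the hyperplane w . x = 0 (through the origin)."""
    dot = sum(wi * pi for wi, pi in zip(w, point))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot) / norm_w

# Values of this example: A(3, 4) and w = (2, 1).
margin = 2 * distance_to_hyperplane((3, 4), (2, 1))

print(margin)  # approximately 4 * sqrt(5), about 8.944
```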

We did it ! We computed the margin of the hyperplane !


This ends Part 2 of this tutorial about the math behind SVM.
There was a lot more math, but I hope you have been able to follow the article without problems.

What's next?

Now that we know how to compute the margin, we might want to know how to select the best hyperplane. This is described in Part 3 of the tutorial: How to find the optimal hyperplane?

138 thoughts on “SVM - Understanding the math - Part 2”

  1. Oleg Prutz

    Are you planning to tell about support vectors, non-linear kernels and optimization (I mean finding the minimum of the distance from the hyperplane to the suport vectors) in this tutorial? It seems that one need to know optimization theory in depth to understand this algorithm. It would be nice to see the simple explanation of what the algorithm is doing actually.

    1. Alexandre KOWALCZYK Post author

      Yes that is what I am planning to do. However optimization theory is indeed very important to understand the algorithm and I am still figuring out how to explain SVM without going too deep into details.

      1. Ranjith

        Hello Alexandre.. Great explanation.. Thanks..
        I have a question on figure19.

        My interpretation below:
        We are assuming ∥x−z∥ to be the hypotenuse but actually it is not the case right? the vector opposite to ∥x−z∥ is hypotenuse (X). So the formula following that should substract the distance from the hypotenuse instead of adding.

        I'm bit confused as I'm new to this. Kindly help me clarify my understanding.

        1. Alexandre KOWALCZYK Post author

          No, we are not assuming ∥x−z∥ to be the hypotenuse.

          When I do the calculation in Figure 19, I say, let us compute the norm of the vector x-z. Let us call this vector k. What are the coordinates of k? k(3-4, 5-1) which gives us k(-1,4), then we compute the norm as usual ∥k∥ = sqrt(1^2+4^2) = sqrt(17)

          We can find the same result by applying Pythagoras' theorem multiple times. Let us consider the hypotenuse x. The theorem tells us that ∥x∥^2 = ∥z∥^2 + ∥x−z∥^2. Which is equivalent to ∥x−z∥^2 = ∥x∥^2 - ∥z∥^2. ∥x∥^2 = 3^2 + 5^2 = 9 +25 = 34.
          ∥z∥^2 = 4^2 + 1^2 = 16 + 1 = 17. If we substitute: ∥x−z∥^2 = ∥x∥^2 - ∥z∥^2 = 34 - 17 = 17. Hence, ∥x−z∥ = sqrt(17).

      2. Bharath

        Hello alex. I wanna learn ml. Can you suggest me the process to learn it practically as you did. Can you suggest me books .

      3. shah

        This tutorial is excellent. Its very interesting and easy to learn the difficult concepts using this tutorial. Many thanks

    2. damon

      hey, I am wondering in the real example, how do we know which data point is the closest to the hyperplane? By computing all the distances and compare them?

  2. Franck Berthuit

    Very clear article, Alexandre... and enjoyable for a poor mathematician like me.
    I'm eager to read then next one.

    1. Alexandre KOWALCZYK Post author

      Thanks for your kind comment. I need to find more time to write new articles. 🙂

      1. Shivani Bhardwaj

        I was trying to understand SVM from a very long time. your blog really helped me a lot and now I know what I am dealing with. your tutorial not only helped in understanding the mathematical jargon but also give me the clear perspective of what I am doing.
        Thanks a lot!!

    1. Alexandre KOWALCZYK Post author

      Thanks for the comment Shyam. I am afraid that recently I have spent most of my time on kaggle competitions and playing with convolutional neural networks. I will try to write the following part in the coming weeks in order to finish this work.

      1. Pragatheeswaran

        Very detailed explanation! Great work! Please notify us when you publish tutorials on CNN.

  3. Kunal

    Of all the links I found while doing a google search on SVM this is by far the best one in terms of simplicity of language in which it is explained...Thanks Alex

    1. Alexandre KOWALCZYK Post author

      I am currently writing it. But it is coming soon. 🙂

  4. dragon518

    This is the best blog about SVM I have seen ever, help me so much, thank you very much, look forward to excellent part 3. BTW, "To simplify this example, we have set w_0=0", do you mean that setting the start point of vector \mathbf{w} at origin?

    1. Alexandre KOWALCZYK Post author

      Thanks for your kind comment. No, this does not mean setting the start point of the vector \mathbf{w} at the origin. We could place it somewhere else because we often consider that the parallel translate of a given vector is the same vector (this is illustrated in the section about the difference of two vectors). In the definition of the equation of a hyperplane the \mathbf{w} vector is a 3-dimensional vector \mathbf{w}(w_0,w_1,w_2). By setting w_0 to 0 we can do the remaining calculations with a 2-dimensional \mathbf{w}(w_1,w_2) vector. Because the definition says that w_0 = -b and we use w_0 = 0 instead, it removes the intercept term from the equation. As a result the hyperplane passes through the origin. In Part 3 I wrote in more detail about the hyperplane equation, so things should be easier to understand.

    1. Alexandre KOWALCZYK Post author

      Thanks. The part 3 is now online. (I added the link at the end of the article)

  5. Gabriel B. Théberge

    That is effectively crystal clear! I have read a lot of papers on this topic but nothing was as clear and accessible as your presentation Alexandre!

  6. Felicia

    This is the most useful blog about SVM I've seen so far, especially for people like who don't have much knowledge in linear algebra.

    A dumb question: why is the direction of \mathbf{w} perpendicular to the hyper-plane?

  7. Subha MG

    Hi Alexandre..Your blog is simply superb! The way you've explained concepts!! I saw several videos on SVMs..but I didn't get a clear picture..Your articles have made it super-clear!! Super-like!!

  8. Md. Asadur Rahman

    No Word to thank you, brother! I was very worried and eager to learn about SVM, You have solved my problem. Be blessed by Almighty.

  9. Sameer Panna

    Great Blog.. and excuse my ignorance but can you please explain how one arrives here w(−b,−a,1) and x(1,x,y)

    1. Alexandre KOWALCZYK Post author

      Thank you. You find these two vectors by continuing the reasoning.
      We want to express the equation y-ax−b=0 with a dot product between two vectors.
      The dot product is the sum of several products. In our case there is two minus signs, so there is three elements being summed together. Our vectors will have three elements each. Then we transform the equation to display these products: y−ax−b=0 is equivalent to y*1−a*x−b*1=0 and then we transform the differences into sums : y*1+(−a)*x+(−b)*1=0

  10. Koushik

    Wonderful gives me clear understanding even in some of Linear Algebra concepts. Thanks....keep this good work up..... 🙂

  11. adhi21

    I am understand now why the equation is w^(T) . x + b. I have another question, what does "T" mean in that equation?
    Thank you

  12. omar

    thank you very much about that useful tutorial , can you write an article about dealing SVM with non_linear dataset

    1. Alexandre KOWALCZYK Post author

      Thanks for the suggestion. I will try to finish this tutorial series first. 😉

  13. sawi

    Thanks for a very useful article, explaining every tiny detail about the calculation with simple language and figures.

  14. chansungpark

    Thank you for very explicit explanation.
    I have a question since I have no background knowledge about cosine, sine, etc.
    Shouldn't Direction of vector be just angle of the triangle? I am just curious what cos(β)= adjacent/hypotenuse formula fundamentally means?

    1. Alexandre KOWALCZYK Post author

      No because you can have another vector with the same angle between the axis and itself but with the vectors pointing in another direction. By using cosine, we use the length of the adjacent and hypotenuse and as we are using coordinates we obtain a vector pointing in the same direction.

    1. Alexandre KOWALCZYK Post author

      This was a simplification. The margin is explained in more details in Part 3.

  15. Brandon

    Thank for you blog, that is great. However, i have a question about the W(-b,-a,1) and X(1,x,y),
    the transpose of W is a column vector and X is a row vector, the result of [ column * row] is a matrix that size is (3,3), can you tell me where i missing?

    1. Alexandre KOWALCZYK Post author

      Good catch. Indeed both w and x needs to be column vectors so that the transpose of w is a row and we do [row * column]. I updated the article. Thanks!

  16. Shawn

    So far the best explanation of SVM in the net for those who do not have the required math background. Fantastic job! Please post non-linear SVMs and Kernel explanation. Thanks a lot !

  17. Ashutosh Srivastava

    Very Nice and crystal clear explaination i have ever found on internet.
    It will be very helpful if you give some practical demostration of how SVM and other
    learning algorithms can be implemented and interpreted on various platforms like weka and orange.What is confusion matrix and ROI. How that Wt-b equation is generated etc.
    Giving practical demonstration will be very helpful.

    Thanks and Regards...

  18. Everaldo Aguiar

    Phenomenal explanation of SVMs. Thanks a lot for taking the time to write and publish this. I was wondering if you would mind if I used brief excerpts of your content. I am preparing a few slides for a course that I will be teaching and found some of your images and explanations very helpful. I'll be sure to include citations and a reference to your blog posts.

    1. Alexandre KOWALCZYK Post author

      Thanks for your comment. No problem you can use some excerpts. For which course is it?

  19. Farai Leboho

    Speechless, this is downright simple to understand. This makes SVM move from very hard to simply understandably, thanks a lot mate. At least now i have an idea of what's happening behind the scenes of svm.SVC().fit(),
    Great work.

  20. rssoni

    The best explanation I can think of. I made the concept clear. The writing style is lucid and understandable. Nothing else can be easier than this explanation of SVM. Awesome, thanks Alex

  21. A Logical Geek

    You are amazing. Thanks a lot for this.. Because of lack of enough maths background i have having difficulty reaching here . You helped a lot. Is it possible for you to explain the justification of langrange's multipliers as well as further explanation of SVM.

  22. Ganecian

    I'm still confused in determining normal vector w. If the equation of the hyperplane is x2 = 1/3 * x1 + 1, what is the normal vector w? How to calculate it when w0 is not zero. Thanks

    1. Alexandre KOWALCZYK Post author

      In your example: x2 = 1/3 * x1 + 1, is in the form: x2 = a * x1 + b. To get the normal vector you just get the vector w(a,-1). So in this case, we define w(1/3,-1), x(x1,x2) and b = 1. And you can see that wx+b=0 is equivalent to x2 = 1/3 * x1 + 1.

      Now if we plot, the vector w(1/3,-1), we can start to draw it where we want. I could start drawing at the origin x(0,0) or I can start drawing it directly on the hyperplane. I choose to do so, and I start drawing it at x(1,1+1/3).
      As you can see in the figure: it is normal to the hyperplane.

      Where I chose to start drawing it, does not change the fact that it is normal to the hyperplane.

  23. Beungeut Boloho

    Hi, can you explain what is the bias b visually? Is it the distance of vertical axis to the origin or the distance of the hyperplane to the origin?

    1. Alexandre KOWALCZYK Post author

      Given an hyperplane having the equation wx+b=0 with vectors w(w0,w1) and x(x0,x1). b is the distance between the vertical axis and the origin only when the value w1 of the weight vector is equal -1. Indeed, when we transform this hyperplane equation to a line equation of the form y=ax+c we get a = -w0/w1 and c = -b/w1. Some books represent b as being the distance between the origin and the hyperplane, but I think this is true only under certain conditions, at least that is what I found when trying to verify it by myself using the first formula of this article using formulas from this page.

  24. Bahareh Moradi

    Thank you a thousand times............You explained Lagrange multipliers in the best way in the world.....
    can you introduce me some useful books which I can read and get more information about classification?

  25. Jeetendra Ahuja

    Awesome tutorial, TAL man!
    Just a small suggestion, when you give a link like for "cumulative", "dot product" , can you change a code of your site such that after clicking on this link, it gets open in different tab instead of opening in current tab.

    1. Alexandre KOWALCZYK Post author

      Hello. Thank you for your comment. I thought all my links were opening in a new tab but indeed it was not the case. I updated all the problematic links in this article. Thank for the remark !

  26. Sonia


    x+.w+b= +1
    x_.w+b = -1
    why it is always equal to +1 and -1 for positive and negative support vector respectively.?. How to normalize this distance of hyperplane to support vectors?. Once we normalize, it always remains same for any kind of data. could you explain?. I am not clear about the distance between hyperplane and support vectors

    1. Alexandre KOWALCZYK Post author

      It is always equal to +1 and -1 because we are free to select w for which it will be the case (we can rescale w and b and keep the same hyperplane). So we decide arbitrarily to select among the ones for which it is equal to +1 and -1 because it will make the following computation easier.

  27. Vijay Gupta

    Hi ... Very good and detailed article . But one doubt : the vector w will always be normal to the hyperplane .. Why is this so ?

        1. Alexandre KOWALCZYK Post author

          Hi, I thought that it was pretty easily printable with the theme I used. Do you mean you would like a printable version of the whole tutorial? Do you have trouble printing an invidual article?

    1. YAN KANG

      Yes, I have doubt here too. What does the normal mean here? I think it mean Orthogonal since two vectors u and v whose dot product is u·v=0 are said to be orthogonal.

  28. siddhant

    I had taken machine learning as my final semester research topic.For about a month I was unable to decide which topic to specifically decide to work upon.After a month my guide told me to work upon SVM in image processing.I had a little knowledge about SVM but the math part was very difficult.It was intricate as to say.
    Finally I came upon this blog and found it very help , the math was really very detailed yet simple to understand.Thanks for such a nice article the 6 part series has helped me a lot in understanding machine learning as a whole.

  29. ridarafisyed

    very very nice and clear article. but please will you tell me about "the vector w will always be normal to the hyperplane". i don't understand this point...

    1. Alexandre KOWALCZYK Post author

      As stated in the comments before, this is by definition. We defined the hyperplane with this equation, so w is normal. If you read the example in this page: you can see that they say "Let’s also suppose that we have a vector that is orthogonal (perpendicular) to the plane".

  30. pradeep

    We can see that w as indeed the same look as u except it is smaller. Something interesting about direction vectors like w is that their norm is equal to 1. That's why we often call them unit vectors. -- why direction vectors like 'w' is that their norm is equal to 1? can you please explain?

  31. Nick

    I understood the proceeding for calculate the distance but there is a method more simple for do it: distance= dot(A,w)/||w|| Is there a particular reason for prefer your described method rather than my simpler?

    Great tutorial! Compliments!

    1. Alexandre KOWALCZYK Post author

      Thank you for your comment 🙂

      Yes indeed your method is simpler. And it is basically what we are doing in the article because dot(A,w)/||w|| = dot(A,u) and then we create p using dot(A,u)u.

      My goal was to use the projection p so that you can see what is this distance in the figure.

  32. Al Fritsch

    Question for you, why did you call the resulting vector "u" instead of calling it "y" hat which it really is (a unit vector in the direction y).

    1. Alexandre KOWALCZYK Post author

      I believe we can call the vector the name we want as long as we define it explicitly.

  33. Jill Minor

    I can't thank you enough for this. I'm in 500 level Artificial intelligence but I have no real background, even in linear algebra. You saved me.

  34. Rahul

    Hi, great tutorial I am learning lots from this. I have a doubt though, to find the orth proj of a on X1=-2x2 you have used w(1,2), how did u arrive to this value of w? The ortho project of a onto hyper plane X1=-2x2 should be same from any point normal to it, but your equations are dependent on the values of w and if I were to choose a different line normal to hyperplane I would get a diff magnitude won't I ? It would be great if you can explain

    1. Alexandre KOWALCZYK Post author

      Yes indeed but it does not matter if you get a different magnitude because you are trying to minimize the distance and you would use the same w in all your calculations. This ability to choose the w arbitrarily is used later when we define the optimization problem. I explain this in detail in my ebook.

  35. Sunder


    It is showing this error in many places on the page: '[Math Processing Error]'

    Can you please check?


    1. Sunder

      Now, it looks fine. Don't knw what caused it last time.

      Thanks for the prompt reply.


  36. uday

    "" Our goal is to find the distance between the point A(3,4) and the hyperplane.
    We can see in Figure 23 that this distance is the same thing as ∥p∥""

    How did you know that the distance of to hyperplane is equal to ∥p∥ ??

    1. Alexandre KOWALCZYK Post author

      Start from the hyperplane and follow the dotted line until you reach A. This is the distance we want to find. Start from the hyperplane (0,0) and follow the line until you reach the end of p. You can see that this distance is the same. You have to see that p is the vector starting at (0,0) and finishing at (4,2). ||p|| is the length of p, so ||p|| is the distance we are looking for.

  37. Lalit

    This is really easy to understand. I had forgotten the concepts of vectors, so it came in handy as a refresher.

  38. rio

    I have seen a lot of materials related to SVM...
    (Most of them are hard to understand, because
    the explanations are not good, I think.)

    Thank you very, very much!!!!

  39. Pratik

    Hi Alex,

    All the mathematics and calculus would be clearer if you could show it with a worked example.
    Example: we have a data set that has 2 explanatory variables, i.e. x1 and x2, and one binary outcome variable y. How would we then calculate the hyperplane and the values of w?

  40. Pragatheeswaran

    A small doubt,
    Before fig 19, The formula to find the vector is derived as,
    z = (u.x)u
    and in the derivation I found this equation
    |z| = u.x -------------(remember)

    After fig 23 to find p, you said p = (u.a)u
    so like that we can find |p| = u.a using this

    but you are trying to find vector p and then |p|
    so if our ultimate aim is to find |p| then we can use only |p|=u.a
    without finding the actual vector

    1. Alexandre KOWALCZYK Post author

      Yes indeed, we do not need to find the actual vector p to compute its norm, but I found it easier to understand this way. Thanks for your insight!
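
      Both routes can be compared in a few lines. This is a small sketch under assumptions: it takes a = (3, 4) from the example and assumes the normal w = (2, 1) for the hyperplane x2 = -2x1; the helper names are hypothetical. The absolute value is there because u·a can be negative when a points away from u, while a norm never is.

```python
import math

def dot(v, w):
    """Dot product of two vectors given as lists."""
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    """Euclidean norm (length) of a vector."""
    return math.sqrt(dot(v, v))

def unit(v):
    """Direction of v: v divided by its norm."""
    n = norm(v)
    return [vi / n for vi in v]

a = [3.0, 4.0]
u = unit([2.0, 1.0])   # assumed normal w = (2, 1), normalised

# Route 1: the norm directly from the dot product, without building p.
norm_direct = abs(dot(u, a))

# Route 2: build p = (u . a) u first, then take its norm.
p = [dot(u, a) * ui for ui in u]
norm_via_p = norm(p)

print(norm_direct, norm_via_p)   # the two values agree up to rounding
```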

  41. janne

    Thank you for this great tutorial!
    I have recently started my hopeless journey to understanding SVM. I'm wondering why this example starts the hyperplane calculations from the origin (0,0). How do the equations change when every data point is on the right side of (0,0)? Where do we get the starting point?

    1. Alexandre KOWALCZYK Post author

      Hello Janne, and thank you for your comment. This example uses a hyperplane passing through the origin to keep things simple and avoid adding another term to the equation. 🙂

  42. Raj

    Your explanation is very helpful to me, but I have a doubt.
    "If we define the vector u as the direction of y": u = y / ∥y∥. How did we get this? Please explain.

    1. Alexandre KOWALCZYK Post author

      We are using the definition of the direction: the direction of a vector x(x_1, x_2) is the vector w(x_1/∥x∥, x_2/∥x∥).
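
      As a quick illustration of this definition, here is a minimal sketch (the function name `direction` is made up for the example). Dividing each coordinate by the norm keeps the vector pointing the same way but gives it length 1, which is why u = y / ∥y∥.

```python
import math

def direction(x):
    """Return the direction of x: each coordinate divided by the norm of x."""
    length = math.sqrt(sum(xi * xi for xi in x))
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return [xi / length for xi in x]

x = [3.0, 4.0]            # norm is 5
w = direction(x)          # approximately (0.6, 0.8)
w_length = math.sqrt(sum(wi * wi for wi in w))

print(w, w_length)        # the direction always has norm 1 (up to rounding)
```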

  43. Lakshmikanta Nath

    Hi, first of all I would like to thank you for sharing a wonderful site like this. I found this article quite interesting and useful. Unfortunately, I would like to bring to your attention that this article is showing "MATH PROCESSING ERROR" instead of the actual math equations or numbers.

    I hope you will look into this, make the necessary corrections, and reply to me.

    Have a nice day.

    Thanking you.

    1. Alexandre KOWALCZYK Post author

      Hello, Thank you for your message. The equations are displayed correctly on my computer. Maybe you have a problem with your web browser. I am using Chrome and everything looks good. You could try with another browser.

  44. Shadab Faiz

    Nice article!!! However, one thing bugs me. Since we want to find the distance of A from the hyperplane, why don't we just use the Pythagorean theorem instead? That seems to be much easier.

    1. Alexandre KOWALCZYK Post author

      We need to know the norm of p to use Pythagoras, so it seems the same to me. What was your idea?

  45. GHALI

    I want to thank you very, very, very much.
    It's one of the best tutorials that I have read and understood in my career.
    Thanks again.
    Best regards.
    Have a nice day.

  46. Erfan

    You are amazing Alexander, thank you very much. You did a great job!
    God bless you.
    I wish you would post a lecture about LDA and PCA.
    Thanks again.

  47. Anuj Gupta

    I am sorry but I am still confused about vectors. I understand the math and the numbers you have shown here, but I cannot visualise a vector in the real world yet. When you add 2 and 3 apples I can visualise 5 apples. When you add two vectors, subtract them, or take their dot or cross product, I understand the math, but I still can't visualise it.

    1. Alexandre KOWALCZYK Post author

      Well, in Figure 7 you can visualize that the sum of the two vectors (the two arrows) produces a third vector (the middle arrow). Here the arrow is a representation of the vector, in the same way that I could represent the number one with the picture of one apple, and the number five with the picture of five apples. What is important to understand is that both a number and a vector should be considered as mathematical objects, and that these objects have some properties. I recommend reading the books "Mathematics: A Very Short Introduction" and "How to Study as a Mathematics Major" to get a better understanding of how you can think about mathematical objects.


  48. ashoke saha

    A very easy and fluent explanation of dot product, equation of plane and matrix combined. And the figures make it so clear. Thank you!

  49. Adrian

    We need more people like you in this world! Clearer than any other resource I have seen out there. Please keep doing more great work like this 🙂

  50. Sasha

    Hello, can someone explain? After the words "Define the vector u as the direction of y"
    I don't understand where the formula u=y/|y| comes from. Can anyone please explain why we divide?
    I tried to find information about this case but I couldn't. Thanks

  51. valekar

    Great Explanation!! Crisp and clear.
    I just had one doubt: how did you choose the value of "w" to be (1,2)? Is it chosen randomly?

    1. Alexandre KOWALCZYK Post author

      The value of w is derived from the equation of the hyperplane. Search the sentence starting with "As you can see on the Figure 20".

      1. Maruf

        Undoubtedly it's great work! Thanks for this outstanding explanation. I have a question. If I want to implement it in MATLAB to classify, then how should I set the hyperplane or the value of w?

  52. Kaushal

    Really great in-depth explanation of SVM. It would be great if you wrote a tutorial about kernels, Lagrange duality and other theoretical concepts. Thanks

  53. forough

    Thank you!
    my question:
    According to Fig. 20, why is the equation of the hyperplane X2=-2X1?
    Please explain
    Thank you very much.

  54. TheeNinja

    Hello, this tutorial makes the theory of SVM very understandable. I have one question regarding wt = 0 in this tutorial and wt + b = 0 in the third tutorial (both forms of an equation of hyperplane, like you said).

    Does the magnitude of w matter, or simply its direction? It seems that in the equation w is used in order to yield all possible values of x that are perpendicular to w. If two vectors are perpendicular, the magnitude of both does not matter, because the dot product will always be 0. In addition, you take the unit vector of w when projecting the vector to point A in this tutorial. If it is true that the magnitude of w does not matter, then I will completely understand the information I need to go on to part 4.

    1. TheeNinja

      I'm sorry, I mean wx not wt (I am omitting the superscript t because this is a textbox).

  55. Allan Johns

    Hello Alexandre,
    I appreciate the lessons you are giving here. At the beginning I thought that SVMs were insanely difficult, but thanks to your tutorials I can now say that I am specializing in machine learning.
    Thank you very much!

  56. Satya

    "The vector z=(u⋅x)u is the orthogonal projection of x onto y."
    The actual projection formula is (u⋅x)u/|u|^2.
    Why are we not using |u|^2 in the denominator?
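
    One way to look at this question: u is defined as the direction of y, so it is a unit vector and |u|^2 = 1, which makes the denominator disappear. The sketch below compares the general formula (x⋅y / y⋅y) y with the unit-vector form (u⋅x) u. The vectors x and y are arbitrary values picked for the illustration, not taken from the article, and the function names are hypothetical.

```python
import math

def dot(v, w):
    """Dot product of two vectors given as lists."""
    return sum(vi * wi for vi, wi in zip(v, w))

def project_general(x, y):
    """Projection of x onto y with the general formula (x . y / y . y) y."""
    scale = dot(x, y) / dot(y, y)
    return [scale * yi for yi in y]

def project_unit(x, y):
    """Projection of x onto y using u = y / ||y||, i.e. z = (u . x) u."""
    length = math.sqrt(dot(y, y))
    u = [yi / length for yi in y]
    scale = dot(u, x)
    return [scale * ui for ui in u]

x = [3.0, 5.0]   # arbitrary example vector
y = [8.0, 2.0]   # arbitrary example vector, not a unit vector

z_general = project_general(x, y)
z_unit = project_unit(x, y)

print(z_general, z_unit)   # same vector, up to rounding
```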

  57. Sunita Anand

    Thanks for the excellent article. Your tutorials have helped me to understand the concepts of SVM and the maths behind it.

Comments are closed.