The Perceptron Learning Algorithm (PLA) is a binary classifier that can partition linearly separable points into two classes.
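To make the setting concrete, here is a minimal sketch of the PLA update loop in Python. It is an illustrative implementation, not taken from the original text: NumPy is assumed, labels are assumed to be in $\{-1, +1\}$, and any bias term is assumed to be folded into the weight vector.

```python
import numpy as np

def pla(X, y, max_iters=10_000):
    """Perceptron Learning Algorithm on linearly separable data.

    X: (N, d) array of points; y: (N,) array of labels in {-1, +1}.
    Returns a weight vector w with sign(w @ x_n) == y_n for every n.
    """
    N, d = X.shape
    w = np.zeros(d)                              # start from the zero vector
    for _ in range(max_iters):
        # indices of examples the current weights misclassify
        mistakes = [n for n in range(N) if y[n] * (w @ X[n]) <= 0]
        if not mistakes:                         # no mistakes left: done
            return w
        n = mistakes[0]
        w = w + y[n] * X[n]                      # PLA update on one mistake
    return w

# Tiny illustrative linearly separable data set.
X = np.array([[1.0, 2.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w = pla(X, y)
print(w, np.sign(X @ w) == y)
```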
The Perceptron Convergence Theorem states:
For any finite set of linearly separable labeled examples, the PLA will halt after a finite number of iterations.
But why does the perceptron halt, and how many updates does it need?
Next, we will prove the Perceptron Convergence Theorem step by step.
Notations:
$w_t$: the weight vector at step $t$.
$(x_t, y_t)$: the example point used at step $t$.
$w_f$: the perfect weight vector corresponding to the target function, which means $y_n w_f^T x_n > 0$ for every example $(x_n, y_n)$.
$\theta_t$: the angle between $w_f$ and $w_t$.
$\cos\theta_t = \frac{w_f^T w_t}{\|w_f\|\,\|w_t\|}$: the cosine of the angle between $w_f$ and $w_t$.
$\frac{y_n w_f^T x_n}{\|w_f\|}$: the margin of the point $(x_n, y_n)$, i.e. its Euclidean distance from the plane $w_f^T x = 0$, which is strictly positive since all points are classified correctly by $w_f$ (see the small numerical example after this list).
$\rho = \min_n \frac{y_n w_f^T x_n}{\|w_f\|}$: the minimal margin relative to the separating hyperplane $w_f^T x = 0$.
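As a concrete, made-up numerical illustration of the margin (the numbers are not from the original text): take $w_f = (3, 4)^T$ and the positively labeled point $x = (1, 1)^T$, $y = +1$. Then
$$\frac{y\, w_f^T x}{\|w_f\|} = \frac{1 \cdot (3 \cdot 1 + 4 \cdot 1)}{\sqrt{3^2 + 4^2}} = \frac{7}{5} = 1.4,$$
so this point lies at distance $1.4$ from the hyperplane $w_f^T x = 0$, and $\rho$ is the smallest such distance over all examples.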
Assume that at step $t$ the current weight $w_t$ misclassifies the example $(x_t, y_t)$; then the weight is updated by $w_{t+1} = w_t + y_t x_t$.
So we have $y_t w_t^T x_t \le 0$ (the example is misclassified by $w_t$), and $y_t w_f^T x_t \ge \min_n y_n w_f^T x_n = \rho\,\|w_f\| > 0$ (the perfect weight classifies it correctly).
Then the numerator of $\cos\theta_{t+1}$ satisfies:
$$w_f^T w_{t+1} = w_f^T (w_t + y_t x_t) = w_f^T w_t + y_t\, w_f^T x_t \ge w_f^T w_t + \rho\,\|w_f\|.$$
Applying the above inequality $n$ times, starting from $w_0 = 0$, we get (here we get the lower bound on the numerator of $\cos\theta_n$)
$$w_f^T w_n \ge n\,\rho\,\|w_f\|,$$
so the inner product $w_f^T w_n$ grows at least linearly in $n$.
Now consider the denominator of $\cos\theta_{t+1}$, i.e. the norm $\|w_{t+1}\|$:
$$\|w_{t+1}\|^2 = \|w_t + y_t x_t\|^2 = \|w_t\|^2 + 2\,y_t\, w_t^T x_t + \|x_t\|^2 \le \|w_t\|^2 + \|x_t\|^2 \le \|w_t\|^2 + R^2,$$
where $y_t w_t^T x_t \le 0$ because $(x_t, y_t)$ is misclassified, and $R = \max_n \|x_n\|$ is the largest norm among all examples.
Applying the above inequality $n$ times, starting from $w_0 = 0$, we get (here we get the upper bound on the denominator of $\cos\theta_n$)
$$\|w_n\|^2 \le n\,R^2, \qquad\text{i.e.}\qquad \|w_n\| \le \sqrt{n}\,R,$$
so the norm of the weight vector grows at most like $\sqrt{n}$.
Combining the bounds on the numerator and the denominator of $\cos\theta_n$, we get
$$\cos\theta_n = \frac{w_f^T w_n}{\|w_f\|\,\|w_n\|} \ge \frac{n\,\rho\,\|w_f\|}{\|w_f\|\cdot\sqrt{n}\,R} = \sqrt{n}\;\frac{\rho}{R}.$$
We also know $\cos\theta_n \le 1$, so $\sqrt{n}\,\frac{\rho}{R} \le 1$ and $n \le \frac{R^2}{\rho^2}$.
Therefore the number of update steps is at most $\frac{R^2}{\rho^2}$, which is finite, so the PLA halts on any finite linearly separable data set.
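As a sanity check (not part of the proof), the bound can be verified numerically. The sketch below assumes NumPy, a small hand-made linearly separable data set, and a known separating vector $w_f$; all names and numbers are illustrative. It counts the actual number of PLA updates and compares it with $R^2/\rho^2$.

```python
import numpy as np

# Toy linearly separable data (illustrative); w_f = (1, 1) separates it.
X = np.array([[1.0, 2.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w_f = np.array([1.0, 1.0])                         # a known separating weight vector

R = np.max(np.linalg.norm(X, axis=1))              # largest example norm
rho = np.min(y * (X @ w_f)) / np.linalg.norm(w_f)  # minimal geometric margin
bound = (R / rho) ** 2                             # R^2 / rho^2 from the theorem

# Count the actual number of PLA updates until nothing is misclassified.
w, updates = np.zeros(X.shape[1]), 0
while True:
    mistakes = np.where(y * (X @ w) <= 0)[0]
    if mistakes.size == 0:                         # every point classified correctly
        break
    w += y[mistakes[0]] * X[mistakes[0]]           # PLA update on the first mistake
    updates += 1

print(f"updates = {updates}, bound R^2/rho^2 = {bound:.2f}")
```

On this data the loop stops after far fewer updates than the bound allows, which is expected: $R^2/\rho^2$ is only a worst-case guarantee.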