At its core a perceptron model is one of the simplest **supervised learning** algorithms for binary classification. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

A more intuitive way to think about is like a **Neural Network with only one neuron**.

The way it works is very simple. It gets a vector of input values ** x** of which each element is a feature of our data set.

# An Example:

Say that we want to classify whether an object is a bicycle or a car. For the sake of this example let’s say that we want to select 2 features. The height and the width of the object. In that case ** x = [x1, x2]** where

**is the height and**

*x1***is the width.**

*x2*Then once we have our input vector ** x** we want to multiply each element in that vector with a weight. Usually the higher the value of the weight the more important the feature is. If for example we used

**as feature**

*color***and there is a red bike and a red car the perceptron will set a very low weight to it so that the color doesn’t impact the final prediction.**

*x3*Alright so we have multiplied the 2 vectors ** x** and

**and we got back a vector. Now we need to sum the elements of this vector. A smart way to do this is instead of simple multiplying**

*w***by**

*x***we can multiply**

*w***by**

*x***where**

*wT***stands for transpose. We can imagine the**

*T***of a vector as a rotated version of the vector. For more info you can read the Wikipedia page. Essentially by taking the transpose of the vector**

*transpose***we get an**

*w***vector instead of a**

*Nx1***. Thus if we now multiply our input vector with size**

*1xN***with this**

*1xN***weight vector we will get a**

*Nx1***vector (or simply a single value) which will be equal to**

*1×1***. Having done that, we now have our prediction. But there is one last thing. This prediction will probably not be a simple 1 or -1 to be able to classify a new sample. So what we can do is to simply say the following: If our prediction is bigger than 0 then we say that the sample belongs to class 1, otherwise if the prediction is smaller than zero we say that the sample belongs to the class -1. This is called a step function.**

*x1 * w1 + x2 * w2 + … + xn * wn*But how do we get the right weights so that we do correct predictions? In other words, how do we ** train** our perceptron model?

Well in the case of the perceptron we do not need fancy math equations to ** train** our model. Our weights can be adjusted by the following equation:

*Δw = eta * (y – prediction) * x(i)*

where ** x(i)** is our feature (x1 for example for weight 1, x2 for w2 and so on…).

Also noticed that there is a variable called ** eta** that is the learning rate. You can imagine the learning rate as how big we want the change of the weights to be. A good learning rate results in a fast learning algorithm. A too high value of

**can result in an increasing amount of errors at each epoch and results in the model doing really bad predictions and never converging. Too low of a learning rate can have as a result the model to take too much time to converge. (Usually a good value to set**

*eta***to for the perceptron model is 0.1 but it can differ from case to case).**

*eta*Finally some of you might have noticed that the first input is a constant ( 1 ) and is multiplied by w0. So what exactly is that? In order to get a good prediction we need to add a bias. And that’s exactly what that constant is.

To modify the weight of the bias term we use the same equation as we did for the other weights but in this case we do not multiply it by the input (because the input is a constant 1 and so we don’t have to):

*Δw = eta * (y – prediction)*

So that is basically how a simple perceptron model works! Once we train our weights we can give it new data and have our predictions.

# NOTE:

The Perceptron model has an important disadvantage! It will never converge (e.i find the perfect weights) if the data isn’t linearly separable, which means being able to separate the 2 classes in a feature space by a straight line. So in order to avoid that it is a good practice to add a fixed number of iterations so that the model isn’t stuck at adjusting weights that will never be perfectly tuned.

if you want to reproduce, please indicate the source:

machine-learning – What exactly is a perceptron? - CodeDay