Perceptron Algorithm - How It Works

Second in the series "Understanding the Basics of Deep Learning", this article covers in depth how the perceptron algorithm works, with implementations in both C++ and Python.

1. Introduction

The perceptron algorithm is a type of neural network algorithm and was developed by Frank Rosenblatt in the late 1950s. It is a simple linear binary classification algorithm used for supervised learning.

The algorithm takes an input vector of features and assigns each feature a weight. It then calculates the weighted sum of the input features and applies a threshold function (also called an activation function) to the sum. If the result is above the threshold, the algorithm outputs one class; otherwise, it outputs the other.

The perceptron algorithm learns by adjusting the weights of the input features based on the error in its predictions. It does this by comparing its output to the true label of the data point and updating the weights accordingly. The algorithm continues to iterate over the data until it converges to a set of weights that produce accurate classifications on the training data.

The perceptron algorithm can only classify linearly separable data, i.e., data for which there exists a hyperplane (a linear decision boundary) that perfectly separates the two classes. The algorithm is suitable for tasks such as text classification, image recognition, and pattern recognition. However, it has limitations: it can fail to converge, produce incorrect classifications on certain types of data, or settle on a suboptimal solution. I will explain all these concepts in detail in the third article of the series "Understanding the Basics of Deep Learning".

2. Steps in the perceptron algorithm

The perceptron algorithm is a simple neural network consisting of a single layer of neurons, and it is based on the idea of finding a linear decision boundary that separates the two classes of data. The steps of the perceptron algorithm are as follows -

  1. Initialize the weights: First, we need to initialize the weights of the perceptron either to a small random value or to zero.

  2. Input the data: The perceptron is then fed with a set of input data (X) and their corresponding binary labels (y).

  3. Compute the output: Afterwards, the perceptron computes a weighted sum of the input features and applies an activation function to the result. The output of the perceptron is a binary prediction of the class label.

  4. Update the weights: If the prediction is incorrect, the weights are updated using the perceptron learning rule, which adjusts the weights in the direction of the misclassified point (a small numeric sketch follows this list). The update rule is given by:

    w = w + η * (y - ŷ) * x

    where,

    w is the weight vector,

    η is the learning rate (a hyperparameter that controls the step size of the update),

    y is the true label,

    ŷ is the predicted label, and

    x is the input vector.

  5. Repeat: Next, we repeat steps 2-4 until the algorithm converges (i.e., the weights do not change significantly between iterations) or a stopping criterion is met.
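
To make steps 3 and 4 concrete, here is a tiny worked example of a single prediction and update. The weights, input, and learning rate below are arbitrary illustrative values, not from a real dataset:

import numpy as np

w = np.array([0.5, -0.2])      # current weight vector
x = np.array([1.0, 2.0])       # one input vector
y = -1                         # its true label
eta = 0.1                      # learning rate

z = np.dot(w, x)               # weighted sum: 0.5*1 + (-0.2)*2 = 0.1
y_hat = 1 if z >= 0 else -1    # threshold function: y_hat = 1, a misclassification
w = w + eta * (y - y_hat) * x  # update: w + 0.1 * (-2) * x
print(w)                       # [ 0.3 -0.6]

Because the prediction (+1) disagrees with the true label (-1), the update moves the weights away from x, making the perceptron more likely to classify this point correctly on the next pass.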

Now let us dig deeper into the mathematics of how the perceptron algorithm works.

  • A mathematical basis for the perceptron algorithm

Mathematically, the perceptron algorithm can be described as follows:

Given a set of N data points, where each data point is represented by a feature vector x_i and a label y_i, the goal of the perceptron algorithm is to learn a weight vector w that produces accurate classifications on the training data.

At each iteration of the algorithm, the perceptron takes a data point x_i and computes the weighted sum of its features, z_i, as follows:

z_i = w^T x_i

where w^T is the transpose of the weight vector w.

The perceptron then applies a threshold function to the weighted sum, g(z_i), to predict the label of the data point, ŷ_i, as follows:

g(z_i) = {  1   if z_i > 0
           -1   otherwise

ŷ_i = g(z_i)

If the predicted label, ŷ_i, is equal to the true label, y_i, then the perceptron continues to the next data point. If the predicted label is incorrect, then the perceptron updates the weight vector by adding or subtracting the feature vector of the data point, x_i, multiplied by a learning rate, η, as follows:

w = w + η (y_i - ŷ_i) x_i

where η is a hyperparameter that controls the step size of the weight updates.

The perceptron algorithm continues to iterate over the data until it converges to a set of weights that classify all training points correctly. When the data is linearly separable, the perceptron convergence theorem guarantees this happens after a finite number of updates; otherwise, the weights keep changing indefinitely (see Section 5).
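
As a quick check of these formulas, take w = (0.5, -0.2), x_i = (1, 2), y_i = -1, and η = 0.1 (the same illustrative numbers as in the earlier sketch): z_i = w^T x_i = 0.5*1 + (-0.2)*2 = 0.1 > 0, so ŷ_i = g(z_i) = 1 ≠ y_i, and the update gives w = w + 0.1*(-1 - 1)*x_i = (0.3, -0.6).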

Despite its simplicity and limitations, the perceptron algorithm has historical significance in the development of neural networks and machine learning, and it laid the foundation for more advanced neural network architectures.

3. Implementing perceptron algorithm in C++

The following is the implementation of the perceptron algorithm in C++:

#include <iostream>
#include <vector>

using namespace std;

class Perceptron {
private:
    vector<double> weights;
    double learning_rate;
    int n_iterations;
public:
    Perceptron(double lr, int n_iters) {
        learning_rate = lr;
        n_iterations = n_iters;
    }

    // Weighted sum of the inputs, thresholded at zero.
    // Note that an activation of exactly 0 is mapped to +1.
    int predict(const vector<double>& input) {
        double activation = 0;
        for (size_t i = 0; i < weights.size(); i++) {
            activation += weights[i] * input[i];
        }
        return activation >= 0 ? 1 : -1;
    }

    void train(const vector<vector<double>>& X, const vector<int>& y) {
        // One weight per feature, initialized to zero.
        weights.assign(X[0].size(), 0.0);

        for (int iter = 0; iter < n_iterations; iter++) {
            for (size_t i = 0; i < X.size(); i++) {
                int prediction = predict(X[i]);
                int error = y[i] - prediction;  // 0 if correct, +2 or -2 if wrong
                // Perceptron learning rule: w = w + η * (y - ŷ) * x
                for (size_t j = 0; j < weights.size(); j++) {
                    weights[j] += learning_rate * error * X[i][j];
                }
            }
        }
    }
};

int main() {
    vector<vector<double>> X = {{1, 2}, {2, 3}, {3, 1}, {4, 3}};
    vector<int> y = {1, 1, -1, -1};
    Perceptron p(0.1, 10);
    p.train(X, y);

    vector<double> test1 = {1, 1};
    vector<double> test2 = {4, 4};
    cout << "Prediction for test1: " << p.predict(test1) << endl;
    cout << "Prediction for test2: " << p.predict(test2) << endl;

    return 0;
}

Assumptions -

  1. The input data is stored as a vector of vectors (i.e., each row of the input matrix is a vector), and

  2. The labels are stored as a separate vector of integers.

The following points are worth mentioning -

  • The Perceptron class contains a predict method that takes an input vector and returns a binary prediction of the class label (+1 or -1) based on the current weights.

  • The train method performs the training loop, updating the weights using the perceptron learning rule.

  • In the main function, a Perceptron object is created with a learning rate of 0.1 and 10 iterations.

  • The train method is called with the input data and labels, and the resulting weights are used to make predictions on two test examples.

Now let us look at all these steps in detail in the next subsection.

Steps of the perceptron algorithm in C++

  1. The Perceptron class is defined with a constructor that takes a learning rate and a number of iterations.
class Perceptron {
private:
    vector<double> weights;
    double learning_rate;
    int n_iterations;
public:
    Perceptron(double lr, int n_iters) {
        learning_rate = lr;
        n_iterations = n_iters;
    }
    // ...
};
  2. The predict method is defined to take an input vector and return a binary prediction of the class label based on the current weights. The activation of the perceptron is calculated as the dot product of the input vector and the weight vector. If the activation is greater than or equal to zero, the prediction is +1; otherwise, it is -1.
int predict(const vector<double>& input) {
    double activation = 0;
    for (size_t i = 0; i < weights.size(); i++) {
        activation += weights[i] * input[i];
    }
    return activation >= 0 ? 1 : -1;
}
  3. The train method is defined to perform the training loop for a fixed number of iterations using the perceptron learning rule. The weight vector is initialized to zero, and in each iteration the weights are updated for every training example: the prediction is computed, and the weights are adjusted based on the difference between the predicted label and the true label using the update rule.
void train(const vector<vector<double>>& X, const vector<int>& y) {
    // One weight per feature, initialized to zero.
    weights.assign(X[0].size(), 0.0);

    for (int iter = 0; iter < n_iterations; iter++) {
        for (size_t i = 0; i < X.size(); i++) {
            int prediction = predict(X[i]);
            int error = y[i] - prediction;  // 0 if correct, +2 or -2 if wrong
            for (size_t j = 0; j < weights.size(); j++) {
                weights[j] += learning_rate * error * X[i][j];
            }
        }
    }
}
  4. In the main function, a Perceptron object is created with a learning rate of 0.1 and 10 iterations. The train method is called with the input data and labels, and the resulting weights are used to make predictions on two test examples.
int main() {
    vector<vector<double>> X = {{1, 2}, {2, 3}, {3, 1}, {4, 3}};
    vector<int> y = {1, 1, -1, -1};
    Perceptron p(0.1, 10);
    p.train(X, y);

    vector<double> test1 = {1, 1};
    vector<double> test2 = {4, 4};
    cout << "Prediction for test1: " << p.predict(test1) << endl;
    cout << "Prediction for test2: " << p.predict(test2) << endl;

    return 0;
}

In this example, the input data is represented as a vector of vectors, where each row is a training example and each column is a feature, and the labels are stored in a separate vector. The Perceptron object is trained using the train method, and the resulting weights are used to make predictions on two test examples using the predict method. Note that this C++ version omits a bias term, so its decision boundary always passes through the origin; a bias can be added by appending a constant feature of 1 to every input vector (the Python version in the next section handles the bias explicitly as weights[0]).

4. Implementing the perceptron algorithm in Python

The following is the implementation of the perceptron algorithm in Python:

import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.1, n_iterations=10):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations

    def fit(self, X, y):
        # One weight per feature, plus a bias term stored at index 0
        self.weights = np.zeros(X.shape[1] + 1)

        for _ in range(self.n_iterations):
            for i in range(X.shape[0]):
                # Weighted sum of the inputs plus the bias
                activation = np.dot(X[i], self.weights[1:]) + self.weights[0]
                if activation >= 0:
                    prediction = 1
                else:
                    prediction = -1
                # Perceptron learning rule: update is zero when the prediction is correct
                update = self.learning_rate * (y[i] - prediction)
                self.weights[1:] += update * X[i]
                self.weights[0] += update

    def predict(self, X):
        # Vectorized: computes one activation (and one ±1 label) per row of X
        activation = np.dot(X, self.weights[1:]) + self.weights[0]
        return np.where(activation >= 0, 1, -1)

Assumptions -

  1. The input data is stored as a numpy array, where each row represents a sample and each column represents a feature, and

  2. The labels are also stored as a numpy array.

The important steps are -

  • The Perceptron class contains a fit method that takes the input data and labels and trains the perceptron using the perceptron learning rule.

  • The predict method takes an input data matrix and returns a binary prediction of the class labels (+1 or -1) based on the current weights.

  • In the fit method, the weights are initialized to zero, and the perceptron learning rule is applied for a fixed number of iterations.

  • For each iteration, the weights are updated based on the difference between the predicted label and the true label using the update rule.

  • The predict method calculates the activation of the perceptron for each sample and applies a threshold function to return the binary prediction.
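
The threshold in the last bullet is just an elementwise comparison. As a minimal sketch (the activation values below are made up), np.where maps a whole batch of activations to labels at once:

import numpy as np

activations = np.array([0.7, -1.2, 0.0])
print(np.where(activations >= 0, 1, -1))   # [ 1 -1  1] -- note that 0 maps to +1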

Now let us look in detail at each step.

Steps of the perceptron algorithm in Python

  1. Define the Perceptron class with a constructor that takes a learning rate and a number of iterations:
class Perceptron:
    def __init__(self, learning_rate=0.1, n_iterations=10):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
  2. Define the fit method to train the perceptron on a training set. The method initializes the weights (with the bias stored at index 0) to zero and loops through the training set for a fixed number of iterations, updating the weights according to the perceptron learning rule:
def fit(self, X, y):
    # Initialize weights to zero; index 0 holds the bias
    self.weights = np.zeros(X.shape[1] + 1)

    # Loop through the training set for a fixed number of iterations
    for _ in range(self.n_iterations):
        for i in range(X.shape[0]):
            # Calculate the activation and threshold it to get the predicted class
            activation = np.dot(X[i], self.weights[1:]) + self.weights[0]
            prediction = 1 if activation >= 0 else -1

            # Update weights based on the perceptron learning rule
            update = self.learning_rate * (y[i] - prediction)
            self.weights[1:] += update * X[i]
            self.weights[0] += update
  3. Define the predict method to make predictions on new data. The method calculates the activation for each sample by taking the dot product of the input features and the learned weights and adding the bias, then returns a binary output based on whether the activation is greater than or equal to zero:
def predict(self, X):
    # Calculate the activation for each sample (dot product plus bias)
    activations = np.dot(X, self.weights[1:]) + self.weights[0]

    # Return binary output
    return np.where(activations >= 0, 1, -1)
  4. Finally, create a Perceptron object and train it on a training set using the fit method. Then, use the learned weights to make predictions on a test set using the predict method:
# Create a Perceptron object and train it on a training set
p = Perceptron()
p.fit(X_train, y_train)

# Use the learned weights to make predictions on a test set
y_pred = p.predict(X_test)

In this example, X_train and X_test are numpy arrays containing the training and test input features, and y_train is a numpy array containing the corresponding class labels. The Perceptron object is trained using the fit method, and the resulting weights are used to make predictions on the test set using the predict method.
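
Since X_train, y_train, and X_test are left abstract above, here is a minimal self-contained run of the class from the beginning of this section, reusing the toy dataset from the C++ example (the two test points are chosen for illustration):

import numpy as np

# Toy linearly separable data: label is +1 when x2 > x1, -1 otherwise
X_train = np.array([[1, 2], [2, 3], [3, 1], [4, 3]], dtype=float)
y_train = np.array([1, 1, -1, -1])

p = Perceptron(learning_rate=0.1, n_iterations=10)
p.fit(X_train, y_train)

X_test = np.array([[1, 3], [4, 1]], dtype=float)
print(p.predict(X_test))   # [ 1 -1] for this data

Because this small dataset is linearly separable, the weights stop changing after a couple of passes.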

5. Cases where the perceptron algorithm fails

The perceptron algorithm works well on linearly separable data. However, there are certain cases where the perceptron algorithm may fail to converge or produce incorrect classifications. They are -

  1. Non-linearly separable data: The perceptron algorithm can only classify data that is linearly separable, meaning that there exists a hyperplane that can perfectly separate the two classes. If the data is not linearly separable (the XOR function is the classic example; see the sketch after this list), the perceptron algorithm will not converge and may produce incorrect classifications.

  2. Imbalanced data: If the data has a severe class imbalance, where one class has significantly more data points than the other, the perceptron algorithm may produce biased classifications. This is because the algorithm is not optimized to handle imbalanced data and may classify all new data points as the majority class.

  3. Overlapping classes: If the two classes of data overlap in the feature space, it may be impossible to find a hyperplane that perfectly separates the two classes. In this case, the perceptron algorithm may converge, but it may produce incorrect classifications on new data points.

  4. Redundant features: If the input features have redundant or irrelevant information, the perceptron algorithm may produce incorrect classifications. This is because the algorithm is sensitive to the input features and may classify based on irrelevant information.

  5. Noise in the data: If the data has noise or errors, the perceptron algorithm may produce incorrect classifications. This is because the algorithm fits a linear boundary to the data, which can be pulled around by noisy points, resulting in incorrect classifications.
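
To make the first case concrete, here is a quick sketch that trains the perceptron on the XOR function, the classic non-linearly-separable dataset (assuming the Perceptron class from Section 4 is already defined in scope):

import numpy as np

# XOR: no straight line can separate the +1 points from the -1 points
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])   # XOR of the two inputs, encoded as -1/+1

p = Perceptron(learning_rate=0.1, n_iterations=100)
p.fit(X, y)
print(p.predict(X))   # at least one of the four points is misclassified

No matter how many iterations you allow, the weights keep oscillating and at least one point remains misclassified, because no linear decision boundary exists for XOR.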

Thus, the nature of the data places a limit on how well the perceptron algorithm works: on unsuitable data it may fail to converge or may produce incorrect classifications.

6. Summary

In summary, the perceptron algorithm computes the weighted sum of the features of a data point, applies a threshold function to the sum, and updates the weights based on the error in its predictions. It learns a decision boundary that separates the two classes of data points and, when the data is linearly separable, produces accurate classifications on new data points.

  1. The perceptron algorithm is a linear binary classification algorithm that learns a decision boundary to separate two classes of data points.

  2. It was developed by Frank Rosenblatt in the late 1950s and is a type of neural network algorithm.

  3. The perceptron algorithm takes an input vector of features and assigns each feature a weight.

  4. It computes the weighted sum of the input features and applies a threshold function to the sum to predict the label of a data point.

  5. If the predicted label is incorrect, the perceptron algorithm updates the weights based on the error in its predictions.

  6. The algorithm continues to iterate over the data until it converges to a set of weights that produce accurate classifications on the training data.

  7. The perceptron algorithm can only classify linearly separable data, meaning that there exists a hyperplane that perfectly separates the two classes.

  8. The algorithm is simple and efficient and can be used for tasks such as text classification, image recognition, and pattern recognition.

  9. However, the perceptron algorithm has limitations and can fail to converge or produce incorrect classifications on certain types of data.

  10. The perceptron algorithm is a building block for more complex neural network architectures and has paved the way for deep learning research and applications.