Perceptron Principles and Implementation
The perceptron is a foundational concept in machine learning and the cornerstone of neural networks. This article explores the principles and implementation of perceptrons through single-output perceptrons, multi-output perceptrons, the chain rule, and the backpropagation algorithm.
Single-Output Perceptron
The single-output perceptron is the simplest neural network. Its core idea is to combine the inputs linearly (a weighted sum plus a bias) and pass the result through a nonlinear activation function to produce a single output.
Mathematical Formulation
- Activation function:

  $$O = \sigma\left(\sum_{j} w_j x_j + b\right)$$

  where $\sigma$ is an activation function (typically the sigmoid, $\sigma(z) = \frac{1}{1 + e^{-z}}$).
- Loss function (squared error):

  $$E = \frac{1}{2} \sum_{i} (O_i - t_i)^2$$

  where $O_i$ is the output for sample $i$ and $t_i$ is its target.
Gradient Calculation
Using gradient descent to optimize the perceptron weights, we differentiate the loss with respect to each weight and bias:

$$\frac{\partial E}{\partial w_j} = (O - t)\,O\,(1 - O)\,x_j, \qquad \frac{\partial E}{\partial b} = (O - t)\,O\,(1 - O)$$

using the sigmoid derivative $\sigma'(z) = \sigma(z)\,(1 - \sigma(z)) = O\,(1 - O)$. The update rule is then $w_j \leftarrow w_j - \eta\,\frac{\partial E}{\partial w_j}$ and $b \leftarrow b - \eta\,\frac{\partial E}{\partial b}$, where $\eta$ is the learning rate.
Python Example
import numpy as np

# Activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Forward pass
def forward(X, W, b):
    return sigmoid(np.dot(X, W) + b)

# Loss function (squared error)
def loss(O, T):
    return 0.5 * np.sum((O - T) ** 2)

# Gradient update
def update_weights(X, W, b, O, T, lr=0.1):
    delta = (O - T) * O * (1 - O)
    dW = np.dot(X.T, delta)
    db = np.sum(delta)
    W -= lr * dW
    b -= lr * db
    return W, b
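The article does not reproduce its training loop; the following is a minimal sketch that reuses the functions above, assuming the AND-gate truth table implied by the outputs printed below. The dataset, seed, epoch count, and 0.5 threshold are assumptions for illustration, not the original code.

# Hypothetical training-loop sketch: AND-gate data is assumed from the outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [0], [0], [1]], dtype=float)

np.random.seed(0)                      # assumption: any fixed seed works
W = np.random.randn(2, 1)
b = np.zeros(1)

print("Output before training:")
print((forward(X, W, b) > 0.5).astype(int))

for epoch in range(10000):
    O = forward(X, W, b)
    W, b = update_weights(X, W, b, O, T)
    if epoch % 2000 == 0:
        print(f"Epoch {epoch}, Loss: {loss(O, T)}")

print("Output after training:")
print((forward(X, W, b) > 0.5).astype(int))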
Output before training:
[[1]
[1]
[1]
[1]]
Epoch 0, Loss: 0.7340890587930462
Epoch 2000, Loss: 0.031960823202117496
Epoch 4000, Loss: 0.01475373398032874
Epoch 6000, Loss: 0.009359312487717476
Epoch 8000, Loss: 0.006791321484745856
Output after training:
[[0]
[0]
[0]
[1]]
Multi-Output Perceptron
The multi-output perceptron extends the single-output version to allow multiple output nodes, suitable for multi-class classification tasks.
Mathematical Formulation
- Multi-output formula (one weight vector per output node):

  $$O_k = \sigma\left(\sum_{j} w_{jk} x_j + b_k\right), \qquad k = 1, \dots, K$$

- Total error (squared error summed over all output nodes):

  $$E = \frac{1}{2} \sum_{k=1}^{K} (O_k - t_k)^2$$
Gradient Calculation
As in the single-output case, each output node contributes its own gradient:

$$\frac{\partial E}{\partial w_{jk}} = (O_k - t_k)\,O_k\,(1 - O_k)\,x_j, \qquad \frac{\partial E}{\partial b_k} = (O_k - t_k)\,O_k\,(1 - O_k)$$
Python Example
# (Code unchanged)
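Since the original listing is omitted, here is a minimal multi-output sketch that reuses sigmoid, forward, and loss from above. The three-cluster dataset, layer sizes, seed, and learning rate are assumptions for illustration; the log below comes from the article's original run.

# Hypothetical multi-output sketch: a toy 3-class problem with one-hot targets.
# Dataset and hyperparameters are assumptions, not the article's original setup.
np.random.seed(1)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])   # one cluster per class
labels = np.repeat(np.arange(3), 20)
X = centers[labels] + 0.3 * np.random.randn(60, 2)
T = np.eye(3)[labels]                                       # one-hot targets

W = np.random.randn(2, 3) * 0.1     # one weight column per output node
b = np.zeros(3)

lr = 0.01
for epoch in range(10000):
    O = forward(X, W, b)                      # (60, 3): one output per class
    delta = (O - T) * O * (1 - O)             # per-node error term
    W -= lr * np.dot(X.T, delta)
    b -= lr * np.sum(delta, axis=0)
    if epoch % 2000 == 0:
        print(f"Epoch {epoch}, Loss: {loss(O, T)}")

pred = np.argmax(forward(X, W, b), axis=1)    # predicted class = largest output
print(f"Accuracy: {np.mean(pred == labels):.2%}")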
Epoch 0, Loss: 2.0055848558408393
Epoch 2000, Loss: 0.2695123992451301
Epoch 4000, Loss: 0.26755760116702415
Epoch 6000, Loss: 0.2668457531681846
Epoch 8000, Loss: 0.26653101686497677
Test set accuracy: 100.00%
Chain Rule
The chain rule is the core tool used to compute gradients in complex neural networks. It propagates the error step by step through each layer.
Chain Rule Formula
In a neural network, the derivative of the error with respect to a weight is the product of the derivatives along the path from that weight to the error:

$$\frac{\partial E}{\partial w_j} = \frac{\partial E}{\partial O} \cdot \frac{\partial O}{\partial z} \cdot \frac{\partial z}{\partial w_j}$$

where $z = \sum_j w_j x_j + b$ is the weighted input and $O = \sigma(z)$ is the output.
Python Example
def chain_rule_grad(X, W, b, T, lr=0.1):
    O = forward(X, W, b)
    # delta combines dE/dO = (O - T) with dO/dz = O * (1 - O)
    delta = (O - T) * O * (1 - O)
    # dz/dW = X, so dE/dW = X^T delta; dz/db = 1, so dE/db sums delta
    dW = np.dot(X.T, delta)
    db = np.sum(delta, axis=0)
    W -= lr * dW
    b -= lr * db
    return W, b
Backpropagation Algorithm
Backpropagation applies the chain rule across all layers of a neural network.
Formula Derivation
For a two-layer network with hidden activations $h = \sigma(z^{(1)})$, $z^{(1)} = X W^{(1)} + b^{(1)}$, and output $O = \sigma(z^{(2)})$, $z^{(2)} = h W^{(2)} + b^{(2)}$:

- Output layer:

  $$\delta^{(2)} = (O - t) \odot O\,(1 - O), \qquad \frac{\partial E}{\partial W^{(2)}} = h^\top \delta^{(2)}$$

- Hidden layer (the output error propagated back through $W^{(2)}$):

  $$\delta^{(1)} = \left(\delta^{(2)}\,{W^{(2)}}^\top\right) \odot h\,(1 - h), \qquad \frac{\partial E}{\partial W^{(1)}} = X^\top \delta^{(1)}$$

where $\odot$ denotes element-wise multiplication.
Python Example
# (Code unchanged)
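The original two-layer listing is also omitted; the following is a minimal sketch reusing sigmoid, forward, and loss, assuming the XOR truth table implied by the outputs below. The hidden-layer size, seed, and learning rate are assumptions; the log comes from the article's original run.

# Hypothetical backpropagation sketch: a 2-4-1 network trained on XOR.
# Architecture, seed, and learning rate are assumptions for illustration.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

np.random.seed(2)
W1, b1 = np.random.randn(2, 4), np.zeros(4)   # hidden layer
W2, b2 = np.random.randn(4, 1), np.zeros(1)   # output layer

lr = 0.5
for epoch in range(10000):
    # Forward pass
    H = forward(X, W1, b1)
    O = forward(H, W2, b2)
    # Backward pass: output-layer error, then propagate to the hidden layer
    delta2 = (O - T) * O * (1 - O)
    delta1 = np.dot(delta2, W2.T) * H * (1 - H)
    # Gradient-descent updates
    W2 -= lr * np.dot(H.T, delta2)
    b2 -= lr * np.sum(delta2, axis=0)
    W1 -= lr * np.dot(X.T, delta1)
    b1 -= lr * np.sum(delta1, axis=0)
    if epoch % 2000 == 0:
        print(f"Epoch {epoch}, Loss: {loss(O, T)}")

print("Output after training:")
print((forward(forward(X, W1, b1), W2, b2) > 0.5).astype(int))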
Epoch 0, Loss: 0.5887016577000074
Epoch 2000, Loss: 0.4070781963562403
Epoch 4000, Loss: 0.09266207200904633
Epoch 6000, Loss: 0.016896752153564412
Epoch 8000, Loss: 0.008292971056384607
Output after training:
[[0]
[1]
[1]
[0]]
