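Introduction
A convolutional neural network (CNN) is a deep learning model widely used in image recognition, speech recognition, and related fields. This article shows how to implement a simple CNN in pure numpy and use it for handwritten digit recognition.
Dataset
We use the MNIST dataset, which contains 60,000 training images and 10,000 test images; each image is a 28x28-pixel grayscale image. The dataset is downloaded with tensorflow.keras.datasets.mnist (TensorFlow is used only for this download step; the network itself uses numpy alone), and matplotlib is used to visualize it.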
```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist

# Load the dataset: x_* are uint8 arrays of shape (N, 28, 28), y_* are integer labels 0-9
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Visualize the first 10 training images with their labels
fig, axs = plt.subplots(2, 5, figsize=(10, 5))
axs = axs.flatten()
for i in range(10):
    axs[i].imshow(x_train[i], cmap='gray')
    axs[i].set_title(str(y_train[i]))
    axs[i].axis('off')
plt.show()
```
The code above loads MNIST and plots the first 10 training images with their labels.
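Because TensorFlow is used above only to download MNIST, the loading step can also be done in numpy alone. The sketch below assumes you already have the original gzipped IDX files (for example train-images-idx3-ubyte.gz). The helper names parse_idx and load_idx_gz are hypothetical, and the sketch handles only unsigned-byte IDX files, which is the type MNIST uses.

```python
import gzip
import struct

import numpy as np


def parse_idx(data: bytes) -> np.ndarray:
    """Parse an in-memory IDX file (the raw MNIST format) into a numpy array."""
    # Header: two zero bytes, a data-type code (0x08 = unsigned byte), number of dimensions
    zero, type_code, ndim = struct.unpack('>HBB', data[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError('only unsigned-byte IDX files are supported in this sketch')
    # One big-endian 32-bit size per dimension
    shape = struct.unpack('>' + 'I' * ndim, data[4:4 + 4 * ndim])
    # The pixel/label bytes follow the header directly
    return np.frombuffer(data, dtype=np.uint8, offset=4 + 4 * ndim).reshape(shape)


def load_idx_gz(path: str) -> np.ndarray:
    """Read and parse a gzip-compressed IDX file, e.g. 'train-images-idx3-ubyte.gz'."""
    with gzip.open(path, 'rb') as f:
        return parse_idx(f.read())


if __name__ == '__main__':
    # Tiny synthetic IDX buffer: a 2 x 3 uint8 array holding 0..5
    demo = struct.pack('>HBBII', 0, 0x08, 2, 2, 3) + bytes(range(6))
    print(parse_idx(demo))
```

With the MNIST files in the working directory, x_train = load_idx_gz('train-images-idx3-ubyte.gz') should yield a (60000, 28, 28) uint8 array. That array can then be normalized the same way as the one returned by mnist.load_data().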
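Data preprocessing
Before training, the data has to be preprocessed. First, the images are normalized by scaling pixel values to the range 0 to 1. Second, the labels are one-hot encoded. Each label becomes a vector of length 10 with a 1 at the label's position and 0 everywhere else.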
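```python
# Normalize pixel values to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0
# One-hot encode the labels: row k of the 10x10 identity matrix encodes digit k
y_train = np.eye(10)[y_train]
y_test = np.eye(10)[y_test]
```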
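A tiny example makes these preprocessing steps concrete. Indexing the identity matrix with an integer label array yields one one-hot row per label, and argmax undoes the encoding. The convolution layers also expect a channel axis, so images of shape (N, 28, 28) must be reshaped to (N, 1, 28, 28) before they are fed to the model. The labels and the random image batch below are made-up illustration data.

```python
import numpy as np

labels = np.array([3, 0, 9])

# Row k of the 10 x 10 identity matrix is the one-hot vector for digit k
one_hot = np.eye(10)[labels]
print(one_hot.shape)               # (3, 10)
print(one_hot[0])                  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

# argmax along the class axis recovers the integer labels
print(np.argmax(one_hot, axis=1))  # [3 0 9]

# Normalize a fake uint8 batch and add the channel axis expected by Conv2D
images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)
x = (images / 255.0).reshape(-1, 1, 28, 28)
print(x.shape, x.min() >= 0.0, x.max() <= 1.0)  # (4, 1, 28, 28) True True
```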
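Building the model
We implement a simple CNN in numpy. It has two convolutional layers followed by two fully connected layers: a 128-unit hidden layer and a 10-unit output layer. The architecture is:
Input -> Conv2D -> ReLU -> MaxPool2D -> Conv2D -> ReLU -> MaxPool2D -> Flatten -> Dense -> ReLU -> Dense -> Softmax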
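We use the following hyperparameters:
- Kernel size: 3x3 (with padding 1, so each convolution preserves the spatial size)
- Number of kernels: 32 and 64
- Pooling size: 2x2
- Fully connected hidden layer size: 128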
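The input size of the first Dense layer, 7*7*64, follows from the usual output-size formula for convolution and pooling: out = floor((in + 2*padding - kernel) / stride) + 1. This sketch traces a 28x28 image through the architecture above using that formula. conv_out is a hypothetical helper written only for this illustration.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a convolution or pooling window sliding over `size` pixels."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28
size = conv_out(size, kernel=3, padding=1)   # Conv2D 3x3, padding 1: 28 -> 28
size = conv_out(size, kernel=2, stride=2)    # MaxPool2D 2x2:         28 -> 14
size = conv_out(size, kernel=3, padding=1)   # Conv2D 3x3, padding 1: 14 -> 14
size = conv_out(size, kernel=2, stride=2)    # MaxPool2D 2x2:         14 -> 7
flat_features = size * size * 64             # 64 channels after the second conv
print(size, flat_features)                   # 7 3136
```

Padding is what keeps the 3x3 convolutions from shrinking the image. With padding 0, the same trace gives 28 -> 26 -> 13 -> 11 -> 5, so the first Dense layer would need 5*5*64 = 1600 inputs instead.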
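Each layer below implements forward(x) and backward(grad, learning_rate). The training loop calls backward on the layers in reverse order. backward returns the gradient with respect to the layer's input and applies a plain SGD update to the layer's own parameters, if it has any. Convolution and pooling are vectorized with numpy's sliding_window_view (numpy 1.20 or later) rather than four nested Python loops, which would make training on 60,000 images impractically slow.
```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # numpy >= 1.20


class Conv2D:
    """2D convolution (cross-correlation) on inputs of shape (batch, channels, height, width)."""
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        # He initialization: scaling by sqrt(2 / fan_in) keeps ReLU activations from exploding
        fan_in = in_channels * kernel_size * kernel_size
        self.weights = np.random.randn(out_channels, in_channels, kernel_size, kernel_size) * np.sqrt(2.0 / fan_in)
        self.bias = np.zeros(out_channels)

    def forward(self, x):
        k, s, p = self.kernel_size, self.stride, self.padding
        self.input_shape = x.shape
        self.padded_x = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')
        # Every k x k window: shape (batch, in_channels, out_height, out_width, k, k)
        self.windows = sliding_window_view(self.padded_x, (k, k), axis=(2, 3))[:, :, ::s, ::s]
        # Sum over input channels and kernel positions for each output channel
        out = np.einsum('bchwij,ocij->bohw', self.windows, self.weights, optimize=True)
        return out + self.bias[None, :, None, None]

    def backward(self, grad, learning_rate):
        k, s, p = self.kernel_size, self.stride, self.padding
        out_height, out_width = grad.shape[2], grad.shape[3]
        grad_weights = np.einsum('bohw,bchwij->ocij', grad, self.windows, optimize=True)
        grad_bias = grad.sum(axis=(0, 2, 3))
        # Input gradient: send each kernel tap's contribution back to the pixels it read
        grad_padded = np.zeros_like(self.padded_x)
        for i in range(k):
            for j in range(k):
                grad_padded[:, :, i:i + s * out_height:s, j:j + s * out_width:s] += np.einsum(
                    'bohw,oc->bchw', grad, self.weights[:, :, i, j])
        # SGD update, done after the input gradient because that needs the old weights
        self.weights -= learning_rate * grad_weights
        self.bias -= learning_rate * grad_bias
        height, width = self.input_shape[2], self.input_shape[3]
        return grad_padded[:, :, p:p + height, p:p + width]


class ReLU:
    def forward(self, x):
        self.mask = x > 0
        return np.maximum(0, x)

    def backward(self, grad, learning_rate):
        # The gradient flows only where the input was positive
        return grad * self.mask


class MaxPool2D:
    def __init__(self, kernel_size, stride=None):
        self.kernel_size = kernel_size
        self.stride = stride or kernel_size

    def forward(self, x):
        k, s = self.kernel_size, self.stride
        self.x = x
        windows = sliding_window_view(x, (k, k), axis=(2, 3))[:, :, ::s, ::s]
        self.out = windows.max(axis=(4, 5))
        return self.out

    def backward(self, grad, learning_rate):
        k, s = self.kernel_size, self.stride
        out_height, out_width = grad.shape[2], grad.shape[3]
        grad_x = np.zeros_like(self.x)
        # Route each output gradient to the input position(s) that held the window maximum
        # (exact ties share the gradient; tied zeros from ReLU are masked out upstream anyway)
        for i in range(k):
            for j in range(k):
                tap = self.x[:, :, i:i + s * out_height:s, j:j + s * out_width:s]
                grad_x[:, :, i:i + s * out_height:s, j:j + s * out_width:s] += (tap == self.out) * grad
        return grad_x


class Flatten:
    def forward(self, x):
        self.input_shape = x.shape
        return x.reshape(x.shape[0], -1)

    def backward(self, grad, learning_rate):
        return grad.reshape(self.input_shape)


class Dense:
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features
        self.weights = np.random.randn(out_features, in_features) * np.sqrt(2.0 / in_features)
        self.bias = np.zeros(out_features)

    def forward(self, x):
        self.x = x
        return x @ self.weights.T + self.bias

    def backward(self, grad, learning_rate):
        grad_x = grad @ self.weights  # computed with the weights used in forward
        self.weights -= learning_rate * (grad.T @ self.x)
        self.bias -= learning_rate * grad.sum(axis=0)
        return grad_x


class Softmax:
    def forward(self, x):
        # Subtract the row maximum for numerical stability
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)

    def backward(self, grad, learning_rate):
        # The training loop passes (y_pred - y_true) / batch_size, which is already the gradient
        # of softmax + cross-entropy with respect to the logits, so it passes through unchanged
        return grad


class CNN:
    def __init__(self):
        self.layers = [
            Conv2D(1, 32, kernel_size=3, padding=1),   # (N, 1, 28, 28) -> (N, 32, 28, 28)
            ReLU(),
            MaxPool2D(kernel_size=2),                  # -> (N, 32, 14, 14)
            Conv2D(32, 64, kernel_size=3, padding=1),  # -> (N, 64, 14, 14)
            ReLU(),
            MaxPool2D(kernel_size=2),                  # -> (N, 64, 7, 7)
            Flatten(),                                 # -> (N, 3136)
            Dense(7*7*64, 128),
            ReLU(),
            Dense(128, 10),
            Softmax()
        ]

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x
```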
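The convolution layer involves more index arithmetic than any other layer here, so it is worth testing on its own. The self-contained sketch below checks two things on small random inputs. First, it confirms that a sliding_window_view + einsum convolution matches the straightforward four-loop definition. Second, it confirms that the einsum expression for the weight gradient agrees with a central finite-difference estimate. The function names (conv_loops, windows_of, conv_vectorized, weight_grad) and the toy loss L = sum(out * g) are assumptions made for this check; they are not part of the article's code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_loops(x, w, stride=1):
    """Reference convolution with explicit loops. x: (B, C, H, W), w: (O, C, k, k)."""
    b_size, _, height, width = x.shape
    o_size, _, k, _ = w.shape
    out_h, out_w = (height - k) // stride + 1, (width - k) // stride + 1
    out = np.zeros((b_size, o_size, out_h, out_w))
    for b in range(b_size):
        for o in range(o_size):
            for i in range(out_h):
                for j in range(out_w):
                    patch = x[b, :, i * stride:i * stride + k, j * stride:j * stride + k]
                    out[b, o, i, j] = np.sum(patch * w[o])
    return out


def windows_of(x, k, stride):
    # (B, C, out_h, out_w, k, k) view of every k x k window
    return sliding_window_view(x, (k, k), axis=(2, 3))[:, :, ::stride, ::stride]


def conv_vectorized(x, w, stride=1):
    return np.einsum('bchwij,ocij->bohw', windows_of(x, w.shape[2], stride), w)


def weight_grad(x, g, k, stride=1):
    """dL/dw for L = sum(conv(x, w) * g), i.e. g is the upstream gradient."""
    return np.einsum('bohw,bchwij->ocij', g, windows_of(x, k, stride))


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 7, 7))
w = rng.standard_normal((4, 3, 3, 3))

for stride in (1, 2):
    out = conv_vectorized(x, w, stride)
    assert np.allclose(out, conv_loops(x, w, stride))

    # Central finite differences on every weight, with loss L = sum(out * g)
    g = rng.standard_normal(out.shape)
    numeric = np.zeros_like(w)
    eps = 1e-6
    for idx in np.ndindex(*w.shape):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[idx] += eps
        w_minus[idx] -= eps
        numeric[idx] = (np.sum(conv_vectorized(x, w_plus, stride) * g)
                        - np.sum(conv_vectorized(x, w_minus, stride) * g)) / (2 * eps)
    analytic = weight_grad(x, g, 3, stride)
    print(stride, np.max(np.abs(numeric - analytic)))
    assert np.allclose(numeric, analytic, atol=1e-5)
```

Because the convolution is linear in the weights, the finite-difference estimate should match almost exactly; a large mismatch would point to a wrong einsum subscript or stride slice.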
Training the model
We train the model with the cross-entropy loss and mini-batch stochastic gradient descent (SGD). The training code is:
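```python
# Hyperparameters
learning_rate = 0.01
batch_size = 128
epochs = 10

# Create the model
model = CNN()

# Train the model
num_steps = int(np.ceil(len(x_train) / batch_size))
for epoch in range(epochs):
    # Shuffle the training set every epoch
    perm = np.random.permutation(len(x_train))
    for step, i in enumerate(range(0, len(x_train), batch_size)):
        idx = perm[i:i+batch_size]
        # Add the channel axis: (N, 28, 28) -> (N, 1, 28, 28)
        x_batch = x_train[idx].reshape(-1, 1, 28, 28)
        y_batch = y_train[idx]
        # Forward pass
        y_pred = model.forward(x_batch)
        # Mean cross-entropy loss; the small constant avoids log(0)
        loss = -np.sum(y_batch * np.log(y_pred + 1e-12)) / len(x_batch)
        # Backward pass, starting from the gradient of the mean loss w.r.t. the softmax inputs
        grad = (y_pred - y_batch) / len(x_batch)
        for layer in reversed(model.layers):
            grad = layer.backward(grad, learning_rate)
        # Print the loss every 100 steps
        if step % 100 == 0:
            print(f'Epoch {epoch+1}/{epochs}, Step {step+1}/{num_steps}, Loss {loss:.4f}')

# Evaluate on the test set in batches to limit memory use
correct = 0
for i in range(0, len(x_test), batch_size):
    y_pred = model.forward(x_test[i:i+batch_size].reshape(-1, 1, 28, 28))
    correct += np.sum(np.argmax(y_pred, axis=1) == np.argmax(y_test[i:i+batch_size], axis=1))
accuracy = correct / len(x_test)
print(f'Test Accuracy: {accuracy:.4f}')
```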
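The code above trains the model, printing the training loss periodically, and then measures its accuracy on the test set. Because everything runs in numpy on the CPU, training is far slower than with a deep learning framework. Lowering epochs or training on a subset of x_train is a practical way to try the code out.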
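The training loop seeds the backward pass with y_pred - y_batch (scaled by 1/N when the loss is averaged over a batch of N images) rather than differentiating the softmax separately. This works because, for the mean cross-entropy L = -(1/N) * sum_n sum_k y_nk * log(p_nk) with p = softmax(z), the softmax Jacobian and the logarithm combine into dL/dz_nk = (p_nk - y_nk) / N. The sketch below checks that identity numerically on random logits; softmax and mean_cross_entropy are helpers written only for this check.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def mean_cross_entropy(z, y):
    """Batch mean of -sum_k y_k * log(softmax(z)_k)."""
    return -np.sum(y * np.log(softmax(z))) / len(z)


rng = np.random.default_rng(1)
n, classes = 4, 10
z = rng.standard_normal((n, classes))
y = np.eye(classes)[rng.integers(0, classes, size=n)]

# Closed form: gradient of the mean loss w.r.t. the logits
analytic = (softmax(z) - y) / n

# Central finite differences, one logit at a time
numeric = np.zeros_like(z)
eps = 1e-6
for idx in np.ndindex(*z.shape):
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[idx] += eps
    z_minus[idx] -= eps
    numeric[idx] = (mean_cross_entropy(z_plus, y) - mean_cross_entropy(z_minus, y)) / (2 * eps)

print(np.max(np.abs(numeric - analytic)))
assert np.allclose(numeric, analytic, atol=1e-7)
```

The fused gradient never divides by p. It therefore stays well-behaved even when the predicted probability of the true class is close to 0, whereas the separate derivative of the log, -y/p, would not.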
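Examples
The two examples below show how to use the trained model to predict the digit in test images, which the model did not see during training.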
```python
# Example 1
img = x_test[0]
plt.imshow(img, cmap='gray')
plt.show()
pred = model.forward(img.reshape(1, 1, 28, 28))
print(f'Prediction: {np.argmax(pred)}')
# Example 2
img = x_test[1]
plt.imshow(img, cmap='gray')
plt.show()
pred = model.forward(img.reshape(1, 1, 28, 28))
print(f'Prediction: {np.argmax(pred)}')
```
The code above displays two images from the test set and prints the digit that the trained model predicts for each.
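Summary
This article showed how to implement a simple convolutional neural network in pure numpy for handwritten digit recognition. We trained and tested it on the MNIST dataset and demonstrated how to use the trained model to make predictions on unseen images.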
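Unless otherwise noted, articles on this site are original; when reposting, please credit the source: 纯numpy卷积神经网络实现手写数字识别的实践 (Handwritten digit recognition with a pure-numpy CNN) - Python技术站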