How to Do Quadratic Regression in Python

March 25, 2023, 3:54 PM • python-answer

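Quadratic regression in Python can be done with the PolynomialFeatures and LinearRegression classes from the scikit-learn library.

The complete steps are as follows:

1. Import the required libraries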

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

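2. Prepare the data

Prepare the data, including a training set and a test set. First generate sample data in which y follows y = x^2, then add Gaussian noise with a standard deviation of 5.

# Generate sample data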
x = np.arange(-5, 5, 0.1)
y = x**2
y += np.random.randn(len(x)) * 5

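Split the data into a training set and a test set. The first 70% of the points form the training set; because x is sorted, the test set covers only the right-hand end of the range (roughly x >= 2), so the test measures how well the model extrapolates.

# Split the dataset
split_index = int(len(x) * 0.7)

x_train = x[:split_index].reshape(-1, 1) # reshape into a column vector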
y_train = y[:split_index].reshape(-1, 1)

x_test = x[split_index:].reshape(-1, 1)
y_test = y[split_index:].reshape(-1, 1)
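
As an alternative to the sequential split, here is a minimal sketch of a shuffled split using scikit-learn's train_test_split, so that both sets cover the whole x range. The fixed seeds and the 30% test size are assumptions chosen to mirror the 70/30 split above, not part of the original walkthrough.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same data-generating rule as above; the seed is an assumption for reproducibility
rng = np.random.default_rng(0)
x = np.arange(-5, 5, 0.1)
y = x**2 + rng.standard_normal(len(x)) * 5

# Shuffle before splitting so train and test both span the full x range
x_train, x_test, y_train, y_test = train_test_split(
    x.reshape(-1, 1), y, test_size=0.3, random_state=0
)

print(x_train.shape, x_test.shape)
```

If you plot fitted curves from a shuffled split, sort the points by x first; otherwise plt.plot connects them in shuffled order.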

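3. Feature engineering

Use the PolynomialFeatures class to expand the single input feature into polynomial terms. With degree=2, each value x becomes the three columns [1, x, x^2], so a linear model fitted on these columns can represent a quadratic curve.

# Feature engineering
poly = PolynomialFeatures(degree=2)
x_train_poly = poly.fit_transform(x_train)
x_test_poly = poly.transform(x_test) # reuse the transformer fitted on the training set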
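
To make the transformation concrete, the following small sketch applies PolynomialFeatures to two made-up values. The names x_demo, poly_demo and expanded are hypothetical and used only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x_demo = np.array([[2.0], [3.0]])        # two hypothetical sample values
poly_demo = PolynomialFeatures(degree=2)
expanded = poly_demo.fit_transform(x_demo)

# Each row is [1, x, x^2]: the first row is [1, 2, 4], the second [1, 3, 9]
print(expanded)
```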

4. Model training and prediction

Use the LinearRegression class to train the model and make predictions.

# Train the model and predict
lr = LinearRegression()
lr.fit(x_train_poly, y_train)
y_train_predict = lr.predict(x_train_poly)
y_test_predict = lr.predict(x_test_poly)
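
The walkthrough only plots the predictions. As a rough sketch of how the fit could also be checked numerically, the block below computes mean squared error and R^2 with sklearn.metrics. It regenerates the data with a fixed seed (an assumption) so that it runs on its own.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Rebuild the data from step 2; the seed is an assumption for reproducibility
rng = np.random.default_rng(0)
x = np.arange(-5, 5, 0.1)
y = x**2 + rng.standard_normal(len(x)) * 5

split_index = int(len(x) * 0.7)
x_train = x[:split_index].reshape(-1, 1)
x_test = x[split_index:].reshape(-1, 1)
y_train, y_test = y[:split_index], y[split_index:]

poly = PolynomialFeatures(degree=2)
x_train_poly = poly.fit_transform(x_train)
x_test_poly = poly.transform(x_test)

lr = LinearRegression().fit(x_train_poly, y_train)

# Compare errors on data the model has seen and data it has not
train_mse = mean_squared_error(y_train, lr.predict(x_train_poly))
test_mse = mean_squared_error(y_test, lr.predict(x_test_poly))
train_r2 = r2_score(y_train, lr.predict(x_train_poly))

print("train MSE:", train_mse)
print("test MSE:", test_mse)
print("train R^2:", train_r2)
```

A test error much larger than the training error would suggest the fitted curve does not carry over well to the held-out range.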

5. Display the results

Use Matplotlib to visualize the training set, the test set, and the fitted curves.

# Display the results
plt.figure(figsize=(12, 8))
plt.scatter(x_train, y_train, color='red')
plt.plot(x_train, y_train_predict, color='blue')

plt.scatter(x_test, y_test, color='green')
plt.plot(x_test, y_test_predict, color='yellow')

plt.title('Polynomial Regression')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

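Those are the complete steps for quadratic regression in Python. Next, let's look at two examples.

Example 1

First take 100 evenly spaced points in [-3, 3] as $x$ (the code uses np.linspace, so the points are not random), then compute the corresponding values $y=3x^2-2x+1$ and add Gaussian noise.

# Generate sample data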
np.random.seed(1)
x = np.linspace(-3, 3, 100)
y = 3*x**2 - 2*x + 1 + np.random.randn(100)

Perform feature engineering and train the model:

poly = PolynomialFeatures(degree=2) # quadratic feature transformation
x_poly = poly.fit_transform(x.reshape(-1, 1))
model = LinearRegression()
model.fit(x_poly, y) # train the model

Get the fitted result:

# Show the result
x_line = np.linspace(-3, 3, 100)
x_line_poly = poly.transform(x_line.reshape(-1, 1)) # reuse the already fitted transformer
y_line = model.predict(x_line_poly)

plt.scatter(x, y)
plt.plot(x_line, y_line, color='red')
plt.show()
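
Besides the plot, one way to check the fit is to compare the learned coefficients with the true ones (3, -2, 1). The sketch below repeats the Example 1 setup so it runs on its own; the names b0, b1 and b2 are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

np.random.seed(1)
x = np.linspace(-3, 3, 100)
y = 3*x**2 - 2*x + 1 + np.random.randn(100)

poly = PolynomialFeatures(degree=2)
model = LinearRegression().fit(poly.fit_transform(x.reshape(-1, 1)), y)

# Feature columns are [1, x, x^2]. LinearRegression fits its own intercept,
# so the coefficient on the constant column carries no information and the
# constant term is read from intercept_ instead.
_, b1, b2 = model.coef_
b0 = model.intercept_

print("x^2 coefficient:", b2)   # should be close to 3
print("x coefficient:", b1)     # should be close to -2
print("constant term:", b0)     # should be close to 1
```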

[Figure: fitted result for Example 1]

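Example 2

Take 1000 evenly spaced points in [0, 2] as $x$ (again generated with np.linspace rather than at random), then compute the corresponding values $y=4x^2-3x+1$ and add Gaussian noise.

# Generate sample data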
np.random.seed(2)
x = np.linspace(0, 2, 1000)
y = 4*x**2 - 3*x + 1 + np.random.randn(1000)
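
The source stops after generating the data. Assuming Example 2 follows the same pattern as Example 1, a possible sketch of the fitting step is shown below. It chains the two classes with make_pipeline and cross-checks the result against np.polyfit; both choices are illustrative and do not come from the original text.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

np.random.seed(2)
x = np.linspace(0, 2, 1000)
y = 4*x**2 - 3*x + 1 + np.random.randn(1000)

# include_bias=False drops the constant column, since LinearRegression
# already fits an intercept
pipe = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
pipe.fit(x.reshape(-1, 1), y)

lin = pipe.named_steps["linearregression"]
# Order the coefficients as [x^2, x, constant] to match np.polyfit
sk_coeffs = np.array([lin.coef_[1], lin.coef_[0], lin.intercept_])
np_coeffs = np.polyfit(x, y, deg=2)

print("scikit-learn:", sk_coeffs)
print("np.polyfit:  ", np_coeffs)
```

Both approaches solve the same least-squares problem, so the two coefficient vectors should agree up to floating-point error and land near the true values (4, -3, 1).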