1. Summary

One-sentence summary:

Two SimpleRNN layers, each followed by Dropout, with a final Dense layer producing the output:
model = tf.keras.Sequential([
    SimpleRNN(80, return_sequences=True),
    Dropout(0.2),
    SimpleRNN(100),
    Dropout(0.2),
    Dense(1)
])

model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss='mean_squared_error')  # mean squared error as the loss

 

1. What is the input format for SimpleRNN?

In order: the number of samples, the number of time steps the recurrent cell is unrolled over, and the number of input features per time step:

x_train = np.reshape(x_train, (x_train.shape[0], 60, 1))
# Make x_train match the RNN input requirement: [number of samples, time steps, features per time step].
# The whole dataset is fed in, so the sample count is x_train.shape[0], i.e. 2066 windows; 60 opening prices
# are fed in to predict the 61st day's opening price, so the number of time steps is 60; each time step
# receives one feature, a single day's opening price, so the feature count per time step is 1.
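For example, a minimal sketch with made-up toy data (not from the notebook) showing the same reshape:

import numpy as np

# 4 hypothetical samples, each a window of 3 consecutive values: shape [samples, timesteps]
toy = np.arange(12, dtype=float).reshape(4, 3)
# append the feature axis -> [samples, timesteps, features per step]
toy_rnn = np.reshape(toy, (4, 3, 1))
print(toy.shape, toy_rnn.shape)  # (4, 3) (4, 3, 1)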


2. Stock Prediction with a Recurrent Neural Network

Video location in the course corresponding to this post:

 

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dropout, Dense, SimpleRNN
import matplotlib.pyplot as plt
import os
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math
In [2]:
maotai = pd.read_csv('./SH600519.csv')  # read the stock data file
print(maotai)
      Unnamed: 0        date      open     close      high       low  
0             74  2010-04-26    88.702    87.381    89.072    87.362   
1             75  2010-04-27    87.355    84.841    87.355    84.681   
2             76  2010-04-28    84.235    84.318    85.128    83.597   
3             77  2010-04-29    84.592    85.671    86.315    84.592   
4             78  2010-04-30    83.871    82.340    83.871    81.523   
...          ...         ...       ...       ...       ...       ...   
2421        2495  2020-04-20  1221.000  1227.300  1231.500  1216.800   
2422        2496  2020-04-21  1221.020  1200.000  1223.990  1193.000   
2423        2497  2020-04-22  1206.000  1244.500  1249.500  1202.220   
2424        2498  2020-04-23  1250.000  1252.260  1265.680  1247.770   
2425        2499  2020-04-24  1248.000  1250.560  1259.890  1235.180   

         volume    code  
0     107036.13  600519  
1      58234.48  600519  
2      26287.43  600519  
3      34501.20  600519  
4      85566.70  600519  
...         ...     ...  
2421   24239.00  600519  
2422   29224.00  600519  
2423   44035.00  600519  
2424   26899.00  600519  
2425   19122.00  600519  

[2426 rows x 8 columns]
In [3]:
training_set = maotai.iloc[0:2426 - 300, 2:3].values  # opening prices of the first 2426-300=2126 days as the training set; iloc is 0-indexed, and 2:3 takes the half-open column slice [2, 3), i.e. the 'open' column
test_set = maotai.iloc[2426 - 300:, 2:3].values  # opening prices of the last 300 days as the test set
print(training_set.shape)
print(test_set.shape)
(2126, 1)
(300, 1)
In [4]:
# Normalization
sc = MinMaxScaler(feature_range=(0, 1))  # define the scaler: normalize to the range (0, 1)
print(sc)
MinMaxScaler(copy=True, feature_range=(0, 1))
In [5]:
training_set_scaled = sc.fit_transform(training_set)  # learn the training set's min/max (properties of the training set alone) and normalize the training set with them
test_set = sc.transform(test_set)  # normalize the test set using the training set's statistics
print(training_set_scaled[:5,])
print(test_set[:5,])
[[0.011711  ]
 [0.00980951]
 [0.00540518]
 [0.00590914]
 [0.00489135]]
[[0.84288404]
 [0.85345726]
 [0.84641315]
 [0.87046756]
 [0.86758781]]
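A quick sanity check (a sketch, not in the original notebook): because the scaler was fit only on the training set, the test set is scaled with the training min/max, and inverse_transform recovers the original prices exactly:

# verify the scaler round-trips (uses sc and the arrays defined above)
print(np.allclose(sc.inverse_transform(training_set_scaled), training_set))  # expected: True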
In [6]:
x_train = []
y_train = []

x_test = []
y_test = []
In [7]:
# Training set: the first 2426-300=2126 days in the CSV
# Loop over the training set, taking every 60 consecutive opening prices as the input features x_train and the 61st day's price as the label; the loop builds 2426-300-60=2066 samples.
for i in range(60, len(training_set_scaled)):
    x_train.append(training_set_scaled[i - 60:i, 0])
    y_train.append(training_set_scaled[i, 0])
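The same windows can also be built without a Python loop. A sketch using NumPy's sliding_window_view (assumes NumPy >= 1.20; x_alt and y_alt are illustration names, not used elsewhere):

from numpy.lib.stride_tricks import sliding_window_view

# every row is 60 consecutive scaled prices; the label is the following day's price
windows = sliding_window_view(training_set_scaled[:, 0], 60)  # shape (2067, 60)
x_alt = windows[:-1]                 # drop the last window: (2066, 60), matches x_train
y_alt = training_set_scaled[60:, 0]  # labels for days 60..2125: (2066,)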
In [8]:
print(x_train[:2])
print(y_train[:2])
[array([0.011711  , 0.00980951, 0.00540518, 0.00590914, 0.00489135,
       0.00179279, 0.00162198, 0.00450456, 0.        , 0.00540518,
       0.00863926, 0.00656697, 0.00651332, 0.00799979, 0.00783745,
       0.00360251, 0.00405424, 0.00310844, 0.00306327, 0.0080986 ,
       0.00864773, 0.00495487, 0.00675613, 0.00517932, 0.00590914,
       0.0080986 , 0.00804496, 0.00990833, 0.0135659 , 0.01280926,
       0.01387222, 0.01402609, 0.01189028, 0.00901758, 0.00576515,
       0.00654015, 0.00540518, 0.00532331, 0.00342324, 0.00378321,
       0.00226145, 0.        , 0.00166574, 0.00049549, 0.00152034,
       0.00097403, 0.00251978, 0.00207512, 0.00341194, 0.00543059,
       0.00537554, 0.00415729, 0.00515673, 0.00552094, 0.00747607,
       0.01013984, 0.01062262, 0.01270479, 0.01214014, 0.01444112]), array([0.00980951, 0.00540518, 0.00590914, 0.00489135, 0.00179279,
       0.00162198, 0.00450456, 0.        , 0.00540518, 0.00863926,
       0.00656697, 0.00651332, 0.00799979, 0.00783745, 0.00360251,
       0.00405424, 0.00310844, 0.00306327, 0.0080986 , 0.00864773,
       0.00495487, 0.00675613, 0.00517932, 0.00590914, 0.0080986 ,
       0.00804496, 0.00990833, 0.0135659 , 0.01280926, 0.01387222,
       0.01402609, 0.01189028, 0.00901758, 0.00576515, 0.00654015,
       0.00540518, 0.00532331, 0.00342324, 0.00378321, 0.00226145,
       0.        , 0.00166574, 0.00049549, 0.00152034, 0.00097403,
       0.00251978, 0.00207512, 0.00341194, 0.00543059, 0.00537554,
       0.00415729, 0.00515673, 0.00552094, 0.00747607, 0.01013984,
       0.01062262, 0.01270479, 0.01214014, 0.01444112, 0.01388634])]
[0.013886340087578372, 0.016159086609993864]
In [9]:
# Shuffle the training set
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
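Re-seeding with the same value before each shuffle is what keeps x_train and y_train aligned. An equivalent pattern (a sketch; it yields a different but equally valid order, and x_arr / y_arr are illustration names) shuffles both with a single permutation index:

# one permutation index keeps features and labels paired
rng = np.random.default_rng(7)
perm = rng.permutation(len(x_train))
x_arr, y_arr = np.array(x_train)[perm], np.array(y_train)[perm]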
In [10]:
# Convert the training set from list to array
x_train, y_train = np.array(x_train), np.array(y_train)
In [11]:
print(x_train.shape)
print(y_train.shape)
(2066, 60)
(2066,)
In [12]:
# Make x_train match the RNN input requirement: [number of samples, time steps, features per time step].
# The whole dataset is fed in, so the sample count is x_train.shape[0], i.e. 2066 windows; 60 opening prices
# are fed in to predict the 61st day's opening price, so the number of time steps is 60; each time step
# receives one feature, a single day's opening price, so the feature count per time step is 1
x_train = np.reshape(x_train, (x_train.shape[0], 60, 1))
In [13]:
# Test set: the last 300 days in the CSV
# Loop over the test set, taking every 60 consecutive opening prices as the input features x_test and the 61st day's price as the label; the loop builds 300-60=240 samples.
for i in range(60, len(test_set)):
    x_test.append(test_set[i - 60:i, 0])
    y_test.append(test_set[i, 0])
# Convert the test set to an array and reshape it to the RNN input format: [number of samples, time steps, features per time step]
x_test, y_test = np.array(x_test), np.array(y_test)
x_test = np.reshape(x_test, (x_test.shape[0], 60, 1))
In [14]:
print(x_train.shape)
print(y_train.shape)
(2066, 60, 1)
(2066,)
In [15]:
print(x_test.shape)
print(y_test.shape)
(240, 60, 1)
(240,)
In [16]:
model = tf.keras.Sequential([
    SimpleRNN(80, return_sequences=True),
    Dropout(0.2),
    SimpleRNN(100),
    Dropout(0.2),
    Dense(1)
])

model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss='mean_squared_error')  # mean squared error as the loss
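Since no input shape is declared, Keras only builds the model when it sees the first batch, which is why model.summary() is called after fit below. Optionally (a sketch, not in the original), the model can be built up front so summary() works immediately:

# build explicitly; the shape is (batch, timesteps, features per step)
model.build(input_shape=(None, 60, 1))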
In [17]:
# This task only monitors the loss value, not accuracy, so the metrics option is omitted; each epoch will then report only the loss

checkpoint_save_path = "./checkpoint/rnn_stock.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='val_loss')
-------------load the model-----------------
In [18]:
history = model.fit(x_train, y_train, batch_size=64, epochs=50, validation_data=(x_test, y_test), validation_freq=1,
                    callbacks=[cp_callback])

model.summary()
Epoch 1/50
33/33 [==============================] - 4s 136ms/step - loss: 0.0015 - val_loss: 0.0028
Epoch 2/50
33/33 [==============================] - 4s 119ms/step - loss: 0.0016 - val_loss: 0.0013
Epoch 3/50
33/33 [==============================] - 4s 123ms/step - loss: 0.0015 - val_loss: 0.0012
Epoch 4/50
33/33 [==============================] - 4s 121ms/step - loss: 0.0014 - val_loss: 0.0074
Epoch 5/50
33/33 [==============================] - 4s 116ms/step - loss: 0.0014 - val_loss: 0.0020
Epoch 6/50
33/33 [==============================] - 4s 125ms/step - loss: 0.0015 - val_loss: 0.0010
Epoch 7/50
33/33 [==============================] - 4s 122ms/step - loss: 0.0013 - val_loss: 0.0029
Epoch 8/50
33/33 [==============================] - 4s 129ms/step - loss: 0.0013 - val_loss: 0.0024
Epoch 9/50
33/33 [==============================] - 4s 126ms/step - loss: 0.0013 - val_loss: 0.0048
Epoch 10/50
33/33 [==============================] - 4s 129ms/step - loss: 0.0012 - val_loss: 0.0018
Epoch 11/50
33/33 [==============================] - 4s 125ms/step - loss: 0.0012 - val_loss: 0.0023
Epoch 12/50
33/33 [==============================] - 4s 115ms/step - loss: 0.0012 - val_loss: 0.0065
Epoch 13/50
33/33 [==============================] - 4s 119ms/step - loss: 0.0014 - val_loss: 0.0033
Epoch 14/50
33/33 [==============================] - 4s 118ms/step - loss: 0.0012 - val_loss: 0.0019
Epoch 15/50
33/33 [==============================] - 4s 117ms/step - loss: 0.0011 - val_loss: 0.0012
Epoch 16/50
33/33 [==============================] - 4s 116ms/step - loss: 0.0011 - val_loss: 0.0017
Epoch 17/50
33/33 [==============================] - 4s 115ms/step - loss: 0.0012 - val_loss: 0.0014
Epoch 18/50
33/33 [==============================] - 4s 121ms/step - loss: 0.0011 - val_loss: 0.0010
Epoch 19/50
33/33 [==============================] - 4s 119ms/step - loss: 0.0011 - val_loss: 0.0042
Epoch 20/50
33/33 [==============================] - 4s 114ms/step - loss: 0.0012 - val_loss: 0.0014
Epoch 21/50
33/33 [==============================] - 4s 111ms/step - loss: 0.0012 - val_loss: 0.0017
Epoch 22/50
33/33 [==============================] - 4s 119ms/step - loss: 0.0012 - val_loss: 0.0010
Epoch 23/50
33/33 [==============================] - 4s 114ms/step - loss: 0.0010 - val_loss: 0.0027
Epoch 24/50
33/33 [==============================] - 4s 116ms/step - loss: 0.0012 - val_loss: 0.0024
Epoch 25/50
33/33 [==============================] - 4s 114ms/step - loss: 0.0011 - val_loss: 0.0011
Epoch 26/50
33/33 [==============================] - 4s 117ms/step - loss: 0.0011 - val_loss: 0.0049
Epoch 27/50
33/33 [==============================] - 4s 116ms/step - loss: 0.0011 - val_loss: 0.0043
Epoch 28/50
33/33 [==============================] - 4s 115ms/step - loss: 0.0010 - val_loss: 0.0058
Epoch 29/50
33/33 [==============================] - 4s 116ms/step - loss: 0.0011 - val_loss: 0.0029
Epoch 30/50
33/33 [==============================] - 4s 115ms/step - loss: 9.5443e-04 - val_loss: 0.0071
Epoch 31/50
33/33 [==============================] - 4s 115ms/step - loss: 8.3169e-04 - val_loss: 0.0027
Epoch 32/50
33/33 [==============================] - 4s 116ms/step - loss: 9.4572e-04 - val_loss: 0.0047
Epoch 33/50
33/33 [==============================] - 4s 115ms/step - loss: 8.9343e-04 - val_loss: 0.0090
Epoch 34/50
33/33 [==============================] - 4s 116ms/step - loss: 9.6406e-04 - val_loss: 0.0016
Epoch 35/50
33/33 [==============================] - 4s 118ms/step - loss: 0.0010 - val_loss: 0.0026
Epoch 36/50
33/33 [==============================] - 4s 117ms/step - loss: 9.9515e-04 - val_loss: 0.0064
Epoch 37/50
33/33 [==============================] - 4s 117ms/step - loss: 0.0012 - val_loss: 0.0081
Epoch 38/50
33/33 [==============================] - 4s 115ms/step - loss: 8.3921e-04 - val_loss: 0.0020
Epoch 39/50
33/33 [==============================] - 4s 118ms/step - loss: 9.1372e-04 - val_loss: 0.0025
Epoch 40/50
33/33 [==============================] - 4s 117ms/step - loss: 8.1070e-04 - val_loss: 0.0034
Epoch 41/50
33/33 [==============================] - 4s 116ms/step - loss: 8.9496e-04 - val_loss: 0.0014
Epoch 42/50
33/33 [==============================] - 4s 115ms/step - loss: 8.7054e-04 - val_loss: 0.0038
Epoch 43/50
33/33 [==============================] - 4s 116ms/step - loss: 9.2930e-04 - val_loss: 0.0035
Epoch 44/50
33/33 [==============================] - 4s 120ms/step - loss: 9.5918e-04 - val_loss: 9.3904e-04
Epoch 45/50
33/33 [==============================] - 4s 119ms/step - loss: 9.5214e-04 - val_loss: 0.0012
Epoch 46/50
33/33 [==============================] - 4s 118ms/step - loss: 8.5229e-04 - val_loss: 0.0025
Epoch 47/50
33/33 [==============================] - 4s 118ms/step - loss: 8.4545e-04 - val_loss: 0.0055
Epoch 48/50
33/33 [==============================] - 4s 119ms/step - loss: 8.0176e-04 - val_loss: 9.1022e-04
Epoch 49/50
33/33 [==============================] - 4s 116ms/step - loss: 7.9480e-04 - val_loss: 0.0053
Epoch 50/50
33/33 [==============================] - 4s 113ms/step - loss: 7.2215e-04 - val_loss: 0.0032
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn (SimpleRNN)       (None, 60, 80)            6560      
_________________________________________________________________
dropout (Dropout)            (None, 60, 80)            0         
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 100)               18100     
_________________________________________________________________
dropout_1 (Dropout)          (None, 100)               0         
_________________________________________________________________
dense (Dense)                (None, 1)                 101       
=================================================================
Total params: 24,761
Trainable params: 24,761
Non-trainable params: 0
_________________________________________________________________
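Because the checkpoint callback was created with save_best_only=True and monitor='val_loss', the file on disk holds the weights with the lowest validation loss seen during training, which may be better than the final in-memory weights. A sketch (optional, not in the original notebook) that reloads and re-evaluates them:

# reload the best-val_loss weights saved during training
model.load_weights(checkpoint_save_path)
print(model.evaluate(x_test, y_test, verbose=0))  # test-set MSE on the scaled data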
In [19]:
file = open('./weights.txt', 'w')  # extract the parameters
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

loss = history.history['loss']
val_loss = history.history['val_loss']

plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()
[Figure: Training and Validation Loss]
In [20]:
################## predict ######################
# Feed the test set into the model to get predictions
predicted_stock_price = model.predict(x_test)
# Undo the normalization on the predictions: map them back from (0, 1) to the original price range
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
# Undo the normalization on the ground truth: map it back from (0, 1) to the original price range
real_stock_price = sc.inverse_transform(test_set[60:])
# Plot the real prices against the predicted prices
plt.plot(real_stock_price, color='red', label='MaoTai Stock Price')
plt.plot(predicted_stock_price, color='blue', label='Predicted MaoTai Stock Price')
plt.title('MaoTai Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('MaoTai Stock Price')
plt.legend()
plt.show()
[Figure: MaoTai Stock Price Prediction]
In [21]:
##########evaluate##############
# calculate MSE (mean squared error) ---> E[(predicted - real)^2]: square the errors, then take the mean
mse = mean_squared_error(predicted_stock_price, real_stock_price)
# calculate RMSE (root mean squared error) ---> sqrt[MSE]: the square root of the MSE
rmse = math.sqrt(mean_squared_error(predicted_stock_price, real_stock_price))
# calculate MAE (mean absolute error) ---> E[|predicted - real|]: take the absolute errors, then the mean
mae = mean_absolute_error(predicted_stock_price, real_stock_price)
print('MSE: %.6f' % mse)
print('RMSE: %.6f' % rmse)
print('MAE: %.6f' % mae)
MSE: 1619.990084
RMSE: 40.249100
MAE: 35.700861
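These values can be reproduced directly from the definitions in the comments above; a quick NumPy sketch:

err = predicted_stock_price - real_stock_price
print(np.mean(err ** 2))           # MSE
print(np.sqrt(np.mean(err ** 2)))  # RMSE
print(np.mean(np.abs(err)))        # MAE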
In [ ]: