I. Summary

One-sentence summary:

The model itself is a fairly ordinary convolutional neural network; the key point is the generator used for image data: ImageDataGenerator.

 

 

1. What do the common parameters of ImageDataGenerator.flow_from_directory mean?

|||-begin

train_generator = train_datagen.flow_from_directory(
    train_dir, # target directory
    target_size=(150, 150), # resize all images to 150×150
    batch_size=20,
    class_mode='binary') # binary labels are needed because we use the binary_crossentropy loss

|||-end

directory: [path to the target directory]: Path to the target directory. It should contain one subdirectory per class. Any PNG, JPG, BMP, PPM or TIF images found in the subdirectory tree will be included in the generator.
target_size: [size to which all images will be resized]: Tuple of integers (height, width), default (256, 256). All images found will be resized to this size.
batch_size: [batch size]: Size of the batches of data (default 32).
class_mode: [determines the type of label arrays returned]: One of "categorical", "binary", "sparse", "input" or None. Default: "categorical". Determines the type of label arrays that are returned: "categorical" yields 2D one-hot encoded labels, "binary" yields 1D binary labels, "sparse" yields 1D integer labels, "input" yields images identical to the input images, and None yields no labels.
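A minimal sketch (no real images, and the temporary directory is only an illustration) of how flow_from_directory maps class subdirectories to labels:

```python
import os, tempfile

# flow_from_directory expects one subdirectory per class, e.g.
#   train_dir/cats/cat001.jpg ...
#   train_dir/dogs/dog001.jpg ...
base = tempfile.mkdtemp()
for cls in ['dogs', 'cats']:
    os.makedirs(os.path.join(base, cls))

# Keras assigns class indices in alphanumeric order of the subdirectory
# names, so with class_mode='binary', cats map to 0 and dogs to 1.
classes = sorted(os.listdir(base))
class_indices = {name: i for i, name in enumerate(classes)}
print(class_indices)  # {'cats': 0, 'dogs': 1}
```

The same mapping is available on a real generator as train_generator.class_indices.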

 

2. What does tf.keras.preprocessing.image.ImageDataGenerator do?

[Generates batches of tensor image data]: Generates batches of tensor image data with real-time data augmentation.

 

 

3. What does tf.keras.preprocessing.image.ImageDataGenerator.flow_from_directory do?

[data augmentation]: Takes the path to a directory & generates batches of augmented data.

 

 

4. What is the output format of the train_generator produced by ImageDataGenerator?

Tuples of the form (data_batch, labels_batch):
for data_batch, labels_batch in train_generator:
    print('data batch shape:', data_batch.shape)
    break # the generator yields batches indefinitely, so the loop must be broken explicitly
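The shapes can be demonstrated without TensorFlow using a mock generator built on NumPy (mock_generator is a made-up stand-in for illustration, not part of Keras):

```python
import numpy as np

def mock_generator(batch_size=20):
    """Hypothetical stand-in for train_generator; yields (data, labels) forever."""
    while True:
        data_batch = np.zeros((batch_size, 150, 150, 3), dtype=np.float32)
        labels_batch = np.zeros((batch_size,), dtype=np.float32)
        yield data_batch, labels_batch

data_batch, labels_batch = next(mock_generator())
print('data batch shape:', data_batch.shape)      # (20, 150, 150, 3)
print('labels batch shape:', labels_batch.shape)  # (20,)
```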

 

 

5. When fitting the model, can both the training data and the validation data be generator instances?

Yes: the first argument to fit is train_generator,
and the validation data is passed as validation_data=validation_generator:
history = model.fit(       
    train_generator,       
    steps_per_epoch=100,       
    epochs=30,       
    validation_data=validation_generator,       
    validation_steps=50)

 

 

6. How should the following fit code be understood?

|||-begin

history = model.fit(       
    train_generator,       
    steps_per_epoch=100,       
    epochs=30,       
    validation_data=validation_generator,       
    validation_steps=50)

|||-end

[A Python generator]: Its first argument should be a Python generator that yields batches of inputs and targets indefinitely, such as train_generator.
[The model needs to know how many samples to draw from the generator per epoch]: Because the data is generated endlessly, the Keras model needs to know how many samples to draw from the generator before declaring an epoch over.
[After drawing steps_per_epoch batches, fitting moves on to the next epoch]: That is the role of the steps_per_epoch argument: after steps_per_epoch batches have been drawn from the generator (i.e. after steps_per_epoch gradient-descent steps), the fitting process moves on to the next epoch.
[Each batch contains 20 samples]: In this example each batch contains 20 samples, so it takes 100 batches to read all 2000 samples.
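The batch arithmetic above can be checked directly (the sample counts come from the "Found 2000/1000 images" messages printed by flow_from_directory later in the notebook):

```python
import math

# 2000 training and 1000 validation images, drawn in batches of 20
num_train_samples, num_val_samples, batch_size = 2000, 1000, 20

steps_per_epoch = math.ceil(num_train_samples / batch_size)
validation_steps = math.ceil(num_val_samples / batch_size)
print(steps_per_epoch, validation_steps)  # 100 50
```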

 

 

II. 5.2-3 Cats vs. Dogs Classification (Basic Model)

Video location of the course lesson corresponding to this post:

 

2. Building the network

In [1]:
 
import os, shutil
# path to the directory where the original dataset was uncompressed
original_dataset_dir = 'E:\\78_recorded_lesson\\001_course_github\\AI_dataSet\\dogs-vs-cats\\kaggle_original_data\\train'
# directory where the smaller dataset is stored
base_dir = 'E:\\78_recorded_lesson\\001_course_github\\AI_dataSet\\dogs-vs-cats\\cats_and_dogs_small'
# directories for the training, validation and test splits
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')
test_dir = os.path.join(base_dir, 'test')
# directory with training cat pictures
train_cats_dir = os.path.join(train_dir, 'cats')
# directory with training dog pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')
# directory with validation cat pictures
validation_cats_dir = os.path.join(validation_dir, 'cats')
# directory with validation dog pictures
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
# directory with test cat pictures
test_cats_dir = os.path.join(test_dir, 'cats')
# directory with test dog pictures
test_dogs_dir = os.path.join(test_dir, 'dogs')
In [2]:
 
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
In [2]:
 
model = tf.keras.models.Sequential() 
model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu',                         
                        input_shape=(150, 150, 3))) 
model.add(tf.keras.layers.MaxPooling2D((2, 2))) 
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu')) 
model.add(tf.keras.layers.MaxPooling2D((2, 2))) 
model.add(tf.keras.layers.Conv2D(128, (3, 3), activation='relu')) 
model.add(tf.keras.layers.MaxPooling2D((2, 2))) 
model.add(tf.keras.layers.Conv2D(128, (3, 3), activation='relu')) 
model.add(tf.keras.layers.MaxPooling2D((2, 2))) 
model.add(tf.keras.layers.Flatten()) 
model.add(tf.keras.layers.Dense(512, activation='relu')) 
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
model.summary() 
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 15, 15, 128)       147584    
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten (Flatten)            (None, 6272)              0         
_________________________________________________________________
dense (Dense)                (None, 512)               3211776   
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 513       
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________
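The parameter counts in the summary can be verified by hand with the standard formulas (weights plus one bias per filter or output unit); a small sketch:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # one (kh * kw * in_channels) weight kernel plus one bias per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # full weight matrix plus one bias per output unit
    return in_units * out_units + out_units

print(conv2d_params(3, 3, 3, 32))   # 896     -> conv2d
print(conv2d_params(3, 3, 32, 64))  # 18496   -> conv2d_1
print(dense_params(6272, 512))      # 3211776 -> dense
print(dense_params(512, 1))         # 513     -> dense_1
```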

3. Configuring the model for training

In [3]:
 
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
              metrics=['acc'])

4. Data preprocessing

Note: ImageDataGenerator

In [6]:
 
# rescale all images by 1/255
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    train_dir, # target directory
    target_size=(150, 150), # resize all images to 150×150
    batch_size=20,
    class_mode='binary') # binary labels are needed because we use the binary_crossentropy loss
validation_generator = test_datagen.flow_from_directory(         
    validation_dir,         
    target_size=(150, 150),         
    batch_size=20,         
    class_mode='binary')
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
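The rescale=1./255 factor simply maps 8-bit pixel values into the [0, 1] range; a quick NumPy check (the sample pixel values are arbitrary):

```python
import numpy as np

# a few hypothetical 8-bit pixel values
pixels = np.array([0, 127, 255], dtype=np.uint8)
scaled = pixels * (1.0 / 255)
print(scaled.min(), scaled.max())  # 0.0 1.0
```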
In [7]:
 
for data_batch, labels_batch in train_generator: 
    print('data batch shape:', data_batch.shape)
    # print(data_batch)
    print('labels batch shape:', labels_batch.shape)
    # print(labels_batch)
    break 
data batch shape: (20, 150, 150, 3)
labels batch shape: (20,)

5. Fitting the model using a batch generator

In [8]:
 
history = model.fit_generator(       
    train_generator,       
    steps_per_epoch=100,       
    epochs=30,       
    validation_data=validation_generator,       
    validation_steps=50)
WARNING:tensorflow:From <ipython-input-8-aefa4c2da2ae>:6: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
Please use Model.fit, which supports generators.
Epoch 1/30
100/100 [==============================] - 36s 365ms/step - loss: 0.6926 - acc: 0.5365 - val_loss: 0.6800 - val_acc: 0.6190
Epoch 2/30
100/100 [==============================] - 8s 83ms/step - loss: 0.6716 - acc: 0.5965 - val_loss: 0.6708 - val_acc: 0.5720
Epoch 3/30
100/100 [==============================] - 8s 83ms/step - loss: 0.6316 - acc: 0.6535 - val_loss: 0.6411 - val_acc: 0.6330
Epoch 4/30
100/100 [==============================] - 9s 91ms/step - loss: 0.5864 - acc: 0.6940 - val_loss: 0.6118 - val_acc: 0.6700
Epoch 5/30
100/100 [==============================] - 9s 94ms/step - loss: 0.5619 - acc: 0.7155 - val_loss: 0.5813 - val_acc: 0.6900
Epoch 6/30
100/100 [==============================] - 9s 94ms/step - loss: 0.5184 - acc: 0.7545 - val_loss: 0.6149 - val_acc: 0.6620
Epoch 7/30
100/100 [==============================] - 9s 85ms/step - loss: 0.4881 - acc: 0.7645 - val_loss: 0.5801 - val_acc: 0.7040
Epoch 8/30
100/100 [==============================] - 9s 91ms/step - loss: 0.4738 - acc: 0.7660 - val_loss: 0.5685 - val_acc: 0.7030
Epoch 9/30
100/100 [==============================] - 9s 93ms/step - loss: 0.4410 - acc: 0.8005 - val_loss: 0.5672 - val_acc: 0.7200
Epoch 10/30
100/100 [==============================] - 9s 92ms/step - loss: 0.4106 - acc: 0.8135 - val_loss: 0.5541 - val_acc: 0.7240
Epoch 11/30
100/100 [==============================] - 8s 84ms/step - loss: 0.3891 - acc: 0.8280 - val_loss: 0.5591 - val_acc: 0.7050
Epoch 12/30
100/100 [==============================] - 8s 85ms/step - loss: 0.3690 - acc: 0.8285 - val_loss: 0.5731 - val_acc: 0.7160
Epoch 13/30
100/100 [==============================] - 9s 92ms/step - loss: 0.3449 - acc: 0.8515 - val_loss: 0.5661 - val_acc: 0.7260
Epoch 14/30
100/100 [==============================] - 9s 93ms/step - loss: 0.3149 - acc: 0.8640 - val_loss: 0.5707 - val_acc: 0.7190
Epoch 15/30
100/100 [==============================] - 9s 89ms/step - loss: 0.2928 - acc: 0.8770 - val_loss: 0.5798 - val_acc: 0.7300
Epoch 16/30
100/100 [==============================] - 9s 91ms/step - loss: 0.2681 - acc: 0.8910 - val_loss: 0.6003 - val_acc: 0.7320
Epoch 17/30
100/100 [==============================] - 9s 87ms/step - loss: 0.2505 - acc: 0.8940 - val_loss: 0.7731 - val_acc: 0.6870
Epoch 18/30
100/100 [==============================] - 9s 87ms/step - loss: 0.2271 - acc: 0.9140 - val_loss: 0.6663 - val_acc: 0.7090
Epoch 19/30
100/100 [==============================] - 8s 85ms/step - loss: 0.2065 - acc: 0.9275 - val_loss: 0.6454 - val_acc: 0.7270
Epoch 20/30
100/100 [==============================] - 8s 84ms/step - loss: 0.1838 - acc: 0.9310 - val_loss: 0.7018 - val_acc: 0.7250
Epoch 21/30
100/100 [==============================] - 8s 82ms/step - loss: 0.1621 - acc: 0.9475 - val_loss: 0.6983 - val_acc: 0.7250
Epoch 22/30
100/100 [==============================] - 8s 85ms/step - loss: 0.1383 - acc: 0.9520 - val_loss: 0.7337 - val_acc: 0.7250
Epoch 23/30
100/100 [==============================] - 8s 83ms/step - loss: 0.1290 - acc: 0.9565 - val_loss: 0.8058 - val_acc: 0.7210
Epoch 24/30
100/100 [==============================] - 9s 90ms/step - loss: 0.1119 - acc: 0.9675 - val_loss: 0.7313 - val_acc: 0.7440
Epoch 25/30
100/100 [==============================] - 8s 85ms/step - loss: 0.0920 - acc: 0.9725 - val_loss: 0.8506 - val_acc: 0.7280
Epoch 26/30
100/100 [==============================] - 8s 84ms/step - loss: 0.0873 - acc: 0.9765 - val_loss: 0.7989 - val_acc: 0.7380
Epoch 27/30
100/100 [==============================] - 8s 82ms/step - loss: 0.0727 - acc: 0.9780 - val_loss: 0.8836 - val_acc: 0.7260
Epoch 28/30
100/100 [==============================] - 8s 84ms/step - loss: 0.0629 - acc: 0.9795 - val_loss: 1.0962 - val_acc: 0.6890
Epoch 29/30
100/100 [==============================] - 9s 87ms/step - loss: 0.0549 - acc: 0.9890 - val_loss: 0.9980 - val_acc: 0.7140
Epoch 30/30
100/100 [==============================] - 8s 83ms/step - loss: 0.0452 - acc: 0.9895 - val_loss: 1.1059 - val_acc: 0.7180

6. Saving the model

In [9]:
 
model.save('cats_and_dogs_small_1.h5') 

7. Plotting the loss and accuracy curves from training

In [10]:
 
acc = history.history['acc'] 
val_acc = history.history['val_acc'] 
loss = history.history['loss'] 
val_loss = history.history['val_loss'] 

epochs = range(1, len(acc) + 1) 

plt.plot(epochs, acc, 'bo', label='Training acc') 
plt.plot(epochs, val_acc, 'b', label='Validation acc') 
plt.title('Training and validation accuracy') 
plt.legend() 

plt.figure() 

plt.plot(epochs, loss, 'bo', label='Training loss') 
plt.plot(epochs, val_loss, 'b', label='Validation loss') 
plt.title('Training and validation loss') 
plt.legend() 

plt.show()
Notes on "Deep Learning with Python" --- 5.2-3 Cats vs. Dogs Classification (Basic Model)