The original DCGAN paper: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.
This post is based on the DCGAN paper and on Chapter 4 of 《生成对抗网络入门指南》 (An Introduction to Generative Adversarial Networks).
I. Introduction to DCGAN
1. DCGAN design guidelines
The original authors set out to apply unsupervised learning to computer vision. DCGAN stands for Deep Convolutional GAN, and the paper lays down four design guidelines to keep GAN training from becoming unstable:
- Replace pooling layers with convolutions: strided convolutions in the discriminator and fractional-strided convolutions in the generator. For fractional-strided convolutions, see Transposed Convolution, Fractionally Strided Convolution or Deconvolution (a minimal Keras sketch follows this list).
- Remove fully-connected layers. Fully-connected layers carry too many parameters: once the network gets deep, convergence slows down and overfitting becomes likely. Some work replaces them with global average pooling, which is more stable but still hurts convergence speed. The paper's compromise is to connect the generator's random input directly to the first convolutional feature map, and to connect the discriminator's last convolutional features directly to its output.
- Use batch normalization. See the batch normalization paper for its benefits: it pushes the activations toward a fixed distribution, which makes training more effective.
- Use the right activation functions: ReLU in the generator (tanh for its last layer) and LeakyReLU in the discriminator.
Sigmoid is a very common choice for binary 0/1 classification: its output approaches 1 for large positive x and 0 for large negative x. It has two drawbacks: first, when x is very large or very small the gradient vanishes, which hurts learning during backpropagation; second, its output is not zero-centered, so training only ever receives all-positive or all-negative feedback.
Tanh has range (-1, 1), which solves sigmoid's zero-centering problem, but mathematically tanh is just a scaled and shifted sigmoid.
ReLU converges more easily than sigmoid and tanh under stochastic gradient descent. LeakyReLU improves on it by taking f(x) = ax for x < 0.
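The following is a minimal sketch, not part of the original post (and the MNIST generator later in this post actually uses UpSampling2D followed by Conv2D instead): a fractional-strided convolution can be written in Keras as a Conv2DTranspose, where strides=2 doubles the spatial resolution.
from keras.layers import Input, Conv2DTranspose
from keras.models import Model

# A 7x7x128 feature map is upsampled to 14x14x64 by a fractional-strided convolution
x = Input(shape=(7, 7, 128))
y = Conv2DTranspose(64, kernel_size=3, strides=2, padding="same")(x)
Model(x, y).summary()   # output shape: (None, 14, 14, 64)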
The paper runs experiments on three datasets: LSUN, Imagenet-1k, and a faces dataset.
2. DCGAN architecture
Below is the architecture of the generator G; the discriminator D is essentially the reverse of this pipeline (see also Is the deconvolution layer the same as a convolutional layer?).
The detailed architecture diagrams are at the end of this post.
Training details (sketched in Keras right after the references below):
① Scale the data to [-1, 1], which is also the output range of tanh.
② Train with mini-batch SGD, batch size 128.
③ Initialize all weights from a zero-mean Gaussian with standard deviation 0.02.
④ Use LeakyReLU with a slope of 0.2 in the leaky part.
⑤ Optimize with Adam, learning rate 0.0002, momentum term β1 = 0.5.
For details see Andrew Ng's course notes [coursera/ImprovingDL/week2] Optimization algorithms (summary & question) and the paper ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION.
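As a rough sketch of how these settings translate into Keras (the MNIST code below uses Adam(0.0002, 0.5) and LeakyReLU(0.2) exactly like this, but keeps Keras's default weight initializer rather than setting the Gaussian one explicitly):
from keras.initializers import RandomNormal
from keras.layers import Conv2D
from keras.layers.advanced_activations import LeakyReLU
from keras.optimizers import Adam

init = RandomNormal(mean=0.0, stddev=0.02)      # zero-mean Gaussian, std 0.02
conv = Conv2D(64, kernel_size=3, strides=2,     # strided conv instead of pooling
              padding="same", kernel_initializer=init)
act = LeakyReLU(alpha=0.2)                      # leaky slope 0.2
opt = Adam(lr=0.0002, beta_1=0.5)               # learning rate 0.0002, beta_1 = 0.5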
II. DCGAN engineering practice and code
We implement DCGAN with Keras, using the MNIST dataset as the running example.
Getting graphviz to work (needed for plot_model) is handled in a separate post; once the environment is configured, the complete code follows.
1. Import Keras and the visualization packages
from __future__ import print_function, division
from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout
from keras.layers import BatchNormalization, Activation, ZeroPadding2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
from keras.utils import plot_model
import matplotlib.pyplot as plt
import sys
import numpy as np
2. Design the DCGAN class
class DCGAN():
    def __init__(self):
        ...
    def build_generator(self):
        ...
    def build_discriminator(self):
        ...
    def train(self, epochs, batch_size=128, save_interval=50):
        ...
① Initialize everything according to the DCGAN guidelines described above
- Image-shape hyperparameters and optimizer hyperparameters
The optimizer is Adam with learning rate 0.0002 and momentum term β1 = 0.5.
class DCGAN():
    def __init__(self):
        # Input shape
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        self.latent_dim = 100

        optimizer = Adam(0.0002, 0.5)
- Build and compile the discriminator and the generator; this effectively defines a custom GAN model on top of Keras
        # Build and compile the discriminator
        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss='binary_crossentropy',
                                   optimizer=optimizer,
                                   metrics=['accuracy'])

        # Build the generator
        self.generator = self.build_generator()
- The noise vector z is the input; it is projected and reshaped into a 4-D tensor that serves as the starting point for the convolutional layers
        # The generator takes noise as input and generates imgs
        z = Input(shape=(100,))
        img = self.generator(z)

        # For the combined model we will only train the generator
        self.discriminator.trainable = False

        # The discriminator takes generated images as input and determines validity
        valid = self.discriminator(img)

        # The combined model (stacked generator and discriminator)
        # Trains the generator to fool the discriminator
        self.combined = Model(z, valid)
        self.combined.compile(loss='binary_crossentropy', optimizer=optimizer)
To train the generator, the discriminator is stacked on top of it and switched to non-trainable mode, so that only the generator's parameters are optimized. A small sanity check for this is sketched below.
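The following check is hypothetical and not part of the original post; it assumes the full DCGAN class defined in the rest of this section. Because the discriminator was compiled before trainable was set to False, it still learns when trained on its own, while one update of the combined model should leave its convolutional kernels untouched:
import numpy as np

dcgan = DCGAN()
z = np.random.normal(0, 1, (16, dcgan.latent_dim))

# First Conv2D kernel of the discriminator, before a generator-only update
w_before = dcgan.discriminator.get_weights()[0].copy()
dcgan.combined.train_on_batch(z, np.ones((16, 1)))
w_after = dcgan.discriminator.get_weights()[0]

print(np.allclose(w_before, w_after))   # expected: True, the kernel was not updated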
② Build the DCGAN generator according to the design rules (the generator architecture diagram is at the end of the post)
- Use upsampling + convolution layers instead of pooling layers
- No fully-connected layers in the middle
- Add batch normalization
- Use ReLU activations in the generator, with tanh for the output layer
    def build_generator(self):

        model = Sequential()

        model.add(Dense(128 * 7 * 7, activation="relu", input_dim=self.latent_dim))
        model.add(Reshape((7, 7, 128)))
        model.add(UpSampling2D())
        model.add(Conv2D(128, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))
        model.add(UpSampling2D())
        model.add(Conv2D(64, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))
        model.add(Conv2D(self.channels, kernel_size=3, padding="same"))
        model.add(Activation("tanh"))

        model.summary()
        plot_model(model, show_shapes=True, to_file='generator.png')

        noise = Input(shape=(self.latent_dim,))
        img = model(noise)

        return Model(noise, img)
③ Build the discriminator according to the DCGAN design (the discriminator architecture diagram is at the end of the post)
- Use stride-2 convolutions instead of pooling layers
- No fully-connected layers in the middle
- Add batch normalization
- Use LeakyReLU activations with a slope of 0.2
    def build_discriminator(self):

        model = Sequential()

        model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=self.img_shape, padding="same"))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
        model.add(ZeroPadding2D(padding=((0, 1), (0, 1))))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Flatten())
        model.add(Dense(1, activation='sigmoid'))

        model.summary()
        plot_model(model, show_shapes=True, to_file='discriminator.png')

        img = Input(shape=self.img_shape)
        validity = model(img)

        return Model(img, validity)
④ Training code:
- Load the data from MNIST
- Rescale the pixel values to [-1, 1]
- First train the discriminator on generated and real data, then train the generator through the frozen discriminator using random noise as input
    def train(self, epochs, batch_size=128, save_interval=50):

        # Load the dataset
        (X_train, _), (_, _) = mnist.load_data()

        # Rescale -1 to 1
        X_train = X_train / 127.5 - 1.
        X_train = np.expand_dims(X_train, axis=3)

        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        for epoch in range(epochs):

            # ---------------------
            #  Train Discriminator
            # ---------------------

            # Select a random half of images
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            imgs = X_train[idx]

            # Sample noise and generate a batch of new images
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))
            gen_imgs = self.generator.predict(noise)

            # Train the discriminator (real classified as ones and generated as zeros)
            d_loss_real = self.discriminator.train_on_batch(imgs, valid)
            d_loss_fake = self.discriminator.train_on_batch(gen_imgs, fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # ---------------------
            #  Train Generator
            # ---------------------

            # Train the generator (wants discriminator to mistake images as real)
            g_loss = self.combined.train_on_batch(noise, valid)

            # Plot the progress
            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss[0], 100 * d_loss[1], g_loss))

            # If at save interval => save generated image samples
            if epoch % save_interval == 0:
                self.save_imgs(epoch)
⑤ Save generated image samples during training (note that the images/ directory must already exist)
    def save_imgs(self, epoch):
        r, c = 5, 5
        noise = np.random.normal(0, 1, (r * c, self.latent_dim))
        gen_imgs = self.generator.predict(noise)

        # Rescale images 0 - 1
        gen_imgs = 0.5 * gen_imgs + 0.5

        fig, axs = plt.subplots(r, c)
        cnt = 0
        for i in range(r):
            for j in range(c):
                axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
                axs[i, j].axis('off')
                cnt += 1
        fig.savefig("images/mnist_%d.png" % epoch)
        plt.close()
3. Run
if __name__ == '__main__':
    dcgan = DCGAN()
    dcgan.train(epochs=4000, batch_size=32, save_interval=50)
0 [D loss: 1.223735, acc.: 39.06%] [G loss: 0.575408]
500 [D loss: 0.718215, acc.: 53.12%] [G loss: 0.950422]
1000 [D loss: 0.790420, acc.: 50.00%] [G loss: 1.018388]
2000 [D loss: 0.725934, acc.: 64.06%] [G loss: 0.844147]
3000 [D loss: 0.627288, acc.: 64.06%] [G loss: 1.180686]
3999 [D loss: 0.628420, acc.: 62.50%] [G loss: 1.007173]
Experiment results (generated samples):
III. Experiments and applications
1. Results on the three datasets:
The DCGAN authors ran experiments on the LSUN bedrooms dataset, a faces dataset, and Imagenet-1K; sample generations are shown below:
① LSUN
The generated bedrooms are a very clear improvement, achieved without using any data augmentation. The authors also argue that how fast a model learns is directly linked to the quality of its generated images (see Train faster, generalize better: Stability of stochastic gradient descent).
② Faces
③ Imagenet-1K
2. The latent space
- As the input z changes continuously, the output image morphs smoothly into a different scene.
The figure shows interpolations between 9 random values of z: from the sixth step onward, a room without windows gradually turns into a room with a large window, and around the tenth step a television slowly transforms into a window. A minimal interpolation sketch for our MNIST model follows.
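As a hedged sketch (assuming a dcgan instance trained with the code above), the same kind of walk can be reproduced by linearly interpolating between two latent vectors and decoding each intermediate point:
import numpy as np

# Two random endpoints in the latent space
z_start = np.random.normal(0, 1, (1, dcgan.latent_dim))
z_end = np.random.normal(0, 1, (1, dcgan.latent_dim))

# Linear interpolation path with 10 steps
alphas = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
z_path = (1 - alphas) * z_start + alphas * z_end

# Decode every intermediate point; images come out in [-1, 1]
interp_imgs = dcgan.generator.predict(z_path)
interp_imgs = 0.5 * interp_imgs + 0.5   # rescale to [0, 1] for plotting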
3. Learned features
- Just as a conventional CNN picks up many useful features during training, DCGAN's unsupervised training does too. This can be seen as a form of semantic analysis, and it suggests that DCGAN-style generation and discrimination could also be applied to textual (context) data. A sketch of reusing the discriminator's features is given below.
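In the paper, the discriminator's convolutional features are reused for classification (for example CIFAR-10 with an SVM on top). Here is a minimal sketch of exposing those features from the model above; it assumes the build_discriminator() structure in this post, and the layer indices may differ in other Keras versions:
import numpy as np
from keras.models import Model

# The discriminator is Model(img, validity) wrapping a Sequential; grab the Sequential
disc_body = dcgan.discriminator.layers[-1]

# New model from the first conv layer's input to the Flatten output,
# i.e. everything except the final sigmoid classifier
feature_extractor = Model(inputs=disc_body.layers[0].input,
                          outputs=disc_body.layers[-2].output)

# Example: features for a few generated images (any (n, 28, 28, 1) batch in [-1, 1] works)
imgs = dcgan.generator.predict(np.random.normal(0, 1, (8, dcgan.latent_dim)))
features = feature_extractor.predict(imgs)
print(features.shape)   # flattened convolutional features per image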
4. Word embeddings (future work)
The features learned in point 3 can be used the way word embeddings are (see [coursera/SequenceModels/week2] NLP & Word Embeddings):
- Doing the vector arithmetic directly in pixel space does not work well
- Instead, the latent vector can be used as the representation of an image, and addition and subtraction are done there
- The same approach also supports latent-space translations of images, as sketched after this list
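A hedged sketch of the mechanics (hypothetical, assuming a trained dcgan; MNIST has no labeled attributes, so the averaged "concept" vectors here are just random stand-ins, whereas the paper averages the z vectors of images sharing an attribute such as "smiling woman"):
import numpy as np

# Average a few z vectors per "concept", as the paper does for stability
z_a = np.random.normal(0, 1, (3, dcgan.latent_dim)).mean(axis=0)   # concept A
z_b = np.random.normal(0, 1, (3, dcgan.latent_dim)).mean(axis=0)   # concept B
z_c = np.random.normal(0, 1, (3, dcgan.latent_dim)).mean(axis=0)   # concept C

# "A - B + C" in latent space, then decode the result back into an image
z_result = (z_a - z_b + z_c).reshape(1, -1)
img = dcgan.generator.predict(z_result)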
In short, DCGAN has many potential applications ahead: frame prediction for video, feature pretraining for speech synthesis, and further study of the latent space are all interesting directions.
Complete code
from __future__ import print_function, division

from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout
from keras.layers import BatchNormalization, Activation, ZeroPadding2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
from keras.utils import plot_model

import matplotlib.pyplot as plt
import sys
import numpy as np


class DCGAN():
    def __init__(self):
        # Input shape
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        self.latent_dim = 100

        optimizer = Adam(0.0002, 0.5)

        # Build and compile the discriminator
        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss='binary_crossentropy',
                                   optimizer=optimizer,
                                   metrics=['accuracy'])

        # Build the generator
        self.generator = self.build_generator()

        # The generator takes noise as input and generates imgs
        z = Input(shape=(100,))
        img = self.generator(z)

        # For the combined model we will only train the generator
        self.discriminator.trainable = False

        # The discriminator takes generated images as input and determines validity
        valid = self.discriminator(img)

        # The combined model (stacked generator and discriminator)
        # Trains the generator to fool the discriminator
        self.combined = Model(z, valid)
        self.combined.compile(loss='binary_crossentropy', optimizer=optimizer)

    def build_generator(self):

        model = Sequential()

        model.add(Dense(128 * 7 * 7, activation="relu", input_dim=self.latent_dim))
        model.add(Reshape((7, 7, 128)))
        model.add(UpSampling2D())
        model.add(Conv2D(128, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))
        model.add(UpSampling2D())
        model.add(Conv2D(64, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))
        model.add(Conv2D(self.channels, kernel_size=3, padding="same"))
        model.add(Activation("tanh"))

        model.summary()
        plot_model(model, show_shapes=True, to_file='generator.png')

        noise = Input(shape=(self.latent_dim,))
        img = model(noise)

        return Model(noise, img)

    def build_discriminator(self):

        model = Sequential()

        model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=self.img_shape, padding="same"))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
        model.add(ZeroPadding2D(padding=((0, 1), (0, 1))))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Flatten())
        model.add(Dense(1, activation='sigmoid'))

        model.summary()
        plot_model(model, show_shapes=True, to_file='discriminator.png')

        img = Input(shape=self.img_shape)
        validity = model(img)

        return Model(img, validity)

    def train(self, epochs, batch_size=128, save_interval=50):

        # Load the dataset
        (X_train, _), (_, _) = mnist.load_data()

        # Rescale -1 to 1
        X_train = X_train / 127.5 - 1.
        X_train = np.expand_dims(X_train, axis=3)

        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        for epoch in range(epochs):

            # ---------------------
            #  Train Discriminator
            # ---------------------

            # Select a random half of images
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            imgs = X_train[idx]

            # Sample noise and generate a batch of new images
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))
            gen_imgs = self.generator.predict(noise)

            # Train the discriminator (real classified as ones and generated as zeros)
            d_loss_real = self.discriminator.train_on_batch(imgs, valid)
            d_loss_fake = self.discriminator.train_on_batch(gen_imgs, fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # ---------------------
            #  Train Generator
            # ---------------------

            # Train the generator (wants discriminator to mistake images as real)
            g_loss = self.combined.train_on_batch(noise, valid)

            # Plot the progress
            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss[0], 100 * d_loss[1], g_loss))

            # If at save interval => save generated image samples
            if epoch % save_interval == 0:
                self.save_imgs(epoch)

    def save_imgs(self, epoch):
        r, c = 5, 5
        noise = np.random.normal(0, 1, (r * c, self.latent_dim))
        gen_imgs = self.generator.predict(noise)

        # Rescale images 0 - 1
        gen_imgs = 0.5 * gen_imgs + 0.5

        fig, axs = plt.subplots(r, c)
        cnt = 0
        for i in range(r):
            for j in range(c):
                axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
                axs[i, j].axis('off')
                cnt += 1
        fig.savefig("images/mnist_%d.png" % epoch)
        plt.close()


if __name__ == '__main__':
    dcgan = DCGAN()
    dcgan.train(epochs=4000, batch_size=32, save_interval=50)
Discriminator architecture:
Generator architecture: