TensorFlow构建卷积神经网络／模型保存与加载／正则化

TensorFlow

官方文档：https://www.tensorflow.org/api_guides/python/math_ops

# Arithmetic Operators
import tensorflow as tf

# 用 tf.session.run() 里 feed_dict 参数设置占位 tensor, 如果传入 feed_dict的数据与 tensor 类型不符，就无法被正确处理
x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Test String', y: 123, z: 45.67})

# 数学运算，类型转换
tf.subtract(tf.cast(tf.constant(2.0), tf.int32), tf.constant(1))   # 1

# 线性分类函数
# tf.Variable 类创建一个 tensor，它的初始值可以被改变，就像普通的 Python 变量一样。tensor 把它的状态存在 session里，所以你必须手动初始化它的状态。你用tf.global_variables_initializer() 来初始化所有可变 tensors。
# tf.global_variables_initializer() 会返回一个操作，它会从graph中初始化所有的 TensorFlow 变量。你可以通过 session 来呼叫这个操作来初始化所有上面的变量
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    
# 从正态分布中选择权重可以避免任意一个权重与其他权重相比有压倒性的特性
# tf.truncated_normal()函数从一个正态分布中产生随机数，
# tf.truncated_normal() 返回一个 tensor，它的随机值取自一个正态分布，并且它们的取值会在这个正态分布平均值的两个标准差之内。
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))

# 因为权重已经被随机化来帮助模型不被卡住，你不需要再把偏差随机化了。让我们简单地把偏差设为 0。
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))
# 变量的统一初始化：
sess.run(tf.global_variables_initializer())

# 矩阵乘法：
tf.matmul(input, w)

# TensorFlow Softmax
# x = tf.nn.softmax([2.0, 1.0, 0.2])
output = None
logit_data = [2.0, 1.0, 0.1]
logits = tf.placeholder(tf.float32)
with tf.Session() as sess:
    output = sess.run(tf.nn.softmax(logits), feed_dict={logits:logit_data})

# TensorFlow 中的交叉熵（Cross Entropy）
# tf.reduce_sum() 函数输入一个序列，返回他们的和
# tf.reduce_mean()计算序列均值
# tf.log() 返回所输入值的自然对数
softmax_data = [0.7, 0.2, 0.1]
one_hot_data = [1.0, 0.0, 0.0]

softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)

cross_entropy = -tf.reduce_sum(tf.multiply(softmax, one_hot))
# TODO: Print cross entropy from session
with tf.Session() as session:
    output = session.run(cross_entropy, feed_dict={softmax: softmax_data, one_hot: one_hot_data})
    print(output)

TensorFlow Mini-batching

# 有时候不可能把数据完全分割成相同数量的 batch。例如有 1000 个数据点，你想每个 batch 有 128 个数据。但是1000 无法被 128 整除。你得到的结果是 7 batch，每个128个数据点，一个 batch 有 104个数据点。(7*128 + 1*104 = 1000)

# batch里面的数据点数量会不同的情况下，你需要利用 TensorFlow 的tf.placeholder() 函数来接收这些不同的 batch
# 如果每个样本有n_input = 784特征，n_classes = 10个可能的标签，features的维度应该是[None, n_input]，labels的维度是 [None, n_classes]

# Features and Labels
# None 维度在这里是一个 batch size 的占位符。在运行时，TensorFlow 会接收任何大于 0 的 batch size
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

学习率和epochs

学习率过高，在相同的epochs条件下准确率会过早的停止改进，导致最终准确率会低
降低学习率需要更多的 epochs，但是可以最终得到更好的准确率

隐藏单元数目的选择：

文章：https://www.quora.com/How-do-I-decide-the-number-of-nodes-in-a-hidden-layer-of-a-neural-network-I-will-be-using-a-three-layer-model

The number of hidden nodes you should have is based on a complex relationship between:
1. Number of input and output nodes
2. Amount of training data available
3. Complexity of the function that is trying to be learned
4. The training algorithm
Too few nodes will lead to high error for your system as the predictive factors might be too complex for a small number of nodes to capture
Too many nodes will overfit to your training data and not generalize well
some general advice: How many hidden units should I use?(详细)
http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-10.html
1. The number of hidden nodes in each layer should be somewhere between the size of the input and output layer, potentially the mean.
2. The number of hidden nodes shouldn't need to exceed twice the number of input nodes, as you are probably grossly overfitting at this point.

对隐藏层单元个数的选择主要是要满足：能让网络准确预测，使其具有泛化能力，但不会过拟合。
隐藏单元数目一般不超过输入单元的2倍，过高可能引起overfitting，而且开销很高，但是也不能过小，过小的话会影响到模型的拟合效果

学习率的选择
1. 对学习速率的选择满足：能使网络成功收敛，且仍然具有较高的时间效率
2. 学习速率self.lr的经验公式是： self.lr / n_records 趋近于 0.01，可以参考经验公式尝试不同的值,如batch=128，则选择lr=1左右
3. 学习速率的优化注意要与其他2个超参数配合调整，特别是迭代次数，单个参数最优不一定最后结果最优，需要3个参数配合调整，以达到最优结果
TensorFlow Examples:
https://github.com/aymericdamien/TensorFlow-Examples
TensorFlow中构建深度神经网络

# 一个demo
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

import tensorflow as tf

# 参数 Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # 如果没有足够内存，可以降低 batch size
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

n_hidden_layer = 256 # layer number of features 特征的层数

# Store layers weight & bias
# 层权重和偏置项的储存
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])

# Hidden layer with RELU activation
# ReLU作为隐藏层激活函数
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']),\
    biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
# 输出层的线性激活函数
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])

# Define loss and optimizer
# 定义误差值和优化器
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Initializing the variables
# 初始化变量
init = tf.global_variables_initializer()

# Launch the graph
# 启动图
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    # 训练循环
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        # 遍历所有 batch
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            # 运行优化器进行反向传导、计算 cost（获取 loss 值）
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
            
# 隐藏层单元数：隐藏层宽度
# 隐藏层层数：隐藏层深度

保存和读取 TensorFlow 模型

保存变量

# 保存变量
# weights 和 bias Tensors 用 tf.truncated_normal() 函数设定了随机值。用 tf.train.Saver.save() 函数把这些值被保存在save_file 位置，命名为 "model.ckpt"，（".ckpt" 扩展名表示"checkpoint"）。

import tensorflow as tf

# The file path to save the data
# 文件保存路径
save_file = './model.ckpt'

# Two Tensor Variables: weights and bias
# 两个 Tensor 变量：权重和偏置项
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
# 用来存取 Tensor 变量的类
saver = tf.train.Saver()

with tf.Session() as sess:
    # Initialize all the Variables
    # 初始化所有变量
    sess.run(tf.global_variables_initializer())

    # Show the values of weights and bias
   # 显示变量和权重
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

    # Save the model
    # 保存模型
    saver.save(sess, save_file)

加载变量

# 加载变量
# 注意，你依然需要在 Python 中创建 weights 和 bias Tensors。tf.train.Saver.restore() 函数把之前保存的数据加载到 weights 和 bias 当中。
# 因为 tf.train.Saver.restore() 设定了 TensorFlow 变量，这里你不需要调用 tf.global_variables_initializer()了。
# Remove the previous weights and bias
# 移除之前的权重和偏置项
tf.reset_default_graph()

# Two Variables: weights and bias
# 两个变量：权重和偏置项
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
# 用来存取 Tensor 变量的类
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias
    # 加载权重和偏置项
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    # 显示权重和偏置项
    print('Weight:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

保存一个训练好的模型

# 保存一个训练好的模型
# Remove previous Tensors and Operations
# 移除之前的  Tensors 和运算
tf.reset_default_graph()

from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

learning_rate = 0.001
n_input = 784  # MNIST 数据输入 (图片尺寸: 28*28)
n_classes = 10  # MNIST 总计类别 (数字 0-9)

# Import MNIST data
# 加载 MNIST 数据
mnist = input_data.read_data_sets('.', one_hot=True)

# Features and Labels
# 特征和标签
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
# 权重和偏置项
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
# 定义损失函数和优化器
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Calculate accuracy
# 计算准确率
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# 训练模型并保存权重：
save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100

saver = tf.train.Saver()

# Launch the graph
# 启动图
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    # 训练循环
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over all batches
        # 遍历所有 batch
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})

        # Print status for every 10 epochs
        # 每运行10个 epoch 打印一次状态
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))

    # Save the model
    # 保存模型
    saver.save(sess, save_file)
    print('Trained Model Saved.')

加载训练好的模型

# 加载训练好的模型
saver = tf.train.Saver()

# Launch the graph
# 加载图
with tf.Session() as sess:
    saver.restore(sess, save_file)

    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})

print('Test Accuracy: {}'.format(test_accuracy))

把权重和偏置项加载到新模型中

# 很多时候你想调整，或者说“微调”一个你已经训练并保存了的模型。但是，把保存的变量直接加载到已经修改过的模型会产生错误
# TensorFlow 对 Tensor 和计算使用一个叫 name 的字符串辨识器，如果没有定义 name，TensorFlow 会自动创建一个.容易出现命名错误

# 手动设定name属性：
import tensorflow as tf

tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
# 两个 Tensor 变量：权重和偏置项
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
# 打印权重和偏置项的名称
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
# 移除之前的权重和偏置项
tf.reset_default_graph()

# Two Variables: weights and bias
# 两个变量：权重和偏置项
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
weights = tf.Variable(tf.truncated_normal([2, 3]) ,name='weights_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
# 打印权重和偏置项的名称
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - No Error
    # 加载权重和偏置项 - 没有报错
    saver.restore(sess, save_file)

print('Loaded Weights and Bias successfully.')

Save Weights: weights_0:0

Save Bias: bias_0:0

Load Weights: weights_0:0

Load Bias: bias_0:0

Loaded Weights and Bias successfully.

TensorFlow Dropout

# Dropout 是一个降低过拟合的正则化技术。它在网络中暂时的丢弃一些单元（神经元），以及与它们的前后相连的所有节点
# TensorFlow 提供了一个 tf.nn.dropout() 函数，你可以用来实现 dropout

keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

# tf.nn.dropout()函数有两个参数：
# hidden_layer：你要应用 dropout 的 tensor
# keep_prob：任何一个给定单元的留存率（没有被丢弃的单元）

# keep_prob 可以让你调整丢弃单元的数量。为了补偿被丢弃的单元，tf.nn.dropout() 把所有保留下来的单元（没有被丢弃的单元）* 1/keep_prob
# 在训练时，一个好的keep_prob初始值是0.5。

# 在测试时，把 keep_prob 值设为1.0 ，这样保留所有的单元，最大化模型的能力

一个demo：

keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

...

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i in range(batches):
            ....

            sess.run(optimizer, feed_dict={
                features: batch_features,
                labels: batch_labels,
                keep_prob: 0.5})

    validation_accuracy = sess.run(accuracy, feed_dict={
        features: test_features,
        labels: test_labels,
        keep_prob: 1.0})

TensorFlow卷积神经网络

设置
H = height, W = width, D = depth

我们有一个输入维度是 32x32x3 (HxWxD)
20个维度为 8x8x3 (HxWxD) 的滤波器
高和宽的stride（步长）都为 2。(S)
padding 大小为1 (P)
计算新的高度和宽度的公式是：

new_height = (input_height - filter_height + 2 * P)/S + 1

new_width = (input_width - filter_width + 2 * P)/S + 1

输出层的大小： 14x14x20

input = tf.placeholder(tf.float32, (None, 32, 32, 3))
filter_weights = tf.Variable(tf.truncated_normal((8, 8, 3, 20))) # (height, width, input_depth, output_depth)
filter_bias = tf.Variable(tf.zeros(20))
strides = [1, 2, 2, 1] # (batch, height, width, depth)
padding = 'VALID'
conv = tf.nn.conv2d(input, filter_weights, strides, padding) + filter_bias


# TensorFlow 使用如下等式计算 SAME 、PADDING

# SAME Padding， 输出的高和宽，计算如下：

out_height = ceil(float(in_height) / float(strides1))

out_width = ceil(float(in_width) / float(strides[2]))

# VALID Padding， 输出的高和宽，计算如下：

out_height = ceil(float(in_height - filter_height + 1) / float(strides1))

out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))

# ceil：返回大于或者等于指定表达式的最小整数

TensorFlow 卷积层实现

# TensorFlow 提供了 tf.nn.conv2d() 和 tf.nn.bias_add() 函数来创建你自己的卷积层

# Output depth
k_output = 64

# Image Properties
image_width = 10
image_height = 10
color_channels = 3

# Convolution filter
filter_size_width = 5
filter_size_height = 5

# Input/Image
input = tf.placeholder(
    tf.float32,
    shape=[None, image_height, image_width, color_channels])

# Weight and bias
weight = tf.Variable(tf.truncated_normal(
    [filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))

# Apply Convolution
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)

# 上述代码用了 tf.nn.conv2d() 函数来计算卷积，weights 作为滤波器，[1, 2, 2, 1] 作为 strides。
# TensorFlow 对每一个 input 维度使用一个单独的 stride 参数，[batch, input_height, input_width, input_channels]。我们通常把 batch 和 input_channels （strides 序列中的第一个第四个）的 stride 设为 1
# input_height 和 input_width strides 表示滤波器在input 上移动的步长

# tf.nn.bias_add() 函数对矩阵的最后一维加了偏置项

TensorFlow max pooling 最大池化

Decrease the size of the output and prevent overfitting. Reducing overfitting is a consequence of the reducing the output size, which in turn, reduces the number of parameters in future layers
池化作用：减小输出大小和降低过拟合。降低过拟合是减小输出大小的结果，它同样也减少了后续层中的参数的数量。

近期，池化层并不是很受青睐。部分原因是：

现在的数据集又大又复杂，我们更关心欠拟合问题。

Dropout 是一个更好的正则化方法。
池化导致信息损失。想想最大池化的例子，n 个数字中我们只保留最大的，把余下的 n-1 完全舍弃了。

# TensorFlow 提供了 tf.nn.max_pool() 函数，用于对卷积层实现 最大池化 

# tf.nn.max_pool() 函数实现最大池化时， ksize参数是滤波器大小，strides参数是步长。2x2 的滤波器配合 2x2 的步长是常用设定。

# ksize 和 strides 参数也被构建为四个元素的列表，每个元素对应 input tensor 的一个维度 ([batch, height, width, channels])，对 ksize 和 strides 来说，batch 和 channel 通常都设置成 1。

# 注意：池化层的输出深度与输入的深度相同。另外池化操作是分别应用到每一个深度切片层

conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
conv_layer = tf.nn.bias_add(conv_layer, bias)
conv_layer = tf.nn.relu(conv_layer)
# Apply Max Pooling
conv_layer = tf.nn.max_pool(
    conv_layer,
    ksize=[1, 2, 2, 1],
    strides=[1, 2, 2, 1],
    padding='SAME')

一个minist数据集上的三层卷积神经网络demo：

# 数据集
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

import tensorflow as tf

# Parameters
# 参数
learning_rate = 0.00001
epochs = 10
batch_size = 128

# Number of samples to calculate validation and accuracy
# Decrease this if you're running out of memory to calculate accuracy
# 用来验证和计算准确率的样本数
# 如果内存不够，可以调小这个数字
test_valid_size = 256

#####################################################
# Network Parameters
# 神经网络参数
n_classes = 10  # MNIST total classes (0-9 digits)
dropout = 0.75  # Dropout, probability to keep units

# weights and biases
# Store layers weight & bias
weights = {
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    'out': tf.Variable(tf.random_normal([1024, n_classes]))}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))}

##################################################### 
# 卷积 
def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)
# 最大池化
def maxpool2d(x, k=2):
    return tf.nn.max_pool(
        x,
        ksize=[1, k, k, 1],
        strides=[1, k, k, 1],
        padding='SAME')

#####################################################
# 模型
# 创建了 3 层来实现卷积，最大池化以及全链接层和输出层
def conv_net(x, weights, biases, dropout):
    # Layer 1 - 28*28*1 to 14*14*32
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # Layer 2 - 14*14*32 to 7*7*64
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer - 7*7*64 to 1024
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output Layer - class prediction - 1024 to 10
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
#####################################################  
# Session
# tf Graph input
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)

# Model
logits = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Accuracy
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf. global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(epochs):
        for batch in range(mnist.train.num_examples//batch_size):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={
                x: batch_x,
                y: batch_y,
                keep_prob: dropout})

            # Calculate batch loss and accuracy
            loss = sess.run(cost, feed_dict={
                x: batch_x,
                y: batch_y,
                keep_prob: 1.})
            valid_acc = sess.run(accuracy, feed_dict={
                x: mnist.validation.images[:test_valid_size],
                y: mnist.validation.labels[:test_valid_size],
                keep_prob: 1.})

            print('Epoch {:>2}, Batch {:>3} -'
                  'Loss: {:>10.4f} Validation Accuracy: {:.6f}'.format(
                epoch + 1,
                batch + 1,
                loss,
                valid_acc))

    # Calculate Test Accuracy
    test_acc = sess.run(accuracy, feed_dict={
        x: mnist.test.images[:test_valid_size],
        y: mnist.test.labels[:test_valid_size],
        keep_prob: 1.})
    print('Testing Accuracy: {}'.format(test_acc))