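Deep Learning 7_UFLDL Deep Learning Tutorial: Self-Taught Learning_Exercise (Stanford University Deep Learning Tutorial)

Environment: Windows 7, MATLAB 2015b, 16 GB RAM, 2 TB hard disk

Exercise content and steps: Exercise: Self-Taught Learning. The details are as follows: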

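First, train a sparse autoencoder on the 29404 unlabeled examples unlabeledData (the digits 5-9 from the MNIST handwritten-digit dataset) to obtain its weight parameters opttheta. The purpose of this step is to learn features from the data. We do not know in advance which features are learned (the visualization shows them; call them Features), but we do know that the extracted features are exactly the hidden-layer activations (the layer-2 activations) of the trained sparse autoencoder. Note: every sparse autoencoder in this exercise is trained with the L-BFGS algorithm.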

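Second, pass the 15298 labeled examples trainData (the first half of the digits 0-4 in MNIST) as the training set through the trained sparse autoencoder (the one with weight parameters opttheta). This extracts the same features as in the previous step; call the resulting representation trainFeatures. It is again just the hidden-layer activations. If this is still unclear, here is an analogy: suppose the previous step extracted the first-order cumulant of a communication signal A (corresponding to unlabeledData); this step then extracts the first-order cumulant of a communication signal B (corresponding to trainData). The feature is the same, only the data it is computed on differs. Likewise, unlabeledData and trainData are both mapped to the same Features, just for different data.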

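Note: if unlabeledData was preprocessed in the previous step, be sure to save all of the preprocessing parameters (for example, the principal-component matrix U from PCA), because the training set trainData in this step and the test set testData in the next step must go through exactly the same preprocessing. This exercise uses the MNIST handwritten-digit dataset, which is already preprocessed, so no further preprocessing is needed.

For details see: http://ufldl.stanford.edu/wiki/index.php/%E8%87%AA%E6%88%91%E5%AD%A6%E4%B9%A0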

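Third, pass the 15298 labeled examples testData (the second half of the digits 0-4 in MNIST) as the test set through the same trained sparse autoencoder (the one with weight parameters opttheta). This again extracts the same features; call the resulting representation testFeatures. It is also just the hidden-layer activations.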

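Fourth, use the features trainFeatures extracted in the second step, together with the labels trainLabels of trainData, as input to train a softmax classifier. This yields the regression model softmaxModel.

Fifth, feed the features testFeatures extracted in the third step into the trained softmax regression model softmaxModel to predict the classes pred of testData, then compare pred with the true labels testLabels of testData to obtain the accuracy.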
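The bookkeeping in the steps above (splitting by label, shifting labels to the range 1-5, and scoring predictions) can be sketched on a small made-up label vector. The vector toyLabels and the predictions pred are invented for illustration; only the indexing pattern mirrors the exercise.

```matlab
% Toy labels standing in for mnistLabels (hypothetical values, column vector).
toyLabels = [0 5 1 9 4 2 7 3 0 6 1 8]';

% Digits 5-9 play the role of the unlabeled set, digits 0-4 the labeled set.
unlabeledSet = find(toyLabels >= 5);
labeledSet   = find(toyLabels >= 0 & toyLabels <= 4);

% First half of the labeled examples for training, second half for testing.
numTrain = round(numel(labeledSet) / 2);
trainSet = labeledSet(1:numTrain);
testSet  = labeledSet(numTrain+1:end);

% Shift labels from 0-4 to 1-5, because MATLAB indices start at 1.
testLabels = toyLabels(testSet)' + 1;

% Pretend predictions: copy the true labels and corrupt the first one.
pred = testLabels;
pred(1) = mod(pred(1), 5) + 1;   % always differs from the true label

% Accuracy is the fraction of predictions that match the labels.
accuracy = mean(pred(:) == testLabels(:));
fprintf('Test Accuracy: %f%%\n', 100 * accuracy);
```

Comparing pred(:) with testLabels(:) forces both into column vectors, so the comparison works whether the classifier returns a row or a column.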

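In short, self-taught learning uses unlabeled data and unsupervised learning to learn a feature extractor, then trains a classifier with supervised learning on the extracted features.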
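The preprocessing note in the steps above can be sketched as follows, assuming purely for illustration that the preprocessing is mean removal followed by PCA. The random matrices stand in for unlabeledData and trainData, and the names avg, U and k are hypothetical. The point is only that avg and U are computed once on the unlabeled data and then reused unchanged.

```matlab
% Random stand-ins for the real data: one example per column.
rng(0);                          % fixed seed so the sketch is repeatable
unlabeledData = rand(6, 50);     % 6-dimensional inputs, 50 unlabeled examples
trainData     = rand(6, 20);     % 20 labeled examples

% Learn the preprocessing parameters on the unlabeled data only.
avg = mean(unlabeledData, 2);                       % per-dimension mean
centered = bsxfun(@minus, unlabeledData, avg);
sigma = centered * centered' / size(centered, 2);   % covariance matrix
[U, ~, ~] = svd(sigma);                             % columns of U: principal directions

% Keep the first k components (k is an arbitrary choice here).
k = 3;
unlabeledReduced = U(:, 1:k)' * centered;

% Reuse the SAME avg and U for the labeled data; do not recompute them.
trainReduced = U(:, 1:k)' * bsxfun(@minus, trainData, avg);
```

Recomputing avg or U on trainData or testData would map those sets into a different feature space than the one the autoencoder was trained on.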

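Where this method applies:

The method is likely to be most effective when a large amount of unlabeled data and only a small amount of labeled data are available. Even when only labeled data is available (in which case the class labels are usually ignored during feature learning), the same idea can still give good results.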

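Some MATLAB functions

numel: number of elements.

n = numel(A) returns the total number of elements in the array A.

s = size(A): with a single output argument, size returns a row vector whose first element is the number of rows and whose second element is the number of columns.

[r,c] = size(A): with two output arguments, size returns the number of rows in the first output variable and the number of columns in the second.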
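A short check of numel and size on a small array (the array A is arbitrary):

```matlab
A = zeros(3, 4);        % a 3-by-4 array

n = numel(A);           % total number of elements: 12
s = size(A);            % row vector [3 4]
[r, c] = size(A);       % r = 3, c = 4

% size(A, 2) returns only the second dimension; the exercise script uses
% this form to count examples stored as columns.
numExamples = size(A, 2);   % 4
```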

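round(n) rounds n to the nearest integer, the same rounding taught in school mathematics; ties (a fractional part of exactly 0.5) are rounded away from zero.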
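A few values make the tie-breaking concrete. The last line mirrors how the exercise script computes numTrain = round(numel(labeledSet)/2); the count 7 is made up.

```matlab
a = round(2.4);     % 2
b = round(2.5);     % 3  (tie, rounded away from zero)
c = round(-2.5);    % -3 (tie, rounded away from zero)

% Splitting 7 labeled examples "in half": round(3.5) gives 4 training
% examples, leaving 3 for testing.
numTrain = round(7 / 2);    % 4
```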

find

Find the indices and values of nonzero elements.

Syntax:

1. ind = find(X)

2. ind = find(X, k)

3. ind = find(X, k, 'first')

4. ind = find(X, k, 'last')

5. [row,col] = find(X, ...)

6. [row,col,v] = find(X, ...)

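Description:

1. ind = find(X)

Finds all nonzero elements of X and returns their linear indices (column-major order) in the vector ind.

If X is a row vector, ind is a row vector; otherwise ind is a column vector.

If X contains no nonzero elements or is empty, ind is empty.

2. ind = find(X, k) or 3. ind = find(X, k, 'first')

Returns at most the first k indices corresponding to nonzero elements of X.

k must be a positive integer, but it can be of any numeric data type.

4. ind = find(X, k, 'last')

Returns at most the last k indices corresponding to nonzero elements of X.

5. [row,col] = find(X, ...)

Returns the row and column indices of the nonzero elements of X.

This syntax is especially useful for sparse matrices.

If X is an N-dimensional array (N > 2), col contains linear indices over the columns, with all trailing dimensions combined.

For example, for a 5*7*3 array X with a single nonzero element X(4,2,3), find returns row=4 and col=16. That is, (7 columns on page 1) + (7 columns on page 2) + (2 columns on page 3) = 16.

6. [row,col,v] = find(X, ...)

Returns a column or row vector v of the nonzero elements of X, together with their row and column indices.

If X is a logical expression, v is a logical array.

In that case the output v contains the nonzero elements of the logical array obtained by evaluating the expression X.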

For example,

A= magic(4)
A =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1

[r,c,v]= find(A>10);

r', c', v'
ans =
1 2 4 4 1 3 (column order)
ans =
1 2 2 3 4 4 (column order)
ans =
1 1 1 1 1 1

Here the returned vector v is a logical array containing N nonzero elements (all equal to 1), where N = nnz(A>10) = 6.
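The 5*7*3 case described under syntax 5 can be checked directly. The call to ind2sub is an addition for illustration: it splits the combined column index back into a column and a page.

```matlab
X = zeros(5, 7, 3);
X(4, 2, 3) = 1;             % the only nonzero element

[row, col] = find(X);       % row = 4, col = 16

% col counts columns across pages: 7 (page 1) + 7 (page 2) + 2 (page 3).
% ind2sub on the trailing dimensions recovers the column and the page.
[j, page] = ind2sub([7 3], col);    % j = 2, page = 3
```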

Examples:

Example 1

X = [1 0 4 -3 0 0 0 8 6];
indices = find(X)

returns the linear indices of the nonzero elements of X.

indices =
1 3 4 8 9

Example 2

X can also be given as a logical expression. For example,

find(X > 2)

returns the linear indices of the elements of X that are greater than 2.

ans =
3 8 9
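The k and 'last' forms from the syntax list can be tried on the same vector X. This is a small sketch; the variable names are made up.

```matlab
X = [1 0 4 -3 0 0 0 8 6];

firstTwo = find(X, 2);              % indices of the first two nonzeros: 1 and 3
lastTwo  = find(X, 2, 'last');      % indices of the last two nonzeros: 8 and 9

% Asking for more indices than there are nonzeros just returns them all.
allIdx = find(X, 100);              % 1 3 4 8 9

% Logical indexing often replaces find when only the values are needed.
bigValues = X(X > 2);               % 4 8 6
```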

unique:

unique returns the distinct elements of a vector, sorted in ascending order.
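The exercise script combines unique with numel to count the classes. A minimal sketch with made-up labels:

```matlab
labels = [3 1 2 3 5 1 4 2];

u = unique(labels);             % [1 2 3 4 5]
numClasses = numel(u);          % 5
```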

Results

The visualization of W1 from the weight parameters opttheta, that is, the visualization of the learned features, is shown below:

[Figure: visualization of the W1 features learned by the sparse autoencoder]

Test Accuracy: 98.333115%

Elapsed time is 594.435594 seconds.

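Summary of results:

1. Why did Andrew Ng's team need 25 minutes for training, while my entire run took less than 10 minutes (about 594 seconds)? Probably because computers a few years ago were far less powerful than today's.

2. For comparison, Andrew Ng's team ran an experiment: when the softmax classifier is trained on the raw pixel values (the original data) instead of the features extracted by the sparse autoencoder, the accuracy reaches at most 96%. In fact, the only difference between this exercise and the previous one, Deep Learning 6: Softmax Regression_Exercise (Stanford UFLDL deep learning tutorial), is that here the softmax classifier is trained on features extracted by the sparse autoencoder, whereas there it was trained on the raw data. In the previous exercise our accuracy was actually only 92.640%, though Andrew Ng's team may well have reached up to 96%.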

Code

stlExercise.m

%% CS294A/CS294W Self-taught Learning Exercise

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  self-taught learning. You will need to complete code in feedForwardAutoencoder.m
%  You will also need to have implemented sparseAutoencoderCost.m and 
%  softmaxCost.m from previous exercises.
%
%% ======================================================================
%  STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to 
%  change the parameters below.
tic
inputSize  = 28 * 28;
numLabels  = 5;
hiddenSize = 200;
sparsityParam = 0.1; % desired average activation of the hidden units.
                     % (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",
                     %  in the lecture notes). 
lambda = 3e-3;       % weight decay parameter       
beta = 3;            % weight of sparsity penalty term   
maxIter = 400;

%% ======================================================================
%  STEP 1: Load data from the MNIST database
%
%  This loads our training and test data from the MNIST database files.
%  We have sorted the data for you in this so that you will not have to
%  change it.

% Load MNIST database files
mnistData   = loadMNISTImages('train-images.idx3-ubyte');
mnistLabels = loadMNISTLabels('train-labels.idx1-ubyte');

% Set Unlabeled Set (All Images)

% Simulate a Labeled and Unlabeled set
labeledSet   = find(mnistLabels >= 0 & mnistLabels <= 4); % indices of the examples labeled 0 through 4
unlabeledSet = find(mnistLabels >= 5);

numTrain = round(numel(labeledSet)/2);
trainSet = labeledSet(1:numTrain);
testSet  = labeledSet(numTrain+1:end);

unlabeledData = mnistData(:, unlabeledSet); % unlabeled data set (digits 5-9)

trainData   = mnistData(:, trainSet); % first half of the digits 0-4: labeled training data
trainLabels = mnistLabels(trainSet)' + 1; % Shift Labels to the Range 1-5

testData   = mnistData(:, testSet); % second half of the digits 0-4: labeled test data
testLabels = mnistLabels(testSet)' + 1;   % Shift Labels to the Range 1-5

% Output Some Statistics
fprintf('# examples in unlabeled set: %d\n', size(unlabeledData, 2));
fprintf('# examples in supervised training set: %d\n\n', size(trainData, 2));
fprintf('# examples in supervised testing set: %d\n\n', size(testData, 2));

%% ======================================================================
%  STEP 2: Train the sparse autoencoder
%  This trains the sparse autoencoder on the unlabeled training
%  images. 

%  Randomly initialize the parameters theta (drawn from a uniform distribution)
theta = initializeParameters(hiddenSize, inputSize);

%% ----------------- YOUR CODE HERE ----------------------
%  Find opttheta by running the sparse autoencoder on
%  unlabeledTrainingImages
%  Train the sparse autoencoder on the unlabeled data set using L-BFGS

opttheta = theta; 

addpath minFunc/
options.Method = 'lbfgs';
options.maxIter = maxIter;   % maxIter = 400, set in STEP 0
options.display = 'on';
[opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
      inputSize, hiddenSize, ...
      lambda, sparsityParam, ...
      beta, unlabeledData), ...
      theta, options);


%% -----------------------------------------------------
                          
% Visualize weights
W1 = reshape(opttheta(1:hiddenSize * inputSize), hiddenSize, inputSize);
display_network(W1');

%%======================================================================
%% STEP 3: Extract Features from the Supervised Dataset
%  
%  You need to complete the code in feedForwardAutoencoder.m so that the 
%  following command will extract features from the data.

trainFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, ...
                                       trainData);

testFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, ...
                                       testData);

%%======================================================================
%% STEP 4: Train the softmax classifier

softmaxModel = struct;  
%% ----------------- YOUR CODE HERE ----------------------
%  Use softmaxTrain.m from the previous exercise to train a multi-class
%  classifier. 
%  Train the softmax regression model with L-BFGS on the features extracted
%  from the labeled training set and their labels.

%  Use lambda = 1e-4 for the weight regularization for softmax
lambda = 1e-4;
inputSize = hiddenSize;
numClasses = numel(unique(trainLabels)); % unique returns the distinct labels, sorted
% You need to compute softmaxModel using softmaxTrain on trainFeatures and
% trainLabels

options.maxIter = 100; % maximum number of iterations
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            trainFeatures, trainLabels, options);





%% -----------------------------------------------------


%%======================================================================
%% STEP 5: Testing 

%% ----------------- YOUR CODE HERE ----------------------
% Compute Predictions on the test set (testFeatures) using softmaxPredict
% and softmaxModel

[pred] = softmaxPredict(softmaxModel, testFeatures);



%% -----------------------------------------------------

% Classification Score
fprintf('Test Accuracy: %f%%\n', 100*mean(pred(:) == testLabels(:)));
toc
% (note that we shift the labels by 1, so that digit 0 now corresponds to
%  label 1)
%
% Accuracy is the proportion of correctly classified images
% The results for our implementation was:
%
% Accuracy: 98.3%
%
%

feedForwardAutoencoder.m

function [activation] = feedForwardAutoencoder(theta, hiddenSize, visibleSize, data)

% theta: trained weights from the autoencoder
% visibleSize: the number of input units (28*28 = 784 in this exercise)
% hiddenSize: the number of hidden units (200 in this exercise)
% data: Our matrix containing the training data as columns.  So, data(:,i) is the i-th training example.

% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
% follows the notation convention of the lecture notes.

W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the activation of the hidden layer for the Sparse Autoencoder.

activation  = sigmoid(W1*data+repmat(b1,[1,size(data,2)]));
%-------------------------------------------------------------------

end

%-------------------------------------------------------------------
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients.  This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).

function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
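
The slicing of theta in feedForwardAutoencoder.m assumes the parameter layout [W1(:); W2(:); b1; b2], which is what its index arithmetic implies. The sketch below rebuilds that layout with small made-up sizes and checks that the same indices recover W1 and b1. All sizes and values are illustrative, and bsxfun is used in place of repmat only as an alternative way to add the bias.

```matlab
hiddenSize  = 3;
visibleSize = 4;

% Made-up parameters with recognizable values.
W1 = reshape(1:hiddenSize*visibleSize, hiddenSize, visibleSize) / 10;
W2 = zeros(visibleSize, hiddenSize);
b1 = [0.1; 0.2; 0.3];
b2 = zeros(visibleSize, 1);

% Pack everything into one column vector, in the order the slicing assumes.
theta = [W1(:); W2(:); b1; b2];

% Unpack with the same index arithmetic as feedForwardAutoencoder.m.
W1u = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
b1u = theta(2*hiddenSize*visibleSize+1 : 2*hiddenSize*visibleSize+hiddenSize);

% Hidden-layer activations for two examples stored as columns.
data = [0 1; 0 1; 0 1; 0 1];
sigmoid = @(z) 1 ./ (1 + exp(-z));
activation = sigmoid(bsxfun(@plus, W1u * data, b1u));   % hiddenSize-by-2
```

For the all-zero first example the activation reduces to sigmoid(b1), which is a quick way to confirm that the bias was sliced from the right offset.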