Deep Learning 13_UFLDL Tutorial: Independent Component Analysis_Exercise (Stanford Deep Learning Tutorial)

April 9, 2023, 11:50 PM • Deep Learning

Preface

Environment: Windows 7, MATLAB 2015b, 16 GB RAM, 2 TB hard disk drive

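Difficulty: the main difficulty in this exercise is the long running time. A single run takes almost a day, and because I also had to verify whether the various cost functions were correct, I ran it many times.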

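Task: Exercise: Independent Component Analysis. The dataset is Sampled 8x8 patches from the STL-10 dataset (stl10_patches_100k.zip), which contains 100,000 8×8 three-channel color patches extracted from the STL-10 dataset and was also the training set in Deep Learning 9_UFLDL Tutorial: linear decoder_exercise (Stanford Deep Learning Tutorial). From it, 20,000 patches are taken as the training set for this exercise (the code uses the first 20,000), the ICA algorithm is used to learn a linearly independent orthonormal basis for them, and the basis is visualized.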

Background:

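1. How does the orthonormal basis learned in this exercise compare with the "overcomplete" basis learned in Deep Learning 12_UFLDL Tutorial: Sparse Coding_exercise (Stanford Deep Learning Tutorial)? In other words, what is the point of this exercise?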

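① Differences: the basis learned here is orthonormal and therefore linearly independent (an overcomplete basis cannot be), and the mapping from x to the features is explicit: the features learned by ICA are simply Wx.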

② Similarity: both sparse coding and ICA require the learned features to be sparse.

2. Why the projection step amounts to unregularized ZCA whitening

From PCA and whitening, the complete ZCA whitening procedure is:

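$$x_{\mathrm{ZCAwhite}} = U\,\mathrm{diag}\!\left(\frac{1}{\sqrt{\lambda_i+\varepsilon}}\right)U^T x, \qquad \text{where } \Sigma = U\,\mathrm{diag}(\lambda_i)\,U^T \text{ is the data covariance}$$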

while the complete projection step is:

$$W \leftarrow (WW^T)^{-1/2}\,W$$

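Comparing the two formulas, the projection step can be viewed as ZCA whitening with ε = 0 (unregularized ZCA whitening) applied to W itself, with the columns of W playing the role of the data. The covariance matrix $\Sigma = U\,\mathrm{diag}(\lambda_i)\,U^T$ of ZCA whitening corresponds to $WW^T$ in the projection step, so the rotation U corresponds to the eigenvectors of $WW^T$ and the eigenvalues $\lambda_i$ to the eigenvalues of $WW^T$, because $(WW^T)^{-1/2} = U\,\mathrm{diag}(1/\sqrt{\lambda_i})\,U^T$ when $WW^T = U\,\mathrm{diag}(\lambda_i)\,U^T$.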
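To make the correspondence concrete, here is a minimal NumPy sketch (NumPy rather than MATLAB only so that it runs standalone; the function names `zca_whiten` and `project_orthonormal` are hypothetical, not from the exercise). It assumes the columns of the input are already mean-centred samples and, like the starter code, builds the covariance as XX^T without a 1/m factor. Applying unregularized ZCA whitening to W, with its columns treated as samples, gives exactly the projection (WW^T)^{-1/2} W.

```python
import numpy as np

def zca_whiten(X, eps=0.0):
    """ZCA-whiten the columns of X (one sample per column, assumed mean-centred).

    Like the starter code, the covariance is X @ X.T without a 1/m factor.
    """
    lam, U = np.linalg.eigh(X @ X.T)      # X @ X.T is symmetric, so eigh applies
    return U @ np.diag(1.0 / np.sqrt(lam + eps)) @ U.T @ X

def project_orthonormal(W):
    """Projection step W <- (W W^T)^(-1/2) W, so that W @ W.T = I."""
    lam, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(lam ** -0.5) @ U.T @ W

rng = np.random.default_rng(0)
W = rng.random((5, 8))                   # 5 features, 8-dimensional input (toy sizes)
P = project_orthonormal(W)
print(np.allclose(P @ P.T, np.eye(5)))   # True: the rows are orthonormal
print(np.allclose(zca_whiten(W), P))     # True: same as ZCA-whitening W with eps = 0
```

This also shows why ε must be exactly 0 here: with ε > 0 the result satisfies $PP^T = U\,\mathrm{diag}(\lambda_i/(\lambda_i+\varepsilon))\,U^T$, which is close to but not equal to I.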

3. The data must be ZCA-whitened without regularization (that is, with ε set to 0). Why? Here is my own understanding.

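As noted above, the projection step for the orthonormal basis W is itself unregularized ZCA whitening. The features are Wx, and the data expressed in terms of the features and the basis is W^T Wx (see the explanation below). For consistency, so that the features Wx are white as well, the original data x must also be ZCA-whitened without regularization: if $E[xx^T] = I$ and $WW^T = I$, then $E[(Wx)(Wx)^T] = W\,E[xx^T]\,W^T = WW^T = I$. With ε > 0, $E[xx^T]$ is only approximately I, so this identity no longer holds exactly.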
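A quick numerical check of this argument, as a NumPy sketch with toy sizes (n = 10 input dimensions, k = 6 features, m = 2000 samples, all assumptions chosen only for speed). In the starter code's unnormalized convention "white" means XX^T = I. After unregularized ZCA whitening, any W with orthonormal rows gives white features, while regularized whitening (here ε = 1, an arbitrary illustrative value) does not make the data exactly white.

```python
import numpy as np

def zca_whiten(X, eps=0.0):
    """Centre the rows of X, then ZCA-whiten (covariance X @ X.T, no 1/m factor)."""
    X = X - X.mean(axis=1, keepdims=True)
    lam, U = np.linalg.eigh(X @ X.T)
    return U @ np.diag(1.0 / np.sqrt(lam + eps)) @ U.T @ X

rng = np.random.default_rng(0)
n, k, m = 10, 6, 2000
X_raw = rng.standard_normal((n, n)) @ rng.standard_normal((n, m))  # correlated toy data

X = zca_whiten(X_raw)               # unregularized, eps = 0
X_reg = zca_whiten(X_raw, eps=1.0)  # regularized, for comparison

Q, _ = np.linalg.qr(rng.standard_normal((n, k)))  # n x k with orthonormal columns
W = Q.T                                           # k x n, so W @ W.T = I_k
F = W @ X                                         # the features Wx for all samples

print(np.allclose(X @ X.T, np.eye(n)))          # True: the data are white
print(np.allclose(F @ F.T, np.eye(k)))          # True: so are the features
print(np.allclose(X_reg @ X_reg.T, np.eye(n)))  # False: regularized data are not exactly white
```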

4. The cost function and its gradient

$$\min_W \; \|Wx\|_1 \quad \text{subject to} \quad WW^T = I$$

The orthonormality constraint WW^T = I is enforced by the projection step, so the cost function actually implemented is:

$$J(W) = \frac{1}{m}\sum_{i=1}^{m}\sum_{j}\sqrt{\left(W x^{(i)}\right)_j^2 + \varepsilon}$$

Its gradient (partial derivative) with respect to W is:

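$$\nabla_W J = \frac{1}{m}\left[(WX) \oslash \sqrt{(WX)\circ(WX) + \varepsilon}\,\right]X^T$$

where $X = [x^{(1)}, \dots, x^{(m)}]$, $\circ$ and $\oslash$ denote elementwise multiplication and division, and the square root is taken elementwise.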

For the derivation of this matrix derivative, see The Matrix Cookbook.
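For readers who want to reproduce the gradient check outside MATLAB, here is a minimal NumPy sketch of the same cost and gradient as orthonormalICACost.m, together with a central-difference check. The names `ica_cost` and `numerical_grad` are hypothetical; the sizes mirror the debug branch of ICAExercise.m (5 features, 3-dimensional data, 5 samples), but the finite-difference routine is my own, not UFLDL's computeNumericalGradient.

```python
import numpy as np

def ica_cost(W, X, eps):
    """Smoothed L1 cost (1/m) * sum(sqrt((WX)^2 + eps)) and its gradient w.r.t. W."""
    m = X.shape[1]
    WX = W @ X
    aux = np.sqrt(WX ** 2 + eps)
    cost = aux.sum() / m
    grad = (WX / aux) @ X.T / m        # same formula as orthonormalICACost.m
    return cost, grad

def numerical_grad(f, W, h=1e-6):
    """Central-difference approximation of df/dW, one entry at a time."""
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = h
        g[idx] = (f(W + E) - f(W - E)) / (2 * h)
    return g

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))   # 3-dimensional data, 5 samples
W = rng.random((5, 3))            # 5 features
eps = 1e-6

_, grad = ica_cost(W, X, eps)
num = numerical_grad(lambda V: ica_cost(V, X, eps)[0], W)
diff = np.linalg.norm(num - grad) / np.linalg.norm(num + grad)
print(diff)                       # relative difference, as in ICAExercise.m
```

The relative difference is computed exactly as in the MATLAB debug block (norm of the difference over norm of the sum), so a correct analytic gradient should give a value of the same order as the 1e-7 threshold used there or smaller.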

5. Derivation of the cost function

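The derivation below follows my earlier understanding of how Ng derives his cost functions. The post Deep learning：三十九(ICA模型练习) ("Deep learning 39: ICA model exercise") does not derive any of this, and my derivation shows that the form of its cost function is wrong. Strictly speaking it may not be wrong as such: it merely adds a complicated, redundant term whose value never changes and which more than doubles the running time, and the orthonormal basis can still be extracted. I think this is because its author did not understand where the cost function comes from, which is evident throughout that post.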

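In this exercise we want to learn a set of basis vectors, represented as the columns of a matrix W, with the following properties: first, as in sparse coding, the features are sparse; second, the basis is orthonormal.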

① Sparsity penalty: since the learned features must be sparse and are given by Wx, the cost function must contain the following sparsity penalty:

$$\|WX\|_1 = \sum_{i=1}^{m}\sum_{j}\left|\left(W x^{(i)}\right)_j\right|$$

Moreover, to minimize the cost we will later differentiate it, but $|s|$ is not differentiable at $s = 0$ (just as $y = |x|$ is not differentiable at $x = 0$). To make the minimization tractable, $|s|$ is therefore replaced by $\sqrt{s^2+\varepsilon}$, where ε is the "smoothing parameter" or "sparsity parameter". The cost function thus becomes:

$$J(W) = \lambda\sum_{i=1}^{m}\sum_{j}\sqrt{\left(W x^{(i)}\right)_j^2 + \varepsilon}$$

where m is the number of samples and λ = 1/m is the weight of the sparsity penalty.
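A small NumPy illustration of why the substitution helps (function names hypothetical). The smoothed penalty has a well-defined derivative $s/\sqrt{s^2+\varepsilon}$ everywhere, equal to 0 at $s = 0$ and close to $\operatorname{sign}(s)$ away from 0, and it never exceeds $|s|$ by more than $\sqrt{\varepsilon}$, because $\sqrt{s^2+\varepsilon} - |s| = \varepsilon/(\sqrt{s^2+\varepsilon} + |s|)$.

```python
import numpy as np

def smooth_abs(s, eps):
    """Smoothed absolute value sqrt(s^2 + eps)."""
    return np.sqrt(s ** 2 + eps)

def smooth_abs_grad(s, eps):
    """Derivative of sqrt(s^2 + eps); defined at s = 0, unlike d|s|/ds."""
    return s / np.sqrt(s ** 2 + eps)

eps = 1e-6                        # same value as epsilon in ICAExercise.m
s = np.linspace(-1.0, 1.0, 2001)
gap = smooth_abs(s, eps) - np.abs(s)
print(gap.max())                  # largest gap, reached at s = 0: sqrt(eps)
print(smooth_abs_grad(0.0, eps))  # 0.0
print(smooth_abs_grad(1.0, eps))  # just below 1
```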

② Orthonormality term: since the learned basis W must be orthonormal, the cost function is subject to the constraint:

$$WW^T = I$$

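In practice, this orthonormality constraint is not written into the cost function; it is enforced during the gradient-descent optimization instead, by applying a projection step at every gradient step so that the constraint holds. Therefore the cost function and gradient as coded (orthonormalICACost.m) do not contain the orthonormality term.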

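③ Reconstruction term: the goal of this exercise is to find an orthonormal basis W for the data X and visualize it. Since the learned features are Wx, relating this to the previous exercise (i.e.: …).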

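To minimize the error between the data x and W^T Wx, the cost function must include the squared error between the two and minimize it; that is, it must minimize the following term: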

$$\sum_{i=1}^{m}\left\|W^T W x^{(i)} - x^{(i)}\right\|_2^2$$

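The form above is how Andrew Ng writes it in "Deriving gradients using the backpropagation idea"; strictly speaking, it must also be divided by the number of samples m to be a true mean squared error.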

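The cost function should therefore also contain this reconstruction term. However, because of the orthonormality constraint WW^T = I, as long as W satisfies it, this reconstruction term is a constant that does not depend on W. It is not 0: W is 121 × 192, so W^T W is a rank-121 projection rather than the identity. This is not just a guess; I also checked experimentally that the term stays at 0.003550. The check is simple: just change orthonormalICACost.m from Deep learning：三十九(ICA模型练习) to the following: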

function [cost, grad] = orthonormalICACost(theta, visibleSize, numFeatures, patches, epsilon)
%orthonormalICACost - compute the cost and gradients for orthonormal ICA
%                     (i.e. compute the cost ||Wx||_1 and its gradient)

    weightMatrix = reshape(theta, numFeatures, visibleSize);
    
    cost = 0;
    grad = zeros(numFeatures, visibleSize);
    
    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   weights given in weightMatrix.     
    % -------------------- YOUR CODE HERE --------------------     

%% Method: reconstruction term + sparsity penalty
    lambda = 8e-6; % 0.5e-4
    num_samples = size(patches,2);
    
    cost_part1 = sum(sum((weightMatrix'*weightMatrix*patches-patches).^2))./num_samples;
    cost_part2 = sum(sum(sqrt((weightMatrix*patches).^2+epsilon)))*lambda;
    cost = cost_part1  +  cost_part2;
    grad = (2*weightMatrix*(weightMatrix'*weightMatrix*patches-patches)*patches'+...
        2*weightMatrix*patches*(weightMatrix'*weightMatrix*patches-patches)')./num_samples+...
        (weightMatrix*patches./sqrt((weightMatrix*patches).^2+epsilon))*patches'*lambda;
    
    grad = grad(:);
    fprintf('%11s%16s\n','cost_part1','cost_part2');
    fprintf('   %14.6f  %14.6f\n', cost_part1, cost_part2);
    
end

Then run ICAExercise.m. In its output, cost_part1 is the value of the reconstruction term: it stays at 0.003550 and does not decrease as the iterations proceed.
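Why the term is constant, and why it equals 0.003550, can be shown in a few lines. This derivation assumes, as in ICAExercise.m, that the whitened data matrix X (one sample $x^{(i)}$ per column, n = 192 rows) satisfies $XX^T = I$, because the starter code whitens with $XX^T$ rather than $\frac{1}{m}XX^T$, and that W is k × n with $WW^T = I_k$ (k = 121); m = 20000 is the number of patches. The key point is that $W^TW$ is then a rank-k orthogonal projection.

```latex
\begin{aligned}
P &= W^T W, \qquad P^T = P, \qquad P^2 = W^T (W W^T) W = P, \qquad \operatorname{tr} P = \operatorname{tr}(W W^T) = k,\\
\frac{1}{m}\sum_{i=1}^{m}\bigl\|W^T W x^{(i)} - x^{(i)}\bigr\|_2^2
&= \frac{1}{m}\bigl\|(I-P)X\bigr\|_F^2
 = \frac{1}{m}\operatorname{tr}\bigl((I-P)\,XX^T\,(I-P)\bigr)
 = \frac{1}{m}\operatorname{tr}(I-P)\\
&= \frac{n-k}{m} = \frac{192-121}{20000} = 0.00355 .
\end{aligned}
```

The step $\operatorname{tr}((I-P)(I-P)) = \operatorname{tr}(I-P)$ uses $(I-P)^2 = I - 2P + P^2 = I - P$. The result matches the printed 0.003550 and shows that the term could only vanish if k = n, i.e. for a complete rather than undercomplete basis.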

Therefore, the cost function implemented in orthonormalICACost.m contains only the sparsity penalty.

In summary, the cost function in orthonormalICACost.m is:

$$J(W) = \frac{1}{m}\sum_{i=1}^{m}\sum_{j}\sqrt{\left(W x^{(i)}\right)_j^2 + \varepsilon}$$
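The claim that the reconstruction term does not depend on W can also be checked numerically outside MATLAB. For whitened data with XX^T = I and any W with orthonormal rows, the term equals (n − k)/m, which is 71/20000 = 0.00355 for the exercise's sizes. The NumPy sketch below uses the exercise's n = 192, k = 121 and m = 20000, but with random Gaussian stand-in data instead of the STL-10 patches (an assumption); all function names are hypothetical.

```python
import numpy as np

def zca_whiten(X):
    """Centre, then ZCA-whiten with eps = 0, so that X @ X.T = I afterwards."""
    X = X - X.mean(axis=1, keepdims=True)
    lam, U = np.linalg.eigh(X @ X.T)
    return U @ np.diag(1.0 / np.sqrt(lam)) @ U.T @ X

def random_orthonormal_rows(k, n, rng):
    """A random k x n matrix projected so that W @ W.T = I_k."""
    W = rng.random((k, n))
    lam, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(lam ** -0.5) @ U.T @ W

def reconstruction_term(W, X):
    """(1/m) * sum_i ||W^T W x_i - x_i||^2, i.e. cost_part1 in the modified cost."""
    R = W.T @ (W @ X) - X
    return np.sum(R ** 2) / X.shape[1]

n, k, m = 192, 121, 20000              # sizes used in the exercise
X = zca_whiten(np.random.default_rng(0).standard_normal((n, m)))  # stand-in data
for seed in (1, 2, 3):
    W = random_orthonormal_rows(k, n, np.random.default_rng(seed))
    print(round(reconstruction_term(W, X), 6))   # 0.00355 for every W
print((n - k) / m)                                # 0.00355
```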

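Experimental procedure

1. Initialize the parameters and load the dataset.

2. ZCA-whiten the data without regularization. Note that ε is set to 0.

3. Implement the cost function and gradient of the ICA algorithm (see orthonormalICACost.m) and check that the gradient is correct. Note that this step does not put the orthonormality constraint into the cost function; it is deferred to the next step, the gradient-descent optimization, where a projection step is added at every iteration so that the constraint holds.

4. Optimize the cost function with gradient descent, adding the projection step at every iteration to satisfy the orthonormality constraint, and use a backtracking line search (see Convex Optimization by Boyd and Vandenberghe) to speed up convergence.

5. Take the optimization result and visualize the weight matrix weightMatrix (the linearly independent orthonormal basis we set out to learn).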
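Steps 2 to 5 can be sketched in NumPy as follows. This is a minimal illustration on synthetic data (Laplace-distributed sources mixed by a random matrix, an assumption standing in for the STL-10 patches), and all function names are hypothetical. One deliberate difference from ICAExercise.m: in the starter loop the candidate is always `weightMatrix - alpha * grad` with alpha fixed at 0.5, and only the acceptance threshold `t` shrinks. Here I swap in a conventional backtracking search that shrinks the step itself (an Armijo-type test along the projected step), and a step is accepted only if the cost does not increase.

```python
import numpy as np

def zca_whiten(X):
    """Centre, then ZCA-whiten with eps = 0 (covariance X @ X.T, no 1/m factor)."""
    X = X - X.mean(axis=1, keepdims=True)
    lam, U = np.linalg.eigh(X @ X.T)
    return U @ np.diag(1.0 / np.sqrt(lam)) @ U.T @ X

def project_orthonormal(W):
    """Projection step: W <- (W W^T)^(-1/2) W."""
    lam, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(lam ** -0.5) @ U.T @ W

def ica_cost(W, X, eps):
    """Smoothed L1 sparsity cost and its gradient, as in orthonormalICACost.m."""
    m = X.shape[1]
    WX = W @ X
    aux = np.sqrt(WX ** 2 + eps)
    return aux.sum() / m, (WX / aux) @ X.T / m

def orthonormal_ica(X, k, eps=1e-6, iters=50, t=1.0, beta=0.5, sigma=1e-4, seed=0):
    """Projected gradient descent with a backtracking line search on the step t."""
    rng = np.random.default_rng(seed)
    W = project_orthonormal(rng.random((k, X.shape[0])))  # start on the constraint set
    cost, grad = ica_cost(W, X, eps)
    history = [cost]
    for _ in range(iters):
        t *= 2.0                                      # let the step grow again each iteration
        while True:
            cand = project_orthonormal(W - t * grad)  # gradient step, then projection
            new_cost, new_grad = ica_cost(cand, X, eps)
            # Armijo-type sufficient decrease measured along the projected step
            if new_cost <= cost - sigma * np.sum(grad * (W - cand)) or t < 1e-12:
                break
            t *= beta                                 # shrink the step and retry
        if new_cost <= cost:                          # never accept an increase
            W, cost, grad = cand, new_cost, new_grad
        history.append(cost)
    return W, history

rng = np.random.default_rng(1)
S = rng.laplace(size=(6, 500))                 # 6 sparse sources, 500 samples
X = zca_whiten(rng.standard_normal((6, 6)) @ S)
W, history = orthonormal_ica(X, k=4)
print(history[0], history[-1])                 # cost before and after optimization
print(np.allclose(W @ W.T, np.eye(4)))         # True: the constraint holds throughout
```

Because every candidate is projected before it is evaluated, W satisfies WW^T = I at every iteration, which is the same design as in the MATLAB loop; only the step-size rule differs.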

Results

Some of the raw patches:

[Figure: sample of the raw patches]

Result after 10,000 iterations:

[Figure: learned bases after 10,000 iterations]

Result after 20,000 iterations:

[Figure: learned bases after 20,000 iterations]

Result after 50,000 iterations:

[Figure: learned bases after 50,000 iterations]

Comparing them, the result after 50,000 iterations is slightly better than the one in Ng's tutorial.

I also ran the code from Deep learning：三十九(ICA模型练习). It does what its author intended, but it takes far too long to run. Overall, the exercise is not actually difficult; the main challenges are the long running time and understanding the cost function.

Code

ICAExercise.m

%% CS294A/CS294W Independent Component Analysis (ICA) Exercise

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  ICA exercise. In this exercise, you will need to modify
%  orthonormalICACost.m and a small part of this file, ICAExercise.m.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

numPatches = 20000;
numFeatures = 121;
imageChannels = 3;
patchDim = 8;
visibleSize = patchDim * patchDim * imageChannels;

outputDir = '.';
epsilon = 1e-6; % L1-regularisation epsilon |Wx| ~ sqrt((Wx).^2 + epsilon)

%%======================================================================
%% STEP 1: Sample patches

patches = load('stlSampledPatches.mat');
patches = patches.patches(:, 1:numPatches);
displayColorNetwork(patches(:, 1:100));

%%======================================================================
%% STEP 2: ZCA whiten patches
%  In this step, we ZCA whiten the sampled patches. This is necessary for
%  orthonormal ICA to work.

patches = patches / 255;
meanPatch = mean(patches, 2);
patches = bsxfun(@minus, patches, meanPatch);

sigma = patches * patches';
[u, s, v] = svd(sigma);
ZCAWhite = u * diag(1 ./ sqrt(diag(s))) * u';
patches = ZCAWhite * patches;

%%======================================================================
%% STEP 3: ICA cost functions
%  Implement the cost function for orthornomal ICA (you don't have to 
%  enforce the orthonormality constraint in the cost function) 
%  in the function orthonormalICACost in orthonormalICACost.m.
%  Once you have implemented the function, check the gradient.

% Use less features and smaller patches for speed
debug = false;
if debug
numFeatures = 5;
patches = patches(1:3, 1:5);
visibleSize = 3;
numPatches = 5;

weightMatrix = rand(numFeatures, visibleSize);

[cost, grad] = orthonormalICACost(weightMatrix, visibleSize, numFeatures, patches, epsilon);

numGrad = computeNumericalGradient( @(x) orthonormalICACost(x, visibleSize, numFeatures, patches, epsilon), weightMatrix(:) );
% Uncomment to display the numeric and analytic gradients side-by-side
% disp([numGrad grad]); 
diff = norm(numGrad-grad)/norm(numGrad+grad);
fprintf('Orthonormal ICA difference: %g\n', diff);
assert(diff < 1e-7, 'Difference too large. Check your analytic gradients.');

fprintf('Congratulations! Your gradients seem okay.\n');
end
%%======================================================================
%% STEP 4: Optimization for orthonormal ICA
%  Optimize for the orthonormal ICA objective, enforcing the orthonormality
%  constraint. Code has been provided to do the gradient descent with a
%  backtracking line search using the orthonormalICACost function 
%  (for more information about backtracking line search, you can read the 
%  appendix of the exercise).
%
%  However, you will need to write code to enforce the orthonormality 
%  constraint by projecting weightMatrix back into the space of matrices 
%  satisfying WW^T  = I.
%
%  Once you are done, you can run the code. 10000 iterations of gradient
%  descent will take around 2 hours, and only a few bases will be
%  completely learned within 10000 iterations. This highlights one of the
%  weaknesses of orthonormal ICA - it is difficult to optimize for the
%  objective function while enforcing the orthonormality constraint - 
%  convergence using gradient descent and projection is very slow.

weightMatrix = rand(numFeatures, visibleSize);

[cost, grad] = orthonormalICACost(weightMatrix(:), visibleSize, numFeatures, patches, epsilon);

fprintf('%11s%16s%10s\n','Iteration','Cost','t');

startTime = tic();

% Initialize some parameters for the backtracking line search
alpha = 0.5;
t = 0.02;
lastCost = 1e40;

% Do 50000 iterations of gradient descent
for iteration = 1:50000
                       
    grad = reshape(grad, size(weightMatrix));
    newCost = Inf;        
    linearDelta = sum(sum(grad .* grad));
    
    % Perform the backtracking line search
    while 1
        considerWeightMatrix = weightMatrix - alpha * grad;
        % -------------------- YOUR CODE HERE --------------------
        % Instructions:
        %   Write code to project considerWeightMatrix back into the space
        %   of matrices satisfying WW^T = I.
        %   
        %   Once that is done, verify that your projection is correct by 
        %   using the checking code below. After you have verified your
        %   code, comment out the checking code before running the
        %   optimization.
        
        % Project considerWeightMatrix such that it satisfies WW^T = I,
        % i.e. W <- (W*W')^(-1/2) * W (unregularized ZCA whitening of W)
        considerWeightMatrix = (considerWeightMatrix*considerWeightMatrix')^(-0.5)*considerWeightMatrix;
        % Verify that the projection is correct. Run this check once, then keep it
        % commented out: the error() call would otherwise stop the optimization
        % at the very first iteration.
%         temp = considerWeightMatrix * considerWeightMatrix';
%         temp = temp - eye(numFeatures);
%         assert(sum(temp(:).^2) < 1e-23, 'considerWeightMatrix does not satisfy WW^T = I. Check your projection again');
%         error('Projection seems okay. Comment out verification code before running optimization.');
        
        % -------------------- YOUR CODE HERE --------------------                                        

        [newCost, newGrad] = orthonormalICACost(considerWeightMatrix(:), visibleSize, numFeatures, patches, epsilon);
        if newCost > lastCost - alpha * t * linearDelta
%             fprintf('   %14.6f  %14.6f\n', newCost, lastCost - alpha * t * linearDelta);
            t = 0.9 * t;
        else
            break;
        end
    end
   
    lastCost = newCost;
    weightMatrix = considerWeightMatrix;
    
    fprintf('  %9d  %14.6f  %8.7g\n', iteration, newCost, t);
    
    t = 1.1 * t;
    
    cost = newCost;
    grad = newGrad;
           
    % Visualize the learned bases as we go along    
    if mod(iteration, 1000) == 0
        duration = toc(startTime);
        % Visualize the learned bases over time in different figures so 
        % we can get a feel for the slow rate of convergence
        figure(floor(iteration /  1000));
        displayColorNetwork(weightMatrix'); 
    end
                   
end

% Visualize the learned bases
displayColorNetwork(weightMatrix');

orthonormalICACost.m

function [cost, grad] = orthonormalICACost(theta, visibleSize, numFeatures, patches, epsilon)
%orthonormalICACost - compute the cost and gradients for orthonormal ICA
%                     (i.e. compute the cost ||Wx||_1 and its gradient)

    weightMatrix = reshape(theta, numFeatures, visibleSize);
    
    cost = 0;
    grad = zeros(numFeatures, visibleSize);
    
    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   weights given in weightMatrix.     
    % -------------------- YOUR CODE HERE --------------------     

%% 
    num_samples = size(patches,2); % number of samples

    aux1 = sqrt(((weightMatrix*patches).^2) + epsilon);
    cost = sum(aux1(:))/num_samples;
    grad = ((weightMatrix*patches)./aux1)*patches'./num_samples;
    grad = grad(:);

    
end

References

UFLDL Tutorial

Independent Component Analysis (独立成分分析), UFLDL

Deep learning：三十三(ICA模型)

Deep learning：三十九(ICA模型练习)
