1.  首先安装好docker,拉取intel caffe image:

$ docker pull bvlc/caffe:intel
试着运行:
$ docker run -it bvlc/caffe:intel /bin/bash

2. 拉取 intel caffe 源码:

git clone https://github.com/intel/caffe 
git checkout 1.0 

 或者下载源码包:

wget https://github.com/intel/caffe/archive/1.1.0.zip
unzip 1.0.zip

3. 编译Intel caffe

sudo apt-get -y install python-devel boost boost-devel cmake numpy 
  numpy-devel gflags gflags-devel glog glog-devel protobuf protobuf-devel hdf5 
  hdf5-devel lmdb lmdb-devel leveldb leveldb-devel snappy-devel opencv opencv-devel

 

cp Makefile.config.example Makefile.config
# Adjust Makefile.config (for example, if using Anaconda Python, or if cuDNN is desired)

 vim Makefile.config

# Intel(r) Machine Learning Scaling Library (uncomment to build with MLSL)
USE_MLSL := 1

多线程编译:

$ make -j <number_of_physical_cores> -k

编译过程中会下载MKL 和MKL-DNN:

Download mklml_lnx_2018.0.1.20171227.tgz
git clone --no-checkout https://github.com/01org/mkl-dnn.git /home/ubuntu/yuntong/caffe-master/external/mkldnn/tmp

 测试编译结果:

make test
make runtest

4. 下载和创建mnist数据集:  

cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
Creating lmdb...

5. 写入intel caffe docker中的caffe运行路径:  

vim /home/ubuntu/yuntong/caffe-1.0/examples/mnist/train_lenet.sh

#!/usr/bin/env sh set -e #./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt $@ /opt/caffe/build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt $@

6. 设置CPU模式 vim examples/mnist/lenet_solver.prototxt

# solver mode: CPU or GPU
#solver_mode: GPU
solver_mode: CPU

7. 运行docker,并在docker中运行mnist训练:

sudo docker run -v "/home/ubuntu/yuntong/:/opt/caffe/share"  -it bvlc/caffe:intel /bin/bash
cd /opt/caffe/share/caffe-1.0
./examples/mnist/train_lenet.sh

运行结果如下:  

ubuntu@k8s-1:~$ sudo docker run -v "/home/ubuntu/yuntong/:/opt/caffe/share"  -it bvlc/caffe:intel /bin/bash
root@19eaccc415e1:/workspace# cd /opt/caffe/share/caffe-1.0
root@19eaccc415e1:/opt/caffe/share/caffe-1.0# ./examples/mnist/train_lenet.sh
I0408 01:33:10.509523    12 caffe.cpp:285] Use CPU.
I0408 01:33:10.510561    12 solver.cpp:107] Initializing solver from parameters:
test_iter: 100
test_interval: 500
base_lr: 0.01
display: 100
max_iter: 10000
lr_policy: "inv"
gamma: 0.0001
power: 0.75
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
solver_mode: CPU
net: "examples/mnist/lenet_train_test.prototxt"
train_state {
  level: 0
  stage: ""
}
I0408 01:33:10.511216    12 solver.cpp:153] Creating training net from net file: examples/mnist/lenet_train_test.prototxt
I0408 01:33:10.523326    12 cpu_info.cpp:453] Processor speed [MHz]: 0
I0408 01:33:10.523360    12 cpu_info.cpp:456] Total number of sockets: 8
I0408 01:33:10.523373    12 cpu_info.cpp:459] Total number of CPU cores: 8
I0408 01:33:10.523385    12 cpu_info.cpp:462] Total number of processors: 8
I0408 01:33:10.523396    12 cpu_info.cpp:465] GPU is used: no
I0408 01:33:10.523406    12 cpu_info.cpp:468] OpenMP environmental variables are specified: no
I0408 01:33:10.523427    12 cpu_info.cpp:471] OpenMP thread bind allowed: yes
I0408 01:33:10.523437    12 cpu_info.cpp:474] Number of OpenMP threads: 8
I0408 01:33:10.524194    12 net.cpp:1052] The NetState phase (0) differed from the phase (1) specified by a rule in layer mnist
I0408 01:33:10.524220    12 net.cpp:1052] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0408 01:33:10.524510    12 net.cpp:207] Initializing net from parameters:
I0408 01:33:10.524531    12 net.cpp:208]
name: "LeNet"
state {
  phase: TRAIN
  level: 0
  stage: ""
}
engine: "MKLDNN"
compile_net_state {
  bn_scale_remove: false
  bn_scale_merge: false
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
…………………………………………….
…………………………………………….
…………………………………………….

I0408 01:36:28.103435    12 solver.cpp:312] Iteration 7300, loss = 0.0219446
I0408 01:36:28.103497    12 solver.cpp:333]     Train net output #0: loss = 0.0219446 (* 1 = 0.0219446 loss)
I0408 01:36:28.103519    12 sgd_solver.cpp:215] Iteration 7300, lr = 0.00662927
I0408 01:36:30.492499    12 solver.cpp:312] Iteration 7400, loss = 0.00484636
I0408 01:36:30.492563    12 solver.cpp:333]     Train net output #0: loss = 0.00484634 (* 1 = 0.00484634 loss)
I0408 01:36:30.492584    12 sgd_solver.cpp:215] Iteration 7400, lr = 0.00660067
I0408 01:36:32.912159    12 solver.cpp:474] Iteration 7500, Testing net (#0)
I0408 01:36:33.992708    12 solver.cpp:563]     Test net output #0: accuracy = 0.9905
I0408 01:36:33.992983    12 solver.cpp:563]     Test net output #1: loss = 0.0301301 (* 1 = 0.0301301 loss)
I0408 01:36:34.019621    12 solver.cpp:312] Iteration 7500, loss = 0.00250706
I0408 01:36:34.019706    12 solver.cpp:333]     Train net output #0: loss = 0.00250702 (* 1 = 0.00250702 loss)
I0408 01:36:34.020164    12 sgd_solver.cpp:215] Iteration 7500, lr = 0.00657236
I0408 01:36:36.432328    12 solver.cpp:312] Iteration 7600, loss = 0.00537509
I0408 01:36:36.432528    12 solver.cpp:333]     Train net output #0: loss = 0.00537505 (* 1 = 0.00537505 loss)
I0408 01:36:36.432566    12 sgd_solver.cpp:215] Iteration 7600, lr = 0.00654433
I0408 01:36:39.159704    12 solver.cpp:312] Iteration 7700, loss = 0.034624
I0408 01:36:39.159781    12 solver.cpp:333]     Train net output #0: loss = 0.0346239 (* 1 = 0.0346239 loss)
I0408 01:36:39.159811    12 sgd_solver.cpp:215] Iteration 7700, lr = 0.00651658
I0408 01:36:41.873411    12 solver.cpp:312] Iteration 7800, loss = 0.00424178
I0408 01:36:41.873672    12 solver.cpp:333]     Train net output #0: loss = 0.00424175 (* 1 = 0.00424175 loss)
I0408 01:36:41.873694    12 sgd_solver.cpp:215] Iteration 7800, lr = 0.00648911
I0408 01:36:44.552800    12 solver.cpp:312] Iteration 7900, loss = 0.00208136
I0408 01:36:44.553073    12 solver.cpp:333]     Train net output #0: loss = 0.00208134 (* 1 = 0.00208134 loss)
I0408 01:36:44.553095    12 sgd_solver.cpp:215] Iteration 7900, lr = 0.0064619
I0408 01:36:47.132925    12 solver.cpp:474] Iteration 8000, Testing net (#0)
I0408 01:36:48.254405    12 solver.cpp:563]     Test net output #0: accuracy = 0.9905
I0408 01:36:48.254935    12 solver.cpp:563]     Test net output #1: loss = 0.0278543 (* 1 = 0.0278543 loss)
I0408 01:36:48.279563    12 solver.cpp:312] Iteration 8000, loss = 0.0065576
I0408 01:36:48.279626    12 solver.cpp:333]     Train net output #0: loss = 0.00655758 (* 1 = 0.00655758 loss)
I0408 01:36:48.279647    12 sgd_solver.cpp:215] Iteration 8000, lr = 0.00643496
I0408 01:36:50.693308    12 solver.cpp:312] Iteration 8100, loss = 0.0102435
I0408 01:36:50.694417    12 solver.cpp:333]     Train net output #0: loss = 0.0102435 (* 1 = 0.0102435 loss)
I0408 01:36:50.694447    12 sgd_solver.cpp:215] Iteration 8100, lr = 0.00640827
I0408 01:36:53.059345    12 solver.cpp:312] Iteration 8200, loss = 0.0111062
I0408 01:36:53.059619    12 solver.cpp:333]     Train net output #0: loss = 0.0111061 (* 1 = 0.0111061 loss)
I0408 01:36:53.059643    12 sgd_solver.cpp:215] Iteration 8200, lr = 0.00638185
I0408 01:36:55.439267    12 solver.cpp:312] Iteration 8300, loss = 0.0255548
I0408 01:36:55.439332    12 solver.cpp:333]     Train net output #0: loss = 0.0255548 (* 1 = 0.0255548 loss)
I0408 01:36:55.439357    12 sgd_solver.cpp:215] Iteration 8300, lr = 0.00635567
I0408 01:36:57.821687    12 solver.cpp:312] Iteration 8400, loss = 0.00810484
I0408 01:36:57.821768    12 solver.cpp:333]     Train net output #0: loss = 0.00810483 (* 1 = 0.00810483 loss)
I0408 01:36:57.821794    12 sgd_solver.cpp:215] Iteration 8400, lr = 0.00632975
I0408 01:37:00.229344    12 solver.cpp:474] Iteration 8500, Testing net (#0)
I0408 01:37:01.341504    12 solver.cpp:563]     Test net output #0: accuracy = 0.991
I0408 01:37:01.341583    12 solver.cpp:563]     Test net output #1: loss = 0.028333 (* 1 = 0.028333 loss)
I0408 01:37:01.368783    12 solver.cpp:312] Iteration 8500, loss = 0.00672253
I0408 01:37:01.368850    12 solver.cpp:333]     Train net output #0: loss = 0.00672251 (* 1 = 0.00672251 loss)
I0408 01:37:01.368876    12 sgd_solver.cpp:215] Iteration 8500, lr = 0.00630407
I0408 01:37:03.789499    12 solver.cpp:312] Iteration 8600, loss = 0.000701985
I0408 01:37:03.789630    12 solver.cpp:333]     Train net output #0: loss = 0.000701961 (* 1 = 0.000701961 loss)
I0408 01:37:03.789660    12 sgd_solver.cpp:215] Iteration 8600, lr = 0.00627864
I0408 01:37:06.311506    12 solver.cpp:312] Iteration 8700, loss = 0.00329251
I0408 01:37:06.311738    12 solver.cpp:333]     Train net output #0: loss = 0.00329248 (* 1 = 0.00329248 loss)
I0408 01:37:06.311763    12 sgd_solver.cpp:215] Iteration 8700, lr = 0.00625344
I0408 01:37:08.734477    12 solver.cpp:312] Iteration 8800, loss = 0.0011685
I0408 01:37:08.734781    12 solver.cpp:333]     Train net output #0: loss = 0.00116848 (* 1 = 0.00116848 loss)
I0408 01:37:08.734805    12 sgd_solver.cpp:215] Iteration 8800, lr = 0.00622847
I0408 01:37:11.223204    12 solver.cpp:312] Iteration 8900, loss = 0.000881624
I0408 01:37:11.223266    12 solver.cpp:333]     Train net output #0: loss = 0.000881607 (* 1 = 0.000881607 loss)
I0408 01:37:11.223289    12 sgd_solver.cpp:215] Iteration 8900, lr = 0.00620374
I0408 01:37:13.565495    12 solver.cpp:474] Iteration 9000, Testing net (#0)
I0408 01:37:14.642087    12 solver.cpp:563]     Test net output #0: accuracy = 0.99
I0408 01:37:14.642159    12 solver.cpp:563]     Test net output #1: loss = 0.0268256 (* 1 = 0.0268256 loss)
I0408 01:37:14.666667    12 solver.cpp:312] Iteration 9000, loss = 0.011516
I0408 01:37:14.666734    12 solver.cpp:333]     Train net output #0: loss = 0.011516 (* 1 = 0.011516 loss)
I0408 01:37:14.666755    12 sgd_solver.cpp:215] Iteration 9000, lr = 0.00617924
I0408 01:37:17.068984    12 solver.cpp:312] Iteration 9100, loss = 0.00914626
I0408 01:37:17.069262    12 solver.cpp:333]     Train net output #0: loss = 0.00914625 (* 1 = 0.00914625 loss)
I0408 01:37:17.069284    12 sgd_solver.cpp:215] Iteration 9100, lr = 0.00615496
I0408 01:37:19.455351    12 solver.cpp:312] Iteration 9200, loss = 0.00317596
I0408 01:37:19.455596    12 solver.cpp:333]     Train net output #0: loss = 0.00317595 (* 1 = 0.00317595 loss)
I0408 01:37:19.455623    12 sgd_solver.cpp:215] Iteration 9200, lr = 0.0061309
I0408 01:37:21.834389    12 solver.cpp:312] Iteration 9300, loss = 0.00890829
I0408 01:37:21.835710    12 solver.cpp:333]     Train net output #0: loss = 0.00890827 (* 1 = 0.00890827 loss)
I0408 01:37:21.835734    12 sgd_solver.cpp:215] Iteration 9300, lr = 0.00610706
I0408 01:37:24.199872    12 solver.cpp:312] Iteration 9400, loss = 0.0232409
I0408 01:37:24.199946    12 solver.cpp:333]     Train net output #0: loss = 0.0232409 (* 1 = 0.0232409 loss)
I0408 01:37:24.199970    12 sgd_solver.cpp:215] Iteration 9400, lr = 0.00608343
I0408 01:37:26.601363    12 solver.cpp:474] Iteration 9500, Testing net (#0)
I0408 01:37:27.673274    12 solver.cpp:563]     Test net output #0: accuracy = 0.989
I0408 01:37:27.673359    12 solver.cpp:563]     Test net output #1: loss = 0.0323742 (* 1 = 0.0323742 loss)
I0408 01:37:27.698536    12 solver.cpp:312] Iteration 9500, loss = 0.00388906
I0408 01:37:27.698603    12 solver.cpp:333]     Train net output #0: loss = 0.00388905 (* 1 = 0.00388905 loss)
I0408 01:37:27.698628    12 sgd_solver.cpp:215] Iteration 9500, lr = 0.00606002
I0408 01:37:30.146077    12 solver.cpp:312] Iteration 9600, loss = 0.00205984
I0408 01:37:30.146361    12 solver.cpp:333]     Train net output #0: loss = 0.00205983 (* 1 = 0.00205983 loss)
I0408 01:37:30.146386    12 sgd_solver.cpp:215] Iteration 9600, lr = 0.00603682
I0408 01:37:32.567978    12 solver.cpp:312] Iteration 9700, loss = 0.00330913
I0408 01:37:32.568212    12 solver.cpp:333]     Train net output #0: loss = 0.00330913 (* 1 = 0.00330913 loss)
I0408 01:37:32.568235    12 sgd_solver.cpp:215] Iteration 9700, lr = 0.00601382
I0408 01:37:34.955097    12 solver.cpp:312] Iteration 9800, loss = 0.0134696
I0408 01:37:34.955363    12 solver.cpp:333]     Train net output #0: loss = 0.0134696 (* 1 = 0.0134696 loss)
I0408 01:37:34.955386    12 sgd_solver.cpp:215] Iteration 9800, lr = 0.00599102
I0408 01:37:37.377465    12 solver.cpp:312] Iteration 9900, loss = 0.00235391
I0408 01:37:37.377655    12 solver.cpp:333]     Train net output #0: loss = 0.0023539 (* 1 = 0.0023539 loss)
I0408 01:37:37.377678    12 sgd_solver.cpp:215] Iteration 9900, lr = 0.00596843
I0408 01:37:39.850847    12 solver.cpp:707] Snapshot begin
I0408 01:37:39.859346    12 solver.cpp:769] Snapshotting to binary proto file examples/mnist/lenet_iter_10000.caffemodel
I0408 01:37:39.869576    12 sgd_solver.cpp:754] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_10000.solverstate
I0408 01:37:39.878753    12 solver.cpp:734] Snapshot end
I0408 01:37:39.888120    12 solver.cpp:436] Iteration 10000, loss = 0.00251002
I0408 01:37:39.888172    12 solver.cpp:474] Iteration 10000, Testing net (#0)
I0408 01:37:41.067348    12 solver.cpp:563]     Test net output #0: accuracy = 0.9913
I0408 01:37:41.067407    12 solver.cpp:563]     Test net output #1: loss = 0.0267652 (* 1 = 0.0267652 loss)
I0408 01:37:41.067422    12 solver.cpp:443] Optimization Done.
I0408 01:37:41.067432    12 caffe.cpp:345] Optimization Done.

 

花费时间 01:37:41.067432 - 01:33:10.509523 = 4.31分钟

CPU及IO利用率:

8个CPU基本达到100%

Intel Caffe 与原生Caffe

 

 

Intel Caffe 与原生Caffe

IO很小,MNIST数据集只有几十M,数据都被cache了

8. 加上MKL2017

./examples/mnist/train_lenet.sh -engine "MKL2017"

 

第一次: 03:01:35.904659 -02:58:30.774215 = 2:55分钟

第二次: 03:05:15.134409 - 03:02:13.449990 = 2:58分钟

对于原生Caffe

docker run -ti bvlc/caffe:cpu caffe –version
sudo docker run -v "/home/ubuntu/yuntong/:/opt/caffe/share"  -it bvlc/caffe:cpu /bin/bash
./examples/mnist/train_lenet.sh  

运行时间24分钟。

 Intel Caffe 与原生Caffe

原生caffe 只能在一个线程上跑

运行cifar10

该数据集共有60000张彩色图像,这些图像是32*32,分为10个类,每类6000张图。这里面有50000张用于训练,构成了5个训练批,每一批10000张图;另外10000用于测试,单独构成一批。测试批的数据里,取自10类中的每一类,每一类随机取1000张。抽剩下的就随机排列组成了训练批。注意一个训练批中的各类图像并不一定数量相同,总的来看训练批,每一类都有5000张图。

下面这幅图就是列举了10各类,每一类展示了随机的10张图片:

Intel Caffe 与原生Caffe

 

1. 下载和创建cifar10数据集:  

cd $CAFFE_ROOT
./data/mnist/get_cifar10.sh
./examples/cifar10/create_cifar10.sh
Creating lmdb...

2. 写入intel caffe docker中的caffe运行路径: ~/yuntong/caffe-1.0/examples$ vim cifar10/train_quick.sh

TOOLS=/opt/caffe/build/tools

3. 设置CPU模式:

vim cifar10_quick_solver_lr1.prototxt  cifar10_quick_solver.prototxt

4. Intel Caffe里面运行:

sudo docker run -v "/home/ubuntu/yuntong/:/opt/caffe/share"  -it bvlc/caffe:intel /bin/bash
cd /opt/caffe/share/caffe-1.0

./examples/cifar10/train_quick.sh 

 

训练时间: 07:20:47.795905  - 07:08:08.193487 = 12分40

5. 原生 Caffe里面运行:

sudo docker run -v "/home/ubuntu/yuntong/:/opt/caffe/share"  -it bvlc/caffe:cpu /bin/bash
cd /opt/caffe/share/caffe-1.0

./examples/cifar10/train_quick.sh 
07:26:23.522944
……………
I0408 08:56:53.116524    18 solver.cpp:310] Iteration 5000, loss = 0.449847
I0408 08:56:53.117141    18 solver.cpp:330] Iteration 5000, Testing net (#0)
I0408 08:57:30.968313    21 data_layer.cpp:73] Restarting data prefetching from start.
I0408 08:57:32.527096    18 solver.cpp:397]     Test net output #0: accuracy = 0.7561
I0408 08:57:32.527354    18 solver.cpp:397]     Test net output #1: loss = 0.72683 (* 1 = 0.72683 loss)
I0408 08:57:32.527364    18 solver.cpp:315] Optimization Done.
I0408 08:57:32.527381    18 caffe.cpp:259] Optimization Done.

用时 1.5小时

 

机器配置:

CPU: 8 * Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz

Memory: 16G

Storage: SATA SSD