To have a baseline for the beta NVIDIA A100 on DiDi Cloud, I ran the TensorFlow benchmarks on a Google Colab V100. The results are recorded here.

 

Environment

 

Platform: Google Colab

OS: Ubuntu 18.04

GPU: V100-SXM2-16GB

Python version: 3.6

TensorFlow version: 1.15.2

 

 

GPU details:

[Image: Google Colab V100 + TensorFlow 1.15.2 performance test]

 

Test method

All tests use the TensorFlow benchmarks suite (tf_cnn_benchmarks.py):

https://github.com/tensorflow/benchmarks
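A minimal sketch of how the runs in this post could be assembled and launched from Python instead of typing each command by hand. The clone path in SCRIPT and the helper names build_cmd and run_all are my own assumptions for illustration; the flags (--num_gpus, --batch_size, --model, --use_fp16) and the model/batch-size combinations are the ones used in the commands in this post.

```python
import subprocess
import sys

# Assumed path to the script inside a local clone of tensorflow/benchmarks.
SCRIPT = "benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py"

def build_cmd(model, batch_size, use_fp16=False, num_gpus=1, script=SCRIPT):
    """Return the argv list for a single tf_cnn_benchmarks run."""
    cmd = [
        sys.executable, script,
        f"--num_gpus={num_gpus}",
        f"--batch_size={batch_size}",
        f"--model={model}",
    ]
    if use_fp16:
        cmd.append("--use_fp16")
    return cmd

# (model, batch size, fp16) combinations measured in this post.
RUNS = [
    ("resnet50_v1.5", 64, False),
    ("resnet50", 64, False),
    ("resnet50", 64, True),
    ("alexnet", 512, False),
    ("inception3", 64, False),
    ("vgg16", 64, False),
    ("googlenet", 128, False),
    ("resnet152", 32, False),
]

def run_all(runs=RUNS):
    """Run every configuration in sequence; raises if a run fails."""
    for model, batch_size, use_fp16 in runs:
        subprocess.run(build_cmd(model, batch_size, use_fp16), check=True)
```

Calling run_all() in a notebook cell would execute the eight configurations one after another. Keeping the list in one place makes it easier to repeat exactly the same runs on a different GPU.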

 

 

ResNet50_v1.5 BS64

!python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50_v1.5
Step	Img/sec	total_loss
1 images/sec: 349.6 +/- 0.0 (jitter = 0.0) 7.848
10 images/sec: 349.9 +/- 0.2 (jitter = 0.4) 8.053
20 images/sec: 349.9 +/- 0.1 (jitter = 0.6) 8.103
30 images/sec: 350.2 +/- 0.1 (jitter = 0.6) 8.118
40 images/sec: 350.2 +/- 0.1 (jitter = 0.8) 7.894
50 images/sec: 350.3 +/- 0.1 (jitter = 0.8) 7.918
60 images/sec: 350.1 +/- 0.1 (jitter = 0.7) 8.103
70 images/sec: 350.0 +/- 0.1 (jitter = 0.8) 7.986
80 images/sec: 350.0 +/- 0.1 (jitter = 0.8) 7.808
90 images/sec: 350.0 +/- 0.1 (jitter = 0.8) 7.972
100 images/sec: 350.0 +/- 0.1 (jitter = 0.9) 7.649
----------------------------------------------------------------
total images/sec: 349.78
----------------------------------------------------------------

 

ResNet50 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50
Step	Img/sec	total_loss
1 images/sec: 386.2 +/- 0.0 (jitter = 0.0) 8.220
10 images/sec: 384.8 +/- 0.4 (jitter = 0.7) 7.880
20 images/sec: 385.5 +/- 0.5 (jitter = 2.2) 7.910
30 images/sec: 385.7 +/- 0.4 (jitter = 2.6) 7.821
40 images/sec: 386.0 +/- 0.4 (jitter = 2.3) 8.004
50 images/sec: 386.2 +/- 0.3 (jitter = 2.4) 7.768
60 images/sec: 386.3 +/- 0.3 (jitter = 2.4) 8.118
70 images/sec: 386.1 +/- 0.3 (jitter = 2.5) 7.816
80 images/sec: 386.3 +/- 0.2 (jitter = 2.4) 7.977
90 images/sec: 386.2 +/- 0.2 (jitter = 2.5) 8.098
100 images/sec: 386.3 +/- 0.2 (jitter = 2.4) 8.045
----------------------------------------------------------------
total images/sec: 386.06
----------------------------------------------------------------

ResNet50 BS64 with --use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --use_fp16
Step	Img/sec	total_loss
1 images/sec: 911.0 +/- 0.0 (jitter = 0.0) 8.103
10 images/sec: 918.1 +/- 1.2 (jitter = 3.1) 7.756
20 images/sec: 914.3 +/- 2.3 (jitter = 4.3) 7.915
30 images/sec: 914.2 +/- 2.2 (jitter = 4.2) 7.769
40 images/sec: 912.8 +/- 1.7 (jitter = 6.5) 7.915
50 images/sec: 911.7 +/- 1.5 (jitter = 7.3) 7.888
60 images/sec: 912.9 +/- 1.3 (jitter = 7.0) 7.707
70 images/sec: 911.8 +/- 1.2 (jitter = 7.6) 8.011
80 images/sec: 912.3 +/- 1.1 (jitter = 7.3) 7.779
90 images/sec: 912.9 +/- 1.0 (jitter = 6.9) 7.805
100 images/sec: 913.1 +/- 0.9 (jitter = 6.8) 8.034
----------------------------------------------------------------
total images/sec: 912.08
----------------------------------------------------------------
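
Taking the two ResNet50 BS64 totals from the logs above (386.06 images/sec without --use_fp16, 912.08 images/sec with it), the FP16 run comes out at roughly 2.36x the throughput. A tiny sketch of the arithmetic; the variable names are only for illustration:

```python
# Totals copied from the two ResNet50 BS64 logs above.
fp32_images_per_sec = 386.06
fp16_images_per_sec = 912.08

# Relative throughput of the --use_fp16 run.
speedup = fp16_images_per_sec / fp32_images_per_sec
print(f"FP16 speedup: {speedup:.2f}x")
```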

 

AlexNet BS512

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet
Step	Img/sec	total_loss
1 images/sec: 4824.0 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 4804.0 +/- 5.9 (jitter = 23.3) nan
20 images/sec: 4802.3 +/- 4.3 (jitter = 24.4) nan
30 images/sec: 4801.7 +/- 4.4 (jitter = 24.0) nan
40 images/sec: 4804.5 +/- 3.9 (jitter = 23.0) nan
50 images/sec: 4805.4 +/- 4.0 (jitter = 24.4) nan
60 images/sec: 4806.7 +/- 3.5 (jitter = 24.8) nan
70 images/sec: 4810.1 +/- 3.4 (jitter = 24.4) nan
80 images/sec: 4810.0 +/- 3.1 (jitter = 25.7) nan
90 images/sec: 4810.9 +/- 2.8 (jitter = 23.4) nan
100 images/sec: 4811.5 +/- 2.7 (jitter = 23.4) nan
----------------------------------------------------------------
total images/sec: 4808.18
----------------------------------------------------------------

 

Inception v3 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3
Step	Img/sec	total_loss
1 images/sec: 255.3 +/- 0.0 (jitter = 0.0) 7.277
10 images/sec: 254.3 +/- 0.5 (jitter = 2.2) 7.304
20 images/sec: 254.4 +/- 0.3 (jitter = 2.4) 7.292
30 images/sec: 254.3 +/- 0.3 (jitter = 2.3) 7.402
40 images/sec: 254.2 +/- 0.3 (jitter = 2.3) 7.314
50 images/sec: 254.3 +/- 0.2 (jitter = 2.3) 7.283
60 images/sec: 254.3 +/- 0.2 (jitter = 2.2) 7.363
70 images/sec: 254.3 +/- 0.2 (jitter = 2.1) 7.350
80 images/sec: 254.3 +/- 0.2 (jitter = 2.2) 7.384
90 images/sec: 254.3 +/- 0.2 (jitter = 1.9) 7.318
100 images/sec: 254.3 +/- 0.1 (jitter = 1.9) 7.376
----------------------------------------------------------------
total images/sec: 254.19
----------------------------------------------------------------

 

VGG16 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16
Step	Img/sec	total_loss
1 images/sec: 250.0 +/- 0.0 (jitter = 0.0) 7.319
10 images/sec: 250.2 +/- 0.2 (jitter = 0.2) 7.297
20 images/sec: 250.4 +/- 0.1 (jitter = 0.5) 7.284
30 images/sec: 250.4 +/- 0.1 (jitter = 0.6) 7.274
40 images/sec: 250.4 +/- 0.1 (jitter = 0.6) 7.288
50 images/sec: 250.4 +/- 0.1 (jitter = 0.6) 7.278
60 images/sec: 250.3 +/- 0.1 (jitter = 0.6) 7.278
70 images/sec: 250.3 +/- 0.1 (jitter = 0.6) 7.266
80 images/sec: 250.3 +/- 0.1 (jitter = 0.6) 7.288
90 images/sec: 250.2 +/- 0.1 (jitter = 0.6) 7.269
100 images/sec: 250.3 +/- 0.1 (jitter = 0.6) 7.270
----------------------------------------------------------------
total images/sec: 250.19
----------------------------------------------------------------

 

GoogLeNet BS128

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet
Step	Img/sec	total_loss
1 images/sec: 1034.6 +/- 0.0 (jitter = 0.0) 7.105
10 images/sec: 1034.2 +/- 0.9 (jitter = 1.8) 7.105
20 images/sec: 1030.9 +/- 1.8 (jitter = 2.9) 7.094
30 images/sec: 1031.0 +/- 1.3 (jitter = 4.2) 7.086
40 images/sec: 1031.6 +/- 1.0 (jitter = 3.9) 7.067
50 images/sec: 1030.6 +/- 0.9 (jitter = 5.4) 7.093
60 images/sec: 1030.4 +/- 0.8 (jitter = 5.4) 7.050
70 images/sec: 1030.6 +/- 0.8 (jitter = 5.7) 7.073
80 images/sec: 1030.3 +/- 0.7 (jitter = 5.9) 7.078
90 images/sec: 1030.3 +/- 0.6 (jitter = 5.6) 7.078
100 images/sec: 1030.0 +/- 0.6 (jitter = 5.5) 7.069
----------------------------------------------------------------
total images/sec: 1029.42
----------------------------------------------------------------

 

ResNet152 BS32

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152
Step	Img/sec	total_loss
1 images/sec: 137.0 +/- 0.0 (jitter = 0.0) 9.023
10 images/sec: 138.0 +/- 0.4 (jitter = 1.4) 8.574
20 images/sec: 138.5 +/- 0.3 (jitter = 1.6) 8.600
30 images/sec: 138.5 +/- 0.2 (jitter = 1.6) 8.755
40 images/sec: 138.6 +/- 0.2 (jitter = 1.6) 8.624
50 images/sec: 138.5 +/- 0.2 (jitter = 1.6) 8.801
60 images/sec: 138.4 +/- 0.1 (jitter = 1.7) 8.679
70 images/sec: 138.4 +/- 0.1 (jitter = 1.8) 9.112
80 images/sec: 138.4 +/- 0.1 (jitter = 1.7) 8.872
90 images/sec: 138.4 +/- 0.1 (jitter = 1.7) 9.025
100 images/sec: 138.4 +/- 0.1 (jitter = 1.7) 8.847
----------------------------------------------------------------
total images/sec: 138.39
----------------------------------------------------------------
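
For comparing against another GPU it helps to pull the numbers out of the logs automatically rather than copying them by hand. Below is a minimal sketch that parses the log format shown above; the regular expressions and the function names parse_total and parse_steps are my own assumptions, written against the lines in this post rather than taken from the benchmark tool itself.

```python
import re

# Matches the summary line, e.g. "total images/sec: 138.39".
TOTAL_RE = re.compile(r"^total images/sec:\s*([0-9.]+)\s*$", re.MULTILINE)

# Matches a per-step line, e.g.
# "10 images/sec: 138.0 +/- 0.4 (jitter = 1.4) 8.574"
STEP_RE = re.compile(
    r"^(\d+)\s+images/sec:\s*([0-9.]+)\s*\+/-\s*([0-9.]+)"
    r"\s*\(jitter = ([0-9.]+)\)\s*(\S+)\s*$",
    re.MULTILINE,
)

def parse_total(log_text):
    """Return the reported total images/sec, or None if it is missing."""
    match = TOTAL_RE.search(log_text)
    return float(match.group(1)) if match else None

def parse_steps(log_text):
    """Return (step, images_per_sec, uncertainty, jitter, loss) tuples.

    A loss printed as "nan" (as in the AlexNet run) becomes float("nan").
    """
    return [
        (int(step), float(rate), float(unc), float(jitter), float(loss))
        for step, rate, unc, jitter, loss in STEP_RE.findall(log_text)
    ]

# Two step lines and the summary copied from the ResNet152 log above.
SAMPLE = """Step\tImg/sec\ttotal_loss
1 images/sec: 137.0 +/- 0.0 (jitter = 0.0) 9.023
100 images/sec: 138.4 +/- 0.1 (jitter = 1.7) 8.847
----------------------------------------------------------------
total images/sec: 138.39
----------------------------------------------------------------
"""

print(parse_total(SAMPLE))
print(parse_steps(SAMPLE))
```

Feeding each run's captured output through parse_total gives one number per model, which is all that is needed for a side-by-side table.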

Performance comparison

For a comparison of the A100, V100, and 2080 Ti, see:

 

DiDi Cloud A100 40G performance test, with the V100 as sparring partner!