python – Multiple GPUs (1080Ti) do not speed up training in TensorFlow, tested with the cifar10_estimator code

2019-07-10 14:08:08  Views: 364  Source: Internet

Tags: python python-3-x machine-learning tensorflow multi-gpu


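I tried to measure the performance of the multi-GPU version of cifar10_estimator on 2 or 3 1080Ti cards, but got no speedup.

I found some useful hardware-related information here, but I am still confused about how to fix the problem.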

My environment:

> Ubuntu VERSION = 16.04.5 LTS(Xenial Xerus)
> Python3
> CUDA_VERSION = 9.0.176
> tensorflow-gpu = 1.11.0

GPU information:

nvidia-smi topo -m

        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  CPU Affinity
GPU0     X    PIX   PHB   PHB   SYS   SYS   SYS   SYS   0-7
GPU1    PIX    X    PHB   PHB   SYS   SYS   SYS   SYS   0-7
GPU2    PHB   PHB    X    PIX   SYS   SYS   SYS   SYS   0-7
GPU3    PHB   PHB   PIX    X    SYS   SYS   SYS   SYS   0-7
GPU4    SYS   SYS   SYS   SYS    X    PIX   PHB   PHB   8-15
GPU5    SYS   SYS   SYS   SYS   PIX    X    PHB   PHB   8-15
GPU6    SYS   SYS   SYS   SYS   PHB   PHB    X    PIX   8-15
GPU7    SYS   SYS   SYS   SYS   PHB   PHB   PIX    X    8-15

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing a single PCIe switch
  NV#  = Connection traversing a bonded set of # NVLinks
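
To make the topology matrix easier to reason about, here is a minimal Python sketch that parses it and lists GPU pairs by link type. The names `TOPO`, `parse_topology` and `pairs_by_link` are made up for illustration, and the matrix text is copied by hand from the output above.

```python
# Topology matrix copied from the `nvidia-smi topo -m` output above
# (header row dropped; the last column is the CPU affinity).
TOPO = """\
GPU0 X   PIX PHB PHB SYS SYS SYS SYS 0-7
GPU1 PIX X   PHB PHB SYS SYS SYS SYS 0-7
GPU2 PHB PHB X   PIX SYS SYS SYS SYS 0-7
GPU3 PHB PHB PIX X   SYS SYS SYS SYS 0-7
GPU4 SYS SYS SYS SYS X   PIX PHB PHB 8-15
GPU5 SYS SYS SYS SYS PIX X   PHB PHB 8-15
GPU6 SYS SYS SYS SYS PHB PHB X   PIX 8-15
GPU7 SYS SYS SYS SYS PHB PHB PIX X   8-15
"""


def parse_topology(text):
    """Return {(i, j): link_type} for every GPU pair with i < j."""
    rows = text.strip().splitlines()
    n = len(rows)  # one row per GPU
    links = {}
    for i, line in enumerate(rows):
        # Skip the row label, keep the n link cells, ignore CPU affinity.
        cells = line.split()[1:n + 1]
        for j, cell in enumerate(cells):
            if i < j:
                links[(i, j)] = cell
    return links


def pairs_by_link(links, kind):
    """Sorted list of GPU index pairs connected by the given link type."""
    return sorted(pair for pair, link in links.items() if link == kind)


if __name__ == "__main__":
    links = parse_topology(TOPO)
    for kind in ("PIX", "PHB", "SYS"):
        print(kind, pairs_by_link(links, kind))
```

According to the legend, PIX pairs share a single PCIe switch, while SYS pairs cross the interconnect between NUMA nodes. The question does not say which GPUs were used in the runs below. If you want to control that, the `CUDA_VISIBLE_DEVICES` environment variable restricts which devices a process sees, so a PIX pair could be selected for a 2-GPU test.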

1 GPU, batch_size = 128

INFO:tensorflow:loss = 2.2576141, step = 200 (3.729 sec)
INFO:tensorflow:learning_rate = 0.1, loss = 2.2576141 (3.729 sec)
INFO:tensorflow:Average examples/sec: 2821.06 (2858.65), step = 200
INFO:tensorflow:Average examples/sec: 2847.23 (3496.06), step = 210
INFO:tensorflow:Average examples/sec: 2857.91 (3102.29), step = 220
INFO:tensorflow:Average examples/sec: 2867.04 (3083.62), step = 230
INFO:tensorflow:Average examples/sec: 2889.21 (3514.15), step = 240
INFO:tensorflow:Average examples/sec: 2913.15 (3636.28), step = 250
INFO:tensorflow:Average examples/sec: 2915.99 (2988.94), step = 260
INFO:tensorflow:Average examples/sec: 2901.94 (2578.95), step = 270
INFO:tensorflow:Average examples/sec: 2888.87 (2575.46), step = 280
INFO:tensorflow:Average examples/sec: 2892.13 (2986.66), step = 290
INFO:tensorflow:global_step/sec: 24.25

2 GPUs, batch_size = 256

INFO:tensorflow:loss = 2.4630964, step = 200 (5.971 sec)
INFO:tensorflow:learning_rate = 0.1, loss = 2.4630964 (5.971 sec)
INFO:tensorflow:Average examples/sec: 3255.68 (4296.71), step = 200
INFO:tensorflow:Average examples/sec: 3297.51 (4437.93), step = 210
INFO:tensorflow:Average examples/sec: 3332.15 (4275.33), step = 220
INFO:tensorflow:Average examples/sec: 3363.86 (4254.65), step = 230
INFO:tensorflow:Average examples/sec: 3395.09 (4316.94), step = 240
INFO:tensorflow:Average examples/sec: 3418.44 (4094.23), step = 250
INFO:tensorflow:Average examples/sec: 3447.17 (4364.24), step = 260
INFO:tensorflow:Average examples/sec: 3474.56 (4379.02), step = 270
INFO:tensorflow:Average examples/sec: 3492.73 (4067.13), step = 280
INFO:tensorflow:Average examples/sec: 3514.19 (4244.23), step = 290
INFO:tensorflow:global_step/sec: 16.6026

3 GPUs, batch_size = 384

INFO:tensorflow:loss = 2.0980535, step = 200 (9.329 sec)
INFO:tensorflow:learning_rate = 0.1, loss = 2.0980535 (9.329 sec)
INFO:tensorflow:Average examples/sec: 3214.65 (4165.7), step = 200
INFO:tensorflow:Average examples/sec: 3272.85 (5130.99), step = 210
INFO:tensorflow:Average examples/sec: 3324.15 (4955.13), step = 220
INFO:tensorflow:Average examples/sec: 3376.65 (5174.76), step = 230
INFO:tensorflow:Average examples/sec: 3425.48 (5132.15), step = 240
INFO:tensorflow:Average examples/sec: 3468.29 (4954.35), step = 250
INFO:tensorflow:Average examples/sec: 3509.91 (5014.23), step = 260
INFO:tensorflow:Average examples/sec: 3544.29 (4755.56), step = 270
INFO:tensorflow:Average examples/sec: 3579.69 (4901.39), step = 280
INFO:tensorflow:Average examples/sec: 3617.84 (5156.66), step = 290
INFO:tensorflow:global_step/sec: 13.1009
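
The three logs can be compared by converting global_step/sec into examples/sec. The sketch below does that arithmetic in Python. It assumes that batch_size is the total number of examples per step across all GPUs, which matches the per-interval values in parentheses in the logs. The names `runs`, `examples_per_sec` and `scaling_report` are made up for illustration.

```python
# Values copied from the batch_size headings and the final
# global_step/sec line of each log above.
runs = {
    1: {"batch_size": 128, "steps_per_sec": 24.25},
    2: {"batch_size": 256, "steps_per_sec": 16.6026},
    3: {"batch_size": 384, "steps_per_sec": 13.1009},
}


def examples_per_sec(run):
    # Assumption: batch_size is the total batch per global step.
    return run["batch_size"] * run["steps_per_sec"]


def scaling_report(runs):
    """Throughput, speedup over 1 GPU, and per-GPU efficiency."""
    baseline = examples_per_sec(runs[1])
    report = {}
    for num_gpus in sorted(runs):
        throughput = examples_per_sec(runs[num_gpus])
        speedup = throughput / baseline
        report[num_gpus] = {
            "throughput": throughput,
            "speedup": speedup,
            "efficiency": speedup / num_gpus,
        }
    return report


if __name__ == "__main__":
    for num_gpus, row in scaling_report(runs).items():
        print(
            f"{num_gpus} GPU(s): {row['throughput']:.0f} examples/sec, "
            f"speedup {row['speedup']:.2f}x, "
            f"efficiency {row['efficiency']:.0%}"
        )
```

Under that assumption, throughput goes from about 3104 examples/sec on 1 GPU to about 4250 on 2 GPUs and about 5031 on 3 GPUs. That is a speedup of roughly 1.37x and 1.62x, well short of linear scaling.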


Solution:

I think I can answer my own question now. If I want better multi-GPU performance, I should look at https://github.com/tensorflow/benchmarks/. For my test results with tf_cnn_benchmarks, see this issue.

Source: https://codeday.me/bug/20190710/1424840.html
