EfficientDet Inference Acceleration on Jetson TX2 (Part 2)

2021-10-30 13:58:03


1. References

EfficientDet Inference Acceleration with TensorRT (Part 1)

2. Problems You May Encounter

  • Inference error

    [TensorRT] ERROR: 2: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed.)
    
  • Installing directly with pip install onnx_graphsurgeon fails

    Solution:
    pip install nvidia-pyindex
    pip install onnx-graphsurgeon
    
  • Unsupported-operator warnings while generating the ONNX file

    INFO:EfficientDetGraphSurgeon:Created NMS plugin 'EfficientNMS_TRT' with attributes: {'plugin_version': '1', 'background_class': -1, 'max_output_boxes': 100, 'score_threshold': 0.4000000059604645, 'iou_threshold': 0.5, 'score_activation': True, 'box_coding': 1}
    Warning: Unsupported operator EfficientNMS_TRT. No schema registered for this operator.
    Warning: Unsupported operator EfficientNMS_TRT. No schema registered for this operator.
    Warning: Unsupported operator EfficientNMS_TRT. No schema registered for this operator.
    
  • Installing dm-tree fails
    unable to execute ‘bazel’: No such file or directory #1089
    dm-tree installation guide
    dm-tree source code

    Failed to build dm-tree
    Installing collected packages: dm-tree
        Running setup.py install for dm-tree ... error
    
    Build from source, method 1 (unsuccessful):
    
    Key CMake-GUI settings:
    CMAKE_SOURCE_DIR = /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/tree/tree
    CMAKE_BINARY_DIR = /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/build_tree
    
    Output:
    Current build type is: RELEASE
    PROJECT_BINARY_DIR is: /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/build_tree
    pybind11 v2.6.2 
    Configuring done
    Generating done
    
  • Error produced by source-build method 1

    /usr/bin/ld: cannot open output file tree/_tree.cpython-37m-aarch64-linux-gnu.so: No such file or directory
    collect2: error: ld returned 1 exit status
    CMakeFiles/_tree.dir/build.make:101: recipe for target 'tree/_tree.cpython-37m-aarch64-linux-gnu.so' failed
    make[2]: *** [tree/_tree.cpython-37m-aarch64-linux-gnu.so] Error 1
    CMakeFiles/Makefile2:127: recipe for target 'CMakeFiles/_tree.dir/all' failed
    make[1]: *** [CMakeFiles/_tree.dir/all] Error 2
    Makefile:90: recipe for target 'all' failed
    make: *** [all] Error 2
    
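    Build from source, method 2 (successful):
    First install the dependencies listed in requirements.txt, then run setup.py from the dm-tree source directory: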
    pip install -r /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/tree/docs/requirements.txt
    
    python setup.py install
    
  • Installing tensorflow-model-optimization fails

    Failed to build dm-tree
    Installing collected packages: dm-tree, tensorflow-model-optimization
        Running setup.py install for dm-tree ... error
    
    Once dm-tree is installed, tensorflow-model-optimization installs without problems.
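
After working through the installation problems above, it can help to confirm that each package is actually visible to the interpreter before moving on. The sketch below is only an illustration: the helper names are hypothetical, and the mapping assumes the usual import names (`onnx_graphsurgeon`, `tree`, `tensorflow_model_optimization`), which differ from the names passed to pip.

```python
import importlib.util

# pip distribution name -> import name (assumed mapping, for illustration only)
PACKAGES = {
    "onnx-graphsurgeon": "onnx_graphsurgeon",
    "dm-tree": "tree",
    "tensorflow-model-optimization": "tensorflow_model_optimization",
}


def is_importable(module_name):
    """Return True if the top-level module can be located (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


def missing_packages(packages=PACKAGES):
    """Return the pip names whose module cannot be found in this environment."""
    return [pip_name for pip_name, module in packages.items()
            if not is_importable(module)]


if __name__ == "__main__":
    for pip_name in missing_packages():
        print("still missing:", pip_name)
```

Run it inside the same virtual environment that will be used for the ONNX conversion; an empty output means all three modules were found.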
    
  • Installing bazel fails
    Install Tensorflow Object Detection API for

    Solution:
    https://github.com/jkjung-avt/jetson_nano/blob/master/install_bazel-3.1.0.sh
    
    Build and install bazel from source
    The error complains about a missing binary called bazel.
    You can install it by building it from source
    
    #!/bin/bash
    #
    # Reference: https://docs.bazel.build/versions/master/install-ubuntu.html#install-with-installer-ubuntu
    
    set -e
    
    folder=${HOME}/src
    mkdir -p $folder
    
    echo "** Install requirements"
    sudo apt-get install -y pkg-config zip g++ zlib1g-dev unzip
    sudo apt-get install -y openjdk-8-jdk
    
    echo "** Download bazel-3.1.0 sources"
    pushd $folder
    if [ ! -f bazel-3.1.0-dist.zip ]; then
      wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-dist.zip
    fi
    
    echo "** Build and install bazel-3.1.0"
    
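  • An environment that runs on a GTX 1650 server cannot be reinstalled on the Jetson TX2 directly with pip; some packages fail to install

    pip install -r requirements-gpu.txt
    
    Solution:
    Remove the version numbers from every package in requirements-gpu.txt, so that pip installs the latest version that matches the Jetson TX2 by default.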
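
Removing the version numbers by hand is tedious for a long file, so the step can be scripted. The following is a minimal sketch, assuming the file contains simple lines such as `name==1.2.3`; the function names are hypothetical, and lines with environment markers or URLs are not handled.

```python
import re

# Everything from the first version operator to the end of the line.
_PIN = re.compile(r"\s*(===|==|~=|!=|<=|>=|<|>).*$")


def strip_pins(lines):
    """Return the requirement lines with their version specifiers removed."""
    cleaned = []
    for line in lines:
        stripped = line.strip()
        # Keep blank lines, comments and pip options (-r, -e, --index-url ...) as they are.
        if not stripped or stripped.startswith("#") or stripped.startswith("-"):
            cleaned.append(stripped)
            continue
        # Note: a trailing inline comment after the version is dropped as well.
        cleaned.append(_PIN.sub("", stripped))
    return cleaned


def strip_pins_file(src, dst):
    """Read requirements from src and write the unpinned version to dst."""
    with open(src, encoding="utf-8") as f:
        cleaned = strip_pins(f.read().splitlines())
    with open(dst, "w", encoding="utf-8") as f:
        f.write("\n".join(cleaned) + "\n")
```

For example, `strip_pins_file("requirements-gpu.txt", "requirements-tx2.txt")` writes an unpinned copy (the second file name is made up here) that can then be passed to `pip install -r`.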
    
  • Creating a virtualenv environment fails

    tx2@tx2:/media/mydisk/MyDocuments/PyProjects/automl/efficientdet$ virtualenv -p /usr/bin/python3 venv
    Already using interpreter /usr/bin/python3
    Using base prefix '/usr'
    New python executable in /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/bin/python3
    Also creating executable in /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/bin/python
    Installing setuptools, pkg_resources, pip, wheel...
      Complete output from command /media/mydisk/MyDocu...det/venv/bin/python3 - setuptools pkg_resources pip wheel:
      Exception:
    Traceback (most recent call last):
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 215, in main
        status = self.run(options, args)
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/commands/install.py", line 290, in run
        with self._build_session(options) as session:
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 69, in _build_session
        if options.cache_dir else None
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/posixpath.py", line 80, in join
        a = os.fspath(a)
    TypeError: expected str, bytes or os.PathLike object, not int
    ----------------------------------------
    ...Installing setuptools, pkg_resources, pip, wheel...done.
    Traceback (most recent call last):
      File "/usr/bin/virtualenv", line 11, in <module>
        load_entry_point('virtualenv==15.1.0', 'console_scripts', 'virtualenv')()
      File "/usr/lib/python3/dist-packages/virtualenv.py", line 724, in main
        symlink=options.symlink)
      File "/usr/lib/python3/dist-packages/virtualenv.py", line 992, in create_environment
        download=download,
      File "/usr/lib/python3/dist-packages/virtualenv.py", line 922, in install_wheel
        call_subprocess(cmd, show_stdout=False, extra_env=env, stdin=SCRIPT)
      File "/usr/lib/python3/dist-packages/virtualenv.py", line 817, in call_subprocess
        % (cmd_desc, proc.returncode))
    OSError: Command /media/mydisk/MyDocu...det/venv/bin/python3 - setuptools pkg_resources pip wheel failed with error code 2
    
    Cause:
    Traceback (most recent call last):
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 215, in main
        status = self.run(options, args)
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/commands/install.py", line 290, in run
        with self._build_session(options) as session:
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 69, in _build_session
    
    virtualenv -p /usr/bin/python3 venv
    The python3 and pip versions do not match: the pip that virtualenv finds when creating the environment is version 9.0.1.
    
    Solution:
    Let PyCharm create the virtualenv environment automatically.
    
  • Building the FP32 engine succeeds, but building the FP16 engine fails

    [TensorRT] ERROR: 2: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed.)
    Traceback (most recent call last):
      File "build_engine.py", line 240, in <module>
        main(args)
      File "build_engine.py", line 212, in main
        args.calib_batch_size)
      File "build_engine.py", line 203, in create_engine
        with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
    AttributeError: __enter__
    
    [EfficientNMS_TRT not working on jetson nano (TensorRT 8.0.1) #1538](https://github.com/NVIDIA/TensorRT/issues/1538)
    Cause:
    This problem did not occur if BatchedNMS_TRT was used instead of EfficientNMS_TRT by giving the --legacy_plugins option when creating the onnx file in create_onnx.py.
    
    What's even more strange is that it was built without any problems at Jetson Xavier NX. (same Jetpack, tensorrt version).
    In other words, the problem has been observed on the Jetson TX2, while the Jetson Xavier NX shows no problem at all.
    
    Solution:
    Add the `--legacy_plugins` option when generating the ONNX file:
    
    python create_onnx.py \
        --input_shape '1,512,512,3' \
        --saved_model /media/mydisk/YOYOFile/saved_model \
        --onnx /media/mydisk/YOYOFile/saved_model_onnx/model.onnx \
        --legacy_plugins
    
  • When a TensorRT error cannot be traced

    [builder.build_engine throws AttributeError: __enter__ #234](https://github.com/NVIDIA/TensorRT/issues/234)
    If you cannot find the cause of a TensorRT error, it may be an error internal to TensorRT whose default log output is not informative enough. In that case, make the TensorRT logger more verbose.
    
    Solution: change the log level
    from trt.Logger.ERROR to trt.Logger.VERBOSE
    
    Set the TRT_LOGGER's verbosity to VERBOSE: TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
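
The `AttributeError: __enter__` itself is a secondary symptom: when the build fails, `builder.build_engine(...)` returns `None`, and using `None` in a `with` statement raises that error. A possible way to make the failure explicit is sketched below; `build_and_save_engine` is a hypothetical helper, not part of build_engine.py, and it assumes `builder`, `network` and `config` are the objects that script already creates.

```python
def build_and_save_engine(builder, network, config, engine_path):
    """Build a TensorRT engine and write it to engine_path.

    Raises RuntimeError instead of the confusing AttributeError: __enter__
    when the builder returns None.
    """
    engine = builder.build_engine(network, config)
    if engine is None:
        raise RuntimeError(
            "TensorRT could not build the engine; "
            "rerun with trt.Logger.VERBOSE to see the underlying error"
        )
    # Serialize the engine and store it on disk.
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
    return engine
```

With this check in place the traceback points at the failed build rather than at the context manager, and the verbose log then shows which layer or plugin caused it.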
    
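  • Insufficient GPU memory

    [TensorRT] ERROR: Tactic Device request: 1686MB Available: 1536MB. Device memory is insufficient to use tactic.
    
    The Jetson TX2 reports this insufficient-memory ERROR, but the program does not terminate. Presumably the shortage is worked around internally (some form of memory/GPU-memory optimization), so that running short of GPU memory does not abort the program.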
    
    (venv) tx2@tx2:/media/mydisk/MyDocuments/PyProjects/TensorRT/samples/python/efficientdet$ time python compare_tf.py \
    >     --engine /media/mydisk/YOYOFile/saved_model_trt_fp16/engine.trt \
    >     --saved_model /media/mydisk/YOYOFile/saved_model \
    >     --input /media/mydisk/YOYOFile/coco_calib \
    >     --output /media/mydisk/YOYOFile/output_fp16
    2021-10-22 15:35:22.133357: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
    2021-10-22 15:35:34.777079: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
    2021-10-22 15:35:34.777723: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:35:34.777983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
    pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
    coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.67GiB deviceMemoryBandwidth: 38.74GiB/s
    2021-10-22 15:35:34.778194: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
    2021-10-22 15:35:34.778427: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
    2021-10-22 15:35:34.778583: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.10
    2021-10-22 15:35:34.778749: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
    2021-10-22 15:35:34.779183: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
    2021-10-22 15:35:34.825369: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
    2021-10-22 15:35:34.861433: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.10
    2021-10-22 15:35:34.861805: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
    2021-10-22 15:35:34.862251: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:35:34.862703: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:35:34.862908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
    2021-10-22 15:37:02.440933: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:02.441206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
    pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
    coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.67GiB deviceMemoryBandwidth: 38.74GiB/s
    2021-10-22 15:37:02.441661: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:02.442112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:02.442278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
    2021-10-22 15:37:02.442651: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
    2021-10-22 15:37:09.339992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
    2021-10-22 15:37:09.340386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
    2021-10-22 15:37:09.340484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
    2021-10-22 15:37:09.341206: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:09.341823: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:09.342411: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:09.342745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 80 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
    2021-10-22 15:40:55.306220: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
    2021-10-22 15:40:55.546753: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 31250000 Hz
    len(batch_images): ['/media/mydisk/YOYOFile/coco_calib/COCO_train2014_000000000009.jpg']
    2021-10-22 15:42:32.819948: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
    2021-10-22 15:42:33.464673: I tensorflow/stream_executor/cuda/cuda_dnn.cc:380] Loaded cuDNN version 8201
    2021-10-22 15:42:33.958547: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 24.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    2021-10-22 15:42:33.983722: W tensorflow/core/kernels/gpu_utils.cc:49] Failed to allocate memory for convolution redzone checking; skipping this check. This is benign and only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
    2021-10-22 15:42:42.844197: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 22.75MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    2021-10-22 15:42:43.485925: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
    2021-10-22 15:42:45.070240: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    2021-10-22 15:42:45.094177: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    ...
    ...
    2021-10-22 15:42:55.842108: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 1474560 totalling 2.81MiB
    2021-10-22 15:42:55.842169: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 27442176 totalling 26.17MiB
    2021-10-22 15:42:55.842230: I tensorflow/core/common_runtime/bfc_allocator.cc:1058] Sum Total of in-use chunks: 58.26MiB
    2021-10-22 15:42:55.842290: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] total_region_allocated_bytes_: 84848640 memory_limit_: 84848640 available bytes: 0 curr_region_allocation_bytes_: 134217728
    2021-10-22 15:42:55.869607: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats: 
    Limit:                        84848640
    InUse:                        61095168
    MaxInUse:                     68186112
    NumAllocs:                        1583
    MaxAllocSize:                 27442176
    Reserved:                            0
    PeakReserved:                        0
    LargestFreeBlock:                    0
    
    2021-10-22 15:42:55.869973: W tensorflow/core/common_runtime/bfc_allocator.cc:467] ****************************************___*******************************xxx_______________________
    2021-10-22 15:42:55.936569: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at fused_batch_norm_op.cc:1360 : Resource exhausted: OOM when allocating tensor with shape[1,96,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    Traceback (most recent call last):
      File "compare_tf.py", line 263, in <module>
        main(args)
      File "compare_tf.py", line 234, in main
        tf_images, tf_detections = run(tf_batcher, tf_infer, "TensorFlow", args.nms_threshold)
      File "compare_tf.py", line 124, in run
        res_detections += inferer.infer(batch, scales, nms_threshold)
      File "compare_tf.py", line 77, in infer
        output = self.pred_fn(**input)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1711, in __call__
        return self._call_impl(args, kwargs)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/wrap_function.py", line 247, in _call_impl
        args, kwargs, cancellation_manager)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1729, in _call_impl
        return self._call_with_flat_signature(args, kwargs, cancellation_manager)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1778, in _call_with_flat_signature
        return self._call_flat(args, self.captured_inputs, cancellation_manager)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1961, in _call_flat
        ctx, args, cancellation_manager=cancellation_manager))
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 596, in call
        ctx=ctx)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
        inputs, attrs, num_outputs)
    tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
      (0) Resource exhausted:  OOM when allocating tensor with shape[1,96,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    	 [[node efficientnet-b0/blocks_1/tpu_batch_normalization/FusedBatchNormV3 (defined at compare_tf.py:42) ]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
    	 [[strided_slice_18/_36]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
      (1) Resource exhausted:  OOM when allocating tensor with shape[1,96,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    	 [[node efficientnet-b0/blocks_1/tpu_batch_normalization/FusedBatchNormV3 (defined at compare_tf.py:42) ]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
    0 successful operations.
    0 derived errors ignored. [Op:__inference_pruned_42115]
    
    Function call stack:
    pruned -> pruned
    
    
    real	7m57.829s
    user	6m34.100s
    sys	0m14.384s
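
The numbers in the log above are enough to see why the TensorFlow side of compare_tf.py runs out of memory. The sketch below only redoes that arithmetic; the figures are copied from the OOM message and the allocator statistics, and the comparison ignores fragmentation, so the real headroom is at most what is computed here.

```python
from functools import reduce
from operator import mul

MIB = 1024 * 1024


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (float32 by default)."""
    return reduce(mul, shape, 1) * bytes_per_element


requested = tensor_bytes([1, 96, 256, 256])  # tensor named in the OOM message
limit = 84848640                             # "Limit" in the allocator stats
in_use = 61095168                            # "InUse" in the allocator stats
headroom = limit - in_use

print(f"requested: {requested / MIB:.2f} MiB")  # 24.00 MiB
print(f"headroom:  {headroom / MIB:.2f} MiB")   # 22.65 MiB
print("fits:", requested <= headroom)           # False
```

TensorFlow was given a limit of only about 80 MiB (see the "Created TensorFlow device" line), so a single 24 MiB activation no longer fits once roughly 58 MiB is in use.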
    

Source: https://blog.csdn.net/m0_37605642/article/details/121045933
