Accelerating EfficientDet Inference on Jetson TX2 (Part 2)

2021-10-30 13:58:03

Tags: EfficientDet, tensorflow, Jetson TX2

1. References

2. Problems You May Encounter

Inference error

[TensorRT] ERROR: 2: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed.)

Installing onnx_graphsurgeon directly with pip (pip install onnx_graphsurgeon) fails.

Solution:
pip install nvidia-pyindex
pip install onnx-graphsurgeon
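
After the two pip commands, one way to confirm that the package is visible to the active interpreter is to look it up without importing it. This is a minimal sketch: it assumes the import name is `onnx_graphsurgeon`, and the helper name `is_importable` is made up for illustration.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the active interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # onnx_graphsurgeon is the import name assumed for the onnx-graphsurgeon package.
    print("onnx_graphsurgeon importable:", is_importable("onnx_graphsurgeon"))
```

Run it inside the same virtual environment that will later run create_onnx.py, since each environment has its own site-packages.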

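Unsupported-operator warnings while generating the ONNX file (EfficientNMS_TRT is a TensorRT plugin rather than a standard ONNX operator, so ONNX has no schema registered for it):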

INFO:EfficientDetGraphSurgeon:Created NMS plugin 'EfficientNMS_TRT' with attributes: {'plugin_version': '1', 'background_class': -1, 'max_output_boxes': 100, 'score_threshold': 0.4000000059604645, 'iou_threshold': 0.5, 'score_activation': True, 'box_coding': 1}
Warning: Unsupported operator EfficientNMS_TRT. No schema registered for this operator.
Warning: Unsupported operator EfficientNMS_TRT. No schema registered for this operator.
Warning: Unsupported operator EfficientNMS_TRT. No schema registered for this operator.

Installing dm-tree fails
unable to execute ‘bazel’: No such file or directory #1089
dm-tree installation instructions
dm-tree source code

Failed to build dm-tree
Installing collected packages: dm-tree
    Running setup.py install for dm-tree ... error
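
Since the failure comes down to the build not finding a `bazel` binary, it can save time to check for it before retrying the install. The following is a small sketch using only the standard library; `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools):
    """Return the subset of tool names that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # "bazel" is the binary the dm-tree build complained about.
    absent = missing_tools(["bazel"])
    if absent:
        print("Install these before building dm-tree:", ", ".join(absent))
    else:
        print("bazel found on PATH")
```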

Building from source, method 1 (unsuccessful):

Key CMake-GUI settings:
CMAKE_SOURCE_DIR = /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/tree/tree
CMAKE_BINARY_DIR = /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/build_tree

Output:
Current build type is: RELEASE
PROJECT_BINARY_DIR is: /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/build_tree
pybind11 v2.6.2 
Configuring done
Generating done

Method 1 fails with the following error:

/usr/bin/ld: cannot open output file tree/_tree.cpython-37m-aarch64-linux-gnu.so: No such file or directory
collect2: error: ld returned 1 exit status
CMakeFiles/_tree.dir/build.make:101: recipe for target 'tree/_tree.cpython-37m-aarch64-linux-gnu.so' failed
make[2]: *** [tree/_tree.cpython-37m-aarch64-linux-gnu.so] Error 1
CMakeFiles/Makefile2:127: recipe for target 'CMakeFiles/_tree.dir/all' failed
make[1]: *** [CMakeFiles/_tree.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2

Building from source, method 2 (successful):
First install the dependencies listed in requirements.txt:
pip install -r /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/tree/docs/requirements.txt

python setup.py install

Installing tensorflow-model-optimization fails

Failed to build dm-tree
Installing collected packages: dm-tree, tensorflow-model-optimization
    Running setup.py install for dm-tree ... error

Once dm-tree is installed, tensorflow-model-optimization installs without problems.

Installing bazel fails
Install Tensorflow Object Detection API for

Solution:
https://github.com/jkjung-avt/jetson_nano/blob/master/install_bazel-3.1.0.sh

Build and install bazel from source
The error complains about a missing binary called bazel.
You can install it via building from the source

#!/bin/bash
#
# Reference: https://docs.bazel.build/versions/master/install-ubuntu.html#install-with-installer-ubuntu

set -e

folder=${HOME}/src
mkdir -p $folder

echo "** Install requirements"
sudo apt-get install -y pkg-config zip g++ zlib1g-dev unzip
sudo apt-get install -y openjdk-8-jdk

echo "** Download bazel-3.1.0 sources"
pushd $folder
if [ ! -f bazel-3.1.0-dist.zip ]; then
  wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-dist.zip
fi

echo "** Build and install bazel-3.1.0"

The environment that runs on a GTX 1650 server cannot be installed on the Jetson TX2 directly with pip; some packages fail to install.

pip install -r requirements-gpu.txt

Solution:
Remove the version pins from every package in requirements-gpu.txt so that pip installs the latest version available for the Jetson TX2.
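
Removing the pins by hand is error-prone for a long file, so the step can be scripted. The sketch below is one possible approach, not the author's method: `strip_pins` is a hypothetical helper, and it only handles simple requirement lines (package name, optional version specifiers, optional environment marker after `;`).

```python
import re

# Matches the first version operator (==, >=, <=, ~=, !=, <, >) and everything after it.
_PIN = re.compile(r"\s*(==|>=|<=|~=|!=|<|>).*$")


def strip_pins(lines):
    """Drop version specifiers from requirement lines; keep comments and blank lines."""
    result = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            result.append(stripped)
            continue
        # Keep any environment marker (the part after ';') untouched.
        req, sep, marker = stripped.partition(";")
        result.append(_PIN.sub("", req).rstrip() + sep + marker)
    return result


if __name__ == "__main__":
    # Illustrative input; the real lines would come from requirements-gpu.txt.
    for line in strip_pins(["numpy==1.19.5", "absl-py"]):
        print(line)
```

Writing the result to a new file and passing that file to `pip install -r` keeps the original requirements-gpu.txt intact for the server environment.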

Creating a virtualenv environment fails

tx2@tx2:/media/mydisk/MyDocuments/PyProjects/automl/efficientdet$ virtualenv -p /usr/bin/python3 venv
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/bin/python3
Also creating executable in /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/bin/python
Installing setuptools, pkg_resources, pip, wheel...
  Complete output from command /media/mydisk/MyDocu...det/venv/bin/python3 - setuptools pkg_resources pip wheel:
  Exception:
Traceback (most recent call last):
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/commands/install.py", line 290, in run
    with self._build_session(options) as session:
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 69, in _build_session
    if options.cache_dir else None
  File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not int
----------------------------------------
...Installing setuptools, pkg_resources, pip, wheel...done.
Traceback (most recent call last):
  File "/usr/bin/virtualenv", line 11, in <module>
    load_entry_point('virtualenv==15.1.0', 'console_scripts', 'virtualenv')()
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 724, in main
    symlink=options.symlink)
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 992, in create_environment
    download=download,
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 922, in install_wheel
    call_subprocess(cmd, show_stdout=False, extra_env=env, stdin=SCRIPT)
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 817, in call_subprocess
    % (cmd_desc, proc.returncode))
OSError: Command /media/mydisk/MyDocu...det/venv/bin/python3 - setuptools pkg_resources pip wheel failed with error code 2

Cause of the error:
Traceback (most recent call last):
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/commands/install.py", line 290, in run
    with self._build_session(options) as session:
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 69, in _build_session

virtualenv -p /usr/bin/python3 venv
The python3 and pip versions do not match: the pip that virtualenv picks up when creating the environment is version 9.0.1.
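
To see which pip a given interpreter actually uses, the output of `pip --version` can be parsed. The sketch below assumes the usual output format `pip X from PATH (python Y)`; both function names are hypothetical, and the sample string in the usage note is illustrative rather than copied from the machine above.

```python
import re
import subprocess
import sys


def parse_pip_version(text):
    """Extract (pip_version, python_version) from the output of `pip --version`."""
    match = re.match(r"pip (\S+) from .* \(python (\d+\.\d+)\)", text.strip())
    if match is None:
        raise ValueError("unrecognized pip --version output: %r" % text)
    return match.group(1), match.group(2)


def current_pip():
    """Ask the running interpreter which pip it would use."""
    out = subprocess.check_output([sys.executable, "-m", "pip", "--version"])
    return parse_pip_version(out.decode())
```

For example, `parse_pip_version("pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)")` returns `("9.0.1", "3.6")`. Calling `current_pip()` from `/usr/bin/python3` shows what that interpreter uses, which can then be compared with the pip 9.0.1 wheel named in the traceback.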

Solution:
Let PyCharm create the virtualenv environment automatically.

Building the FP32 engine succeeds, but building the FP16 engine fails

[TensorRT] ERROR: 2: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed.)
Traceback (most recent call last):
  File "build_engine.py", line 240, in <module>
    main(args)
  File "build_engine.py", line 212, in main
    args.calib_batch_size)
  File "build_engine.py", line 203, in create_engine
    with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
AttributeError: __enter__

[EfficientNMS_TRT not working on jetson nano (TensorRT 8.0.1) #1538](https://github.com/NVIDIA/TensorRT/issues/1538)
Cause of the error:
This problem did not occur if BatchedNMS_TRT was used instead of EfficientNMS_TRT by giving the --legacy_plugins option when creating the onnx file in create_onnx.py.

What's even more strange is that it was built without any problems at Jetson Xavier NX. (same Jetpack, tensorrt version).
Someone who tried this reports that the problem occurs on the Jetson TX2 but not on the Jetson Xavier NX.

Solution:
Add the `--legacy_plugins` option when generating the ONNX file:

python create_onnx.py \
    --input_shape '1,512,512,3' \
    --saved_model /media/mydisk/YOYOFile/saved_model \
    --onnx /media/mydisk/YOYOFile/saved_model_onnx/model.onnx \
    --legacy_plugins

If you cannot trace a TensorRT error

[builder.build_engine throws AttributeError: __enter__ #234](https://github.com/NVIDIA/TensorRT/issues/234)
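If you cannot find the cause of a TensorRT error, it may be an error internal to TensorRT that the default log output does not explain. In that case, make the TensorRT logger more verbose.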

Solution: change the log level.
Change trt.Logger.ERROR to trt.Logger.VERBOSE

Set the TRT_LOGGER's verbosity to VERBOSE: TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
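
`AttributeError: __enter__` is what Python raises when the object handed to a `with` statement is not a context manager. In the FP16 traceback earlier this is consistent with `build_engine` having returned `None` after the build failed, which hides the real error. A minimal sketch of a guard follows; `save_engine` is a hypothetical helper and not part of build_engine.py, and it assumes only that the engine object has a `serialize()` method.

```python
def save_engine(engine, engine_path):
    """Write a built engine to disk, failing with a clear message if the build returned None."""
    if engine is None:
        # A failed build hands back None; using None in a `with` statement is what
        # produces the opaque "AttributeError: __enter__".
        raise RuntimeError(
            "engine build failed; rerun with trt.Logger(trt.Logger.VERBOSE) to see the cause"
        )
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
```

With such a helper, the build result would be passed in as `save_engine(builder.build_engine(network, config), engine_path)`, so a failed build stops with a readable message instead of the `__enter__` error.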

Insufficient GPU memory

[TensorRT] ERROR: Tactic Device request: 1686MB Available: 1536MB. Device memory is insufficient to use tactic.

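The Jetson TX2 reports this insufficient-GPU-memory ERROR, but the program does not terminate. Presumably the Jetson TX2 manages memory internally so that a shortage of GPU memory does not abort the program. In the run below, by contrast, the TensorFlow side of compare_tf.py does stop with a ResourceExhaustedError.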

(venv) tx2@tx2:/media/mydisk/MyDocuments/PyProjects/TensorRT/samples/python/efficientdet$ time python compare_tf.py \
>     --engine /media/mydisk/YOYOFile/saved_model_trt_fp16/engine.trt \
>     --saved_model /media/mydisk/YOYOFile/saved_model \
>     --input /media/mydisk/YOYOFile/coco_calib \
>     --output /media/mydisk/YOYOFile/output_fp16
2021-10-22 15:35:22.133357: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-10-22 15:35:34.777079: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-10-22 15:35:34.777723: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-10-22 15:35:34.777983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.67GiB deviceMemoryBandwidth: 38.74GiB/s
2021-10-22 15:35:34.778194: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-10-22 15:35:34.778427: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
2021-10-22 15:35:34.778583: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.10
2021-10-22 15:35:34.778749: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-10-22 15:35:34.779183: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-10-22 15:35:34.825369: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
2021-10-22 15:35:34.861433: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.10
2021-10-22 15:35:34.861805: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-10-22 15:35:34.862251: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-10-22 15:35:34.862703: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-10-22 15:35:34.862908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-10-22 15:37:02.440933: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-10-22 15:37:02.441206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.67GiB deviceMemoryBandwidth: 38.74GiB/s
2021-10-22 15:37:02.441661: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-10-22 15:37:02.442112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-10-22 15:37:02.442278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-10-22 15:37:02.442651: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-10-22 15:37:09.339992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-10-22 15:37:09.340386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-10-22 15:37:09.340484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-10-22 15:37:09.341206: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-10-22 15:37:09.341823: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-10-22 15:37:09.342411: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-10-22 15:37:09.342745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 80 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2021-10-22 15:40:55.306220: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-10-22 15:40:55.546753: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 31250000 Hz
len(batch_images): ['/media/mydisk/YOYOFile/coco_calib/COCO_train2014_000000000009.jpg']
2021-10-22 15:42:32.819948: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-10-22 15:42:33.464673: I tensorflow/stream_executor/cuda/cuda_dnn.cc:380] Loaded cuDNN version 8201
2021-10-22 15:42:33.958547: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 24.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-10-22 15:42:33.983722: W tensorflow/core/kernels/gpu_utils.cc:49] Failed to allocate memory for convolution redzone checking; skipping this check. This is benign and only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
2021-10-22 15:42:42.844197: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 22.75MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-10-22 15:42:43.485925: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
2021-10-22 15:42:45.070240: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-10-22 15:42:45.094177: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
...
...
2021-10-22 15:42:55.842108: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 1474560 totalling 2.81MiB
2021-10-22 15:42:55.842169: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 27442176 totalling 26.17MiB
2021-10-22 15:42:55.842230: I tensorflow/core/common_runtime/bfc_allocator.cc:1058] Sum Total of in-use chunks: 58.26MiB
2021-10-22 15:42:55.842290: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] total_region_allocated_bytes_: 84848640 memory_limit_: 84848640 available bytes: 0 curr_region_allocation_bytes_: 134217728
2021-10-22 15:42:55.869607: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats: 
Limit:                        84848640
InUse:                        61095168
MaxInUse:                     68186112
NumAllocs:                        1583
MaxAllocSize:                 27442176
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2021-10-22 15:42:55.869973: W tensorflow/core/common_runtime/bfc_allocator.cc:467] ****************************************___*******************************xxx_______________________
2021-10-22 15:42:55.936569: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at fused_batch_norm_op.cc:1360 : Resource exhausted: OOM when allocating tensor with shape[1,96,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "compare_tf.py", line 263, in <module>
    main(args)
  File "compare_tf.py", line 234, in main
    tf_images, tf_detections = run(tf_batcher, tf_infer, "TensorFlow", args.nms_threshold)
  File "compare_tf.py", line 124, in run
    res_detections += inferer.infer(batch, scales, nms_threshold)
  File "compare_tf.py", line 77, in infer
    output = self.pred_fn(**input)
  File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1711, in __call__
    return self._call_impl(args, kwargs)
  File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/wrap_function.py", line 247, in _call_impl
    args, kwargs, cancellation_manager)
  File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1729, in _call_impl
    return self._call_with_flat_signature(args, kwargs, cancellation_manager)
  File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1778, in _call_with_flat_signature
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1961, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted:  OOM when allocating tensor with shape[1,96,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node efficientnet-b0/blocks_1/tpu_batch_normalization/FusedBatchNormV3 (defined at compare_tf.py:42) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[strided_slice_18/_36]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted:  OOM when allocating tensor with shape[1,96,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node efficientnet-b0/blocks_1/tpu_batch_normalization/FusedBatchNormV3 (defined at compare_tf.py:42) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored. [Op:__inference_pruned_42115]

Function call stack:
pruned -> pruned


real	7m57.829s
user	6m34.100s
sys	0m14.384s
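
The numbers in the log are enough to see why the TensorFlow allocation fails. The sketch below recomputes the size of the tensor named in the OOM message and compares it with what the allocator had left; `tensor_bytes` is a hypothetical helper, and the only assumption is that "type float" means 4-byte float32.

```python
from functools import reduce
from operator import mul

MIB = 1024 * 1024


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element corresponds to float32."""
    return reduce(mul, shape, 1) * bytes_per_element


# Figures copied from the allocator stats and the OOM message in the log above.
limit = 84848640   # "Limit" in the BFC allocator stats
in_use = 61095168  # "InUse" in the BFC allocator stats
needed = tensor_bytes([1, 96, 256, 256])

print("tensor needs  %.2f MiB" % (needed / MIB))
print("pool has free %.2f MiB" % ((limit - in_use) / MIB))
```

The single activation tensor needs 24 MiB, while the pool of roughly 80 MiB that TensorFlow was given has under 23 MiB unused, and that remainder may also be fragmented. The limiting factor in this run is therefore the small memory pool available to TensorFlow, not the size of the model itself.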

Source: https://blog.csdn.net/m0_37605642/article/details/121045933
