TensorFlow Serving: Usage and Deployment


1. Quick Start

0x00 Variables

The variables needed when deploying and calling the service are:

Variable        | Used for                | Description
$MODEL_PATH     | deployment              | local model directory
$MODEL_NAME     | deployment, invocation  | model name; specified at deployment time, and used to address the model when calling
$MODEL_VERSION  | invocation              | model version; optional in the request URI (the latest version is used if omitted)

0x01 Installing TensorFlow Serving with Docker

docker pull tensorflow/serving

0x02 Preparing the model and laying out the directory by version

Name each model directory containing saved_model.pb, assets, and variables with a numeric version number; the service automatically serves the model with the highest version number found under the directory. For example:

# The model directory here: $MODEL_PATH = "/Users/eric/work/wheel/tmp/tensorflow_serving/fm_models"

➜ tree /Users/eric/work/wheel/tmp/tensorflow_serving/fm_models
/Users/eric/work/wheel/tmp/tensorflow_serving/fm_models
├── 001
│   ├── assets
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00001
│       └── variables.index
└── 002
    ├── assets
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index

6 directories, 6 files
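
For reference, below is a minimal sketch of how such a versioned directory can be produced with TensorFlow 2.x. The tiny Keras model and its input size are placeholders rather than the article's fm model; only the call to tf.saved_model.save and the numeric version directory matter here.

import os
import tensorflow as tf

MODEL_PATH = "/Users/eric/work/wheel/tmp/tensorflow_serving/fm_models"  # $MODEL_PATH
VERSION = "003"  # next numeric version; Serving always picks the highest number

# Placeholder model; the real fm model would be built and trained elsewhere.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(79,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Writes saved_model.pb, assets/ and variables/ into .../fm_models/003
tf.saved_model.save(model, os.path.join(MODEL_PATH, VERSION))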

0x03 Simple deployment of TensorFlow Serving with docker run

The run command template is:


docker run -p 8501:8501 \
  --mount type=bind,\
source=$MODEL_PATH,\
target=/models/$MODEL_NAME \
  -e MODEL_NAME=$MODEL_NAME -t tensorflow/serving &

With the layout above, $MODEL_PATH = "/Users/eric/work/wheel/tmp/tensorflow_serving/fm_models";

setting $MODEL_NAME="fm_model" as well yields the complete run command:

docker run -p 8501:8501 \
  --mount type=bind,\
source=/Users/eric/work/wheel/tmp/tensorflow_serving/fm_models,\
target=/models/fm_model \
  -e MODEL_NAME=fm_model -t tensorflow/serving &
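
Once the container is up, a quick way to verify that the model is loaded is the model status endpoint of the REST API. A minimal Python sketch, assuming the requests library and the port mapping above:

import requests

# GET /v1/models/<model name> returns the state of each loaded version
resp = requests.get("http://localhost:8501/v1/models/fm_model")
resp.raise_for_status()
print(resp.json())  # the served version should report state "AVAILABLE"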

0x04 A simple call with curl

Invocation:

# /v1/models/<model name>/versions/<version number>


curl -d '{"instances": [[1.0, 0.0, 22.0, 48.0, 1.0, 5.0, 1.0, 0.0, 1.0, 2.0, 2.0, 9.0, 9.0, 9.0, 9.0, 9.0, 4.0, 6.0, 6.0, 5.0, 5.0, 6.0, 8.0, 8.0, 0.0471, 2119.0, 8.0, 0.0, 1.0, 2.0, 1.0, 1.0, 15.0, 3.0, 1.0, 0.0, 0.0, 0.0, 10.0, 22.0, 1.0, 5.0, 1.0, 2.0, 0.0, 9.0, 9.0, 9.0, 9.0, 9.0, 3.0, 6.0, 6.0, 5.0, 5.0, 6.0, 8.0, 8.0, 2115.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0162, 0.0588, 0.0511, 0.0082, 0.0, 0.0523, 0.0362, 0.0367, 0.0, 0.0, 9777.0, 3761.0],[1.0, 0.0, 22.0, 48.0, 1.0, 5.0, 1.0, 0.0, 1.0, 2.0, 2.0, 9.0, 9.0, 9.0, 9.0, 9.0, 4.0, 6.0, 6.0, 5.0, 5.0, 6.0, 8.0, 8.0, 0.0471, 2119.0, 8.0, 0.0, 1.0, 2.0, 1.0, 1.0, 15.0, 3.0, 1.0, 0.0, 0.0, 0.0, 10.0, 22.0, 1.0, 5.0, 1.0, 2.0, 0.0, 9.0, 9.0, 9.0, 9.0, 9.0, 3.0, 6.0, 6.0, 5.0, 5.0, 6.0, 8.0, 8.0, 2115.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0162, 0.0588, 0.0511, 0.0082, 0.0, 0.0523, 0.0362, 0.0367, 0.0, 0.0, 9777.0, 3761.0]]}' \
  -X POST http://localhost:8501/v1/models/fm_model:predict

Response:

{
    "predictions": [[0.00173848867], [0.00173848867]
    ]
}%
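
The same predict call can also be made from Python. A sketch using the requests library; the feature vector from the curl example is abbreviated here and would need all 79 values filled in:

import requests

# Same {"instances": [...]} payload shape as the curl -d body above
features = [1.0, 0.0, 22.0, 48.0]   # ...abbreviated; use the full feature vector
payload = {"instances": [features, features]}

resp = requests.post("http://localhost:8501/v1/models/fm_model:predict", json=payload)
print(resp.json())  # {"predictions": [[...], [...]]}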

0x05 RESTful API

TensorFlow ModelServer runs on host:port and accepts REST API requests of the form:

POST http://host:port/<URI>:<VERB>

URI: /v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]
VERB: classify|regress|predict

/versions/${MODEL_VERSION} is optional; if omitted, the latest version of the model is used.

Example request URLs:

http://host:port/v1/models/iris:classify
http://host:port/v1/models/mnist/versions/314:predict

Both the request and the response are JSON objects whose contents depend on the request type and the VERB; see the API-specific sections for more details.

To handle possible errors, the APIs return a JSON object with error as the key and the error message as the value:

{
  "error": <error message string>
}
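
To illustrate the URI scheme, a small hypothetical helper that assembles the URL (including the optional version part) and surfaces the error field could look like this:

import requests

def call_model(host, port, model_name, payload, verb="predict", version=None):
    # URI: /v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:<VERB>
    uri = f"http://{host}:{port}/v1/models/{model_name}"
    if version is not None:
        uri += f"/versions/{version}"
    body = requests.post(f"{uri}:{verb}", json=payload).json()
    if "error" in body:
        raise RuntimeError(body["error"])
    return body

# e.g. call_model("localhost", 8501, "fm_model", {"instances": [[...]]})
# or a versioned call: call_model("host", 8501, "mnist", payload, version=314)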

2. Deploying TensorFlow ModelServer with a Dockerfile

0x00 Directory structure

.
├── Dockerfile
└── model_server_config

0x01 Dockerfile

FROM debian:10-slim

RUN mkdir /s
WORKDIR /s

ADD . .

RUN sed -i "s/deb.debian.org/mirrors.aliyun.com/g;s/security.debian.org/mirrors.aliyun.com/g" /etc/apt/sources.list && \
    apt-get update && apt-get install -y curl unzip gnupg2 wget && \
    echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" |  tee /etc/apt/sources.list.d/tensorflow-serving.list && \
    curl "https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg" |  apt-key add - && \
    apt-get update && apt-get install -y tensorflow-model-server && \
    wget "https://bigdata-recommend-cn-dev.s3.cn-northwest-1.amazonaws.com.cn/tensorflow/fm_model.tf.zip" && \
    unzip -o -d /tmp fm_model.tf.zip &&  mkdir /tmp/fm_model &&  mv /tmp/fm_model.tf/ /tmp/fm_model/001 && rm fm_model.tf.zip

EXPOSE 8080

ENTRYPOINT ["/usr/bin/tensorflow_model_server","--rest_api_port=8080","--model_config_file=/s/model_server_config"]

0x02 Configuration file: model_server_config

Parameter descriptions:

  • model_config_list.config.name: the model name, i.e. the $MODEL_NAME mentioned above
  • model_config_list.config.base_path: the multi-version model directory, i.e. the $MODEL_PATH mentioned above (here a path inside the container)
  • model_config_list.config.model_platform: tensorflow

Example:
model_config_list {
  config {
    name: 'fm'
    base_path: '/tmp/fm_model/'
    model_platform: 'tensorflow'
  }
}
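
Assuming the image above has been built and started with the container's port 8080 published to the host (for example docker run -p 8080:8080 <image>), the deployment can be checked against the model name 'fm' from this config:

import requests

# Model name comes from model_config_list.config.name
print(requests.get("http://localhost:8080/v1/models/fm").json())

# Predict requests would then go to http://localhost:8080/v1/models/fm:predict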

3. TensorFlow Serving prediction speed compared with the Golang SDK

0x00 Tests based on the RESTful API

Machine | Service | Invoked via | Batch size | Latency
local   | serving | postman     | 1          | 7 ms
local   | serving | go run      | 1          | 70 ms
local   | serving | go run      | 2000       | 70 ms *
local   | sdk     | go run      | 1          | 1 ms
local   | sdk     | go run      | 2000       | 90 ms *
remote  | serving | postman     | 1          | 35 ms
remote  | serving | go run      | 1          | 600 ms
remote  | serving | go run      | 2000       | 600 ms *
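
As a rough illustration (not the original benchmark harness), the REST rows of this table could be approximated with a timing loop like the following; absolute numbers depend heavily on hardware, network, and the model:

import time
import requests

URL = "http://localhost:8501/v1/models/fm_model:predict"
feature = [0.0] * 79   # placeholder vector; length must match the model's expected input

for batch in (1, 2000):
    payload = {"instances": [feature] * batch}
    start = time.perf_counter()
    requests.post(URL, json=payload).raise_for_status()
    print(f"batch={batch}: {(time.perf_counter() - start) * 1000:.1f} ms")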

0x01 Tests based on the gRPC API

4. How TensorFlow Serving works

5. tensorflow_model_server --help

usage: /usr/bin/tensorflow_model_server
Flags:
	--port=8500                      	int32	Port to listen on for gRPC API
	--grpc_socket_path=""            	string	If non-empty, listen to a UNIX socket for gRPC API on the given path. Can be either relative or absolute path.
	--rest_api_port=0                	int32	Port to listen on for HTTP/REST API. If set to zero HTTP/REST API will not be exported. This port must be different than the one specified in --port.
	--rest_api_num_threads=32        	int32	Number of threads for HTTP/REST API processing. If not set, will be auto set based on number of CPUs.
	--rest_api_timeout_in_ms=30000   	int32	Timeout for HTTP/REST API calls.
	--enable_batching=false          	bool	enable batching
	--allow_version_labels_for_unavailable_models=false	bool	If true, allows assigning unused version labels to models that are not available yet.
	--batching_parameters_file=""    	string	If non-empty, read an ascii BatchingParameters protobuf from the supplied file name and use the contained values instead of the defaults.
	--model_config_file=""           	string	If non-empty, read an ascii ModelServerConfig protobuf from the supplied file name, and serve the models in that file. This config file can be used to specify multiple models to serve and other advanced parameters including non-default version policy. (If used, --model_name, --model_base_path are ignored.)
	--model_config_file_poll_wait_seconds=0	int32	Interval in seconds between each poll of the filesystemfor model_config_file. If unset or set to zero, poll will be done exactly once and not periodically. Setting this to negative is reserved for testing purposes only.
	--model_name="default"           	string	name of model (ignored if --model_config_file flag is set)
	--model_base_path=""             	string	path to export (ignored if --model_config_file flag is set, otherwise required)
	--max_num_load_retries=5         	int32	maximum number of times it retries loading a model after the first failure, before giving up. If set to 0, a load is attempted only once. Default: 5
	--load_retry_interval_micros=60000000	int64	The interval, in microseconds, between each servable load retry. If set negative, it doesn't wait. Default: 1 minute
	--file_system_poll_wait_seconds=1	int32	Interval in seconds between each poll of the filesystem for new model version. If set to zero poll will be exactly done once and not periodically. Setting this to negative value will disable polling entirely causing ModelServer to indefinitely wait for a new model at startup. Negative values are reserved for testing purposes only.
	--flush_filesystem_caches=true   	bool	If true (the default), filesystem caches will be flushed after the initial load of all servables, and after each subsequent individual servable reload (if the number of load threads is 1). This reduces memory consumption of the model server, at the potential cost of cache misses if model files are accessed after servables are loaded.
	--tensorflow_session_parallelism=0	int64	Number of threads to use for running a Tensorflow session. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
	--tensorflow_intra_op_parallelism=0	int64	Number of threads to use to parallelize the executionof an individual op. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
	--tensorflow_inter_op_parallelism=0	int64	Controls the number of operators that can be executed simultaneously. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
	--ssl_config_file=""             	string	If non-empty, read an ascii SSLConfig protobuf from the supplied file name and set up a secure gRPC channel
	--platform_config_file=""        	string	If non-empty, read an ascii PlatformConfigMap protobuf from the supplied file name, and use that platform config instead of the Tensorflow platform. (If used, --enable_batching is ignored.)
	--per_process_gpu_memory_fraction=0.000000	float	Fraction that each process occupies of the GPU memory space the value is between 0.0 and 1.0 (with 0.0 as the default) If 1.0, the server will allocate all the memory when the server starts, If 0.0, Tensorflow will automatically select a value.
	--saved_model_tags="serve"       	string	Comma-separated set of tags corresponding to the meta graph def to load from SavedModel.
	--grpc_channel_arguments=""      	string	A comma separated list of arguments to be passed to the grpc server. (e.g. grpc.max_connection_age_ms=2000)
	--enable_model_warmup=true       	bool	Enables model warmup, which triggers lazy initializations (such as TF optimizations) at load time, to reduce first request latency.
	--version=false                  	bool	Display version
	--monitoring_config_file=""      	string	If non-empty, read an ascii MonitoringConfig protobuf from the supplied file name
	--remove_unused_fields_from_bundle_metagraph=true	bool	Removes unused fields from MetaGraphDef proto message to save memory.
	--prefer_tflite_model=false      	bool	EXPERIMENTAL; CAN BE REMOVED ANYTIME! Prefer TensorFlow Lite model from `model.tflite` file in SavedModel directory, instead of the TensorFlow model from `saved_model.pb` file. If no TensorFlow Lite model found, fallback to TensorFlow model.

