TVM: Parsing TVM Operators



In [TVM: Compilation Flow] we read an ONNX model and converted it to Relay IR; there, the _convert_operator function maps ONNX operators to Relay operators, and how that mapping works was skipped over at the time. This article takes the convolution operator as an example and traces how a Relay expression becomes a TOPI operator that, combined with a TVM schedule, runs on a backend.

The conversion process of the Relay convolution operator

The conversion starts from the converter-table entry "Conv": Conv.get_converter(opset). get_converter() is implemented as follows:

class OnnxOpConverter(object):
    """A helper class for holding onnx op converters."""

    @classmethod
    def get_converter(cls, opset):
        """Get converter matches given opset.

        Parameters
        ----------
        opset: int
            opset from model.

        Returns
        -------
        converter, which should be `_impl_vx`. Number x is the biggest
            number smaller than or equal to opset belongs to all support versions.
        """
        versions = [int(d.replace("_impl_v", "")) for d in dir(cls) if "_impl_v" in d]
        versions = sorted(versions + [opset])
        version = versions[max([i for i, v in enumerate(versions) if v == opset]) - 1]
        if hasattr(cls, "_impl_v{}".format(version)):
            return getattr(cls, "_impl_v{}".format(version))
        raise NotImplementedError(
            "opset version {} of {} not implemented".format(version, cls.__name__)
        )
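
To make the version arithmetic concrete, here is a minimal standalone sketch (FakeConv is hypothetical, not TVM code): the converter picks the largest _impl_vx whose x does not exceed the model's opset.

class FakeConv:
    @classmethod
    def _impl_v1(cls):
        return "v1"

    @classmethod
    def _impl_v11(cls):
        return "v11"

opset = 13
versions = [int(d.replace("_impl_v", "")) for d in dir(FakeConv) if "_impl_v" in d]
versions = sorted(versions + [opset])                                          # [1, 11, 13]
version = versions[max(i for i, v in enumerate(versions) if v == opset) - 1]   # 11
print(getattr(FakeConv, "_impl_v{}".format(version))())                        # "v11"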

Conv thus calls get_converter on its parent class, which uses getattr() to fetch the matching _impl_vx from the Conv class; here that resolves to _impl_v1:

class Conv(OnnxOpConverter):
    """Operator converter for Conv."""

    @classmethod
    def _impl_v1(cls, inputs, attr, params):
        # Use shape of input to determine convolution type.
        data = inputs[0]
        kernel = inputs[1]
        input_shape = infer_shape(data)
        ndim = len(input_shape)
        # Handle ONNX Conv attributes that do not line up with the Relay conv op's attributes
        kernel_type = infer_type(inputs[1])
        kernel_shapes = [get_const_tuple(kernel_type.checked_type.shape)]

        if "kernel_shape" not in attr:
            attr["kernel_shape"] = kernel_shapes[0][2:]

        if "auto_pad" in attr:
            attr["auto_pad"] = attr["auto_pad"].decode("utf-8")
            if attr["auto_pad"] in ("SAME_UPPER", "SAME_LOWER"):
                # Warning: Convolution does not yet support dynamic shapes,
                # one will need to run dynamic_to_static on this model after import
                data = autopad(
                    data,
                    attr.get("strides", [1] * (ndim - 2)),
                    attr["kernel_shape"],
                    attr.get("dilations", [1] * (ndim - 2)),
                    mode=attr["auto_pad"],
                )
            elif attr["auto_pad"] == "VALID":
                attr["pads"] = [0 for i in range(ndim - 2)]
            elif attr["auto_pad"] == "NOTSET":
                pass
            else:
                msg = 'Value {} in attribute "auto_pad" of operator Conv is invalid.'
                raise tvm.error.OpAttributeInvalid(msg.format(attr["auto_pad"]))
            attr.pop("auto_pad")

        attr["channels"] = kernel_shapes[0][0]
        # Convert the remaining attributes and map the op itself
        out = AttrCvt(
            op_name=dimension_picker("conv"),
            transforms={
                "kernel_shape": "kernel_size",
                "dilations": ("dilation", 1),
                "pads": ("padding", 0),
                "group": ("groups", 1),
            },
            custom_check=dimension_constraint(),
        )([data, kernel], attr, params)

        use_bias = len(inputs) == 3
        if use_bias:
            out = _op.nn.bias_add(out, inputs[2])
        return out

The implementation of AttrCvt:


class AttrCvt(object):
    """Common attribute converter. An AttrConverter instance is a callable:
    ```
    attr_converter = AttrConverter(op_name, transforms={'a':'b', 'c':('d', 1)})
    new_op_name, new_attr = attr_converter(attrs)
    ```

    Parameters
    ----------
    op_name : str or callable
        If set as str, returned operator name is the str.
        If set as callable, returned operator is the str returned by calling:
        `op_name = func(attr)`

    transforms : dict of `new_name, or (new_name, default_value, transform function)`
        If only a new_name is provided, it's like renaming the attribute name.
        If a default_value is provided, then the attribute is considered as optional.
        If transform function is provided, the original attribute value is handled
        by transform function.

    excludes : list
        A list of excluded attributes that should `NOT` appear.
        Raise NotImplementedError if occurred.

    disables : list
        A list of attributes that is disabled in relay. Log warnings.

    ignores : list
        A list of attributes that is ignored in relay. Debug level logging.

    extras : dict
        A series of additional attributes should be added anyway to the returned
        attribute dict.

    custom_check : callable
        A custom function takes attribute, and return True/False.
        Raise RuntimeError if not bool(True) returned.
    """

    def __init__(
        self,
        op_name,
        transforms=None,
        excludes=None,
        disables=None,
        ignores=None,
        extras=None,
        custom_check=None,
    ):
        self._op_name = op_name
        self._transforms = transforms if transforms else {}
        self._excludes = excludes if excludes else []
        self._disables = disables if disables else []
        self._ignores = ignores if ignores else []
        self._extras = extras if extras else {}
        self._custom_check = custom_check

    def __call__(self, inputs, attrs, *args):
        self._ignores.append("_output_shapes")
        self._ignores.append("_input_shapes")
        self._ignores.append("T")
        self._ignores.append("use_cudnn_on_gpu")
        self._ignores.append("_node_name")
        self._ignores.append("is_training")
        self._ignores.append("_target_layout")

        # apply custom check
        if self._custom_check:
            func, msg = self._custom_check
            if not func(attrs):
                raise RuntimeError("Check failed: {}".format(msg))
        # get new op_name
        if isinstance(self._op_name, str):
            op_name = self._op_name
        else:
            assert callable(self._op_name), "op_name can either be string or callable"
            op_name = self._op_name(attrs)

        # ignore 'tvm_custom' always
        self._ignores.append("tvm_custom")

        # convert attributes
        new_attrs = {}
        for k in attrs.keys():
            if k in self._excludes:
                raise NotImplementedError(
                    "Attribute {} in operator {} is not supported.".format(k, op_name)
                )
            if k in self._disables:
                logger.debug("Attribute %s is disabled in relay.sym.%s", k, op_name)
            elif k in self._ignores:
                if k != "tvm_custom":
                    logger.debug("Attribute %s is ignored in relay.sym.%s", k, op_name)
            elif k in self._transforms:
                new_name, defaults, transform = self._parse_default(self._transforms[k])
                if defaults is None:
                    new_attr = self._required_attr(attrs, k)
                else:
                    new_attr = attrs.get(k, None)
                if new_attr is None:
                    new_attrs[new_name] = defaults
                else:
                    new_attrs[new_name] = transform(new_attr)
            else:
                # copy
                new_attrs[k] = attrs[k]
        # add extras
        new_attrs.update(self._extras)
        return get_relay_op(op_name)(*inputs, **new_attrs)

At its core the converter invokes AttrCvt, which completes the conversion from the ONNX convolution operator to the Relay convolution operator: it converts the attributes and, depending on the layout, reorders weights, inputs, and outputs, returning a Relay convolution op. (The TensorFlow frontend contains the corresponding layout-handling code; the ONNX importer has no equivalent.)
AttrCvt lives in python/tvm/relay/frontend/common.py. Per its docstring, the class implements operator conversion, mapping the given op_name to a Relay operator. Concretely, it first validates the incoming attrs and raises on illegal attributes, applies a per-attribute conversion where one is registered (the transforms above), and finally calls get_relay_op to return a TVM Relay convolution operator.
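
As a minimal standalone sketch (convert_attrs here is hypothetical, mirroring the renaming loop in AttrCvt.__call__): keys listed in transforms are renamed, everything else is copied through unchanged.

def convert_attrs(attrs, transforms):
    new_attrs = {}
    for k, v in attrs.items():
        if k in transforms:
            spec = transforms[k]
            # spec is either new_name or (new_name, default[, transform])
            new_name = spec if isinstance(spec, str) else spec[0]
            new_attrs[new_name] = v
        else:
            new_attrs[k] = v  # plain copy
    return new_attrs

print(convert_attrs(
    {"kernel_shape": (3, 3), "pads": (1, 1, 1, 1), "group": 1},
    {"kernel_shape": "kernel_size", "pads": ("padding", 0), "group": ("groups", 1)},
))
# -> {'kernel_size': (3, 3), 'padding': (1, 1, 1, 1), 'groups': 1}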

get_relay_op is implemented as follows:

def get_relay_op(op_name):
    """Get the callable function from Relay based on operator name.
    Parameters
    ----------
    op_name : str
        The Relay operator name.
    """
    if "." in op_name:
        # explicit hierarchical modules
        op = _op
        try:
            for opn in op_name.split("."):
                op = getattr(op, opn)
        except AttributeError:
            op = None
    else:
        # try search op in various modules
        for candidate in (_op, _op.nn, _op.image, _op.vision, _op.contrib):
            op = getattr(candidate, op_name, None)
            if op is not None:
                break
    if not op:
        raise tvm.error.OpNotImplemented("Unable to map op_name {} to relay".format(op_name))
    return op

All ops live in the python/tvm/relay/op package; conv is defined in op/nn.
The for loop above searches python/tvm/relay/op/ for a Relay operator named op_name and returns the first match.
The two branches exist so that both the module.xxx spelling and the bare xxx spelling work, where module can be any subpackage of python/tvm/relay/op, such as nn.
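
A quick check of both spellings (assumes a local TVM installation):

from tvm.relay.frontend.common import get_relay_op
from tvm.relay import op as _op

assert get_relay_op("nn.conv2d") is _op.nn.conv2d  # explicit hierarchical path
assert get_relay_op("conv2d") is _op.nn.conv2d     # bare name, found by module search
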
The implementation of conv2d in nn.py:


def conv2d(
    data,
    weight,
    strides=(1, 1),
    padding=(0, 0),
    dilation=(1, 1),
    groups=1,
    channels=None,
    kernel_size=None,
    data_layout="NCHW",
    kernel_layout="OIHW",
    out_layout="",
    out_dtype="",
):
    r"""2D convolution.

    This operator takes the weight as the convolution kernel
    and convolves it with data to produce an output.


    In the default case, where the data_layout is `NCHW`
    and kernel_layout is `OIHW`, conv2d takes in
    a data Tensor with shape `(batch_size, in_channels, height, width)`,
    and a weight Tensor with shape `(channels, in_channels, kernel_size[0], kernel_size[1])`
    to produce an output Tensor with the following rule:

    .. math::

        \mbox{out}[b, c, y, x] = \sum_{dy, dx, k}
           \mbox{data}[b, k, \mbox{strides}[0] * y  + dy, \mbox{strides}[1] * x + dx] *
           \mbox{weight}[c, k, dy, dx]

    Padding and dilation are applied to data and weight respectively before the computation.
    This operator accepts data layout specification.
    Semantically, the operator will convert the layout to the canonical layout
    (`NCHW` for data and `OIHW` for weight), perform the computation,
    then convert to the out_layout.


    Parameters
    ----------
    data : tvm.relay.Expr
        The input data to the operator.

    weight : tvm.relay.Expr
        The weight expressions.

    strides : Optional[int, Tuple[int]]
        The strides of convolution.

    padding : Optional[int, Tuple[int]]
        The padding of convolution on both sides of inputs before convolution.

    dilation : Optional[int, Tuple[int]]
        Specifies the dilation rate to be used for dilated convolution.

    groups : Optional[int]
        Number of groups for grouped convolution.

    channels : Optional[int]
        Number of output channels of this convolution.

    kernel_size : Optional[int, Tuple[int]]
        The spatial dimensions of the convolution kernel.

    data_layout : Optional[str]
        Layout of the input.

    kernel_layout : Optional[str]
        Layout of the weight.

    out_layout : Optional[str]
        Layout of the output, by default, out_layout is the same as data_layout

    out_dtype : Optional[str]
        Specifies the output data type for mixed precision conv2d.

    Returns
    -------
    result : tvm.relay.Expr
        The computed result.
    """
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    if isinstance(strides, int):
        strides = (strides, strides)
    if isinstance(dilation, int):
        dilation = (dilation, dilation)
    # TODO enforce 4-way padding in topi/nn/conv2d after #4644 merged
    # convert 2-way padding to 4-way padding
    padding = get_pad_tuple2d(padding)
    return _make.conv2d(
        data,
        weight,
        strides,
        padding,
        dilation,
        groups,
        channels,
        kernel_size,
        data_layout,
        kernel_layout,
        out_layout,
        out_dtype,
    )

The call chain is conv2d() -> _make.conv2d(); _make.py exposes the C++ implementation to Python:

import tvm._ffi
tvm._ffi._init_api("relay.op.nn._make", __name__)
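
A usage sketch (shapes are illustrative; assumes a TVM installation): calling relay.nn.conv2d goes through _make.conv2d and yields a Relay Call expression.

from tvm import relay

data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(64, 3, 3, 3), dtype="float32")
out = relay.nn.conv2d(data, weight, strides=(1, 1), padding=(1, 1),
                      channels=64, kernel_size=(3, 3))
print(out)  # nn.conv2d(%data, %weight, padding=[1, 1, 1, 1], channels=64, ...)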

conv2d is registered as follows:


TVM_REGISTER_GLOBAL("relay.op.nn._make.conv2d")
    .set_body_typed([](Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                       Array<IndexExpr> dilation, int groups, IndexExpr channels,
                       Array<IndexExpr> kernel_size, String data_layout, String kernel_layout,
                       String out_layout, DataType out_dtype) {
      return MakeConv<Conv2DAttrs>(data, weight, strides, padding, dilation, groups, channels,
                                   kernel_size, data_layout, kernel_layout, out_layout, out_dtype,
                                   "nn.conv2d");
    });

RELAY_REGISTER_OP("nn.conv2d")
    .describe(R"code(2D convolution layer (e.g. spatial convolution over images).

This layer creates a convolution kernel that is convolved
with the layer input to produce a tensor of outputs.

- **data**: This depends on the `layout` parameter. Input is 4D array of shape
            (batch_size, in_channels, height, width) if `layout` is `NCHW`.
- **weight**: (channels, in_channels, kernel_size[0], kernel_size[1])
- **out**:  This depends on the `layout` parameter. Output is 4D array of shape
            (batch_size, channels, out_height, out_width) if `layout` is `NCHW`.

)code" TVM_ADD_FILELINE)
    .set_attrs_type<Conv2DAttrs>()
    .set_num_inputs(2)
    .add_argument("data", "Tensor", "The input tensor.")
    .add_argument("weight", "Tensor", "The weight tensor.")
    .set_support_level(2)
    .add_type_rel("Conv2D", Conv2DRel)
    .set_attr<FInferCorrectLayout>("FInferCorrectLayout", ConvInferCorrectLayout<Conv2DAttrs>)
    .set_attr<TOpPattern>("TOpPattern", kOutEWiseFusable);

The TVM_REGISTER_GLOBAL macro registers the creation function into a global registry. The macro is defined as:

#define TVM_REGISTER_GLOBAL(OpName) \
  TVM_STR_CONCAT(TVM_FUNC_REG_VAR_DEF, __COUNTER__) = ::tvm::runtime::Registry::Register(OpName)

The registration is implemented in the Registry class: its Register member function records the entry through the global Manager:

Registry& Registry::Register(const std::string& name, bool can_override) {  // NOLINT(*)
  Manager* m = Manager::Global();
  std::lock_guard<std::mutex> lock(m->mutex);
  if (m->fmap.count(name)) {
    ICHECK(can_override) << "Global PackedFunc " << name << " is already registered";
  }

  Registry* r = new Registry();
  r->name_ = name;
  m->fmap[name] = r;
  return *r;
}
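
The Manager's fmap is the same table that tvm.get_global_func consults from Python, so the freshly registered creator is visible there (sketch, assumes a TVM installation):

import tvm

f = tvm.get_global_func("relay.op.nn._make.conv2d")
print(type(f))  # tvm.runtime.packed_func.PackedFunc wrapping the C++ lambda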

The body installed by set_body builds a conv operator through MakeConv. MakeConv first constructs an attrs object from the incoming conv parameters (strides, kernel, layout, and so on), then looks the conv op up by name among the registered operators, and finally packs the attrs and the op into a Call node. See tvm/src/relay/op/nn/convolution_make.h:


template <typename T>
inline Expr MakeConv(Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                     Array<IndexExpr> dilation, int groups, IndexExpr channels,
                     Array<IndexExpr> kernel_size, std::string data_layout,
                     std::string kernel_layout, std::string out_layout, DataType out_dtype,
                     std::string op_name) {
  auto attrs = make_object<T>();
  attrs->strides = std::move(strides);
  attrs->padding = std::move(padding);
  attrs->dilation = std::move(dilation);
  attrs->groups = groups;
  attrs->channels = std::move(channels);
  attrs->kernel_size = std::move(kernel_size);
  attrs->data_layout = std::move(data_layout);
  attrs->kernel_layout = std::move(kernel_layout);
  attrs->out_layout = std::move(out_layout);
  attrs->out_dtype = std::move(out_dtype);
  const Op& op = Op::Get(op_name);
  return Call(op, {data, weight}, Attrs(attrs), {});
}

Call inherits from Expr:

class Call : public Expr {
 public:
  /*!
   * \brief The destructor
   */
  ~Call();

  /*!
   * \brief The constructor
   * \param op The operator will be invoked.
   * \param args The arguments of the call.
   * \param attrs The attributes of the call node.
   * \param type_args The type arguments passed to a polymorphic function.
   * \param span The source span of the expression.
   */
  TVM_DLL Call(Expr op, Array<Expr> args, Attrs attrs = Attrs(),
               Array<Type> type_args = Array<Type>(), Span span = Span());

  TVM_DEFINE_OBJECT_REF_METHODS(Call, RelayExpr, CallNode);
  TVM_DEFINE_OBJECT_REF_COW_METHOD(CallNode);
};
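
A sketch of inspecting the resulting Call node from Python (assumes a TVM installation):

from tvm import relay

d = relay.var("d", shape=(1, 3, 8, 8))
w = relay.var("w", shape=(4, 3, 3, 3))
call = relay.nn.conv2d(d, w, padding=(1, 1), channels=4, kernel_size=(3, 3))
print(call.op.name)              # nn.conv2d
print(call.attrs.kernel_layout)  # OIHW (the default)
print(len(call.args))            # 2: data and weight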

Ops are registered through RELAY_REGISTER_OP into a common AttrRegistry.
An Op object does not contain the op's computation; it only records the operator's inputs, outputs, and attributes.
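
This metadata can be queried from Python (sketch, assumes a TVM installation):

from tvm import relay

op = relay.op.get("nn.conv2d")
print(op.num_inputs)              # 2, from set_num_inputs(2)
print(op.get_attr("TOpPattern"))  # 4, i.e. kOutEWiseFusable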

Note in particular that a Relay op contains no actual computation! The whole sequence above only produced the Relay convolution op's IR together with its inputs and attributes. So where is the computation implemented? In TOPI.

Defining the operator's compute function

An operator's compute function implements its computation.
The entry point for nn.conv2d is the conv2d function defined in python/tvm/topi/nn/conv2d.py, which calls the conv helper in the same file; conv defines the compute function and invokes it (abridged below):

def conv(
    inp: te.Tensor,
    filt: te.Tensor,
    stride: Union[int, Sequence[int]],
    padding: Union[int, Sequence[int]],
    dilation: Union[int, Sequence[int]],
    groups: int,
    order: str,
    out_dtype: Union[str, None] = None,
):
    
    ...
 
 
    def compute(*args):
        nn, ff, *dim_indices = list(np.array(args)[permutation_to])
        return te.sum(
            temp.__getitem__(
                tuple(
                    np.array(
                        [nn, ff // (num_filter // groups) * (in_channel // groups) + rc]
                        + [
                            di * stride + r * dil
                            for di, stride, r, dil in zip(dim_indices, strides, rs, dilations)
                        ]
                    )[permutation_from]
                )
            ).astype(out_dtype)
            * filt.__getitem__(tuple(np.array([ff, rc] + rs)[permutation_from_kernel])).astype(
                out_dtype
            ),
            # Schedules depend on reduction axes being in the same order as the
            # layout, so we reorder here.
            axis=np.array([rc, *rs])[permutation_from_reductions].tolist(),
        )
 
    return te.compute(
        list(np.array([batch, out_channel] + out_dimensions)[permutation_from]),
        compute,
        # tag is expected to be lowercase
        tag=f"{'group_' if groups > 1 else ''}conv{dim}d_{order.lower()}",
        name=f"{'group_' if groups > 1 else ''}conv{dim}d_{order.lower()}",
    )

This is only conv2d's default compute; python/tvm/topi/nn/conv2d.py defines further compute functions for specific parameters and data layouts.
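
The default compute can be driven directly together with a naive schedule (sketch; the TOPI signature can vary slightly across TVM versions):

import tvm
from tvm import te, topi

data = te.placeholder((1, 3, 32, 32), name="data")
kernel = te.placeholder((8, 3, 3, 3), name="kernel")
out = topi.nn.conv2d(data, kernel, 1, 1, 1)   # strides, padding, dilation
s = te.create_schedule(out.op)                # naive default schedule
mod = tvm.build(s, [data, kernel, out], target="llvm")
print(mod)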

Registering the operator's compute function and schedule

After implementing the compute function, it has to be attached to the Relay operator.
In TVM that means supplying not just the computation but also a matching schedule, i.e. picking an appropriate schedule for the compute. For example, when a 2d convolution is a grouped convolution, it gets its own compute and schedule. conv2d's schedules are defined in python/tvm/topi/generic/nn.py; the functions starting with schedule_conv2d_ cover the various data layouts, and most fall back to the default schedule.
The strategy function conv2d_strategy is defined in python/tvm/relay/op/strategy/generic.py. Based on the layouts of the input data and the kernel, it assigns a compute and a schedule to each layout combination; a compute paired with a schedule is a strategy (see the sketch below).
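
An abridged sketch of the NCHW branch (simplified from python/tvm/relay/op/strategy/generic.py; conv2d_strategy_sketch is our name, not TVM's):

from tvm import topi
from tvm.relay.op import op as _op
from tvm.relay.op.strategy.generic import wrap_compute_conv2d, wrap_topi_schedule

def conv2d_strategy_sketch(attrs, inputs, out_type, target):
    strategy = _op.OpStrategy()
    # pair the default TOPI compute with the generic NCHW schedule
    strategy.add_implementation(
        wrap_compute_conv2d(topi.nn.conv2d),
        wrap_topi_schedule(topi.generic.schedule_conv2d_nchw),
        name="conv2d_nchw.generic",
    )
    return strategy
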
With that, our operator exists in Relay and can be invoked through a Relay Call node. The remaining step is an interface that takes the arguments, passes them to the operator, and returns a Relay Call node that can be placed into Relay's AST.

Calling with an Attrs object and raw arguments directly is not supported, so Op::Get fetches the operator's registration from the operator registry and passes it to the Call node. The function that produces nn.conv2d's Relay Call node (src/relay/op/nn/convolution_make.h) is the MakeConv function shown above.

With Conv2DAttrs as the template parameter, it produces the Relay Call node for nn.conv2d: it first creates a Conv2DAttrs object that receives the parameters and attributes, then fetches the registration of the 2d convolution op, hands both to Call, and returns the resulting Call expression.

After defining the Relay Call node function, we expose it to Python with the TVM_REGISTER_GLOBAL macro. Once registered, it can be invoked from Python as relay.op._make.xxx(...). The registration for nn.conv2d is the TVM_REGISTER_GLOBAL block shown earlier.
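
Putting it all together, an end-to-end sketch (shapes illustrative; assumes a TVM build with LLVM enabled) that lowers the Relay conv2d through its strategy and runs it on CPU:

import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

data = relay.var("data", shape=(1, 3, 8, 8))
weight = relay.var("weight", shape=(4, 3, 3, 3))
conv = relay.nn.conv2d(data, weight, padding=(1, 1), channels=4, kernel_size=(3, 3))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], conv))

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")  # the strategy picks compute + schedule here

dev = tvm.cpu()
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input("data", np.random.rand(1, 3, 8, 8).astype("float32"))
rt.set_input("weight", np.random.rand(4, 3, 3, 3).astype("float32"))
rt.run()
print(rt.get_output(0).shape)  # (1, 4, 8, 8)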

References:
https://zhuanlan.zhihu.com/p/368940120
https://blog.csdn.net/zx_ros/article/details/123526147
