
BERT Paper Reading (2): CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection




Contents

The proposed method

Input Representation

The Encoder

The Decoder

Fine-tuning


The task: discriminate over a joint label space consisting of both existing intents, which have enough labeled data, and novel intents, which only have a few labeled examples per class.
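As a toy illustration (the intent names and example counts below are invented, not from the paper), the generalized setting mixes both kinds of classes in one label space:

```python
# Hypothetical data layout for generalized few-shot intent detection.
existing_intents = {                      # many labeled utterances per class
    "play_music": ["play some jazz", "put on my running playlist"],  # imagine hundreds
    "set_alarm":  ["wake me up at 7 am", "set an alarm for noon"],
}
novel_intents = {                         # only a few (K-shot) utterances per class
    "transfer_money": ["send 20 dollars to alice"],
}
# The classifier must discriminate over the JOINT label space:
joint_label_space = sorted(existing_intents) + sorted(novel_intents)
print(joint_label_space)  # ['play_music', 'set_alarm', 'transfer_money']
```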

==> Conditional Text Generation with BERT

The proposed method

CG-BERT: adopts the CVAE (Conditional Variational AutoEncoder) framework and incorporates BERT into both the encoder and the decoder.


  • the encoder: encodes the utterance x and its intent y together into a latent variable z and models the posterior distribution p(z|x,y), where y is the condition in the CVAE model.

==> i.e., the encoder models the data distribution of the few-shot intents.

  • the decoder: decodes z and the intent y together to reconstruct the input utterance x.

==> Masked attention restricts which positions each token can attend to, preserving the left-to-right, autoregressive property that text generation requires.

  • to generate new utterances for a novel intent y, we sample the latent variable z from a prior distribution p(z|y) and use the decoder to decode z and y \in y_{novel} into new utterances.


It's able to generate more utterances for the novel intent through sampling from the learned distribution.

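A minimal sketch of the CVAE objective and the generation step, assuming hypothetical `encoder` and `decoder` callables that stand in for the BERT-based modules described in the following sections:

```python
import torch
import torch.nn.functional as F

def cvae_step(encoder, decoder, x_tokens, y_intent):
    """One training step: minimize reconstruction loss + KL(q(z|x,y) || p(z|y))."""
    mu, log_var = encoder(x_tokens, y_intent)                 # posterior q(z|x,y)
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterization trick
    logits = decoder(z, y_intent)                             # (batch, T, vocab): reconstruct x
    recon = F.cross_entropy(logits.transpose(1, 2), x_tokens) # token-level reconstruction loss
    # KL divergence against the standard Gaussian prior p(z|y) = N(0, I)
    kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl

@torch.no_grad()
def generate_for_novel_intent(decoder, y_novel, latent_dim=64, n=10):
    """Sample z from the prior p(z|y) and decode new utterances for a novel intent."""
    z = torch.randn(n, latent_dim)        # prior: multivariate standard Gaussian
    return decoder(z, y_novel)            # left-to-right decoding itself is omitted here
```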

Input Representation

input: intent + utterance text sentences (concatenated)

Sentence S1: [CLS] token + intent y + [SEP] token --> the first sentence (the intent)

Sentence S2: utterance x + [SEP] --> the second sentence (the utterance)

whole input: S1 + S2

[CLS]: serves as the representation of the whole input

latent variable z: the embedding of [CLS] is encoded into the latent variable z

Text is tokenized into subword units by WordPiece

embedding: obtained for each token --> token embeddings, position embeddings, segment embeddings

a given token's representation: constructed by summing these three embeddings; the whole input is represented as H^{0} = [h_{1}^{0}, h_{2}^{0}, \ldots, h_{T}^{0}] with a total length of T tokens.
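A sketch of how the paired input can be built; the pair-encoding call below is standard HuggingFace BERT usage (not CG-BERT-specific code), and the intent/utterance strings are made up:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

intent = "book a flight"                  # sentence S1 (the condition y)
utterance = "i need a ticket to boston"   # sentence S2 (the utterance x)

# Produces: [CLS] intent [SEP] utterance [SEP], plus segment (token_type) ids
enc = tokenizer(intent, utterance, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]))

# BERT's embedding layer sums token + position + segment embeddings to form H^0
bert = BertModel.from_pretrained("bert-base-uncased")
h0 = bert.embeddings(enc["input_ids"], token_type_ids=enc["token_type_ids"])
print(h0.shape)  # (1, T, hidden_size): H^0 = [h_1^0, ..., h_T^0]
```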

The Encoder 

models the distribution of diverse utterances for a given intent.


to obtain deep bidirectional context information <-- models the attention between the intent tokens and the utterance tokens


the input representation: H^{0} = [h_{1}^{0}, h_{2}^{0}, \ldots, h_{T}^{0}]

multiple self-attention heads:

output of the previous layer H^{l-1} --> projected into a triple of queries, keys, and values (the standard formulas are given below)
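For reference, each transformer block applies the standard BERT-style scaled dot-product self-attention; per head, with learned projections W_{Q}^{l}, W_{K}^{l}, W_{V}^{l}:

Q = H^{l-1} W_{Q}^{l},  K = H^{l-1} W_{K}^{l},  V = H^{l-1} W_{V}^{l}

A^{l} = softmax(Q K^{T} / \sqrt{d_{k}}) V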

embeddings for the [CLS] token in the 6-th transformer block h_{1}^{6} --> sentence-level representation

sentence-level representation h_{1}^{6} --> encoded into a latent vector z, where the prior distribution p(z|y) is a multivariate standard Gaussian distribution.

\mu and \sigma of the Gaussian posterior q(z|x, y) = N(\mu, \sigma) --> used to sample z
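A minimal sketch of this latent head, assuming hypothetical layer names and a chosen latent dimension (both illustration choices, not from the paper):

```python
import torch
import torch.nn as nn

class LatentHead(nn.Module):
    """Maps the sentence-level representation h_1^6 to (mu, log sigma^2) and samples z."""
    def __init__(self, hidden_size=768, latent_dim=64):
        super().__init__()
        self.to_mu = nn.Linear(hidden_size, latent_dim)
        self.to_log_var = nn.Linear(hidden_size, latent_dim)

    def forward(self, h_cls):                      # h_cls: (batch, hidden_size)
        mu = self.to_mu(h_cls)
        log_var = self.to_log_var(h_cls)
        eps = torch.randn_like(mu)                 # reparameterization trick
        z = mu + torch.exp(0.5 * log_var) * eps    # z ~ N(mu, sigma^2)
        return z, mu, log_var
```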

The Decoder

aims to reconstruct the input utterance x using the latent variable z and the intent y.


residual connection from the input representation H^{0} --> the decoder: z takes the place of the [CLS] embedding h_{1}^{0}, while the remaining token embeddings are carried over

==> input of the decoder: H_{6}^{'} = [z, h_{2}^{0}, \ldots, h_{T}^{0}]
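In code, the decoder input could be assembled by swapping the embedding at the [CLS] position for z (a sketch; it assumes z has already been projected to the hidden size):

```python
import torch

def build_decoder_input(h0, z_proj):
    """h0: (batch, T, hidden) input embeddings; z_proj: (batch, hidden) projected z.
    Residual connection from H^0: replace h_1^0 with z,
    giving H'_6 = [z, h_2^0, ..., h_T^0]."""
    h = h0.clone()
    h[:, 0, :] = z_proj
    return h
```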

left-to-right manner ==> masked attention

the attention mask --> helps the transformer blocks fit into the conditional text generation task. 


not full bidirectional attention over the input ==> instead, a mask matrix determines whether a pair of tokens can attend to each other (a sketch of such a mask follows).

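A sketch of building such a mask for this S1 + S2 layout, in the UniLM-style scheme where the condition sentence S1 attends bidirectionally within itself and the utterance S2 attends left-to-right (details of the paper's exact mask may differ):

```python
import torch

def seq2seq_attention_mask(len_s1, len_s2):
    """Additive mask M of shape (T, T): 0 = can attend, -inf = blocked.
    S1 tokens attend to all of S1; S2 tokens attend to S1 and to their
    left context within S2, preserving left-to-right generation."""
    T = len_s1 + len_s2
    allow = torch.zeros(T, T, dtype=torch.bool)
    allow[:, :len_s1] = True                      # every token sees the condition S1
    causal = torch.tril(torch.ones(len_s2, len_s2, dtype=torch.bool))
    allow[len_s1:, len_s1:] = causal              # S2 is causal over itself
    # Note: S1 rows do not attend to S2 (the condition does not peek at x)
    M = torch.zeros(T, T)
    M[~allow] = float("-inf")
    return M
```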

updated attention:

A^{l} = softmax(Q K^{T} / \sqrt{d_{k}} + M) V, where the mask M_{ij} = 0 if token i is allowed to attend to token j, and M_{ij} = -\infty otherwise.

output of the 12-th transformer block in the decoder: H^{12} = [h_{1}^{12}, h_{2}^{12}, \ldots, h_{T}^{12}], where h_{1}^{12} is the embedding for the latent variable z

To further increase the impact of z and alleviate the vanishing latent variable problem, the embedding of z is concatenated with every token's embedding:

H^{12'} = [h_{1}^{12} \| h_{1}^{12}, h_{2}^{12} \| h_{1}^{12}, \ldots, h_{T}^{12} \| h_{1}^{12}]

Two fully-connected layers followed by a layer normalization give the final representation:

H^{f} = g(f(f(H^{12'} W_{1} + b_{1}) W_{2} + b_{2})), where f is the activation function and g is the layer normalization.

the embeddings in H^{f} at position t --> used to predict the next token at position t+1:

p(x_{t+1}) = f(H_{t}^{f} W_{e}^{T} + b_{e}), where W_{e} is the token embedding matrix
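A sketch of this decoder head end to end: concatenate each token's hidden state with the z embedding h_{1}^{12}, apply the two fully-connected layers with layer normalization, then project to the vocabulary with the tied embedding matrix W_{e}; the GELU activation and the dimensions are assumptions:

```python
import torch
import torch.nn as nn

class DecoderHead(nn.Module):
    """H^{12} -> H^{12'} -> H^f -> next-token logits, with output weights tied to W_e."""
    def __init__(self, hidden=768, vocab=30522):
        super().__init__()
        self.fc1 = nn.Linear(2 * hidden, hidden)   # input is [h_t^{12} || h_1^{12}]
        self.fc2 = nn.Linear(hidden, hidden)
        self.norm = nn.LayerNorm(hidden)           # g(.) in H^f = g(f(f(...)))
        self.act = nn.GELU()                       # f(.); assumed GELU as in BERT
        self.W_e = nn.Parameter(torch.randn(vocab, hidden) * 0.02)  # tied token embeddings
        self.b_e = nn.Parameter(torch.zeros(vocab))

    def forward(self, h12):                        # h12: (batch, T, hidden)
        z_emb = h12[:, :1, :].expand_as(h12)       # h_1^{12}, broadcast to every position
        h = torch.cat([h12, z_emb], dim=-1)        # H^{12'}: concat z with each token
        hf = self.norm(self.act(self.fc2(self.act(self.fc1(h)))))  # H^f
        return hf @ self.W_e.T + self.b_e          # logits for x_{t+1} at each position t
```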

Fine-tuning

The model is first learned from existing intents with enough labeled data, then fine-tuned on the few labeled examples of each novel intent to improve its performance on the few-shot intents.

reference: Cross-Lingual Natural Language Generation via Pre-training

Source: https://blog.csdn.net/qq_33419476/article/details/118752179
