Statistical Inference (3): Exponential Family

2020-02-04 09:04:30



1. Exponential family

  • Definition

    • PDF: $p(y;x)=\exp\left(\lambda(x)^T t(y)-\alpha(x)+\beta(y)\right)$, written $y\sim \varepsilon(x;\lambda(\cdot),t(\cdot),\beta(\cdot))$
    • natural statistic: $t(y)$
    • natural parameter: $\lambda(x)$
    • log-partition function: $\alpha(x)$
    • partition function: $Z(x)=\exp(\alpha(x))$
    • base distribution: $\exp(\beta(y))$
  • Regularity condition: the family is regular if the support of every member $p(y;x)$ does not depend on $x$

    • In essence, this is what the regularity condition of the Cramér-Rao bound (CRB) requires: differentiation and integration can be interchanged
      $$\mathbb{E}\left[\frac{\partial}{\partial x}\ln p(y;x)\right]=\int\frac{\partial}{\partial x}p(y;x)\,dy = \frac{\partial}{\partial x}\int_a^b p(y;x)\,dy = 0$$
  • An exponential family can arise in several ways

    • Many distributions can be written directly in exponential-family form

      • Bernoulli distribution: $y\sim \mathcal{B}(x)$

      $$p(y;x)=x^y (1-x)^{(1-y)} \\ \ln p(y;x)=\left(\ln\frac{x}{1-x}\right)y-\left(-\ln(1-x)\right)$$

      • Gaussian: $y=[y_1,y_2]^T$ with $y_1, y_2$ independent and each $\sim \mathcal{N}(x,1)$

      $$p(y;x)=\frac{1}{2\pi}\exp\left((y_1+y_2)x-x^2-\frac{y_1^2+y_2^2}{2}\right)$$
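The two factorizations above can be checked numerically. The sketch below is an illustration rather than part of the original notes; the function names are hypothetical. It evaluates each density once directly and once as $\exp(\lambda(x)t(y)-\alpha(x)+\beta(y))$. For the Gaussian pair it assumes $\beta(y)=-\frac{y_1^2+y_2^2}{2}-\ln(2\pi)$, i.e. the normalizing constant of two independent unit-variance Gaussians is absorbed into $\beta$.

```python
import math

def bernoulli_direct(y, x):
    """p(y;x) = x^y (1-x)^(1-y) for y in {0, 1}."""
    return x ** y * (1 - x) ** (1 - y)

def bernoulli_expfam(y, x):
    """Same PMF written as exp(lambda(x) * t(y) - alpha(x) + beta(y))."""
    lam = math.log(x / (1 - x))   # natural parameter lambda(x)
    t = y                         # natural statistic t(y)
    alpha = -math.log(1 - x)      # log-partition function alpha(x)
    beta = 0.0                    # base term beta(y)
    return math.exp(lam * t - alpha + beta)

def gauss2_direct(y1, y2, x):
    """Joint density of two independent N(x, 1) samples."""
    def n(y):
        return math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2 * math.pi)
    return n(y1) * n(y2)

def gauss2_expfam(y1, y2, x):
    """Same density with lambda(x) = x, t(y) = y1 + y2, alpha(x) = x^2."""
    lam = x
    t = y1 + y2
    alpha = x ** 2
    beta = -0.5 * (y1 ** 2 + y2 ** 2) - math.log(2 * math.pi)
    return math.exp(lam * t - alpha + beta)

# Compare the two forms at a few arbitrary points.
for x in (0.1, 0.5, 0.8):
    for y in (0, 1):
        print(x, y, bernoulli_direct(y, x), bernoulli_expfam(y, x))
print(gauss2_direct(0.3, -1.2, 0.7), gauss2_expfam(0.3, -1.2, 0.7))
```

Each printed pair should agree up to floating-point rounding.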

    • Geometric mean of several distributions
      $$p(y;x)=\frac{p_1^x(y)\, p_2^{(1-x)}(y)}{Z(x)} \\ \ln p(y;x)=x\ln\left(\frac{p_1(y)}{p_2(y)}\right)-\ln Z(x)+\ln p_2(y)$$

      • Example: $p_1 = \mathcal{B}\left(\frac{1}{1+e^{-1}}\right)$, $p_2 = \mathcal{B}(1/2)$
        $$p(y;x)\propto\left(\frac{1}{1+e^{-1}}\right)^{xy}\left(\frac{e^{-1}}{1+e^{-1}}\right)^{x(1-y)}(1/2)^{(1-x)} \\ \frac{p(y=1;x)}{p(y=0;x)}=e^x \ \Longrightarrow\ y\sim \mathcal{B}\left(\frac{1}{1+e^{-x}}\right)$$
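A small numerical check of this example, written as a sketch with hypothetical helper names: it forms the unnormalized weights $p_1^x(y)\,p_2^{1-x}(y)$ for $y\in\{0,1\}$, normalizes by $Z(x)$, and compares $p(y=1;x)$ with the logistic function $\frac{1}{1+e^{-x}}$.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def geometric_mean_bernoulli(x, q1, q2):
    """P(y=1) under p(y;x) proportional to p1(y)^x * p2(y)^(1-x),
    where p1 = Bernoulli(q1) and p2 = Bernoulli(q2)."""
    w1 = q1 ** x * q2 ** (1 - x)              # unnormalized weight of y = 1
    w0 = (1 - q1) ** x * (1 - q2) ** (1 - x)  # unnormalized weight of y = 0
    return w1 / (w0 + w1)                     # Z(x) = w0 + w1

q1 = sigmoid(1.0)  # p1 = B(1 / (1 + e^{-1}))
q2 = 0.5           # p2 = B(1/2)
for x in (-2.0, 0.0, 0.5, 1.0, 3.0):
    print(x, geometric_mean_bernoulli(x, q1, q2), sigmoid(x))
```

Note that $x$ need not lie in $[0,1]$ here: any real $x$ for which $Z(x)$ is finite gives a valid member of the family.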
    • Tilting
      $$p(y;x)=\frac{p(y)e^{xy}}{Z(x)} \\ \ln p(y;x)=xy - \ln Z(x) + \ln p(y)$$

      • Example: if $p(y)$ is $\mathcal{N}(0,1)$, then $p(y;x)$ is $\mathcal{N}(x,1)$
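The Gaussian tilting example can be verified numerically. The following sketch is an illustration under assumptions: it uses a plain trapezoid rule on $[-12, 12]$, which is adequate only for moderate $x$, and all names are hypothetical. It computes $Z(x)=\int p(y)e^{xy}dy$ and compares the tilted density with the $\mathcal{N}(x,1)$ density; analytically $Z(x)=e^{x^2/2}$.

```python
import math

def std_normal(y):
    """Density of N(0, 1)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)

def normal_pdf(y, mean):
    """Density of N(mean, 1)."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2 * math.pi)

def trapezoid(f, a, b, n=20000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def partition(x, lo=-12.0, hi=12.0):
    """Z(x) = integral of p(y) * exp(x*y) dy, with p = N(0, 1)."""
    return trapezoid(lambda u: std_normal(u) * math.exp(x * u), lo, hi)

def tilted_density(y, x):
    """p(y;x) = p(y) * exp(x*y) / Z(x)."""
    return std_normal(y) * math.exp(x * y) / partition(x)

x, y = 1.5, 0.7
print(partition(x), math.exp(x * x / 2))        # numerical vs analytic Z(x)
print(tilted_density(y, x), normal_pdf(y, x))   # tilted vs N(x, 1)
```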
  • Linear exponential family

    • Definition: $\lambda(x)=x$, so that $\ln p(y;x)=x\, t(y) - \alpha(x)+\beta(y)$
    • Properties: $\dot{\alpha}(x)=\mathbb{E}[t(y)]$, $\ddot{\alpha}(x)=\mathbb{E}[t^2(y)]-\mathbb{E}[t(y)]^2=\mathrm{Var}(t(y)) = J_y(x)$

    Proof
    $$\begin{aligned} Z(x) &= e^{\alpha(x)} = \int \exp\left(x\, t(y)+\beta(y)\right)dy \\ \dot{\alpha}(x)\, e^{\alpha(x)} &= \int t(y)\exp\left(x\, t(y)+\beta(y)\right)dy \\ \dot{\alpha}(x) &= \int t(y)\, p(y;x)\,dy = \mathbb{E}[t(y)] \end{aligned}$$

    $$\ddot{\alpha}(x)=\int t(y)\cdot p(y;x)\cdot (t(y)-\dot{\alpha}(x))\,dy \\ J_y(x) = \mathbb{E}\left[-\frac{\partial^2}{\partial x^2} \ln p(y;x)\right]=\ddot{\alpha}(x)$$
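To make the two properties concrete, here is a finite-difference check on one assumed example (not taken from the original notes): the Bernoulli family in natural-parameter form, $p(y;x)=\exp(xy-\alpha(x))$ with $y\in\{0,1\}$ and $\alpha(x)=\ln(1+e^x)$. The step sizes `h` are arbitrary illustrative choices.

```python
import math

def alpha(x):
    """Log-partition function ln(1 + e^x) of the natural-parameter Bernoulli family."""
    return math.log1p(math.exp(x))

def moments(x):
    """Mean and variance of t(y) = y under p(y;x) = exp(x*y - alpha(x)), y in {0, 1}."""
    p1 = math.exp(x - alpha(x))   # P(y = 1)
    return p1, p1 * (1 - p1)

def first_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

x = 0.8
mean, var = moments(x)
print(first_derivative(alpha, x), mean)   # alpha'(x)  vs E[t(y)]
print(second_derivative(alpha, x), var)   # alpha''(x) vs Var(t(y)) = J_y(x)
```

The two numbers on each line should agree to several decimal places; the residual gap is finite-difference error.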

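  • Exponential families and efficient estimators (efficient statistics)

    • Necessary condition: if an efficient estimator $\hat{x}_{eff}(y)$ exists, the distribution can be written in exponential-family form with $t(y)=\hat{x}_{eff}(y)$ and
      $$\lambda(x)=\int^x J_y(u)\,du, \ \ \ \alpha(x)=\int^x u J_y(u)\, du$$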

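    Proof
    $$\begin{aligned} \hat{x}_{eff}(y) &= x + \frac{1}{J_y(x)}\frac{\partial}{\partial x}\ln p(y;x) \\ \frac{\partial}{\partial x}\ln p(y;x) &= J_y(x)\left(\hat{x}_{eff}(y)-x\right) \\ \ln p(y;x) &= \hat{x}_{eff}(y)\int^x J_y(u)\,du - \int^x u J_y(u)\,du + \beta(y) \end{aligned}$$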

    • Sufficient condition: for a linear exponential family, if $J_y(x)$ does not depend on $x$, i.e. $J_y(x)$ equals a constant, then an efficient estimator exists

    Proof: let $J_y(x)=J$
    $$\ddot{\alpha}(x)=J, \ \ \ \dot{\alpha}(x)=Jx-c \\ \hat x_{eff}(y) = x + \frac{1}{J}\frac{\partial}{\partial x}\ln p(y;x) = x + \frac{1}{J} (t(y)-\dot{\alpha}(x)) = x + \frac{1}{J}(t(y)-Jx+c)=\frac{t(y)}{J}+\frac{c}{J}$$
    Since
    $$\left.\frac{\partial}{\partial x}\ln p(y;x)\right|_{x=\hat x_{ML}} = 0 = t(y) - \left.\dot{\alpha}(x)\right|_{x=\hat x_{ML}}$$
    we have
    $$\hat x_{eff}(y) = c/J + \frac{1}{J}\left.\dot{\alpha}(x)\right|_{x=\hat x_{ML}} = \hat x_{ML}(y)$$
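As a worked instance of the sufficient condition, consider $N$ i.i.d. samples from $\mathcal{N}(x,1)$, an example assumed here for illustration. Then $t(y)=\sum_i y_i$, $\alpha(x)=Nx^2/2$, so $J=N$ and $c=0$, and the efficient estimator is the sample mean. The sketch below computes it from the closed form and, separately, finds the ML estimate by solving the score equation $t(y)-\dot{\alpha}(x)=0$ with bisection; the bracket $[-100, 100]$ is an arbitrary assumption.

```python
def efficient_estimate(y):
    """x_eff(y) = t(y)/J + c/J for N i.i.d. N(x, 1) samples (J = N, c = 0)."""
    n = len(y)
    t = sum(y)    # natural statistic t(y)
    J = float(n)  # Fisher information, constant in x
    c = 0.0       # alpha'(x) = J*x - c with alpha(x) = N*x^2/2
    return t / J + c / J

def ml_estimate(y, lo=-100.0, hi=100.0, iters=200):
    """Solve the score equation t(y) - alpha'(x) = 0 by bisection."""
    n = len(y)
    t = sum(y)
    score = lambda x: t - n * x   # decreasing in x
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y = [0.3, 1.2, -0.5, 2.0]
print(efficient_estimate(y), ml_estimate(y))
```

Both values should coincide with the sample mean of `y`, matching $\hat x_{eff}(y)=\hat x_{ML}(y)$.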

2. Sufficient statistics

2.1 Non-Bayesian case

  • Definition: $t(y)$ is a sufficient statistic for the distribution $p_{\mathsf{y}}(\cdot;x)$ if $p(y|t(y);x)$ does not depend on $x$

Theorem 1 (likelihood characterization):

$t(y)$ is sufficient w.r.t. $p(y;x)$ $\iff$ $\frac{p_{y}(y;x)}{p_t(t(y);x)}$ does not depend on $x$, for all $x$ and $y$

Proof: omitted.

Theorem 2 (Neyman factorization theorem):

$t(y)$ is sufficient w.r.t. $p(y;x)$ $\iff$ there exist $a(\cdot,\cdot)$ and $b(\cdot)$ such that $p(y;x)=a\left(t(y),x\right) \cdot b(y)$

Proof: omitted.
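Both theorems can be illustrated on an i.i.d. Bernoulli($x$) sequence, an example chosen here and not part of the original notes, with $t(y)=\sum_i y_i$. The factorization holds with $a(t,x)=x^{t}(1-x)^{N-t}$ and $b(y)=1$, and the ratio $p_y(y;x)/p_t(t(y);x)$ equals $1/\binom{N}{t}$ regardless of $x$. The sketch below, with hypothetical function names, checks the ratio at a few values of $x$.

```python
import math

def p_y(y, x):
    """Joint PMF of an i.i.d. Bernoulli(x) sequence y."""
    k = sum(y)
    return x ** k * (1 - x) ** (len(y) - k)

def p_t(k, n, x):
    """PMF of t(y) = sum(y), which is Binomial(n, x)."""
    return math.comb(n, k) * x ** k * (1 - x) ** (n - k)

def likelihood_ratio(y, x):
    """p_y(y;x) / p_t(t(y);x); sufficiency means this is free of x."""
    return p_y(y, x) / p_t(sum(y), len(y), x)

y = (1, 0, 1, 1)
for x in (0.2, 0.5, 0.9):
    print(x, likelihood_ratio(y, x), 1 / math.comb(len(y), sum(y)))
```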

  • Minimal sufficient statistic: $t^*$ is minimal if, for every other sufficient statistic $t$, there exists a function $g(\cdot)$ such that $t^*=g(t)$
  • Complete: $t^*$ is complete if, for every function $\phi(\cdot)$, $E[\phi(t^*(y))]=0 \ \ \forall x \iff \phi(\cdot) \equiv 0$

Theorem: complete $\Longrightarrow$ minimal

Proof: suppose $t$ is complete and $s$ is minimal, so that $s=g(t)$ for some $g$, and $E[t]=E\left[E\left[t|\mathsf{s}=s\right]\right]$

$E[t|\mathsf{s}=s]=f(s)=f(g(t))=\tilde{f}(t)$, where $f$ does not depend on $x$ because $s$ is sufficient

Let $\phi(t)=t-\tilde{f}(t)$; then $E[\phi(t)] = 0$ for all $x$

By the definition of completeness, $\phi(t)\equiv0 \Longrightarrow t = \tilde{f}(t)=f(s)$

So $t$ is a function of the minimal statistic $s$, and therefore $t$ is also minimal

2.2 Bayesian case

  • Definition: $t(y)$ is a sufficient statistic for the distribution $p_{\mathsf{y,x}}(\cdot,\cdot)$ if $p_{\mathsf{y|t,x}}(y|t(y),x)=p_\mathsf{y|t}(y|t(y))$, i.e. it does not depend on $x$

Theorem (belief characterization):

$t(y)$ is sufficient w.r.t. $p(y,x)$ $\iff$ $p(x|y)=p(x|t(y))$, for all $x$ and $y$

Proof: omitted.

Theorem (Neyman factorization theorem):

$t(y)$ is sufficient w.r.t. $p(y,x)$ $\iff$ $p(y|x)=p(t(y)|x)\cdot p(y|t(y))$, for all $x$ and $y$

Proof: omitted.
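The belief characterization can be checked on a toy model. The sketch below is built on assumptions made purely for illustration: $x$ takes values on a five-point grid with a uniform prior, and $y$ is an i.i.d. Bernoulli($x$) sequence with $t(y)=\sum_i y_i$. It computes the posterior once from the full sequence and once from $t(y)$ alone.

```python
import math

def likelihood_y(y, x):
    """p(y|x): product of per-sample Bernoulli(x) probabilities."""
    lik = 1.0
    for yi in y:
        lik *= x if yi == 1 else 1.0 - x
    return lik

def likelihood_t(k, n, x):
    """p(t|x): Binomial(n, x) PMF of t(y) = sum(y)."""
    return math.comb(n, k) * x ** k * (1.0 - x) ** (n - k)

def normalize(weights):
    z = sum(weights)
    return [w / z for w in weights]

grid = [0.1, 0.3, 0.5, 0.7, 0.9]  # assumed support of x
prior = [0.2] * len(grid)         # assumed uniform prior on the grid

y = (1, 0, 1, 1)
post_y = normalize([p * likelihood_y(y, x) for x, p in zip(grid, prior)])
post_t = normalize([p * likelihood_t(sum(y), len(y), x) for x, p in zip(grid, prior)])
print(post_y)
print(post_t)
```

The two posteriors should match up to rounding, since the binomial coefficient cancels in the normalization.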

3. Conjugate priors

  • Idea: given a model $p_\mathsf{y|x}$, look for a family of priors $p_\mathsf{x}$ such that the induced posterior $p_\mathsf{x|y}$ is also in this family
  • Definition: a family of distributions $q(\cdot;\theta)$ is conjugate to a model $p_{y|x}$ if
    • $p_{y|x}(y_1,...,y_N|x) \propto q(x;\theta)$ as a function of $x$, for some $\theta$ depending on $y_1,...,y_N$
    • $q(x;\theta_1)q(x;\theta_2)\propto q(x;\theta_3)$ for some $\theta_3$ depending on $\theta_1,\theta_2$
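  • Theorem: if, for every sample size $N$, the joint model $p^N_{y|x}(\cdot)$ has a sufficient statistic whose dimension does not depend on $N$, then a conjugate prior exists for this model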
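A familiar instance, given here as an illustration and not taken from the original notes, is the Bernoulli model with a Beta prior: the sufficient statistic $(\sum_i y_i, N)$ has fixed dimension, and $q(x;\theta)\propto x^{a-1}(1-x)^{b-1}$ with $\theta=(a,b)$. The sketch below uses hypothetical helper names; it updates the hyperparameters and checks that prior times likelihood stays proportional to a member of the same family.

```python
def beta_unnorm(x, a, b):
    """Unnormalized Beta(a, b) density q(x; theta), theta = (a, b)."""
    return x ** (a - 1) * (1 - x) ** (b - 1)

def likelihood(y, x):
    """Likelihood of an i.i.d. Bernoulli(x) sequence y."""
    k = sum(y)
    return x ** k * (1 - x) ** (len(y) - k)

def update(theta, y):
    """Posterior hyperparameters after observing y under a Beta(a, b) prior."""
    a, b = theta
    k = sum(y)
    return (a + k, b + len(y) - k)

theta0 = (2, 3)   # assumed prior hyperparameters
y = [1, 0, 1, 1]
theta1 = update(theta0, y)

# prior * likelihood should be proportional to q(x; theta1): the ratio is constant in x
ratios = [beta_unnorm(x, *theta0) * likelihood(y, x) / beta_unnorm(x, *theta1)
          for x in (0.2, 0.5, 0.9)]
print(theta1, ratios)
```

Because the update only adds counts, processing the data in batches gives the same hyperparameters as processing it all at once.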
Author: Bonennult

Source: https://blog.csdn.net/weixin_41024483/article/details/104165233
