
Information Theory and Coding: Information Measures

2019-12-29


Information Measures

1. Independence and Markov Chains

Independence

For two random variables \(X\) and \(Y\), if for all \((x, y) \in \mathcal{X} \times \mathcal{Y}\),
\[ p(x, y) = p(x)p(y) \]
then \(X\) and \(Y\) are said to be independent, written \(X \perp Y\).

Here \(p(x), p(y), p(x, y)\) are shorthand for \(\text{Pr}(X=x), \text{Pr}(Y=y), \text{Pr}(X=x, Y=y)\), respectively.

Mutual Independence

Random variables \(X_{1}, \cdots, X_{n}\) are mutually independent if, for all \((x_1, \cdots, x_{n}) \in \mathcal{X}_{1} \times \cdots \times \mathcal{X}_{n}\),
\[ p(x_{1}, \cdots, x_{n}) = p(x_{1})\cdots p(x_{n}) \]

Pairwise Independence

Random variables \(X_{1}, \cdots, X_{n}\) are pairwise independent if \(X_{i}\) and \(X_{j}\) are independent for all \(1 \le i \lt j \le n\).

Mutual independence implies pairwise independence, but the converse does not hold; a counterexample follows.
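A standard counterexample (added here for illustration, not from the original notes): let \(X\) and \(Y\) be independent fair bits and \(Z = X \oplus Y\). Each pair is independent, yet \(p(0,0,0) = 1/4 \neq 1/8 = p(0)p(0)p(0)\). A minimal Python check:

```python
import itertools

# X, Y independent fair bits, Z = X xor Y; each of the four outcomes
# (x, y, x^y) has probability 1/4.
pxyz = {(x, y, x ^ y): 0.25 for x, y in itertools.product((0, 1), repeat=2)}

def marginal(keep):
    """Marginal distribution over the coordinates listed in `keep`."""
    out = {}
    for xyz, p in pxyz.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

singles = [marginal((i,)) for i in range(3)]

# Pairwise independence: p(a, b) = p(a) p(b) for every pair of variables.
for i, j in itertools.combinations(range(3), 2):
    pij = marginal((i, j))
    for a in (0, 1):
        for b in (0, 1):
            assert abs(pij[(a, b)] - singles[i][(a,)] * singles[j][(b,)]) < 1e-12

# Mutual independence fails: p(0,0,0) = 1/4, but p(0)p(0)p(0) = 1/8.
print(pxyz[(0, 0, 0)], singles[0][(0,)] * singles[1][(0,)] * singles[2][(0,)])
```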

Conditional Independence

For random variables \(X, Y, Z\), if
\[ p(x,y,z)p(y) = p(x,y)p(y,z) \]
then \(X\) and \(Z\) are said to be conditionally independent given \(Y\), written \(X \perp Z \mid Y\) or \(X \rightarrow Y \rightarrow Z\).
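As a sanity check, the factorization can be verified numerically. A minimal sketch, assuming an ad-hoc chain \(X \rightarrow Y \rightarrow Z\) built from invented conditional distributions:

```python
import itertools

# Construct X -> Y -> Z explicitly: p(x,y,z) = p(x) p(y|x) p(z|y).
px   = {0: 0.3, 1: 0.7}
py_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p(y|x)
pz_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}   # p(z|y)
pxyz = {(x, y, z): px[x] * py_x[x][y] * pz_y[y][z]
        for x, y, z in itertools.product((0, 1), repeat=3)}

# Marginals appearing in the definition p(x,y,z) p(y) = p(x,y) p(y,z).
py  = {y: sum(pxyz[x, y, z] for x in (0, 1) for z in (0, 1)) for y in (0, 1)}
pxy = {(x, y): sum(pxyz[x, y, z] for z in (0, 1))
       for x in (0, 1) for y in (0, 1)}
pyz = {(y, z): sum(pxyz[x, y, z] for x in (0, 1))
       for y in (0, 1) for z in (0, 1)}

for (x, y, z), p in pxyz.items():
    assert abs(p * py[y] - pxy[x, y] * pyz[y, z]) < 1e-12
print("p(x,y,z) p(y) = p(x,y) p(y,z) holds for every (x, y, z)")
```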

Markov Chain

Random variables \(X_{1}, \cdots, X_{n}\) (\(n \ge 3\)) form a Markov chain \(X_{1} \rightarrow \cdots \rightarrow X_{n}\) if:
\[ p(x_{1},\cdots,x_{n})p(x_{2})\cdots p(x_{n-1}) = p(x_{1},x_{2})\cdots p(x_{n-1},x_{n}) \]
Equivalent definitions of a Markov chain:

  1. \(p(x_{1},\cdots,x_{n})=\begin{cases}p(x_{1})p(x_{2}|x_{1})\cdots p(x_{n}|x_{n-1}) & \text{if}\ \ p(x_{1})\cdots p(x_{n-1}) > 0\\ 0 & \text{otherwise}\end{cases}\),
  2. \(p(x_{t}|x_{1},\cdots,x_{t-1})=p(x_{t}|x_{t-1})\) for all \(2 \le t \le n\)

Property:

If \(X_{1} \rightarrow \cdots \rightarrow X_{n}\) is a Markov chain, then \(X_{n} \rightarrow \cdots \rightarrow X_{1}\) is also a Markov chain.

Property: Markov Subchains

Let \(X_{1} \rightarrow \cdots \rightarrow X_{n}\) be a Markov chain and \(\mathcal{N}_{n} = \left\{1, 2, \cdots, n\right\}\). For a subset \(\alpha\) of \(\mathcal{N}_{n}\), write \(X_{\alpha}\) for \(\left\{X_{i}: i \in \alpha\right\}\). Given disjoint subsets \(\alpha_{1}, \cdots, \alpha_{m}\) of \(\mathcal{N}_{n}\) such that \(k_{1} \lt \cdots \lt k_{m}\) for all \(k_{j} \in \alpha_{j}\), \(j = 1, \cdots, m\), the sequence \(X_{\alpha_{1}}\rightarrow\cdots\rightarrow X_{\alpha_{m}}\) forms a Markov chain.

2. Shannon Information Measures

Entropy:

The entropy of a random variable \(X\) is defined as: \(\displaystyle H(X) = -\sum_{x\in \mathcal{X}}p(x)\log p(x) = \sum_{x \in \mathcal{X}}p(x)\log \frac{1}{p(x)}\)

Calling \(\displaystyle \log \frac{1}{p(X)}\) the information content of \(X\), the entropy is the expected information content: \(H(X) = E \log\frac{1}{p(X)}\)

Example: entropy of a binary random variable

If \(X \sim \text{Bernoulli}(p)\), then \(H(p) = p \log \frac{1}{p} + (1-p) \log \frac{1}{1-p}\). Viewed as a function of \(p\), \(H(p)\) attains its maximum of 1 bit at \(p = 0.5\).
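A minimal sketch of this computation in base-2 logarithms (the `entropy` helper and the sample biases are our own, not from the notes):

```python
import math

def entropy(dist):
    """H(X) = sum_x p(x) log2(1/p(x)); zero-probability symbols contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in dist if p > 0)

# Binary entropy H(p) for a few biases; the maximum is 1 bit at p = 0.5.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}:  H = {entropy([p, 1 - p]):.4f} bits")
```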

Joint entropy:

The joint entropy of random variables \(X, Y\) is defined as: \(\displaystyle H(X, Y) = -\sum_{x,y}p(x,y)\log p(x,y) = \sum_{x,y}p(x,y)\log \frac{1}{p(x,y)}\)

\(\log \frac{1}{p(X,Y)}\) is the information content of the pair \((X, Y)\).

Conditional entropy:

For random variables \(X, Y\), the conditional entropy of \(Y\) given \(X\) is defined as:
\[ \begin{align*} H(Y|X) &= \sum_{x}p(x)H(Y|X=x)\\ &= \sum_{x}p(x)\sum_{y}p(y|x)\log \frac{1}{p(y|x)}\\ &= \sum_{x,y}p(x,y)\log \frac{1}{p(y|x)}\\ &= E\log \frac{1}{p(Y|X)} \end{align*} \]

Relation between joint and conditional entropy: \(H(X,Y)=H(X)+H(Y|X) = H(Y) + H(X|Y)\)

More generally, the conditioning may mix random variables with fixed values, e.g. \(\displaystyle H(X,Y|Z,W=w,S=s,U) = \sum_{x,y,z,u}p(x,y,z,u|w,s)\log \frac{1}{p(x,y|z,w,s,u)}\)
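The decomposition \(H(X,Y) = H(X) + H(Y|X)\) is easy to confirm numerically. A sketch on an arbitrarily chosen \(2 \times 2\) joint table:

```python
import math

# An arbitrary joint distribution p(x, y) on {0, 1} x {0, 1}.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}

def H(d):
    """Entropy (bits) of a distribution given as a dict of probabilities."""
    return sum(p * math.log2(1 / p) for p in d.values() if p > 0)

# H(Y|X) = sum_{x,y} p(x,y) log2( p(x) / p(x,y) ), since p(y|x) = p(x,y)/p(x).
Hy_x = sum(p * math.log2(px[x] / p) for (x, _), p in pxy.items() if p > 0)

assert abs(H(pxy) - (H(px) + Hy_x)) < 1e-12   # H(X,Y) = H(X) + H(Y|X)
print(f"H(X,Y) = {H(pxy):.4f} = H(X) + H(Y|X) = {H(px):.4f} + {Hy_x:.4f}")
```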

Mutual information:

The mutual information between random variables \(X,Y\) is defined as: \(\displaystyle I(X;Y) = \sum_{x,y}p(x,y)\log \frac{p(x,y)}{p(x)p(y)} = E \log \frac{p(X,Y)}{p(X)p(Y)}\)

Relations between mutual information and conditional entropy:

\(H(X) = H(X|Y) + I(X;Y)\)

\(H(Y) = H(Y|X) + I(X;Y)\)

Conditional mutual information:

For random variables \(X, Y, Z\), the conditional mutual information between \(X\) and \(Y\) given \(Z\) is defined as:
\[ \begin{align*} I(X;Y|Z) &= \sum_{z}p(z)\sum_{x,y}p(x,y|z)\log\frac{p(x,y|z)}{p(x|z)p(y|z)} \\ &= \sum_{x,y,z}p(x,y,z)\log\frac{p(x,y|z)}{p(x|z)p(y|z)}\\ &= E\log\frac{p(X,Y|Z)}{p(X|Z)p(Y|Z)} \end{align*} \]
\(\displaystyle I(X;Y|Z=z,V)=\sum_{x,y,v}p(x,y,v|z)\log\frac{p(x,y|z,v)}{p(x|z,v)p(y|z,v)}\)
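For a Markov chain \(X \rightarrow Y \rightarrow Z\), the definition gives \(I(X;Z|Y) = 0\), matching the conditional-independence criterion of Section 1. A sketch reusing the ad-hoc chain from that section:

```python
import itertools, math

# The same ad-hoc chain X -> Y -> Z as in Section 1.
px   = {0: 0.3, 1: 0.7}
py_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
pz_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}
pxyz = {(x, y, z): px[x] * py_x[x][y] * pz_y[y][z]
        for x, y, z in itertools.product((0, 1), repeat=3)}

def marg(keep):
    """Marginal over the coordinates listed in `keep`."""
    m = {}
    for xyz, p in pxyz.items():
        k = tuple(xyz[i] for i in keep)
        m[k] = m.get(k, 0.0) + p
    return m

py, pxy, pyz = marg((1,)), marg((0, 1)), marg((1, 2))

# I(X;Z|Y) = sum p(x,y,z) log2( p(x,y,z) p(y) / (p(x,y) p(y,z)) ).
I_xz_y = sum(p * math.log2(p * py[(y,)] / (pxy[(x, y)] * pyz[(y, z)]))
             for (x, y, z), p in pxyz.items() if p > 0)
print(f"I(X;Z|Y) = {I_xz_y:.12f}  (exactly 0 for a Markov chain)")
```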

3. Chain Rules

\(\displaystyle H(X_{1}, \dots, X_{n})=\sum_{i=1}^{n}H(X_{i} \mid X_{1}, \dots, X_{i-1})\)

\(\displaystyle H(X_{1}, \dots, X_{n} \mid Y)=\sum_{i=1}^{n}H(X_{i} \mid X_{1}, \dots, X_{i-1},Y)\)

\(\displaystyle I(X_{1}, \dots, X_{n};Y) = \sum_{i=1}^{n}I(X_{i};Y|X_{1}, \dots, X_{i-1})\)

\(\displaystyle I(X_{1}, \dots, X_{n};Y\mid Z) = \sum_{i=1}^{n}I(X_{i};Y|X_{1}, \dots, X_{i-1}, Z)\)
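A numerical check of the first chain rule (toy weights invented for the example), using \(H(X_i \mid X_1, \dots, X_{i-1}) = H(X_1, \dots, X_i) - H(X_1, \dots, X_{i-1})\):

```python
import itertools, math

# Arbitrary positive weights, normalized into a joint distribution.
weights = (3, 1, 2, 2, 1, 4, 2, 1)
outcomes = list(itertools.product((0, 1), repeat=3))
pjoint = {xyz: w / sum(weights) for xyz, w in zip(outcomes, weights)}

def H_marg(keep):
    """Entropy (bits) of the marginal over the coordinates in `keep`."""
    m = {}
    for xyz, p in pjoint.items():
        k = tuple(xyz[i] for i in keep)
        m[k] = m.get(k, 0.0) + p
    return sum(p * math.log2(1 / p) for p in m.values() if p > 0)

h1    = H_marg((0,))                         # H(X1)
h2_1  = H_marg((0, 1)) - H_marg((0,))        # H(X2|X1)
h3_12 = H_marg((0, 1, 2)) - H_marg((0, 1))   # H(X3|X1,X2)

assert abs(H_marg((0, 1, 2)) - (h1 + h2_1 + h3_12)) < 1e-12
print(f"H(X1,X2,X3) = {H_marg((0, 1, 2)):.4f} = {h1:.4f} + {h2_1:.4f} + {h3_12:.4f}")
```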

4. Informational Divergence

Informational divergence / KL distance / relative entropy

The informational divergence between two distributions \(p\) and \(q\) on the same alphabet \(\mathcal{X}\) is defined as:
\[ D(p \parallel q) = \sum_{x \in \mathcal{X}}p(x) \log \frac{p(x)}{q(x)} = E_{p}\log \frac{p(X)}{q(X)} \]

Mutual information is itself a divergence, between the joint distribution and the product of the marginals: \(\displaystyle I(X;Y) = D(p(x,y)\parallel p(x)p(y))\)

Property

For two distributions \(p\) and \(q\) on the same alphabet \(\mathcal{X}\):
\[ \begin{align*} D(p\parallel q) &= \sum_{x \in \mathcal{X}}p(x) \log \frac{p(x)}{q(x)}\\ &= \log e \sum_{x \in \mathcal{X}}p(x) \ln \frac{p(x)}{q(x)}\\ &\ge \log e \sum_{x \in \mathcal{X}} p(x) (1 - \frac{q(x)}{p(x)})\\ &= \log e\sum_{x \in \mathcal{X}}(p(x) - q(x))\\ &= 0 \end{align*} \]
Equality holds if and only if \(p = q\).
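A sketch computing \(D(p \parallel q)\) on arbitrarily chosen distributions; note that \(D\) is not symmetric, which is one reason it is not a metric (see below). The second part evaluates \(I(X;Y)\) as \(D(p(x,y) \parallel p(x)p(y))\) on the same joint table used earlier:

```python
import math

def kl(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)), in bits; assumes q > 0 where p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p, q = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
print(f"D(p||q) = {kl(p, q):.4f},  D(q||p) = {kl(q, p):.4f}")  # both >= 0, unequal

# I(X;Y) = D( p(x,y) || p(x) p(y) ) on a small joint table.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px = {0: 0.5, 1: 0.5}
py = {0: 0.6, 1: 0.4}
I = sum(pv * math.log2(pv / (px[x] * py[y])) for (x, y), pv in pxy.items())
print(f"I(X;Y) = {I:.4f} bits")
```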

Metric

A function \(\rho(x, y)\) is a metric if for all \(x, y, z\):

  1. \(\rho(x, y) \ge 0\)
  2. \(\rho(x, y) = \rho(y, x)\)
  3. \(\rho(x, y) = 0\) if and only if \(x = y\)
  4. \(\rho(x, y) + \rho(y, z) \ge \rho(x, z)\)

Example

\(\rho(X, Y) = H(X|Y) + H(Y|X)\) satisfies conditions 1, 2, and 4; if \(X = Y\) is taken to mean that there is a one-to-one mapping between \(X\) and \(Y\), then condition 3 holds as well.

Condition 4:
\[ \begin{align*} \rho(X,Z) &= H(X|Z) + H(Z|X)\\ &= I(X;Y|Z) + H(X|Y,Z) + I(Y;Z|X) + H(Z|X,Y)\\ &\le H(Y|Z) + H(X|Y) + H(Y|X)+H(Z|Y)\\ &= H(X|Y) + H(Y|X) + H(Y|Z) + H(Z|Y)\\ &= \rho(X,Y) + \rho(Y,Z) \end{align*} \]

Basic Inequalities

Logarithm Inequality: \(\ln x \le x - 1\) for all \(x > 0\); substituting \(1/x\) for \(x\) gives the equivalent form \(\displaystyle \ln x \ge 1 - \frac{1}{x}\)

Jensen Inequality: if \(f\) is convex, \(\lambda_i \ge 0\), and \(\sum \lambda_i = 1\), then \(\displaystyle f\left(\sum \lambda_ix_i\right) \le \sum \lambda_i f(x_i)\)

Relative Inequality (divergence inequality): \(\displaystyle \sum_i p_i \log \frac{p_i}{q_i} \ge 0\), with equality if and only if \(p_i = q_i\) for all \(i\)

Log-Sum Inequality: \(\displaystyle \sum u_{i} \log \frac{u_i}{v_i} \ge \left(\sum u_{i}\right) \log \frac{\sum u_{i}}{\sum v_{i}}\), with equality if and only if \(\displaystyle \frac{u_{i}}{v_{i}}\) is constant
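A quick numerical illustration of the log-sum inequality, including the equality case \(u_i = c\,v_i\) (the numbers are arbitrary):

```python
import math

u = [1.0, 2.0, 3.0]
v = [2.0, 1.0, 1.5]

lhs = sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v))
rhs = sum(u) * math.log2(sum(u) / sum(v))
assert lhs >= rhs - 1e-12
print(f"{lhs:.4f} >= {rhs:.4f}")

# Equality when u_i / v_i is constant: take u = 2 v.
w = [2.0 * vi for vi in v]
lhs_eq = sum(wi * math.log2(wi / vi) for wi, vi in zip(w, v))
rhs_eq = sum(w) * math.log2(sum(w) / sum(v))
assert abs(lhs_eq - rhs_eq) < 1e-12
print(f"equality case: {lhs_eq:.4f} == {rhs_eq:.4f}")
```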

Some Inequalities for Information Measures

  • \(H(X) \ge 0\), with equality if and only if \(X\) is deterministic. Proof: \(H(X) = I(X;X) = D(p(x,x)\parallel p(x) p(x)) \ge 0\)

  • \(H(Y|X) \ge 0\), with equality if and only if \(Y\) is a function of \(X\). Proof: \(H(Y|X) = I(Y;Y|X) = D(p(y,y|x)\parallel p(y|x)p(y|x))\ge 0\)

  • \(I(X;Y) \ge 0\), with equality if and only if \(X\) and \(Y\) are independent
  • \(I(X;Y|Z) \ge 0\), with equality if and only if \(X\) and \(Y\) are conditionally independent given \(Z\)

Theorem:

\(H(Y|X) \le H(Y)\), with equality if and only if \(X\) and \(Y\) are independent. Proof: \(H(Y) = H(Y|X) + I(X;Y) \ge H(Y|X)\)

Theorem:

\(\displaystyle H(X_1, X_2, \dots, X_n) \le \sum_{i=1}^{n} H(X_i)\), with equality if and only if the \(X_i\) are mutually independent. Proof: \(\displaystyle H(X_1, \dots, X_n) = \sum_{i=1}^{n}H(X_{i}|X_{1}, \dots, X_{i-1}) \le \sum_{i=1}^{n}H(X_i)\)

Theorem:

\(I(X;Y,Z) \ge I(X;Y)\), with equality if and only if \(X \rightarrow Y \rightarrow Z\) forms a Markov chain. Proof: \(I(X;Y,Z) = I(X;Y) + I(X;Z|Y) \ge I(X;Y)\)

Theorem (Data Processing Inequality):

If \(U \rightarrow X \rightarrow Y \rightarrow V\) forms a Markov chain, then \(I(X;Y) \ge I(U;V)\). Proof: since \(U\rightarrow X\rightarrow Y\) is a Markov chain, \(I(U;Y|X) = 0\), so \(I(X;Y) = I(X;Y) + I(U;Y|X) = I(U,X;Y)=I(U;Y)+I(X;Y|U) \ge I(U;Y)\); by the same argument, \(I(U;Y) \ge I(U;V)\).

Theorem:

For a random variable \(X\), the entropy is maximized by the uniform distribution, i.e. \(H(X) \le \log \left|\mathcal{X}\right|\). Proof: let \(u(x)\) be the uniform distribution on \(\mathcal{X}\); then \(D(p(x)\parallel u(x)) = \log\left|\mathcal{X}\right| - H(X) \ge 0\).

Fano's Inequality

Let \(X\) be a random variable and \(\hat{X}\) an estimate of \(X\) (\(X, \hat{X} \in \mathcal{X}\)), and let \(P_e = \text{Pr}(X \neq \hat{X})\) be the probability of error. Then:
\[ H(X\mid \hat{X}) \le h_b(P_e) + P_e \log (\left|\mathcal{X}\right|-1) \]
where \(h_b(\cdot)\) is the binary entropy function from the Bernoulli example above.

Proof: define the indicator \(Y = \mathbf{1}\left\{X \neq \hat{X}\right\}\); then \(\text{Pr}(Y=1) = P_e\), \(\text{Pr}(Y=0) = 1 - P_e\), and \(H(Y) = h_{b}(P_e)\).
\[ \begin{align*} H(X|\hat{X}) &= H(X|\hat{X}) + H(Y|X,\hat{X})\\ &= H(X,Y|\hat{X})\\ &= H(Y|\hat{X})+H(X|Y,\hat{X})\\ &=H(Y|\hat{X}) + \text{Pr}(Y=1)H(X|Y=1,\hat{X})\\ &\le H(Y) + \text{Pr}(Y=1)\sum_{\hat{x} \in \mathcal{X}}\text{Pr}(\hat{X}=\hat{x}\mid Y=1)H(X|Y=1,\hat{X}=\hat{x})\\ &\le H(Y) + \text{Pr}(Y=1)\sum_{\hat{x} \in \mathcal{X}}\text{Pr}(\hat{X}=\hat{x}\mid Y=1)\log (\left|\mathcal{X}\right|-1)\\ &= h_b(P_e) + P_e\log (\left|\mathcal{X}\right|-1)\\ \end{align*} \]
Here \(H(Y|X,\hat{X}) = 0\) because \(Y\) is a function of \((X, \hat{X})\); \(H(X|Y=0,\hat{X}) = 0\) because \(Y=0\) forces \(X = \hat{X}\); and given \(Y=1\) and \(\hat{X}=\hat{x}\), \(X\) takes at most \(\left|\mathcal{X}\right|-1\) values, so its entropy is at most \(\log(\left|\mathcal{X}\right|-1)\).
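A numerical check of the bound on a three-symbol alphabet (the joint distribution of \((X, \hat{X})\) below is invented for the example):

```python
import math

# Ad-hoc joint distribution of (X, Xhat) over {0, 1, 2}.
pjoint = {(0, 0): 0.30, (1, 1): 0.30, (2, 2): 0.20,
          (0, 1): 0.05, (1, 2): 0.05, (2, 0): 0.10}

def hb(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

Pe = sum(p for (x, xh), p in pjoint.items() if x != xh)   # Pr(X != Xhat)
pxh = {xh: sum(p for (_, b), p in pjoint.items() if b == xh) for xh in (0, 1, 2)}

# H(X|Xhat) = sum p(x, xhat) log2( p(xhat) / p(x, xhat) ).
H_x_xh = sum(p * math.log2(pxh[xh] / p) for (x, xh), p in pjoint.items() if p > 0)

bound = hb(Pe) + Pe * math.log2(3 - 1)   # h_b(Pe) + Pe log(|X| - 1)
assert H_x_xh <= bound + 1e-12
print(f"H(X|Xhat) = {H_x_xh:.4f} <= {bound:.4f}")
```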

Entropy Rate of a Stationary Source

Discrete-time information source: \(\left\{X_{k}: k \ge 1\right\}\)

Entropy rate: the entropy rate of \(\left\{X_{k}\right\}\) is defined as \(H_X=\displaystyle \lim_{n\rightarrow \infty}\frac{1}{n}H(X_1, X_2, \cdots, X_{n})\), provided the limit exists.

Example:

If \(\left\{X_{k}\right\}\) is an \(\text{i.i.d.}\) source and \(X\) denotes the common single-letter random variable, then:
\[ \lim_{n\rightarrow \infty}\frac{1}{n}H(X_1, \cdots, X_{n}) = \lim_{n\rightarrow \infty}\frac{n\cdot H(X)}{n} = H(X) \]
The limit exists, and the entropy rate is \(H(X)\).

Example:

If \(\left\{X_{k}\right\}\) is a source whose symbols \(X_k\) are mutually independent with \(H(X_{k}) = k\), then:
\[ \lim_{n\rightarrow \infty}\frac{1}{n}H(X_1, \cdots, X_{n}) = \lim_{n\rightarrow \infty}\frac{n+1}{2} \]
The limit diverges, so the entropy rate does not exist.

Stationary information source: a source \(\left\{X_{k}\right\}\) is stationary if, for all \(m, l \ge 1\), \(X_1, X_2, \cdots, X_m\) and \(X_{1+l}, X_{2+l}, \cdots, X_{m+l}\) have the same joint distribution.

Definition: \(\displaystyle H'_X = \lim_{n\rightarrow \infty}H(X_n|X_1, X_2, \dots, X_{n-1})\)

Theorem: for a stationary source \(\left\{X_k\right\}\), the entropy rate \(H_X\) exists and \(H_X = H'_{X}\)

Proof: \(H(X_n|X_1, X_2, \dots, X_{n-1}) \le H(X_n|X_2, \dots, X_{n-1})=H(X_{n-1}|X_1, X_2, \dots, X_{n-2})\), where the inequality holds because conditioning reduces entropy and the equality holds by stationarity. Let \(a_n = H(X_n|X_1, X_2, \dots, X_{n-1})\); the sequence is non-increasing and bounded below by \(0\), so its limit exists.

\(\displaystyle H'_{X} = \lim_{n \rightarrow \infty}a_{n} = \lim_{n\rightarrow \infty}\frac{\sum_{i=1}^{n}a_i}{n}=\lim_{n\rightarrow \infty} \frac{1}{n}\sum_{i=1}^{n}H(X_i|X_1, X_2, \dots, X_{i-1}) = \lim_{n\rightarrow \infty}\frac{1}{n}H(X_1, \dots, X_n) = H_{X}\), where the second equality is the Cesàro mean theorem (if \(a_n \rightarrow a\), then \(\frac{1}{n}\sum_{i=1}^{n}a_i \rightarrow a\)) and the last equality is the chain rule for entropy.
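As a concrete illustration (a sketch with an ad-hoc two-state transition matrix): for a stationary Markov source started in its stationary distribution \(\pi\), \(H(X_n|X_1,\dots,X_{n-1}) = H(X_n|X_{n-1})\) by the Markov property, so \(H'_X = \sum_i \pi_i H(P_{i\cdot})\), and by the chain rule \(\frac{1}{n}H(X_1,\dots,X_n) = \frac{H(\pi)+(n-1)H'_X}{n} \rightarrow H'_X\):

```python
import math

# Ad-hoc transition matrix P of a two-state stationary Markov source.
P = [[0.9, 0.1],
     [0.4, 0.6]]

# Stationary distribution (closed form for two states): pi P = pi.
pi = [P[1][0] / (P[0][1] + P[1][0]), P[0][1] / (P[0][1] + P[1][0])]

def H(dist):
    """Entropy (bits) of a probability vector."""
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

# H'_X = sum_i pi_i H(i-th row of P): the per-symbol conditional entropy.
H_rate = sum(pi_i * H(row) for pi_i, row in zip(pi, P))

# Chain rule: (1/n) H(X_1..X_n) = (H(pi) + (n-1) H'_X) / n  ->  H'_X.
for n in (1, 2, 10, 100, 10_000):
    print(f"n = {n:>6}:  (1/n) H(X_1..X_n) = {(H(pi) + (n - 1) * H_rate) / n:.6f}")
print(f"entropy rate H_X = {H_rate:.6f} bits/symbol")
```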

Source: https://www.cnblogs.com/hitgxz/p/12116121.html
