TRMF companion note: reproducing TRMF with least squares

2021-11-09 23:33:51



1 Objective function (overall)

Paper notes: Temporal Regularized Matrix Factorization for High-dimensional Time Series Prediction (UQI-LIUWJ's blog, CSDN)

\min_{W,X,\Theta} \frac{1}{2}\sum_{(i,t) \in \Omega} (y_{it}-w_ix_t^T)^2+\lambda_w R_w(W)+\lambda_x R_{AR}(X|\mathcal{L},\Theta,\eta)+\lambda_\theta R_\theta(\Theta)

1.1 Solving for W

Keeping only the terms that involve W:

\min_{W} \frac{1}{2}\sum_{(i,t) \in \Omega} (y_{it}-w_ix_t^T)^2+\lambda_w R_w(W)

\min_{w_i} \frac{1}{2} [(y_{i1}-w_ix_1^T)^2+\dots+(y_{in}-w_ix_n^T)^2]+ \frac{1}{2}\lambda_w w_iw_i^T

Then take the derivative with respect to w_i:

Linear algebra notes: derivatives of scalars, vectors, and matrices (UQI-LIUWJ's blog, CSDN)

\frac{1}{2} [-2(y_{i1}-w_ix_1^T)x_1-\dots-2(y_{in}-w_ix_n^T)x_n]+ \frac{1}{2}\lambda_w \cdot 2 I w_i=0

-(y_{i1}x_1+\dots+y_{in}x_n)+ [{(w_ix_1^T)x_1}+\dots+ (w_ix_n^T)x_n]+ \lambda_w I w_i=0

Since y_{it} is a scalar, it makes no difference whether it sits on the left or the right of x_t.

So

[{w_ix_1^Tx_1}+\dots+ w_ix_n^Tx_n]+ \lambda_w I w_i=(y_{i1}x_1+\dots+y_{in}x_n)

That is:

(\sum_{(i,t)\in \Omega}x_t^Tx_t) w_i+ \lambda_w I w_i=\sum_{(i,t)\in \Omega}y_{it}x_t

w_i=[(\sum_{(i,t)\in \Omega}x_t^Tx_t) + \lambda_w I]^{-1}\sum_{(i,t)\in \Omega}y_{it}x_t

The corresponding code is as follows (assuming sparse_mat is the observation matrix):

import numpy as np
from numpy.linalg import inv

for i in range(dim1):
    # compute each row of W separately
    pos0 = np.where(sparse_mat[i, :] != 0)
    # pos0[0] has length num_obs: the number of observed entries in row i

    Xt = X[pos0[0], :]
    # [num_obs, rank]

    vec0 = sparse_mat[i, pos0[0]] @ Xt
    # sparse_mat[i, pos0[0]] is a 1-D vector, so
    # sparse_mat[i, pos0[0]] @ Xt and sparse_mat[i, pos0[0]].T @ Xt mean the same thing;
    # both return a 1-D vector of shape [rank]

    mat0 = inv(Xt.T @ Xt + np.eye(rank))
    # [rank, rank]; lambda_w is taken to be 1 here,
    # otherwise use lambda_w * np.eye(rank)

    W[i, :] = mat0 @ vec0

Where:

\sum_{(i,t)\in \Omega}y_{it}x_t
vec0 = sparse_mat[i, pos0[0]] @ Xt

[(\sum_{(i,t)\in \Omega}x_t^Tx_t) + \lambda_w I]^{-1}
mat0 = inv(Xt.T @ Xt + np.eye(rank))

1.2 Solving for X

Keeping only the terms that involve X:

\min_{X} \frac{1}{2}\sum_{(i,t) \in \Omega} (y_{it}-w_ix_t^T)^2+\lambda_x R_{AR}(X|\mathcal{L},\Theta,\eta)

\lambda_x R_{AR}(X|\mathcal{L},\Theta,\eta) =\frac{1}{2}\lambda_x \sum_{t=l_d+1}^f[(x_t-\sum_{l \in L}\theta_l \divideontimes x_{t-l})(x_t-\sum_{l \in L}\theta_l \divideontimes x_{t-l})^T]+\frac{1}{2}\lambda_x \eta \sum_{t=1}^f x_t x_t^T

\divideontimes denotes the elementwise product (for two vectors a and b, a \divideontimes b can be written as diag(a)\,b).
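As a quick sanity check of this identity (a minimal NumPy sketch; the array values are made up):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# a ⊛ b as elementwise product vs. multiplication by diag(a)
elementwise = a * b            # array([ 4., 10., 18.])
via_diag = np.diag(a) @ b      # same result
print(np.allclose(elementwise, via_diag))  # True
```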

When t = 1..l_d, the R_{AR} sum does not involve x_t, so we update X in almost the same way as W above: \min_{X} \frac{1}{2}\sum_{(i,t) \in \Omega} (y_{it}-w_ix_t^T)^2+\frac{1}{2}\lambda_x \eta \sum_t x_t x_t^T

By the same reasoning as for W, the update for X is:

\small x_t=[(\sum_{(i,t)\in \Omega}w_i^Tw_i) + \lambda_x \eta I]^{-1}\sum_{(i,t)\in \Omega}y_{it}w_i
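This update can be sketched in the same style as the W code above; the sizes, the lag bound ld, and the weights lambda_x and eta below are made-up assumptions, not values from the original code:

```python
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(0)
dim1, dim2, rank = 5, 8, 3               # made-up sizes
ld = 2                                   # assumed maximum lag l_d
lambda_x, eta = 1.0, 0.1                 # assumed regularization weights

sparse_mat = rng.random((dim1, dim2)) * (rng.random((dim1, dim2)) > 0.3)
W = rng.standard_normal((dim1, rank))
X = np.zeros((dim2, rank))

# for t = 1..l_d the update has the same ridge-regression form as the W update,
# with lambda_x * eta as the regularization weight
for t in range(ld):
    pos0 = np.where(sparse_mat[:, t] != 0)
    Wt = W[pos0[0], :]                   # [num_obs, rank]
    vec0 = sparse_mat[pos0[0], t] @ Wt   # sum over observed i of y_it * w_i
    mat0 = inv(Wt.T @ Wt + lambda_x * eta * np.eye(rank))
    X[t, :] = mat0 @ vec0
```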

When t ≥ l_d+1, however, R_{AR} has to be taken into account.

For an arbitrary x_t (call it x_o), which R_{AR} terms does it appear in?

First, there is \frac{1}{2}\lambda_x[(x_o-\sum_{l \in L}\theta_l \divideontimes x_{o-l})(x_o-\sum_{l \in L}\theta_l \divideontimes x_{o-l})^T]

=\frac{1}{2}\lambda_x[(x_o-\sum_{l \in L}\theta_l \divideontimes x_{o-l})(x_o^T-(\sum_{l \in L}\theta_l \divideontimes x_{o-l})^T)]

=\frac{1}{2}\lambda_x[x_ox_o^T-(\sum_{l \in L}\theta_l \divideontimes x_{o-l})x_o^T-x_o(\sum_{l \in L}\theta_l \divideontimes x_{o-l})^T+(\sum_{l \in L}\theta_l \divideontimes x_{o-l})(\sum_{l \in L}\theta_l \divideontimes x_{o-l})^T]

Differentiating with respect to x_o gives:

\small =\frac{1}{2}\lambda_x[2x_o-2\sum_{l \in L}\theta_l \divideontimes x_{o-l}]

Second, for every l \in L with o+l \le f, there is the term \frac{1}{2}\lambda_x[(x_{o+l}-\sum_{l' \in L}\theta_{l'} \divideontimes x_{o+l-l'})(x_{o+l}-\sum_{l' \in L}\theta_{l'} \divideontimes x_{o+l-l'})^T]

For each such l, the relevant part is the term involving x_o, so for each l we can write it as

\frac{1}{2}\lambda_x[(x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'}-\theta_{l} \divideontimes x_{o})(x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'}-\theta_{l} \divideontimes x_{o})^T]

=\frac{1}{2}\lambda_x[(x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'})(x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'})^T -\theta_{l} \divideontimes x_{o}(x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'})^T -(x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'})(\theta_{l} \divideontimes x_{o})^T +\theta_{l} \divideontimes x_{o}(\theta_{l} \divideontimes x_{o})^T]

Differentiating with respect to x_o gives =\frac{1}{2}\lambda_x [-2\theta_{l} \divideontimes (x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'})+2(\theta_{l} ^2)\divideontimes x_o]

Summing over l, this can be written as \sum_{l \in L,\, o+l\le f}\frac{1}{2}\lambda_x [-2\theta_{l} \divideontimes (x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'})+2(\theta_{l} ^2)\divideontimes x_o]

Putting the pieces together:

\sum_{l \in L,\, o+l\le f}\frac{1}{2}\lambda_x [-2\theta_{l} \divideontimes (x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'})+2(\theta_{l} \divideontimes \theta_{l} ) \divideontimes x_o]

+\frac{1}{2}\lambda_x[2x_o-2\sum_{l \in L}\theta_l \divideontimes x_{o-l}]

\small +[(\sum_{(i,o)\in \Omega}w_i^Tw_i) + \lambda_x \eta I]x_o-\sum_{(i,o)\in \Omega}y_{io}w_i

=0

\sum_{l \in L,\, o+l\le f}\lambda_x (\theta_{l} \divideontimes \theta_{l} ) \divideontimes x_o

\small +\lambda_x \eta I x_o

\small +[(\sum_{(i,o)\in \Omega}w_i^Tw_i) + \lambda_x I]x_o

=

\sum_{l \in L,\, o+l\le f}\lambda_x [\theta_{l} \divideontimes (x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'})]

+\lambda_x\sum_{l \in L}\theta_l \divideontimes x_{o-l}

+\small \sum_{(i,o)\in \Omega}y_{io}w_i

So the update formula for x_o (o ≥ l_d+1) is:

x_o=[(\sum_{(i,o)\in \Omega}w_i^Tw_i) + \lambda_x I+\lambda_x \eta I +\lambda_x\sum_{l \in L,\, o+l\le f} diag(\theta_{l} \divideontimes \theta_{l} )]^{-1}[\sum_{(i,o)\in \Omega}y_{io}w_i+\lambda_x\sum_{l \in L}\theta_l \divideontimes x_{o-l}+\lambda_x\sum_{l \in L,\, o+l\le f}\theta_{l} \divideontimes (x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'})]

3 Updating Θ

\min_{W,X,\Theta} \frac{1}{2}\sum_{(i,t) \in \Omega} (y_{it}-w_ix_t^T)^2+\lambda_w R_w(W)+\lambda_x R_{AR}(X|\mathcal{L},\Theta,\eta)+\lambda_\theta R_\theta(\Theta)

Keeping only the terms that involve Θ (specifically θ_k):

\small \frac{1}{2}\lambda_x \sum_{t=l_d+1}^f[(x_t-\sum_{l \in L}\theta_l \divideontimes x_{t-l})(x_t-\sum_{l \in L}\theta_l \divideontimes x_{t-l})^T]+\lambda_\theta R_\theta(\Theta)

=\frac{1}{2}\lambda_x \sum_{t=l_d+1}^f[(x_t-\sum_{l \in L-\{k\}}\theta_l \divideontimes x_{t-l}-\theta_k \divideontimes x_{t-k})(x_t-\sum_{l \in L-\{k\}}\theta_l \divideontimes x_{t-l}-\theta_k \divideontimes x_{t-k})^T]+\lambda_\theta R_\theta(\Theta)

=\frac{1}{2}\lambda_x \sum_{t=l_d+1}^f[(x_t-\sum_{l \in L-\{k\}}\theta_l \divideontimes x_{t-l})(x_t-\sum_{l \in L-\{k\}}\theta_l \divideontimes x_{t-l})^T

-(x_t-\sum_{l \in L-\{k\}}\theta_l \divideontimes x_{t-l})(\theta_k \divideontimes x_{t-k})^T

-(\theta_k \divideontimes x_{t-k})(x_t-\sum_{l \in L-\{k\}}\theta_l \divideontimes x_{t-l})^T

+(\theta_k \divideontimes x_{t-k})(\theta_k \divideontimes x_{t-k})^T]

+\lambda_\theta R_\theta(\Theta)

Differentiating with respect to θ_k:

\small \frac{1}{2}\lambda_x \sum_{t=l_d+1}^f[-2(x_t-\sum_{l \in L-\{k\}}\theta_l \divideontimes x_{t-l})\divideontimes x_{t-k}+2 diag (x_{t-k} \divideontimes x_{t-k})\theta_k]+\lambda_\theta I \theta_k=0

\theta_k=[\lambda_\theta I + \lambda_x\sum_{t=l_d+1}^f diag (x_{t-k} \divideontimes x_{t-k})]^{-1} \lambda_x\sum_{t=l_d+1}^f[(x_t-\sum_{l \in L-\{k\}}\theta_l \divideontimes x_{t-l})\divideontimes x_{t-k}]
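A minimal sketch of this θ update under the same made-up assumptions (one coefficient vector theta[k] per lag lags[k]):

```python
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(2)
T, rank = 10, 3                          # made-up sizes
lags = np.array([1, 2])                  # assumed lag set L
ld = lags.max()
lambda_x, lambda_theta = 1.0, 0.5        # assumed regularization weights

X = rng.standard_normal((T, rank))
theta = rng.standard_normal((len(lags), rank))   # theta[k] is theta_l for l = lags[k]

for k, lk in enumerate(lags):
    mat = lambda_theta * np.eye(rank)
    vec = np.zeros(rank)
    for t in range(ld, T):
        # AR residual at time t, excluding the lag lk currently being updated
        resid = X[t, :] - sum(theta[k2] * X[t - l2, :]
                              for k2, l2 in enumerate(lags) if k2 != k)
        mat = mat + lambda_x * np.diag(X[t - lk, :] ** 2)
        vec = vec + lambda_x * resid * X[t - lk, :]
    theta[k] = inv(mat) @ vec
```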

4 Summary

w_i=[(\sum_{(i,t)\in \Omega}x_t^Tx_t) + \lambda_w I]^{-1}\sum_{(i,t)\in \Omega}y_{it}x_t

X:

For t ∈ 1..l_d: x_t=[(\sum_{(i,t)\in \Omega}w_i^Tw_i) + \lambda_x \eta I]^{-1}\sum_{(i,t)\in \Omega}y_{it}w_i

 

For t ≥ l_d+1: x_o=[(\sum_{(i,o)\in \Omega}w_i^Tw_i) + \lambda_x I+\lambda_x \eta I +\lambda_x\sum_{l \in L,\, o+l\le f} diag(\theta_{l} \divideontimes \theta_{l} )]^{-1}[\sum_{(i,o)\in \Omega}y_{io}w_i+\lambda_x\sum_{l \in L}\theta_l \divideontimes x_{o-l}+\lambda_x\sum_{l \in L,\, o+l\le f}\theta_{l} \divideontimes (x_{o+l}-\sum_{l' \in L-\{l\}}\theta_{l'} \divideontimes x_{o+l-l'})]

\theta_k=[\lambda_\theta I + \lambda_x\sum_{t=l_d+1}^f diag (x_{t-k} \divideontimes x_{t-k})]^{-1} \lambda_x\sum_{t=l_d+1}^f[(x_t-\sum_{l \in L-\{k\}}\theta_l \divideontimes x_{t-l})\divideontimes x_{t-k}]

 

Source: https://blog.csdn.net/qq_40206371/article/details/121215548
