ICode9

精准搜索请尝试: 精确搜索
首页 > 其他分享> 文章详细

Lecture16 Recommender Systems

2021-12-21 15:35:43  阅读:194  来源: 互联网

标签:dots Recommender matrix Tx sum 电影 Systems theta Lecture16


Lecture16 Recommender Systems

Problem formulation

Example:Predicting movie ratings

User rates movies using zero to five stars

\(n_u\) = no. users

\(n_m\) = no.movies

\(r(i,j)\) = 1 if user \(j\) has rated movie \(i\)

\(y(i,j)\) = rating given by user \(j\) to movie \(i\) (defined only if \(r(i,j) = 1\))

Content-based recommendations

Content-based recommender systems

image-20211221095345564

For each user \(j\) ,learn a parameter \(\theta^{(j)}\in\mathbb{R}^3\).Predict user \(j\) as rating movies \((\theta^{(j)})^Tx^{(i)}\)

\(\theta^{(j)}\) = parameter vector for user \(j\) (\(\theta^{(j)}\in\mathbb{R}^{n+1}\))
\(x^{(i)}\) = feature vector for movie \(i\)

For user \(j\) , movie \(i\) ,predicted rating : \((\theta^{(j)})^Tx^{(i)}\)

\(m^{(j)}\) = no. of movies rated by user \(j\)

上图样本中的电影分为两种,前三部是爱情电影,后两部为动作电影。假设电影有两个特征值,第三个电影的特征值为\(x^{(3)}=\left[\begin{matrix}1\\0.9\\0\end{matrix}\right]\),其中\(x^{(3)}_0\)为偏置,0.9表明该电影偏向爱情电影。用户1对两部爱情电影打分5,对两部动作电影打分0,通过学习得到的用户1的\(\theta^{(3)}=\left[\begin{matrix}0\\5\\0\end{matrix}\right]\)。\((\theta^{(1)})^Tx^{(3)}=5*0.99=4.95\)。所以预测用户1给第三部电影打分为4.95分

Optimization objective

To learn \(\theta^{(j)}\) (parameter for user \(j\)):

\[\underset{\theta^{(j)}}{min}\frac{1}{2}\sum_{i:r(i,j)=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})^2+\frac{\lambda}{2}\sum_{k=1}^n(\theta^{(j)}_k)^2 \]

To learn \(\theta^{(1)},\theta^{(2)},\dots,\theta^{(n_u)}\)

\[\underset{\theta^{(1)},\dots,\theta^{(n_u)}}{min}\frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:r(i,j)=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n(\theta^{(j)}_k)^2 \]

Gradient descent update:

\[\theta_k^{(j)}=\theta_k^{(j)} - \alpha\sum_{i:r(i,j)=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})x_k^{(i)}\ \ \ \ (for\ \ k=0) \]

\[\theta_k^{(j)}=\theta_k^{(j)} - \alpha(\sum_{i:r(i,j)=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})x_k^{(i)}+\lambda\theta_k^{(j)})\ \ \ \ (for\ \ k\neq0) \]

Collaborative filtering

协同过滤。该算法有一个有趣的特征,叫做特征学习,即这种算法能够自行学习所需要使用的特征。

problem motivation

image-20211221101736197

对于该数据集,我们不知道每个电影的特征值,但是知道每个用户的喜好,即图中的\(\theta\)表示。我们可以由此推测出每个电影的特征值。

\[(\theta^{(1)})^Tx^{(1)} \approx 5 \\ (\theta^{(2)})^Tx^{(1)} \approx 5 \\ (\theta^{(3)})^Tx^{(1)} \approx 0 \\ (\theta^{(4)})^Tx^{(1)} \approx 0 \]

由此可知\(x^{(1)}=\left[\begin{matrix}1\\1\\0\end{matrix}\right]\)。其中\(x_0^{(1)}\)是截距项。

Optimization algorithm

Given \(\theta^{(1)},\dots,\theta^{(n_u)}\) , to learn \(x^{(i)}\)

\[\underset{x^{(i)}}{min}\frac{1}{2}\sum_{j:r(i,j)=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})^2+\frac{\lambda}{2}\sum_{k=1}^n(x^{(i)}_k)^2 \]

Given \(\theta^{(1)},\dots,\theta^{(n_u)}\) , to learn \(x^{(1)},\dots,x^{(n_m)}\)

\[\underset{x^{(i)},\dots,x^{(n_m)}}{min}\frac{1}{2}\sum_{i=1}^{n_m}\sum_{j:r(i,j)=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n(x^{(i)}_k)^2 \]

Collaborative filtering

Given \(x^{(1)},\dots,x^{(n_m)}\) (and movies ratings),can estimate \(\theta^{(1)},\dots,\theta^{(n_m)}\)

Given \(\theta^{(1)},\dots,\theta^{(n_m)}\) (and movies ratings),can estimate \(x^{(1)},\dots,x^{(n_m)}\)

随机猜一些\(\theta\),学习出不同电影的特征,再用通过特征来获得一个更好的对参数\(\theta\)的估计,再通过\(\theta\)来获得更好的特征,不断进行迭代,最后收敛到合理的电影特征和合理的对不同用户的参数的估计。

这是基础的协同过滤算法。对于该问题,对于推荐系统来说,该算法得以实现的基础是每个用户都对数个电影进行了评价,并且每个电影都被数位用户评价过,才能重复这个迭代过程来估计 \(x\) 和 \(\theta\)。

当你运行算法时,需要观察大量的用户,观察这些用户的实际动作来协同获得更佳的每个人对电影的评分。

每个用户都对电影做出了评价,每个用户都在帮助算法学习出更合适的特征。学习出来的特征又可以被用来更好地预测其他用户的评分。这是协同的另一个意思,是说每个用户都在帮助算法更好地进行特征学习。

Collaborative filtering algorithm

Collaborative filtering optimization objective

Minimizing \(x^{(1)},\dots,x^{(n_m)}\) and \(\theta^{(1)},\dots,\theta^{(n_u)}\) simultaneously

\[J(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)})=\frac{1}{2}\sum_{(i,j):r(i,j)=1}((\theta^{(j)})^Tx^{(i)}-y^{(i,j)})^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n(x^{(i)}_k)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n(\theta^{(j)}_k)^2 \]

Collaborative filtering algorithm

  1. Initialize \(x^{(1)},\dots,x^{(n_m)}\) and \(\theta^{(1)},\dots,\theta^{(n_u)}\) to small random values

  2. Minimize \(x^{(1)},\dots,x^{(n_m)}\) and \(\theta^{(1)},\dots,\theta^{(n_u)}\) using gradient descent (or an advanced optimization algorithm) E.g. for every \(j = 1,\dots,n_u,i = 1,\dots,n_m\)

    \[x_k^{(j)}=x_k^{(j)} - \alpha(\sum_{j:r(i,j)=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})\theta_k^{(j)}+\lambda x_k^{(j)})\\ \theta_k^{(j)}=\theta_k^{(j)} - \alpha(\sum_{i:r(i,j)=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})x_k^{(i)}+\lambda\theta_k^{(j)}) \]

  3. for a user with parameters \(\theta\) and movie with (learned) features \(x\) ,predict a star rating of \(\theta^Tx\)

Vectorization:Low rank matrix factorization

Collaborative filtering

image-20211221120012101

用户对电影的评分用矩阵Y进行表示。

预测评分矩阵:

\[X\Theta^T=\left[ \begin{matrix} --(X^{(1)})^T-- \\ --(X^{(2)})^T-- \\ \vdots \\ --(X^{(n_m)})^T-- \\ \end{matrix} \right] \left[ \begin{matrix} --(\theta^{(1)})^T-- \\ --(\theta^{(2)})^T-- \\ \vdots \\ --(\theta^{(n_u)})^T-- \\ \end{matrix} \right]^T =\left[ \begin{matrix} (\theta^{(1)})^Tx^{(1)} & (\theta^{(2)})^Tx^{(1)} & \cdots & (\theta^{(n_u)})^Tx^{(1)} \\ (\theta^{(1)})^Tx^{(2)} & (\theta^{(2)})^Tx^{(2)} & \cdots & (\theta^{(n_u)})^Tx^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ (\theta^{(1)})^Tx^{(n_m)} & (\theta^{(2)})^Tx^{(n_m)} & \cdots & (\theta^{(n_u)})^Tx^{(n_m)} \end{matrix} \right] \]

For each product \(i\),we learn a feature vector \(x^{(i)} \in \mathbb{R}^n\)

How to find movies related to movie \(i\)?

small \(\left \|x^{(i)} - x^{(j)} \right \|\) means that movie \(j\) and movie \(i\) are "similar"

5 most similar movies to movie \(i\) : Find the 5 movies j with the smallest \(\left \|x^{(i)} - x^{(j)} \right \|\)

Implementational detail : Mean normalization

Users who have not rated any movies

image-20211221150835764

学习5号用户Eve的参数向量\(\theta^{(5)}\)。用户Eve没有评价过任何电影,所以只有\(\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n(\theta^{(j)}_k)^2\)这一项对损失函数有影响。为了最小化损失函数,\(\theta^{(5)}=\left[\begin{matrix}0\\0\\0\end{matrix}\right]\)。所以\((\theta^{(5)})^Tx^{(1)} = 0\),即Eve对每个电影打的分数都是0分。这种结果是存在问题的。

Mean normalization

image-20211221151333128

计算出每部电影的平均值,Y减去均值。用新的Y来训练\(\theta^{(i)},x^{(i)}\)。因为用户5仍然没有评价过任何电影,所以问题仍然存在

对于用户 \(j\) 评价电影 \(i\) 时,预测值再加上均值 \(u_i\)。即使预测值为0,得到就是电影的平均分。

标签:dots,Recommender,matrix,Tx,sum,电影,Systems,theta,Lecture16
来源: https://www.cnblogs.com/EIPsilly/p/15715481.html

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有