ML: Dimensionality Reduction - Principal Component Analysis

Source: Machine Learning by Andrew Ng, Stanford University - Machine Learning | Coursera


Dimensionality Reduction - Principal Component Analysis (PCA)

notations:

$u_k$: the $k$-th principal component (the $k$-th direction of variation onto which the data is projected)

$z^{(i)}$: the projection of the $i$-th example $x^{(i)}$

$x_{approx}^{(i)}$: the approximation of $x^{(i)}$ recovered from its projection $z^{(i)}$

problem formulation:

For an $n$-dimensional input dataset, reduce it to $k$ dimensions. That is, find $k$ vectors ($u_1, u_2, \cdots, u_k$) onto which to project the data, so as to minimize the projection error:

$$ error = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{approx}^{(i)}\right\| ^ 2 $$
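
In Octave, with the $m$ examples stored as rows of a matrix X and their reconstructions as rows of X_approx, this error could be computed as follows (a minimal sketch; the variable names are illustrative):

err = sum(sum((X - X_approx) .^ 2)) / m;   % average squared projection error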

algorithm process:

1. feature scaling and mean normalization for the original dataset $x^{(i)} \in \mathbb{R}^n$
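
A minimal Octave sketch of this step (assuming the examples are stored as rows of an m-by-n matrix X; mu and X_norm are illustrative names):

mu = mean(X);                   % 1 x n row vector of per-feature means
X_norm = (X - mu) ./ std(X);    % subtract the mean, scale each feature by its standard deviation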

2. compute the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$:

$$ \Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)})(x^{(i)})^{T} $$

Sigma = X' * X / m;

3. compute the eigenvectors of the covariance matrix using:

[U, S, V] = svd(Sigma);

$$ U = \begin{bmatrix} | & | &  & | \\ u_1 & u_2 & \cdots & u_n \\ | & | &  & | \end{bmatrix} $$

4. select the first $k$ columns of matrix $U \in \mathbb{R}^{n \times n}$ as the $k$ principal components:

U_reduce = U(:, 1:k);

5. project $x^{(i)}$ into a $k$ dimensional vector $z^{(i)}$:

$$ z^{(i)} = U_{reduce}^{T}x^{(i)} $$

Z = X * U_reduce;

6. reconstruction from compressed representation:

$$ x_{approx}^{(i)} = U_{reduce}z^{(i)} $$

X_approx = Z * U_reduce';
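
Putting the six steps together, a minimal end-to-end sketch in Octave (assuming the examples are stored as rows of X and that $k$ has already been chosen; the intermediate variable names are illustrative):

m = size(X, 1);
X_norm = (X - mean(X)) ./ std(X);   % step 1: mean normalization and feature scaling
Sigma = X_norm' * X_norm / m;       % step 2: covariance matrix
[U, S, V] = svd(Sigma);             % step 3: singular value decomposition
U_reduce = U(:, 1:k);               % step 4: first k principal components
Z = X_norm * U_reduce;              % step 5: project onto k dimensions
X_approx = Z * U_reduce';           % step 6: reconstruct an approximation of X_norm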

choosing the number of principal components:

The average squared projection error is:

$$ error = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{approx}^{(i)}\right\| ^ 2 $$

The total variation of the dataset is:

$$ variation = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)}\right\|^2 $$

Typically, choose $k$ to be the smallest value so that:

$$ \frac{error}{variation} = \frac{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{approx}^{(i)}\right\| ^ 2}{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)}\right\|^2} \leq 0.01 $$

i.e., 99% of the variation is retained.

In practice, this ratio can be computed directly from the diagonal matrix $S$ returned by svd:

$$ \frac{error}{variation} = 1 - \frac{\sum_{i=1}^{k}S_{ii}}{\sum_{i=1}^{n}S_{ii}} $$

$$ S = \begin{bmatrix}S_{11} &  &  &  \\ & S_{22} &  &  \\ &  & \ddots  &  \\ &  &  & S_{nn} \\\end{bmatrix} $$

Hence, the SVD only needs to be run once; then pick the smallest $k$ such that:

$$ \frac{\sum_{i=1}^{k}S_{ii}}{\sum_{i=1}^{n}S_{ii}} \geq 0.99 $$
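
A minimal Octave sketch of this selection, reusing the diagonal matrix S returned by svd (the variable names retained and k are illustrative):

s = diag(S);                       % singular values S_11, ..., S_nn
retained = cumsum(s) / sum(s);     % fraction of variation retained for k = 1, ..., n
k = find(retained >= 0.99, 1);     % smallest k retaining at least 99% of the variation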

usages of PCA:

  • data compression: reduce the memory needed & speed up the learning algorithm
  • data visualization: reduce the data to 2D or 3D so that they can be plotted
  • improper use: using PCA to prevent overfitting (use regularization instead), since PCA discards some information without taking the labels $y$ into account

 
