Learning Theory
Assumptions
- data in the training set and the test set come from the same distribution
- all samples are drawn independently
- the learning algorithm is a deterministic function of its input data, but the output parameter is a random variable (it has a sampling distribution); there is a "true parameter" \(\theta^{*}\) that is fixed but unknown, and we wish to get close to it
Parameter view of goodness of fit
- bias is how far the estimated parameter is from the true parameter
- variance is how much the estimate varies when the learning process is repeated on fresh samples
- these are essentially first-moment and second-moment properties of the sampling distribution
We want the variance to tend to 0 as the number of samples grows. If \(E[\hat{\theta}]=\theta^{*}\), the estimator is called unbiased; if \(\hat{\theta} \rightarrow \theta^{*}\) as the sample size grows, it is consistent.
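A minimal simulation sketch of these two moments (illustrative assumptions, not from the notes: Gaussian data with unit variance, and the sample mean as the "learning algorithm"):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = 2.0  # the fixed but unknown "true parameter" (assumed value)

for m in [10, 100, 1000]:
    # repeat the learning process 5000 times on fresh samples of size m;
    # the "learning algorithm" here is simply the sample mean
    estimates = rng.normal(loc=theta_star, scale=1.0, size=(5000, m)).mean(axis=1)
    bias = estimates.mean() - theta_star   # first-moment property
    variance = estimates.var()             # second-moment property
    print(f"m={m:5d}  bias={bias:+.4f}  variance={variance:.5f}")
```

The printed variance shrinks roughly like \(1/m\) while the bias stays near zero, matching the consistency we hope for.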
- \(g\): the best possible hypothesis overall
- \(h^{*}\): the best hypothesis in the class \(\mathcal{H}\)
- \(\hat{h}\): the hypothesis learned from finite data
- empirical risk
  - \(\hat{\varepsilon}(h)=\frac{1}{m} \sum_{i=1}^{m} 1\left\{h\left(x^{(i)}\right) \neq y^{(i)}\right\}\)
  - the misclassification rate on the training set
- generalization error
  - \(\varepsilon(h)=P_{(x, y) \sim \mathcal{D}}(h(x) \neq y)\)
  - the probability of making a mistake on a new example
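A small numeric sketch contrasting the two quantities for one fixed classifier (illustrative assumptions: 1-D Gaussian inputs, true labels \(y = 1\{x > 0\}\), and a slightly misplaced threshold hypothesis):

```python
import numpy as np

rng = np.random.default_rng(1)

def h(x):
    return (x > 0.3).astype(int)       # a fixed, slightly misplaced threshold

def true_label(x):
    return (x > 0.0).astype(int)       # the true decision rule

# empirical risk: misclassification rate on a finite training set
m = 50
x_train = rng.normal(size=m)
emp_risk = np.mean(h(x_train) != true_label(x_train))

# generalization error: P(h(x) != y), approximated with a huge fresh sample
x_fresh = rng.normal(size=1_000_000)
gen_err = np.mean(h(x_fresh) != true_label(x_fresh))

print(f"empirical risk (m={m}): {emp_risk:.3f}")
print(f"generalization error ~ {gen_err:.3f}")
```

With only \(m = 50\) points, the empirical risk can sit noticeably above or below the generalization error it estimates.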
- Bayes error \(\varepsilon(g)\): the irreducible mistake rate
- approximation error \(\varepsilon(h^{*}) - \varepsilon(g)\)
  - attributable to the choice of the class \(\mathcal{H}\)
- estimation error \(\varepsilon(\hat{h}) - \varepsilon(h^{*})\)
  - due to learning from only finite data
- generalization error
  - = irreducible error + approximation error + estimation error
  - = irreducible error + bias + variance (approximation error plays the role of bias, estimation error the role of variance)
Fight variance
- increase the number of samples
- regularization (may introduce some bias but can reduce variance significantly)
Fight high bias
- make the class \(\mathcal{H}\) bigger
  - reduces bias, increases variance
  - the opposite of regularization
ERM: Empirical Risk Minimizer
- \(\hat{h}=\arg \min _{h \in \mathcal{H}} \hat{\varepsilon}(h)\)
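A minimal ERM sketch over an assumed finite class of threshold classifiers \(h_t(x) = 1\{x > t\}\) (the class and the data are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = (x > 0.0).astype(int)                 # training labels from the true rule

thresholds = np.linspace(-1.0, 1.0, 21)   # finite class, |H| = k = 21

def emp_risk(t):
    # empirical risk of the hypothesis h_t(x) = 1{x > t}
    return np.mean(((x > t).astype(int)) != y)

t_hat = min(thresholds, key=emp_risk)     # hat{h} = argmin of empirical risk
print(f"ERM picks t = {t_hat:+.2f} with empirical risk {emp_risk(t_hat):.3f}")
```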
Uniform convergence
- \(\hat{\varepsilon}(h)\) vs \(\varepsilon(h)\): the relationship between empirical risk and generalization error
- \(\varepsilon(h)\) vs \(\varepsilon(h^{*})\): how far our result is from the best result in the class
- tools
  - the union bound
  - Hoeffding's inequality: \(P(|\phi-\hat{\phi}|>\gamma) \leq 2 \exp \left(-2 \gamma^{2} m\right)\), where \(\phi\) is the mean of a Bernoulli random variable and \(\hat{\phi}\) is its empirical mean over \(m\) i.i.d. draws
- Hoeffding's inequality bounds the first question above for a single fixed hypothesis; applying it together with the union bound to every hypothesis in the class is called uniform convergence
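A quick simulation check of Hoeffding's inequality under assumed values of \(\phi\), \(m\), and \(\gamma\):

```python
import numpy as np

rng = np.random.default_rng(3)
phi, m, gamma = 0.3, 200, 0.05            # assumed values
trials = 20_000

draws = rng.random((trials, m)) < phi     # trials x m Bernoulli(phi) samples
phi_hat = draws.mean(axis=1)              # one empirical mean per trial

freq = np.mean(np.abs(phi_hat - phi) > gamma)   # observed deviation frequency
bound = 2 * np.exp(-2 * gamma**2 * m)           # Hoeffding bound
print(f"P(|phi_hat - phi| > gamma) ~ {freq:.4f}  <=  {bound:.4f}")
```

The observed deviation frequency stays below the bound, which is loose but holds for any value of \(\phi\).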
Finite hypothesis class
- Assume that \(|\mathcal{H}| = k\) and let \(A_{i}\) denote the event \(\left|\varepsilon\left(h_{i}\right)-\hat{\varepsilon}\left(h_{i}\right)\right|>\gamma\). Then
- \(\begin{aligned} P\left(\exists h \in \mathcal{H} .|\varepsilon(h)-\hat{\varepsilon}(h)|>\gamma\right) &=P\left(A_{1} \cup \cdots \cup A_{k}\right) \\ & \leq \sum_{i=1}^{k} P\left(A_{i}\right) \\ & \leq \sum_{i=1}^{k} 2 \exp \left(-2 \gamma^{2} m\right) \\ &=2 k \exp \left(-2 \gamma^{2} m\right) \end{aligned}\)
- \(m\) is the size of the dataset. Set the right-hand side equal to \(\delta\), the probability that some hypothesis deviates by more than \(\gamma\); fixing any two of \(m\), \(\gamma\), \(\delta\), we can solve for the third, e.g. \(m \geq \frac{1}{2 \gamma^{2}} \log \frac{2 k}{\delta}\)
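Plugging in assumed values for \(\gamma\), \(\delta\), and \(k\), the sample-complexity formula is easy to evaluate:

```python
import math

gamma, delta, k = 0.05, 0.01, 1000   # assumed values
m_needed = math.log(2 * k / delta) / (2 * gamma**2)
print(f"need m >= {math.ceil(m_needed)}")   # 2442 for these values
```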
- compare our performance to that of \(h^{*}\), assuming uniform convergence holds (i.e. \(|\varepsilon(h)-\hat{\varepsilon}(h)| \leq \gamma\) for all \(h \in \mathcal{H}\)):
  - \(\begin{aligned} \varepsilon(\hat{h}) & \leq \hat{\varepsilon}(\hat{h})+\gamma \\ & \leq \hat{\varepsilon}\left(h^{*}\right)+\gamma \\ & \leq \varepsilon\left(h^{*}\right)+2 \gamma \end{aligned}\)
  - the first and third steps use uniform convergence; the second uses the fact that \(\hat{h}\) minimizes the empirical risk
  - \(\varepsilon(\hat{h}) \leq\left(\min _{h \in \mathcal{H}} \varepsilon(h)\right)+2 \sqrt{\frac{1}{2 m} \log \frac{2 k}{\delta}}\)
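Evaluating the same bound in the other direction (assumed \(m\), \(k\), \(\delta\); this just computes the \(2\gamma\) term above):

```python
import math

m, k, delta = 10_000, 1000, 0.01     # assumed values
gap = 2 * math.sqrt(math.log(2 * k / delta) / (2 * m))
print(f"with probability >= {1 - delta}: eps(h_hat) <= min_h eps(h) + {gap:.4f}")
```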
- for infinite hypothesis classes, use the VC dimension \(d\) to get a similar result:
  - \(\varepsilon(\hat{h}) \leq \varepsilon\left(h^{*}\right)+O\left(\sqrt{\frac{d}{m} \log \frac{m}{d}+\frac{1}{m} \log \frac{1}{\delta}}\right)\)