- Proceedings of the IEEE 2016
- https://ieeexplore.ieee.org/abstract/document/7352306
- A review of Bayesian optimization (BO), an optimization method typically used for tuning hyperparameters.
1 Introduction
- design, choice, high-dim, hyperparam
- IBM ILOG CPLEX
- \(x^* = \arg\max_{x\in \mathcal X}f(x)\)
- compact subset of \(\mathbb R^d\), or ...
- stochastic output \(\mathbb E[y|f(x)]=f(x)\)
- unbiased noisy point-wise observations
- data efficient, evaluations are costly
- prior, refine
- best choice? acquisition function \(\alpha_n: \mathcal X\to \mathbb R\)
- mean, confidence interval
- myopic heuristics
- uncertainty is large (exploration), or prediction is high (exploitation)
- acquisition function: easy to find the optimum, analytic?
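
The exploration/exploitation trade-off in the notes above can be made concrete with an upper-confidence-bound (UCB) style acquisition function. This is only a minimal sketch: the notes do not name a specific acquisition function here, and the names `ucb` and `next_point`, the weight `beta`, and the toy posterior numbers are all assumptions made for illustration.

```python
def ucb(mean, std, beta=2.0):
    """Acquisition score: a high predicted mean (exploitation)
    plus a bonus for large uncertainty (exploration)."""
    return mean + beta * std


def next_point(candidates, means, stds, beta=2.0):
    """Return the candidate input that maximizes the acquisition score."""
    scores = [ucb(m, s, beta) for m, s in zip(means, stds)]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]


# Toy posterior summary over three candidate inputs (made-up numbers).
candidates = [0.0, 0.5, 1.0]
means = [1.0, 0.8, 0.2]    # posterior mean at each candidate
stds = [0.05, 0.4, 0.1]    # posterior standard deviation at each candidate

x_next = next_point(candidates, means, stds)                # exploration bonus on
x_greedy = next_point(candidates, means, stds, beta=0.0)    # pure exploitation
```

With the bonus switched on, the uncertain middle candidate is chosen. With `beta=0.0` the rule is greedy and picks the candidate with the highest mean. The score is analytic in the posterior mean and standard deviation, which is what makes the acquisition function cheap to optimize compared with \(f\) itself.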
2 Bayesian Optimization with Parametric Models
- parametrized by \(w\)
- \(\mathcal D\): data
- Bayes' rule: \(p(w|\mathcal D)=p(\mathcal D|w)p(w)/p(\mathcal D)\)
- beliefs about \(w\) after observing data \(\mathcal D\)
- \(p(\mathcal D)\) intractable, but in fact a normalizing constant
- prior: conjugacy, posterior available analytically
- \(K\) drugs, independent
- to optimize \(f\), on \(K\) indices, fully parametrized
- beta, conjugacy
- Thompson sampling (TS), simplest strategy: posterior probability of optimality, estimated by Monte Carlo (MC)
- \(a_{n+1}=\arg\max_a f_{\bar w}(a)\), with \(\bar w\) a sample from the posterior
- no parameters other than those of the prior
- linear model, feature, vector, \(f_w(a)=x_a^T w\)
- \(X\): input vectors, \(y\): outputs
- nonlinear basis functions
- radial
- Fourier
- learned from data
- feature map, regardless, weights can be computed analytically
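
The \(K\)-drug setting with a Beta prior and Thompson sampling noted above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: the uniform Beta(1, 1) prior, the simulated `true_rates`, and the function names `thompson_step` and `run` are all hypothetical.

```python
import random


def thompson_step(successes, failures, rng):
    """Draw one success rate per arm from its Beta posterior and
    return the index of the arm whose draw is largest."""
    samples = [rng.betavariate(1 + s, 1 + f)   # Beta(1, 1) prior (assumed)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])


def run(true_rates, n_rounds, seed=0):
    """Simulate Thompson sampling on independent Bernoulli arms."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(n_rounds):
        a = thompson_step(successes, failures, rng)
        if rng.random() < true_rates[a]:   # simulated Bernoulli outcome
            successes[a] += 1              # conjugate update: count a success
        else:
            failures[a] += 1               # conjugate update: count a failure
    return successes, failures


# Three hypothetical drugs with made-up success rates.
successes, failures = run(true_rates=[0.2, 0.5, 0.8], n_rounds=2000)
pulls = [s + f for s, f in zip(successes, failures)]
```

Because the Beta prior is conjugate to the Bernoulli likelihood, the posterior update is just a pair of counters per arm. A single posterior draw per arm serves as the Monte Carlo estimate, so the only quantity to choose is the prior.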
3 Nonparametric Models
- start: observation variance \(\sigma^2\), zero-mean Gaussian prior with covariance \(V_0\), preserves Gaussianity
- basis functions, linear regression, symmetric positive-semidefinite, kernel
- intuitive similarity between pairs of points, rather than a feature map \(\Phi\)
- tractable, linear algebra, unnecessary to explicitly define \(\Phi\)
- GP, nonparametric model, prior mean, covariance
- \(f|X\sim \mathcal N(m,K)\)
- \(y|f, \sigma^2\sim \mathcal N(f,\sigma^2 I)\)
- posterior: use \(x\) and previous data (not "abstracted by parameters")
- kernel, structure, periodic, stationary
- Matérn kernels, parametrized; diagonal matrix of length scales
- kernel, smoothness and amplitude
- prior, possible offset, constant, expert knowledge
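
The GP posterior and the Matérn kernel from the notes above can be put together in a short NumPy sketch. Several choices here are assumptions rather than anything stated in the notes: the Matérn smoothness is fixed to 5/2, the noise variance and length scales are treated as known, the prior mean is a constant offset, and the names `matern52` and `gp_posterior` and the toy data are hypothetical.

```python
import numpy as np


def matern52(A, B, length_scales, amplitude=1.0):
    """Matern-5/2 kernel between the rows of A (n x d) and B (m x d).
    `length_scales` holds one scale per input dimension (the diagonal
    parametrization); `amplitude` scales the output."""
    A = np.asarray(A, dtype=float) / length_scales
    B = np.asarray(B, dtype=float) / length_scales
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    s5 = np.sqrt(5.0 * sq)
    return amplitude ** 2 * (1.0 + s5 + 5.0 / 3.0 * sq) * np.exp(-s5)


def gp_posterior(X, y, X_new, noise_var, length_scales, prior_mean=0.0):
    """Posterior mean and variance of f at X_new given noisy data (X, y)."""
    K = matern52(X, X, length_scales) + noise_var * np.eye(len(X))
    K_s = matern52(X, X_new, length_scales)        # n x m cross-covariance
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - prior_mean))
    mean = prior_mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(matern52(X_new, X_new, length_scales))
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, var


# Toy 1-D data (made up): three observations of sin(3x).
X = np.array([[0.0], [0.5], [1.0]])
y = np.sin(3.0 * X[:, 0])
X_new = np.array([[0.5], [5.0]])   # one observed input, one far from the data

mean, var = gp_posterior(X, y, X_new, noise_var=1e-6,
                         length_scales=np.array([0.3]))
```

At the observed input the posterior mean reproduces the data and the variance collapses. Far from the data the prediction reverts to the prior mean and the variance returns to the prior amplitude. The prediction uses the stored inputs and outputs directly through the kernel, without an explicit feature map or weight vector.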
Source: https://www.cnblogs.com/minor-second/p/15512655.html