
Deep Learning Week 9 Notes

1. Looking at parameters

Hidden units of a perceptron

A one-hidden-layer fully connected network \(\mathbb{R}^2\rightarrow \mathbb{R}^2\):

nb_hidden = 20

model = nn.Sequential(
        nn.Linear(2, nb_hidden),
        nn.ReLU(),
        nn.Linear(nb_hidden, 2)
        )

Visit the parameters \((w, b)\) of each hidden unit:

for k in range(model[0].weight.size(0)):
    w = model[0].weight[k]
    b = model[0].bias[k]
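
Each hidden unit \(k\) defines a half-plane \(\{x : w\cdot x + b > 0\}\), so a quick way to look at them is to plot the lines \(w\cdot x + b = 0\). A minimal sketch (matplotlib and the \([-1,1]^2\) plotting range are assumptions, not part of the course code):

import torch
import matplotlib.pyplot as plt

with torch.no_grad():
    for k in range(model[0].weight.size(0)):
        w = model[0].weight[k]
        b = model[0].bias[k]
        # Line w[0]*x1 + w[1]*x2 + b = 0, parametrized by x1
        x1 = torch.linspace(-1, 1, 100)
        x2 = -(w[0] * x1 + b) / w[1]
        plt.plot(x1.numpy(), x2.numpy())

plt.xlim(-1, 1); plt.ylim(-1, 1)
plt.show()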

2. Looking at activations

Given data points in high dimension:

\[\mathcal{D} = \{ x_n\in\mathbb{R}^D,n=1,...,N \} \]

the objective of the data visualization is to find a set of corresponding low-dimension points

\[\mathcal{Y} = \{y_n\in\mathbb{R}^C,n=1,...,N \} \]

\(\large\text{t-Distributed Stochastic Neighbor Embedding (t-SNE):}\) optimizes the \(y_i\)s with SGD so that the distributions of distances to the close neighbors of each point are preserved.

It actually matches, in the \(D_{KL}\) sense, two distance-dependent distributions: a \(\textbf{Gaussian}\) in the original space, and a \(\textbf{Student t-distribution}\) in the low-dimension one.
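
For reference, the standard t-SNE quantities are (notation from the original paper, not from these notes): a Gaussian similarity \(p_{j\mid i}\) in the original space, a Student-t similarity \(q_{ij}\) in the embedding, and the KL divergence between them as the loss minimized over the \(y_i\)s:

\[p_{j\mid i} = \frac{\exp(-\|x_i-x_j\|^2/2\sigma_i^2)}{\sum_{k\neq i}\exp(-\|x_i-x_k\|^2/2\sigma_i^2)},\quad q_{ij} = \frac{(1+\|y_i-y_j\|^2)^{-1}}{\sum_{k\neq l}(1+\|y_k-y_l\|^2)^{-1}},\quad \mathcal{L} = \sum_{i\neq j} p_{ij}\log\frac{p_{ij}}{q_{ij}} \]

where \(p_{ij} = (p_{j\mid i}+p_{i\mid j})/(2N)\) and each \(\sigma_i\) is set from the perplexity.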

\(\text{Code:}\)

import torch
from sklearn.manifold import TSNE

# x is the array of the original high-dimension points
x_np = x.numpy()
y_np = TSNE(n_components = 2, perplexity = 50).fit_transform(x_np)
# y is the array of corresponding low-dimension points
y = torch.from_numpy(y_np)

n_components specifies the embedding dimension and perplexity controls how many points are considered neighbors of each point.
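
The embedding is typically inspected as a 2D scatter plot. A minimal sketch, where labels is a hypothetical tensor of per-point class labels used only for coloring:

import matplotlib.pyplot as plt

# y is the N x 2 tensor of embedded points computed above
plt.scatter(y[:, 0], y[:, 1], c = labels, s = 5, cmap = 'tab10')
plt.show()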

3. Visualizing the processing in the input

Saliency maps

A simple approach is to compute the gradient of an output \(f_c\) with respect to the input:

\[\nabla_{|x}f_c(x;w) \]

\(\text{Code}\)

input.requires_grad_()
output = model(input)
# Gradient of the logit of class c w.r.t. the input
grad_input, = torch.autograd.grad(output[0, c], input)
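
To display the result, a common post-processing (an assumption here, not part of the snippet above) is to take the absolute value of the gradient and reduce it over the color channels:

# grad_input has the same size as the input, e.g. 1 x 3 x H x W
saliency = grad_input.abs().sum(1)   # 1 x H x W grey-scale map
saliency = saliency / saliency.max() # rescale to [0, 1] for display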

Smilkov et al. (2017) proposed to smooth the gradient with respect to the input image by averaging over slightly perturbed versions of the latter.

\[\tilde{\nabla}_{\mid x} f_{y}(x ; w)=\frac{1}{N} \sum_{n=1}^{N} \nabla_{\mid x} f_{y}\left(x+\epsilon_{n} ; w\right) \]

where \(\epsilon_{1}, \ldots, \epsilon_{N}\) are i.i.d. with distribution \(\mathcal{N}\left(0, \sigma^{2} I\right)\), and \(\sigma\) is a fraction of the gap \(\Delta\) between the maximum and the minimum of the pixel values.

\(\text{Code}\)

std = std_fraction * (img.max() - img.min())
acc_grad = img.new_zeros(img.size())

for q in range(nb_smooth): # This should be done with mini-batches ...
    noisy_input = img + img.new(img.size()).normal_(0, std)
    noisy_input.requires_grad_()
    output = model(noisy_input)
    
    grad_input, = torch.autograd.grad(output[0, c], noisy_input)
    acc_grad += grad_input

acc_grad = acc_grad.abs().sum(1) # sum across channels
  • std_fraction is typically between \(0.1\) and \(0.25\).
  • Remember that the new_* methods create tensors with the same dtype and device as the source tensor.
  • .sum(1) sums across RGB channels, so we go from a tensor of size \(1 × 3 × 224 × 224\) to a tensor of size \(1 × 224 × 224\), which can be represented as a gray-scale image. Here, the \(1\) is for a mini-batch of one sample.
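
To look at the smoothed map, the accumulated gradient acc_grad can be rescaled and saved as a grey-scale image; a minimal sketch (the file name is just an example):

from torchvision import utils

# acc_grad has size 1 x H x W after the channel sum
utils.save_image(acc_grad / acc_grad.max(), 'smoothgrad.png')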

Deconvolution and guided back-propagation

For the ReLU function, the forward pass is:

\[x = \max(0,s) \]

and the backward pass:

\[\frac{\partial \ell}{\partial s} = \mathbf{1}_{\left\{s>0\right\}} \frac{\partial \ell}{\partial x} \]

\(\large\textbf{Deconvolution:}\)

\[\frac{\partial \ell}{\partial s} = \mathbf{1}_{\left\{\frac{\partial \ell}{\partial x}>0\right\}} \frac{\partial \ell}{\partial x} \]

This quantity is positive for units whose output contributes positively to the response, zeroes out the others, and is not modulated by the pre-activation \(s\).

\(\large\textbf{Guided back-propagation:}\)

\[\frac{\partial \ell}{\partial s} = \mathbf{1}_{\{s>0\}} \mathbf{1}_{\left\{\frac{\partial \ell}{\partial x}>0\right\}} \frac{\partial \ell}{\partial x} \]

aims at the best of both worlds: discarding structures that would not contribute positively to the final response, and discarding structures that are not already present.

Hook

>>> x = torch.tensor([ 1.23, -4.56 ])
>>> m = nn.ReLU()
>>> m(x)
tensor([ 1.2300, 0.0000])
>>> def my_hook(m, input, output):
... print(str(m) + ' got ' + str(input[0].size()))
...
>>> handle = m.register_forward_hook(my_hook)
>>> m(x)
ReLU() got torch.Size([2])
tensor([ 1.2300, 0.0000])
>>> handle.remove()
>>> m(x)
tensor([ 1.2300, 0.0000])
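
Hooks can also be attached to the backward pass, which is what the deconvolution and guided back-propagation implementations below rely on. A minimal sketch continuing the session above (note that recent PyTorch versions prefer register_full_backward_hook):

>>> x = torch.tensor([ 1.23, -4.56 ]).requires_grad_()
>>> def my_backward_hook(m, grad_input, grad_output):
...     print(str(m) + ' got grad_output of size ' + str(grad_output[0].size()))
...
>>> handle = m.register_backward_hook(my_backward_hook)
>>> m(x).sum().backward()
ReLU() got grad_output of size torch.Size([2])
>>> handle.remove()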

Using hooks, we can implement the deconvolution as follows:

from torch import nn
from torch.nn import functional as F

def relu_backward_deconv_hook(module, grad_input, grad_output):
    # Replace the ReLU backward rule: propagate only the positive part
    # of the gradient w.r.t. the output
    return F.relu(grad_output[0]),

def equip_model_deconv(model):
    for m in model.modules():
        if isinstance(m, nn.ReLU):
            m.register_backward_hook(relu_backward_deconv_hook)

import torch
import PIL.Image
from torchvision import transforms

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def grad_view(model, image_name):
    to_tensor = transforms.ToTensor()
    img = to_tensor(PIL.Image.open(image_name))
    img = 0.5 + 0.5 * (img - img.mean()) / img.std()

    model.to(device)
    img = img.to(device)

    input = img.view(1, img.size(0), img.size(1), img.size(2)).requires_grad_()
    output = model(input)
    
    result, = torch.autograd.grad(output.max(), input)
    result = result / result.max() + 0.5
    return result

from torchvision import models, utils

model = models.vgg16(pretrained = True)
model.eval()
model = model.features
equip_model_deconv(model)
result = grad_view(model, 'blacklab.jpg')
utils.save_image(result, 'blacklab-vgg16-deconv.png')

\(\text{Hooks for Guided back-propagation:}\)

def relu_forward_gbackprop_hook(module, input, output):
    # Keep the ReLU input s, needed to compute 1_{s > 0} in the backward hook
    module.input_kept = input[0]

def relu_backward_gbackprop_hook(module, grad_input, grad_output):
    # 1_{s > 0} * 1_{dl/dx > 0} * dl/dx
    return F.relu(grad_output[0]) * F.relu(module.input_kept).sign(),

def equip_model_gbackprop(model):
    for m in model.modules():
        if isinstance(m, nn.ReLU):
            m.register_forward_hook(relu_forward_gbackprop_hook)
            m.register_backward_hook(relu_backward_gbackprop_hook)
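
The guided back-propagation visualization can then be produced exactly like the deconvolution one above, re-using grad_view (the output file name is just an example):

model = models.vgg16(pretrained = True)
model.eval()
model = model.features
equip_model_gbackprop(model)
result = grad_view(model, 'blacklab.jpg')
utils.save_image(result, 'blacklab-vgg16-gbackprop.png')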

Grad-CAM

\(\text{Gradient-weighted Class Activation Mapping (Grad-CAM)}\) visualizes the importance of the input sub-parts according to the activations in a specific layer.

Formally, let \(k \in \{1,...,C \}\) be a channel number, \(A^k\in \mathbb{R}^{H\times W}\) the output feature map \(k\) of the selected layer, \(c\) a class number, and \(y^c\) the network's logit for that class.

The channel's weight:

\[\alpha_k^c = \frac{1}{HW}\sum_{i=1}^H\sum_{j=1}^W\frac{\partial y^c}{\partial A_{i,j}^k} \]

The final localization map is:

\[L_{\text{Grad-CAM}}^c = \text{ReLU}\left(\sum_{k=1}^C\alpha_k^cA^k\right) \]

\(\text{Code:}\)

import torch
import torchvision
import PIL.Image
import numpy
from torch.nn import functional as F
from matplotlib import cm

def hook_store_A(module, input, output):
    module.A = output[0]

def hook_store_dydA(module, grad_input, grad_output):
    module.dydA = grad_output[0]

model = torchvision.models.vgg19(pretrained = True)
model.eval()

layer = model.features[35] # Last ReLU of the conv layers
layer.register_forward_hook(hook_store_A)
layer.register_backward_hook(hook_store_dydA)

Load an image and turn it into a one-sample batch:

to_tensor = torchvision.transforms.ToTensor()
input = to_tensor(PIL.Image.open('example_images/elephant_hippo.png')).unsqueeze(0)

Compute:

output = model(input)

c = 386 # African elephant
output[0, c].backward()

alpha = layer.dydA.mean((2, 3), keepdim = True)
L = torch.relu((alpha * layer.A).sum(1, keepdim = True))

mean((2, 3), keepdim = True) computes the mean over the height and width of the feature map. So we go from a tensor of size \(1 × C × H × W\) to a tensor of size \(1 × C × 1 × 1\), with \(C = 512\) here for the last convolutional block of VGG19. The last two “\(1\)”s are preserved by keepdim = True.

Save it as a resized colored heat-map:

L = L / L.max()
L = F.interpolate(L, size = (input.size(2), input.size(3)),
                    mode = 'bilinear', align_corners = False)

l = L.view(L.size(2), L.size(3)).detach().numpy()
PIL.Image.fromarray(numpy.uint8(cm.gist_earth(l) * 255)).save('result.png')

gist_earth is a color map with orange color for high values, blue for low ones, and green for intermediate ones.
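
To present the heat-map superimposed on the input image (not part of the original code, just a common way to display Grad-CAM), the two images can be blended with PIL:

# Hypothetical overlay: blend the colored heat-map with the original image
heatmap = PIL.Image.fromarray(numpy.uint8(cm.gist_earth(l) * 255)).convert('RGB')
original = PIL.Image.open('example_images/elephant_hippo.png').convert('RGB')
overlay = PIL.Image.blend(original, heatmap.resize(original.size), alpha = 0.5)
overlay.save('result_overlay.png')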
