Computer Vision MT25, Interpreting vision models


Flashcards

@State and @define the three aspects that need to be considered for the explanation of a deep learning model.


  • Recipient: Explanations need to adapt to the recipient of the information.
  • Content: Explanations provide different types of information.
  • Purpose: Explanations differ based on use-cases.

@State and @define three different approaches to explainable models.


  • Post-hoc analysis: Explanations are derived from a fixed, pre-trained model via analysis.
  • Transparent models: The model is specifically constructed such that some mechanism has semantic meaning.
  • Learned explanations: The model is trained to deliver explanations together with predictions.

What are the pros and cons of the post-hoc analysis approach to explaining models?


  • Pro: There is no impact on performance.
  • Con: Typically very difficult.
  • Con: Explanations are often local around predictions.

What are the pros and cons of the “transparent models” approach to explainable AI?


  • Pro: Does not require difficult post-hoc analysis.
  • Con: Requires a task-specific architecture.
  • Con: Can affect performance: there is a trade-off between explainability and capability.

What are the pros and cons of the “learned explanations” approach to explainable models?


  • Pro: Explanations can be very semantic.
  • Con: Might need meta-explanations; you can never be sure the learned explanation reflects the model's true reasoning.
  • Con: Can affect performance.

The last layer of ResNet18 has dimension $512 \times 1000$, corresponding to $1000$ classes each of dimension $512$. How could you visualise these weights?


Use PCA to reduce each $512$-dimensional class weight vector to $2$ dimensions, giving a $2 \times 1000$ matrix, and scatter-plot the $1000$ points.
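A minimal NumPy sketch of this, using random weights as a stand-in for the real classifier matrix (in torchvision you would take `model.fc.weight.detach().numpy()`):

```python
import numpy as np

# Stand-in for ResNet18's final fc weights: 1000 classes x 512 features.
rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 512))

# PCA via SVD: centre the rows, then project onto the top-2 principal directions.
Wc = W - W.mean(axis=0)
_, _, Vt = np.linalg.svd(Wc, full_matrices=False)
coords = Wc @ Vt[:2].T          # shape (1000, 2): one 2-D point per class

print(coords.shape)             # (1000, 2) -- ready to scatter-plot
```

With real weights, semantically related classes (e.g. dog breeds) tend to land near each other in the plot.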

One way to understand a classification model is to visualise its learned weights e.g. with PCA. How could you instead visualise what the model does with its inputs?


Compute the activations on a set of inputs (e.g. the penultimate-layer features), reduce them with PCA, then visualise the 2-D embedding coloured by class label.
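A sketch of the embedding step, with synthetic class-clustered features standing in for activations you would collect with a forward hook (e.g. on `model.avgpool` in torchvision's ResNet18):

```python
import numpy as np

# Synthetic stand-in for penultimate-layer activations of 300 images, 10 classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=300)
centres = rng.normal(size=(10, 512)) * 3
acts = centres[labels] + rng.normal(size=(300, 512))   # class-clustered features

# PCA to 2-D; scatter-plot `emb` coloured by `labels` to see class structure.
ac = acts - acts.mean(axis=0)
_, _, Vt = np.linalg.svd(ac, full_matrices=False)
emb = ac @ Vt[:2].T            # shape (300, 2)
```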

What is t-SNE?


A non-linear dimensionality-reduction technique used, like PCA, to embed high-dimensional data in 2-D; it preserves local neighbourhood structure, but its axes carry no direct meaning, so it is less interpretable than PCA.
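A sketch using scikit-learn's implementation; the parameter choices here (perplexity, PCA initialisation) are common defaults, not part of the original notes:

```python
import numpy as np
from sklearn.manifold import TSNE

# Random stand-in for high-dimensional features (e.g. network activations).
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 50))

# Non-linear 2-D embedding; unlike PCA, the output axes have no meaning.
emb = TSNE(n_components=2, perplexity=10, init="pca",
           random_state=0).fit_transform(feats)

print(emb.shape)    # (100, 2)
```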

@State two sanity checks for interpretability techniques.


  1. Apply it to a randomly initialised network: if the "explanations" still look plausible, the method is not reflecting learned behaviour.
  2. Train another model on the same data but with random labels, and check that the explanations change.

What is the technique of input reconstruction?


Search (e.g. by gradient ascent on the input) for inputs that maximise a class probability; the resulting image shows what the model considers typical of that class.
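A toy sketch of the gradient-ascent search, using a hypothetical linear classifier $f(x) = Wx$ so the gradient is available in closed form (a real network would use autograd):

```python
import numpy as np

# Toy linear classifier: 10 classes over 64-dimensional "images".
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))
target = 3

# Gradient ascent on the input to maximise the target class score.
x = rng.normal(size=64) * 0.01           # start from small random noise
for _ in range(100):
    grad = W[target]                     # d(W[target] @ x)/dx for a linear model
    x += 0.1 * grad                      # ascent step
    x /= max(np.linalg.norm(x), 1e-8)    # keep the input bounded
```

After the loop, `x` points along the target class's weight vector, so the target score dominates.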

What is the occlusion method for interpreting vision models?


A black-box interpretability method where you occlude a part of the image and measure the change in response; the bigger the change, the more important the region.
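A minimal sketch of the sliding-occlusion loop, with a toy scorer that only "looks at" the top-left corner; in real use, `score` would be a trained network's class probability:

```python
import numpy as np

# Toy scorer: only the top-left 8x8 region matters.
def score(img):
    return img[:8, :8].sum()

img = np.ones((32, 32))
base = score(img)

# Slide an 8x8 occluding patch over a 4x4 grid of positions.
patch = 8
heat = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        occluded = img.copy()
        occluded[i*patch:(i+1)*patch, j*patch:(j+1)*patch] = 0.0
        heat[i, j] = base - score(occluded)   # big drop => important region
```

Only `heat[0, 0]` is non-zero, correctly locating the region the scorer depends on.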

Why might it not be a good idea to occlude patches of an image by replacing them with a black square?


If the occluded region is already black, the occlusion changes nothing, so the method reports the region as unimportant even if the model relies on it. Replacing patches with the dataset mean or blurred pixels avoids this.

What is the gradient method for interpreting vision models? @Visualise an example.


Plot the gradient magnitude $\|\nabla_x f(x)\|_1$, i.e. see in which direction the input needs to change to affect the output the most.
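A toy sketch with a linear scorer $f(x) = \sum_{ij} w_{ij} x_{ij}$, whose input gradient is just $w$; for a real network you would use autograd (e.g. `x.requires_grad_()` in PyTorch) and plot the absolute gradient as an image:

```python
import numpy as np

# Toy linear scorer over an 8x8 "image": f(x) = (w * x).sum().
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
x = rng.normal(size=(8, 8))

grad = w                      # analytic gradient of f w.r.t. x
saliency = np.abs(grad)       # per-pixel importance map to plot
l1 = saliency.sum()           # the ||grad||_1 magnitude from the card
```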

What is the ROAR method for benchmarking interpretability methods?


  • Remove And Retrain.
  • Run your attribution method and remove the X% most important pixels (e.g. replace them with the mean value).
  • Retrain the network on this modified data and measure how much performance drops; a good attribution method causes a large drop.
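A sketch of the removal step, with random importance scores standing in for a real attribution method's output:

```python
import numpy as np

# Random stand-ins: a dataset of 16x16 "images" and per-pixel importance scores.
rng = np.random.default_rng(0)
images = rng.normal(size=(100, 16, 16))
importance = rng.random(size=(100, 16, 16))

# Replace the top 30% most important pixels of each image with the dataset mean.
frac = 0.3
k = int(frac * 16 * 16)                      # pixels to remove per image
mean_img = images.mean(axis=0)

masked = images.copy()
for n in range(100):
    top = np.argsort(importance[n].ravel())[-k:]   # indices of top-k pixels
    idx = np.unravel_index(top, (16, 16))
    masked[n][idx] = mean_img[idx]
# `masked` would be the retraining set; compare accuracy before/after retraining.
```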


