Notes - Computer Security MT24, Cryptographic hashes
Flashcards
@Define an unkeyed hash function.
A hash function that depends only the data they digest.
Suppose $h : D \to \{0, 1\}^n$ is a function. What seven properties related to cryptographic hashing might we put on the function, and what are the names of these conditions?
- Conditions on the domain
- $D = \{0, 1\}^n$
- $D = \{0, 1\}^{n+k}$ for $k > 0$
- $D = \{0, 1\}^\ast$ (compression)
- For all $x \in D$, $h(x)$ is easy to compute.
- For almost all $y \in \text{Im}(h)$, it is computationally infeasible to find an $x$ such that $h(x) = y$. (preimage resistance/one way).
- Given $x \in D$, it is always computationally infeasible to find $x’ \ne x$ such that $h(x) = h(x’)$. (2nd-preimage resistance/weak collision resistance)
- It is computationally infeasible to find any pair $x \ne x’$ such that $h(x) = h(x’)$. (collision resistance/strong collision resistance)
@definition~
Suppose $h : D \to \{0, 1\}^n$ is a function. @Define the compression condition, and note the easy trip-up.
A function that satisfies this property is not a compression function, since a compression function also has to satisfy other properties.
Suppose $h : D \to \{0, 1\}^n$ is a function. @Define the preimage resistance/one way condition.
For almost all $y \in \text{Im}(h)$, it is computationally infeasible to find an $x$ such that $h(x) = y$.
(Why $\text{Im}(h)$? If $y$ is not in the image, then it is obviously impossible to find a corresponding $x$. Why “almost all”? It’s easy to come up with a pair $\langle x, y\rangle$ by just picking some $x$ and then computing $h(x) = y$).
Suppose $h : D \to \{0, 1\}^n$ is a function. @Define the 2nd-preimage resistance/weak collision resistance condition.
Given $x \in D$, it is always computationally infeasible to find $x’ \ne x$ such that $h(x) = h(x’)$. (2nd-preimage resistance/weak collision resistance)
Suppose $h : D \to \{0, 1\}^n$ is a function. @Define the collision resistance/strong collision resistance condition.
It is computationally infeasible to find any pair $x \ne x’$ such that $h(x) = h(x’)$.
What is the difference between:
- 2nd-preimage resistance/weak collision resistance, and
- Collision resistance/strong collision resistance
?
The two conditions are:
- 2nd-preimage resistance/weak collision resistance: Given $x \in D$, it is always computationally infeasible to find $x’ \ne x$ such that $h(x) = h(x’)$.
- Collision resistance/strong collision resistance: It is computationally infeasible to find any pair $x \ne x’$ such that $h(x) = h(x’)$.
In the former, you are given $x$. In the later, it has to be over all pairs.
Suppose we have a function $h : D \to \{0, 1\}^n$, for which we might specify the following conditions:
- Conditions on the domain
- $D = \{0, 1\}^n$
- $D = \{0, 1\}^{n+k}$ for $k > 0$
- $D = \{0, 1\}^\ast$ (compression)
- For all $x \in D$, $h(x)$ is easy to compute.
- For almost all $y \in \text{Im}(h)$, it is computationally infeasible to find an $x$ such that $h(x) = y$. (preimage resistance/one way).
- Given $x \in D$, it is always computationally infeasible to find $x’ \ne x$ such that $h(x) = h(x’)$. (2nd-preimage resistance/weak collision resistance)
- It is computationally infeasible to find any pair $x \ne x’$ such that $h(x) = h(x’)$. (collision resistance/strong collision resistance)
What conditions must a one-way function meet?
- $D = \{0, 1\}^n$
- $D = \{0, 1\}^{n+k}$ for $k > 0$
- $D = \{0, 1\}^\ast$ (compression)
- 1A
- 2
- 3
Suppose we have a function $h : D \to \{0, 1\}^n$, for which we might specify the following conditions:
- Conditions on the domain
- $D = \{0, 1\}^n$
- $D = \{0, 1\}^{n+k}$ for $k > 0$
- $D = \{0, 1\}^\ast$ (compression)
- For all $x \in D$, $h(x)$ is easy to compute.
- For almost all $y \in \text{Im}(h)$, it is computationally infeasible to find an $x$ such that $h(x) = y$. (preimage resistance/one way).
- Given $x \in D$, it is always computationally infeasible to find $x’ \ne x$ such that $h(x) = h(x’)$. (2nd-preimage resistance/weak collision resistance)
- It is computationally infeasible to find any pair $x \ne x’$ such that $h(x) = h(x’)$. (collision resistance/strong collision resistance)
What conditions must a compression function meet?
- $D = \{0, 1\}^n$
- $D = \{0, 1\}^{n+k}$ for $k > 0$
- $D = \{0, 1\}^\ast$ (compression)
- 1B
- 2
- 3
- 4
- 5
Suppose we have a function $h : D \to \{0, 1\}^n$, for which we might specify the following conditions:
- Conditions on the domain
- $D = \{0, 1\}^n$
- $D = \{0, 1\}^{n+k}$ for $k > 0$
- $D = \{0, 1\}^\ast$ (compression)
- For all $x \in D$, $h(x)$ is easy to compute.
- For almost all $y \in \text{Im}(h)$, it is computationally infeasible to find an $x$ such that $h(x) = y$. (preimage resistance/one way).
- Given $x \in D$, it is always computationally infeasible to find $x’ \ne x$ such that $h(x) = h(x’)$. (2nd-preimage resistance/weak collision resistance)
- It is computationally infeasible to find any pair $x \ne x’$ such that $h(x) = h(x’)$. (collision resistance/strong collision resistance)
What conditions must a one-way hash function meet?
- $D = \{0, 1\}^n$
- $D = \{0, 1\}^{n+k}$ for $k > 0$
- $D = \{0, 1\}^\ast$ (compression)
- 1C
- 2
- 3
- 4
Suppose we have a function $h : D \to \{0, 1\}^n$, for which we might specify the following conditions:
- Conditions on the domain
- $D = \{0, 1\}^n$.
- $D = \{0, 1\}^{n+k}$ for $k > 0$.
- $D = \{0, 1\}^\ast$. (compression)
- For all $x \in D$, $h(x)$ is easy to compute.
- For almost all $y \in \text{Im}(h)$, it is computationally infeasible to find an $x$ such that $h(x) = y$. (preimage resistance/one way)
- Given $x \in D$, it is always computationally infeasible to find $x’ \ne x$ such that $h(x) = h(x’)$. (2nd-preimage resistance/weak collision resistance)
- It is computationally infeasible to find any pair $x \ne x’$ such that $h(x) = h(x’)$. (collision resistance/strong collision resistance)
What conditions must a cryptographic hash function meet?
- $D = \{0, 1\}^n$.
- $D = \{0, 1\}^{n+k}$ for $k > 0$.
- $D = \{0, 1\}^\ast$. (compression)
- 1C
- 2
- 4
- 5
A function $h : D \to \{0, 1\}^n$ is preimage resistant if:
For almost all $y \in \text{Im}(h)$, it is computationally infeasible to find an $x$ such that $h(x) = y$. (preimage resistance/one way).
What would count as falisifying preimage-resistance?
For almost all $y \in \text{Im}(h)$, it is computationally infeasible to find an $x$ such that $h(x) = y$. (preimage resistance/one way).
Exhibiting a few feasibly invertible outputs.
A function $h : D \to \{0, 1\}^n$ is 2nd-preimage resistant if:
Given $x \in D$, it is always computationally infeasible to find $x’ \ne x$ such that $h(x) = h(x’)$. (2nd-preimage resistance/weak collision resistance)
What would count as falisifying 2nd-preimage resistance?
Given $x \in D$, it is always computationally infeasible to find $x’ \ne x$ such that $h(x) = h(x’)$. (2nd-preimage resistance/weak collision resistance)
Finding one $x \in D$ such that $x’ \ne x$ and $h(x) = h(x’)$.
A function $h : D \to \{0, 1\}^n$ is collision resistance if:
- It is computationally infeasible to find any pair $x \ne x’$ such that $h(x) = h(x’)$. (collision resistance/strong collision resistance)
What would count as falisifying collision resistance?
Exhibiting one pair of colliding inputs to falsify collision resistance.
Can you @state a result about constructing a collision resistant function from two smaller functions?
Suppose:
- $f, g : \{0, 1\}^\star \to \{0, 1\}^n$
- Either $f$ or $g$ is collision resistant
Then:
\[h := f \parallel g\]is collision resistant.
Suppose you have some compression function
\[g : \\{0, 1\\}^n \times \\{0, 1\\}^k \to \\{0, 1\\}^n\]
How can you extend this to a function $h : \{0, 1\}^\ast \to \{0, 1\}^n$, which might serve as a candidate to be a one-way hash function or cryptographic hash function?
Take $x \in \{0, 1\}^\ast$ and split into a sequence of $k$-bit blocks $\langle x _ 1, \ldots, x _ m \rangle$. Then perform the iterative process:
\[\begin{aligned} h\langle \rangle &= IV \\\\ h\langle x_1, x_2, \ldots, x_m \rangle &= g(h\langle x_1, \ldots, x_{m-1}\rangle, x_m) \end{aligned}\]For some $IV$ initialisation vector.
Consider the function
\[g : \\{0, 1\\}^{32} \times \\{0, 1\\}^{32} \to \\{0, 1\\}^{32}\]
defined by
\[g(x, y) = x + y \pmod {2^{32}}\]
and extend to a function $h : \{0, 1\}^\ast \to \{0, 1\}^n$ by
\[\begin{aligned}
h\langle \rangle &= IV \\\\
h\langle x_1, x_2, \ldots, x_m \rangle &= g(h\langle x_1, \ldots, x_{m-1}\rangle, x_m)
\end{aligned}\]
where $\langle x _ 1, \ldots, x _ n \rangle$ is the input split into 32-bit blocks. What is this construction called?
A checksum.
Consider the function
\[g : \\{0, 1\\}^{32} \times \\{0, 1\\}^{32} \to \\{0, 1\\}^{32}\]
defined by
\[g(x, y) = x + y \pmod {2^{32}}\]
and extend to a function $h : \{0, 1\}^\ast \to \{0, 1\}^n$ by
\[\begin{aligned}
h\langle \rangle &= IV \\\\
h\langle x_1, x_2, \ldots, x_m \rangle &= g(h\langle x_1, \ldots, x_{m-1}\rangle, x_m)
\end{aligned}\]
where $\langle x _ 1, \ldots, x _ n \rangle$ is the input split into 32-bit blocks. This is the checksum hash.
Which of the following conditions are met?
- Conditions on the domain
- $D = \{0, 1\}^n$.
- $D = \{0, 1\}^{n+k}$ for $k > 0$.
- $D = \{0, 1\}^\ast$. (compression)
- For all $x \in D$, $h(x)$ is easy to compute.
- For almost all $y \in \text{Im}(h)$, it is computationally infeasible to find an $x$ such that $h(x) = y$. (preimage resistance/one way)
- Given $x \in D$, it is always computationally infeasible to find $x’ \ne x$ such that $h(x) = h(x’)$. (2nd-preimage resistance/weak collision resistance)
- It is computationally infeasible to find any pair $x \ne x’$ such that $h(x) = h(x’)$. (collision resistance/strong collision resistance)
- $D = \{0, 1\}^n$.
- $D = \{0, 1\}^{n+k}$ for $k > 0$.
- $D = \{0, 1\}^\ast$. (compression)
- 1.3, so compression function
- 2,
- Not 3 (every 32-bit block maps to itself)
- Not 4 (appending extra 32-bit block of zeroes does not change the output)
- Not 5 (appending extra 32-bit block of zeroes does not change the output)
Suppose $E _ k$ is a block cipher with a fixed known key $k$, and consider the function
\[\begin{aligned}
h \langle \rangle &= 0 \\
h \langle x_1, \ldots, x_m \rangle &= E_k(h\langle x_1, \ldots, x_{m-1}\rangle \oplus x_m)
\end{aligned}\]
@Prove that this function is not collision resistant.
Pick some $x$ that is two blocks long, so $x = x _ 1 \parallel x _ 2$ and set $x’ = x’’ \parallel (E _ k(x’’) \oplus x’’)$ where $x’’ = E _ k(x _ 1) \oplus x _ 2$.
But then $h(x) = h(x’)$ is a collision.
\[h(x) = E_k(E_k(x_1) \oplus x_2)\] \[\begin{aligned} h(x') &= E_k(E_k(x'') \oplus (E_k(x'') \oplus x'')) \\\\ &= E_k(x'') \\\\ &= E_k(E_k(x_1) \oplus x_2) \\\\ &= E_k(E_k(x_1) \oplus x_2) \end{aligned}\]Suppose $h : \{0, 1\}^\star \to \{0, 1\}^n$. Brute-forcing a preimage sounds difficult in practice since the domain is infinite. But what’s the average time to find a solution?
$2^{n-1}$, similar to the exhaustion attack on symmetric key ciphers.
Suppose $h : \{0, 1\}^\star \to \{0, 1\}^n$. Brute-forcing a preimage can be done in roughly $2^{n-1}$ steps, but what’s the expected number of steps to find a collision $h(x) = h(x’)$, and briefly explain why this is the case?
This is much lower than finding a preimage because of the birthday paradox. All you need to do is compute $h(x)$ for random inputs and store $\langle x, h(x) \rangle$ in a data structure and keep going until a collision is found.
Suppose:
- $h : \{0, 1\}^\ast \to \{0, 1\}^n$ is a hash function with $2^n$ possible hash outputs
- We are trying to find a collision $h(x) = h(x’)$
- $T$ is the random variable representing the step at which we find the first duplicate output
@Prove (ish) that it takes roughly $\sqrt h$ steps before there’s a greater than 50% chance we have found a collision.
We have
\[\begin{aligned} \mathbb P[T > t] &= 1 \cdot \left(1 - \frac 1 h\right) \cdot \left(1 - \frac 2 h\right) \cdots \left(1 - \frac{t-1}{h}\right) \end{aligned}\]Then, using the approximation $1 - x \approx \exp(-x)$ for small $x$, we obtain
\[\begin{aligned} \mathbb P[T > t] &\approx \left(-\frac 1 h\right) \exp\left( -\frac 2 h \right) \cdots \exp\left( -\frac{t-1}{h} \right) \\\\ &= \exp\left(-\frac{1}{h} \sum^{t-1}_{i = 1} i\right) \\\\ &= \exp\left(-\frac{t(t-1)}{2h}\right) \\\\ &\approx \exp\left(-\frac{t^2}{2h}\right) \end{aligned}\]Then solving for
\[\mathbb P[T > t] > \frac 1 2\]we obtain
\[t > \sqrt{2\ln(2) h}\]Suppose you implement a login system like so:
- The computer requests a username and a password
- The user provides them
- The computer checks if $\langle \mathbf{Username}, h(\mathbf{Password})\rangle$ where $h$ is a hash function
Why is it not necessary for $h$ to be collision resistant?
- The important property is 2nd preimage resistance
- Finding an arbitrary pair $x$ and $x’$ such that $h(x) = h(x’)$ would achieve nothing as it’s unrelated to the user’s password
@Justify why 8-character passwords drawn from only lowercase letters and numbers are insecure.
- Number of lowercase letters and numbers: $36$
- Total possibilities: $36^8 \approx 2^{40}$
- Within the realms of brute-forcability
Why would it be infeasbible to calculate the hashes of all e.g. 8 character passwords and store them in a table?
Even if this table could be computed in a practical amount of time, it couldn’t be stored as the outputs of the hash function can take up more space than the inputs.
It’s infeasible to calculate the hashes of all e.g. 8 character passwords and store them in a table, not because it would take too much time but because it would take up too much space.
Can you describe a technique that trades time for space?
Suppose:
- $P$ is the set of passwords
- $h : P \to H$ is the hash function
- $g : H \to P$ is a function we invent which maps hashes to passwords (won’t be injective)
Then pick a large selection $\{p _ 1, \ldots, p _ n\} \subseteq P$ and compute an iterated chain
\[p_i, gh(p_i), ghgh(p_i), \ldots, (gh)^k(p_i) =: q_i\]Then we store $p _ i$, and the last password $q _ i$. You can vary $n$ and $k$ so the storage requirements are manageable and hope to have gone through most hashed passwords.
(This is the idea behind a rainbow table).
Suppose you implement a login system like so:
- The computer requests a username and a password
- The user provides them
- The computer checks if $\langle \mathbf{Username}, h(\mathbf{Password})\rangle$ where $h$ is a hash function
Can you describe two approaches to salting the passwords, and explain why they are useful?
- Store passwords as $\langle \mathbf{Username}, h(\mathbf{Salt} \parallel \mathbf{Password})$ where $\mathbf{Salt}$ is a fixed string used by the database
- Store passwords as $\langle \mathbf{Username}, h(g(\mathbf{Username}) \parallel \mathbf{Password})$ where $g$ is some function that creates a salt from the username
These are useful because they stop attacks from precomputed hashes. The second approach should be prefered since that means it’s impossible to build up a precomputed hash table if the attacker knows the salt.
Suppose you implement a login system over a network like so:
- The computer requests a username and a password
- The user provides them
- The computer checks if $\langle \mathbf{Username}, h(\mathbf{Password})\rangle$ where $h$ is a hash function (perhaps also using a salt)
What are two problems with this approach?
- If passwords are sent in plaintext to the host, then an eavesdropper can determine the password
- Even if the passwords aren’t in plaintext, an attacker could replay the users response to authenticate
Suppose you implement a login system over a network like so:
- The computer requests a username and a password
- The user provides them
- The computer checks if $\langle \mathbf{Username}, h(\mathbf{Password})\rangle$ where $h$ is a hash function (perhaps also using a salt)
Two problems with this approach are:
- If passwords are sent in plaintext to the host, then an eavesdropper can determine the password
- Even if the passwords aren’t in plaintext, an attacker could replay the users response to authenticate
In this context, @describe Lamport’s scheme, which prevents these kinds of attacks.
When the user creates their account, a large number $n$ is chosen and $\langle \mathbf{Username} _ A, h^n(\mathbf{Password _ A})\rangle$ is written to the password database and $n$ is communicated to Alice.
Then to log in:
- $A \to H$: $\mathbf{Username} _ A$
- $H \to A$: $\texttt{Code}?$
- $A \to H$: $\mathbf{Code} = h^{n-1}(\mathbf{Password} _ A)$.
- $H$ grants access iff $\langle \mathbf{Username} _ A, h(\mathbf{Code})\rangle$ is in the password database, and if so it will update the entry to $\langle \mathbf{Username} _ A, \mathbf{Code}\rangle$.
- $A$ decrements $n$.
Consider the hash function $h : \{0, 1\}^\ast \to \{0, 1\}^n$ defined by
\[\begin{aligned}
h\langle \rangle &= IV \\\\
h\langle x_1, x_2, \ldots, x_m \rangle &= g(h\langle x_1, \ldots, x_{m-1}\rangle, x_m)
\end{aligned}\]
where $g$ is a compression function. @Define finalisation in this context.
Applying one more function to the hash output.
Consider the hash function $h : \{0, 1\}^\ast \to \{0, 1\}^n$ defined by
\[\begin{aligned}
h\langle \rangle &= IV \\\\
h\langle x_1, x_2, \ldots, x_m \rangle &= g(h\langle x_1, \ldots, x_{m-1}\rangle, x_m)
\end{aligned}\]
where $g$ is a compression function. @Define the Merkle-Damgård construction in this context, and explain why it is useful
A hash function defined using this construction, with the additional requirements that:
- $g$ is a compression function (not the same as the compression property, to be a compression function means $g : \{0, 1\}^{n+k} \to \{0, 1\}^\ast$ and you have preimage restiance, 2nd-preimage resistance and collision resistance).
- The input is padded up to a multiple of the block size in a secure way
It can be proved that if these properties are satisfied, $h$ is a cryptographic hash function.
Consider the hash function $h : \{0, 1\}^\ast \to \{0, 1\}^n$ defined by
\[\begin{aligned}
h\langle \rangle &= IV \\\\
h\langle x_1, x_2, \ldots, x_m \rangle &= g(h\langle x_1, \ldots, x_{m-1}\rangle, x_m)
\end{aligned}\]
where $g$ is a compression function. If $x$ is not a multiple of the block size then it will need to be padded. Why does adding trailing zeroes to $x$ break security?
Because then
\[h(x) = h(x\pmb 0)\]where $\pmb 0$ are the trailing zeroes will be an easy to find collision.
Consider the hash function $h : \{0, 1\}^\ast \to \{0, 1\}^n$ defined by
\[\begin{aligned}
h\langle \rangle &= IV \\\\
h\langle x_1, x_2, \ldots, x_m \rangle &= g(h\langle x_1, \ldots, x_{m-1}\rangle, x_m)
\end{aligned}\]
where $g$ is a compression function. If $x$ is not a multiple of the block size then it will need to be padded, say by a function $\mathbf{Pad}(x)$. @State a result (of which the proof is highly technical) that describes an insecure class of padding functions.
If a padded message can ever be a suffix of another, i.e.
\[\mathbf{Pad}(x) = y \parallel \mathbf{Pad}(x')\]for some $x, x’, y$, then collisions can be found.
Consider the hash function $h : \{0, 1\}^\ast \to \{0, 1\}^n$ defined by
\[\begin{aligned}
h\langle \rangle &= IV \\\\
h\langle x_1, x_2, \ldots, x_m \rangle &= g(h\langle x_1, \ldots, x_{m-1}\rangle, x_m)
\end{aligned}\]
where $g$ is a compression function. If $x$ is not a multiple of the block size then it will need to be padded, say by a function $\mathbf{Pad}(x)$. There is a result that say if a padded message can ever be a suffix of another, i.e.
\[\mathbf{Pad}(x) = y \parallel \mathbf{Pad}(x')\]
for some $x, x’, y$, then collisions can be found. What is typically done to make sure this can’t be the case?
Include the size of the message in the padding.
Consider the hash function $h : \{0, 1\}^\ast \to \{0, 1\}^n$ defined by
\[\begin{aligned}
h\langle \rangle &= IV \\\\
h\langle x_1, x_2, \ldots, x_m \rangle &= g(h\langle x_1, \ldots, x_{m-1}\rangle, x_m)
\end{aligned}\]
where $g$ is a compression function. If $x$ is not a multiple of the block size then it will need to be padded, say by a function $\mathbf{Pad}(x)$. Can you define MD-strengthening?
where $\ell$ is the length of the $x$.
Consider the hash function $h : \{0, 1\}^\ast \to \{0, 1\}^n$ defined by
\[\begin{aligned}
h\langle \rangle &= IV \\\\
h\langle x_1, x_2, \ldots, x_m \rangle &= g(h\langle x_1, \ldots, x_{m-1}\rangle, x_m)
\end{aligned}\]
where $g$ is a compression function. If $x$ is not a multiple of the block size then it will need to be padded, say by a function $\mathbf{Pad}(x)$. Can you what it mean s for a padding compression to be MD-compliant?
- $x$ is a prefix of $\mathbf{Pad}(x)$
- If $ \vert x \vert = \vert y \vert $ then $ \vert \mathbf{Pad}(x) \vert = \vert \mathbf{Pad}(y) \vert $
- If $ \vert x \vert \ne \vert y \vert $ then the last blocks of $\mathbf{Pad}(x)$ and $\mathbf{Pad}(y)$ are not equal