Generating Strong Pseudoprimes to Fixed Bases
The hardest challenge in the mathematics section of CryptoHack, entitled "Prime and Prejudice," essentially involves finding a strong pseudoprime to a set of fixed bases—with the aim of tricking the Miller–Rabin implementation to accept our composite number as prime. The title of the challenge is a reference to a 2018 paper exposing security flaws related to adversarial primality testing in cryptographic libraries.
In the appendix of that paper, they summarize Arnault's method of generating strong pseudoprimes to a set of fixed bases. It's quite involved, but becomes easier to understand once you cross reference the theory with the SageMath implementation. The basic idea is that if $n=p_1\cdots p_h$ is a product of distinct odd primes then it is a strong pseudoprime to a base $a$ if it satisfies these three requirements:
$\gcd(a,n)=1$.
$(a/p_i) = -1$ for all $1 \leq i\leq h$ where $(a/p)$ denotes the Legendre symbol.
Define,
$$k_i = \frac{p_i-1}{p_1-1}, \quad m_i = \frac{\prod_{j\neq i} p_j - 1}{p_i-1},$$
then the coefficients $k_i$ and $m_i$ must be integers for all $2 \leq i \leq h$ and $k_i$ must be odd.
We begin by choosing random odd primes as the coefficients $k_i$ then work backwards to derive $n=p_1\dotsc p_h$ that fulfills the requirements. By the definition of $k_i$ we know that $p_i = k_i(p_1-1) + 1$. Therefore, we need only find a value for $p_1$ that will fit the other requirements—the other $p_i$ can be easily calculated from $p_1$. Arnault's method essentially involves enumerating candidate values for $p_1$ until we find the one that works. Let $S_a \subset \mathbb{Z}_{4a}$ be such that, for any prime $p$,
$$(a/p) = -1 \iff p \bmod {4a} \in S_a.$$
The quadratic reciprocity law ensures that $(a/p)=(p/a)$ unless $a\equiv p\equiv3 \pmod{4}$ in which case $(a/p) = -(p/a)$. Hence $S_a$ is trivial to compute. $p_1 \bmod{4a}$ must be an element in $S_a$ because of requirement 2, so $S_a$ is the set of candidate values for $p_1$. By noticing that $p_i = k_i(p_1-1) + 1$ can be rewritten as $p_1 = k_i^{-1}(p_i - 1) + 1$ we can further narrow down the possible values for $p_1$ to a subset of $S_a$. Call this subset $S_a'$, then the number $p_1 \bmod{4a}$ must be an element of $S_a' \subset S_a$, where,
$$S_a' := \bigcap_{i=1}^h k_i^{-1}(S_a + k_i - 1)$$
and $k_i^{-1}(S_a + k_i - 1)$ denotes the set $\{k_i^{-1}(s+k_i-1) : s \in S_a\}$. To recap, $n=p_1 \dotsc p_h$ is (almost) a strong pseudoprime to base $a$ if its an element of $S_a'$. In general, for a set of bases $A=\{a_1, \dotsc, a_t\}$, by selecting a candidate value $z_a \in S_a'$ for each $a\in A$ we have a system of modular congruences $p_1 \equiv z_a \pmod {4a}$. I say almost, because in addition, we need to add the following congruences in the system to ensure that the coefficients $m_i$ are integers for $h=3$ (we discuss the general case where $h=n$ later):
$$\begin{aligned} p_1 &\equiv -k_3^{-1} \pmod{k_2} \\ p_1 &\equiv -k_2^{-1} \pmod{k_3} \end{aligned}$$
Now we have this system of congruences that $p_1$ must satisfy. We choose random candidates $z_a\in S_a'$ and try to solve the system with CRT—this succeeds only occasionally, however, because the moduli are not coprime. Arnault's method involves bruteforcing different values for $z_a$ until CRT works. As you might've noticed, Arnault's method is quite a fragile and complicated process that requires the correct selection of $k_i$ values.
I haven't done a stellar job explaining how Arnault's method works. I might've made mathematical errors, thus if you're a stickler for proofs and details go read Arnault's original 1995 paper that introduced this method. If you're confused—which is totally expected, I am confused reading what I just wrote—cross reference the following SageMath implementation with the above mathematics. It makes the method more concrete and makes more sense.
def S(a):
return {
p for p in range(3, 4*a + 1, 2)
if kronecker_symbol(a, p) == -1
}
# A is set of prime bases to be strong pseudoprime against
def strong_pseudoprime(A, ks):
assert len(ks) == 3 # we remove this restriction later
# subsets is dict that maps bases a to subsets of residues mod 4a
subsets = {}
for a in A:
subsets[a] = list(set.intersection(*(
{(pow(k, -1, 4*a) * (s + k - 1)) % (4*a) for s in S(a)}
for k in ks
)))
# bruteforce candidate values for p_1 until CRT works
r, m = None, None
while True:
# initial conditions to ensure m_i are integers
rs = [
pow(-ks[2], -1, ks[1]),
pow(-ks[1], -1, ks[2])
] # residues
ms = [ks[1], ks[2]] # moduli
# choose random candidate value
for a, s in subsets.items():
rs.append(random.choice(s))
ms.append(a * 4)
# does CRT work?
try:
r = crt(rs, ms)
m = lcm(ms)
except ValueError:
continue
break
# found a value for p_1! Now, find p_2, ..., p_h
i = 0
while True:
p1 = m*i + r
ps = [p1] + [
k*(p1 - 1) + 1
for k in ks[1:]
]
if all(is_prime(p) for p in ps):
return product(ps)
i += 1
The code above is quite naive and only meant to illustrate the concept. It has a major limitation: $h$ must be equal to 3. I've shown the specific congruences needed for $h=3$, but haven't discussed the general case when $h=n$ yet. Let me explain that now. Arnault, in Lemma 2.2 and Remark 2.3, points out that $m_i$ is an integer if $p_1$ is a root of $f_i$ modulo $k_i$, where,
$$f_i(x) = \frac{(\prod_{j\neq i} (k_j(x-1) + 1)) - 1}{x-1}.$$
For example, when $h=3$ we have $f_2(x) = k_3x+1$. The congruence corresponds to
$$\begin{aligned} f_2(p_1) &\equiv 0 \pmod{k_2} \\ k_3p_1+1 &\equiv 0 \pmod{k_2} \\ p_1 &\equiv -k_3^{-1} \pmod{k_2} \end{aligned}$$
which is exactly the same congruence I showed you earlier for when $h=3$. Instead of hard coding these congruences to ensure $m_i$ are integers, we can model the function $f_i$ in SageMath and basically solve for $x$ on the fly to derive the congruences. Be warned—I'm not entirely confident that I've implemented this procedure correctly. I've only tested the code with $h=3$ for the most part.
A major speedup can be achieved by systematically working through each candidate value $z_a$, and backtracking as soon as the CRT fails. The following implementation finds all possible valid candidates from selected values of $k_i$ and returns them as a generator.
def S(a):
return {
p for p in range(3, 4*a + 1, 2)
if kronecker_symbol(a, p) == -1
}
def strong_pseudoprime(A, ks):
x = var('x')
def f(i):
p = product(ks[j]*(x-1)+1 for j in range(len(ks)) if j != i)
return (p - 1) / (x - 1)
# subsets is dict that maps bases a_i to subsets of residues mod 4a
subsets = {}
for a in A:
subsets[a] = list(set.intersection(*(
{(pow(k, -1, 4*a) * (s + k - 1)) % (4*a) for s in S(a)}
for k in ks
)))
indices = [0] * len(subsets)
def bruteforce(i, r, m):
if i == len(indices):
yield find_pseudoprime(r, m)
return
for residue in subsets[A[i]]:
try:
r = crt([r, residue], [m, A[i] * 4])
except ValueError:
continue
yield from bruteforce(i + 1, r, lcm(m, A[i] * 4))
def find_pseudoprime(r, m):
i = 0
while True:
p1 = m*i + r
ps = [p1] + [
k*(p1 - 1) + 1
for k in ks[1:]
]
if all(is_prime(p) for p in ps):
return ps
i += 1
# initial congruences to ensure m_i is integer
rs, ms = [], []
for i in range(1, len(ks)):
sol, = solve_mod(f(i).full_simplify(), ks[i])[0]
rs.append(int(sol)); ms.append(ks[i])
yield from bruteforce(0, crt(rs, ms), lcm(ms))
# USAGE
A = prime_range(2, 61+1)
for p in strong_pseudoprime(A, [1, 2081, 4177]):
# assert miller_rabin(product(p), A)
print(p)
If the code throws an exception, most likely the choices of $k_i$ are wrong. Ideally, for $h \geq 3$, we would iterate over each of the roots of $f_i$ as a possible congruence instead of simply choosing the first one. You can set the i
variable in the find_pseudoprime
function to a random big value like 1000000
if you require a bigger pseudoprime, which was needed to complete the CryptoHack CTF challenge.