Generating Strong Pseudoprimes to Fixed Bases

Sep 12, 2024

The hardest challenge in the mathematics section of CryptoHack, entitled "Prime and Prejudice," essentially involves finding a strong pseudoprime to a set of fixed bases—with the aim of tricking the Miller–Rabin implementation to accept our composite number as prime. The title of the challenge is a reference to a 2018 paper exposing security flaws related to adversarial primality testing in cryptographic libraries.

In the appendix of that paper, they summarize Arnault's method of generating strong pseudoprimes to a set of fixed bases. It's quite involved, but becomes easier to understand once you cross reference the theory with the SageMath implementation. The basic idea is that if $n=p_1\cdots p_h$ is a product of distinct odd primes then it is a strong pseudoprime to a base $a$ if it satisfies these three requirements:

$\gcd(a,n)=1$.
$(a/p_i) = -1$ for all $1 \leq i\leq h$ where $(a/p)$ denotes the Legendre symbol.
Define, $$ k_i = frac{p_i-1}{p_1-1}, quad m_i = frac{prod_{jneq i} p_j - 1}{p_i-1}, $$ then the coefficients $k_i$ and $m_i$ must be integers for all $2 \leq i \leq h$ and $k_i$ must be odd.

We begin by choosing random odd primes as the coefficients $k_i$ then work backwards to derive $n=p_1\dotsc p_h$ that fulfills the requirements. By the definition of $k_i$ we know that $p_i = k_i(p_1-1) + 1$. Therefore, we need only find a value for $p_1$ that will fit the other requirements—the other $p_i$ can be easily calculated from $p_1$. Arnault's method essentially involves enumerating candidate values for $p_1$ until we find the one that works. Let $S_a \subset \mathbb{Z}_{4a}$ be such that, for any prime $p$,

$$(a/p) = -1 \iff p \bmod {4a} \in S_a.$$

The quadratic reciprocity law ensures that $(a/p)=(p/a)$ unless $a\equiv p\equiv3 \pmod{4}$ in which case $(a/p) = -(p/a)$. Hence $S_a$ is trivial to compute. $p_1 \bmod{4a}$ must be an element in $S_a$ because of requirement 2, so $S_a$ is the set of candidate values for $p_1$. By noticing that $p_i = k_i(p_1-1) + 1$ can be rewritten as $p_1 = k_i^{-1}(p_i - 1) + 1$ we can further narrow down the possible values for $p_1$ to a subset of $S_a$. Call this subset $S_a'$, then the number $p_1 \bmod{4a}$ must be an element of $S_a' \subset S_a$, where,

$$S_a' := \bigcap_{i=1}^h k_i^{-1}(S_a + k_i - 1)$$

and $k_i^{-1}(S_a + k_i - 1)$ denotes the set $\{k_i^{-1}(s+k_i-1) : s \in S_a\}$. To recap, $n=p_1 \dotsc p_h$ is (almost) a strong pseudoprime to base $a$ if its an element of $S_a'$. In general, for a set of bases $A=\{a_1, \dotsc, a_t\}$, by selecting a candidate value $z_a \in S_a'$ for each $a\in A$ we have a system of modular congruences $p_1 \equiv z_a \pmod {4a}$. I say almost, because in addition, we need to add the following congruences in the system to ensure that the coefficients $m_i$ are integers for $h=3$ (we discuss the general case where $h=n$ later):

$$\begin{aligned} p_1 &\equiv -k_3^{-1} \pmod{k_2} \\ p_1 &\equiv -k_2^{-1} \pmod{k_3} \end{aligned}$$

Now we have this system of congruences that $p_1$ must satisfy. We choose random candidates $z_a\in S_a'$ and try to solve the system with CRT—this succeeds only occasionally, however, because the moduli are not coprime. Arnault's method involves bruteforcing different values for $z_a$ until CRT works. As you might've noticed, Arnault's method is quite a fragile and complicated process that requires the correct selection of $k_i$ values.

I haven't done a stellar job explaining how Arnault's method works. I might've made mathematical errors, thus if you're a stickler for proofs and details go read Arnault's original 1995 paper that introduced this method. If you're confused—which is totally expected, I am confused reading what I just wrote—cross reference the following SageMath implementation with the above mathematics. It makes the method more concrete and makes more sense.

def S(a):
    return {
        p for p in range(3, 4*a + 1, 2)
        if kronecker_symbol(a, p) == -1
    }

# A is set of prime bases to be strong pseudoprime against
def strong_pseudoprime(A, ks):
    assert len(ks) == 3 # we remove this restriction later

    # subsets is dict that maps bases a to subsets of residues mod 4a
    subsets = {}
    for a in A:
        subsets[a] = list(set.intersection(*(
            {(pow(k, -1, 4*a) * (s + k - 1)) % (4*a) for s in S(a)}
            for k in ks
        )))

    # bruteforce candidate values for p_1 until CRT works
    r, m = None, None
    while True:
        # initial conditions to ensure m_i are integers
        rs = [
            pow(-ks[2], -1, ks[1]),
            pow(-ks[1], -1, ks[2])
        ] # residues
        ms = [ks[1], ks[2]] # moduli

        # choose random candidate value
        for a, s in subsets.items():
            rs.append(random.choice(s))
            ms.append(a * 4)

        # does CRT work?
        try:
            r = crt(rs, ms)
            m = lcm(ms)
        except ValueError:
            continue
        break

    # found a value for p_1! Now, find p_2, ..., p_h
    i = 0
    while True:
        p1 = m*i + r
        ps = [p1] + [
            k*(p1 - 1) + 1
            for k in ks[1:]
        ]
        if all(is_prime(p) for p in ps):
            return product(ps)
        i += 1

The code above is quite naive and only meant to illustrate the concept. It has a major limitation: $h$ must be equal to 3. I've shown the specific congruences needed for $h=3$, but haven't discussed the general case when $h=n$ yet. Let me explain that now. Arnault, in Lemma 2.2 and Remark 2.3, points out that $m_i$ is an integer if $p_1$ is a root of $f_i$ modulo $k_i$, where,

$$f_i(x) = \frac{(\prod_{j\neq i} (k_j(x-1) + 1)) - 1}{x-1}.$$

For example, when $h=3$ we have $f_2(x) = k_3x+1$. The congruence corresponds to

$$\begin{aligned} f_2(p_1) &\equiv 0 \pmod{k_2} \\ k_3p_1+1 &\equiv 0 \pmod{k_2} \\ p_1 &\equiv -k_3^{-1} \pmod{k_2} \end{aligned}$$

which is exactly the same congruence I showed you earlier for when $h=3$. Instead of hard coding these congruences to ensure $m_i$ are integers, we can model the function $f_i$ in SageMath and basically solve for $x$ on the fly to derive the congruences. Be warned—I'm not entirely confident that I've implemented this procedure correctly. I've only tested the code with $h=3$ for the most part.

A major speedup can be achieved by systematically working through each candidate value $z_a$, and backtracking as soon as the CRT fails. The following implementation finds all possible valid candidates from selected values of $k_i$ and returns them as a generator.

def S(a):
    return {
        p for p in range(3, 4*a + 1, 2)
        if kronecker_symbol(a, p) == -1
    }

def strong_pseudoprime(A, ks):
    x = var('x')
    def f(i):
        p = product(ks[j]*(x-1)+1 for j in range(len(ks)) if j != i)
        return (p - 1) / (x - 1)

    # subsets is dict that maps bases a_i to subsets of residues mod 4a
    subsets = {}
    for a in A:
        subsets[a] = list(set.intersection(*(
            {(pow(k, -1, 4*a) * (s + k - 1)) % (4*a) for s in S(a)}
            for k in ks
        )))

    indices = [0] * len(subsets)
    def bruteforce(i, r, m):
        if i == len(indices):
            yield find_pseudoprime(r, m)
            return

        for residue in subsets[A[i]]:
            try:
                r = crt([r, residue], [m, A[i] * 4])
            except ValueError:
                continue
            yield from bruteforce(i + 1, r, lcm(m, A[i] * 4))

    def find_pseudoprime(r, m):
        i = 0
        while True:
            p1 = m*i + r
            ps = [p1] + [
                k*(p1 - 1) + 1
                for k in ks[1:]
            ]
            if all(is_prime(p) for p in ps):
                return ps
            i += 1

    # initial congruences to ensure m_i is integer
    rs, ms = [], []
    for i in range(1, len(ks)):
        sol, = solve_mod(f(i).full_simplify(), ks[i])[0]
        rs.append(int(sol)); ms.append(ks[i])

    yield from bruteforce(0, crt(rs, ms), lcm(ms))

# USAGE
A = prime_range(2, 61+1)
for p in strong_pseudoprime(A, [1, 2081, 4177]):
    # assert miller_rabin(product(p), A)
    print(p)

If the code throws an exception, most likely the choices of $k_i$ are wrong. Ideally, for $h \geq 3$, we would iterate over each of the roots of $f_i$ as a possible congruence instead of simply choosing the first one. You can set the i variable in the find_pseudoprime function to a random big value like 1000000 if you require a bigger pseudoprime, which was needed to complete the CryptoHack CTF challenge.