Cryptography and the RSA Cryptosystem
Published June 03, 2021
Goals
There are two goals of this essay:
- Communicate the RSA cryptosystem at a level accessible to any software engineer, security professional, or undergraduate math/cs student.
- Be self-contained. Every assertion, mathematical or otherwise, is either proven within the essay itself or is taken to be trivial. In some special cases, I have linked to external sources; however, this will likely change in the future.
Fun fact: Ron Rivest (the "R" in RSA) has actually read this blog post, and he approves! 😎 Thanks, Ron!
I have yet to find online educational materials on RSA that accomplish both of these goals simultaneously. For software engineers and security professionals especially, I believe it is empowering to trace RSA from it's implementation down to the underlying mathematics. It's one thing to trust smart people to have made a robust cryposystem; it's another to verify it's robustness for yourself!
We'll start with basic modular arithmetic, then create a small library of useful tools, and eventually build up to a fully functional python program that mimics the RSA public/private key generation capabilities of utilities like GNU's openssl and gpg.
Intro to RSA
RSA is an asymmetric cryptosystem, which offers some key benefits over a symmetric cryptosystem like Diffie-Hellman. Perhaps the most notable advantage is that distinct secret keys need not be generated for every new recipient of your cryptographic messages. RSA essentially has you create a single "lock" (with one associated key) with which others can encrypt their message to you. Luckily, the concepts here are a superset of the concepts in Diffie-Hellman, so if you understand RSA, you get Diffie-Hellman for free.
At its very core, the cryptosystem is thus:
With message , public exponent (typically 65537), private key , and , where and are large primes, messages are encrypted and decrypted via the following protocol.
- The encrypted message is obtained via .
- The private key is generated by choosing such that .
- The public key is the combination of and . (In actual implementations of RSA, the public key may contain other numbers that speed up decryption; i.e. precomputation during key generation removes some computational burden at the time of decryption. This is not strictly neccessary, however, and we do not include this step in our implementation).
The message is decrypted by raising the encrypted message to the power of , since
We'll prove that this holds mathematically, and then find a way to generate large primes to implement our own version of RSA.
Proof
We want to show that message == decrypt(encrypt(message)); that is, . By definition, , so
We also chose such that , so
Substituting,
There are two cases we need to prove.
- Case 1:
- Case 2:
Case 1
Case 1 follows directly from Euler's formula, which states that .
Case 2
Case 2 is more involved, which is ironic considering that the only time we'd face it is in the case that is exactly or . Since is taken to be strictly less than , if , then or . We'll prove this case for , and by symmetry it will be proven for . First, we claim that , and then we claim .
Claim 1:
so by modular transitivity.
Claim 2:
Since , . Therefore, Euler's theorem tells us that
From (2),
So, , since
Finally, we turn to the Constant-Case Chinese Remainder Theorem, which states that given coprime and ,
Fortunately, the Constant-Case Chinese Remainder Theorem can be proven in 1 line (below proof is the forward direction; the reverse direction is trivial). Following from Euclid's Lemma,
So, since and (claim 1 and 2),
Pivoting to Python
One of the tools we'll neeed to implement this cryptosystem is a fast
method for computing modular exponentiation with very large numbers.
Luckily, python handles large numbers natively, and provides a
little-known 3rd argument option to the builtin pow()
function which
is the modulo. Still, I wrote my own function to do this, which I rather
like. I won't step through the code in this essay, but it can be rather
rewarding to reverse engineer the math buried therein.
def large_mod(a, x, p):
pwr2 = {1:a%p}
total = 1 if not 1 & x else pwr2[1]
i = 2
while i <= x:
pwr2[i] = pow(pwr2[i>>1],2) % p
if i & x:
total = (total * pwr2[i]) % p
i <<= 1
return total % p
To start, we create some helper functions such as the one below to help print out large numbers. (I've chosen to use ascii escape sequences for a colored output).
def bign_print(x, name):
spaces = " " * (18 - len(name))
print(f"{name}:{spaces}{acol.YEL}" + str(x)[:18] + f"...{acol.END} ({len(str(x))} digits)")
Next, we'll write the function for generating a public + private key
pair. I've chosen to write them out to files here, using PEM style
header and footer lines, "--- BEGIN PUBLIC KEY ---"
. We could choose
any number of encodings for the keys; here I simply convert their base
10 representations into utf-8 strings separated by a "." in the case
of the public key. The PEM format base64 encodes the numbers.
def gen_keys():
# public key
p, q = gen_large_primes(min_bitlen=1024)
n = p * q
phin = (p - 1) * (q - 1)
e = 65537
if (math.gcd(e, phin) != 1):
raise Exception("e and phi(n) are not coprime")
with open("./publickey.pem", "w") as f:
f.write("--- BEGIN PUBLIC KEY ---\n")
f.write(str(e) + "." + str(n))
f.write("\n--- END PUBLIC KEY ---\n")
# private key
for k in range(1, e):
d = (k * phin + 1) // e
if (d * e) % phin == 1:
break
else:
raise Exception("Could not find k s.t. e*d ≡ 1 (mod phi(n))")
with open("./privatekey.pem", "w") as f:
f.write("--- BEGIN PRIVATE KEY ---\n")
f.write(str(d))
f.write("\n--- END PRIVATE KEY ---\n")
Thanks to the mathematics we've already proven, encryption and decryption are trivial. Here, I read the secret message from a file.
def encrypt_message(e,n):
with open("./msg.txt", "r") as f:
msg = f.read()
msg_encoded = int(msg.encode("utf-8").hex(), 16)
# c is the encrypted message
c = pow(msg_encoded,e,n)
with open("./msg.asc", "w") as f:
f.write("--- BEGIN MESSAGE ---\n")
f.write(str(c))
f.write("\n--- END MESSAGE ---\n")
def decrypt_message(d,n):
with open("./msg.asc", "r") as f:
c = f.read().replace("--- BEGIN MESSAGE ---","") \
.replace("--- END MESSAGE ---","") \
.replace("\n", "")
msg_encoded = hex(pow(int(c),d,n))
msg = bytes.fromhex(msg_encoded[2:]).decode("utf-8")
with open("./msg_decrypted.txt", "w") as f:
f.write(msg)
You may have noticed that there is still one piece of the puzzle
missing, which is the mysterious gen_large_primes()
function. All it
needs to do is generate some large prime numbers (preferrably > 1024
bits in length); however, deterministic primality tests are slow,
especially for numbers at the scale we're using. Luckily, the
Rabin-Miller test can get us probablistically very close, and fast.
Ironically, the integrity of RSA depends entirely on the computational
difficulty of factoring large numbers, but the Rabin-Miller test takes a
different approach.
Below is my implementation, pulled from the pseudocode on this wikipedia page.
def rabin_miller(n, s):
# n is an odd number greater than 3
# i.e. n - 1 = d * 2^r with d odd
if n % 2 == 0:
return True
r, d = 0, n - 1
while d % 2 == 0:
r += 1
d //= 2
for _ in range(s):
a = random.randint(2, n - 2)
x = pow(a, d, n)
if x == 1 or x == n - 1:
continue
for _ in range(r - 1):
x = pow(x, 2, n)
if x == n - 1:
break
else:
return False
return True
In this algorithm, we are aiming to prove the primalty of using security parameter (the number of rounds to perform the test). According to this Stack Overflow post, the optimal value for is 40. Choosing values greater than 40 would achieve greater certainty than is possible for computers due to naturally occurring computation errors.
The test hinges on the following theorem:
Put , with odd. If and for all , for some , then is composite.
We can prove the contrapositive relatively easily.
Proof
Let be an odd prime, and have , with odd. Consider the set
with . The last term is , since , and Fermat's Little theorem states that . Since each term in the series is the square of the previous term, we must consider 2 cases: and .
Case 1
If , then all terms are , and we are done, since .
Case 2
If , then at least some term in the series is not congruent to . Let's look at the solutions to the equation
There are two solutions: and . This means that if there is a member of the list not congruent to , has to show up somewhere, since no other number will square to be . This completes the proof.
Now we can write our gen_large_primes()
function.
def gen_large_primes(min_bitlen):
p = random.getrandbits(random.randint(min_bitlen, min_bitlen*2))
q = random.getrandbits(random.randint(min_bitlen, min_bitlen*2))
p = p + 1 if p % 2 == 0 else p
q = q + 1 if q % 2 == 0 else q
while not rabin_miller(p, 40):
p = p + 2
while not rabin_miller(q, 40):
q = q + 2
And we're done! You can find this code along with an "Interactive Mode" that steps through the algorithm on my github page.
TODO: The following proofs are missing from this essay but should be added to make it truly self-contained
- Euler's Theorem
- Modular Transitivity
- Euclid's Lemma
- Rabin-Miller Primality Test false positive rate
- Optimal Security Parameter for R-M
Note: this essay is still a work in progress. Have any feedback? Feel free to email me at brendan.schlaman@gmail.com.