RSA Explained (With Examples)

Motivation

RSA (Rivest-Shamir-Adleman) is one of the first public-key cryptosystems and is widely used for secure data transmission. In such a cryptosystem, the encryption key is public and is different from the decryption key which is kept secret.

If I wanted to comprehend zero knowledge proofs, then understanding the grand-daddy of public-key cryptosystems is a must.

Background Maths

Exponential Rules 1

gab=gagbga+b=gagb(ga)b=gab(ga mod p)b mod p=gab mod p

Division Theorem 2

The division theorem gives us a formal way to proof the equivalence relationship between the divident, divisor, quotient, and the remainder. i.e.

dividentquotientdivisor+remaindernkq+r

img

Example: 9/2=4 remainder 1

We can write the equation as 9=24+1.

Modulo Arithmetic

a=b mod n states that a and b both have the same remainder after division with n.

Modular arithemtic and the divisor theorem are closely related – say we have k=2,q=7,r=3, plugging those values into n=kq+r gives us n=17.

We can write that as 17=3 mod 7, or 17/7=2 remainder 3.

We can also go backwards:

17=3 mod 7

17=k7+3

More generally:

n=r mod qn=kq+r

Greatest Common Divisor (gcd)

The greatest common divisor between two numbers is the largest integer that will divide both numbers.

Example: gcd(3, 9) = 3.

If one of the numbers in the gcd is a prime number, then the gcd will always be 1.

Multiplicative Inverse

A multiplicative inverse for a number x, denoted by x1, is a number when multiplied by x yields the multiplicative identity 1.

xx1=1

In modulo arithmetic, only numbers whose gcd(x,n)=1 has a multiplicative inverse, i.e there exists

xx1=1 mod n, gcd(x,n)=1

Euler's Totient Function

In number theory, Euler's totient function counts the positive integers up to a given integer n that are relatively prime to n.

In other words, the totient function (often represented as (ϕ) for number n calculates the number of integers between 2 and n whose gcd is equal to 1.

More concretely in code:

def totient(n):
    total = 0
    for i in range(2, n):
        if gcd(i, n) == 1:
            total = total + 1
    return total

If n is a prime number, then ϕ(n)=n1.

One important thing to note is that multiplication in the totient function is associative:

ϕ(ab)=ϕ(a)ϕ(b)

This will come into use when we calculate ϕ(n) where n=pq.

Euler's Theorem 3

aϕ(n)1 mod n

The following formula above holds true if the gcd between n and a is 1 (a.k.a coprime).

RSA

Introduction

Say we have message M, with public key Pk, and secret key Sk, we can encrypt M with Pk as cipher C and decrypt C with Sk.

img

Algorithm

  1. Generate two randomly large prime numbers p, and q.
  2. Calculate n=pq.
  3. Calculate totient of n ϕ(n)=(p1)(q1).
  4. Generate public key e that satifies the two constraints:
    • 3<e<ϕ(n)
    • gcd(e,ϕ(n))=1
  5. Calculate the multiplicative inverse of e,d (this will be the private key) such that ed=1 mod ϕ(n)
  6. The generated public key is (e,n), and the generated private key is (d,n).
  7. Given message m, me mod n yields the encrypted message, and med mod n yields the decrypted message.
    • This is because ed=1 mod n
    • Therefore, med mod n=m1 mod n, giving us our original message

Why RSA Works

Calculating Modulus   n

Our modulus, n is calculated by multiplying the two prime numbers p and q. This step sort of acts like a one way function, easy to calculate n given p and q, but hard to compute p and q given n.

Whats scary to me is that computing the prime factors p and q is only considered a hard enough problem, meaning that if someone found out how to calculate p and q given n with polynomial complexity, all encryption as we know it (e.g. SSL) will break.

This is essential to RSA's security as given a composite number (n), it is considered a hard problem to determine the prime factors (p, q).

Totient Function   ϕ(n)

ϕ(n)=(p1)(q1) when p,q is prime. This is because the totient of a prime number p is simply p1, and that multiplication is associative in the totient function (10).

Public Key   e

Public Key e is a number that is randomly chosen between 3<e<ϕ(n), and has to satisfy gcd(e,ϕ(n))=1. We need gcd(e,ϕ(n))=1 in order for a multiplicative inverse (our secret key) to exist (9).

Secret Key   d

The secret key d is calculated using the formula: de=1 mod ϕ(n). This process can be calculated using the extended euclidean algorithm 4 given parameters e and ϕ(n)

Why RSA satifies proof of correctness even though the key generation is based on   mod ϕ(n) and not mod n

Given message m, public key e, and secret key d.

To encrypt a message we do: me mod n

And to decrypt said encrypted message, we do: med mod n
Our key generation uses the following formula: de=1 mod ϕ(n)
Which can be rewritten as (re: equation 8): de=kϕ(n)+1
Substituting equation 15 into equation 13 yields us: mkϕ(n)+1 mod n
Which can be rewritten as (re: exponential rules 1): (mϕ(n))km1 mod n
Rewriting using Euler's Theorem (aϕ(n)=1 mod n, equation 11) gives us: (1 mod n)km1 mod n1km1 mod n
And so, we have just proved that RSA satisfies the proof of correctness even though the key generation is based on mod ϕ(n) and not n med mod n=m1 mod n

Conclusion

You don't need to be a math wizard to understand RSA ;-)

Example

"""
2019-01-06 Kendrick Tan
RSA

Rivest–Shamir–Adleman (RSA) is a process that allows
two parties to exchange secret information within
each other over an insecure line (e.g. the internet)

Party A sends Party B it's public key.
Party B uses the public key to encrypt the message they want to send
Party A receives encrypted message, decrypts it using their private key
"""

def gcd(a, b):
    """
    Greatest Common Divisor
    """
    m = min(a, b)

    for i in range(m, 0, -1):
        if a % i == 0 and b % i == 0:
            return i

    return 1

def xgcd(a, b):
    """
    Extended Euclidean Distance

    return (g, x, y) such that a*x + b*y = g = gcd(x, y)
    """
    x0, x1, y0, y1 = 0, 1, 1, 0
    while a != 0:
        q, b, a = b // a, a, b % a
        y0, y1 = y1, y0 - q * y1
        x0, x1 = x1, x0 - q * x1
    return b, x0, y0

def encrypt(msg, e, n):
    return ''.join([chr(ord(c)**e % n) for c in msg])

def decrypt(msg, d, n):
    return ''.join([chr(ord(c)**d % n) for c in msg])

## 1. Choose two distinct prime numbers p and q
p = 23
q = 31

## 2. Calculate n = p*q
n = p*q

## 3. Calculate the totient: phi(n) = (p - 1)*(q - 1)
phi_n = (p - 1) * (q - 1)

## 4.1 Choose integer e such that 1 < e < phi_n
e = 7
assert 1 < e < phi_n

## 4.2 Assert greatest-common-divisor (gcd) between e and phi_n = 1
## i.e. e and phi_n share no factors other than 1
assert gcd(e, phi_n) == 1

## 5. Compure d to satisgy the congruence relation d * e = 1 mod phi_n
## i.e. de = 1 + k * phi_n

## goal is to find d such that e*d = 1 mod phi_n
## EED calculates x and y such that ax + by = gcd(a, b)
## Let a = e, b = phi_n, therefore:
## gcd(e, phi_n) = 1 
## is equal to
## e*x + phi_n*y = 1
## take mod phi_n
## (e*x + phi_ny*y) mod phi_n = 1 mod phi_n
## = e*x = 1 mod phi_n
_, d, _ = xgcd(e, phi_n)

assert (d * e % phi_n) == 1

## 6. Encrypt a message using the public key (e)
## c = m**e % n
orig_msg = 'hello world'
enc_msg = encrypt(orig_msg, e, n)
assert orig_msg != enc_msg

## 7. Decrypt number using the private key (d)
## m = c**e % n
dec_msg = decrypt(enc_msg, d, n)

assert orig_msg == dec_msg

print(f'original message: {orig_msg}')
print(f'encrypted message: {enc_msg}')
print(f'decrypted message: {dec_msg}')

"""
This works because we know that:

d*e = 1 mod phi_n
d*e = k*phi_n + 1

c = m**e mod n
m = c**d mod n (sub c)
  = (m**e mod n)**d mod n
  = m**(d * e) mod n
  = m**(k*phi_n + 1) mod n
  = (m**(phi_n)**k)*m**1 mod n # note: m^(phi_n) = 1 mod n
  = (1 mod n)**k * m**1 mod n
  = 1**k * m**1 mod n
  = m mod n

https://crypto.stackexchange.com/questions/1789/why-is-rsa-encryption-key-based-on-modulo-varphin-rather-than-modulo-n
"""

  1. Algebra Basics - Exponents In Depth ↩︎

  2. Division Theorem - Proofwiki ↩︎

  3. Euler's Theorem ↩︎

  4. Extended Euclidean Algorithm ↩︎