Number Theory#
Contents#
Arithmetic#
The bit complexity of numbers#
We focus on bit complexity (of algorithms), the number of elementary operations on individual bits, because it reflects the amount of hardware, transistors and wires, necessary for implementing an algorithm.
It is a basic property of numbers that for any base \(b \ge 2\) the sum of any three single-digit numbers is at most two digits long.
\( \begin{aligned} b &= 2 && & 1 + 1 + 1 &= 11 &&= 1 \times 2^1 + 1 \times 2^0 \\ b &= 3 && & 2 + 2 + 2 &= 20 &&= 2 \times 3^1 + 0 \times 3^0 \\ b &= 10 && & 9 + 9 + 9 &= 27 &&= 2 \times 10^1 + 7 \times 10^0 \\ b &= 16 && & F + F + F &= 2D &&= 2 \times 16^1 + 13 \times 16^0 \\ b &= 60 && & \lambda_{59} + \lambda_{59} + \lambda_{59} &= 2\lambda_{57} &&= 2 \times 60^1 + 57 \times 60^0 \\ \end {aligned} \)
How many digits are needed to represent the number \(N \ge 0\) in base \(b\)? \(\lceil \log_b (N + 1) \rceil\) digits are needed to represent \(N\) in base \(b\).
With \(k \ge 1\) digits in base \(b\) we can express numbers up to \(b^k - 1\).
base \(2\)
- with \(1\) digits in base \(2\) we can express numbers up to \(2^1 - 1 = 1\) 
- with \(2\) digits in base \(2\) we can express numbers up to \(2^2 - 1 = 3\) 
- with \(3\) digits in base \(2\) we can express numbers up to \(2^3 - 1 = 7\) 
- with \(4\) digits in base \(2\) we can express numbers up to \(2^4 - 1 = 15\) 
base \(10\)
- with \(1\) digits in base \(10\) we can express numbers up to \(10^1 - 1 = 9\) 
- with \(2\) digits in base \(10\) we can express numbers up to \(10^2 - 1 = 99\) 
- with \(3\) digits in base \(10\) we can express numbers up to \(10^3 - 1 = 999\) 
- with \(4\) digits in base \(10\) we can express numbers up to \(10^4 - 1 = 9999\) 
\( \begin{aligned} b^k - 1 &= N \\ b^k &= N + 1 \\ \log_b b^k &= \lceil \log_b (N + 1) \rceil \\ k &= \lceil \log_b (N + 1) \rceil \\ \end {aligned} \)
Base \(2\)
\( \begin{aligned} \lceil \log_2 (\textcolor{green}{ 0} + 1) \rceil &= 1 \text{ b} \\ \lceil \log_2 (\textcolor{green}{ 1} + 1) \rceil &= 1 \text{ b} \\ \lceil \log_2 (\textcolor{green}{ 2} + 1) \rceil &= 2 \text{ b} \\ \lceil \log_2 (\textcolor{green}{ 3} + 1) \rceil &= 2 \text{ b} \\ \lceil \log_2 (\textcolor{green}{ 4} + 1) \rceil &= 3 \text{ b} \\ &\vdots \\ \lceil \log_2 (\textcolor{green}{ 7} + 1) \rceil &= 3 \text{ b} \\ \lceil \log_2 (\textcolor{green}{ 8} + 1) \rceil &= 4 \text{ b} \\ &\vdots \\ \lceil \log_2 (\textcolor{green}{15} + 1) \rceil &= 4 \text{ b} \\ \lceil \log_2 (\textcolor{green}{16} + 1) \rceil &= 5 \text{ b} \\ &\vdots \\ \lceil \log_2 (\textcolor{green}{31} + 1) \rceil &= 5 \text{ b} \\ \lceil \log_2 (\textcolor{green}{32} + 1) \rceil &= 6 \text{ b} \\ &\vdots \\ \lceil \log_2 (\textcolor{green}{63} + 1) \rceil &= 6 \text{ b} \\ \lceil \log_2 (\textcolor{green}{64} + 1) \rceil &= 7 \text{ b} \\ \end {aligned} \)
How much does the size of a number change when its base changes?
The rule for converting logarithms from base \(a\) to base \(b\):
\(\log_b N = \frac{1}{\log_a b} \log_a N \iff \log_a N = \log_b N \times \log_a b\)
The size of integer \(N\) in base \(a\) is the same as its size in base \(b\) times a constant factor \(\log_a b\). Therefore, in big-\(O\) notation the base is irrelevant and the size is written simply as \(O(\log N)\)
\(\log N\) is the power to which \(2\) must be raised to obtain \(N\).
\(\lceil \log N \rceil\) is the number of times \(N\) must be halved to obtain unity. This will be useful when a number is halved at each iteration of an algorithm.
\(\lceil \log (N + 1) \rceil\) is the number of bits in the binary representation of \(N\).
\(\lfloor \log N \rfloor\) is the depth of a complete binary tree with \(N\) nodes.
\(\log N = 1 + 1/2 + 1/3 + \dotsb + 1/N\) to within a constant factor.
Addition#
The rule that the sum of any three single-digit numbers is at most two digits allows us to add two numbers in any base in the following way: align their right-hand ends and then perform a single right-to-left pass in which the sum is computed digit by digit, maintaining the overflow as a carry. Since we know the individual sum is a two-digit number, the carry is always a single digit, and so at any given step, three single-digit numbers are added.
Carry  1     1 1 1
         1 1 0 1 0 1 (53)
         1 0 0 0 1 1 (35)
       -------------
       1 0 1 1 0 0 0 (88)
Complexity
Suppose both \(x\) and \(y\) are \(n\) bits long. Then \(x + y\) is at most \(n + 1\) bits long and each bit of this sum is computed in a fixed amount of time.
\(T(n) = c_1n + c_0 = O(n)\)
Is there a faster algorithm? In order to add two \(n\)-bit numbers they must at least be read and their sum written, which require n operations. Therefore, the addition algorithm is optimal up to multiplicative constants.
It is true that in a single processor instruction integers can be added whose size in bits is within the word length of today’s computers–\(32\) perhaps. But it is often useful and necessary to handle numbers much larger than this, perhaps several thousand bits long. Adding and multiplying such large numbers on real computers is very much like performing the operations bit by bit.
Implementation
#include <stdio.h>
int main (void) {
  long b1;
  long b2;
  int i = 0;
  int r = 0; // remainder
  int sum[20];
  printf("Enter the first  binary number: "); scanf("%ld", &b1);
  printf("Enter the second binary number: "); scanf("%ld", &b2);
  while (b1 != 0 || b2 != 0) {
    sum[i++] = (b1 % 10 + b2 % 10 + r) % 2;
    r        = (b1 % 10 + b2 % 10 + r) / 2;
    b1 /= 10;
    b2 /= 10;
  }
  if (r != 0)
    sum[i++] = r;
  --i;
  printf("Sum: ");
  while (i >= 0)
    printf("%d", sum[i--]);
  printf("\n");
  return 0;
}
Multiplication#
Grade-School Binary Multiplication Algorithm#
Multiply two numbers \(x\) and \(y\). Create an array of intermediate sums each representing the product of \(x\) by a single digit of \(y\). These values are appropriately left-shifted and then added up.
\(13 \times 11\) or \(x = 1101\) and \(y = 1011\)
        1 1 0 1 (13, multiplicand)
      x 1 0 1 1 (11, multiplier)
      ---------
        1 1 0 1 (1101 times 1)                  13 x 2^0 =  13
      1 1 0 1   (1101 times 1, shifted once)    13 x 2^1 =  26
    0 0 0 0     (1101 times 0, shifted twice)    0 x 2^2 =   0
+ 1 1 0 1       (1101 times 1, shifted thrice)  13 x 2^3 = 104
---------------
1 0 0 0 1 1 1 1 (binary 143)
In binary each intermediate row is either zero or \(x\) itself, left-shifted an appropriate amount of times. Left-shifting multiplies by the base and right-shifting divides by the base, rounding down if necessary.
Correctness
Complexity
If \(x\) and \(y\) are both \(n\) bits then there are \(n\) intermediate rows with lengths of up to \(2n\) bits, accounting for the shifts. The total time taken to add up these rows, two numbers at a time, is
\(\underbrace{O(n) + O(n) + \dotsb + O(n)}_{n - 1 \text{ times}} = O(n^2)\)
Al Khwarizmi’s multiplication by repeated halvings of the multiplier#
Multiply two decimal numbers \(x\) and \(y\). Write them next to each other and then repeat the following: divide the first number by \(2\), rounding down the result, and double the second number; continue until the first number reaches unity and then strike out all the rows in which the first number is even and add up whatever remains in the second column.
                      parity of multiplier 11
11    13              11 = 5 x 2 +  1
 5    26               5 = 2 x 2 +  1
 2    52 (strike out)  2 = 1 x 2 +  0
 1   104               1 = 0 x 2 +  1
--------
     143
Multiplication à la Français#
This algorithm implements the following recursive rule.
\( x \cdot y = \begin{cases} 2(x \cdot \lfloor y/2 \rfloor) & \text{if \textit{y} is even} \\ x + 2(x \cdot \lfloor y/2 \rfloor) & \text{if \textit{y} is odd} \\ \end {cases} \)
 INPUT    two n-bit integers x and y, where y >= 0
OUTPUT    their product
MULT (x, y)
1    if y = 0
2        return 0
3
4    z = MULT(x, floor(y/2))
5
6    if even(y)
7        return 2z
8    else
9        return 2z + x
EXAMPLE
algorithm decomposition
\(x \cdot 25 = x \cdot 16 + x \cdot 8 + x \cdot 1\)
For multiplication the terms \(x \cdot 2^i\) come from repeated doubling
Correctness
The recursive rule is transparently correct. Checking that the algorithm is correct is a matter of verifying that it mimics the rule and that it handles the base case \(y = 0\) properly.
Complexity
The algorithm terminates after \(n\) recursive calls because at each call \(y\) is halved (i.e., its number of bits is decreased by one). Each recursive call requires the following operations:
- a division by \(2\) (a right shift) 
- a test for odd/even (looking up the last bit) 
- a multiplication by \(2\) (a left shift) 
- possibly, one addition 
a total of \(O(n)\) bit operations. The total running time is
\(T(n) = O(n^2)\)
Can we do better? Intuitively, it seems that multiplication requires adding about \(n\) multiples of one of the inputs, and we know that each addition is linear, so it would appear that \(n^2\) bit operations are inevitable. But we will see that we can de better.
Division#
To divide an integer \(x\) by an integer \(y \ne 0\) means to find a quotient \(q\) and a remainder \(r\), where \(x = yq + r\) and \(r \lt y\).
 INPUT    two n-bit integers x and y, where y >= 1
OUTPUT    the quotient and the remainder of x divided by y
DIVIDE (x, y)
 1    if x = 0
 2        return (q, r) = (0, 0)
 3
 4    (q, r) = DIVIDE(floor(x/2), y)
 5    q = 2q
 6    r = 2r
 7    if odd(x)
 8        r = r + 1
 9    if r >= y
10        r = r - y
11        q = q + 1
12    return (q, r)
Complexity Like multiplication, division takes quadratic time.
\(T(n) = O(n^2)\)
\( \begin{aligned} 2^{32} - 1 &= 4294967295 &&= 3 \times 5 \times 17 \times 257 \times 65537 \\ 2^{53} - 1 &= 9007199254740991 &&= 6361 \times 69431 \times 20394401 \\ 2^{64} - 1 &= 18446744073709551615 &&= 3 \times 5 \times 17 \times 257 \times 641 \times 65537 \times 6700417 \\ \end {aligned} \)
What’s the largest value that \(n\) bits can hold?
print(f'                 largest value')
print(f' 16-bit numbers {2** 16 - 1:55} = 2^ 16 - 1')
print(f' 20-bit numbers {2** 20 - 1:55} = 2^ 20 - 1')
print(f' 32-bit numbers {2** 32 - 1:55} = 2^ 32 - 1')
print(f' 40-bit numbers {2** 40 - 1:55} = 2^ 40 - 1')
print(f' 52-bit numbers {2** 52 - 1:55} = 2^ 52 - 1')
print(f' 53-bit numbers {2** 53 - 1:55} = 2^ 53 - 1')
print(f' 64-bit numbers {2** 64 - 1:55} = 2^ 64 - 1')
print(f'128-bit numbers {2**128 - 1:55} = 2^128 - 1')
                 largest value
 16-bit numbers                                                   65535 = 2^ 16 - 1
 20-bit numbers                                                 1048575 = 2^ 20 - 1
 32-bit numbers                                              4294967295 = 2^ 32 - 1
 40-bit numbers                                           1099511627775 = 2^ 40 - 1
 52-bit numbers                                        4503599627370495 = 2^ 52 - 1
 53-bit numbers                                        9007199254740991 = 2^ 53 - 1
 64-bit numbers                                    18446744073709551615 = 2^ 64 - 1
128-bit numbers                 340282366920938463463374607431768211455 = 2^128 - 1
Theorem
If \(p\) is prime then \(2^{p - 1}\) leaves the remainder \(1\) on division by \(p\).
Example
\(2^{1000}\) leaves the remainder \(562\) on division by \(1001\) so \(1001\) is composite.
\(1001 = 7 \times 11 \times 13\)
\(2^{k^k}\) can be computed by successive squaring
\( \begin{aligned} 1000 &= 2^3 + 2^5 + 2^6 + 2^7 + 2^8 + 2^9 \\ 2^{1000} &= 2^{2^3} \times 2^{2^5} \times 2^{2^6} \times 2^{2^7} \times 2^{2^8} \times 2^{2^9} \\ 2^{2^3} &= 256 \\ 2^{2^4} &= 256^2 = 65536 \equiv 471 \mod 1001 \\ 2^{2^5} &= 471^2 = 221841 \equiv 620 \mod 1001 \\ 2^{2^3} \times 2^{2^5} &\equiv 256 \times 620 = 158720 \equiv 562 \mod 1001 \\ 2^{2^6} &\equiv 620^2 = 384400 \equiv 16 \mod 1001 \\ 2^{2^3} \times 2^{2^5} \times 2^{2^6} &\equiv 562 \times 16 = 8992 \equiv 984 \mod 1001 \\ 2^{2^7} &\equiv 16^2 \equiv 256 \mod 1001 \\ 2^{2^3} \times 2^{2^5} \times 2^{2^6} \times 2^{2^7} &\equiv 984 \times 256 = 251904 \equiv 653 \mod 1001 \\ 2^{2^8} &\equiv 471 \mod 1001 \\ 2^{2^3} \times 2^{2^5} \times 2^{2^6} \times 2^{2^7} \times 2^{2^8} &\equiv 653 \times 471 = 307563 \equiv 256 \\ 2^{2^9} &\equiv 620 \\ 2^{1000} &= 2^{2^3} \times 2^{2^5} \times 2^{2^6} \times 2^{2^7} \times 2^{2^8} \times 2^{2^9} \equiv 620 \times 256 = 167168 \equiv 562 \\ \end {aligned} \)
Any programming language which can do double precision can compute \(2^{p - 1} \mod p\) in linear time.
Induction
Inductive definition of \(\mathbb{N}\): \(1 \in \mathbb{N}\) is the least member of \(\mathbb{N}\). Given any \(n \in \mathbb{N}\) the next member is \(n + 1\).