2008.7.27 Prime factor notation
Click here to show an explanation of this program. >
Over a year ago (specifically, early April 2007, as indicated by dated notes on the matter), I was ruminating on how every natural number has its own unique prime factorization. For instance,
60 = 22 × 3 × 5
and this is 60's only prime factorization, and it does not share it with any other number. I realized that every natural number can be recursively defined, based on its prime factorization, with three operations: multiplication, exponentiation, and priming, represented below by pn, where pn denotes the nth prime. Following are a few examples, starting at the beginning:
number | notation | comments |
---|---|---|
1 | 1 has no prime factors | |
2 | p | 2 = p1 |
3 | pp | 3 = p2 |
4 | pp | 4 = 22 |
5 | ppp | 5 = p3 |
6 | p×p | 6 = 2 × 3 = p1 × p2 = p1 × p(the 1st number after 1) |
7 | p(pp) | 7 = p4 |
8 | ppp | 8 = 23 |
9 | (pp)p | 9 = 32 |
10 | p×pp | 10 = 2 × 5 = p1 × p3 = p1 × p(the 2nd number after 1) |
11 | pppp | 11 = p5 |
12 | pp × p | 12 = 22 × 3 = p1p1 × p2 = p1p1 × p(the 1st number after 1) |
13 | p(p×p) | 13 = p6 |
14 | p×ppp | 14 = 2 × 7 = p1 × p4 = p1 × p(the 3rd number after 1) |
15 | pp×p | 15 = 3 × 5 = p2 × p3 = p2 × p(the 1st number after 2) |
Note how, for greater efficiency, when denoting pm×pn, I instead write pm×p(n − m). The lowest case of this is 6, as see above. Here, we know that p×p is not 2×2 = 4, because that's covered by exponentiation as pp. Similarly,
p×p×p = p(1) × p(1 + 1) × p((1 + 1) + 1) = p1 × p2 × p3 = 2 × 3 × 5 = 30
— it does NOT denote 2 × 2 × 2 = 8, because 8 = 23 → ppp.
Being one to seek out economy of means, I eventually figured out a way to convert this to a notation with only three characters, rather than the four of p, ×, (, and ), plus the superscripting for exponentiation, which essentially counts as a fifth. This is how it works:
- pn → -n
- mn → |m n
- m × n → | m n
This is an example of prefix notation (thanks Dan for apprising me of the concept), where because we know how many arguments each operation takes (priming takes one, exponentiation takes two, multiplication takes two if you use grouping such as p×p×p → (p×p)×p ), the need for parentheses is eliminated. Following are the first few examples. Note that we can tell multiplication and exponentiation apart simply by whether there's a space following the |.
number | notation |
---|---|
1 | |
2 | - |
3 | -- |
4 | |- - |
5 | --- |
6 | | - - |
7 | -|- - |
8 | |- -- |
9 | |-- - |
10 | | - -- |
11 | ---- |
12 | | |- - - |
13 | -| - - |
14 | | - --- |
15 | | -- - |
If that's confusing at first, consider the case of 12 = | |- - -. Starting from the left:
- We first see | with a space after it, which means multiplication, so we're looking for two arguments after it.
- The first argument starts with the next character we hit, which is |, but this | is not followed by a space, so it represents exponentiation, and it therefore has two arguments as well.
- The first one is -, and we know that this is the entire first argument because it's followed by a space, whereas if it were followed by another - or by |, it would be acting on that character.
- So the second argument for the exponentiation is the next -, and again this is the whole argument because it's followed by a space.
- We now have the two exponentiation arguments, so the exponentiation (which, remember, is the multiplication's first argument) is closed.
- That means that the next character, the last -, is the multiplication's second argument.
So: what we end up with is pp×p = 12.
This thought experiment lay fallow over the past year, and then recently I took another look at it. I ended up coming up with a notation system that uses five (or six) characters but is very compact. Here is how the original notation converts to the new notation:
- p(m)n + 1 → m.n
- pm × p(m + n) → m,n
- ppppn → p4n → n-4 (etc.)
After this, optionally, -- is converted to + from the righthand side. Note that, since the first argument of exponentiation is always a prime, and since all arguments of multiplication are always prime, this notation makes that priming implicit. Meanwhile, since an exponent is always 2 or greater, it is subtracted by 1 for display, again for greater efficiency. Finally, any repeating sequence of one or more "(" at the beginning is stripped away, likewise for any ")" sequence at the end, because this is unambiguous. Here are the first few examples:
number | notation | comments |
---|---|---|
1 | 1 has no prime factors | |
2 | - | 2 = p1 |
3 | + | = --; 3 = p21 |
4 | . | 4 = p11 + 1 |
5 | -+ | = ---; 5 = p31 |
6 | , | 6 = 2 × 3 = p1 × p2 = p1 × p(the 1st number after 1) |
7 | .)- | 7 = p4 |
8 | .- | 8 = 23 = p11 + p1 |
9 | -. | 9 = 32 = p(2)1 + 1 |
10 | ,- | 10 = 2 × 5 = p1 × p3 = p1 × p(the 2nd number after 1) |
11 | -(. | 11 = p41 |
12 | ., | 12 = 22 × 3 = p1p1 × p2 = p1p1 × p(the 1st number after 1) |
13 | ,)- | 13 = p6 |
14 | .+ | 14 = 2 × 7 = p1 × p4 = p1 × p(the 3rd number after 1) |
15 | -. | 15 = 3 × 5 = p2 × p3 = p(2) × p(the 1st number after 2) |
You can see from the above how much more compact this is than the prefix notation.
The program on this page, then, outputs the word in this notation for a number (or range of numbers) that you input. You can turn off the trimming of "(" and ")" sequences, as well as the conversion of -- to +, and you can also change the characters used for notation: if you want, you can use multi-character strings — see what happens, for instance, if you substitute fragments of natural language. It took a lot of learning about recursive programming, as well as a lot (a whole lot) of thought about the exact nature of this notation system and how to reflect it in data structures and values, to get this to work. Now, though, it's bug-free as far as I can tell (and I did a LOT of debugging while writing it).
The program also outputs a sequence of whole numbers that it has used to build the word(s). The gist of the sequence is as follows, where seq(m) means a sub-sequence that represents m and seq(n) one that represents n, and x+seq means that x is added componentwise to each member of seq.
- p(m)n + 1 → 2+seq(m) 1 2+seq(n)
- pm × p(m + n) → 2+seq(m) 0 2+seq(n)
- pan → 2a+seq(n)
Here are the first few sequences:
number | sequence |
---|---|
1 | 0 |
2 | 2 |
3 | 4 |
4 | 2 1 2 |
5 | 6 |
6 | 2 0 2 |
7 | 4 3 4 |
8 | 2 1 4 |
9 | 4 1 2 |
10 | 2 0 4 |
11 | 8 |
12 | 2 1 2 0 2 |
13 | 4 2 4 |
14 | 2 0 6 |
15 | 4 0 2 |
With luck, that gives you a general idea of how the numeric sequences work. For both the sequences and the words, the best way to understand them is to start with low numbers and see what happens as you go higher. Finally, if processing just a single number, the program also outputs a log of the recursive factorization process, which can probably help you to understand the word and sequence for that number more clearly.
That's about it. Go at it!