| CAN | YOU | KE E P | A | SE C R E T ? |
| nxz | bty | cllw | x | flnolu? |
Julius Caesar used to protect his writing from unauthorized readers by replacing each letter by the letter that occurs three positions later. So every A would be replaced by a D, every B by an E, and so on. Here is a complete table of Caesar's cipher. We'll use lowercase letters for cipher and uppercase for plain text.
| |||||||||||||
|
|
|
| |||||||||||||
| |||||||||||||||||||||||
| |||||||||||||
| ||||||||||||||||||
| |||||||||||||
| |||||||||||||
Problems
Perhaps the most famous literary cipher appears in Edgar Allan Poe's short story, The Gold-Bug. The message was on a piece of parchment and was supposed to reveal the location of Captain Kidd's treasure. ''The following characters were rudely traced, in a red tint, between the death's head and the goat:
53f f f 305))6*;4826)4f .)4f );806*;48
f 8¶ 60))85;1f (;:f *8f 83(88)5*f ;46(;88*
96*?;8)*f (;485);5*f 2:*f (;4956*2(5*
-4)8¶ 8*;4069285);)6f 8)4f f ;1(f 9;48
081;8:8f 1;48f 85;4)485f 528806*81(f
9;48;(88;4(f ?34;48)4f ;161;:188;f ?;
What does the message say? The decipherer in the story, William Legrand, first applies a letter-frequency approach. He counts how many times each letter appears in the cipher.
| Letter | 8 | ; | 4 | * | 5 | 6 | ( | f 1 | 0 | 9 2 | : 3 | ? | ¶ | - . | |
| Count | 33 | 26 | 19 | 16 | 13 | 12 | 11 | 10 | 8 | 6 | 5 | 4 | 3 | 2 | 1 |
Legrand goes on, ''Now, in English, the letter which most frequently occurs is e. Afterwards, the succession runs thus: a o i d h n r s t u y c f g l m w b k p q x z.'' So Legrand makes the assumption that 8 stands for e. He then looks for the, the most common word in the English language, and notes that the combination ;48 occurs seven times. So he assumes that ; stands for t and 4 for h.
He then looks at the sequence ;48;(88;4 that appears near the end. We know
all but one letter: thet(eeth. Legrand tries all possible letters for the
'(' and concludes that the 'th' at the end must start another word, and that
'(' stands for r. Concentrating on this part of the code, he sees
that ;48;(88;4(f ?34;48 is 'the tree thrf ?3h the' and says that it is evident that the word 'thr¼h'
is 'through'. So f ?3 represents oug.
Then he spots the 83(88 on the second line. This decodes to 'egree', which
he says is plainly the conclusion of the word 'degree', so f represents d. A little further on he sees ;46(;88*, that is,
th6rtee*, which suggests to him 'thirteen'. So i and n are
represented by 6 and *. Right at the beginning he sees 53f f f , that is, 5good. He concludes that the
first letter is A, so the message starts: 'A good.' He then compiles a small
table of what he now knows:
| Cipher | 5 | f | 8 | 3 | 4 | 6 | * | ( | ; | ? | |
| Plain | a | d | e | g | h | i | n | o | r | t | u |
He doesn't tell us how he finished, but the partially deciphered
code now looks like
agoodg0a))inthe2i)ho.)ho)te0inthe
de¶ i0))eat1ort:onedegree)andthirteen
9inute)northea)tand2:north9ain2ran
-h)e¶ enth0i92ea)t)ide)hoot1ro9the
0e1te:eo1thedeath)heada2ee0ine1ro
9thetreethroughthe)hot1i1t:1eetout
We will let you finish the deciphering.
Let's look now at a cipher with word divisions:
okr ifdk mfd iygok joi mfom mfd mzdd ioj ayyr
vyz vyyr okr mfom wm ioj exdojokm my mfd dcdj
okr o mzdd my bd rdjwzdr my gohd ykd iwjd jfd
myyh yv mfd vzuwm mfdzdyv okr rwr dom
The letter counts are
| d | m | o | y | r | f | k | j | zi | vw | hg | baxeuc |
| 22 | 19 | 15 | 14 | 10 | 9 | 8 | 8 | 6 | 5 | 2 | 1 |
We put in our guesses in capital letters. So, if we suppose that d is E, we get
okr ifEk mfE iygok joi mfom mfE mzEE ioj ayyr
vyz vyyr okr mfom wm ioj exEojokm my mfE EcEj
okr o mzEE my bE rEjwzEr my gohE ykE iwjE jfE
myyh yv mfE vzuwm mfEzEyv okr rwr Eom
That gives two possibilities for 'THE', either 'mfE' or 'ykE'. As the former occurs more times, and 'mf' itself appears several more times, we go with the guess that 'mfE' is 'THE'.
okr iHEk THE iygok joi THoT THE TzEE ioj ayyr
vyz vyyr okr THoT wT ioj exEojokT Ty THE EcEj
okr o TzEE Ty bE rEjwzEr Ty gohE ykE iwjE jHE
Tyyh yv THE vzuwT THEzEyv okr rwr EoT
The only thing that makes sense in 'THoT' is for the 'o' to be an 'A'. And for 'TzEE', the only thing that works for 'z' is 'R' (except possibly for 'H', but that has been used already). The word 'Ty' has to be 'TO', so 'y' is 'O'.
Akr iHEk THE iOgAk jAi THAT THE TREE iAj aOOr
vOR vOOr Akr THAT wT iAj exEAjAkT TO THE EcEj
Akr A TREE TO bE rEjwREr TO gAhE OkE iwjE jHE
TOOh Ov THE vRuwT THEREOv Akr rwr EAT
What is the word 'jHE'? Aside from 'T', which is already taken, the only thing that works for 'j' is 'S'. The word 'wT' can't be 'AT' because 'A' is already taken, so it must be 'IT'.
Akr iHEk THE iOgAk SAi THAT THE TREE iAS aOOr
vOR vOOr Akr THAT IT iAS exEASAkT TO THE EcES
Akr A TREE TO bE rESIREr TO gAhE OkE iISE SHE
TOOh Ov THE vRuIT THEREOv Akr rIr EAT
Notice the four occurrences of the word 'Akr'. The most common three-letter word in English is 'THE', the next most common is 'AND'. We'll try that. The only way to make sense of 'iAS' and 'SAi' is to take 'i' to be 'W'.
AND WHEN THE WOgAN SAW THAT THE TREE WAS aOOD
vOR vOOD AND THAT IT WAS exEASANT TO THE EcES
AND A TREE TO bE DESIRED TO gAhE ONE WISE SHE
TOOh Ov THE vRuIT THEREOv AND DID EAT
Got it?
Problems
lo gyr ayqf yqu ayqf y fdye ybh
lq y clqbuha kf oxd rdy
oxyo y ayludq oxded ilsdu gxha fhm ayf cqhg
kf oxd qyad hv yqqykdi idd
yqu oxlr ayludq rxd ilsdu glox qh hoxde oxhmbxo
oxyq oh ihsd yqu kd ihsdu kf ad
vsd vugd smc wddp
vsmv osdp vsd wlmupc odld bhv vsd gmp obhzy yud
mpy vsdld mp dpy; whv pbo vsde lucd mjmup
ouvs vodpve gblvmz ghlydlc bp vsdul flbopc
mpy xhcs hc nlbg bhl cvbbzc
In the American Standard Code for Information Interchange (ASCII, pronounced askey), the letter A is represented by the number 65 and the letter Z by the number 90. There are separate codings for the lowercase letters: The letter a is 97 and the letter z is 122. The letters between A and Z get encoded by the corresponding numbers between 65 and 90, and similarly for the letters between a and z. So the word ''American'' is written as
|
| Character | ASCII code | Binary ASCII code |
| A | 65 | 1000001 |
| B | 66 | 1000010 |
| Z | 90 | 1011010 |
| a | 97 | 1100111 |
| b | 98 | 1101000 |
| c | 99 | 1101001 |
| z | 122 | 1111010 |
Why, if there are twenty-six letters in the alphabet, is 90-65 = 25?
Once we get used to thinking of letters as numbers, we can use arithmetic for enciphering. Suppose we code the letters a¼z by the numbers from 1 to 26. So now the word ''american'' looks like
|
|
| (*) |
|
|
|
If we encipher using the formula s = 7t (
To generate the cipher corresponding to the formula s = 7t+1 (
The letter A is enciphered as h. The letter B is enciphered as o, which is seven beyond h. The letter C is enciphered as v, which is seven beyond o, and so on. So, to get the cipher table, we start at h and choose every seventh letter, going around the circle as many times as needed. The resulting cipher table is
| A | B | C | D | ¼ |
| h | o | v | c | ¼ |
This technique is called decimation, after the Roman custom of killing every tenth soldier as a group punishment for mutiny. The root of the word is ''ten'', but it is used for other numbers, like 7, as well. Of course, the Romans went around the circle just once.
Something a little funny happens when you encipher the letter K using the formula s = 7t+1 (
Problems
So far, we have always enciphered letter by letter. But there is an advantage to enciphering bigger chunks at a time. For one thing, your adversaries can't get a good count on letter frequencies. The simplest idea is to encipher the letters two at a time, and the most famous of these ciphers is the Playfair cipher, invented by Charles Wheatstone in 1854 and popularized by Lyon Playfair.
The Playfair cipher works like this. You arrange the letters of the alphabet in a five-by-five square, omitting the letter j, which is replaced by i:
| C | H | A | R | L |
| E | S | W | T | O |
| N | B | D | F | G |
| I | K | M | P | Q |
| U | V | X | Y | Z |
What about double letters? How do you encipher EE? You have to get rid of them. When a double letter appears that has to be enciphered together, you insert a Q. If the text were ATTACK AT DAWN, we wouldn't have to doctor it, even though it has a double T, because it breaks up AT TA CK AT DA WN. However, if it were STOP THE ATTACK, it would break up as ST OP TH EA TT AC K. Then, because we can't encipher TT, we would have to insert an Q to get STOPTHEATQTACK, or ST OP TH EA TQ TA CK. It will be easy enough for the recipient to figure out which Q's are real and which were added to break up double letters. Similarly, if there is a single letter to encode at the end, we add a Q. So if the text were RETREAT, we would encipher RE TR EA TQ.
Enciphering STOP THE ATTACK according to the square given by the key word Charles Wheatstone gives the cipher text:
Check this to see if you understand how to encipher.
How do you decipher a Playfair cipher? Just the same as enciphering, except you move left along the rows instead of right, and above in the columns instead of below. The case where the letters are in different rows and different columns is done exactly the same. After deciphering, you should remove the spurious Q's and put in the word divisions.
We can use numerical techniques to encipher bigger blocks. Suppose we want to encipher the word AMERICAN. Then we code the letters A¼Z by the numbers from 1 to 26, as before, so that AMERICAN becomes
|
|
|
|
|
|
These computations can really be done only on a computer. To get something you can do easily with a hand calculator, you have to restrict yourself to much smaller numbers. We used two ingredients for enciphering. The first was the number 1997 that we multiplied by. The second was the number 10000000000000000 = 1016. This number came into play because we dropped all but the last 16 digits. Another way of saying that is that we divided by 1016 and took the remainder. So the game is to multiply by one number, then divide by another-the enciphered number is the remainder. The enciphering formula here was s = 1997t (
Notice that we are enciphering numbers. What happened to the letters and the words? They're still around, but the conversion from words to numbers and back again is not the secret part of the encipherment. The key to the process is two numbers: the one that we multiply by and the one we divide by. In the example just considered, we say that we multiply by 1997 mod 1016.
Instead of enciphering as large a word as AMERICAN all at one time, we could break the text up into two-letter pieces. So AMERICAN would separate into AM ER IC AN, and we could encipher each pair of letters separately. First, we would take these four pairs of letters and translate them into four numbers, in the usual way:
|
| |||||||||||||||||||||||
|
| |||||||||||||||||||||||
Problems
|
|
|
One last enciphering algorithm, with a twist. To illustrate, we will encipher pairs of letters, as before, although in practice this algorithm is used on much larger blocks of text. The algorithm takes a plain-text number t and enciphers it as s = t3 (
|
| t | times t | t2 (mod 8423) | times t | t3 (mod 8423) |
| 113 | 12769 | 4346 | 491098 | 2564 |
| 518 | 268324 | 7211 | 3735298 | 3909 |
| 903 | 815409 | 6801 | 6141303 | 936 |
| 114 | 12996 | 4573 | 521322 | 7519 |
The encrypted version is thus
|
| s | s2 (mod 8423) | s3 (mod 8423) | s4 (mod 8423) | ¼ | |
| 2564 | 4156 | 889 | 5186 | ¼ | |
| times s | 6574096 | 10655984 | 2279396 | 13296904 | ¼ |
| s | s2 | s4 | s8 | s16 | s32 | s64 |
| 2564 | 4156 | 5186 | 8380 | 1849 | 7486 | 1977 |
| s128 | s256 | s512 | s1024 | s2048 | s4096 |
| 257 | 7088 | 4972 | 7702 | 6038 | 2700 |
Now, we haven't found s5615 yet, but we can figure it out fairly easily from this table, because
|
| |||||||||||||
We have three numbers in this process: the modulus, 8423; the enciphering key, 3 (because s = t3 (
The security of the system hinges on the fact that it is very difficult to factor large numbers into primes. So one of the most esoteric and seemingly useless branches of pure mathematics, the theory of numbers, has become of great practical interest.
Problems
You can find out lots of things about codes and their history in The Codebreakers by David Kahn (Macmillan 1967).
The word ''cryptography'' comes from the Greek for secret writing.
More accurate than Legrand's list of letters by frequency is
or
an approximation to the famous etaoin shrdlu. In the Gold-Bug itself, the ten most frequent letters, in order of frequency, are etaoni shrdlu.
The words etaoin shrdlu come from the linotype machines that were once used for typesetting newspapers. The keyboard was arranged
| E | S | C | V | X |
| T | H | M | B | Z |
| A | R | F | G | |
| O | D | W | K | |
| I | L | Y | Q | |
| N | U | P | J |
This arrangement corresponds, more or less, to the frequency of the letters in English. When the operators made a mistake in a line, they would fill it out by running their fingers down the first two columns, thus inserting ETAOIN SHRDLU into the line. This indicated a mistaken line and was supposed to be thrown out of the final copy, but sometimes it managed to stay in.
The character '(' is omitted from the frequency table in all editions of the Gold-Bug. In some editions the plain text reads ''forty-one'' instead of ''twenty-one,'' and the cipher is also a bit different, although rarely corresponding exactly with the plain text. The frequency table, however, is invariably taken from the ''forty-one'' version, although sometimes a character for w, which appears only in the word ''twenty-one'', is listed. For example, the frequency of the character '8' is always given as 33, rather than the 34 times it would appear in a cipher corresponding to ''twenty-one''. There are 203 letters in the message with ''forty-one'', 204 letters in the message with ''twenty-one''.
In most editions, the table of known characters does not include '?' which Legrand knew stood for u at the time.