Sneakers: Constructing and Analyzing Cryptograms

Justin Adams-Tucker, Kimberly Anderson, Michael Cox, and Smita Desai

Group Project in Cryptography
Code Making / Code Breaking
Professor Chris Kennedy
3 December 1998

I. The Sneakers Cryptogram

Our original intent was to create a cryptogram that would render useless all the common forms of cryptanalysis we discussed in class. This criterion eliminated the possibility of any simple monoalphabetic substitution. Although we wanted decipherment of our cryptogram to be quite difficult, we felt that some types of encipherment we studied, such as rare book substitution or the dictionary cipher, would be unfair, since they require the person deciphering the message to possess a particular book. At first, we considered a concealment cipher, which hides the very existence of a secret message. However, we decided that this would be inappropriate for the group project, because it would be virtually impossible even to reach the first step in cryptanalysis of such a cipher: obtaining the concealed, enciphered message from the cryptogram. Finally, we concluded that a complex polyalphabetic substitution cipher would best suit our needs, since the type of cipher used is fairly evident but the cryptogram remains quite difficult to decrypt.

Our next step was to develop an elaborate technique for enciphering our article. We decided to create two discs, each containing the twenty-six letters of the alphabet, much like the polyalphabetic substitution device designed by Leon Battista Alberti in the fifteenth century. However, we decided to turn the disc after each letter had been enciphered, to ensure that simply counting how many times each letter occurred would not give any clue as to how to decipher the cryptogram. To accomplish this, we devised an eight-step pattern for turning the outside disc around the inner one:

turn disc clockwise two spaces (before enciphering first letter)
counterclockwise three spaces
clockwise one space (back to original position)
counterclockwise two spaces
clockwise three spaces
counterclockwise two spaces
clockwise three spaces
counterclockwise two spaces (back to original position)
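
The eight moves above can be checked with a short Python sketch. The sign convention (clockwise as positive, counterclockwise as negative) is an assumption made for illustration; only the step sizes come from the list.

```python
from itertools import accumulate

# Signed moves from the eight-step list. Treating clockwise as positive and
# counterclockwise as negative is an assumed convention, not part of the list.
MOVES = [+2, -3, +1, -2, +3, -2, +3, -2]

# Net displacement of the disc at the moment each of the eight letters
# in one cycle is enciphered.
OFFSETS = list(accumulate(MOVES))

print(OFFSETS)  # [2, -1, 0, -2, 1, -1, 2, 0]
```

The running total is zero after the third and eighth moves, which agrees with the two "back to original position" notes in the list.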

This pattern returns the disc to its original location after a full sequence of eight moves, enabling it to cycle throughout the process of encipherment.

Having formed our principal ideas, we had to decide how to handle some of the syntactical attributes of our plaintext. We printed the entire article in upper-case letters, spelled out any numerals in the plaintext, and eliminated spaces, to make the cryptogram appear more intimidating. Furthermore, we concluded that punctuation should remain, but we substituted other punctuation symbols, which remained constant throughout the message for each occurrence and did not affect the pattern of turning the disc:

an apostrophe in the plaintext became a colon in the cryptogram
a period became quotation marks
a comma became a question mark
a hyphen became a period
a semicolon became a hyphen

As we began to construct our discs out of construction paper, we discovered that it is quite difficult to divide a circle into twenty-six equal sections. Thus, we altered the structure of our "machine". Instead of the discs, we used two strips of construction paper, cut from the lengthwise margin of the page. These strips were much easier to divide properly and match up nicely, and we simply substituted "right" and "left" for "clockwise" and "counterclockwise" in our pattern for shifting the strips. We wrote the letters horizontally across the strips, deciding that both alphabets should be jumbled, or out of proper order, to increase the difficulty of deciphering the article:

Upper strip: J K M S A T C D P L F E R B G X W H Y I N U Q V Z O
Lower strip: V H Y M O E T K W Z N I R B U Q X F J A S D L C G P

In place of turning the disc, we simply slide the lower strip of paper the appropriate number of spaces right or left after each letter is enciphered. Whenever the lower strip is shifted out of the "home position", it sticks out at one end. When this happens, it is easy to envision the overhanging portion of that strip's alphabet at the other end of the strip. We encipher a letter of the plaintext by locating it on the upper strip of paper and converting it to the letter that appears directly below it on the lower strip. Thus, it is possible that, every so often, a letter in the cryptogram will be the actual plaintext letter (as "R" and "B" are in the position above). Finally, we were inclined to include "dummy letters" in our cryptogram, extra junk that would increase the difficulty of cryptanalysis, but decided this would be similar to concealment, and preferred to present the cryptogram in the most straightforward manner possible, free of gimmicks. After completing the cryptogram, we were all certain that nobody in the class would be able to decipher our work. Here is our plaintext, taken from an October issue of the Times Educational Supplement:

It's All Done in the Mind

This series supplements Scholastic's excellent series Developing Mental Maths (reviewed in TES Primary Magazine, January twenty-third). Each book provides twenty-six photocopiable activity sheets to supplement activities in the former series, although they could be slotted into any other math scheme or used on their own. The sheets are conveniently grouped under four headings: counting and ordering; addition and subtraction; multiplication and division; and multistep and mixed operations. Unlike some of the dreary mental maths books being churned out by publishers, these are lively books that will help children become confident with numbers and enable them to develop a range of mental calculation strategies to draw on. Notes are provided for each activity highlighting prerequisite knowledge, offering suggestions for teacher input, stating the key calculating strategy involved, and suggesting ways in which the activity sheet could be modified (by changing the range of numbers involved, for example). Each activity also cross-refers to particular practical activities in the Developing Mental Maths series.

II. The NSA Cryptogram

The most obvious property of this cryptogram is the fact that, when downloaded, it appears as one line of text. This fact in itself indicates that there exists a strong possibility of manipulation of the letter sequence. When the number of times each letter appears in the cryptogram is counted, a strange pattern appears. Each letter appears roughly the same number of times one would expect it to, according to the frequency tables discussed in class:

E-126
T-107
N-97
O-95
I-94
S-81
A-75
R-75
C-53
D-50
H-41
U-40
L-37
M-32
P-31
Y-25
F-23
G-23
V-19
B-18
W-16
hyphen-9
X-4
K-2
J-1
&-1

This is indicative of a transposition cipher, in which no substitution of letters has occurred. The plaintext letters are simply rearranged in a particular order over a particular cycle to form the cryptogram. Because there are 1182 characters in the cryptogram, the only possible transposition cycles that fit perfectly are two, three, six, 197, 394, and 591. However, any of the latter three would require tedious encipherment, so the cycle of transposition is probably small. The fact that several numerals are located relatively close together in the middle of the cryptogram further supports this hypothesis, since the plaintext numerals are likely close together. Thus, letters from the middle of the article are transposed with other letters from the middle of the article, and it is likely that letters from the beginning of the article are transposed with other letters from the beginning, and likewise for the end.

Another typical characteristic of a transposition cipher is that the first letter of each cycle in the cryptogram is usually the first letter of the corresponding cycle of plaintext. We are not sure whether this has anything to do with the capitalization of the first letter. We are also unable to explain the presence of hyphens at irregular intervals in the cryptogram. There are only two letters before the first hyphen, then fifty-one between the first and second hyphens, then twenty between the second and third, and so on, with the number of letters between successive hyphens following no ascertainable pattern. Thus, our conclusion is that the hyphens do not mark the cycles, and there must simply be a lot of hyphens in the plaintext.
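
The candidate cycle lengths named above are the divisors of 1182 other than 1 and 1182 itself. A minimal Python check (the helper name `proper_divisors` is chosen here for illustration):

```python
def proper_divisors(n: int) -> list[int]:
    """Return the divisors of n other than 1 and n itself, in ascending order."""
    return [d for d in range(2, n) if n % d == 0]


# Cycle lengths that divide the 1182-character cryptogram evenly.
print(proper_divisors(1182))  # [2, 3, 6, 197, 394, 591]
```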

The given sample of plaintext, "Today, the Procter & Gamble subphylum alone outnumbers insects two to one", can be easily located because of the "&". The letters that form this sentence seem to be situated around the "&", but many of them repeat several times, making it virtually impossible to distinguish any sort of pattern. Also, when breaking the cryptogram down into two-, three-, or six-letter cycles, it is impossible to fit the appropriate letters around the "&". Without any knowledge of the cycle or its pattern, we were not able to obtain a solution for this cryptogram.

III. The Kenzthabest Cryptogram

Kenzthabest is unique in its arrangement: 92 lines of fifteen symbols, each subdivided into five three-symbol groups. It is also very noticeable that the group used an odd combination of upper-case letters, lower-case letters, numerals, and punctuation signs in their cryptogram. Our conclusion is that this was done purely for appearance, and that it merely makes the cryptogram look much more difficult than it truly is. What is striking, however, is the fact that there are thirty-six symbols used, and that some punctuation signs appear much more frequently than some letters, as shown in the cryptogram's frequency table:

g-178
4-149
G-133
o-111
=-105
7-101
t-63
s-59
T-52
n-52
a-39
space-38
O-35
$-33
N-28
S-28
q-27
r-26
e-22
A-18
R-16
.-14
E-13
w-12
i-8
W-7
/-3
I-2
F-1
f-1
H-1
h-1
j-1
M-1
1-1
0-1
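
As a consistency check, the table above can be tallied in Python. The counts are transcribed from the table; the dictionary name `COUNTS` is introduced here for illustration only.

```python
# Symbol counts transcribed from the Kenzthabest frequency table above.
COUNTS = {
    "g": 178, "4": 149, "G": 133, "o": 111, "=": 105, "7": 101,
    "t": 63, "s": 59, "T": 52, "n": 52, "a": 39, "space": 38,
    "$": 33, "N": 28, "O": 35, "S": 28, "q": 27, "r": 26,
    "e": 22, "A": 18, "R": 16, ".": 14, "E": 13, "w": 12,
    "i": 8, "W": 7, "/": 3, "I": 2, "F": 1, "f": 1,
    "H": 1, "h": 1, "j": 1, "M": 1, "1": 1, "0": 1,
}

print(len(COUNTS))           # 36 distinct symbols
print(sum(COUNTS.values()))  # 1380 symbols in total
print(92 * 15)               # 1380, i.e. 92 lines of fifteen symbols

# The six most frequent symbols, which include "=" and two numerals.
top_six = sorted(COUNTS, key=COUNTS.get, reverse=True)[:6]
print(top_six)  # ['g', '4', 'G', 'o', '=', '7']
```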

The fact that there are thirty-six symbols supports the use of "dummy letters", but may also reveal the existence of enciphered punctuation, and possibly spaces, in the cryptogram. There are 1380 characters in the cryptogram, meaning that there would be 6.9 characters per word in the plaintext (if it were exactly 200 words long), a little longer than normal. However, the article could contain well over 200 words, and thus the usefulness of statistical analysis such as this is limited. The end of the cryptogram, which reads "AHNT", looks as though it may provide some kind of "hint", perhaps even a key, but we are unable to interpret it properly. The given plaintext sample, "The list could go on", does little more than confirm that the plaintext bears not even a remote resemblance to the text of this extremely complex cryptogram. Because we are uncertain about so many aspects of this cryptogram, we cannot even determine whether it is a substitution cipher or something else, although we are inclined to think it uses substitution and "dummy letters". Therefore, we cannot develop a sound strategy for cryptanalysis.

IV. The Incognito Cryptogram

Of all the cryptograms we have attempted to decipher, this one is by far the longest. The group's primary objective seems to be to make frequency tables useless. However, the cryptogram's structure is rather simple, as it contains no spaces and no punctuation. Thus, the best cryptanalysis approach is one based primarily on statistics. The cryptogram contains some 9920 characters, so if each letter in the ciphertext corresponds to a single letter in the plaintext and the plaintext is 200 words, each word would be an astronomical 49.6 letters long! The twenty-six upper-case letters appear with the following frequencies:

E-506
O-461
I-457
L-453
U-440
B-435
H-429
P-423
V-415
A-410
M-400
R-399
Y-385
T-377
C-359
F-356
S-356
W-350
J-346
D-345
K-343
Z-336
G-304
Q-296
X-273
N-266

What becomes apparent when studying the frequency table is that, although the ordering of the frequencies is for the most part somewhat appropriate, the ratios are far off. In a given sample of normal text, there are between ten and twenty times as many E's as Z's, for example. Here there is not even as much as a 2:1 ratio between the most frequent letter, "E", and the one that appeared least, "N", which was mysteriously absent from the last thirty-six lines of the cryptogram.
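
The flatness of this distribution can be checked directly from the table. In the Python sketch below the counts are transcribed from the table above; the name `COUNTS` is introduced only for illustration.

```python
# Letter counts transcribed from the Incognito frequency table above.
COUNTS = {
    "E": 506, "O": 461, "I": 457, "L": 453, "U": 440, "B": 435, "H": 429,
    "P": 423, "V": 415, "A": 410, "M": 400, "R": 399, "Y": 385, "T": 377,
    "C": 359, "F": 356, "S": 356, "W": 350, "J": 346, "D": 345, "K": 343,
    "Z": 336, "G": 304, "Q": 296, "X": 273, "N": 266,
}

total = sum(COUNTS.values())
most = max(COUNTS, key=COUNTS.get)
least = min(COUNTS, key=COUNTS.get)
ratio = COUNTS[most] / COUNTS[least]

print(total)                          # 9920
print(most, least, f"{ratio:.2f}")    # E N 1.90
print(f"{COUNTS['E'] / COUNTS['Z']:.2f}")  # 1.51, versus ten to twenty in normal text
```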

This fact raises the possibility that each letter of the plaintext could be enciphered as a multiple-letter sequence in the cryptogram. There are 676 possible two-letter sequences, and only twenty-six of them would appear if each plaintext letter had a single two-letter equivalent. Since more than twenty-six distinct two-letter sequences appear, this method has evidently not been used, unless several of the 676 sequences have been assigned to the same plaintext letter; for example, "MR" and "VE" in the cryptogram could both represent the plaintext "X". We wonder, though, whether the cipher uses eight-letter sequences instead of two-letter sequences. In this case, a single letter of plaintext could correspond to any of over 208 billion possible eight-letter combinations. The cryptogram's 9920 characters would then form 1240 eight-letter sequences representing 1240 plaintext characters, for an average of 6.2 characters per word in a 200-word plaintext. Although this is quite possible, we doubt that the Incognito group has done it, because it would be an exhausting process.
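
The figures in this paragraph follow from simple arithmetic, sketched here in Python. The helper name `chars_per_word` is hypothetical, and the 200-word plaintext length is the same working assumption used in the text.

```python
ALPHABET_SIZE = 26
CRYPTOGRAM_LENGTH = 9920


def chars_per_word(total: int, group_size: int, words: int = 200) -> float:
    """Average plaintext characters per word if each group encodes one character."""
    return (total // group_size) / words


print(ALPHABET_SIZE ** 2)  # 676 possible two-letter sequences
print(ALPHABET_SIZE ** 8)  # 208827064576 possible eight-letter sequences
print(CRYPTOGRAM_LENGTH // 8)  # 1240 eight-letter groups
print(chars_per_word(CRYPTOGRAM_LENGTH, 8))  # 6.2
```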

Therefore, our concentration shifts to the potential existence of "dummy letters" in the cryptogram. If they do exist, as we suspect in large part because of the disappearance of "N", it is unlikely that each letter has been used as a "dummy letter" the same number of times. For this reason, it would be totally ineffective to subtract a given number, say 250, from the number of times that each letter occurs. Furthermore, we would not know which appearances of each letter were "true" and which were "dummy". Therefore, we can only attempt to determine how long the cryptogram's cycles would be if it is composed primarily of "dummy letters". If each cycle is eight letters long, then each group of eight letters in the cryptogram would contain one letter taken directly from the plaintext (or one that corresponds directly to a plaintext letter). As mentioned before, this would mean that a 200-word article would average 6.2 characters per word. It is also possible that there are ten-letter cycles, for an average of 4.96 characters per word.

It is possible to search for the given sample of plaintext, "He knew there was something wrong with the data traffic he was watching", in the cryptogram. By dividing the cryptogram into sequences of eight and ten letters, we can search for an "H" in one line, an "E" in the next, and so on as appropriate. However, in the ten-letter cycle, we can find nothing longer than "H-E-K-N-E" moving forward through sequential lines, and we are unsure if the cryptogram should be attacked from the beginning, the end, or somewhere in between. In addition, we know that we would only be able to solve the cryptogram in this manner if no substitution has been performed before the addition of the "dummy letters". In summary, this cryptogram could have been formed in too many different ways for us to make much of an attempt at its solution.
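
The line-by-line search described above can be sketched in Python. The function name `longest_crib_run` and the toy ciphertext are hypothetical; the sketch assumes, as the text does, that no substitution was applied before any "dummy letters" were added, and it tries every starting line because the correct starting point is unknown.

```python
def longest_crib_run(ciphertext: str, cycle: int, crib: str) -> tuple[int, int]:
    """Find the longest prefix of the crib matched one letter per line.

    The ciphertext is cut into lines of `cycle` letters. Starting from each
    line in turn, crib letter k must appear somewhere in line start + k.
    Returns (length of the best run, index of the line where it starts).
    """
    rows = [ciphertext[i:i + cycle] for i in range(0, len(ciphertext), cycle)]
    letters = [c for c in crib.upper() if c.isalpha()]
    best_len, best_row = 0, 0
    for start in range(len(rows)):
        run = 0
        while (run < len(letters)
               and start + run < len(rows)
               and letters[run] in rows[start + run]):
            run += 1
        if run > best_len:
            best_len, best_row = run, start
    return best_len, best_row


# Toy example (not the Incognito cryptogram): four-letter lines hiding H, E, K, N.
toy = "HXXX" "XEXX" "XXKX" "XXXN" "QQQQ"
print(longest_crib_run(toy, 4, "He knew"))  # (4, 0)
```

With lines of eight or ten letters drawn from a twenty-six-letter alphabet, a given letter turns up in a line by chance fairly often, so a short run such as "H-E-K-N-E" is weak evidence on its own.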