Making and Breaking the Code

Ken Cadwell, Laurel Felt, Sanjiv Prasad

Devising and deciphering codes are complicated endeavors. We members of Kenzthabest are fully aware of this and therefore carefully went about each step of the process.

The decryptor’s most practical approach to deciphering is to count letter frequencies and look for word patterns. One-to-one substitution lends itself to easy decoding with frequency tables, so our group decided to use a phrase substitution instead. We wrote GEORGE WASHINGTON over the alphabet and filled out the rest with random symbols. With this system, a symbol could stand for more than one letter. To make recognition of words difficult, we wrote the entire message backwards. Also, every other sentence is a meaningless filler sentence, which makes both letter frequencies and word patterns harder to discern. Another device of confusion we used was the random use of capital and lower-case letters: G and g are the same symbol, but the decryptor does not know this. Finally, because of the way in which the message was placed on the web, it could be read up or down, or left or right. The result is a code that is extremely difficult even for us to solve. All three of us lost our places at one point or another while encrypting the message, and finding them again was tedious because we had to decrypt what we had written. However, this helped us proofread the code for mistakes.
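The scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the ten filler symbols assigned to Q through Z, the function name `encrypt`, the seeded random generator, and the choice to leave spaces and punctuation untouched are all hypothetical, and the filler sentences and web layout are left out.

```python
import random
import string

KEY_PHRASE = "GEORGEWASHINGTON"   # 16 letters, written over A..P
FILLER_SYMBOLS = "!@#$%^&*+="     # hypothetical symbols standing in for Q..Z

# Substitution table: A->G, B->E, C->O, D->R, E->G, ...
TABLE = dict(zip(string.ascii_uppercase, KEY_PHRASE + FILLER_SYMBOLS))

def encrypt(message, rng=None):
    """Substitute each letter, reverse the message, and randomize case."""
    rng = rng or random.Random()
    out = []
    # Walk the message backwards so that word shapes are hidden.
    for ch in reversed(message.upper()):
        sym = TABLE.get(ch, ch)  # assumption: non-letters pass through unchanged
        # Random case: G and g are the same symbol to the sender.
        out.append(sym.lower() if rng.random() < 0.5 else sym)
    return "".join(out)

print(encrypt("ATTACK AT DAWN", random.Random(0)))
```

Because the key phrase repeats letters, A, E, and M all become G in this table, which is why one symbol can stand for more than one letter and why even the sender has to work to read the message back.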

We were not successful at first in decrypting the NSA code. One thing we noticed at the beginning was that they used more symbols than there are letters in the alphabet, which established that they had not used a simple single substitution. A letter frequency chart did not help either, because about five letters were all within a couple of units of one another in frequency. This signifies that either one letter was represented by more than one symbol, or more than one letter was represented by one symbol. Given the first observation, it would make more sense if one letter was represented by more than one symbol. It is also possible that neither is true and that the group threw in spacers once in a while to throw us off. However, the code does not appear to be unusually long, so if spacers were used, it would have been only to a limited extent.
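The two checks in this paragraph, counting distinct symbols and looking for symbols with nearly tied frequencies, can be sketched as below. The function name `symbol_report` and the default `tolerance` of 2 (standing in for "a couple of units") are assumptions for illustration, not part of the original analysis.

```python
from collections import Counter

def symbol_report(ciphertext, tolerance=2):
    """Count distinct symbols and list those nearly tied with the most common one."""
    counts = Counter(ch for ch in ciphertext if not ch.isspace())
    ranked = counts.most_common()
    top = ranked[0][1] if ranked else 0
    # Symbols whose count is within `tolerance` of the highest count.
    near_top = [sym for sym, n in ranked if top - n <= tolerance]
    # More than 26 distinct symbols rules out a simple one-to-one letter substitution.
    return len(counts), len(counts) > 26, near_top
```

A long list of near-tied symbols at the top suggests that the usual peak for a letter like E has been flattened, for example by spreading one letter across several symbols.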

Then we resorted to another approach. If this were a competition to see which group broke the most codes, then it would be beneficial to trade answers with another group, which is what we did. NSA and Kenzthabest have now each officially broken one code because we traded. The NSA plaintext is as follows:

Consumer-product diversity now exceeds biodiversity in Washington D.C. according to an EPA study conducted in conjunction with the taskforce on global developmental impact. For the first time in history the rich array of consumer products available in malls and supermarkets surpasses the number of living species populating the planet last year since introduction of Dentyne Ice Cinnamine Gum. Right on the heels of the exctinction of the Carolina tufted then put the product diversity on top for the first time. Study Chair Donald Hargrove said today the procotor & ganble subphylum alone outnumbers in sects two to one. The sharp rise in consumer-product diversity--with more than 200 million new purchasing options generated since 1930--comes as welcoming news for those upset over the dwindling number of plant and animal species. As more and more species fall victim to extinction we face a grave crisis of decreased diversity not only in America but across the globe. Hargrove said but the good news is the sellossesin biodiversity are more than offset by a corresponding rise in comsumer product diversity thought flora and fauna are dwindling the spectrum of goods available to consumers is wider than at anytime in planetary history and that’s something we can all be happy about the onion twenty first of october nineteen ninety-eight.

NSA made many mistakes, which is understandable considering that Ray could not decipher his own group’s code. Because of this, we cannot really explain how their code works, except that we know a letter stands for a different letter every time. Since the assignment was to figure out the plaintext, we did what we had to do, although we don’t quite know how.

After an initial look at the code for the group Sneakers, we noticed that no prominent features leapt off the page. We therefore gathered that there was no code inside a code, which eased our transition into the next phase. We noticed a standard 26-letter English alphabet with occasional punctuation marks. After a short discussion about the significance of these punctuation marks (later refuted), we concluded that the individual symbols could not each stand for a punctuation mark, because they did not occur often enough. We also considered that all of these marks could stand for one universal punctuation mark, but we wavered on this. We also noticed a lack of spacing, which further frustrated us in our quest to decipher the code. However, this initial look did not produce only negative results; there was one positive result. The very last mark of the code was a quotation mark. This led us to believe that the quotation mark, the most common type of punctuation in the code, could perhaps be a period.

Our next step in the decipherment process was a quantitative one that is quite time-consuming and frustrating. We first counted the letters in the first four lines and then proceeded to the next four. If a pattern could be seen, we could stop there. The letters that appeared most often in the first four lines were A (12), B (11), E (15), F (12), J (15), K (12), M (12), N (11), O (10), R (11), T (18), Y (13), and Z (15). We found this quite interesting, seeing as T and E appeared among the most often in the code and are also the most common letters in the English language. The first four lines did not produce convincing results, so we counted the next four lines. The letters that appeared most often in the first eight lines were E (32), Z (30), J (27), T (27), M (21), and N (21). This procedure removed some of the variability in the data.
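The counting described here is easy to automate. The sketch below tallies letters over the first few lines of a ciphertext so that the four-line and eight-line counts can be compared. The function name `top_letters` and the `sample` lines are hypothetical; the Sneakers ciphertext itself is not reproduced in this report.

```python
from collections import Counter

def top_letters(lines, n_lines, k=6):
    """Return the k most frequent letters in the first n_lines lines."""
    counts = Counter()
    for line in lines[:n_lines]:
        # Ignore punctuation and spacing; count letters only.
        counts.update(ch for ch in line.upper() if ch.isalpha())
    return counts.most_common(k)

# Made-up stand-in for the ciphertext lines.
sample = ["TEZJT", "ZEEMN", "JTZEA", "KTMNE", "EZJTB", "ZEMNT", "JEZTY", "EZJMN"]
print(top_letters(sample, 4))
print(top_letters(sample, 8))
```

Comparing the two printed rankings shows whether the leading letters stay the same as more lines are included, which is the stabilizing effect noted above.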

Next we tried finding repeated letters, which showed that ZZ was the most frequent doubled combination (6 occurrences). We hypothesized at this point that Z is E, because it appeared very frequently and occurred in a significant number of letter combinations. Then we noticed the letter sequence KZZK and tried to fit in a letter for K. The only reasonable letters that could fit there, should this be a single-letter substitution, would be D and P. However, if these letters are only part of a word, then there is a possibility of a double substitution. After a tedious process, which led to frustration, we decided to wait for more help from Sneakers.
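The KZZK reasoning can be sketched as a small search: with Z assumed to be E, try each remaining letter for K and keep those that produce a word. The word list `WORDS` and the function name `candidates` are assumptions for illustration; a real search would use a full dictionary, and since the ciphertext has no spaces, KZZK might not be a whole word at all.

```python
import string

# Tiny illustrative word list (hypothetical); a full dictionary would be used in practice.
WORDS = {"DEED", "PEEP", "SEEN", "KEEP", "NOON", "BEEN"}

def candidates(pattern, known, words=WORDS):
    """Plaintext letters that could fill the one unknown cipher letter in `pattern`.

    `known` maps already-guessed cipher letters to plaintext letters.
    """
    unknown = {c for c in pattern if c not in known}
    if len(unknown) != 1:
        raise ValueError("pattern must contain exactly one unknown cipher letter")
    hits = []
    for letter in string.ascii_uppercase:
        if letter in known.values():
            continue  # single substitution: each plaintext letter is used once
        guess = "".join(known.get(c, letter) for c in pattern)
        if guess in words:
            hits.append(letter)
    return hits

print(candidates("KZZK", {"Z": "E"}))
```

With this word list the search leaves D and P, matching the two candidates identified by hand.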

Incognito’s code is quite challenging. Because of the encoded message’s sheer length, constructing frequency tables is not only tedious, but also ineffective. However, at a loss as to what to do, we embarked on this futile mission. We decided to only count the letters in the first 27 lines of text, hypothesizing that the frequency trend would continue throughout the other 127 lines. We discovered that the letter A appeared 71 times, B appeared 121 times, C appeared 45 times, and Q appeared 52 times. These conclusions, though, came at the expense of a great deal of time and monotonous counting. Additionally, we were unconvinced as to the significance of the findings. For example, Incognito could have inserted 20 random, nonsense characters between every character that actually appeared in their code. Therefore, the frequency of certain letters would be irrelevant. Due to these factors, we abandoned frequency tables.

During the course of our counting, however, we discovered that double letters occurred somewhat steadily. We analyzed lines 41-78 and noted each double-letter sequence. There did not seem to be any set pattern for spacing the double letters: they appeared with as many as 168 characters between one set and the next, and with as few as two characters between two sets. The letters that were doubled included C (6 times), M (2 times), Q (5 times), R (2 times), X (2 times), E (2 times), W (4 times), I (2 times), V (5 times), S (1 time), O (2 times), P (3 times), and B (1 time). Additionally, a trio of J’s occurred once.
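The double-letter survey above could be automated along these lines. This is a sketch under assumptions: the function name `doubled_letters` is hypothetical, runs of three or more identical letters (such as the trio of J's) are deliberately not counted as doubles, and the gap is measured as the number of characters between the end of one double and the start of the next.

```python
import re
from collections import Counter

def doubled_letters(text):
    """Return (count of each doubled letter, gaps between consecutive doubles)."""
    # Find every run of two or more identical letters.
    runs = list(re.finditer(r"([A-Za-z])\1+", text))
    # Keep only runs of exactly two letters; longer runs are not doubles.
    doubles = [m for m in runs if len(m.group()) == 2]
    counts = Counter(m.group(1) for m in doubles)
    gaps = [b.start() - a.end() for a, b in zip(doubles, doubles[1:])]
    return counts, gaps

counts, gaps = doubled_letters("ABCCDEMMXJJJQQ")
print(counts, gaps)
```

If the doubles were inserted at regular intervals, the gap list would show a repeated value; widely scattered gaps like those we observed point away from a fixed spacing rule.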

Incognito’s proffered sentence was: "He knew there was something wrong with the data traffic he was watching." However, within this group’s sea of text, their hint did not illuminate the key to their code. For all we knew, the sentence was written backwards and arranged in a square formation at the top of every page. Due to the volume of material, there is no conclusive way of analyzing the text without spending an inordinate amount of time.

The lack of punctuation in Incognito’s code also escalates its level of difficulty. It is impossible to see where one word ends and the next begins; thus, deduction based on groupings is impossible. To the best of our knowledge, very few tools are at our disposal for deciphering this enigma. Incognito’s transmission remains a mystery.

Obviously, enciphering and deciphering text are not simple tasks. Although we were not entirely successful in all of our decryption attempts, we honed our analytical skills and developed a deeper appreciation for real-life code-breakers. Codes are everywhere and so perhaps this won’t be the end of our code-breaking careers, but only the beginning.