by Neal R. Wagner
Copyright © 2002 by Neal R. Wagner. All rights reserved.
NOTE: This site is obsolete. See book draft (in PDF):
Newspapers in the U.S. have long presented to their readers a special puzzle called a cryptogram. The puzzle has taken a quotation in capital letters and substituted another letter for each given letter. The trick is to guess the substitutions and recover the original quotation. Here is an example of a cryptogram:
ZFY TM ZGM LMGM ZA HF Z YZGJRBFI QRZBF ATMQX TBXL WHFPNAMY ZRZGVA HP AXGNIIRM ZFY PRBILX, TLMGM BIFHGZFX ZGVBMA WRZAL UO FBILX. YHCMG UMZWL, VZXLMT ZGFHRY
It looks like complete gibberish, but if one knows, deduces, or guesses the translation scheme, the key for uncovering the quotation, then it is understandable. In this case the key is:
Alphabet:      ABCDEFGHIJKLMNOPQRSTUVWXYZ
Translated to: ZUWYMPILBDJRVFHQSGAXNCTKOE
Given the quotation, the person making up this cryptogram would write Z for each A, U for each B, W for each C, and so forth. Given the cryptogram as above, one just has to go backwards, changing each Z back to A, and so forth. In this way, knowing the translation key, it is easy to recover the original quotation:
AND WE ARE HERE AS ON A DARKLING PLAIN SWEPT WITH CONFUSED ALARMS OF STRUGGLE AND FLIGHT, WHERE IGNORANT ARMIES CLASH BY NIGHT. DOVER BEACH, MATHEW ARNOLD
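The translation in both directions can be sketched in a few lines of Java. This is only an illustration, not the program referred to elsewhere on this page; the class name `CryptogramCipher` and its method names are my own assumptions. It uses the key given above and leaves blanks and punctuation unchanged.

```java
class CryptogramCipher {
    static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    // The key from the text: position i holds the substitute for the i-th letter.
    static final String KEY = "ZUWYMPILBDJRVFHQSGAXNCTKOE";

    // Replace each character found in `from` by the character at the same
    // position in `to`; anything else (blanks, punctuation) passes through.
    static String translate(String text, String from, String to) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            int i = from.indexOf(c);
            out.append(i >= 0 ? to.charAt(i) : c);
        }
        return out.toString();
    }

    // Encryption goes from the alphabet to the key ...
    static String encrypt(String plaintext) {
        return translate(plaintext, ALPHABET, KEY);
    }

    // ... and decryption simply goes backwards.
    static String decrypt(String ciphertext) {
        return translate(ciphertext, KEY, ALPHABET);
    }

    public static void main(String[] args) {
        String cipher = encrypt("AND WE ARE HERE AS ON A DARKLING PLAIN");
        System.out.println(cipher);          // ZFY TM ZGM LMGM ZA HF Z YZGJRBFI QRZBF
        System.out.println(decrypt(cipher)); // AND WE ARE HERE AS ON A DARKLING PLAIN
    }
}
```

Since the same routine serves for both directions, knowing the key is all one needs to undo the substitution.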
I have never solved one of these puzzles, but my parents often used to spend an hour or so recovering such quotations. (I must try one sometime.) I remember that my mother would first focus on a word that is a single letter, such as the Z above, since this letter must be either an A or an I in ordinary English. After trying I for a while without success, assume that Z is an A. Then there is a three-letter word ZFY that one now guesses starts with an A. This word appears twice, so one might guess that it is AND. From this, the last word (last name of an author?) becomes A_N__D. There is only one well-known author whose name looks like this, and the quotation is perhaps his most famous one, so one would have solved the puzzle immediately.
As another approach, my mother would check the frequencies of all the letters. In the scrambled quotation (leaving off the attribution at the end), they are far from uniform: Z:12, M:10, F:9, G:8, B:7, A:7, etc. Now, E is the most frequent letter in English, and it is the next-to-most-frequent in this quotation, where it appears as M. One can also look for words with double letters or with other unusual features. With trial and error, and some luck, one soon has the quotation.
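Counting letters by hand is tedious, so here is a small sketch of a frequency counter in Java. The class name `LetterFrequency` and its methods are hypothetical names chosen for illustration. It tallies only the capital letters A through Z and lists them from most to least frequent, with ties in alphabetical order.

```java
import java.util.ArrayList;
import java.util.List;

class LetterFrequency {
    // Tally how often each capital letter occurs; everything else is ignored.
    static int[] count(String text) {
        int[] counts = new int[26];
        for (char c : text.toCharArray()) {
            if (c >= 'A' && c <= 'Z') counts[c - 'A']++;
        }
        return counts;
    }

    // Letters that occur at least once, most frequent first (ties alphabetical).
    static String ranking(int[] counts) {
        List<Character> letters = new ArrayList<>();
        for (char c = 'A'; c <= 'Z'; c++) {
            if (counts[c - 'A'] > 0) letters.add(c);
        }
        letters.sort((a, b) -> {
            int byCount = Integer.compare(counts[b - 'A'], counts[a - 'A']);
            return byCount != 0 ? byCount : Character.compare(a, b);
        });
        StringBuilder out = new StringBuilder();
        for (char c : letters) out.append(c);
        return out.toString();
    }

    public static void main(String[] args) {
        // The cryptogram from the text, without the attribution.
        String cipher = "ZFY TM ZGM LMGM ZA HF Z YZGJRBFI QRZBF ATMQX TBXL WHFPNAMY "
                + "ZRZGVA HP AXGNIIRM ZFY PRBILX, TLMGM BIFHGZFX ZGVBMA WRZAL UO FBILX.";
        int[] counts = count(cipher);
        for (char c : ranking(counts).toCharArray()) {
            System.out.println(c + ": " + counts[c - 'A']);
        }
    }
}
```

The same counter applies unchanged to a cryptogram with the blanks removed, since it never looks at word boundaries.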
Here is a Java program to produce cryptograms at random, using whatever quotation you wish: Cryptogram program.
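The program itself is not reproduced here. As a rough sketch of the central step, a random key is just a random rearrangement of the 26 letters, which one might produce as follows. The class name `RandomKey` is hypothetical, and the choice of `Collections.shuffle` is my assumption about one reasonable way to do it, not a description of the linked program.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RandomKey {
    // Return the 26 capital letters in a random order; position i of the
    // result is the substitute for the i-th letter of the alphabet.
    static String randomKey(Random rng) {
        List<Character> letters = new ArrayList<>();
        for (char c = 'A'; c <= 'Z'; c++) letters.add(c);
        Collections.shuffle(letters, rng);
        StringBuilder key = new StringBuilder(26);
        for (char c : letters) key.append(c);
        return key.toString();
    }

    public static void main(String[] args) {
        // SecureRandom is a subclass of Random, so it can be passed directly.
        System.out.println("Alphabet:      ABCDEFGHIJKLMNOPQRSTUVWXYZ");
        System.out.println("Translated to: " + randomKey(new SecureRandom()));
    }
}
```

Note that a random rearrangement may leave a letter standing for itself; a puzzle maker might want to reject such keys and shuffle again.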
The "quotation" above is ordinarily called a message or plaintext in cryptography. The cryptogram is the ciphertext. The process of transforming the plaintext into ciphertext is encryption, while the reverse process of recovering the plaintext from the ciphertext is decryption. The rearrangement of the 26 letters used for encryption and decryption is called the key. The particular method of translating plaintext to ciphertext is called a cryptosystem.
It is important to realize that a single key could transform an arbitrarily long piece of plaintext. Thus instead of keeping a large message secret, one uses cryptography so that one need only keep a short key secret. This leads to a law:
Law CRYPTO1a: Cryptography reduces the problem of keeping an arbitrarily long message secret to the problem of keeping a short key secret. What an impressive improvement!
Although the techniques of cryptography are wonderful and powerful, one also needs to realize the limitations of these tools. There still remains something to keep secret, even if it is short.
Cryptography has many uses, but first and foremost it provides security for data storage and transmission. The security role is so important that older books titled Network Security covered only cryptography. Times have changed, but cryptography is still an essential tool for achieving security. Network "sniffers" work because packets are in the clear, unencrypted. In fact, we make as little use of cryptography as we do because of a long-standing policy of the U.S. government to suppress and discourage work in the area and uses of it, outside classified military applications. If a transmission line needs security there are still only two basic options: physical security, with fences, razor wire, and guard dogs, or security using cryptography. (The emerging field of quantum cryptography may yield a fundamentally different solution.) With cryptography, it doesn't matter if the line goes across a field or across the world.
The early part of this section regarded a cryptogram as a special (simple) cryptographic code. The process of recovering the original quotation is a process of breaking this code. This is called cryptanalysis, meaning the analysis of a cryptosystem. In this case the cryptanalysis is relatively easy. One simple change would make it harder: just realize that revealing where the blanks (word boundaries) are gives a lot of information. A much more difficult cryptogram would leave out blanks and other punctuation. For example, consider the cryptogram:
OHQUFOMFGFMFOBEHOQOMIVAHZJVOAHBUFJWUAWGKEHDPBFQOVOMLBEDBWMPZZVFOHQDVAZGWUGFMFAZHEMOHWOMLAFBKVOBGXTHAZGWQENFMXFOKGLOWGFUOMHEVQ
One might also present this just broken into groups of five characters for convenience in handling:
OHQUF OMFGF MFOBE HOQOM IVAHZ JVOAH BUFJW UAWGK EHDPB FQOVO MLBED BWMPZ ZVFOH QDVAZ GWUGF MFAZH EMOHW OMLAF BKVOB GXTHA ZGWQE NFMXF OKGLO WGFUO MHEVQ
Now there are no individual words to start working on, so it is a much more difficult cryptogram to break. However, this is an encoding of the same quotation, and there is the same uneven distribution of letters to help decrypt the cryptogram. Eventually, using the letter distributions and a dictionary, along with distributions of pairs of letters, one could get the quotation back:
Alphabet:      ABCDEFGHIJKLMNOPQRSTUVWXYZ
Translated to: OXKQFDZGACIVLHEJSMBWPNURTY
ANDWEAREHEREASONADARKLINGPLAINSWEPTWITHCONFUSEDALARMSOFSTRUGGLEANDFLIGHTWHEREIGNORANTARMIESCLASHBYNIGHTDOVERBEACHMATHEWARNOLD
Even here there are problems breaking the text into words, since it seems to start out with AND WEAR E HE REASON ....
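The blank-free cryptogram above can be reproduced with a short sketch: strip everything but the letters, substitute with the second key, and optionally break the result into groups of five. The class name `BlanklessCryptogram` and its helper names are assumptions of mine, made up for this illustration.

```java
class BlanklessCryptogram {
    static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    // The second key from the text.
    static final String KEY = "OXKQFDZGACIVLHEJSMBWPNURTY";

    // Remove blanks, punctuation, and anything else that is not a capital letter.
    static String strip(String text) {
        return text.replaceAll("[^A-Z]", "");
    }

    // Substitute letter by letter; the input is assumed to be already stripped.
    static String translate(String text, String from, String to) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            out.append(to.charAt(from.indexOf(c)));
        }
        return out.toString();
    }

    static String encrypt(String letters) { return translate(letters, ALPHABET, KEY); }
    static String decrypt(String letters) { return translate(letters, KEY, ALPHABET); }

    // Insert a blank after every `size` letters, purely for ease of handling.
    static String group(String letters, int size) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < letters.length(); i++) {
            if (i > 0 && i % size == 0) out.append(' ');
            out.append(letters.charAt(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String plain = "AND WE ARE HERE AS ON A DARKLING PLAIN";
        String cipher = encrypt(strip(plain));
        System.out.println(group(cipher, 5));
        System.out.println(decrypt(cipher));
    }
}
```

Run on the start of the quotation, the grouped output begins OHQUF OMFGF, matching the cryptogram above. The groups of five carry no information about word boundaries, which is exactly the point.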
Notice that the uneven statistical distribution of symbols is still a strong point of attack on this system. A much better system uses multiple ciphertext symbols to represent the more common plaintext letters. This is called a homophonic code, and it can be arbitrarily hard to cryptanalyze if one uses enough additional ciphertext symbols.
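To make the idea concrete, here is a toy sketch of a homophonic code in Java. Everything specific in it is an assumption for illustration: the class name `HomophonicSketch`, the use of the two-digit numbers 00 through 99 as ciphertext symbols, the rough table of how many symbols each letter receives, and the use of a seeded `java.util.Random` to lay out the symbol table. A serious system would derive the table from measured letter frequencies and from a properly secret key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class HomophonicSketch {
    // Hypothetical allocation of symbols to A..Z (sums to 100); common letters
    // such as E and T get many symbols, rare ones such as Q and Z get one.
    static final int[] SYMBOLS_PER_LETTER = {
        7, 2, 3, 4, 11, 2, 2, 6, 7, 1, 1, 4, 2,
        7, 8, 2, 1, 6, 6, 8, 3, 1, 2, 1, 2, 1
    };

    final List<List<Integer>> homophones = new ArrayList<>(); // index 0 is 'A'
    final char[] letterOf;                                    // symbol -> letter

    // The shuffled symbol table plays the role of the key. Deriving it from a
    // seed keeps the sketch short; it is not a secure way to make a key.
    HomophonicSketch(long keySeed) {
        int total = 0;
        for (int n : SYMBOLS_PER_LETTER) total += n;
        List<Integer> symbols = new ArrayList<>();
        for (int s = 0; s < total; s++) symbols.add(s);
        Collections.shuffle(symbols, new Random(keySeed));
        letterOf = new char[total];
        int next = 0;
        for (int i = 0; i < 26; i++) {
            int n = SYMBOLS_PER_LETTER[i];
            List<Integer> mine = new ArrayList<>(symbols.subList(next, next + n));
            for (int s : mine) letterOf[s] = (char) ('A' + i);
            homophones.add(mine);
            next += n;
        }
    }

    // Each letter is replaced by one of its symbols chosen at random, which
    // flattens the ciphertext frequencies. Non-letters are dropped.
    String encrypt(String plaintext, Random rng) {
        StringBuilder out = new StringBuilder();
        for (char c : plaintext.toCharArray()) {
            if (c < 'A' || c > 'Z') continue;
            List<Integer> choices = homophones.get(c - 'A');
            int symbol = choices.get(rng.nextInt(choices.size()));
            out.append(String.format("%02d ", symbol));
        }
        return out.toString().trim();
    }

    // Decryption is unambiguous: every symbol belongs to exactly one letter.
    String decrypt(String ciphertext) {
        StringBuilder out = new StringBuilder();
        if (ciphertext.trim().isEmpty()) return "";
        for (String token : ciphertext.trim().split("\\s+")) {
            out.append(letterOf[Integer.parseInt(token)]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        HomophonicSketch code = new HomophonicSketch(2002L);
        String cipher = code.encrypt("DOVER BEACH", new Random());
        System.out.println(cipher);
        System.out.println(code.decrypt(cipher));
    }
}
```

Encrypting the same message twice will usually give different ciphertexts, yet both decrypt to the same letters.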
The cryptanalysis above assumed that the ciphertext (the cryptogram) was available, but nothing else. However, often much more information is at hand, and good cryptosystems must be resistant to analysis in these cases also. Often the cryptanalyst has both plaintext and matching ciphertext: a known plaintext attack. In the case of cryptograms, the code would be known for all letters in that particular plaintext, and this would effectively break the code immediately unless the plaintext were very plain indeed. Sometimes the cryptanalyst can even choose the plaintext and then view his own choice of plaintext along with the corresponding ciphertext: a chosen plaintext attack.
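For cryptograms, a known plaintext attack amounts to lining up the plaintext against the ciphertext and reading off the key. A minimal sketch follows; the class name `KnownPlaintext` and the use of `?` for letters not yet determined are my own conventions for this illustration.

```java
import java.util.Arrays;

class KnownPlaintext {
    // Given matching plaintext and ciphertext, return a 26-character partial key:
    // position i holds the ciphertext letter for the i-th letter of the alphabet,
    // or '?' if that letter never occurs in the plaintext.
    static String recoverKey(String plaintext, String ciphertext) {
        if (plaintext.length() != ciphertext.length()) {
            throw new IllegalArgumentException("plaintext and ciphertext lengths differ");
        }
        char[] key = new char[26];
        Arrays.fill(key, '?');
        for (int i = 0; i < plaintext.length(); i++) {
            char p = plaintext.charAt(i);
            if (p >= 'A' && p <= 'Z') key[p - 'A'] = ciphertext.charAt(i);
        }
        return new String(key);
    }

    public static void main(String[] args) {
        String plain = "AND WE ARE HERE AS ON A DARKLING PLAIN";
        String cipher = "ZFY TM ZGM LMGM ZA HF Z YZGJRBFI QRZBF";
        System.out.println("Alphabet:      ABCDEFGHIJKLMNOPQRSTUVWXYZ");
        System.out.println("Recovered key: " + recoverKey(plain, cipher));
    }
}
```

Only the letters that never occur in the known plaintext remain undetermined, and those can usually be filled in from context.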
Amateurs in cryptography sometimes think they should keep the method of encryption secret, along with the particular key. This is a bad idea though, because sooner or later the underlying method will be discovered or bought or leaked.
Law CRYPTO3: The method or algorithm of a cryptosystem must not be kept secret, but only the key. All security must reside in keeping the key secret.