A Crypto Problem

My #3 Son was reading Poe's The Gold-Bug, so he and I were discussing how to solve crypto puzzles. My son mentioned the letter frequencies Poe gave, and I said Poe was wrong and asked whether he was sure he remembered them correctly. He said he was, and got the book out to prove it. I had read Poe many years ago, of course, but I didn't remember the letter frequencies he gave. Here they are, in order from most common to least:

e a o i d h n r s t u y c f g l m w b k p q x z

In Linotype usage the order is:

e t a o i n s h r d l u c m f w y p v b g k q j x z

and, like any good crypto puzzle solver, let me put one on top of the other:

e a o i d h n r s t u y c f g l m w b k p q x z
e t a o i n s h r d l u c m f w y p v b g k q j x z
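To put numbers on the comparison, here is a minimal Python sketch that records each letter's position in both lists and how far it moved. The two lists are copied from above; the function name `rank_shifts` is just an illustrative choice.

```python
# Both orderings run from most to least common, copied from the lists above.
POE = "e a o i d h n r s t u y c f g l m w b k p q x z".split()
LINOTYPE = "e t a o i n s h r d l u c m f w y p v b g k q j x z".split()


def rank_shifts(a, b):
    """Map each letter found in both lists to (rank in a, rank in b, shift).

    Ranks start at 1; a positive shift means the letter sits further down in a than in b.
    """
    pos_a = {ch: i + 1 for i, ch in enumerate(a)}
    pos_b = {ch: i + 1 for i, ch in enumerate(b)}
    return {ch: (pos_a[ch], pos_b[ch], pos_a[ch] - pos_b[ch]) for ch in a if ch in pos_b}


shifts = rank_shifts(POE, LINOTYPE)
missing_from_poe = sorted(set(LINOTYPE) - set(POE))

# Print the letters that moved the most first.
for ch, (poe_rank, lino_rank, shift) in sorted(shifts.items(), key=lambda kv: -abs(kv[1][2])):
    print(f"{ch}: Poe {poe_rank:2d}, Linotype {lino_rank:2d}, shift {shift:+d}")
print("missing from Poe:", missing_from_poe)
```

Run as is, it puts t at the top of the list (10th for Poe, 2nd for Linotype, a shift of +8) and reports j and v as missing from Poe's list entirely.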

Do any of you have an explanation for the differences, at least among the most common letters? Was he trying to make it harder for his readers to solve the crypto puzzles he often published?

Should you care to read or re-read the story, here is an online version of The Gold-Bug from Project Gutenberg. And here is a paperback version: The Gold-Bug and Other Tales.

Cross Posted at Power and Control

posted by Simon on 01.01.09 at 09:02 PM

Comments

Simon Singh's terrific "The Code Book" has a crypto problem at the end. He offered money to the first person to solve it. Someone did.

Bleepless   ·  January 1, 2009 10:06 PM

Here's someone who thinks Poe is playing games:

http://is.gd/eo8b

Dennis   ·  January 1, 2009 10:59 PM

I speculate that Poe didn't have access to a good frequency table and generated his own, perhaps from an atypical sample, or from too small a sample size.

SBP   ·  January 1, 2009 11:03 PM

Why doesn't Poe include "j" and "v"?

Anonymous   ·  January 1, 2009 11:07 PM

The frequencies he gives are from his own cryptograms, none of which decodes to "vajayjay," because it was old-timey days.

guy on internet   ·  January 2, 2009 1:21 AM

SBP,

You would have a point if the letters "in error" came at the end of the table.

However, misplacing t, d, and h so significantly, and t especially, seems intentional.

It might be interesting to scan a number of texts to see if it is possible to find one with that frequency pattern. Poe may have hidden a cryptogram in his letter frequency chart.

M. Simon   ·  January 2, 2009 2:33 AM
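The scan suggested above is easy to try. Below is a minimal Python sketch, assuming plain-text input: `letter_ranking` ranks the letters a to z by how often they occur in a text (breaking ties alphabetically), and `kendall_distance` counts how many letter pairs two rankings put in opposite order, so 0 means identical order. Both names are illustrative, and the Kendall distance is just one reasonable way to score "closeness", not anything Poe or the commenters used.

```python
from collections import Counter
import string

# Poe's ordering from The Gold-Bug, most to least common.
POE = "e a o i d h n r s t u y c f g l m w b k p q x z".split()


def letter_ranking(text):
    """Letters a-z ranked by count in `text`, most common first; ties broken alphabetically."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    return sorted(counts, key=lambda ch: (-counts[ch], ch))


def kendall_distance(r1, r2):
    """Number of letter pairs that the two rankings put in opposite order.

    Only letters present in both rankings are compared; 0 means the same order.
    """
    common = [ch for ch in r1 if ch in r2]
    pos2 = {ch: i for i, ch in enumerate(r2)}
    return sum(
        1
        for i in range(len(common))
        for j in range(i + 1, len(common))
        if pos2[common[i]] > pos2[common[j]]
    )
```

To test a candidate text, pass `letter_ranking(text)` and `POE` to `kendall_distance`; a smaller number means an ordering closer to Poe's. Running `letter_ranking` on a plain-text copy of Poe's own stories would also give a table of English as he used it.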

Wasn't Poe an accomplished cryptographer who claimed that no one in the world could design a cipher he couldn't crack? He apparently succeeded when he challenged readers of his newspaper column to try to stump him.

It seems more likely that he gave bad information on purpose. Perhaps he didn't want to give away too much information. It might generate interest in the puzzles but not give people an absolute key to solving them. The rest of his readers, who still didn't go in for the puzzles, would be none the wiser but could still enjoy the story with its internal logic.

Dennis   ·  January 2, 2009 9:19 AM

No mystery. Poe wrote more than 150 years ago. English word use was different then. And Poe was just one person, not the English-Speaking Peoples. His table might have been right for the United States of his day.

An interesting test would be a table made from Poe's writings. That would be English as he used it.

Type would still have been set by hand in Poe's day. Mark Twain famously invested in early typesetting machines and lost a fortune.

K   ·  January 2, 2009 11:50 AM

Intriguingly, this is the ranking for a book cipher which uses the first letter of each word.

The ranking for first letters in English is

t o a w b c d s f m r h i y e g l n p u j k

whereas that for every letter in English is as you quoted.

If you multiply the numerical frequencies for first letters by the numerical frequencies for every letter, the ranking will be close to that given by Poe.

Alex Green   ·  January 2, 2009 11:53 AM
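Alex's multiplication is simple to reproduce. The Python sketch below takes two frequency tables (first-letter and overall, as dictionaries from letter to relative frequency) and ranks letters by the product of the two. The function name and input format are illustrative assumptions, and no frequency data is included, so supply tables from a source you trust. It only performs the calculation; it does not by itself confirm how close the result comes to Poe's list.

```python
def product_ranking(first_letter_freq, overall_freq):
    """Rank letters by (first-letter frequency) x (overall frequency), highest first.

    Both arguments map a letter to its relative frequency on any consistent scale.
    A letter missing from either table counts as frequency 0; ties sort alphabetically.
    """
    letters = set(first_letter_freq) | set(overall_freq)
    score = {
        ch: first_letter_freq.get(ch, 0.0) * overall_freq.get(ch, 0.0)
        for ch in letters
    }
    return sorted(letters, key=lambda ch: (-score[ch], ch))
```

Comparing its output letter by letter with Poe's e a o i d h n r s t u y ... would show how close the match really is.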

Alex,

That is very interesting.

M. Simon   ·  January 2, 2009 12:49 PM

Why doesn't Poe include "j" and "v"?

Now this is interesting.

Classical Latin makes no distinction between i and j, or between u and v.

I wonder if Poe was working from a Latin text?

SBP   ·  January 2, 2009 2:00 PM

The Wikipedia page doesn't give a table for Latin, but the Italian table has "t" at 5.62% and the Spanish table has "t" at 4.63%, both significantly lower than the 9.06% for English.

SBP   ·  January 2, 2009 2:08 PM

Kluber gives the percentages for Latin as (read down each column):

I - 10.1    M - 3.4    V - 0.7
E -  9.2    C - 3.3    X - 0.6
U -  7.4    P - 3.0    H - 0.5
T -  7.2    L - 2.1    J - 0
A -  7.2    D - 1.7    K - 0
S -  6.8    G - 1.4    Y - 0
R -  6.8    Q - 1.3    Z - 0
N -  6.0    B - 1.2
O -  4.4    F - 0.9

Alex Green   ·  January 2, 2009 3:47 PM

Poe's table is probably based on a pretty small sample size. Note that in

eaoi(dhnrstuy)(cfglmw)(bkpqxz)

the three parenthesized groups are alphabetized. My guess is that this means they all had the same (or statistically indistinguishable) counts in his sample, so he just listed them in order. If this is true, then only the T has really shifted very much.

Dave   ·  January 2, 2009 8:47 PM
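Dave's observation can be checked mechanically. This minimal Python sketch splits Poe's list into maximal runs in strictly increasing alphabetical order and keeps only runs of at least three letters. The function name and the threshold of three are illustrative choices; the threshold simply skips short runs such as "a o" that could easily be coincidence.

```python
# Poe's ordering from The Gold-Bug, most to least common.
POE = "e a o i d h n r s t u y c f g l m w b k p q x z".split()


def alphabetical_runs(letters, min_len=3):
    """Split `letters` into maximal strictly increasing alphabetical runs.

    Returns each run of at least `min_len` letters as a string.
    """
    if not letters:
        return []
    runs, current = [], [letters[0]]
    for prev, ch in zip(letters, letters[1:]):
        if ch > prev:
            current.append(ch)  # still in alphabetical order: extend the run
        else:
            runs.append(current)  # order broke: close the run, start a new one
            current = [ch]
    runs.append(current)
    return ["".join(run) for run in runs if len(run) >= min_len]


print(alphabetical_runs(POE))
```

With the default threshold it returns exactly Dave's three groups: dhnrstuy, cfglmw, and bkpqxz.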

"Son #3"? I didn't even know you had any kids!

Aakash   ·  January 6, 2009 1:42 AM

Aakash,

Three sons, one daughter.

M. Simon   ·  January 6, 2009 1:53 AM
