Single Letter Frequencies in English

Every time I read a paper that discusses single-letter frequencies in English, I feel I should sit down and calculate them myself from a sample of English text. Today, I finally did. Here are the probabilities and negative log probabilities of the letters in English, computed over the corpus of Shakespeare’s plays:

Single Letter Probabilities.png
Single Letter Inverse Probabilities.png

And, for those who care, here’s the code to generate the data from the plays, which I downloaded from Project Gutenberg:

def initialize_letter_counts(letter_counts)
  ('a'..'z').each do |chr|
    letter_counts[chr] = 0
  end
end
 
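# Count each ASCII letter in the file, ignoring case, into letter_counts.
def parse_file(filename, letter_counts)
  # Binary mode reads one byte per character, so stray non-UTF-8 bytes
  # in the texts cannot raise an encoding error. The block form of
  # File.open closes the file automatically.
  File.open(filename, 'rb') do |f|
    f.each_char do |char|
      char = char.downcase
      letter_counts[char] += 1 if char.match?(/[a-z]/)
    end
  end
end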
 
directory = '/Users/johnmyleswhite/Princeton/Research/Letter Frequency'
 
Dir.chdir(directory)
 
letter_counts = {}
 
initialize_letter_counts(letter_counts)
 
Dir.new('Data').entries.each do |entry|
  if entry.match(/\.txt$/)
    entry = File.expand_path(entry, directory + '/Data')
    parse_file(entry, letter_counts)
  end
end
 
letter_counts.keys.sort.each do |key|
  puts "'#{key}',#{letter_counts[key]}"
end
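
The script above prints raw counts only, while the plots show probabilities and negative log probabilities. A minimal sketch of that last step might look like the following. The helper name `letter_probabilities` is hypothetical and not part of the original script, and the base-2 logarithm is an assumption, since the post does not say which base was used for the plots.

```ruby
# Hypothetical helper (not in the original script): converts a hash of
# letter counts into probabilities and negative log probabilities.
# Assumption: logs are base 2, so the second value is measured in bits.
def letter_probabilities(letter_counts)
  total = letter_counts.values.sum.to_f
  letter_counts.map do |letter, count|
    prob = count / total
    [letter, { probability: prob, neg_log_probability: -Math.log2(prob) }]
  end.to_h
end

# Example with made-up counts, just to show the shape of the result.
sample_counts = { 'a' => 2, 'b' => 1, 'c' => 1 }
letter_probabilities(sample_counts).each do |letter, stats|
  puts "'#{letter}',#{stats[:probability]},#{stats[:neg_log_probability]}"
end
```

In the real script, `letter_counts` would be passed in place of `sample_counts`. A letter that never occurs has probability 0 and therefore an infinite negative log probability.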