# Single Letter Frequencies in English

Every time I read a paper that discusses single-letter frequencies in English, I feel like I should sit down and calculate them for myself from a sample of English text. Today, I finally did. Here are the probabilities and negative log probabilities of the letters in English over the corpus of Shakespeare’s plays:

And, for those who care, here’s the code to generate the data from the plays, which I downloaded from Project Gutenberg:

    # Start every letter at zero so that letters absent from the corpus still appear in the output.
    def initialize_letter_counts(letter_counts)
      ('a'..'z').each do |chr|
        letter_counts[chr] = 0
      end
    end

    # Read one file a character at a time and tally each letter, ignoring case.
    def parse_file(filename, letter_counts)
      f = File.new(filename)
      begin
        while true
          char = f.readchar.chr.downcase
          if char.match(/[a-z]/)
            letter_counts[char] = letter_counts[char] + 1
          end
        end
      rescue EOFError
        # readchar raises EOFError at end of file, which ends the loop.
        return nil
      ensure
        f.close
      end
    end

    directory = '/Users/johnmyleswhite/Princeton/Research/Letter Frequency'

    Dir.chdir(directory)

    letter_counts = {}

    initialize_letter_counts(letter_counts)

    # Process every .txt file in the Data subdirectory.
    Dir.new('Data').entries.each do |entry|
      if entry.match(/\.txt$/)
        entry = File.expand_path(entry, directory + '/Data')
        parse_file(entry, letter_counts)
      end
    end

    # Print one CSV row per letter: the quoted letter, then its count.
    letter_counts.keys.sort.each do |key|
      puts "'#{key}',#{letter_counts[key]}"
    end
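The script above prints raw counts only. Here is a minimal sketch of how those counts could be turned into the probabilities and negative log probabilities mentioned earlier. The helper name `letter_statistics` and the toy counts are hypothetical, and the natural logarithm is an assumption, since the post does not say which base was used.

```ruby
# Convert raw letter counts into probabilities and negative log probabilities.
# Assumption: natural logarithm; the base used for the original figures is not stated.
def letter_statistics(letter_counts)
  total = letter_counts.values.sum.to_f
  letter_counts.keys.sort.map do |letter|
    count = letter_counts[letter]
    probability = count / total
    # A letter that never occurs has probability 0, so its negative log is infinite.
    neg_log = count.zero? ? Float::INFINITY : -Math.log(probability)
    [letter, probability, neg_log]
  end
end

# Toy counts for illustration only; these are not figures from the Shakespeare corpus.
toy_counts = { 'a' => 2, 'b' => 1, 'c' => 1 }

letter_statistics(toy_counts).each do |letter, probability, neg_log|
  puts format("'%s',%.4f,%.4f", letter, probability, neg_log)
end
```

In practice the `letter_counts` hash built by the script would be passed in place of `toy_counts`. Swapping `Math.log` for `Math.log2` would give the values in bits instead.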