
Most unusual letters in English language
11th of May 2009
I needed to find out which letters are used least in the English language. I pulled down a list of over 100,000 English words, split them all into letters and ended up with a list of about 1,000,000 letters. Sorting them by usage gave this result:
esiarntoldcugpmhbyfkwvzxjq
It would be interesting to make a heatmap of this over an image of a QWERTY keyboard.
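The counting itself (split every word into letters, then sort the letters by usage) can be sketched in a few lines of Python. This is only a rough sketch and not the code used for the result above: the function name `letters_by_usage` and the tiny `sample_words` list are made up for illustration, and the post does not say which word list was used.

```python
from collections import Counter


def letters_by_usage(words):
    """Return the letters a-z found in `words`, most used first."""
    counts = Counter(
        ch
        for word in words
        for ch in word.lower()
        if "a" <= ch <= "z"  # skip apostrophes, hyphens, digits, accented letters
    )
    # most_common() returns (letter, count) pairs, highest count first
    return "".join(letter for letter, _ in counts.most_common())


# Tiny stand-in for the real word list, which the post does not name.
sample_words = ["tree", "see", "test"]
print(letters_by_usage(sample_words))
```

Note that every word counts exactly once here, no matter how often it shows up in real text.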
Below is the same list, but with each letter's usage as a ratio of the average:
e 3.0
s 2.3
i 2.1
a 2.0
r 1.9
n 1.8
t 1.6
o 1.5
l 1.4
d 1.1
c 0.9
u 0.9
g 0.8
p 0.7
m 0.7
h 0.6
b 0.5
y 0.4
f 0.4
k 0.3
w 0.3
v 0.3
z 0.1
x 0.1
j 0.1
q 0.0
I hope I got that right because I did that calculation in a quick one-liner just now. It basically means that the letter e is 3 times more common than the average.
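The one-liner itself isn't shown here, but one way to arrive at ratios like these is to divide each letter's count by the average count per letter. That reading is an assumption, though it fits the figures, which add up to roughly 26. The function name `ratios_to_average` and the counts below are invented for illustration and are not the numbers behind the list.

```python
from collections import Counter


def ratios_to_average(counts):
    """Map each letter to its count divided by the mean count per letter."""
    average = sum(counts.values()) / len(counts)
    return {letter: count / average for letter, count in counts.items()}


# Made-up counts for illustration only.
counts = Counter({"e": 6, "t": 3, "s": 2, "q": 1})
ratios = ratios_to_average(counts)
for letter in sorted(ratios, key=ratios.get, reverse=True):
    print(letter, round(ratios[letter], 1))
```

With this definition a ratio of 1.0 means exactly average usage, and the ratios always add up to the number of distinct letters counted.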
Comment
rgz -
11th May 2009
I did the same thing for Spanish *manually* 9 years ago and also found e to be the most common vowel and letter, and the letter s to be the most common consonant and almost more common than the vowel "u" (but still in Spanish the vowels outnumber all the consonants).
How I wish I knew Python back then!
Michael Tobis -
11th May 2009
This is odd, especially the poor performance of 'H'. It was long believed that the order of frequency is
ETAOIN SHRDLU
which even has an entry on Wikipedia.
Yish -
11th May 2009
Where did you get your list of words? Also, is it useful to account for the frequency of the words themselves when calculating this list? E.g. if "the" is the most commonly used word in English, it would significantly skew the usage of the letters "t" and "h" upwards.
Peter Bengtsson -
11th May 2009
You're actually right. I just took a huge list of words without caring which *words* were the most common. My bad.
K -
13th May 2009
Perhaps a better data source would be texts from Project Gutenberg or Wikipedia?
Michal Bartoszkiewicz -
11th May 2009
You should include the frequency of each word in the calculation – you probably treat the 't' in 'the' identically as the 't' in 'anthropomorphologically', but the former occurs slightly more often in (normal) English texts than the latter ;)
According to Wikipedia (http://en.wikipedia.org/wiki/Letter_frequencies) 't' is the second most popular letter with about 2.3 times the average.
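The weighting suggested in these comments could look roughly like the sketch below: each letter is counted once per occurrence of its word in real text, not once per dictionary entry. The function name `weighted_letter_counts` and the occurrence counts are hypothetical and only there to show the idea. A real run would need a word-frequency list, or it could simply count letters in running text.

```python
from collections import Counter


def weighted_letter_counts(word_frequencies):
    """Count letters, counting each word as many times as it occurs in text."""
    counts = Counter()
    for word, occurrences in word_frequencies.items():
        for ch in word.lower():
            if "a" <= ch <= "z":
                counts[ch] += occurrences
    return counts


# Hypothetical occurrence counts, made up for illustration.
word_frequencies = {"the": 50, "anthropomorphologically": 1}
counts = weighted_letter_counts(word_frequencies)
print(counts["t"], counts["h"], counts["o"])
```

In this toy input the 50 occurrences of "the" outweigh the one long word, which is the skew towards "t" and "h" the comments describe.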
Eric -
11th May 2009
By "3 times more common" do you mean "4 times as common"? The numbers make it look that way, but an unfortunate recent tendency is to use "3 times more" when "3 times as much" is more appropriate.
Dougal Matthews -
12th May 2009
http://utilitymill.com/utility/Keyboard_HeatMap
Not great but kinda fun.
Peter Bengtsson -
13th May 2009
It says the JPG is broken every time I try to open it.


> It would be interesting to make a heatmap of this over an image of a QWERTY keyboard.
If you don't clean your hands religiously, you most likely already have a "grimemap" on your keyboard which more or less equates to a usage heatmap.