Frequency analysis is the study of how often letters or groups of letters appear in a ciphertext, in an attempt to partially reveal the message. In English (as in most other languages), certain letters and groups of letters appear far more often than others.
This is a chart of the frequency distribution of letters in the English alphabet. As you can see, the letter ‘e’ is the most common, followed by ‘t’ and ‘a’, with ‘j’, ‘q’, ‘x’, and ‘z’ being very uncommon.
Knowing the usual frequencies of letters in English, we can statistically determine parts of the plaintext from the ciphertext alone whenever the encryption method does not effectively mask these frequencies. Let’s look at an example based on a plaintext encrypted with the Caesar cipher, a cipher that provides no protection from frequency analysis.
wkh sdvvzrug lv vhyhq grqw whoo dqbrqh
If we look closely, we can probably already notice some sentence structure here. Let’s break this down into numbers we can work with by getting the letter frequencies (how often each letter appears) of this ciphertext.
h = 5
q = 4
v = 4
r = 3
w = 3
d = 2
g = 2
o = 2
b = 1
k = 1
l = 1
s = 1
u = 1
y = 1
z = 1
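A tally like this is easy to reproduce in code. Here is a minimal Python sketch, assuming we only care about alphabetic characters and ignore spaces and case. The helper name letter_frequencies is our own choice for illustration, not part of any library.

```python
from collections import Counter

ciphertext = "wkh sdvvzrug lv vhyhq grqw whoo dqbrqh"

def letter_frequencies(text):
    """Count how often each letter appears, ignoring case, spaces, and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

frequencies = letter_frequencies(ciphertext)

# most_common() lists letters from most to least frequent.
for letter, count in frequencies.most_common():
    print(f"{letter} = {count}")
```

The first line printed is h = 5, the most frequent letter in the ciphertext.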
Okay, so we’ve found our frequencies. The first instinct here is to try h = e, since e is the most common letter in English and h is the most common letter in our ciphertext. Because we know the cipher used is the Caesar cipher, we know the letter shift from h to e is -3, so we can try shifting every letter by -3 and voilà, we have the key to decrypt the message: “the password is seven dont tell anyone”. But what if we didn’t know the cipher? Could we still figure this out, assuming the cipher didn’t mask the frequencies? Well, with a little clever thinking, of course!
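The guess-and-shift step can be sketched in Python as follows. This is only an illustration under the assumptions made in the text: the cipher is a Caesar cipher, the message is lowercase English, and the most frequent ciphertext letter stands for e. The function names guess_shift and caesar_decrypt are hypothetical helpers. The code stores the shift as a positive 3 and subtracts it, which is the same as applying a shift of -3.

```python
from collections import Counter
import string

def guess_shift(ciphertext, assumed_plain="e"):
    """Guess the Caesar shift by assuming the most common letter decrypts to assumed_plain."""
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    most_common_letter = counts.most_common(1)[0][0]
    return (ord(most_common_letter) - ord(assumed_plain)) % 26

def caesar_decrypt(ciphertext, shift):
    """Move each lowercase letter back by shift positions; leave other characters alone."""
    result = []
    for ch in ciphertext:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        else:
            result.append(ch)
    return "".join(result)

ciphertext = "wkh sdvvzrug lv vhyhq grqw whoo dqbrqh"
shift = guess_shift(ciphertext)
plaintext = caesar_decrypt(ciphertext, shift)
print(shift, plaintext)
# 3 the password is seven dont tell anyone
```

On a message this short the most frequent letter will not always be e, so in practice it is worth trying the next few candidates (t, a, and so on) if the first guess produces gibberish.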
We can assume that h is going to be the letter e here, since it’s the most frequent. So already we can narrow some things down. We can make further guesses by looking at short two- and three-letter words and trying the possibilities, as there are only a handful of common ones in English. For example, wkh ends in our guessed e, which makes “the” a strong candidate. We can also look for doubled letters (such as the vv in sdvvzrug or the oo in whoo), as only a few letters commonly double up like this in English. From there, it’s a matter of taking the words you think you’ve figured out and using their letters to crack more words, eventually revealing the entire message!
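To make this guess-and-refine process concrete, here is a small Python sketch that applies a partial key to the ciphertext and shows every still-unknown letter as an underscore. The function apply_partial_key and the dictionary known are hypothetical names used only for illustration, and the guess that wkh means “the” is an assumption we are testing, not something the ciphertext proves.

```python
def apply_partial_key(ciphertext, known):
    """Replace letters we have guessed; show unknown letters as '_' and keep spaces."""
    return "".join(
        known.get(ch, "_") if ch.isalpha() else ch
        for ch in ciphertext
    )

ciphertext = "wkh sdvvzrug lv vhyhq grqw whoo dqbrqh"

# Start with the single guess from the frequency count: h stands for e.
known = {"h": "e"}
first_pass = apply_partial_key(ciphertext, known)
print(first_pass)
# __e ________ __ _e_e_ ____ _e__ _____e

# Guess that the three-letter word "wkh" is "the", which adds w -> t and k -> h.
known.update({"w": "t", "k": "h"})
second_pass = apply_partial_key(ciphertext, known)
print(second_pass)
# the ________ __ _e_e_ ___t te__ _____e
```

Each new guess fills in more of the pattern, and a fragment like te__ with a doubled final letter quickly suggests the next word to try.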
However, frequency analysis is not always a viable method for breaking encryption. Many forms of encryption hide the patterns in their data well enough that the ciphertext appears essentially random. You won’t have much luck breaking modern encryption with simple frequency analysis.