HUFFMAN CODING
Huffman Coding, also known as Huffman Encoding, is a greedy algorithm commonly used for lossless data compression. It was developed by David A. Huffman and published in 1952, and the codes it produces are optimal among prefix codes. Variable-length codes are assigned to input characters depending on how frequently each character occurs, i.e. on the probability of each source symbol: the characters that occur most frequently get the shortest codes, while those that occur rarely get the longest codes.
Huffman Coding uses prefix codes to avoid any ambiguity in the decoding process, i.e. the code assigned to one character must not be a prefix of the code assigned to any other character.
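To see why the prefix property removes ambiguity, here is a minimal Python sketch that decodes a bit stream one bit at a time. The code table is the one derived in the walkthrough below (the letter R in it is an assumed placeholder, explained under STEP 1):

```python
# A prefix code: no codeword is a prefix of another, so a bit stream
# decodes unambiguously from left to right.
codes = {"S": "00", "Q": "01", "R": "10", "T": "110", "P": "111"}

def decode(bits, codes):
    inverse = {code: char for char, code in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:        # at most one codeword can ever match
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)

print(decode("1110100", codes))      # -> PQS
```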
It consists of two main steps:
- Building a Huffman tree from the input characters.
- Traversing the tree to find the codes.
The time complexity of the Huffman coding algorithm is O(n log n), where n is the number of characters.
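Before walking through the steps, here is a minimal sketch (Python, with names assumed for illustration) of a tree node type that the snippets in the steps below reuse; the heap operations on these nodes are what give the O(n log n) bound:

```python
class Node:
    """A Huffman tree node: leaves hold a character, internal nodes hold None."""
    def __init__(self, freq, char=None, left=None, right=None):
        self.freq = freq
        self.char = char
        self.left = left
        self.right = right

    def __lt__(self, other):
        # heapq compares the items it stores, so order nodes by frequency
        return self.freq < other.freq
```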
Here is an example showing the steps involved in constructing a Huffman tree.
Suppose we want to send a string of 30 characters over a network. Normally we would transmit it using ASCII codes, which need 8 bits for each character, so the string would take 30*8 = 240 bits. Using the Huffman encoding technique, we can compress it to a smaller size.
STEP 1:
For each character in the string, count the number of times it occurs (its frequency).
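In Python this is a one-liner with collections.Counter. The original 30-character string is not reproduced here, so the snippet assumes frequencies consistent with the numbers used throughout this walkthrough (P=5, T=4, S=6, Q=7, plus an unnamed fifth letter, called R here, with the remaining count of 8):

```python
from collections import Counter

# Assumed stand-in for the 30-character example string: 5 Ps, 4 Ts,
# 6 Ss, 7 Qs and 8 Rs (R is a placeholder for the unnamed fifth letter).
text = "P" * 5 + "T" * 4 + "S" * 6 + "Q" * 7 + "R" * 8

freq = Counter(text)
print(freq)   # Counter({'R': 8, 'Q': 7, 'S': 6, 'P': 5, 'T': 4})
```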
STEP 2:
Arrange the nodes in order of increasing frequency. Take the two characters with the minimum frequencies and create a new internal node whose frequency is the sum of theirs. Here the letters P and T have the lowest counts of all the letters (5 and 4 respectively), so combining them gives a new internal node with frequency 9.
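A sketch of this step, reusing the Node class and the assumed frequencies from above: put all the leaves in a min-heap, pop the two smallest, and push the merged node back.

```python
import heapq

heap = [Node(f, c) for c, f in {"P": 5, "T": 4, "S": 6, "Q": 7, "R": 8}.items()]
heapq.heapify(heap)

a = heapq.heappop(heap)              # T, count 4
b = heapq.heappop(heap)              # P, count 5
merged = Node(a.freq + b.freq, left=a, right=b)
heapq.heappush(heap, merged)         # the new internal node 9 rejoins the pool
```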
STEP 3:
Now, treating the new node 9 like any other count, again select the two minimum nodes and add them. This time the characters S and Q, with counts 6 and 7 respectively, are combined to form a new node 13.
STEP 4:
Repeat STEP 2 and STEP 3 until only a single node, the root of the tree, remains. Always selecting the two minimum nodes is the greedy choice that makes the resulting merge pattern optimal.
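The whole greedy loop, continuing the sketch above; each of the n - 1 merges costs O(log n) heap work, which is where the O(n log n) complexity comes from:

```python
# Keep merging the two smallest nodes until only the root remains.
while len(heap) > 1:
    a = heapq.heappop(heap)
    b = heapq.heappop(heap)
    heapq.heappush(heap, Node(a.freq + b.freq, left=a, right=b))

root = heap[0]   # frequency 30, the length of the whole string
```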
STEP 5:
Label every edge to a left child 0 and every edge to a right child 1.
STEP 6:
To obtain a character's code, follow the path from the root node down to that character and read off the edge labels; the resulting codes are what we use to encode the message. For example, the path from the root to P reads 111, the path to Q reads 01, and so on.
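Steps 5 and 6 together amount to one traversal of the finished tree, appending 0 on every left edge and 1 on every right edge. A sketch continuing from the root built above:

```python
def assign_codes(node, path="", table=None):
    """Walk the tree; the accumulated edge labels at a leaf are its code."""
    if table is None:
        table = {}
    if node.char is not None:                    # leaf: record the code
        table[node.char] = path
    else:
        assign_codes(node.left, path + "0", table)
        assign_codes(node.right, path + "1", table)
    return table

codes = assign_codes(root)
print(codes)   # {'S': '00', 'Q': '01', 'R': '10', 'T': '110', 'P': '111'}
```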
STEP 7:
For each character, multiply the number of times it appears by the length of its code, then add the results to get the total size of the compressed string. For example, P needs a 3-bit code and appears 5 times in the string, so it contributes 5*3 = 15 bits, and so on.
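The same arithmetic in code, using the frequency counts and the code table from the sketches above:

```python
# Total compressed size: each character's count times its code length.
total_bits = sum(freq[char] * len(code) for char, code in codes.items())
print(total_bits)   # 5*3 + 4*3 + 6*2 + 7*2 + 8*2 = 69 bits
```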
CONCLUSION
As we can see, the message that originally needed 240 bits now needs only 69 bits (plus the small code table the receiver needs in order to decode), so the Huffman coding approach reduces both the size and the cost of the message.