This week, we deliver a program on "compress text", with the help of Huffman coding.
But first, a little introduction:
Huffman coding is an algorithm used for data compression.
The term refers to the use of a table of variable length codes to encode a given symbol (such as a character in a file) where the table has been filled in a specific manner based on the estimated probability of occurrence for each possible value of the symbol.
Is the creation of a binary tree in which the leaf nodes are labeled with characters, along with their frequencies, and consecutively are joining each pair of nodes that join less frequently, from creating a new intermediate node labeled with that sum.
We proceed to do this until there are no leaf nodes join any top node, and has formed the binary tree.
Subsequently, the label edges joining each of the nodes with zeros and ones (left and right child, respectively, for example. Resultant code for each character is read following the branch, from the root to each character (or vice versa ) of each of the labels of the edges.
First, we give the text to be processed, and then run the program so it shows the table with their coding and we decode the text also in binary numbers.
** For clarification, seeing a code example, were used heapq module functions, such as: heapify, heappop and heappush, once a little explanation:
heapq.heapify(): Transform list x into a heap, in-place, in linear time
heapq.heappop(): Pop and return the smallest item from the heap, maintaining the heap invariant.
heapq.heappush(): Push item on the heap, then pop and return the smallest item from the heap. The combined action runs more efficiently than heappush() followed by a separate call to heappop().
Now, the code
Now, run the program and their results
First, use a short text, after a long text to detect if the program worked well
Experiment the typical-case
The first analysis was monitoring time, which results are as follows:
Results of one test
Results of two test
Experiment the worst-case
The second, was to monitor the weight of the uncompressed file and of the compressed file
First text: 25 bytes
Compressed: 89 bytes
Second text 340 bytes
Compressed: 1.28 KB
In the test to check the weight of the archives, the weight was higher, the space can be consumed in the file with the data entered