jueves, 11 de abril de 2013

Homework 3 - Text Compression

This week, we deliver a program on "compress text", with the help of Huffman coding.

But first, a little introduction:

Huffman Coding.

Huffman coding is an algorithm used for data compression.

The term refers to the use of a table of variable length codes to encode a given symbol (such as a character in a file) where the table has been filled in a specific manner based on the estimated probability of occurrence for each possible value of the symbol.

Technical

Is the creation of a binary tree in which the leaf nodes are labeled with characters, along with their frequencies, and consecutively are joining each pair of nodes that join less frequently, from creating a new intermediate node labeled with that sum.

We proceed to do this until there are no leaf nodes join any top node, and has formed the binary tree.

Subsequently, the label edges joining each of the nodes with zeros and ones (left and right child, respectively, for example. Resultant code for each character is read following the branch, from the root to each character (or vice versa ) of each of the labels of the edges.

Operation.

First, we give the text to be processed, and then run the program so it shows the table with their coding and we decode the text also in binary numbers.

** For clarification, seeing a code example, were used heapq module functions, such as: heapify, heappop and heappush, once a little explanation:

heapq.heapify(): Transform list x into a heap, in-place, in linear time

heapq.heappop(): Pop and return the smallest item from the heap, maintaining the heap invariant.

heapq.heappush(): Push item on the heap, then pop and return the smallest item from the heap. The combined action runs more efficiently than heappush() followed by a separate call to heappop().

Now, the code

Code:

Now, run the program and their results

Results:

Test one
Test two

First, use a short text, after a long text to detect if the program worked well

Analysis

Experiment the typical-case

The first analysis was monitoring time, which results are as follows:

Results of one test

Results of two test

Experiment the worst-case

The second, was to monitor the weight of the uncompressed file and of the compressed file

First text: 25 bytes
Compressed: 89 bytes

Second text 340 bytes
Compressed: 1.28 KB

In the test to check the weight of the archives, the weight was higher, the space can be consumed in the file with the data entered

Repository:

References:

1 comentario:

  1. El reporte quedó algo incoherente... Esperaba un experimento que tenga definido qué consideras un caso típico, qué un caso malo, cómo los generas para diferentes largos, cómo se portan etc. Intenta más escribir tu código propio en vez de internetear tanto. 7+4 pts.

    ResponderEliminar