jueves, 25 de abril de 2013

Activity Point Extra - Lempel-Ziv-Welch

Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch.

The algorithm is simple to implement, and has the potential for very high throughput in hardware implementations. It was the algorithm of the widely used Unix file compression utility compress, and is used in the GIF image format.

Encoding

A high level view of the encoding algorithm is shown here:

  • Initialize the dictionary to contain all strings of length one.
  • Find the longest string W in the dictionary that matches the current input.
  • Emit the dictionary index for W to output and remove W from the input.
  • Add W followed by the next symbol in the input to the dictionary.
  • Go to Step 2.
A dictionary is initialized to contain the single-character strings corresponding to all the possible input characters (and nothing else except the clear and stop codes if they're being used).

The algorithm works by scanning through the input string for successively longer substrings until it finds one that is not in the dictionary.

Algorithm Compress:
Symbols and variables

Algorithm Decompress
Symbols an variables

Example Code in Python

Results

Funcionality

Compress


The string is "HolaHola".

Compressed program sequence of 8-bit symbols.

First, the word "Hello", it's the same, and not added the same.

Then, the string "Ho" is added to the 256:
Ho = 256

After the next, the character "la" one adds the 258:
la = 258

They met the characters "Ho" and "la".

Decompress

256,258 = Hola


The values ​​are replaced by the characters

References

Homework 4 - Adaptative Huffman

For this week, as was the task at adaptive Huffman algorithm.

In this task, use the code that I had already used in the previous post.

The functionality is explained in the previous post, and this time add the piece of code to be something dynamic and not is adaptative.

Functionality

The program compresses the text, in this occasion not enter text in the terminal, as it had done earlier in the program, the text is generated within the program, and it gives us the compressed data results.


Now the code:

Code:

Now the results:

Results:

Test 1:


Test 2:

Analysis of Results:

Testing time:

Test 1:

Test 2:

One problem with the program is that, when calculating the time, really does not take long to process the program, perhaps would be the size of characters produced, maybe that's the point at which this happens.

Repository:

References:

jueves, 11 de abril de 2013

Homework 3 - Text Compression

This week, we deliver a program on "compress text", with the help of Huffman coding.

But first, a little introduction:

Huffman Coding.

Huffman coding is an algorithm used for data compression.

The term refers to the use of a table of variable length codes to encode a given symbol (such as a character in a file) where the table has been filled in a specific manner based on the estimated probability of occurrence for each possible value of the symbol.

Technical

Is the creation of a binary tree in which the leaf nodes are labeled with characters, along with their frequencies, and consecutively are joining each pair of nodes that join less frequently, from creating a new intermediate node labeled with that sum.

We proceed to do this until there are no leaf nodes join any top node, and has formed the binary tree.

Subsequently, the label edges joining each of the nodes with zeros and ones (left and right child, respectively, for example. Resultant code for each character is read following the branch, from the root to each character (or vice versa ) of each of the labels of the edges.

Operation.

First, we give the text to be processed, and then run the program so it shows the table with their coding and we decode the text also in binary numbers.

** For clarification, seeing a code example, were used heapq module functions, such as: heapify, heappop and heappush, once a little explanation:

heapq.heapify(): Transform list x into a heap, in-place, in linear time

heapq.heappop(): Pop and return the smallest item from the heap, maintaining the heap invariant.

heapq.heappush(): Push item on the heap, then pop and return the smallest item from the heap. The combined action runs more efficiently than heappush() followed by a separate call to heappop().

Now, the code

Code:

Now, run the program and their results

Results:

Test one
Test two

First, use a short text, after a long text to detect if the program worked well

Analysis

Experiment the typical-case

The first analysis was monitoring time, which results are as follows:

Results of one test

Results of two test

Experiment the worst-case

The second, was to monitor the weight of the uncompressed file and of the compressed file

First text: 25 bytes
Compressed: 89 bytes

Second text 340 bytes
Compressed: 1.28 KB

In the test to check the weight of the archives, the weight was higher, the space can be consumed in the file with the data entered

Repository:

References: