The algorithm is simple to implement, and has the potential for very high throughput in hardware implementations. It was the algorithm of the widely used Unix file compression utility compress, and is used in the GIF image format.
A high level view of the encoding algorithm is shown here:
- Initialize the dictionary to contain all strings of length one.
- Find the longest string W in the dictionary that matches the current input.
- Emit the dictionary index for W to output and remove W from the input.
- Add W followed by the next symbol in the input to the dictionary.
- Go to Step 2.
The algorithm works by scanning through the input string for successively longer substrings until it finds one that is not in the dictionary.
Example Code in Python
The string is "HolaHola".
Compressed program sequence of 8-bit symbols.
First, the word "Hello", it's the same, and not added the same.
Then, the string "Ho" is added to the 256:
Ho = 256
After the next, the character "la" one adds the 258:
la = 258
They met the characters "Ho" and "la".
256,258 = Hola
The values are replaced by the characters