SLZ compression

SLZ is a simple LZ compression format made for Mega Drive homebrew, suitable for compression of generic data that isn't time critical (e.g. assets decompressed at load time). All it needs is a buffer in RAM large enough to hold the uncompressed data. It originated in Project MD and has been used in games like Tanglewood.

If you just want to use SLZ and don't care about writing it yourself then skip ahead to the tools section and ignore the rest. This page is here to describe how the format works.

Tools and decompression code

You can download the SLZ source code from here. It includes a compression tool as well as assembly and C routines for decompressing the data. Everything is under the zlib license (which means you don't even need to include a credit).

The tool to make SLZ files, as well as the code to decompress them on the Mega Drive, can also be found in the mdtools GitHub repository.

How LZ compression works

SLZ is a form of LZ compression, so first a quick explanation of how that works.

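The basic idea is that chunks of data may be repeated later in the file, so instead of storing them again we can point back at the earlier copy. The compressed stream consists of multiple tokens. There are two kinds of tokens:

Literals are single bytes stored as-is. When we find a literal, we just copy that byte to the output.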

Strings consist of a distance and a length. The distance is how far back we need to go from the current position to find the sequence of bytes (from the already decompressed data). The length is how many bytes long the sequence is. When we find a string, we copy those bytes over again.
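
To make the string case concrete, here's a minimal sketch in C of the copy step. The name lz_copy_string and its parameters are made up for this example (they're not part of the SLZ tools), and it assumes the distance never reaches before the start of the buffer. It copies one byte at a time on purpose: when the distance is shorter than the length, the string overlaps the bytes it's producing, and a byte-by-byte copy handles that naturally.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `length` bytes starting `distance` bytes
   behind the current write position `pos`, appending them at `pos`.
   Returns the new write position. Assumes distance <= pos. */
size_t lz_copy_string(uint8_t *out, size_t pos, size_t distance, size_t length)
{
    while (length-- > 0) {
        /* The source may be a byte this same loop wrote a moment ago */
        out[pos] = out[pos - distance];
        pos++;
    }
    return pos;
}
```

For example, with "abc" already in the buffer, a string with distance 3 and length 6 expands it to "abcabcabc".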

How SLZ stores data

The first two bytes (big endian) hold the uncompressed length, which tells the decompressor how much data it needs to produce. Being a 16-bit value, it limits a single SLZ blob to 65,535 bytes of uncompressed data.
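
As a small sketch, reading that header in C could look like this. The function name slz_read_size16 is an assumption for illustration, not something taken from the SLZ source.

```c
#include <stdint.h>

/* Hypothetical helper: read the 16-bit big-endian uncompressed length
   from the first two bytes of an SLZ blob. */
uint16_t slz_read_size16(const uint8_t *in)
{
    return (uint16_t)((in[0] << 8) | in[1]);
}
```

Assembling the value from individual bytes keeps the code independent of the host CPU's endianness and alignment rules.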

Then comes the compressed stream (keep doing this until you reach the end):

  1. Grab a byte. This gives you the types for the next eight tokens (one bit per token).
  2. Check the next bit (note: go from bit 7 to bit 0, not the other way)
    • If it's 0, grab a byte and store it as-is
    • If it's 1, it's a string (see below)
  3. When you run out of bits grab another byte and keep going
  4. Once you've reached the uncompressed length, you're done
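
The bit order in step 2 is easy to get backwards, so here's a tiny C sketch of it. The helper slz_token_is_string is hypothetical (it isn't from the SLZ tools); it just tells you the type of the Nth token covered by a flag byte.

```c
#include <stdint.h>

/* Hypothetical helper: `index` 0 is the first token after the flag
   byte and maps to bit 7; `index` 7 is the last and maps to bit 0.
   Returns 1 for a string, 0 for a byte stored as-is. */
int slz_token_is_string(uint8_t flags, int index)
{
    return (flags >> (7 - index)) & 1;
}
```

So a flag byte of 0x10 means three bytes stored as-is, then a string, then four more bytes stored as-is.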

When you run into a string:

  1. Grab two bytes (forming a 16-bit word, big endian)
    • Bits 3-0 are the length (add 3 to this value, so strings are 3 to 18 bytes long)
    • Bits 15-4 are the distance (add 3 to this value, so it ranges from 3 to 4098)
  2. Go back distance bytes in the (so far) decompressed data
  3. Take length bytes and copy them to the end of the decompressed stream
  4. Resume decoding
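
Putting the steps above together, a decompressor could be sketched in C like this. This is not the routine that ships with the SLZ tools: the name slz_decompress_sketch is made up, and it assumes the input is a well-formed SLZ stream and that the output buffer can hold the uncompressed length. The check that stops a string at the uncompressed length is an extra safety guard added for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder sketch. Returns the number of bytes written. */
size_t slz_decompress_sketch(const uint8_t *in, uint8_t *out)
{
    /* Uncompressed length: 16 bits, big endian */
    size_t size = ((size_t)in[0] << 8) | in[1];
    in += 2;

    size_t pos = 0;      /* bytes written so far */
    uint8_t flags = 0;   /* token types, consumed from bit 7 down */
    int bits_left = 0;   /* unread bits remaining in `flags` */

    while (pos < size) {
        /* Out of bits? Grab the next flag byte */
        if (bits_left == 0) {
            flags = *in++;
            bits_left = 8;
        }

        if (flags & 0x80) {
            /* String: 16-bit big-endian word, distance in bits 15-4
               and length in bits 3-0, both stored minus 3 */
            uint16_t word = (uint16_t)((in[0] << 8) | in[1]);
            in += 2;
            size_t length = (size_t)(word & 0x0F) + 3;
            size_t distance = (size_t)(word >> 4) + 3;

            /* Byte-by-byte so overlapping strings work */
            while (length-- > 0 && pos < size) {
                out[pos] = out[pos - distance];
                pos++;
            }
        } else {
            /* Byte stored as-is */
            out[pos++] = *in++;
        }

        /* Move the next token's bit into bit 7 */
        flags = (uint8_t)(flags << 1);
        bits_left--;
    }

    return pos;
}
```

As a quick check, the eight bytes 00 09 10 61 62 63 00 03 decode to "abcabcabc": a length of 9, a flag byte of 0x10 (three bytes as-is, then a string), the bytes "abc", and a string word of 0x0003 (distance 0+3, length 3+3).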

SLZ24 variant

There's also a variant called SLZ24. It's pretty much identical except that the uncompressed length value is three bytes instead of two (giving a maximum of 16 MB). This variant is intended for use on the Mega CD and such, where there's more than 64 KB of RAM to work with.
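
In code, the only change would be how the header is read. Here's a sketch in C, assuming the three bytes are big endian like the two-byte header (the text above doesn't spell that out). The name slz24_read_size is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper: read a 24-bit uncompressed length from the
   first three bytes of an SLZ24 blob. Big-endian order is an
   assumption carried over from the regular SLZ header. */
uint32_t slz24_read_size(const uint8_t *in)
{
    return ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | (uint32_t)in[2];
}
```

The compressed stream would then start at the fourth byte instead of the third.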

Remark: there's no harm using it on the base Mega Drive, it's just pointless.

What does SLZ stand for?

Who knows?