SCUMM Revisited - Thirst is Nothing, Image is Everything


Edition 1.2

This very first SCUMM internals article deals with how to extract those wonderful images from the SCUMM games. SCUMM Revisited already masters this task, but I'm sure others would like to know how to do it too. So, here goes.

I will have to assume a few things in this article to keep it from getting too long. First of all, I'll assume you know how the main resource files of SCUMM are structured. If you don't, read some of the SCUMM Encyclopedia found in the SCUMM Revisited help file, and compare what it says to what SCUMM Revisited displays. Also, I'll assume you can work out for yourself how to write source code for what I'm describing. I won't show how to get the pixels onto the screen, or how to read from the file, simply because that differs from language to language and platform to platform. However, I won't assume you know anything about compression schemes, so I'll go through the compression very gently, with some side remarks on why SCUMM's method achieves compression. Hopefully this also helps you understand how things work.

This article will focus on how to decompress the 256 color images. The EGA images are not covered. Now, let's get to it.

Introduction to SCUMM Images
SCUMM has evolved through the years and thus, so have the image formats. The small table below shows where the info needed to display images is stored in each of the major SCUMM formats. An example of an Old game is The Secret of Monkey Island VGA (not the enhanced version). An example of a New game is Monkey Island 2. CMI uses its very own block names. The compression for all these blocks (in 256 color games) is the same, only the positions of the various values inside the blocks vary.

If a slash is used, the block before the slash is the block used for room/background images, and the block after the slash is the block for object images:

Table 1: SCUMM Image Information Blocks

Format | Image data | Info | Palette
Old | BM / OI | HD / OC | PA
New | RMIM / OBIM | RMHD / IMHD | CLUT / APAL
CMI | IMAG | RMHD / IMHD | APAL

Now, the Image Data blocks are where the actual compressed pixel data of the image is stored. The Info blocks store the width and height of the image. And the Palette block stores the palette to be used.

I won't go into how the palette blocks work, other than saying that they store the RGB values of each palette index. They're documented formally in the SCUMM Format Encyclopedia.

The Image info blocks vary between each SCUMM format - and even between games in the same format. First, let's look at the HD and RMHD blocks, used for background images.

There is only one format used for the HD block. This block stores the background width as a Little Endian (LE) word at byte offset 6 of the block (remember, the first four bytes in an old format block store the size of the block, and the next two the block type). The height is stored right after that, at offset 8, also as an LE word. That's all we need - if you want to know, the next LE word, at offset 10, stores the number of objects in the room. Thus, the HD block looks like this:

Table 2: HD Block

Offset | Description | Format
0 | Block size | dword - LE
4 | Block Name | 2 * char = "HD"
6 | Width | word - LE
8 | Height | word - LE
10 | Number of Objects in Room | word - LE
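As a minimal sketch of reading Table 2, here is a Python function, assuming the whole HD block (starting at its size field) is already in a bytes object. The name parse_hd and the returned dictionary are hypothetical, chosen only for illustration.

```python
import struct

def parse_hd(block: bytes) -> dict:
    """Read an old-format HD block laid out as in Table 2.

    block is assumed to start at the block's size field (offset 0).
    """
    size, name = struct.unpack_from("<I2s", block, 0)
    if name != b"HD":
        raise ValueError(f"expected an HD block, got {name!r}")
    # Width, height and object count: three LE words from offset 6.
    width, height, num_objects = struct.unpack_from("<3H", block, 6)
    return {"size": size, "width": width, "height": height,
            "num_objects": num_objects}
```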

The RMHD block is a bit more inconsistent. For new format games prior to Full Throttle, the data is stored exactly the same as for the old format games, except that the first word is now at offset 8 rather than 6 (due to the new block header using 4 bytes rather than 2 for storing the block name). Thus, we have this format:

Table 3: Pre Full Throttle RMHD Block

Offset | Description | Format
0 | Block Name | 4 * char = "RMHD"
4 | Block Size | dword - BE
8 | Width | word - LE
10 | Height | word - LE
12 | Number of Objects in Room | word - LE

With Full Throttle and The Dig, a new RMHD format enters. This format introduces what SCUMM calls the MMucus version (decimal 730 for those two games, meaning version 7.3.0). The MMucus version is stored in the first dword (offset 8), still as LE. Then come, in the usual order and format, the width (offset 12), the height (offset 14) and the number of objects (offset 16) - all LE words.

Table 4: Full Throttle / The Dig RMHD Block

Offset | Description | Format
0 | Block Name | 4 * char = "RMHD"
4 | Block Size | dword - BE
8 | MMucus Version | dword - LE = 730 (0x2DA)
12 | Width | word - LE
14 | Height | word - LE
16 | Number of Objects in Room | word - LE

In CMI the RMHD format is revised - again. Many of CMI's blocks were revised to use dwords rather than words for storing values. We still have the dword MMucus version at offset 8. For CMI this version number is 800 - version 8.0.0. Notice how the MMucus version follows the version of SCUMM used - Full Throttle is SCUMM version 7.3.4 (at least my copy is; yours may have a different release number, say, 7.3.2). CMI is SCUMM version 8.1.0.

After the MMucus version number, we have three LE dwords, containing the width (offset 12), height (offset 16) and number of objects (offset 20). Once again, we don't need the number of objects, but it's nice to know, isn't it? :) The version 8.0.0 header stores a few extra values, including the number of Z-Buffers for the background image.

Table 5: CMI RMHD Block

Offset | Description | Format
0 | Block Name | 4 * char = "RMHD"
4 | Block Size | dword - BE
8 | MMucus Version | dword - LE = 800 (0x320)
12 | Width | dword - LE
16 | Height | dword - LE
20 | Number of Objects in Room | dword - LE
... | ... | ...

Can you see an easy way to recognize each format of the RMHD, to know where to read the proper values? Right. We read the first dword, and check if it's 730 or 800 or neither of the two, and read the appropriate values depending on what value we found. There's very little chance that the pre-Full Throttle games will have a first dword of one of those values - it would require a room with a width of 730 or 800 and a height of 0.
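That recognition step can be sketched in Python like this, assuming the whole RMHD block (starting at its 4-char name) is in memory. The function name parse_rmhd is hypothetical; the offsets are the ones from Tables 3 to 5.

```python
import struct

def parse_rmhd(block: bytes) -> dict:
    """Return width, height and object count from any 256-color RMHD block.

    block starts at the 4-char name; the BE size follows it (Tables 3-5).
    """
    if block[:4] != b"RMHD":
        raise ValueError(f"expected an RMHD block, got {block[:4]!r}")
    version = struct.unpack_from("<I", block, 8)[0]
    if version == 730:            # Full Throttle / The Dig: LE words follow
        fields = struct.unpack_from("<3H", block, 12)
    elif version == 800:          # CMI: LE dwords follow
        fields = struct.unpack_from("<3I", block, 12)
    else:                         # pre-Full Throttle: no version, words at offset 8
        version = None
        fields = struct.unpack_from("<3H", block, 8)
    width, height, num_objects = fields
    return {"version": version, "width": width, "height": height,
            "num_objects": num_objects}
```

For a pre-Full Throttle block, the dword read at offset 8 is really width | (height << 16), which is why it can only collide with 730 or 800 when the height is 0.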

OK! Now we know what width and height the background image is, no matter what game we're dealing with! Great, huh? Nah, we'll get to the actual image decoding soon, I promise :) But first, we'll have a very quick look at the IMHD and OC structures used to determine the width and height of object images.

The OC block is the same for all the early VGA games. It's a bit peculiar: The byte at offset 9 stores the initial X position of the object - divided by 8! The byte at offset 10 stores the initial Y position of the object, once again divided by 8. The byte at offset 11 stores the width - divided by 8. And the byte at offset 17 stores the height, which is not divided by 8, but which may have some unknown value in the lower three bits, which you need to remove to actually get the real height (i.e., AND 0xF8). I must admit, I'm not sure what those three bits are used for in the height, as I haven't studied the old object formats that much yet. The division by 8 in the X position and width is most likely due to the Amiga (where the games were first developed - the game files were easily converted to PC), whose sprites (as far as I recall, or maybe it was bobs) could only be placed on an X position divisible by 8, and have a width divisible by 8. Enough about that. The fact remains, Width, X position and Y position must be multiplied by 8. Height must be AND'ed with 0xF8.

OK, it didn't go as quickly as I hoped, hope you'll bear with me. Now, how do we figure out what OC corresponds to which OI? Simple. The first LE word (i.e., offset 6) of both OC and OI holds an object identifier. If an OC's object ID is the same as an OI's, they describe the same object. OK, the stuff we need for OC looks like this:

Table 6: OC Block

Offset | Description | Format
0 | Block size | dword - LE
4 | Block Name | 2 * char = "OC"
6 | Object ID | word - LE
... | ... | ...
9 | X Position / 8 | byte
10 | Y Position / 8 | byte
11 | Width / 8 | byte
... | ... | ...
17 | Height, with an unknown value in the low 3 bits (AND with 0xF8) | byte
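A small Python sketch of turning the Table 6 fields into real values (multiply by 8, mask the height), assuming the OC block is in a bytes object starting at its size field. parse_oc is a hypothetical name.

```python
import struct

def parse_oc(block: bytes) -> dict:
    """Read the Table 6 fields from an old-format OC block."""
    _size, name, obj_id = struct.unpack_from("<I2sH", block, 0)
    if name != b"OC":
        raise ValueError(f"expected an OC block, got {name!r}")
    return {
        "obj_id": obj_id,              # same ID as the matching OI block
        "x": block[9] * 8,             # stored divided by 8
        "y": block[10] * 8,            # stored divided by 8
        "width": block[11] * 8,        # stored divided by 8
        "height": block[17] & 0xF8,    # drop the unknown low 3 bits
    }
```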

Now, let's move on to the new format. So much less pain involved with those. And yet...

All object images in new games (including CMI) use an IMHD block for description of their X position, Y position, width, height etc. I'll just show the different formats for IMHD between games in some tables:

Table 7: Pre-Full Throttle IMHD Block

Offset | Description | Format
0 | Block Name | 4 * char = "IMHD"
4 | Block size | dword - BE
8 | Some identifier | word - LE
10 | Number of Images | word - LE
12 | Z-Buffers per Image | dword - LE
16 | X Position | word - LE
18 | Y Position | word - LE
20 | Width | word - LE
22 | Height | word - LE

Table 8: Full Throttle / The Dig IMHD Block

Offset | Description | Format
0 | Block Name | 4 * char = "IMHD"
4 | Block size | dword - BE
8 | MMucus version | dword - LE = 730
12 | Some identifier | word - LE
14 | No. of Images | word - LE
16 | X Position | word - LE
18 | Y Position | word - LE
20 | Width | word - LE
22 | Height | word - LE

Table 9: CMI IMHD Block

Offset | Description | Format
0 | Block Name | 4 * char = "IMHD"
4 | Block size | dword - BE
8 | Internal Name | null-terminated string
48 | MMucus version | dword - LE = 801
52 | Number of Images | dword - LE
56 | X Position | dword - LE
60 | Y Position | dword - LE
64 | Width | dword - LE
68 | Height | dword - LE

Notice that the MMucus version isn't stored at the same offset in Full Throttle/The Dig and CMI. Shouldn't keep you from using it for version recognition, though ;)
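One way to use the MMucus version for recognition is sketched below in Python. It is a heuristic under stated assumptions: the dword at offset 8 is checked for 730 first, then (if the block is long enough) the dword at offset 48 for 801, and anything else is treated as the pre-Full Throttle layout. parse_imhd is a hypothetical name.

```python
import struct

def parse_imhd(block: bytes) -> dict:
    """Read image count, position and size from an IMHD block (Tables 7-9)."""
    if block[:4] != b"IMHD":
        raise ValueError(f"expected an IMHD block, got {block[:4]!r}")
    if struct.unpack_from("<I", block, 8)[0] == 730:
        # Table 8 (Full Throttle / The Dig): version dword, then LE words
        version = 730
        num_images = struct.unpack_from("<H", block, 14)[0]
        x, y, width, height = struct.unpack_from("<4H", block, 16)
    elif len(block) >= 72 and struct.unpack_from("<I", block, 48)[0] == 801:
        # Table 9 (CMI): name area from offset 8 to 48, then LE dwords
        version = 801
        num_images, x, y, width, height = struct.unpack_from("<5I", block, 52)
    else:
        # Table 7 (pre-Full Throttle): LE words; Z-buffer dword at 12 skipped
        version = None
        num_images = struct.unpack_from("<H", block, 10)[0]
        x, y, width, height = struct.unpack_from("<4H", block, 16)
    return {"version": version, "num_images": num_images,
            "x": x, "y": y, "width": width, "height": height}
```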

That's all! We now know the width and height of all images in 256 color games, as well as the initial X/Y Position of objects! Cool... But what about the actual image then? Let's look at it...

The Strip Issue
All the way up through the SCUMM versions, the actual image data and compression methods remain the same. Actually, for SCUMM Revisited, only CMI was disassembled to retrieve the decompression for all the earlier games (except for Last Crusade, more on that later).

The most basic aspect of SCUMM image compression is the division of the image into "strips". What this means is, rather than storing the image horizontally or vertically line by line, some of SCUMM's compression methods store 8 pixels of the first line, then 8 pixels of the second, etc. Look at the illustration (the corner of a well known background):

Illustration 1: Image Strips

The yellow lines mark where a strip ends and a new one begins. I've numbered the pixels of the first strip as an example of the order described above. The third strip shows the other order that SCUMM can use - simple vertical lines. (Please note that I chose the strips at random - strip 1 in the actual image above is not necessarily compressed with a horizontal method, and strip 3 not necessarily with a vertical one.)

Why this weird way of storing the images? Because it allows for better compression with most methods. Look at the illustration again. If, for example, we had a compression method that simply stored repeated pixels as a color and a number of pixels where that color is repeated (known as Run Length Encoding (RLE) in compression terms), using simple horizontal lines would only allow us to compress 18 orange pixels, because after the 18th pixel the color changes to red (I marked that pixel with a green box). If, however, we store the image as 8 pixels of the first line, 8 of the second, 8 of the third etc., we can compress 54 pixels before the color changes (the other green box). Simply put, if we have large areas of the same color in an image, the strip method is likely to give better compression results.
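To put numbers on that argument, here is a small Python sketch that counts how many runs a plain RLE coder would need for the same pixels in line-by-line order and in horizontal strip order. The helper names and the two-color test image are made up for illustration; this is not SCUMM's actual compressor.

```python
def count_runs(pixels) -> int:
    """Number of (color, length) pairs a simple RLE would emit."""
    runs = 0
    previous = object()          # sentinel that never equals a pixel
    for p in pixels:
        if p != previous:
            runs += 1
            previous = p
    return runs

def row_order(image):
    """Plain line-by-line order; image is a list of rows."""
    return [p for row in image for p in row]

def strip_order(image, strip_width=8):
    """Horizontal strip order: 8 pixels of each line, one strip at a time."""
    height, width = len(image), len(image[0])
    return [image[y][x]
            for sx in range(0, width, strip_width)
            for y in range(height)
            for x in range(sx, min(sx + strip_width, width))]
```

For an image whose left 8 columns are one color and right 8 columns another, row order needs a new run at every color change along each line, while strip order needs only one run per strip.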

Also, as Ludvig Strigeus pointed out to me a while ago, another reason for the strip method is that it allows for a smart way of scrolling, without keeping the entire background in memory. Usually SCUMM backgrounds scroll 8 pixels at a time, so to scroll a background to the left, we just move all strips one strip-width left, discarding the first one, and decompress the new strip to be displayed at the right.

Each strip in a SCUMM compressed image can have its own compression method. That way, the compression routine of the SCUMM compressor can decide on what method to use for each strip, depending on various qualities of that strip. For example, if it has a large area with the same color, it'll use a compression method that uses the horizontal 8 pixel method for storing that strip. If it has long vertical lines of the same color, it'll use a compression method that uses the vertical method of storage.

So, if I later on say "The image is rendered horizontally", it means that we use the horizontal 8 pixel method when we draw each pixel. If I say, "The image is rendered vertically", it means that we use simple vertical lines.
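The two rendering directions can be sketched as a single placement function in Python. It assumes the decoded palette indices of one strip are already in a flat list in stream order, and that the bitmap is a list of rows; place_strip and its parameters are hypothetical.

```python
def place_strip(pixels, height, vertical, bitmap, strip_index, transparent=None):
    """Write one 8-pixel-wide strip into bitmap at column strip_index * 8.

    pixels: palette indices in the order they were decoded.
    transparent: palette index to skip, or None to draw everything.
    """
    x0 = strip_index * 8
    for i, color in enumerate(pixels[: 8 * height]):
        if vertical:
            x, y = divmod(i, height)   # a whole column, then the next column
        else:
            y, x = divmod(i, 8)        # 8 pixels of a line, then the next line
        if color != transparent:
            bitmap[y][x0 + x] = color
```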

OK, now you should have a good understanding of the strips issue. Let's look at how the strips are stored.

The Image Data Blocks
In the old games, the BM block stores the image data of backgrounds, and the OI block stores the image data of objects. The layout of the two is almost exactly the same. These old image blocks can only contain one image each.

The first piece of actual data (byte offset 6) in the BM block is a dword whose purpose I don't remember. In any event, it's not needed to decompress the image. In the OI block this dword is stored at offset 8, and offset 6 holds the Object ID (the same ID as in the matching OC block, as described above).

Then, starting at offset 10 (or offset 12 for OI), follows a row of dwords (Little Endian), each containing an offset into the block (relative to offset 6 for BM and offset 8 for OI). What do they point to? A strip. Thus, to calculate the number of offsets, you simply take the image width and divide it by 8 (as each strip is 8 pixels wide). Before we look at what's in each strip definition, we'll go through the other image data blocks, as the actual contents of the strip definitions are the same all the way up to CMI.

Now, the RMIM and OBIM blocks. You might remember that the CMI RMHD block stores the number of Z-buffers for the background, but the older games' RMHD blocks don't. Where is that number stored in those games, then? In an RMIH block inside the RMIM block. That's the only bit of info stored in RMIH blocks, so we can ignore those for image decompression. In OBIM blocks, the RMIH block is replaced with an IMHD block. This is where we find the object image dimensions etc., which we've already covered. After these header blocks come a number of IMxx blocks. Room image blocks contain only one: IM00, the background image. IM01, IM02 etc. are inside the Object Image blocks and are used... for object images. Inside the IMxx blocks, we find an SMAP block and 0 or more ZPxx blocks. The ZPxx blocks are used for storing the Z-Buffers (so, if the header block said there were 0 of them, there'll be no ZPxx blocks here). Again, we don't need those for image decompression.

The SMAP block is what's important. It's exactly the same as the BM and OI blocks of the old games, except that the offsets to each strip start at byte offset 8, rather than 10.
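The BM, OI and SMAP layouts differ only in where the offset list starts and which position the offsets are relative to. Here is a Python sketch covering all three, with one assumption the text does not spell out: SMAP offsets are taken as relative to byte 0 of the SMAP block (its header). strip_offsets and STRIP_TABLE_LAYOUT are hypothetical names.

```python
import struct

# kind -> (position of the first strip offset, position the offsets are relative to)
STRIP_TABLE_LAYOUT = {
    "BM": (10, 6),     # old format background image
    "OI": (12, 8),     # old format object image
    "SMAP": (8, 0),    # new format; base 0 is an assumption
}

def strip_offsets(block: bytes, kind: str, width: int) -> list:
    """Position of each strip inside block (one strip per 8 pixels of width)."""
    first, base = STRIP_TABLE_LAYOUT[kind]
    count = width // 8
    relative = struct.unpack_from(f"<{count}I", block, first)
    return [base + offset for offset in relative]
```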

With CMI, things get a bit confusing. Here, the root block of the image data is called IMAG, rather than RMIM. Less confusingly, the same IMAG structure is used for both room and object images. Inside the IMAG block, we find a WRAP block. As the name suggests, this block just wraps around its contents, which are an OFFS block and a number of SMAP blocks (one per image).

The OFFS block contains a list of offsets (dwords, LE, starting at byte offset 8), relative to itself (i.e., byte offset 0 of the OFFS block), to the SMAP blocks. This may seem a bit redundant, as there are other ways to find the SMAP blocks, but oh well.

Where did the ZPxx blocks of the past go? Well, inside the SMAP block, you'll find two blocks, a BSTR block and a ZPLN block. The ZPLN block contains, once again, a WRAP block, which contains an OFFS block and a number of ZPxx blocks. There they were. We don't need them.

The BSTR block is more interesting. Inside it we find, guess what? A WRAP block! And inside that? An OFFS block! Wow! But this OFFS block is different from the others. We can actually use it for something... This OFFS block is structured the same as the others. Starting at byte offset 8, we have a list of Little Endian dword offsets, relative to the start of the OFFS block. Those are the offsets to the strips! Phew! We're there!

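So, this massive hierarchy inside the IMAG block goes:

IMAG
  WRAP
    OFFS (offsets to the SMAP blocks)
    SMAP (one per image)
      BSTR
        WRAP
          OFFS (offsets to the strips)
          strips
      ZPLN
        WRAP
          OFFS
          ZPxx blocks (Z-buffers)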

There, that concludes the hunt for strip offsets.
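The hunt can be sketched in Python as below. Assumptions, not stated in the text: the BE size after each 4-char block name includes the 8-byte header, each OFFS block holds nothing but offsets (so its entry count is (size - 8) / 4), and OFFS is the first child of each WRAP. All function names are hypothetical.

```python
import struct

def find_child(data: bytes, parent: int, name: bytes) -> int:
    """Position of the first child block called name inside the block at parent."""
    end = parent + struct.unpack_from(">I", data, parent + 4)[0]
    pos = parent + 8                       # skip the parent's own header
    while pos + 8 <= end:
        tag, size = struct.unpack_from(">4sI", data, pos)
        if tag == name:
            return pos
        if size < 8:
            break                          # corrupt size; stop instead of looping
        pos += size
    raise KeyError(name)

def read_offs(data: bytes, offs: int) -> list:
    """Absolute positions from an OFFS block (LE dwords at offset 8, relative to offs)."""
    size = struct.unpack_from(">I", data, offs + 4)[0]
    count = (size - 8) // 4                # assumes OFFS holds only offsets
    return [offs + rel for rel in struct.unpack_from(f"<{count}I", data, offs + 8)]

def cmi_strip_positions(data: bytes, imag: int, image_index: int = 0) -> list:
    """Follow IMAG > WRAP > OFFS > SMAP > BSTR > WRAP > OFFS to the strips."""
    smaps = read_offs(data, find_child(data, find_child(data, imag, b"WRAP"), b"OFFS"))
    bstr = find_child(data, smaps[image_index], b"BSTR")
    strip_offs = find_child(data, find_child(data, bstr, b"WRAP"), b"OFFS")
    return read_offs(data, strip_offs)
```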

What's in a Strip?
OK, so we find the strip offsets, we follow them one by one, but what do we do when we get to a strip? OK, here's the information on what is actually stored in the strip definitions.

The first byte in the strip data is the compression ID. This is a number between 1 and 128 (0x80). We'll get to that in a second. The next byte is the color of the first pixel in the strip, and also the initial palette index. I.e., the palette index we continue drawing with until we're told otherwise. After these two bytes follow the actual compressed data.

Tiny Bits of Decompression
The compressed data in SCUMM image strips is based on bit streams. That means, we read a few bits, look up in a command dictionary what we should do with them, and then we do just that.

More specific, you say? Well, we read the first bit of the first byte, bit 0. Where's bit 0? At the right end of the byte. The bit order goes like this, in case you forgot:

7 6 5 4 3 2 1 0

If the byte equals 7, which can be written in binary as 00000111, the first bit we read is 1. Then we read the second one, 1 too, the third one is 1 too, and then 0, 0, 0, 0, 0. Then we go to the next byte, and so on.

This goes for (almost) all the SCUMM image decompression methods. The difference between them is what the command dictionary tells us the bits mean.

Now, there are two different ways of reading the bits. Either we read one bit at a time, or we read a number of bits at once (for example, when reading a palette index). In the text below, I've chosen to show bits read one at a time in the order they're read. When reading more than one bit, I show them in the order they're stored in the file. I.e., if a byte in the file looks like this:

11101011

... and we read four bits, one at a time, I show the bits as:

1101

... but if we read them all at once, I show them as:

1011

This may seem a bit strange at first, but the reason is that when we read the bits one at a time, we look at each bit in the order we read them. When we read several bits at once, we're interested in an integer value, which needs the bits in the same order as they're stored in the file (for example, reading all 8 bits of the byte above as a value, obviously doesn't give a value of 11010111, but rather 11101011, just as it's stored). Whenever we need an integer value, rather than a row of individually read bits, I explicitly state that we need the value.
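Both ways of reading can be sketched with one small Python class: bits come out of each byte starting at bit 0, and reading several bits "at once" makes the first bit read the least significant bit of the value. BitReader and its methods are hypothetical names.

```python
class BitReader:
    """Reads bits from a byte string, bit 0 (rightmost) of each byte first."""

    def __init__(self, data: bytes, pos: int = 0):
        self.data = data
        self.pos = pos        # current byte
        self.bit = 0          # next bit to read within that byte

    def read_bit(self) -> int:
        value = (self.data[self.pos] >> self.bit) & 1
        self.bit += 1
        if self.bit == 8:     # byte used up: continue with the next one
            self.bit = 0
            self.pos += 1
        return value

    def read_value(self, count: int) -> int:
        """Read count bits as an integer; the first bit read is the lowest."""
        value = 0
        for i in range(count):
            value |= self.read_bit() << i
        return value
```

With the byte 11101011 from the example, four read_bit() calls give 1, 1, 0, 1, while read_value(4) gives 1011 (11).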

Decompression Methods
As mentioned before, the first byte we read from a strip is the compression ID - a number between 1 and 128 (0x80). Now, there aren't 128 different decompression methods. Rather, we have only 2 different methods for compression (which we'll refer to as 1st method and 2nd method) - with various variations - as well as an uncompressed method. These methods each span over a number of IDs, as shown in the following table:

Table 10: Compression Methods

IDs | Method | Rendering Direction | Transparent | Param Subtraction | Remarks
0x01 | Uncompressed | Horizontal | No | - | -
0x0E .. 0x12 | 1st method | Vertical | No | 0x0A | -
0x18 .. 0x1C | 1st method | Horizontal | No | 0x14 | -
0x22 .. 0x26 | 1st method | Vertical | Yes | 0x1E | -
0x2C .. 0x30 | 1st method | Horizontal | Yes | 0x28 | -
0x40 .. 0x44 | 2nd method | Horizontal | No | 0x3C | -
0x54 .. 0x58 | 2nd method | Horizontal | Yes | 0x50 | -
0x68 .. 0x6C | 2nd method | Horizontal | No | 0x64 | Same as 0x40 .. 0x44
0x7C .. 0x80 | 2nd method | Horizontal | Yes | 0x78 | Same as 0x54 .. 0x58

Some explanation may be needed for this table. The first column (IDs) shows the range of values for each method. Note that 0x68..0x6C uses exactly the same decompression routine as 0x40..0x44. The same goes for 0x7C..0x80, which uses the same decompression routine as 0x54..0x58.

The second column (Method) shows the decompression method used for the range. All three will be described below.

The third column (Rendering direction) shows whether the strip is rendered horizontally (render 8 pixels in first row, then 8 pixels in next row, then 8 pixels in next etc.) or vertically (just render an entire column, then the next column etc.).

The fourth column (Transparent) shows whether the strip is rendered transparently. If it is, every time the current palette index equals the transparent palette index for the room (which is specified as an LE dword at offset 8 in the TRNS block) nothing is drawn, in order to let whatever's behind the image "shine through".

The fifth column (Param Subtraction) mentions a "parameter", which we haven't dealt with at all yet. This is the reason why each compression method has several values assigned. By subtracting the Param Subtraction value from the ID, we get a number between 4 and 8. This is the number of bits used to represent a palette index value when reading it from the bit stream. I.e., if the parameter is 4, whenever we need to read a new palette index from the stream, it'll be between 0 and 15 (because those are the numbers we can represent with 4 bits). If it's 5, we can get a palette index between 0 and 31. If it's 8, we just read an entire byte (8 bits). Etc. In this way, the compression saves a number of bits for each palette index if it doesn't need to access the entire palette. To make this work, the most often used colors are stored first in the palette.
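A Python sketch of classifying a strip's compression ID with the ranges of Table 10. Two details are assumptions taken from ScummVM's decoder rather than from this text: 0x50 is used as the subtraction for 0x54..0x58 (so the parameter lands between 4 and 8), and 0x68..0x6C is treated as opaque like 0x40..0x44 while 0x7C..0x80 is transparent like 0x54..0x58. StripMethod and classify are hypothetical names.

```python
from typing import NamedTuple, Optional

class StripMethod(NamedTuple):
    method: str          # "raw", "method1" or "method2"
    vertical: bool       # True: column by column; False: 8 pixels per line
    transparent: bool
    index_bits: int      # bits per palette index read from the stream

# (first ID, last ID, method, vertical, transparent, param subtraction)
_RANGES = [
    (0x0E, 0x12, "method1", True,  False, 0x0A),
    (0x18, 0x1C, "method1", False, False, 0x14),
    (0x22, 0x26, "method1", True,  True,  0x1E),
    (0x2C, 0x30, "method1", False, True,  0x28),
    (0x40, 0x44, "method2", False, False, 0x3C),
    (0x54, 0x58, "method2", False, True,  0x50),
    (0x68, 0x6C, "method2", False, False, 0x64),
    (0x7C, 0x80, "method2", False, True,  0x78),
]

def classify(compression_id: int) -> Optional[StripMethod]:
    if compression_id == 0x01:
        return StripMethod("raw", False, False, 8)   # one byte per pixel
    for first, last, method, vertical, transparent, sub in _RANGES:
        if first <= compression_id <= last:
            return StripMethod(method, vertical, transparent, compression_id - sub)
    return None   # e.g. the Last Crusade IDs 0x02..0x0D, not covered here
```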

As for the actual methods:

Uncompressed
The uncompressed method simply stores the palette index for each pixel in a byte. I.e., if the start of a strip looks like this:

01 1A 2B 30 03...

... the first byte tells us that the image is not compressed. The first byte after that is the palette index of the first pixel (at coords 0,0), the second one is the palette index of the second pixel (1,0), the ninth one is the palette index of the ninth pixel (which, with the horizontal rendering that the uncompressed format uses would be at coords (0,1)) etc.
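A minimal Python sketch of the uncompressed case, assuming strip holds the strip data starting at its compression ID and height is the image height. decode_raw is a hypothetical name.

```python
def decode_raw(strip: bytes, height: int) -> list:
    """Decode an uncompressed strip (ID 0x01) into rows of 8 palette indices."""
    if strip[0] != 0x01:
        raise ValueError("not an uncompressed strip")
    pixels = strip[1 : 1 + 8 * height]          # one byte per pixel
    # Horizontal rendering: 8 pixels of line 0, then 8 of line 1, etc.
    return [list(pixels[y * 8 : y * 8 + 8]) for y in range(height)]
```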

1st method
This compression method uses a subtraction variable, which will be discussed more below. The value of this variable when starting on a strip is 1.

The 1st method recognizes bits as follows (Note: the order of the bits here is the order you read them, i.e., "110" means that the actual representation in the file is 011, but as you read the right-most bit first, the order you read the bits is 110):

0: Draw next pixel with current palette index.

10: Read a new palette index from the bit stream, i.e., read the number of bits that the parameter specifies as a value (see the Tiny Bits of Decompression chapter). Set the subtraction variable to 1, and draw the next pixel.

110: Subtract the subtraction variable from the palette index, and draw the next pixel.

111: Negate the subtraction variable (i.e., if it's 1, change it to -1, if it's -1, change it to 1). Subtract it from the palette index, and draw the next pixel.

For example, if the start of a strip looks like this:

11 05 80 FC

The first byte tells us we're dealing with the 1st compression method, in a vertical rendering variation, not transparent, and with a palette index size of 7 (0x11 - 0x0A = 7). The second byte gives us the initial palette index, and we draw the first pixel in the strip with that color. From the third byte, we start reading as a bit stream:

0x80 = 10000000

The first bit is 0. That means we draw another pixel with the current palette index: 5. The next 6 bits are the same, so we draw a pixel for each of them, still with the color 5. Now we find a 1. There are three codes that start with a 1, so we need the next bit to find out more. We've run out of bits in the first byte, so we read the next byte:

0xFC = 11111100

The next bit is 0. So, we have the code 10, which means we should read a new palette index. From the first byte, we found that the parameter was 7, i.e., a palette index in this strip is 7 bits long. So, we read the next 7 bit value: 1111110 = 0x7E. So, 0x7E is the new palette index. We draw the next pixel using that index. The code 10 also tells us to set the subtraction variable to 1 (it already is 1, as that's its initial value). We have run out of bits in this byte too, so we continue to the next one.

And so on.

How this works: Obviously, using only a single bit to signify "draw a pixel" takes up less room than using 8 (code 0). The palette is stored in such a way that the most commonly used colors in the image are stored first. That way, we can use less than 8 bits to change the current palette index (code 10). Also, colors that are commonly used after each other are stored right before or after each other in the palette, so that the subtraction variable can be used to change between them using only 3 bits (code 110 and 111).
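The 1st method can be sketched in Python as a decoder that returns the first count palette indices of a strip in stream order (to be placed afterwards using the strip's rendering direction). index_bits is the parameter from Table 10. Assumptions: palette index arithmetic wraps like an 8-bit byte, and decode_method1 and the small reader class are hypothetical names.

```python
class BitReader:
    """LSB-first bit reader: bit 0 of each byte is read first."""

    def __init__(self, data: bytes, pos: int):
        self.data, self.pos, self.bit = data, pos, 0

    def read_bit(self) -> int:
        value = (self.data[self.pos] >> self.bit) & 1
        self.bit += 1
        if self.bit == 8:
            self.bit, self.pos = 0, self.pos + 1
        return value

    def read_value(self, count: int) -> int:
        value = 0
        for i in range(count):
            value |= self.read_bit() << i   # first bit read is the lowest
        return value

def decode_method1(strip: bytes, count: int, index_bits: int) -> list:
    color = strip[1]                 # initial palette index = first pixel
    reader = BitReader(strip, 2)     # bit stream starts at the third byte
    sub = 1                          # the subtraction variable
    pixels = [color]
    while len(pixels) < count:
        if reader.read_bit() == 0:           # 0: same color
            pass
        elif reader.read_bit() == 0:         # 10: new palette index
            color = reader.read_value(index_bits)
            sub = 1
        elif reader.read_bit() == 0:         # 110: subtract
            color = (color - sub) & 0xFF
        else:                                # 111: negate, then subtract
            sub = -sub
            color = (color - sub) & 0xFF
        pixels.append(color)
    return pixels
```

On the example strip 11 05 80 FC, decode_method1(bytes.fromhex("110580FC"), 9, 7) gives eight pixels of 5 followed by 0x7E.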

2nd method
The second method is similar to the 1st method, but it recognizes the bits differently (Remember, the order of the bits listed here is the order in which you read them, not the order of the bits in the file!):

0: Draw next pixel with current palette index.

10: Read a new palette index from the bitstream (i.e., the number of bits specified by the parameter), and draw the next pixel.

11: Read the next 3-bit value, and perform an action depending on the value. For every value except 4, the palette index changes by (value - 4):

000 (0): Decrease current palette index by 4.

001 (1): Decrease current palette index by 3.

010 (2): Decrease current palette index by 2.

011 (3): Decrease current palette index by 1.

100 (4): Read the next 8 bits as a value. Draw that number of pixels with the current palette index (somewhat similar to RLE).

101 (5): Increase current palette index by 1.

110 (6): Increase current palette index by 2.

111 (7): Increase current palette index by 3.

How this works: Codes 0 and 10 are the same as in the 1st method. Code 11 supplies additional decompression actions. Increasing and decreasing the palette index (11-000, 11-001, 11-010, 11-011, 11-101, 11-110 and 11-111) gives the same advantages as codes 110 and 111 of the 1st method, except that the codes in the 2nd method allow steps other than 1. Code 11-100 works like RLE. Rather than storing, say, 128 pixels of the same color, the compressor stores only the number of pixels to paint using that color.
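A Python sketch of the 2nd method, again returning the first count palette indices in stream order. Assumptions: the 3-bit value v changes the palette index by v - 4 (0 means decrease by 4, 7 means increase by 3), which is how ScummVM's decoder treats it; index arithmetic wraps at 8 bits; an RLE code draws exactly the stored number of pixels. decode_method2 is a hypothetical name.

```python
class BitReader:
    """LSB-first bit reader: bit 0 of each byte is read first."""

    def __init__(self, data: bytes, pos: int):
        self.data, self.pos, self.bit = data, pos, 0

    def read_bit(self) -> int:
        value = (self.data[self.pos] >> self.bit) & 1
        self.bit += 1
        if self.bit == 8:
            self.bit, self.pos = 0, self.pos + 1
        return value

    def read_value(self, count: int) -> int:
        value = 0
        for i in range(count):
            value |= self.read_bit() << i   # first bit read is the lowest
        return value

def decode_method2(strip: bytes, count: int, index_bits: int) -> list:
    color = strip[1]                 # initial palette index = first pixel
    reader = BitReader(strip, 2)
    pixels = [color]
    while len(pixels) < count:
        if reader.read_bit() == 0:           # 0: same color
            pixels.append(color)
        elif reader.read_bit() == 0:         # 10: new palette index
            color = reader.read_value(index_bits)
            pixels.append(color)
        else:                                # 11: 3-bit action value
            value = reader.read_value(3)
            if value == 4:                   # 11-100: run of the current color
                run = reader.read_value(8)
                pixels.extend([color] * run)
            else:                            # step by value - 4
                color = (color + value - 4) & 0xFF
                pixels.append(color)
    return pixels[:count]                    # a run may overshoot the request
```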

The Missing BOMP
Now, there are a few compression methods missing from the discussion so far: the BOMP format, which is used for inventory icons in CMI and for various graphics such as the highway images in Sam & Max, and the Last Crusade compression IDs (between 0x02 and 0x0D), which are used the same way as all the others, but whose compression I haven't figured out yet. So, here at the end of the article, let's look a bit at BOMP.

BOMPs are rendered horizontally, but without the 8 pixel wide strips. The data isn't divided into strips at all, but rather into rows, so we simply draw all pixels in the first row, then all pixels in the next, etc. The compression used is a variant of RLE.

The actual BOMP data starts at offset 16 in the BOMP block. As mentioned, the data is divided into rows. Each row starts with the length of the row data, stored as an LE word. This length does not include the length word itself. We'll call it nRowLength.

We now read the next byte, which is the compression identifier, and check if its first bit is 1 (i.e. ReadByte AND 0x01 = 1).

If it is, we shift the byte right by 1 (i.e., divide it by 2, if you know your bit arithmetic :) and add 1 to it. The result (let's call it nRLELength) is the number of pixels to draw. The next byte in the data is the palette index to use for drawing, and we then draw nRLELength pixels with that color.

If the first bit is 0, we still shift the byte right by 1 and add 1 to it. But the number we get out of this (let's call it nUncompLength) is a number of bytes that are not compressed. I.e., the next nUncompLength bytes are palette indices, which we simply draw onto the bitmap, one by one.

When we're done with either of these two ways of drawing, the next byte in the data is the next compression identifier.

We keep on doing this until we've processed nRowLength bytes. Then we move on to the next row on our bitmap, read the length of the next row data (the LE word), and start over.
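The row loop above can be sketched in Python as follows. Assumptions: data holds the whole BOMP block, the row data starts at offset 16 as stated, and width and height are known from elsewhere (this article doesn't cover where BOMP stores them); width is only used to trim each row. decode_bomp is a hypothetical name.

```python
import struct

def decode_bomp(data: bytes, width: int, height: int, start: int = 16) -> list:
    """Decode BOMP rows into lists of palette indices."""
    rows = []
    pos = start
    for _ in range(height):
        row_length = struct.unpack_from("<H", data, pos)[0]   # nRowLength
        pos += 2
        end = pos + row_length
        row = []
        while pos < end:
            code = data[pos]
            pos += 1
            n = (code >> 1) + 1
            if code & 0x01:                   # run: one color, n pixels
                row.extend([data[pos]] * n)
                pos += 1
            else:                             # n uncompressed palette indices
                row.extend(data[pos : pos + n])
                pos += n
        rows.append(row[:width])
    return rows
```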

Why "BOMP"? I can't take the credit for this knowledge, but BOMP relates to Blast Objects (probably stands for Blast Object MaP, or something similar). The word Blast indicates that this type of image storage was done for performance rather than compression. I.e., it's mostly used for images that are either used a lot, or need to be displayed fast - such as the CMI inventory objects, the Sam & Max Highway Surfing graphics and so on.

Conclusion
You really want a conclusion? I guess not. I hope this was useful for the people out there who want to make their own SCUMM image decompression utils. And I hope I didn't make it harder than it is with all this babbling. It may not seem all that obvious the first time you read it, but then, please, try reading it once more, and experiment with writing some code to test your knowledge. You might understand more than you think! Have fun decompressing, everyone!