I’ve been working on a disassembler for the Motorola 68000 family of microprocessors, which were popular in home computers in the 80s. A disassembler takes a binary blob of machine code and transliterates it into a more human-readable text format. The output is the same sequence of steps your CPU executes, just written out in a form people can read.
A disassembler is a fairly trivial program, if somewhat tedious. Most instructions are easy to decode. For example, let’s look at the page for the instruction “AND” (from the M68000 Family Programmer’s Reference Manual):
And let’s say the next 16-bit machine language instruction we’re hoping to decode into assembly language is 0xC000. Well, we see that the first 4 bits (1100) match, so we’re definitely looking at some kind of AND instruction. The next 3 bits are called “register”, the 3 bits after that are “opmode” (whatever that is), the 3 bits after that are the “effective address mode”, and the last 3 bits are the “effective address register”. Seems easy. Is it? Yes. Let’s check out what a value of 0 means for all four of those fields.
After looking at the Programmer’s Reference Manual a bit more, it turns out that a “register” of 0 indicates that the second operand is in register D0. An “opmode” of 0 means that the result of the AND operation will be stored into the second operand (which we just determined is D0). It also indicates that the AND will only be applied to the bottom 8 bits (a byte-sized operation); the other bits in D0 will be unaffected. An “effective address mode” of 0 means that the first operand is a data register. And since the “effective address register” is 0 too, the first operand is also D0, so D0 is both the source and the destination. This means that the instruction 0xC000 will take the value in register D0, AND it with itself, and store the result back into D0. And just to be precise, as I mentioned, only the bottom 8 bits are actually changed. TL;DR: 0xC000 translates to D0 AND D0 -> D0, or AND.B D0,D0 in Motorola assembler syntax.
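Here’s a minimal Python sketch of the decoding we just did by hand, assuming only the field layout from the AND page (1100, then register, opmode, EA mode, EA register). The function names are hypothetical, and it only handles the simplest case: a data register as the source, with the result going back into Dn.

```python
def split_and_fields(word: int) -> dict:
    """Slice a 16-bit AND opword into its four 3-bit fields."""
    if (word >> 12) != 0b1100:
        raise ValueError(f"{word:#06x} is not in the 1100 opcode group")
    return {
        "register": (word >> 9) & 0b111,   # bits 11-9: the Dn operand
        "opmode": (word >> 6) & 0b111,     # bits 8-6: size and direction
        "ea_mode": (word >> 3) & 0b111,    # bits 5-3: effective address mode
        "ea_register": word & 0b111,       # bits 2-0: effective address register
    }

# Opmodes 0-2 mean "<ea> AND Dn -> Dn" with a byte, word, or long operand.
SIZE_FOR_OPMODE = {0b000: "B", 0b001: "W", 0b010: "L"}

def disassemble_and_dn(word: int) -> str:
    """Disassemble AND.x Dm,Dn (EA mode 0, result stored in Dn)."""
    f = split_and_fields(word)
    if f["ea_mode"] != 0 or f["opmode"] not in SIZE_FOR_OPMODE:
        raise NotImplementedError("this sketch only covers AND.x Dm,Dn")
    size = SIZE_FOR_OPMODE[f["opmode"]]
    # Motorola syntax puts the source first and the destination second.
    return f"AND.{size} D{f['ea_register']},D{f['register']}"

print(disassemble_and_dn(0xC000))  # AND.B D0,D0
print(disassemble_and_dn(0xC481))  # AND.L D1,D2
```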
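You might expect 0xC100 to be an identical instruction. Its “opmode” is 4, which simply switches which operand the result is stored into, and since both operands are D0 the effect would be the same. But the manual only allows memory addressing modes as the destination in that form of AND, and the bit pattern 0xC100 actually belongs to a different instruction entirely: ABCD D0,D0 (add binary-coded decimal). That’s exactly the kind of collision a disassembler has to watch for.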
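Not every word that begins with 1100 is an AND. The sketch below is a rough classifier for that opcode group on a plain 68000. The function name is hypothetical, and it only checks the collisions involving EA modes 0 and 1; it doesn’t validate every addressing mode each instruction allows. Among other things, it reports 0xC100 as ABCD rather than AND.

```python
# Pairs of (opmode, EA mode) that the manual assigns to EXG in the 1100 group.
EXG_PATTERNS = {(0b101, 0b000), (0b101, 0b001), (0b110, 0b001)}

def classify_1100_group(word: int) -> str:
    """Roughly identify which instruction a 1100... opword encodes (68000 only)."""
    if (word >> 12) != 0b1100:
        raise ValueError(f"{word:#06x} is not in the 1100 opcode group")
    opmode = (word >> 6) & 0b111
    ea_mode = (word >> 3) & 0b111
    if opmode == 0b011:
        return "MULU"      # unsigned 16x16 -> 32 multiply
    if opmode == 0b111:
        return "MULS"      # signed 16x16 -> 32 multiply
    if ea_mode in (0b000, 0b001):
        if opmode == 0b100:
            return "ABCD"  # Dy,Dx (mode 0) or -(Ay),-(Ax) (mode 1)
        if (opmode, ea_mode) in EXG_PATTERNS:
            return "EXG"   # Dx,Dy / Ax,Ay / Dx,Ay
        if ea_mode == 0b000 and opmode <= 0b010:
            return "AND"   # AND.x Dm,Dn
        # AND never takes An as its source, and "Dn AND <ea> -> <ea>" never
        # allows a register destination; nothing else this sketch knows fits.
        return "INVALID"
    return "AND"

for w in (0xC000, 0xC100, 0xC141, 0xC0C0):
    print(f"{w:#06x}", classify_1100_group(w))
```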
That’s a lot to write out, but it’s actually pretty simple. The hard part comes when the “effective address mode” is something other than 0. Such as 6.
Here’s the table of addressing modes (which “effective address mode” selects from):
First on the list is our familiar “effective address mode” of 0, which indicates a data register as the operand. Other modes of note include 2 (take the contents of an address register, and use that value as a pointer to go fetch something from memory), 3 (same as 2, but afterwards increment the address register by the operand size, which is good for iterating through arrays), and 4 (same as 2, but first decrement the address register by the operand size, which is useful for going through arrays… in the other direction…). And then there are the confusing modes 5 and 6.
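A disassembler ultimately has to turn the mode/register pair into assembler syntax like that shown in the table. A lookup like the one below is one way to do it; the names are hypothetical. Mode 7 reuses the register field to choose among absolute, PC-relative, and immediate forms, and the d16/d8/Xn placeholders would be filled in later from extension words.

```python
# Modes 0-6 name an address (or data) register via the register field.
REGISTER_MODE_SYNTAX = {
    0: "D{r}",          # data register direct
    1: "A{r}",          # address register direct
    2: "(A{r})",        # address register indirect
    3: "(A{r})+",       # ... with post-increment
    4: "-(A{r})",       # ... with pre-decrement
    5: "(d16,A{r})",    # ... with 16-bit displacement
    6: "(d8,A{r},Xn)",  # ... with index
}

# Mode 7 uses the register field as a sub-mode selector instead.
MODE7_SYNTAX = {
    0: "(xxx).W",       # absolute short address
    1: "(xxx).L",       # absolute long address
    2: "(d16,PC)",      # PC-relative with displacement
    3: "(d8,PC,Xn)",    # PC-relative with index
    4: "#<data>",       # immediate
}

def ea_syntax(mode: int, reg: int) -> str:
    """Return a Motorola-syntax template for an effective address field."""
    if mode in REGISTER_MODE_SYNTAX:
        return REGISTER_MODE_SYNTAX[mode].format(r=reg)
    if mode == 7 and reg in MODE7_SYNTAX:
        return MODE7_SYNTAX[reg]
    raise ValueError(f"no addressing mode for mode={mode}, reg={reg}")

print(ea_syntax(3, 2))  # (A2)+
print(ea_syntax(7, 4))  # #<data>
```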
Mode 5, a.k.a. (d16,An), a.k.a. “Address Register Indirect with Displacement” (yikes), means “Take a given address register, add a signed 16-bit number to it (from somewhere), and use that value as a pointer to fetch some data from memory”. So how is that instruction encoded?
It’s encoded using another two bytes. Let’s say our disassembler is chugging along, and it encounters the instruction 0xC028. We can decode that as “AND, register=0, opmode=0, ea-mode=5, ea-register=0”. So we’ve only changed the effective address mode to 5. When the CPU sees that the effective address mode is “Address Register Indirect with Displacement”, a.k.a. (d16,An), it’ll fetch the next 2 bytes and use those as the d16 value. So perhaps the next 2 bytes are 0x0102 (or 258 in decimal). Now the full instruction is 0xC028 0x0102, and that translates to “Take the contents of register A0, add 258 to that, use the result as a pointer to fetch a byte from memory, then AND it with the low byte of register D0, and store the result in register D0”. Whew. In assembler syntax, that’s AND.B (258,A0),D0.
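Here’s a minimal sketch of that two-word decode, assuming the instruction stream is already split into a list of 16-bit ints. The function names are hypothetical. The one subtlety worth showing is that d16 is signed, so 0xFFFE means -2, not 65534.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def disassemble_and_d16(words: list) -> tuple:
    """Disassemble AND.x (d16,An),Dn; returns (text, 16-bit words consumed)."""
    op = words[0]
    if (op >> 12) != 0b1100:
        raise ValueError("not in the 1100 opcode group")
    register = (op >> 9) & 0b111
    opmode = (op >> 6) & 0b111
    ea_mode = (op >> 3) & 0b111
    ea_register = op & 0b111
    if ea_mode != 5 or opmode > 0b010:
        raise NotImplementedError("this sketch only covers AND.x (d16,An),Dn")
    size = "BWL"[opmode]
    d16 = sign_extend(words[1], 16)  # the extension word is a signed displacement
    return f"AND.{size} ({d16},A{ea_register}),D{register}", 2

print(disassemble_and_d16([0xC028, 0x0102]))  # ('AND.B (258,A0),D0', 2)
print(disassemble_and_d16([0xC028, 0xFFFE]))  # ('AND.B (-2,A0),D0', 2)
```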
Okay so now Mode 6. Represented in the table by “(d8, An, Xn)”. We know what An means. We can kind of guess what d8 means (take the next byte?). But what does Xn mean?
So technically Mode 5 uses something called an “extension word”. This is a 2-byte word that follows the instruction’s first word. Some instructions always have an extension word, and some addressing modes add their own. Basically, instructions are of variable length on m68k CPUs. This is CISC, after all.
Mode 5’s extension word is represented in the table as “d16”, meaning the whole extension word (two bytes, or 16 bits, a.k.a. a “word” in the CPU jargon of the 1980s) is treated as a single signed number.
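Because of that variable length, a disassembler has to work out how many extension words each effective address adds before it can find the next instruction. Below is a sketch assuming a plain 68000 (brief-format mode 6 only); the function names are hypothetical. On the 68020 and later, mode 6 can take more words than this.

```python
def ea_extension_words(mode: int, reg: int, size: str) -> int:
    """Extension words an effective address adds on a 68000.

    size is the operand size ("B", "W", or "L"); it only matters for immediates.
    """
    if 0 <= mode <= 4:
        return 0                    # Dn, An, (An), (An)+, -(An)
    if mode in (5, 6):
        return 1                    # one d16 word, or one brief extension word
    if mode == 7:
        if reg in (0, 2, 3):
            return 1                # (xxx).W, (d16,PC), (d8,PC,Xn)
        if reg == 1:
            return 2                # (xxx).L
        if reg == 4:
            # #<data>: a byte immediate still occupies a whole word.
            return 2 if size == "L" else 1
    raise ValueError(f"no addressing mode for mode={mode}, reg={reg}")

def and_instruction_words(opword: int) -> int:
    """Total length, in words, of an AND opword plus its EA extension words."""
    opmode = (opword >> 6) & 0b111
    if opmode & 0b011 == 0b011:
        raise ValueError("opmodes 3 and 7 are MULU/MULS, not AND")
    size = "BWL"[opmode & 0b011]    # opmodes 0/4 = B, 1/5 = W, 2/6 = L
    return 1 + ea_extension_words((opword >> 3) & 0b111, opword & 0b111, size)

print(and_instruction_words(0xC028))  # 2
```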
Okay back to Mode 6. With Mode 6, there is also an extension word, but the format is different:
This is called the “brief extension word” format. It includes a signed 8-bit value (“displacement”), an index register number (“register”), a bit (“D/A”) to indicate whether the index register is a data (0) or address (1) register, a bit (“W/L”) to indicate whether only the low 16 bits of the index register (sign-extended) should be used (0) or the whole 32 bits (1), and a 2-bit “scale” that multiplies the index by 1, 2, 4, or 8 (scaling arrived with the 68020).
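Pulling those fields out is plain bit masking. This sketch assumes the layout bit 15 = D/A, bits 14-12 = register, bit 11 = W/L, bits 10-9 = scale, bit 8 = 0, bits 7-0 = displacement; the function name is hypothetical.

```python
SCALES = (1, 2, 4, 8)

def decode_brief_extension(word: int) -> dict:
    """Split a brief-format extension word into its fields."""
    if word & 0x0100:
        raise ValueError("bit 8 is set: this is a full extension word")
    displacement = word & 0xFF
    if displacement & 0x80:                      # the displacement is a signed byte
        displacement -= 0x100
    return {
        "da": "A" if word & 0x8000 else "D",     # bit 15
        "register": (word >> 12) & 0b111,        # bits 14-12
        "wl": "L" if word & 0x0800 else "W",     # bit 11
        "scale": SCALES[(word >> 9) & 0b11],     # bits 10-9 (68020 and later)
        "displacement": displacement,            # bits 7-0
    }

print(decode_brief_extension(0x9808))
# {'da': 'A', 'register': 1, 'wl': 'L', 'scale': 1, 'displacement': 8}
```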
So let’s look at an example of a Mode 6 instruction. Let’s say the bytes we see are 0xC030 0x9808. We can decode the 0xC030 part as “AND, register=0, opmode=0, effective address mode=6, effective address register=0”. Since it’s mode 6, we look at the next two bytes (0x9808), and decode that as “D/A=A, register=1, W/L=L, scale=0 (1x), displacement=8”. So the full instruction could be written out as “Take the value in register A1, multiply it by a scale of 1x, add the value in A0, add the displacement, which is 8, take the result of those additions and use it as a pointer to fetch a byte from memory, then AND that with the low byte of register D0, and place the result in register D0”. In assembler syntax: AND.B (8,A0,A1.L),D0.
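The address arithmetic in that sentence can be written down directly. This is a sketch assuming register values are plain Python ints standing in for 32-bit registers, with the displacement already sign-extended; the function name and the register values in the example are hypothetical.

```python
def brief_format_ea(an: int, xn: int, wl: str, scale: int, d8: int) -> int:
    """EA for (d8,An,Xn.SIZE*SCALE): An + Xn*scale + d8, wrapped to 32 bits.

    wl == "W" means only the low 16 bits of Xn are used, sign-extended.
    """
    if wl == "W":
        low = xn & 0xFFFF
        xn = low - 0x10000 if low & 0x8000 else low
    elif wl != "L":
        raise ValueError("wl must be 'W' or 'L'")
    return (an + xn * scale + d8) & 0xFFFFFFFF

# AND.B (8,A0,A1.L),D0 with A0 = 0x1000 and A1 = 0x20
print(hex(brief_format_ea(0x1000, 0x20, "L", 1, 8)))  # 0x1028
```

Using W/L=W with an index of 0x0001FFFF would treat it as -1, which is why the sign extension step matters.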
Full Extension Words
Okay, we’re on a roll here. But don’t get cocky: there are far more complex instruction formats lurking in the m68k ISA. Because… on the 68020 and later, Mode 6 can also be paired with the “Full Extension Word Format”:
As you can see, this one differs from the “Brief Extension Word” format by having a 1 in bit position 8. We see some similar fields here: the index register (“register”), the D/A and W/L bits, and the scale. But the 8-bit displacement has been replaced by a series of flags: BS, IS, BD SIZE, and I/IS. “BS” (base suppress) determines whether we should suppress the “base register” (the register specified in the first word of the instruction, not in an extension word, usually referred to as “An”). Suppressing something in these calculations means replacing it with 0. “IS” (index suppress) suppresses the index register (the “register” in the extension word). “BD SIZE” determines the number of base displacement words following this one (the words labeled “base displacement (0, 1, or 2 words)” in the above graphic). “I/IS” and “IS” combined determine whether there’s a memory indirection, whether the index is added before or after it, and the size of the “outer displacement” (how many OD words will follow the extension word and any base displacement words). I won’t get into the logic for that, but there’s a nice table in the M68000 Family Programmer’s Reference Manual on page 45 (table 2-2).
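Here’s a sketch of decoding the full format, assuming the layout bit 15 = D/A, bits 14-12 = register, bit 11 = W/L, bits 10-9 = scale, bit 8 = 1, bit 7 = BS, bit 6 = IS, bits 5-4 = BD SIZE, bit 3 = 0, bits 2-0 = I/IS. The IS and I/IS pairs are transcribed from the manual’s table as I understand it; the names are hypothetical. For instance, 0x9926 comes out as an A1.L index with a word base displacement and post-indexed indirection with a word outer displacement.

```python
SCALES = (1, 2, 4, 8)
BD_WORDS = {0b01: 0, 0b10: 1, 0b11: 2}   # BD SIZE 00 is reserved

# (IS, I/IS) -> (indirection, outer displacement words); other pairs are reserved.
INDIRECTION = {
    (0, 0b000): ("no memory indirection", 0),
    (0, 0b001): ("pre-indexed", 0),
    (0, 0b010): ("pre-indexed", 1),
    (0, 0b011): ("pre-indexed", 2),
    (0, 0b101): ("post-indexed", 0),
    (0, 0b110): ("post-indexed", 1),
    (0, 0b111): ("post-indexed", 2),
    (1, 0b000): ("no memory indirection", 0),
    (1, 0b001): ("memory indirect, no index", 0),
    (1, 0b010): ("memory indirect, no index", 1),
    (1, 0b011): ("memory indirect, no index", 2),
}

def decode_full_extension(word: int) -> dict:
    """Split a 68020+ full-format extension word into its fields."""
    if not word & 0x0100:
        raise ValueError("bit 8 is clear: this is a brief extension word")
    if word & 0x0008:
        raise ValueError("bit 3 must be 0")
    bd_size = (word >> 4) & 0b11
    index_suppress = (word >> 6) & 1
    iis = word & 0b111
    if bd_size not in BD_WORDS or (index_suppress, iis) not in INDIRECTION:
        raise ValueError(f"{word:#06x} uses a reserved encoding")
    action, od_words = INDIRECTION[(index_suppress, iis)]
    return {
        "da": "A" if word & 0x8000 else "D",
        "register": (word >> 12) & 0b111,
        "wl": "L" if word & 0x0800 else "W",
        "scale": SCALES[(word >> 9) & 0b11],
        "base_suppress": bool(word & 0x0080),
        "index_suppress": bool(index_suppress),
        "bd_words": BD_WORDS[bd_size],
        "action": action,
        "od_words": od_words,
    }

print(decode_full_extension(0x9926))
```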
Let’s look at an example of doing an AND operation using this instruction format! Let’s say you get the instruction 0xC030 0x9926 0x000C 0x0004. The extension word 0x9926 decodes as D/A=A, register=1, W/L=L, scale=1x, BS=0, IS=0, BD SIZE=word, I/IS=110 (post-indexed, with a word-sized outer displacement). We can translate the whole thing to “EA AND D0 -> D0”, where EA = take the contents of the A0 register, add the base displacement (1 word, value = 12), use that value as a pointer to fetch a long from memory, add the contents of register A1 to that (scaled by 1x), then add the outer displacement (1 word, value = 4), then use that value as a pointer to fetch a byte from memory. That byte is ANDed with the low byte of D0, and the result is stored in D0. In assembler syntax: AND.B ([12,A0],A1.L,4),D0.
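The post-indexed calculation in that example can be sketched in a few lines, assuming memory is modeled as a sparse dict of byte addresses (big-endian, as on the 68k). The helper names and the machine state are hypothetical, chosen only to make the arithmetic visible.

```python
def read_long(memory: dict, addr: int) -> int:
    """Big-endian 32-bit read from a sparse byte-addressed memory."""
    value = 0
    for offset in range(4):
        value = (value << 8) | memory.get((addr + offset) & 0xFFFFFFFF, 0)
    return value

def postindexed_ea(an: int, bd: int, xn: int, scale: int, od: int, memory: dict) -> int:
    """([bd,An],Xn*scale,od): fetch a long at An+bd, then add Xn*scale and od."""
    intermediate = read_long(memory, (an + bd) & 0xFFFFFFFF)
    return (intermediate + xn * scale + od) & 0xFFFFFFFF

def and_b(dn: int, operand: int) -> int:
    """AND.B into a data register: only the low byte of Dn changes."""
    return (dn & 0xFFFFFF00) | (dn & operand & 0xFF)

# Hypothetical machine state for AND.B ([12,A0],A1.L,4),D0
memory = {0x100C: 0x00, 0x100D: 0x00, 0x100E: 0x20, 0x100F: 0x00, 0x2014: 0x0F}
a0, a1, d0 = 0x1000, 0x10, 0x12345678
ea = postindexed_ea(a0, 12, a1, 1, 4, memory)
d0 = and_b(d0, memory[ea])
print(hex(ea), hex(d0))  # 0x2014 0x12345608
```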
So the 68000 is a mess, at least when it comes to addressing modes. And on top of that, the documentation is terrible. Motorola’s normally awesome writing style just fell apart about the time the 68020 came out, and a bunch of new addressing modes were added.
It’s funny: people who wrote 68000 assembly back in the day speak favorably about it, and they seem to prefer it to the Intel 8086. But even the 68020 was starting to show that CISC always leads to a convoluted assembly programming model.