I’ve been skimming through the 1993 book Windows Internals by Matt Pietrek. At that time, Microsoft Windows (and a copy of Microsoft MS-DOS to bootstrap it) was already the go-to operating system for businesses. But there were attempts to dethrone it, such as IBM’s OS/2 and of course the Apple Macintosh. What made Windows able to survive while the competition eventually fell off?
Well, one thing that stands out to me while looking through the book (and this isn’t exactly a novel opinion) is that the Intel x86 CPUs were absolutely ridiculous to program against. Probably 80% of what makes up MS-DOS and Microsoft Windows is just trying to provide a consistent, clean interface over top of the ridiculous design decisions made at Intel. For instance, the Intel 8086, 80286, and 80386 all have radically different capabilities, while maintaining backward compatibility, and these were released in 1978, 1982, and 1985, respectively. In our modern age of CPU stability (or stagnation, if you’re still bitter about the end of Moore’s Law as I am), where your 2021 Intel CPU is effectively the same as a 2004 Intel CPU (besides nice-to-have bonus features that really only matter in user-land and the OS doesn’t have to worry about), it’s hard to get a sense of what “first to market” means when the CPU architecture is changing under you every few years.
Microsoft put in the hard work of shipping Windows for these ridiculous platforms, and they reaped the rewards. I think that a similar thing happened to Linux. Linux was originally released in 1991, for the Intel 386. Commercial Unixes existed for the platform, but a low-cost solution would take years. A free solution was unthinkable. Well okay, there was MINIX, but that wasn’t really free software (source available, for a fee, so not truly libre).
These are (old, outdated) technical documents that never made it to the internet. Presumably they’re still covered by patents or other intellectual property, and they’ll hopefully be released someday. Or maybe they’re lost forever, who knows.
AMD-751™ System Controller Data Sheet (order #21910)
AMD Athlon™ System Bus Specification (order #21902)
I’ve been designing my own Pentium 1 motherboard. I originally wanted to connect a 386 or 486 to a modern FPGA and see if I could get DOS or Windows 3.1 to boot. However, CPUs from that era use “TTL” signaling (5v logic high). Modern FPGAs will literally break if you send 5v at them in any way. So I was about to give up, when I decided to widen my sense of what’s possible. I had assumed that newer chips were simply too complex. However, after looking at the Pentium 1 datasheet, it seems like the ideal CPU to use for a homemade motherboard. It uses CMOS signaling (3.3v high), which modern FPGAs support (for now!). The downside is, the Pentium has a 64-bit-wide data bus, and all 64 lines must be connected. Add on a 32-bit address bus (actually 37 lines for legacy compatibility reasons), and many control lines, and I calculate that you’ll need an FPGA with at least 141 GPIOs. Just for interfacing with the CPU. Want to add external RAM? Or VGA, keyboard, mouse, disk drives? That’s extra.
I’ve been designing the board using the excellent and free CircuitMaker tool. It’s basically Altium Designer, with some advanced features removed, and your projects must be public. No problem for me.
Due to the cost of the FPGA board I’ll have to buy, I probably won’t be working on this for a while. But eventually, if all goes well, I’ll have a working board, and a lot of Verilog to write to simulate a simple BIOS, northbridge, RAM, etc.
I also want to shout out to a similar project: https://hackaday.io/project/174327-wirewrap-pentium Projects like this one (and, I hope, mine) raise the collective bar for what’s possible for a hobbyist to accomplish. Nothing is truly out of reach, it just takes effort (and often some clever hacks to avoid spending too much money!)
I’ve been working on a disassembler for the Motorola 68000 family of microprocessors, which were popular in home computers in the 80s. A disassembler takes a binary blob of machine code and transliterates it into a more human-readable text format. The output describes the actual steps your CPU is taking, just written in a form a person can read.
A disassembler is a fairly trivial program, if somewhat tedious. Most instructions are easy to decode. For example, let’s look at the page for the instruction “AND” (from the M68000 Family Programmer’s Reference Manual):
And let’s say the next 16-bit CPU machine language instruction we’re hoping to decode into assembly language is: 0xC000. Well, we see that the first 4 bits match, so we’re definitely looking at some kind of AND instruction. The next 3 bits are called “register”, the 3 bits after that are “opmode” (whatever that is), 3 bits after that are the “effective address mode”, and the last 3 bits are the “effective address register”. Seems easy. Is it? Yes. Let’s check out what a value of 0 means for all four of those fields.
After looking at the Programmer’s Reference Manual a bit more, it turns out that a “register” of 0 indicates that the second operand is in register D0. An “opmode” of 0 means that the result of the AND operation will be stored into the second operand (which we just determined is D0). Also it indicates that the AND will only be applied to the bottom 8 bits. The other bits in D0 will be unaffected. An “effective address mode” of 0 means that the first operand is a Data register. And since the “effective address register” is 0 also, that means that the first operand is D0. So D0 is both the source and destination. This means that the instruction 0xC000 will take the value in register D0, AND it with itself, and store the result back into D0. And just to be precise, as I mentioned, only the bottom 8 bits are actually changed. TL;DR: 0xC000 translates to D0 AND D0 -> D0.
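You might expect 0xC100 to be an identical instruction. There, “opmode” is 4, which simply switches which operand the result is stored back into, and since both operands are D0, the operation would effectively be the same. In practice, though, that direction of AND only accepts a memory operand as its destination, and the 0xC100 bit pattern is reused for a different instruction (ABCD D0,D0), so a disassembler has to check for that case first.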
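Here is a minimal sketch of that field-slicing step in C++. The names (OpwordFields, split_opword, size_suffix) are hypothetical, chosen for illustration; the bit positions follow the field widths described above (4, 3, 3, 3, 3). The word and long size codes are from my reading of the manual, since only the byte case is walked through here.

```cpp
#include <cstdint>

// The five fields of an AND-style instruction word, high bits first.
struct OpwordFields {
    unsigned line;     // bits 15-12: 0xC (binary 1100) for AND
    unsigned reg;      // bits 11-9:  data register number
    unsigned opmode;   // bits 8-6:   operand size and direction
    unsigned ea_mode;  // bits 5-3:   effective address mode
    unsigned ea_reg;   // bits 2-0:   effective address register
};

// Slice a 16-bit instruction word into its fields with shifts and masks.
constexpr OpwordFields split_opword(std::uint16_t w) {
    return OpwordFields{
        (w >> 12) & 0xFu,
        (w >> 9) & 0x7u,
        (w >> 6) & 0x7u,
        (w >> 3) & 0x7u,
        w & 0x7u
    };
}

// Low two opmode bits pick the operand size: 0 = byte, 1 = word, 2 = long.
// A value of 3 is not an AND size, so it is reported as '?'.
constexpr char size_suffix(unsigned opmode) {
    return (opmode & 3u) == 3u ? '?' : "bwl"[opmode & 3u];
}
```

For 0xC000 every field after the leading 1100 comes back as 0, which is the D0 AND D0 -> D0 case above. Returning a plain struct keeps the decoder table-friendly: later stages can switch on ea_mode without re-doing the bit math.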
That’s a lot to write out, but it’s actually pretty simple. The hard part comes when the “effective address mode” is something other than 0. Such as 6.
Here’s the table of addressing modes (which “effective address mode” selects from):
First on the list is our familiar “effective address mode” of 0, which indicates a data register as the operand. Other modes of note include 2 (take the contents of an address register, and use that value as a pointer to go fetch something from memory), 3 (same as 2, but also increment the value of the address register – this is good for iterating through arrays), and 4 (same as 2, but pre-decrement the value in the address register, which is useful for going through arrays… in the other direction…). And then there’s the confusing modes 5 & 6.
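As a rough sketch, a disassembler can turn the mode number into assembler syntax with a small lookup. The function names are hypothetical; the syntax strings for modes 0 through 6 follow the manual’s notation. Mode 7 is left out on purpose, because there the register field selects a sub-mode instead of a register.

```cpp
// Assembler syntax template for effective address modes 0 through 6.
// Anything else (including mode 7, which needs the register field) gives "?".
constexpr const char* ea_syntax(unsigned mode) {
    return mode == 0 ? "Dn"
         : mode == 1 ? "An"
         : mode == 2 ? "(An)"
         : mode == 3 ? "(An)+"
         : mode == 4 ? "-(An)"
         : mode == 5 ? "(d16,An)"
         : mode == 6 ? "(d8,An,Xn)"
         : "?";
}

// Modes 2 through 6 use an address register as a pointer into memory;
// modes 0 and 1 name a register operand directly.
constexpr bool is_memory_mode(unsigned mode) {
    return mode >= 2 && mode <= 6;
}
```

A real disassembler would substitute the actual register number for the "n", for example printing "(A0)+" for mode 3 with register 0.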
Mode 5, a.k.a. (d16,An) a.k.a. “Register indirect address with displacement” (yikes), means “Take a given address register, add a 16-bit number to it (from somewhere), and use that value as a pointer to fetch some data from memory”. So how is that instruction encoded?
It’s encoded using another two bytes. Let’s say our disassembler is chugging along, and it encounters the instruction 0xC028. We can decode that as “AND, register=0, opmode=0, ea-mode=5, ea-register=0”. So we’ve only changed the Effective Address Mode to 5. When the CPU sees that the Effective Address Mode is “Register indirect address with displacement”, a.k.a. (d16, An), it’ll fetch the next 2 bytes and use those as the d16 value. So perhaps the next 2 bytes are 0x0102 (or 258 in decimal). So now the full instruction bytecode is 0xC028 0x0102. And that translates to “Take the contents of register A0, add 258 to that, and use that result value as a pointer to fetch some data from memory, then AND it with the value in register D0, and store it in register D0”. Whew.
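The address arithmetic for mode 5 can be sketched like this. The function names are hypothetical. One detail the example value doesn’t exercise: the 16-bit displacement is signed, so 0xFFFE means -2 and not 65534.

```cpp
#include <cstdint>

// Treat a 16-bit extension word as a signed displacement (-32768..32767).
// XOR-then-subtract flips the sign bit and re-centres the range, which
// avoids relying on implementation-defined narrowing casts.
constexpr std::int32_t sign_extend16(std::uint16_t w) {
    return static_cast<std::int32_t>(w ^ 0x8000) - 0x8000;
}

// Effective address for mode 5, (d16,An): register contents plus the
// sign-extended displacement, wrapping at 32 bits.
constexpr std::uint32_t ea_d16_an(std::uint32_t an, std::uint16_t d16) {
    return an + static_cast<std::uint32_t>(sign_extend16(d16));
}
```

With A0 holding 0x1000, the 0xC028 0x0102 example resolves to address 0x1102.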
Okay so now Mode 6. Represented in the table by “(d8, An, Xn)”. We know what An means. We can kind of guess what d8 means (take the next byte?). But what does Xn mean?
So technically Mode 5 uses something called an “extension word”. This is a sequence of 2 bytes which follows the instruction. Some instructions always have an extension word. Some addressing modes add their own. Basically, instructions are of variable length on m68k CPUs. Because this is CISC, after all.
Mode 5’s extension word is represented in the table as “d16”, meaning the whole extension word (two bytes, or 16-bits, a.k.a. a “word” in the CPU jargon of the 1980s) is treated as a single number.
Okay back to Mode 6. With Mode 6, there is also an extension word, but the format is different:
This is called the “brief extension word” format. It includes a signed 8-bit value (“displacement”), an index register number (“register”), a bit (“D/A”) to indicate if the index register is a data (0) or address (1) register, a bit (“W/L”) to indicate if only the low 16 bits of the index register (sign-extended) should be used (0) or if the whole 32 bits should be used (1), and a 2-bit “scale” field that multiplies the index by 1, 2, 4, or 8 (68020 and later).
So let’s look at an example of a Mode 6 instruction. Let’s say the bytes we see are “0xC030 0x9808”. So we can decode the “0xC030” part as “AND, register=0, opmode=0, effective address mode = 6, effective address register = 0”. Since it’s mode 6, we look at the next two bytes (0x9808), and decode that as “displacement=8, scale=0, W/L=L, register=1, D/A=A”. So the full instruction could be written out “Take the value in register A1, multiply it by a scale of 1x, add the value in A0, add the displacement, which is 8, take the result of those additions and use it as a pointer to fetch some data from memory, then AND the result with the contents of register D0, and place the result in register D0”.
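A sketch of decoding the brief extension word and computing the resulting address might look like this. All names are hypothetical. The bit positions (D/A at 15, register at 14-12, W/L at 11, scale at 10-9, displacement at 7-0) are my reading of the manual’s figure, so treat them as an assumption to check against it.

```cpp
#include <cstdint>

struct BriefExt {
    bool index_is_address;  // D/A, bit 15: false = Dn, true = An
    unsigned index_reg;     // bits 14-12
    bool index_is_long;     // W/L, bit 11: false = low word, true = 32 bits
    unsigned scale;         // bits 10-9, already expanded to 1, 2, 4 or 8
    int displacement;       // bits 7-0 as a signed value (-128..127)
};

// Bit 8 clear marks the brief format.
constexpr bool is_brief(std::uint16_t w) { return ((w >> 8) & 1) == 0; }

constexpr BriefExt decode_brief(std::uint16_t w) {
    return BriefExt{
        ((w >> 15) & 1) != 0,
        (w >> 12) & 0x7u,
        ((w >> 11) & 1) != 0,
        1u << ((w >> 9) & 0x3),
        static_cast<int>((w & 0xFF) ^ 0x80) - 0x80
    };
}

// Index register value as used in the sum: either all 32 bits, or the
// low 16 bits sign-extended.
constexpr std::uint32_t index_value(std::uint32_t xn, bool is_long) {
    return is_long ? xn
                   : static_cast<std::uint32_t>(
                         static_cast<std::int32_t>((xn & 0xFFFFu) ^ 0x8000u) - 0x8000);
}

// Mode 6, (d8,An,Xn): An + Xn*scale + d8. The caller passes the current
// contents of the base and index registers.
constexpr std::uint32_t ea_brief(std::uint32_t an, std::uint32_t xn, BriefExt e) {
    return an + index_value(xn, e.index_is_long) * e.scale
              + static_cast<std::uint32_t>(e.displacement);
}
```

For 0x9808 this gives index register A1, long, scale 1x, displacement 8. With A0 = 0x2000 and A1 = 0x10, the operand would be fetched from 0x2018.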
Full Extension Words
Okay we’re on a roll here. But don’t get cocky – there are much more complex instruction formats lurking in the m68k ISA. Because… on the 68020 and later, Mode 6 can also be paired with the “Full Extension Word Format”:
As you can see, this one differs from the “Brief Extension Word” format by the 1 bit in bit position 8. We see some similar fields here. There’s the index register (“register”), there’s the D/A and W/L bits, scale. But the 8-bit immediate value has been replaced by a series of flags: BS, IS, BD SIZE, I/IS. “BS” determines if we should suppress the “base register” (the register specified in the first word of the instruction, not an extension word. Usually referred to as “An”). Suppressing something in these calculations means replacing it with 0. “IS” suppresses the index register (the “register” in the extension word). “BD SIZE” determines the number of extension words following this one (the words labeled “base displacement (0, 1, or 2 words)” in the above graphic). “I/IS” and “IS” combined determine the size of the “outer displacement” (how many OD words will follow the extension word and possibly any Base Displacement words). I won’t get into the logic for that, but there’s a nice table in the M68000 Family Programmer’s Reference Manual on page 45 (table 2-2).
Let’s look at an example of doing an AND operation using this instruction format! Let’s say you get the instruction “0xC030 0x9926 0x000C 0x0004”. We can translate that to “EA AND D0 -> D0, where EA = Take the contents of the A0 register, add the base displacement (1 word, value = 12), use that value as a pointer to fetch a long from memory, add the contents of register A1 to that (scaled by 1x), then add the outer displacement (1 word, value = 4), then use that value as a pointer to fetch a byte from memory. That byte is the EA operand. AND it with D0 and store the result in D0”.
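Pulling the flags out of a full extension word can be sketched as below. All names are hypothetical. The bit layout (BS at 7, IS at 6, BD SIZE at 5-4, I/IS at 2-0) and the size rules are my reading of the manual’s figure and table 2-2, so treat them as assumptions; reserved combinations are not rejected here. The sample word 0x9926 used in the checks is one I assembled by hand to mean “index A1, long, scale 1x, word base displacement, postindexed with a word outer displacement”.

```cpp
#include <cstdint>

struct FullExt {
    bool index_is_address;  // D/A, bit 15
    unsigned index_reg;     // bits 14-12
    bool index_is_long;     // W/L, bit 11
    unsigned scale;         // bits 10-9, expanded to 1, 2, 4 or 8
    bool base_suppress;     // BS, bit 7: treat An as 0
    bool index_suppress;    // IS, bit 6: treat Xn as 0
    unsigned bd_size;       // bits 5-4
    unsigned i_is;          // bits 2-0
};

// Bit 8 set marks the full format.
constexpr bool is_full(std::uint16_t w) { return ((w >> 8) & 1) != 0; }

constexpr FullExt decode_full(std::uint16_t w) {
    return FullExt{
        ((w >> 15) & 1) != 0,
        (w >> 12) & 0x7u,
        ((w >> 11) & 1) != 0,
        1u << ((w >> 9) & 0x3),
        ((w >> 7) & 1) != 0,
        ((w >> 6) & 1) != 0,
        (w >> 4) & 0x3u,
        w & 0x7u
    };
}

// Base displacement words that follow: size code 2 = one word,
// 3 = two words (a long), anything else = none.
constexpr unsigned bd_words(unsigned bd_size) {
    return bd_size == 2 ? 1u : bd_size == 3 ? 2u : 0u;
}

// Outer displacement words: the low two I/IS bits follow the same
// pattern (2 = word, 3 = long).
constexpr unsigned od_words(unsigned i_is) {
    return (i_is & 3u) == 2 ? 1u : (i_is & 3u) == 3 ? 2u : 0u;
}

// Total instruction length in 16-bit words: opcode word, extension word,
// then any base and outer displacement words.
constexpr unsigned total_words(FullExt e) {
    return 2u + bd_words(e.bd_size) + od_words(e.i_is);
}
```

The length calculation is the part a disassembler cannot skip: even if it prints the operand badly, it has to know how many words to consume before the next instruction starts.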
So the 68000 is a mess, at least when it comes to addressing modes. And on top of that, the documentation is terrible. Motorola’s normally awesome writing style just fell apart about the time the 68020 came out, and a bunch of new addressing modes were added.
It’s funny, people who wrote 68000 assembly back in the day speak favorably about it, and they seem to prefer it to the Intel 8086. But even the 68020 was starting to show that CISC always leads to a convoluted assembly programming model.
I’ve been porting a Linux driver to Arduino. Many old computers (late 80s, early 90s), especially Apple Macintosh computers, used a peripheral bus called SCSI to communicate between the motherboard and hard drive. The protocol is pretty simple, but there can be some high-speed activity, which was too fast for early CPUs to handle. So companies like NCR and AMD made interface chips to handle the high-speed aspects of the protocol, and the CPU just needs to insert/retrieve data from it. Well, the Linux kernel contains drivers for several such chips, including the most well-known SCSI interface chip, the NCR5380. I’ve been porting that code to Arduino, so that the lowly microcontroller can talk to retro hard drives. Pretty neat, eh?
Porting code from a full-fledged kernel driver to a microcontroller is surprisingly straightforward. Basically, you just delete all of the code for mutexes, queues, and other multithreaded OS nonsense like that. 😉 Then just make sure the low-level I/O interface is modified to talk to the Arduino pins, and the higher-level functions should just work.
This particular chip is very old (late 80s), and many CPUs back then were still 8-bit, so that’s another reason that this is fairly easy. I downloaded Linux version 1.0, from 1994, and the NCR5380 driver was already there back then, and it looks like it hasn’t changed much over the years.
The Arduino code is a work-in-progress. I’ve barely scratched the surface. I’ve reached the point where sending SCSI commands is working, and that was the hard part. From here, all I have to do is issue the right commands to load a disk and read or write to it!
Extract the .rpm using 7zip or some other archive opening tool.
Make a batch file to test out QEMU + OVMF:
"c:\Program Files\qemu\qemu-system-x86_64.exe" -bios "C:\<path where you extracted the rpm>\usr\share\edk2.git\ovmf-x64\OVMF-pure-efi.fd"
pause
When you run it, you should get a message like “BdsDxe: failed to load Boot0001…” and it should eventually dump you into a UEFI shell. This means OVMF is running.
Now modify the batch file to add a few flags. We need QEMU to listen on the default GDB port (-s, which is TCP port 1234) and to hold the CPU at startup until a debugger tells it to continue (-S):
"c:\Program Files\qemu\qemu-system-x86_64.exe" -bios "C:\<path where you extracted the rpm>\usr\share\edk2.git\ovmf-x64\OVMF-pure-efi.fd" -s -S
pause
Now run it again. The UEFI bootloader shouldn’t run. QEMU is waiting for us to connect via GDB and tell it to start running.
Following the above blog post should be straightforward, but as a hint, here’s my batch file for creating the kernel.bin file:
"C:\Program Files\NASM\nasm.exe" multiboot_header.asm
"C:\Program Files\NASM\nasm.exe" boot.asm
"C:\Program Files\NASM\nasm.exe" -f elf64 boot.asm
"C:\Program Files\NASM\nasm.exe" -f elf64 multiboot_header.asm
"C:\mgc\embedded\codebench\bin\x86_64-amd-linux-gnu-ld.exe" -n -o kernel.bin -T linker.ld multiboot_header.o boot.o
pause
TODO: Probably want to backtrack here a bit and create an EFI executable instead. Probably using NASM, because GNU-EFI does too much of the work for us, and this is primarily a learning adventure for me.
TL;DR: “with exports” means the project starts with some example exports added. You’ll need exports, so why not let it insert some boilerplate code for you? Less chance of typos. Use “dynamic-link library” instead of “dynamic-link library with exports” if you want to add ‘__declspec’ to things yourself.
So you want to create a DLL in Visual Studio. You’re vaguely aware of how a DLL works, but the options provided to you are still unclear:
Assuming you want a plain old DLL (that has existed for the last 25+ years), you’re going to want one of the top 2 options.
The major difference between them is, “with exports” adds some defines:
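As a rough sketch, an export macro of the kind these examples rely on usually looks like the following. DLL1_API is the name used in the examples; DLL1_EXPORTS and the non-Windows fallback are assumptions on my part, so check the header that the wizard actually generates for your project.

```cpp
// Sketch of an export-macro header for a DLL project named "Dll1".
#if defined(_WIN32)
#  ifdef DLL1_EXPORTS
     // Assumed to be defined only while building the DLL itself.
#    define DLL1_API __declspec(dllexport)
#  else
     // Everyone else who includes this header imports the symbols.
#    define DLL1_API __declspec(dllimport)
#  endif
#else
   // Assumption: expand to nothing on non-Windows compilers so the
   // declarations below still parse.
#  define DLL1_API
#endif

// Declarations shared by the DLL and its users.
extern DLL1_API int nDll1;   // exported variable
DLL1_API int fnDll1(void);   // exported function
```

The point of the two-way macro is that one header serves both sides: the DLL project sees dllexport, and any program that links against the DLL sees dllimport.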
And it adds some example exports, so you can see how they work:
// This is an example of an exported variable
DLL1_API int nDll1=0;
// This is an example of an exported function.
DLL1_API int fnDll1(void)
{
    return 0;
}
// This is the constructor of a class that has been exported.
CDll1::CDll1() {}
In theory, you can compile this DLL and test it out immediately.
I just started hacking on an application using the Microsoft Foundation Class library. There are a lot of options in the “new project” wizard, but they don’t show you what any of the options mean. So here are some screenshots of the settings, and the result (this is mostly for my own future reference):