A Neophyte's Guide to DLX

The aim of this web page is to provide an introduction to the DLX instruction set, created in Computer Architecture: A Quantitative Approach by Hennessy and Patterson. If you have some programming experience, but only in (relatively) high level languages like C/C++, understanding basic DLX commands and code fragments is well within your realm, despite what you may think after trying to read Hennessy and Patterson's opaque tome. Unfortunately, shining a light through more than a few pages of that monstrosity is beyond the scope of this web page, but if you've found Chapter 2 to present some hard slogging, then herein you have found your Mecca.


Contents

What makes up the basic DLX machine?

DLX Commands: Explanations and Examples

Some Sample Code

Exercises and Questions for Review


What makes up the basic DLX machine?

Of course, the machine based on the DLX instruction set is a total work of fiction. If one existed, however, it would be a 32 bit machine, i.e., each word would be four bytes. DLX is Big Endian, as opposed to Little Endian, which means that a DLX address first accesses the byte in the most significant position when it's getting a word out of memory. Another issue, byte alignment, is beyond the scope of this web page, but is something you should probably worry about for exam purposes.

Registers and Data Types

The DLX machine is a general-purpose register (GPR) machine, and as such has at it's core a bunch of registers. The ones you really need to worry about are the

integer registers, R0, R1, . . . , R31, the GPRs, and the

floating point registers, F0, F1, . . . , F31, the FPRs.

Each kind of register holds what you'd expect from the name, with one twist: DLX handles floating point numbers of both "single precision" - 32 bits or four words - and "double precision" -- up to 64 bits, i.e., two words. You know these data types as "doubles" and "floats" from C++. To accommodate double-precision floats, you need to use two consecutive floating point registers, paired together, starting with one that's even-numbered and continuing with one that's odd-numbered. Warning: some operations, notably multiplication and division, can only be performed in the FP registers, even when the operands are integers. Later, we'll see code fragments that move data from integer registers to FP registers, perform the desired operations, and move the data back. (This is a good thing to know how to do because it seems to attract professors looking for exam questions like you-know-what does flies.)

To keep things confusing (and to provide more exam fodder), besides the two FP data types, DLX has three integer data types: 8 bit (1 byte), 16 bit (2 byte, or half-word), and 32 bit (4 byte, or word), respectively. The confusing aspect arises when you load less that a word into a register, because you have to worry about the part of the register that doesn't contain the data you just loaded. For example, say you've loaded a word into a register. That word accounts for 8 bits, but the register holds 32 bits. Arithmetic comprehensible to even the dullest moron tells you that you've got another 24 bits floating around out there somewhere in la-la land to worry about. Fortunately, what you do with these left-over bits isn't too difficult, or surprising: you just fill them with 0s. Later, we'll see examples of commands that load data consisting of less than a word into memory.

H&P tantalizingly tell you that DLX has a few other "special" registers. These aren't things you'll worry about too much; the one you really need to worry about is the first integer register, R0, because it's value is always 0. What's the purpose of R0? Well, DLX, as you're no doubt aware, is a RISC architecture. (If you didn't know this, you might as well hang up your cleats right here and now.) When H&P (p.98) tell you that "we can use this register to synthesize a variety of useful operations from a simple instruction set" what they mean is that they're going to use a few tricks to fake out some of the commands that DLX doesn't explicitly support. This leads us to our dear friend, R0. Most importantly, while there are a ton of different ways to access memory (that's what all those addressing modes on page 75 are), DLX only explicitly supports one, displacement. As we'll see below, a few of the others are effectively emulated with R0. We'll also see examples of R0's usefulness when we look at jumps and branches (i.e., when implementing a loop one can use R0 as an easy point of reference for a counter).

Finally, you should understand that DLX is a (0,3) GPR machine. For exam purposes you undoubtedly need to understand the GPR (m,n) format, and know which machine is which, and why, but for DLX programming purposes all you really need to know is that DLX operations take up to three operands, and that ALU operations and memory accesses cannot be combined. For example,

add R3, R2, 256(R1); R3 = R2 + the contents of the memory location pointed to by R1+256

would not work in DLX (although it might in a (1,3) GPR machine). Instead,in DLX, you'd need to do the following:

LW R4, 256(R1); load contents of memory location R1+256 into R4

add R3, R2, R4; now we can add: R3 = R2 + R4

As you've probably figured out from the above examples, in DLX, when you're specifying a memory location, you put whatever it is that refers to the memory location in parentheses. And in DLX, as the next section explain, the thing in parentheses will always be a register.

back to top

Addressing Modes

There are three different kinds of objects a CPU may need to access: constants (known in DLX-land as immediates), registers, and memory locations. When H&P are talking about addressing modes, it's easy to forget that they're talking about anything other than a memory location, because we're used to thinking about memory addresses. Thus, it's worth stating the obvious, to wit: that any real computer program will have both constants and will reuse data, and in DLX we have to load data items into registers from memory before we can process them, all of which means that DLX commands address all three kinds of objects. An addressing mode is simply the way in which you describe the object you want to access.

What is an addressing mode will probably become most clear when you look at some of the examples below, but for starters consider that, just as you have different modes of address for different people, an instruction set has different addressing modes for different kinds of objects. And just as you might address the same person in a number of different ways, depending on the context of your address (e.g, "Bill", "Mr. President", "hey you", and "Mr. Spineless" can all be used to address the same person) an ISA can use a number of different modes of address to get at the same object. While some cruel 411 instructor may make you know all of them for an exam (see table on page 75), for DLX you need to worry about only five addressing modes, two of which DLX doesn't actually support, but can simulate with our friend R0. DLX's five addressing modes (with the two indirectly supported modes listed last) are as follows:

1) Register - this mode you use most of all; it's what you use any time you specify a register. For example, see about any code fragment contained herein, including those in the preceding section, which reference registers. This addressing mode is so basic that H&P don't even bother to mention it as a part of DLX (not that H&P's failure to mention something is a reliable indicator of whether or not it's basic). Remember, any time you access any object, you're using an addressing mode -- just as you are any time you speak to another person.

2) Immediate - this is how you access constants. Example:

add R2, R1, #3; R2 get sum of contents of R1 and integer

The "#3" is an immediate. The pound sign apparently isn't necessary, but when you see it in H&P in a DLX command, it signifies an immediate.

3) Displacement - I would have understood this one better from the start had it been called "offset" mode, because what goes on in the displacement mode is that you specify a register which contains a memory address (e.g., (R1)) and then specify an offset or displacement to be added to the address in the register you've specified. That is, the address you're aiming for is given by the number in the register plus the amount of the displacement you specify. Example:

LW R17, 400(R23); put contents of address R23+400 into R17

4) Register deferred - this address mode is allows you to access a memory address contained in a register. In essence, it's the displacement mode without the displacement. In DLX, you fake this mode by simply using 0 as your offset. Example:

LW R17, 0(R23); put contents of address specified by R23 into R17

5) Absolute - this addressing mode, also known as direct addressing, allows you to specify an address (e.g., (2001)) to be accessed. This one is faked out in DLX by using R0 as your register, and whatever address you want to access as your displacement. Example:

LW R17, 2001(R0); put contents of address 2001 into R17

back to top

DLX Instruction Format

On page 99 of their monster tome, H&P, with their usual prolixity, describe the format of DLX instructions in a brief paragraph accompanied by a small diagram. There are three formats for DLX instructions: I-type (I is for immediate), R-type (R is for register), and J-type (J is for jump). All DLX instructions are 32 bits long, and commence with a 6 bit opcode. The opcode is simply nothing more than a 6 bit binary number that represents a particular DLX command. For example (and I made this up off the top of my head) suppose the opcode for add is 110011. Then the first six digits of all add instructions will be 110011.

Think of a DLX instruction just as you would a programming command in a high-level language. The opcode is like a keyword, and what follows are its arguments. That's what Figure 2.21 on page 99 shows you: what arguments follow the opcode for each type of instruction. For example, the only argument a J-type instruction takes is a 26 bit offset to the PC. This makes sense: when a program branches, what you're doing is skipping to a selected spot in the program, i.e., telling the PC to go next not to the next instruction in the sequence, but the one specified by the offset. Similarly, an I-type instruction takes as arguments two registers, one a target and one an operand, as well as an immediate, the other operand.

back to top


DLX Commands: Explanations and Examples

Most of the DLX commands you'll actually need to know (to succeed in CMSC 411, at any rate) are actually pretty easy. The syntax for a DLX command is, in general:

<opcode> <target> <source(s)>

Specifically, you'll usually use one of the following:

<command> <operand register> <immediate>

<command> <target register> <operand register> <operand register>

<move command> <target> <source>

<branch command> <label of place in code to go to>

That is, you say what you're going to do, specify which register receives the result, and which registers are accessed to get the result. Obviously, a number of different kinds of commands are described above; don't worry if these descriptions don't make sense to you right off the bat. The four different kinds of instructions in DLX are: (1) ALU operations, (2) data transfers, (3) branches and jumps, and (4) floating point operations. The rest of this section considers each of them in turn, after a brief note on notation.

A Few Words on Syntax and Notation

Although it may not be obvious from the immediately preceding section, DLX syntax, at least insofar as you need to worry about it, is actually pretty simple. Anyone who's ever wrestled with a header file defining inherited class with pure virtual functions and all that junk in C++ will find mastering the amount of DLX required for this course to be a breeze. The hard part is understanding H&P's complicated and convoluted hardware notation. My advice to you with respect to the hardware notation is that, even though H&P found it so important they put it on the inside back cover of the book, you should ignore it. If you're taking CMSC 411 from the instructor who commissioned this web page, at least, you can follow this advice without any problems. You might need to struggle through a few examples in the book, since their hardware notation is the only commenting that H&P provide, but apart from that, you're better off spending your time worrying about what the code does as opposed to the difficult-to-decipher descriptive scheme concocted by H&P. This web page follows its own advice and ignores H&P's hardware notation. Everything is commented in English. It may be bulkier, but at least you have a shot at comprehending it.

back to top

ALU Operations

ALU operations are at the core of most computer programs. For the most part, they consist of simple arithmetic and simple logic.

Arithmetic: Adding, Subtracting, Multiplying and Dividing

These are so easy that we include them here only for the sake of completeness. Examples:

ADD R3, R2, R1; R3 = R2 + R1

SUB R3, R2, R1; R3 = R2 - R1

Furthermore, ADDI, ADDU, ADDUI, SUBI, SUBU, and SUBUI all work like the above; simply be aware that the "U" means unsigned and the "I" means immediate, and you should use them as appropriate for your operands (I know at least one CMSC 411 professor who will take points off if you don't!).

Multiplication and division work in a fashion similar to add and subtract, save that these operations can only be performed on data contained in floating-point registers. We'll take a look at how to move data from an integer register to a floating point register in the section on data transfers, below.

back to top

Logic: ANDs, and ORs

AND and OR work just like the logical AND and OR you're used to from languages like C++:

AND R3, R2, R1; if R1==R2 R3 gets value 1, else R3 gets value 0

OR R3, R2, R1; if R2 != 0 or R1 != 0, R3 gets value 1, else R3 gets value 0

Other basic logical operations include XOR, ANDI (and immediate), ORI (or immediate), and XORI (xor immediate). They work as you'd expect.

Load High Immediate

Now for the twists. The first low-level pain in the rear to worry about is load high immediate:

LHI, R19, #429; value 429 goes in upper half of R19, lower half of R19 gets 16 0s

Remember, immediates are 16 bits. If you want to put an immediate in the upper half, i.e., the upper two words, of a memory location, well, then, by gum, LHI is just what you're looking for. And don't worry about the rest of the register -- LHI quite thoughtfully fills the lower two bytes with 0s.

Shifts

Another logical operation that will probably seem familiar from CMSC 311 (remember all those damn shift registers?) is the shift. A shift command takes what's in a register, shifts it to the left or right a specified number of places, and puts the result in the target register, like this:

SLL, R6, R5, R3; shift R5 to the left by the amount specified in R3; place value in R6

SLLI, R6, R5, #4; shift R5 left 4 places and put result in R6

Other shift commands are SRL (shift right logical), SRA (shift right arithmetic), SLLI (shift left logical immediate), SRLI (shift right logical immediate), and SRAI (shift right arithmetic immediate). The arithmetic shifts differ from the logical shifts in that they operate on signed two's-complement numbers to preserve the sign bit upon the shift. (Actually, while this is probably enough to know, it's a little more complicated than this. If, like me, you weren't taught this concept in the prerequisite to CMSC 411, you might want to check out Chapter 9 of Logic and Computer Design Fundamentals by Mano and Kime.)

Setting Conditionals

The category of arithmetic/logical operation we need to worry about are those that set a conditional. DLX allows you to set the following conditions: LT (less than), GT (greater than), LE (less than or equal), GE (greater than or equal), EQ (equal) and NE (not equal). The syntax for setting a condition is:

<set condition> <target register> <operand 1> <operand 2>

The target register takes on the value of 0 or 1, depending on whether or not comparing the operands meets the established condition. For example:

SGE R1, R15, R28; if R15 >= R28 then R1 = 1 else R1 = 0

SGEI R1, R27, #1089; if R27 >= 1089 then R1 = 1 else R1 = 0

back to top

Data Transfers

DLX includes three kinds of data transfers: loads, stores, and moves.

Loads and Stores

Loads and stores are fairly simple, and are essentially the inverse of one another. For example:

LW R5, 0(R3); load the word in the memory location pointed to by R3 into R5

SW 0(R3), R5; store the word in R5 in the memory location pointed to by R3

A complete list of load and store commands for integers is as follows: LB (load byte), LBU (load byte unsigned), LH (load half byte), LHU (load half unsigned), LW (load word), SB (store byte), SH (store half), SW (store word). Of course, if you load or store less than a word, that is, a byte or a half, then you have to worry about what happens to the rest of the word. Because DLX is Big Endian, when it works with less than a word it fills the most significant byte(s) of the word, and puts 0s in the rest of the word. For example, loading a byte into a register will put that byte into positions 24-31 of the register, and will put 0s into the register's positions 0-23. Similarly, storing a half word stores two bytes of data in positions 16-31 of the designated memory address, and 0s in positions 0-15.

The above commands all act on the integer registers. For floats, you need to know about LF (load float), LD (load double), SF (store float), and SD (store double). These work just like their integer counterparts.

Finally, you should make sure you understand the concept of sign extension. No doubt you recall from CMSC 311 that signed two's-complement numbers use their most significant digit for the sign bit. Sign extension simply means that DLX fills empty bits in a word with the sign bit. For example, as H&P describe on page 101, when you use an immediate in an ALU operation, what you get is a 16 bit sign extended immediate - i.e., the upper two bits of the word are filled with the sign bit of the immediate.

back to top

Moves

DLX has three categories of moves: (1) MOVI2S and MOVS2I -- moves between integer registers and special registers, (2) MOVF and MOVD -- copies (not moves) between single precision and double precision FP registers, and (3) MOVFP2I and MOVI2FP -- moves between FP registers and special registers. Notice that almost everywhere else in DLX when you see and "I" it stands for "immediate", but here it means "integer". The syntax for move commands is:

<move> <target> <source>

It's really that simple. The main issue with moves is remembering to do them when you have data in one register and need to perform an operation that can only be done in another register.

back to top

Control Commands: Branches and Jumps

H&P explain (page 80) that they refer to a " jump when the change in control is unconditional and a branch when the change is conditional." Fair enough, you say, but then you wonder "what's a change in control?" Nothing too complicated it turns out; jumps and branches are just like function calls, ifs, and the like in C++. Of course, when you get to pipelining, jumps and branches are a major headache, but from a pure DLX programming standpoint, you shouldn't find them to be too tough.

Ordinarily, in DLX, just like the high level languages it mirrors, instructions are sequentially executed. That is, whenever the CPU gets a new instruction the PC, or program counter, increments by 4 (remember, DLX memory addresses contain four bytes each), and the CPU fetches, decodes, executes, etc. the next instruction in the sequence. A branch mucks up this neat little scheme - just how badly you'll learn when you get to pipelining. For the moment, you need to understand that, when a jump or branch occurs, the program breaks out of its sequential execution, and goes to the instruction named by the branch or jump. In the case of a branch, the sequential execution is broken only if the condition on which the branch is predicated is met. As we explained when we looked at the DLX instruction format, J-type instructions - jumps and branches - have a 26 bit offset which is added to the PC. This offset contains the address of the instruction which should be fetched next in place of the instruction which would otherwise come next in the sequence. From a DLX programming standpoint, this means that whenever you implement a jump or a branch, you include in your command the name of the jump's or the branch's destination. Examples of all the control commands you need to know should make this more clear:

BEQZ, R12, subroutine; if R12 == 0 go to line labeled "subroutine"

BNEZ, R12, subroutine; if R12 != 0 go to line labeled "subroutine"

Branch if equal zero and branch if not equal to zero are more or less self explanatory. So is the basic jump. A jump and link is a little more complicated, but not much. With this command, execution jumps to the line you specify, but the location of the next instruction in the sequence (PC + 4) is stored in R31. Similarly, with jump register and jump and link register, you specify the register to jump to, and, if you also specify a link, PC + 4 is again stored in R31.

J loop; go to line labeled "loop"

JAL loop; go to line labeled "loop" and put PC + 4 in R31

JALR R21; jump to instruction whose address is in R21 and put PC + 4 in R31

JR R15; jump to register whose address is R15.

DLX also has two commands which test the vlaue of the FP status register. This register is another one of those special registers (like R0 or the PC) whose existence we mentioned earlier. This register, true to its name, reflects the status of floating point operations. Naturally, we want a way to get at that valuable information about how the old FP operations are coming along. That's where BFPT (branch floating point true) and BFPF (branch floating point false) come in.

DLX has two other control commands: RFE (return from exception) and TRAP (transfer to operating system). All you need to pretend you know is that they do what you'd think from hearing their names, and beyond that we'll not go here.

back to top

Floating-point Operations

The main fact that should concern you with respect to floating point operations is their existence. That is, be aware that when you're dealing with the FP registers, you need to use the commands that manipulate data in those registers. If you want to add two floats, use ADDD (add double) or ADDF (add single-precision float). Likewise for subtracting, multiplying, and dividing. And of course, as we've already noted, you must use the float registers to multiply and divide, even if your operands are integers. Note also that there are special commands for comparing floats just as there are for integers (and they are all analogous). Finally, there are a host of commands to allow you to convert between floats, doubles, and integers. I never used them in CMSC 411 and I don't see why you should either.

back to top


Some Sample Code

Here are a few simple sample snippets of DLX code to get you started. A few more sample snippets can be found in the answers to the exercises below; these answers aren't commented, so you'll need to look at the code-writing exercises to see what these snippets are supposed to do. Actually, you can probably find all the sample code you need on old exams, but just in case you can't, here goes:

Multiply 2*4 and store result

Here we place 4 and 2 in two integer registers, move them to FP registers so we can multiply them, and then move the result back to an integer register so we can store it as an integer. Note that, since we're creating what's in the integer registers by adding immediates, we could have simply added the immediates to the FP registers. Such an implementation of our multiplication would have created a pedagogical shortfall, however, because we would have lost an opportunity to display our moves.

ADDI R1, R0, #4; R1 now contains 4
ADDI R2, R0, #2; R2 now contains 2
MOVI2FP F1, R1; contents of R1 (the value 4) moved to F1
MOVI2FP F2, R2; contents of R2 (the value 2) moved to F2
MULT F3, F2, F1; F3 now contains 8 (2*4)
MOVFP2I R3, F3; contents of F3 (the value 8) moved to R3
SW 7000(R0), R3; integer 8 stored in memory location 7000

back to top

Check array for value zero

Here's a loop for you. This code fragment checks the elements of a 10 element array for the value 0, stopping when the value is found, or when all 10 elements have been checked. If a zero is found, the code stores a non-zero value (signifying "true"), else it stores a 0 (signifying "false"). Kind of confusing, eh? This truly is a code fragment in the proud spirit of H&P!

ADDI R31, R0, #1 ; R31 = 9
loop: LW R15, 0(R1); put integer in location R1 into R15
BEQZ R15, done; if R15 == 0 we need to exit the loop now
ADDI R1, R1, #4; make R1 point to the next element in the array
SUBI R31, R31, #1; decrement R31 by 1
BNEZ R15, loop; if R15 != 0, we need to check the next value
done: SW 256(R0), R31; R31 == 0 only if we FAILED to find a 0 in the array

back to top

Raise a float to the nth power

Here's another loop. Assume we have a float, X, in F1, and a positive integer, I, in R1. We want to raise X to the Ith power.

ADDI R2, R0, #1; R2 = 1
MOVI2FP F3, R2; F3 = 1
loop: MULT F3, F3, F1; F3 = F3*F1
SUBI, R1, R1, #1; decrement R1 by 1
BNEZ, R1, loop; if R1 != 0 then we need to continue
SW 3000(R0), F3; store result

back to top


Exercises and Questions for Review

The following problems are here more to test your knowledge than to challenge you. The point is that if you get somewhat comfortable with DLX, issues you may be tested on won't present any more difficulty than programming problems presented in any of the languages you've already worked in.

What's wrong with this code?

Figure out what's wrong with the following lines of code:

  1. ADDI R34, R5, R0;
  2. LW 0(R12), F23;
  3. SUB 0(R26), F4, R3;
  4. J R11;
  5. MULT R9, R8, R7;
  6. LD F15, 0(R1);

answers - back to top

Drop a line (or two)

Write a line or two of code that does the following:

  1. Tests to see if a value in R19 is equal to zero, and branches to a line labeled "loop" if it is not.
  2. Jumps to an instruction who's pointed to by R16, and saves the address of the next instruction (PC+4) in R31.
  3. Multiplies two integers residing in R1 and R2 and stores the result.
  4. Load two double precision floats, multiply them, and store the result.
  5. Find the factorial of the number 10 and store the result.

Since there is always more than one way to skin a cat, your answers may be different - but not too different, because the code here is all very short and simple - from the answers below and still right.

back to top

Things you should know: a partial list

Here are a few concepts related to DLX with which you should be conversant. Each term is a link to the place in this web page where we explain it.

Addressing modes (can you name and describe all the kinds DLX uses?)

Big Endian

Displacement

DLX data types

DLX instruction formats

Immediate

Load high immediate

opcode

Sign extension

back to top (no, this isn't a DLX term!)

Answers

Here are answers to some of the review questions:

What's wrong with this code? Let's have a look . . .

  1. First, the GPRs range from R0 to R31; there's no R34. Secondly, ADDI means you're adding an immediate, and R0, though a special register (remember, it's always 0), is not an immediate.
  2. DLX syntax always puts the target before the source (and the cart before the horse). The register holding the address of the word to be loaded should come after the target register is specified.
  3. DLX is a (0,3) architecture; you can't put a memory reference in an ALU command. Also, what's that FPR doing in there?
  4. J, a plain old jump command takes the name of the line of code (e.g., "loop" or "subroutine1") as its argument; if you want to specify an address in a register for the jump target, use JR.
  5. MULT only works on data in FPRs.
  6. You can only load a double into an even numbered FPR; for doubles, FPRs travel in pairs (see the section above on registers).

back to top

Drop a Line (or two) - here's what you shoulda coulda dropped . . .

  1. BNEZ R19, loop;
  2. JALR R16;
  3. MOVI2FP F1, R1;
    MOVI2FP F2, R2;
    MULT F3, F2, F1;
    SW 100(R0), F3;
  4. LD F6, 0(R10);
    LD F8, 8(R10);
    MULTD F10, F6, F8;
    SD 16(R10), F10;
  5. ADDI R10, R0, #10;
    MOVI2FP F10, R10;
    SUBI F9, F10, #1;loop: MULT F10, F10, F9;
    SUBI F9, F9, #1;
    BNEZ F9, loop;
    SW 2000(R0), F10;

back to top

THE END