Typical MIPS instructions

lw $s1, 100($s2)

• $s1 is the register to be loaded
• $s2 holds the base address in memory; the offset 100 is added to it

add $s1, $s2, $s3

• $s1 is the destination register
• $s2 and $s3 are source registers

beq $s0,$s1, 10

• $s0 and $s1 are registers to be compared for equality

j 3000

• The operand is the address of the word holding the next instruction
• Because instructions are word-aligned, the operand needs to be a multiple of $4$

We will build a hardware datapath that will implement each of the instruction formats.

• R-format: add $t1, $t2, $t3 => add rd, rs, rt
• Load and Store: lw $t1, 100($t2) => lw rt, 100(rs)
• Jump: j 3000

High-level view of MIPS functional units

The basic building blocks of a processor are these four functional units:

• Instruction Memory
• Registers
• ALU
• Data Memory

Fetch-Execution cycle

PC is the program counter, a special register that stores the address of the current instruction. Every clock cycle, we fetch and execute:

1. Fetch instruction.
2. Execute instruction
   • Fetch register operands
   • Compute result
   • Store into registers / use to index memory

First implementation of datapath: one cycle per instruction

Features

• Simple to understand
• Separate instruction and data memories
• Clock must be slowed to the speed of the slowest instruction

Implementing Fetch of Fetch-Execute cycle

• PC and instruction memory are called state elements, because they can preserve information.
• The adder is not a state element. It is combinational: its output depends only on its current inputs.
• We add $4$ to a 32-bit address. The $4$ is a hardwired $32$-bit-wide value.

Implementing R-type instruction

This design permits reading and writing the same register in one instruction, e.g. add $t1, $t2, $t1.
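The same-register case can be sketched in Python. This is a minimal model under stated assumptions, not the actual hardware: the class name RegisterFile and its methods read and clock_edge are hypothetical, reads are treated as combinational, the write is applied at the clock edge, and registers are indexed by their standard MIPS numbers ($t1 = 9, $t2 = 10).

```python
class RegisterFile:
    """Hypothetical model: combinational reads, write at the clock edge."""

    def __init__(self):
        self.regs = [0] * 32              # 32 general-purpose registers

    def read(self, rs, rt):
        # Reads return whatever is currently stored in the two registers.
        return self.regs[rs], self.regs[rt]

    def clock_edge(self, reg_write, rd, data):
        # The write is applied only at the clock edge, after the reads.
        if reg_write and rd != 0:         # register 0 ($zero) always reads as 0
            self.regs[rd] = data & 0xFFFFFFFF


T1, T2 = 9, 10                            # register numbers of $t1 and $t2

rf = RegisterFile()
rf.regs[T1] = 5
rf.regs[T2] = 7

# add $t1, $t2, $t1: $t1 is both a source and the destination.
a, b = rf.read(T2, T1)                    # the old value of $t1 is read
result = (a + b) & 0xFFFFFFFF             # the ALU computes the sum
rf.clock_edge(True, T1, result)           # the new value is stored at the edge

print(rf.regs[T1])
```

Because the read completes before the edge-triggered write, the instruction sees the old value of $t1 and the sum replaces it only at the end of the cycle.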

• lw $t1, 100($t2)

• Sign extend is combinational. It extends the 16-bit 2’s complement offset to 32 bits by copying the sign bit.
• Assume for simplicity that data memory is edge-triggered.
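As a rough illustration of sign extension and the address calculation for lw, the sketch below uses two hypothetical helper names, sign_extend16 and effective_address. The base value placed in $t2 is an assumed example, not something taken from these notes.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Address used by lw/sw: base register plus sign-extended offset."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF   # wrap to 32 bits


base = 0x10008000                          # assumed content of $t2

# lw $t1, 100($t2): positive offset
print(hex(effective_address(base, 100)))

# The bit pattern 0xFFFC encodes -4, so this address lies below the base.
print(hex(effective_address(base, 0xFFFC)))
```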

Implementing branch instruction

• Shifting left by $2$ is the same as appending two $0$’s at the low-order end, i.e., multiplying by $4$.
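The branch target calculation can be sketched as follows. The function names are hypothetical, and the sketch assumes the standard MIPS convention that the sign-extended, shifted offset is added to PC + 4. The starting PC value is an arbitrary example.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """Target of a taken branch: (PC + 4) + (sign-extended offset << 2)."""
    byte_offset = sign_extend16(offset16) << 2   # word offset -> byte offset
    return (pc + 4 + byte_offset) & 0xFFFFFFFF


# Shifting left by 2 appends two 0 bits: 1010 becomes 101000.
print(bin(10), bin(10 << 2))

pc = 0x00400000                                  # assumed address of the beq
print(hex(branch_target(pc, 10)))                # beq $s0, $s1, 10 (forward)
print(hex(branch_target(pc, 0xFFFD)))            # offset -3 (backward)
```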

Current implementation of single-cycle datapath

• There is no control unit yet.
• Multiplexors are added to reuse units.
• Note that you cannot write to instruction memory.

Examining an R-format instruction

add $t1, $t2, $t3

• The first 6-bit field is the operation code, or opcode.
• Even though add, sub, and, or are different computations, they have the same opcode, i.e., $0$; the $6$-bit funct field specifies the desired computation.

The steps for instruction add $t1, $t2, $t3:

1. Fetch instruction from instruction memory using PC
2. Access the register file with the two source register numbers.
3. Take the data of the two source registers and send this through the ALU to perform an operation as given in the instruction.
4. Produce a result, i.e., the output of the ALU.
5. This result will be written to the destination register.
6. Provide this result as write data to register file.
7. At the end of the clock cycle, write this result to the register file.
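The seven steps above can be traced in a small Python sketch. The names imem, regs, and decode_r are hypothetical, the register contents and the instruction address are assumed example values, and 0x014B4820 is the standard MIPS encoding of add $t1, $t2, $t3 (opcode 0, rs = 10, rt = 11, rd = 9, funct = 0x20).

```python
def decode_r(word):
    """Split a 32-bit R-format instruction word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26
        "rs":     (word >> 21) & 0x1F,   # bits 25..21, first source
        "rt":     (word >> 16) & 0x1F,   # bits 20..16, second source
        "rd":     (word >> 11) & 0x1F,   # bits 15..11, destination
        "shamt":  (word >> 6) & 0x1F,    # bits 10..6
        "funct":  word & 0x3F,           # bits 5..0
    }


imem = {0x00400000: 0x014B4820}          # add $t1, $t2, $t3
regs = [0] * 32
regs[10], regs[11] = 3, 4                # assumed values of $t2 and $t3
pc = 0x00400000

word = imem[pc]                          # step 1: fetch using PC
f = decode_r(word)
a, b = regs[f["rs"]], regs[f["rt"]]      # step 2: read both source registers
assert f["opcode"] == 0 and f["funct"] == 0x20   # this sketch handles add only
alu_out = (a + b) & 0xFFFFFFFF           # steps 3-4: ALU produces the result
write_reg, write_data = f["rd"], alu_out # steps 5-6: route result to write port
regs[write_reg] = write_data             # step 7: write at end of the cycle

print(f)
print(regs[9])
```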

Update to PC

At the start of each clock cycle, the value of PC+4 is ready, but writes occur only at the end of the clock cycle, so the PC register is updated at the end of the cycle.
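A minimal sketch of this timing, assuming a hypothetical ProgramCounter class in which the adder output is available during the cycle and the register changes only when clock_edge is called:

```python
class ProgramCounter:
    """Hypothetical model of the PC register plus its +4 adder."""

    def __init__(self, reset_address):
        self.value = reset_address

    def next_value(self):
        # Combinational adder: output follows the current PC immediately.
        return (self.value + 4) & 0xFFFFFFFF

    def clock_edge(self, new_value):
        # State element: contents change only at the end of the cycle.
        self.value = new_value


pc = ProgramCounter(0x00400000)          # assumed starting address
trace = []
for _ in range(3):
    nxt = pc.next_value()                # PC+4 is ready during the cycle
    trace.append((pc.value, nxt))        # PC still holds the current address
    pc.clock_edge(nxt)                   # update happens at the clock edge

for current, nxt in trace:
    print(hex(current), hex(nxt))
```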

Control Unit

• Control is the main control unit.
• ALU control is concerned only with selecting the ALU operation.

Control bits for R-format instructions

• RegWrite = 1
• Branch = 0
• RegDst = 1
• MemRead = 0
• MemToReg = 0
• ALUSrc = 0
• ALUOp = 10
• MemWrite = 0

Control bits for load word instruction

• RegWrite = 1
• Branch = 0
• RegDst = 0
• MemRead = 1
• MemToReg = 1
• ALUSrc = 1
• ALUOp = 00
• MemWrite = 0
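The two ALUOp values listed above can be related to the ALU operation with a short sketch. The function alu_control and the table FUNCT_TO_OP are hypothetical names. The funct codes are the standard MIPS ones, and the meaning of ALUOp = 01 (subtract for beq) is an assumption following the usual textbook convention, since these notes only give 10 and 00.

```python
# Standard MIPS funct codes for a few R-format instructions.
FUNCT_TO_OP = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x2A: "slt"}


def alu_control(alu_op, funct):
    """Choose the ALU operation from the 2-bit ALUOp and the funct field."""
    if alu_op == 0b00:
        return "add"                 # lw/sw: compute base + offset
    if alu_op == 0b01:
        return "sub"                 # assumed: beq compares by subtracting
    if alu_op == 0b10:
        return FUNCT_TO_OP[funct]    # R-format: funct decides the operation
    raise ValueError("unsupported ALUOp")


print(alu_control(0b10, 0x20))       # R-format add
print(alu_control(0b10, 0x25))       # R-format or
print(alu_control(0b00, 0x00))       # load word, funct is ignored
```

Splitting the decision this way lets the main control unit look only at the opcode, while the ALU control looks only at ALUOp and funct.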

Single-cycle controls

• Control unit will generate all the output-bits as needed by the datapath for each individual instruction.
• Control lines are set as needed by each instruction in order for the datapath to execute that instruction.
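One way to picture the main control unit is as a lookup table from opcode to control lines, with multiplexors driven by those lines. The sketch below is an illustration only: CONTROL, OPCODE_TO_TYPE, main_control, and mux are hypothetical names, the table holds just the two instruction types whose control bits are listed in these notes (with the destination select written as RegDst), and the opcodes 0x00 and 0x23 are the standard MIPS values for R-format and lw.

```python
CONTROL = {
    "R-format": dict(RegWrite=1, Branch=0, RegDst=1, MemRead=0,
                     MemToReg=0, ALUSrc=0, ALUOp=0b10, MemWrite=0),
    "lw":       dict(RegWrite=1, Branch=0, RegDst=0, MemRead=1,
                     MemToReg=1, ALUSrc=1, ALUOp=0b00, MemWrite=0),
}
OPCODE_TO_TYPE = {0x00: "R-format", 0x23: "lw"}


def main_control(opcode):
    """Return the control lines for the given 6-bit opcode."""
    return CONTROL[OPCODE_TO_TYPE[opcode]]


def mux(select, in0, in1):
    """Two-input multiplexor: in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


# Example: lw $t1, 100($t2), so rs = 10 and rt = 9; rd is unused by lw.
c = main_control(0x23)
rt, rd = 9, 0
reg_value, immediate = 1234, 100     # assumed datapath values
alu_out, mem_out = 5000, 42          # assumed datapath values

write_reg = mux(c["RegDst"], rt, rd)               # lw writes to rt
alu_in2 = mux(c["ALUSrc"], reg_value, immediate)   # lw uses the offset
write_data = mux(c["MemToReg"], alu_out, mem_out)  # lw writes memory data

print(write_reg, alu_in2, write_data)
```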

Timing

Suppose the following:

• Memory units take 200ps
• ALU takes 100ps
• Register file read or write takes 50ps
• No time on other units

Question: What is the execution time of each type of instruction?

• R-Format: Instruction memory, register file read, ALU, register file write (400ps)
• Branch: Instruction memory, register file read, ALU (350ps)
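• Store-word: Instruction memory, register file read, ALU, data memory write (550ps)
• Load-word: Instruction memory, register file read, ALU, data memory read, register file write (600ps)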

Question: If the adder also takes 100ps, does anything change in the timings of each instruction?

• No, timings are the same, since the adder and the ALU run at the same time.

Question: what should the clock cycle be?

• 600ps. The clock period is fixed, so it must cover the slowest instruction type, which here is load word (200 + 50 + 100 + 200 + 50 = 600ps).
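The arithmetic behind these answers can be checked with a short sketch. The unit delays are the ones assumed in this section. The dictionary names and the list of units on each instruction's path are an illustrative reading of the datapath, with load word included because it is the slowest case.

```python
# Unit delays in picoseconds, as assumed in this section.
DELAY_PS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Units on the critical path of each instruction type.
PATHS = {
    "R-format": ["imem", "reg_read", "alu", "reg_write"],
    "branch":   ["imem", "reg_read", "alu"],
    "sw":       ["imem", "reg_read", "alu", "dmem"],
    "lw":       ["imem", "reg_read", "alu", "dmem", "reg_write"],
}

# Latency of an instruction is the sum of the delays along its path.
latency = {name: sum(DELAY_PS[u] for u in units) for name, units in PATHS.items()}

# A single fixed clock must be long enough for the slowest instruction.
clock_ps = max(latency.values())

print(latency)
print(clock_ps)
```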

Question: can we perform the jump instruction on the datapath now?

• Notice that the word address is a 26-bit immediate, but the PC needs a 32-bit value. Shifting the immediate left by 2 gives a 28-bit byte address; the top 4 bits of the PC (strictly, of PC+4) are then concatenated in front of it to form the 32-bit jump target.

A jump cannot reach an arbitrary address in program memory. The target stays within the region of program memory that we are currently in, and that region is identified by the top 4 bits of PC.

• We will set the jump control bit to 1 for jump instructions and set it to 0 for any other instructions.
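The jump address formation and the jump control bit can be sketched as below. The function names jump_target and next_pc are hypothetical, the PC values are arbitrary examples, and the sketch assumes the standard MIPS definition in which the upper 4 bits come from PC + 4 (for code that stays within one region these equal the upper bits of PC).

```python
def jump_target(pc, imm26):
    """Form the 32-bit target: top 4 bits of PC+4, then the immediate << 2."""
    upper = (pc + 4) & 0xF0000000            # region selected by the top 4 bits
    lower = (imm26 & 0x03FFFFFF) << 2        # 26-bit word address -> 28 bits
    return upper | lower


def next_pc(pc, jump, imm26):
    """Multiplexor on the PC input, selected by the Jump control bit."""
    return jump_target(pc, imm26) if jump else (pc + 4) & 0xFFFFFFFF


# Word address 750 corresponds to byte address 3000.
print(next_pc(0x00000100, 1, 750))

# The same immediate from a different region keeps that region's top bits.
print(hex(next_pc(0x10000000, 1, 750)))

# With Jump = 0 the PC simply advances to the next instruction.
print(hex(next_pc(0x10000000, 0, 750)))
```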