Single-cycle processor

Typical MIPS instructions

lw $s1, 100($s2)

  • $s1 is the register to be loaded
  • $s2 holds the base address; the word is loaded from memory address $s2 + 100

add $s1, $s2, $s3

  • $s1 is the destination register
  • $s2 and $s3 are source registers

beq $s0, $s1, 10

  • $s0 and $s1 are registers to be compared for equality

j 3000

  • The operand is the address of the next instruction to execute
  • The address must be a multiple of $4$, so the instruction encodes it as a word address (the byte address divided by $4$)

We will build a hardware datapath that implements each of these instruction formats.

Instruction-bit break-down

  • R-format add $t1, $t2, $t3 => add rd, rs, rt

image-20190331155125007

  • Load and Store lw $t1, 100($t2) => lw rt, 100(rs)

image-20190331155909818

  • Jump j 3000

image-20190331155935443
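The field boundaries in the figures above can be sketched in code. This is a minimal illustration, assuming the standard MIPS32 field layout (6-bit opcode, 5-bit rs/rt/rd/shamt, 6-bit funct, 16-bit immediate, 26-bit jump address); the function name `decode` and the dictionary keys are hypothetical, chosen for this example only.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into every possible field.

    All fields are extracted unconditionally; which ones are meaningful
    depends on the format (R-format, load/store, or jump).
    """
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26
        "rs":     (word >> 21) & 0x1F,   # bits 25..21
        "rt":     (word >> 16) & 0x1F,   # bits 20..16
        "rd":     (word >> 11) & 0x1F,   # bits 15..11 (R-format only)
        "shamt":  (word >> 6) & 0x1F,    # bits 10..6  (R-format only)
        "funct":  word & 0x3F,           # bits 5..0   (R-format only)
        "imm16":  word & 0xFFFF,         # bits 15..0  (load/store/branch)
        "addr26": word & 0x3FFFFFF,      # bits 25..0  (jump)
    }

# add $t1, $t2, $t3 with $t1 = 9, $t2 = 10, $t3 = 11
add_word = (0 << 26) | (10 << 21) | (11 << 16) | (9 << 11) | 0x20
# lw $t1, 100($t2): opcode 35, rs = $t2, rt = $t1, immediate 100
lw_word = (35 << 26) | (10 << 21) | (9 << 16) | 100

fields = decode(add_word)
print(fields["rs"], fields["rt"], fields["rd"], fields["funct"])
```

Note that for lw the destination register sits in the rt field, not rd, which is why the datapath later needs a multiplexor to choose the write register.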

High-level view of MIPS functional units

The basic building blocks of a processor are these four functional units:

  • Instruction Memory
  • Registers
  • ALU
  • Data Memory

image-20190331160200098

Fetch-Execution cycle

The PC (program counter) is a special register that stores the address of the current instruction.

Every clock cycle, we will fetch and execute.

  1. Fetch instruction.
  2. Execute instruction
    • Fetch register operands
    • Compute result
    • Store into registers / use to index memory

First Implementation of data path: one cycle per instruction

Features

  • Simple to understand
  • Separate instruction and data memories
  • Clock must be slowed to the speed of the slowest instruction

Implementing Fetch of Fetch-Execute cycle

image-20190331161052060

  • PC and instruction memory are called state elements, because they can preserve information.
  • The adder is not a state element. It is combinational: its output depends only on its current inputs, and it stores nothing.
  • We add $4$ to a 32-bit address. The $4$ is a hardwired $32$-bit-wide value.

Implementing R-type instruction

image-20190331155125007

image-20190331161543668

This design permits reading and writing the same register in one instruction, e.g. add $t1, $t2, $t1.

Implementing load/store instruction

image-20190331161925327

  • lw $t1, 100($t2)

image-20190331155909818

  • The sign-extend unit is combinational. It widens the 16-bit 2’s complement offset to 32 bits by replicating its sign bit.
  • Assume for simplicity that data memory is edge-triggered.
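Sign extension and the resulting address calculation can be sketched as follows. This is an illustrative model only; the function names `sign_extend16` and `effective_address` are hypothetical, and values are kept as unsigned 32-bit patterns by masking.

```python
def sign_extend16(imm16):
    """Widen a 16-bit two's-complement value to a 32-bit pattern."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:               # sign bit set: fill the upper 16 bits with 1s
        return imm16 | 0xFFFF0000
    return imm16                     # sign bit clear: upper 16 bits stay 0

def effective_address(base, imm16):
    """Address used by lw/sw: base register plus sign-extended offset."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

print(hex(sign_extend16(100)))                 # positive offset is unchanged
print(hex(sign_extend16(0xFFFC)))              # -4 as a 32-bit pattern
print(hex(effective_address(0x1000, 0xFFFC)))  # 0x1000 - 4
```

Masking the sum to 32 bits models the fact that the hardware adder simply discards the carry out of bit 31.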

Question about sign extend

image-20190331163544446

Implementing branch instruction

image-20190331155909818

image-20190331163125806

  • Shifting left by $2$ is the same as appending two $0$’s at the low-order end, i.e., multiplying the word offset by $4$.
  • The remaining thing to do is to control PC loading.
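The branch target computation can be sketched as below, assuming the standard MIPS convention that the offset is relative to PC + 4. The function name `branch_target` is hypothetical.

```python
def branch_target(pc, imm16):
    """Target of a taken beq: (PC + 4) + (sign-extended offset << 2)."""
    offset = imm16 & 0xFFFF
    if offset & 0x8000:          # interpret the 16-bit field as signed
        offset -= 0x10000
    # Shifting left by 2 converts a word offset into a byte offset.
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF

# beq with offset 10 at address 0x1000: skips 10 instructions past PC + 4
print(hex(branch_target(0x1000, 10)))
# An offset of -1 (0xFFFF) branches back to the beq itself
print(hex(branch_target(0x1000, 0xFFFF)))
```

Whether this value or PC + 4 is loaded into the PC depends on the comparison result, which is the control problem mentioned above.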

Current implementation of single-cycle datapath

image-20190331163732575

  • There is no control unit yet.
  • Multiplexors are added to reuse units.
  • ==Note that you cannot write to instruction memory.==

Examine an R-format instruction add $t1, $t2, $t3

image-20190331155125007

  • The first 6-bit field is operation code, or opcode.
  • Although add, sub, and, and or are different computations, they share the same opcode, $0$; the $6$-bit funct field specifies the desired computation.

The steps for instruction add $t1, $t2, $t3

  1. Fetch the instruction from instruction memory using the PC.
  2. Access the register file with the two source register numbers.
  3. Send the data from the two source registers through the ALU, which performs the operation given in the instruction.
  4. The output of the ALU is the result.
  5. This result is destined for the destination register.
  6. Provide the result as write data to the register file.
  7. At the end of the clock cycle, write the result to the register file.
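The steps above can be sketched as a small software model. This is a simplification under stated assumptions: the register file is a Python list of 32 integers, the instruction is already decoded into register numbers, and the name `execute_add` is hypothetical. Returning a new list models the fact that the write only takes effect at the end of the cycle.

```python
def execute_add(regs, rs, rt, rd):
    """Model one clock cycle of add rd, rs, rt on a 32-entry register file."""
    # Steps 2-3: read both source operands during the cycle.
    a, b = regs[rs], regs[rt]
    # Step 4: the ALU produces the 32-bit sum.
    result = (a + b) & 0xFFFFFFFF
    # Steps 5-7: the write happens at the end of the cycle, so build the
    # next register state instead of modifying the current one.
    next_regs = list(regs)
    if rd != 0:                  # register 0 ($zero) always reads as 0
        next_regs[rd] = result
    return next_regs

regs = [0] * 32
regs[10], regs[11] = 5, 7                 # $t2 = 5, $t3 = 7
after = execute_add(regs, 10, 11, 9)      # add $t1, $t2, $t3
print(after[9])                           # 12
```

Because the sources are read before the new state is built, an instruction such as add $t1, $t2, $t1 reads the old value of $t1 and then overwrites it.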

Update to PC

The value PC+4 is computed early in each clock cycle, but state elements are written only at the end of the cycle, so the PC register is updated at the end of the clock cycle.

image-20190331164741375

Control Unit

image-20190331164849480

  • Control is the main control unit.
  • ALU control is concerned only with the ALU operations.

Control bits for R-format instructions

image-20190331165121519

  • RegWrite = 1
  • Branch = 0
  • RegDst = 1
  • MemRead = 0
  • MemToReg = 0
  • ALUSrc = 0
  • ALUOp = 10
  • MemWrite = 0

Control bits for load word instruction

image-20190331165643227

  • RegWrite = 1
  • Branch = 0
  • RegDst = 0
  • MemRead = 1
  • MemToReg = 1
  • ALUSrc = 1
  • ALUOp = 00
  • MemWrite = 0

Single-cycle controls

  • The control unit generates all the output bits needed by the datapath for each individual instruction.
  • Control lines are set as each instruction requires so that the datapath executes that instruction.
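The main control unit behaves like a lookup table from opcode to control bits, which can be sketched as below. The R-format and lw rows are the values listed above, with the register-destination select written as RegDst. The sw and beq rows are not given in these notes; they are filled in here as an assumption from the standard textbook single-cycle control table, with `None` marking a don't-care. The names `CONTROL` and `control_for` are hypothetical.

```python
X = None  # don't-care: the signal's value does not affect the result

# opcode -> control bits (opcodes: R-format 0, lw 35, sw 43, beq 4)
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemToReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-format
    35: dict(RegDst=0, ALUSrc=1, MemToReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    43: dict(RegDst=X, ALUSrc=1, MemToReg=X, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw (assumed)
    4:  dict(RegDst=X, ALUSrc=0, MemToReg=X, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq (assumed)
}

def control_for(opcode):
    """Return the control bits for an opcode; raises KeyError if unsupported."""
    return CONTROL[opcode]

print(control_for(35)["MemRead"], control_for(0)["ALUOp"])
```

In hardware this table is purely combinational logic on the 6-bit opcode; the dictionary is only a convenient way to read it.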

image-20190331170154261

image-20190331170433581

image-20190331170508743

Timing

Suppose the following:

  • Memory units take 200ps
  • ALU takes 100ps
  • Register file read or write takes 50ps
  • No time on other units

image-20190331170722261

Question: What is the execution time of each type of instruction?

  • R-Format: Instruction memory, register file read, ALU, register file write (400ps)
  • Branch: Instruction memory, register file read, ALU (350ps)
  • Load-word: Instruction memory, register file read, ALU, Data memory read, register file write (600ps)
  • Store-word: Instruction memory, register file read, ALU, Data memory write (550ps)

Question: If the adder takes 100ps also, does anything change in the timings of each instruction?

  • No, timings are the same, since the adders operate in parallel with the ALU and do not lengthen the longest path.

Question: what should the clock cycle be?

  • 600ps. The clock cycle is fixed, so it must be the maximum over all instruction types.
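The arithmetic behind these answers can be checked with a short script. It uses only the unit latencies assumed in this section; the dictionary keys and variable names are hypothetical labels for this example.

```python
# Unit latencies in picoseconds, as assumed above.
LATENCY_PS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Units on the critical path of each instruction type, in order.
PATHS = {
    "R-format":   ["imem", "reg_read", "alu", "reg_write"],
    "Branch":     ["imem", "reg_read", "alu"],
    "Load-word":  ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "Store-word": ["imem", "reg_read", "alu", "dmem"],
}

# Execution time of an instruction is the sum of the latencies on its path.
times = {name: sum(LATENCY_PS[u] for u in path) for name, path in PATHS.items()}

# A single-cycle design has one fixed clock period: the slowest instruction.
clock_ps = max(times.values())

for name, t in times.items():
    print(f"{name}: {t} ps")
print(f"clock cycle: {clock_ps} ps")
```

With these numbers an R-format instruction needs only 400ps but still occupies a full 600ps cycle, which is the main inefficiency of the single-cycle design.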

Question: can we perform the jump instruction on the datapath now?

image-20190331171657290

image-20190331171736315

  • Notice that the word address is a 26-bit immediate, but the PC needs a 32-bit value. Shifting the immediate left by $2$ gives a 28-bit byte address; the correct thing to do is to concatenate the top 4 bits of PC+4 with this 28-bit jump target address.

We cannot jump to an arbitrary location in program memory. The target must stay within the current $2^{28}$-byte region of program memory, which is identified by the top 4 bits of the PC.

  • We will set the jump control bit to 1 for jump instructions and set it to 0 for any other instructions.
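The jump target construction can be sketched as below. It assumes the standard MIPS convention that the upper 4 bits come from the incremented PC (PC + 4); the function name `jump_target` is hypothetical.

```python
def jump_target(pc, addr26):
    """Form the 32-bit jump target from the PC and the 26-bit address field."""
    upper4 = (pc + 4) & 0xF0000000          # top 4 bits of the incremented PC
    lower28 = (addr26 & 0x3FFFFFF) << 2     # word address -> 28-bit byte address
    return upper4 | lower28

# j 3000 encodes the word address 3000 / 4 = 750 in its 26-bit field.
print(jump_target(0x00400000, 750))         # 3000
print(hex(jump_target(0x10000008, 750)))    # same field, different region
```

The second call shows the limitation described above: the same 26-bit field lands in a different place when the PC is in a different region, because the top 4 bits are inherited from the PC.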