Introduction to the Stored Program Machine & ARM
A microprocessor like the ARM has a stored-program architecture, which:
Locates programs and data in the same memory.
Operates in a fetch-execute mode.
Reads instructions from memory, then decodes and executes them sequentially.
Such a modern CPU consists of the following major hardware:
Arithmetic and Logic Unit (ALU)
Does unary, binary or ternary operations.
Memory
Registers, Random Access Memory (RAM), Read Only Memory (ROM), etc.
Buses
Allow the transfer of information between various locations within the CPU, memory, I/O devices, etc.
Control Unit
Based on the instruction and addressing modes, sends signals to the registers, ALU, buses, memory, I/O devices to execute the instruction.
In general, the most sophisticated component of all.
Connected to most of the important components inside the CPU.
Registers have names rather than addresses.
ARM registers: r0, r1, r2, ..., r15
Intel registers: AX, BX, CX, DX, SP, BP, SI
Freescale registers: D0, D1, ..., D7
Categories of registers
General purpose registers
Hold the temporary data while performing different operations.
Can be accessed/used by the programmers.
e.g., r0 can be used by the programmer to store any data type or address.
e.g., Accumulator, BX, etc.
Special purpose registers
Hold the status of a program and are designated for special purpose.
Cannot be directly accessed by programmers, but can be controlled by them indirectly, e.g., by executing a branch instruction.
e.g., Stack Pointer (SP), Program Counter (PC), Condition Code Register (CCR), etc.
Invisible registers
Needed by the CPU but cannot be directly accessed or controlled by the programmer.
e.g., Instruction Register (IR), Memory Address Register (MAR), Memory Buffer Register (MBR), etc.
Program counter (PC)
Contains the address of the next instruction to be fetched from memory.
Instruction Register (IR)
Contains the current instruction being executed.
The instruction is decoded and the control signals are generated to execute the instruction.
Memory Address Register (MAR)
Contains the address of the data that is being read from or written to memory.
Memory Buffer Register (MBR)
Contains the data that is being read from or written to memory.
MAR and MBR work in conjunction with each other.
Condition Code Register (CCR)
Contains bits that show the results of the ALU calculations.
In the case of the ARM: Z (Zero), N (Negative), C (Carry), V (Overflow).
r0 ~ r15
A set of 16 registers that store data.
Some RISC machines (e.g., ARM, PowerPC, MIPS, etc.) are called load-store machines because they have only two kinds of instructions that access memory: one to read from memory, and the other to write to memory. All other instructions are ALU operations, which occur only between registers.
Two categories of instructions in a load-store architecture
Memory access
Memory to Register: LDR (Load Register)
Register to Memory: STR (Store Register)
ALU operations
Register to Register: operation <Reg destination>, <Reg source1>, <Reg source2>
Limiting operations only to registers greatly simplifies the hardware of the CPU but more instructions have to be written by the programmer to get the same effect.
General format of the assembly instructions for the ARM
<label>: <instruction> <operand1>, <operand2>, <operand3>
Example: A three-operand instruction.
LDR r0, 1234      (Takes the contents of memory location 1234 and loads the
                   value into r0)
[r0] ← [1234]     (RTL notation)

STR r1, 3456      (Takes the contents of r1 and stores the value into memory
                   location 3456)
[3456] ← [r1]     (RTL notation)

ADD r1, r2, r3    (Takes the contents of r2 and r3, adds them together and
                   stores the result into r1. r2 and r3 remain unchanged)
[r1] ← [r2]+[r3]  (RTL notation)
The first two instructions (LDR ..., STR ...) use direct addressing: the effective address is contained in the instruction register. (ARM does not support direct addressing mode, but almost every other processor does.) Because ARM does not support direct addressing, operands have to be preloaded into registers before an operation can be performed.
Control signals from the control unit to execute the instruction:
Bus P ← [r2];
Bus Q ← [r3];
[r1] ← ADD(P, Q);
Address bus is shared by:
The Program Counter (PC)
The Memory Address Register (MAR)
The Operands part of the Instruction Register (IR)
Data bus is shared by:
The Operands part of the IR
The ALU
Once something goes out to the bus then nothing else can be put on the same bus until the next time slot. The next time slot or clock pulse will clear the current control signals.
Notes about the diagram
The control unit has connections to every component of the CPU. For simplicity these connections are NOT shown on this diagram.
Address bus (red arrows) is all the buses connected to the MAR and PC.
Data bus (blue arrows) is all the buses connected to the MBR, registers and ALU.
Each register has a read-from-bus signal and an out-to-bus signal.
IR has both out to data bus and out to address bus.
A memory read looks like: [MBR] ← [[MAR]]
A memory write looks like: [[MAR]] ← [MBR]
Q_[MBR] - [MBR] is invoked when the contents of the MBR are used. Otherwise, Q will be one of the registers.
f(P, Qx) is the function that controls the ALU. Examples:
ADD(P, Q) - Adds the values on bus P and bus Q and puts the result out on the ALU output bus.
Xfer(P) - Transfers the contents of bus P to the ALU output bus.
SUB(P, Q_literal) - Subtracts the value of the literal part of the IR from what is on bus P.
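The ALU functions listed above can be sketched in a few lines of Python. This is only an illustrative model, not the actual hardware: the function name `alu`, the operation strings, and the 32-bit wrap-around are assumptions made for the sketch.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit buses, results wrap modulo 2**32

def alu(op, p, q=0):
    """Hypothetical model of f(P, Q): return the value put on the ALU output bus."""
    if op == "ADD":
        return (p + q) & MASK32
    if op == "SUB":
        return (p - q) & MASK32
    if op == "XFER":
        return p & MASK32          # pass bus P straight through
    raise ValueError(f"unknown ALU function: {op}")

# ADD r1, r2, r3 expressed as three control steps (register values are made up)
regs = {"r1": 0, "r2": 7, "r3": 5}
bus_p = regs["r2"]                       # Bus P <- [r2]
bus_q = regs["r3"]                       # Bus Q <- [r3]
regs["r1"] = alu("ADD", bus_p, bus_q)    # [r1] <- ADD(P, Q)
```

After these steps `regs["r1"]` holds 12 while `r2` and `r3` are unchanged, which mirrors the RTL description of a three-operand ADD.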
All stored program machines have to go through the exact same instruction fetch process.
ARM instruction fetch process using the register notation:
FETCH
-----
[MAR] ← [PC]        : Get ready to fetch the next instruction from memory.
[PC] ← [PC] + 4     : Point to the next instruction; 4 is the word size in bytes.
                      (Good time to do this since reading from memory takes a
                      relatively long time.)
[MBR] ← [[MAR]]     : Read the instruction from memory and store the result.
[IR] ← [MBR]        : Transfer the instruction to the instruction register
                      and start the decode process.

LDR (This is a 'direct addressing' example. ARM does not support this mode.)
---
[MAR] ← [IR(address part)] : Copy the operand address from IR to MAR.
[MBR] ← [[MAR]]            : Read the data from memory and store the result.
[rX] ← [MBR]               : Move the data (operand) to rX through the data bus.
                             Because of the way the control unit works with the IR,
                             the register rX does not need to be specified.
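The fetch steps can be traced with a small Python sketch. It assumes a toy model in which memory is a dictionary from word-aligned addresses to 32-bit words and the CPU registers are dictionary entries; the function name `fetch` and the memory contents are made up for illustration.

```python
def fetch(cpu, memory):
    """Sketch of the four fetch steps on a hypothetical CPU state dict."""
    cpu["MAR"] = cpu["PC"]            # [MAR] <- [PC]
    cpu["PC"] = cpu["PC"] + 4         # [PC]  <- [PC] + 4
    cpu["MBR"] = memory[cpu["MAR"]]   # [MBR] <- [[MAR]]
    cpu["IR"] = cpu["MBR"]            # [IR]  <- [MBR]
    return cpu["IR"]

# Two made-up instruction words stored at consecutive word addresses.
memory = {0x00: 0x11111111, 0x04: 0x22222222}
cpu = {"PC": 0x00, "MAR": 0, "MBR": 0, "IR": 0}

first = fetch(cpu, memory)    # fetches the word at address 0x00
second = fetch(cpu, memory)   # fetches the word at address 0x04
```

Each call leaves the PC pointing at the following word, so two fetches in a row read two consecutive instructions.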
Note for this section that the ARM supports literal addressing but does NOT support direct addressing. The syntax given here for direct addressing is for illustration purposes only and cannot be used in your ARM assembly programs. Other processors do support direct addressing, which is why it is discussed in this section.
Direct Addressing Mode
Address field in the instruction contains the effective address of the operand and no intermediate memory access is required.
Example
LDR r1, 1234    : Get the data from memory location 1234 and take those
                  contents and put them into r1.
[r1] ← [1234]   : RTL notation.
Literal Addressing
Instead of the instruction pointing to the location of the data, the instruction contains the data. (No effective address but the data is part of the instruction.)
With literal addressing a # is used to mark the addressing mode as literal.
Example
LDR r1, #200    : Load 200 into r1. (This does not work in ARM assembly
                  since LDR and STR require access to a memory location.
                  Must use the MOV instruction.)
MOV r1, #200    : This is the correct syntax for ARM assembly.
Literal addressing saves a memory access, so instructions that use it run faster.
Literal addressing can be used in other instructions:
ADD r0, r2, #50   : Add contents of r2 and the literal 50 and store the
                    result into r0.
Size of a literal
Since the data is part of the instruction, there has to be a limit on the size of the literal.
In the case of the ARM it is a 12-bit unsigned integer. (The real truth will be covered in a later section.)
2^12 = 4096 possible values (0 to 4095) is the limit.
LDR r2, #-20    : Not allowed.
LDR r8, #5000   : Not allowed.
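The size rule can be checked mechanically. The sketch below follows the simplified 12-bit unsigned model used in this section (as noted, the real ARM encoding is more involved); the name `fits_literal` is hypothetical.

```python
LITERAL_BITS = 12  # simplified model from this section, not the real ARM encoding

def fits_literal(value):
    """True if value fits in an unsigned field of LITERAL_BITS bits."""
    return 0 <= value < (1 << LITERAL_BITS)

# The two rejected examples plus a few boundary cases.
checks = {v: fits_literal(v) for v in (200, 4095, 4096, 5000, -20)}
```

Under this model 200 and 4095 are accepted, while 4096, 5000, and -20 are rejected.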
If you want to subtract with literal addressing you have to use the subtract instruction.
SUB r2, r3, #34
In some assemblers, if the literal is not valid (too big) it will automatically define a memory location for you and assign the value to that location.
The instruction is also changed from literal addressing (MOV) to direct (LDR).
The literal has to be the last operand of the assembly instruction.
SUB r1, #56, r2   : Not allowed. The assembler will give an error message.
Details on MOV
MOV r1, #200
Control signals (2 ways):
1. The IR operand field (bits 11-0), the literal, goes out to the data bus; r1 is instructed to read that value in.
2. Or, the IR operand field (bits 11-0), the literal, goes out to the data bus; the ALU reads in the value through Q_literal; the ALU is instructed to transfer Q_literal out to the data bus (Xfer(Q_literal)); r1 is instructed to read that value in.
[!] Note: Follow these two paths on the diagram.
Details on ADD
xxxxxxxxxx
11ADD r0, r2, #50
Control signals:
The IR operand field (bits 11-0), the literal, goes out to the data bus; the contents of r2 go out to the data bus; the ALU reads in those two values through Q_literal and P, respectively; the ALU is instructed to add those two values and put the result into r0.
For this example assume the addressing mode is direct addressing. Make sure to understand exactly how the information is passed through and around the CPU. It is also important to know how these control signals change based on different addressing modes.
LDR r0, address  : Takes the contents of the memory location at address and
                   copies it to r0.
                   This is direct addressing, which the ARM does not support.

    [MAR] ← [Operand field of IR; address]   IR out to address bus
    [MBR] ← [[MAR]]                          (Read memory)
    [r0] ← [MBR]                             (Or, to make use of ALU, use Xfer(Qmbr))

STR r0, address  : Takes the contents of r0 and copies the value to memory
                   location address.
                   This is direct addressing, which the ARM does not support.

    [MAR] ← [Operand field of IR; address]   IR out to address bus
    [MBR] ← [r0]                             (Or, to make use of ALU, use Xfer(P))
    [[MAR]] ← [MBR]                          (Write to memory)

ADD r0, r1, r2   : r0 ← r1 + r2 (r1 and r2 remain unchanged.)

    Bus P ← [r1]
    Bus Qmbr ← [r2]
    [r0] ← ADD(P, Qmbr)

    [!] Note: Bus choices for r1 and r2 can be changed.

SUB r0, r1, r2   : r0 ← r1 - r2 (r1 and r2 remain unchanged.)

    Bus P ← [r1]
    Bus Qmbr ← [r2]
    [r0] ← SUB(P, Qmbr)

    [!] Note: Bus choices for r1 and r2 can be changed.

BPL target       : Branch on positive to location target.
                   If the result of the previous operation was positive or
                   zero then go to location target and start program
                   execution there.
                   If the result was negative then the next instruction
                   after the BPL will be executed.

    If CCR (not N)
    then [PC] ← [IR 23-0]   IR out to address bus

    [!] Note: When it comes to branching, a larger portion of the
    IR gets involved (i.e., bits 0-23)

BEQ target       : Branch equal.
                   If the result of the previous operation was zero then go
                   to location target and start program execution there.
                   If the result was not zero then the next instruction after
                   the BEQ will be executed.

    If CCR Z then [PC] ← [IR 23-0]   IR out to address bus

    [!] Note: When it comes to branching, a larger portion of the
    IR gets involved (i.e., bits 0-23)

B target         : Unconditional branch.
                   Go to location target and start program execution there.

    [PC] ← [Operand field of IR; address]   IR out to address bus
Since the data is contained in the instruction, there has to be hardware which can put that part of the instruction register (IR) onto the bus so it can be transferred around the processor and used for calculation.
The IR decoder has to recognize the addressing mode, and the hardware then has to put only that constant on the bus to do the operation.
Note that the 32-bit instruction or 32-bit data from memory is much larger than the number of bits used for the literal. The high-order bits have to be set to 0.
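In software terms, putting only the literal on the bus with the high-order bits cleared amounts to a bit mask. The following is a minimal sketch under this section's simplified model, where the literal sits in IR bits 11-0; the function name and the instruction word are made up for illustration.

```python
LITERAL_MASK = 0xFFF  # bits 11-0 of the instruction register (simplified model)

def literal_from_ir(ir):
    """Return the literal field, zero-extended: all higher bits are cleared."""
    return ir & LITERAL_MASK

ir = 0xABCDE0C8                 # made-up 32-bit instruction word
literal = literal_from_ir(ir)   # low 12 bits are 0x0C8
```

Here `literal` is 200 (0x0C8); the upper 20 bits of the instruction word do not reach the bus.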
Flow control is the ability to do comparisons and alter the execution sequence based on the results.
It is one of the most important characteristics of a computer.
Based on a calculation or the value of a register, branch or do not branch to a certain location.
The S at the end of the instruction sets the flags in the Condition Code Register (CCR).
S is optional. If not used, CCR flags will not be set.
Example
SUBS r5, r5, #1   : Decrement the loop counter.
BEQ onZero        : When the counter hits zero exit the loop.

if notZero, then ADD r1, r2, r3
if onZero, then SUB r1, r2, r3

[!] Note: onZero is the Branch Target Address (BTA). In this case it is much
like an immediate addressing mode in that the BTA is part of the instruction.
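The countdown pattern (SUBS followed by BEQ) can be traced in Python. This is only a sketch: `subs` is a hypothetical helper that returns the result together with the Z flag, and the starting counter value is made up.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers

def subs(value, literal):
    """Hypothetical SUBS: return (result, Z flag) for value - literal."""
    result = (value - literal) & MASK32
    return result, int(result == 0)

r5 = 3        # made-up starting loop counter
passes = 0
while True:
    r5, z = subs(r5, 1)   # SUBS r5, r5, #1 updates the Z flag
    passes += 1
    if z:                 # BEQ onZero: taken only when Z = 1
        break
```

With a starting value of 3 the loop body runs three times before the branch is taken and `r5` ends at 0.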
It starts with the ALU. There has to be a way to determine and keep the results of the last ALU operation. The results go into the CCR.
Based on the contents of the CCR, make, or do not make, the jump. If the condition is not met we do not have to do anything. (By default the next instruction is executed.)
If the branch has to be made, the PC has to be updated to the location stored in the instruction.
If CCR(Zero)=1 then [PC] ← [IR 23-0]   IR out to address bus
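The conditional update of the PC can be written as a small Python function. It follows the simplified model of these notes, in which IR bits 23-0 hold the target itself (real ARM branch encodings differ); `next_pc` and the instruction word are hypothetical.

```python
TARGET_MASK = 0x00FFFFFF  # IR bits 23-0 in the notes' simplified model

def next_pc(pc_after_fetch, ir, z_flag):
    """BEQ sketch: take the branch only when the Z flag is set."""
    if z_flag == 1:
        return ir & TARGET_MASK   # [PC] <- [IR 23-0]
    return pc_after_fetch         # condition not met: nothing to do

ir = 0x0A000040                 # made-up instruction word; low 24 bits hold 0x40
taken = next_pc(8, ir, 1)       # Z = 1, so PC becomes 0x40
not_taken = next_pc(8, ir, 0)   # Z = 0, so PC stays at 8
```

The not-taken case needs no extra work because the fetch step has already advanced the PC to the next instruction.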
The following is how an if...then looks in hardware:
Condition flags (bits)
N: Negative
Z: Zero
C: Carry or borrow
V: Overflow
Each of these comes from the results of the ALU. They are provided as inputs to the Control Unit.
For the ARM, an "S" needs to be added to the end of the instruction to get the CCR updated.
e.g.,
ADDS ...

SUBS ...
Or, just use the CMP (Compare) or TST (Test) instruction, which forces the CCR update as well.
CISC processors (e.g., Intel IA32) update status flags after each operation.
RISC processors (e.g., ARM) require the programmer to determine when the status flags are updated.
When the ALU performs an operation, it stores status or condition information in the CCR. The processor records whether the result is:
Negative in two's complement terms (N), MSB = 1
Zero (Z), all bits = 0
Generated a carry (C), carry out from the full-adder (directly connected to the C flag of the CCR)
Arithmetic overflow (V)
Positive + Positive = Negative
Negative + Negative = Positive
Positive(Negative) + Negative(Positive) never produces overflow!
Not all instructions change these bits of the CCR.
Binary Examples Using 8-bit Word
   0011 0011      51
  +0100 0010     +66
  ----------     ---
   0111 0101     117 (correct)

   N=0, Z=0, C=0, V=0


   1111 1111      -1
  +0000 0001     + 1
  ----------     ---
 1 0000 0000       0 (correct)

   N=0, Z=1, C=1, V=0
   Zero, Carry


   0101 1100      92
  +0100 0001     +65
  ----------     ---
   1001 1101     -99 (incorrect)

   N=1, Z=0, C=0, V=1
   Negative, Overflow(P+P=N)


   1101 1100     -36
  +1100 0001     -63
  ----------     ---
 1 1001 1101     -99 (correct)

   N=1, Z=0, C=1, V=0
   Negative, Carry


   0000 0000      0000 0000                                          0
  -0001 0000     +1111 0000 (two's complement of -0001 0000)       -16
  ----------     ----------                                        ---
                  1111 0000                                        -16 (correct)

   N=1, Z=0, C=0, V=0
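The worked examples above can be reproduced with a short Python function. It is a sketch that assumes an 8-bit word, as in the examples; the name `add8_flags` is hypothetical, and the subtraction case is modeled the same way as above, by adding the two's complement.

```python
def add8_flags(a, b):
    """Add two 8-bit patterns; return (result, N, Z, C, V)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    n = (result >> 7) & 1            # N: most significant bit of the result
    z = int(result == 0)             # Z: all result bits are zero
    c = (total >> 8) & 1             # C: carry out of bit 7
    sign_a, sign_b = a >> 7, b >> 7
    # V: operands share a sign and the result's sign differs (P+P=N or N+N=P)
    v = int(sign_a == sign_b and sign_a != n)
    return result, n, z, c, v

example_1 = add8_flags(0b00110011, 0b01000010)   # 51 + 66
example_2 = add8_flags(0b11111111, 0b00000001)   # -1 + 1
example_3 = add8_flags(0b01011100, 0b01000001)   # 92 + 65
example_4 = add8_flags(0b11011100, 0b11000001)   # -36 + -63
example_5 = add8_flags(0b00000000, 0b11110000)   # 0 + (-16)
```

Each tuple carries the same N, Z, C, V values as the corresponding example; for instance `example_3` is `(0b10011101, 1, 0, 0, 1)`, the overflow case.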
Hardware implementation of condition flags
N: MSB value
Z: 32-input NOR gate
C: Carry out from the full-adder, directly connected to the C flag
V: a(n-1)b(n-1)s(n-1)' + a(n-1)'b(n-1)'s(n-1), meaning the V flag is set when P+P=N or N+N=P
Comments about overflows
In C, it keeps on going like nothing ever happened.
In C++, built-in integer arithmetic does not throw on overflow either; the standard library defines std::overflow_error, but you as a programmer have to detect the overflow and throw or handle it yourself.
In Java, it does not automatically throw an error or overflow exception.
In Assembly, no way! You as a programmer have to check the overflow bit and write all that code yourself.