The Digital Digest No 2
Dissecting the Computer Microprocessor
As a whole, a computer microprocessor is simply a device that processes input based on a set of actions and decisions. These stored actions and decisions are effectively the program that the processor executes. This stored program is typically compiled from a high-level language such as C, which is easily readable by a human programmer, into machine code, which is encoded specifically so that an electrical circuit can use it. The format of this machine code varies nearly as widely as the languages spoken around the world. However, most modern microprocessors follow a number of conventions, and fall into one of only a few categories.
First, most modern computer systems follow an organizational approach known as a von Neumann architecture, consisting of two major concepts. One half is a central processor to perform all computing, which organizes all actions within the system. The second half is a dedicated memory unit, storing and organizing all data in a retrievable form. In the context of a simple computer system, and for the purposes of conceptual understanding, it may help to consider the processor the center of the computer, with the memory encompassing not only temporary storage such as system RAM, but also non-volatile memories such as hard drives and other peripheral devices. Some systems even do the majority of their interfacing with external components through special addresses in memory that are not routed to data storage, but to registers and control lines on peripheral devices, a technique known as memory-mapped I/O.
Second, nearly all microprocessors use binary to store their data. There have been a few experimental designs which use other bases, but none have matched the widespread popularity and ease of design of binary systems. Typically, binary machine code is byte aligned, meaning that the length of each instruction is a multiple of eight bits. Similarly, the data that these systems use is divided in the same way. Instructions and data that are larger than a single byte are grouped together under other naming conventions. Two-byte segments are typically (although not always) referred to as a word. Furthermore, it is also possible to have double and quad words, respectively referring to two- and four-word data lengths. Occasionally, half-byte segments, four bits in length, are referred to as nybbles, with the obvious pun intended.
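As a small illustration of these groupings, the following Python sketch splits one value into bytes, words, and nybbles. It assumes the two-byte word convention mentioned above (which, as noted, is not universal), and the example value is arbitrary.

```python
# An arbitrary 32-bit value: a "double word" under the two-byte word convention.
value = 0xDEADBEEF

# Four bytes, most significant first.
byte_list = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]

# Two 16-bit words, most significant first.
word_list = [(value >> 16) & 0xFFFF, value & 0xFFFF]

# The two nybbles (four bits each) of the most significant byte.
nybble_list = [(byte_list[0] >> 4) & 0xF, byte_list[0] & 0xF]
```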
Third, the processor core will follow a known processor archetype, which is typically either a stack processor or a register machine. Stack machines have very few internal registers, storing only the most minimal information, and operate directly on data in memory in much the same way as a player might add and remove cards from the top of a deck of cards. This is also the same sort of operation that some reverse Polish notation (RPN) calculators use. In contrast, general register machines tend to keep many registers internally, holding a number of values at once to reduce the number of memory accesses on complex operations (although most operations are still simple). This also eliminates the explicit need for a stack in memory for data processing, although recent general purpose register processors include one for other reasons. Finally, there are more exotic designs known as very long instruction word (VLIW) processors, which favor a drastically reduced internal structure, exposing more of the internal workings to the language compiler and forcing it to schedule instruction execution, instead of having internal circuitry manage this automatically (more on this later).
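To make the contrast concrete, here is a minimal Python sketch that computes (2 + 3) * 4 on a toy stack machine and on a toy register machine. Both instruction sets ("push", "li", "add", "mul") are invented for this illustration and do not correspond to any real processor.

```python
def run_stack(program):
    """Toy stack machine: operands are pushed and popped like cards on a deck."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()


def run_register(program, nregs=4):
    """Toy register machine: every operation names its registers explicitly."""
    regs = [0] * nregs
    for op, dst, a, b in program:
        if op == "li":          # load immediate: regs[dst] = a (b is unused)
            regs[dst] = a
        elif op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "mul":
            regs[dst] = regs[a] * regs[b]
    return regs


stack_result = run_stack(
    [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
)
regs = run_register(
    [("li", 0, 2, None), ("li", 1, 3, None), ("li", 2, 4, None),
     ("add", 3, 0, 1), ("mul", 3, 3, 2)]
)
```

The stack program never names where its operands live, while the register program must, which is the trade-off described above.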
The Design of a Simple Register Machine
To understand the design of a simple processor, consider the following actions that are minimally required to successfully operate upon one machine instruction.
1. Fetch one instruction from memory
2. Identify what sort of instruction it is
3. Fetch any needed instruction operands
4. Execute the instruction
5. Write results for use later
The first step seems simple enough. However, you will need to know two pieces of information before the processor will be able to accomplish this task. First, you must know where this instruction is. If you consider the main memory to act like a giant multiplexor (or mux in shorthand), the select line can control what data you are reading (memory address), with the output of the mux presenting a single byte at the specified address. To accomplish this, processors contain a special register known as the program counter, which simply stores the address of the next instruction to be executed. Thus to read from memory, one needs to do the following:
1. Read program counter and send to memory as address input
2. Increment program counter, wait for read to finish
3. After read completes, retrieve data and store for later
To read machine instructions larger than one byte, the processor need only repeat this process, or use a memory system which supports larger reads and writes.
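The three-step read can be sketched in Python as follows. This is only a behavioral model under simplifying assumptions: the class name `Fetcher`, the register names, and the memory contents are all made up for illustration, and the memory read is treated as completing instantly.

```python
# Hypothetical 16-byte memory; the contents are arbitrary example values.
memory = [0x10, 0x22, 0x35, 0x47] + [0x00] * 12


class Fetcher:
    def __init__(self, memory):
        self.memory = memory
        self.pc = 0       # program counter: address of the next instruction
        self.mar = 0      # memory address register
        self.ir = None    # holds the most recently fetched byte

    def fetch(self):
        # Step 1: send the program counter to memory as the address input.
        self.mar = self.pc
        # Step 2: increment the program counter while the read is in flight.
        self.pc += 1
        # Step 3: the read completes; store the byte for later use.
        self.ir = self.memory[self.mar]
        return self.ir


cpu = Fetcher(memory)
first = cpu.fetch()
second = cpu.fetch()
```

Calling `fetch()` repeatedly walks through memory one byte at a time, which is how a multi-byte instruction would be assembled on a byte-wide memory.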
A Single Data Bus Design
Now that some basic concepts have been introduced about the movement of data within the processor, the description and purpose of the major components of a computer microprocessor will hopefully make more sense. Following is a brief description of the contents and organization of a very simple register machine processor. It is not intended as a complete design, but rather a mock-up for conceptual understanding.
In the diagram below, you will see a number of small boxes surrounding a bold vertical line, with a larger box appearing above everything else. The purpose of this diagram is to describe the connections made by each component to a central collection of wires, collectively known as a bus. Note that all components have arrows coming from and going back to the data bus, with the exception of the larger unit at the top. The lower components collectively form the datapath, or the network where data flows within a processor. The upper component is the control unit, which does not operate on data directly, but receives simple inputs from special registers within various datapath components such as the register file and ALU, and uses those to schedule actions at the appropriate times. It does this by sending special control signals to the various components in the datapath (small arrows pointing down), based upon simple information sent back to the control unit by the various components (the arrows returning to the control unit).
Components in the Datapath
The register file is one of the simplest components to understand. In short, it is a collection of registers, which store temporary data. Each of these is a collection of D flip-flops, which are synchronized to a common clock and share a common data input from the data bus. To load a register from the data bus, the enable for each of the flip-flops within the register is held high for one clock period, latching in whatever data is currently on the bus. Alternatively, the register file can output one register back to the data bus as well. By gating the outputs of each register through a large mux, we can pick a specific register to send back to the bus. At this point, it is worth noting that the load enable for each register, as well as the signals to select a specific register, are considered control signals sent from the control unit. This is the primary purpose of the control unit: to schedule operations within the processor using generic logic that is largely ignorant of the data moving through the bus, with the exception of the times where conditional decisions are needed.
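A behavioral sketch of such a register file in Python might look like the following. The class and method names are hypothetical, and a four-register file is assumed purely to keep the example small.

```python
class RegisterFile:
    def __init__(self, count=4):
        self.regs = [0] * count

    def clock(self, bus, load_enable):
        # load_enable carries one bit per register. On a clock edge, every
        # register whose enable is high latches the value currently on the bus.
        for index, enabled in enumerate(load_enable):
            if enabled:
                self.regs[index] = bus

    def read(self, select):
        # Output mux: the select lines pick which register drives the bus.
        return self.regs[select]


rf = RegisterFile()
rf.clock(bus=0x2A, load_enable=[0, 1, 0, 0])   # load register 1
rf.clock(bus=0x07, load_enable=[0, 0, 0, 1])   # load register 3
bus = rf.read(select=1)
```

Both `load_enable` and `select` stand in for the control signals that the control unit would supply.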
The memory is similar to the register file, but is typically much larger and contains a great deal more data. The main difference is that the memory is large enough to warrant one or more clock cycles before memory operations can complete, whereas register values are available immediately. Although memory interfaces can grow complicated quickly, and the exact implementation can be complex in some cases, presume that the memory is a simple one that contains a few registers and the bare minimum of control. There is a pair of registers (a unidirectional register to hold the memory address and a bidirectional register that stores data moving to and from memory) and a read and a write memory control signal (not shown). Reads are performed by writing the appropriate address to the address register, toggling the read control line, and waiting for the result to be loaded into the bidirectional data register. Similarly, writes are performed by writing the address and outgoing data to the respective registers (on separate clock periods), and then toggling the write signal.
The arithmetic logic unit, or ALU, is another relatively simple component, although implementations vary widely. In essence, it is a data computation component which is capable of performing simple mathematical or logical operations. Note that the data bus can contain only one datum at a time, and so the ALU contains its own registers to save data on one clock period, to combine it with other data upon a subsequent clock period. Some of these typical operations include addition, subtraction, and many of the common logical operations such as AND, OR, and NOT. To perform each function, the ALU contains dedicated circuitry designed to implement that function. As soon as data is loaded, all functions are computed simultaneously, and a mux picks the desired result when the computations complete. The result is again registered, and written to the data bus as required by the control unit.
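The "compute everything, then select" behavior can be modeled in Python as below. This is a sketch under several assumptions not stated in the text: an 8-bit data width, two internal operand registers where each load shifts the previous operand along, and the operation names used as mux select values.

```python
class ALU:
    def __init__(self, width=8):
        self.mask = (1 << width) - 1   # results wrap to the data width
        self.a = 0                     # older operand
        self.b = 0                     # most recently loaded operand
        self.result = 0                # registered output

    def load(self, bus):
        # The bus carries one datum at a time, so operands arrive on
        # separate clock periods; the previous operand moves from b to a.
        self.a, self.b = self.b, bus & self.mask

    def compute(self, select):
        # Every function is evaluated, as if by parallel circuits...
        outputs = {
            "ADD": self.a + self.b,
            "SUB": self.a - self.b,
            "AND": self.a & self.b,
            "OR": self.a | self.b,
            "NOT": ~self.b,
            "INC": self.b + 1,
        }
        # ...and the select value acts as the mux choosing one result.
        self.result = outputs[select] & self.mask
        return self.result


alu = ALU()
alu.load(200)
alu.load(100)
total = alu.compute("ADD")        # 300 wraps around in 8 bits
difference = alu.compute("SUB")
```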
The Control Unit
Finally, the control unit remains the heartbeat of the system, and at its core is a counter. It should be unsurprising that at the very center of any complicated computer system is a very simple counter that sits and counts clock cycles, over and over again. To recap, consider the first few steps of instruction execution, the instruction fetch, where one single value is read from the address held in a continually incrementing program counter. Thus, the control unit would need to count, at minimum, three states (0, 1, 2). Although inefficient, this can be implemented easily with a two-bit register that is continually incremented by one, leading to a counter that counts 0, 1, 2, 3 instead. It is left as an exercise to the reader to implement this, from the information in the last issue.
Now, the signals that the control unit would need to generate are as follows: one for the output of the program counter to drive the data bus, one for the address register to load data from the bus, one for the ALU to load data from the bus, one for the ALU to send the result of an increment to the bus, one for a memory read operation, one each for the various registers to load data from the bus, and a final one to place the value returned from memory onto the data bus to be used.
By simply using combinational logic, we can generate pulses that will stay high for one clock period. For example, by OR'ing both bits and inverting the output, we can generate a pulse that is high only when the counter is 0 (00 in binary). A pulse for the value 1 (01) is the low bit AND NOT the high bit of the counter. Similarly, a pulse for the value 2 (10) is the high bit AND NOT the low bit.
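The decoding logic can be checked with a few lines of Python. The function name `pulses` and the variable names are illustrative only; each pulse is high only when the counter holds the matching value.

```python
def pulses(count):
    """Decode a two-bit counter value into three one-cycle pulses."""
    hi = (count >> 1) & 1
    lo = count & 1
    p0 = 1 - (hi | lo)    # NOT (hi OR lo): high only on 00
    p1 = lo & (1 - hi)    # lo AND NOT hi: high only on 01
    p2 = hi & (1 - lo)    # hi AND NOT lo: high only on 10
    return p0, p1, p2


count = 0
trace = []
for _ in range(4):
    trace.append(pulses(count))
    count = (count + 1) & 0b11    # two-bit counter: 0, 1, 2, 3, then wraps
```

On the unused fourth state (11) all three pulses stay low, so nothing in the datapath is enabled during that cycle.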
We can wire these pulses directly to the enable lines for each register and selector mux, as this is a very simple design for now. For example, both the output enable on the program counter and the load enable for the memory address register can be wired directly to the pulse on count 0. The program counter address, being on the data bus on cycle 0, can be loaded by the ALU as well, which sends the incremented address back to the program counter on cycle 1, when wired to the pulse for cycle 1. Note we can also use this pulse as the memory read signal. When cycle 2 arrives, we will have already incremented the program counter, and the returned data will be ready in the memory data register (MDR). We route the pulse such that the MDR writes to the bus, and the appropriate register saves it for later. Typically, this register is given the special name of "instruction register" and is attached to simple combinational logic which decodes the machine language into a series of simple signals that describe what sort of operation it is and what operands it will require.
As can be expected, this micromanagement of control signals quickly becomes tedious and prone to errors. Control signals tend to multiply quickly even for simple tasks. Thus, control signals are typically gathered into organized lists or tables describing in detail the cycles on which they are driven high to '1' (while typically left at '0' most of the time), and are given shorthand names to save space. For example, PC_ld might indicate "program counter load from data bus," while PC_wr might mean "program counter write to data bus." Thus, the first three cycles for this control unit appear below (numbered from 1, so cycle 1 corresponds to a counter value of 0):
Cycle 1 - PC_wr, MEM_ADDR_ld, ALU_ld
Cycle 2 - ALU_INC_wr, PC_ld, MEM_READ
Cycle 3 - MEM_DATA_wr, IR_ld
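The schedule above can be exercised with a small Python model of the single-bus datapath. This is a sketch, not the full design: the `step` function, the dictionary-based state, and the example memory contents are assumptions made for illustration, and the memory read is modeled as completing within the cycle in which it is requested.

```python
def step(state, signals):
    """Apply one clock cycle of control signals to the single-bus datapath."""
    # At most one *_wr signal may drive the bus in any cycle.
    if "PC_wr" in signals:
        bus = state["PC"]
    elif "ALU_INC_wr" in signals:
        bus = state["ALU_IN"] + 1
    elif "MEM_DATA_wr" in signals:
        bus = state["MEM_DATA"]
    else:
        bus = None
    # Every register with its *_ld signal high latches the bus value.
    if "MEM_ADDR_ld" in signals:
        state["MEM_ADDR"] = bus
    if "ALU_ld" in signals:
        state["ALU_IN"] = bus
    if "PC_ld" in signals:
        state["PC"] = bus
    if "IR_ld" in signals:
        state["IR"] = bus
    # A read requested now has its data in MEM_DATA for the next cycle.
    if "MEM_READ" in signals:
        state["MEM_DATA"] = state["MEM"][state["MEM_ADDR"]]


FETCH = [
    {"PC_wr", "MEM_ADDR_ld", "ALU_ld"},      # cycle 1
    {"ALU_INC_wr", "PC_ld", "MEM_READ"},     # cycle 2
    {"MEM_DATA_wr", "IR_ld"},                # cycle 3
]

state = {"PC": 2, "MEM_ADDR": 0, "MEM_DATA": 0, "ALU_IN": 0, "IR": 0,
         "MEM": [0x11, 0x22, 0x33, 0x44]}
for signals in FETCH:
    step(state, signals)
```

After the three cycles, the instruction register holds the byte that was at the old program counter address, and the program counter has advanced by one.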
Instruction Specific Scheduling
Once a machine instruction is loaded into the instruction register, control signals become only slightly more tricky, as they are dependent on what sort of instruction is executing. This is handled with additional combinational logic that ANDs each instruction-specific signal from the instruction register with the appropriate clock pulse. These AND gates effectively suppress the control signals, letting them pulse only when the instruction is identified as needing that control signal, and only at the correct cycle of the control clock.
For example, imagine that the instruction read from memory is an addition operation to add two registers and store the result in a third. To accomplish this, we would send the first operand to the ALU on the fourth cycle, the second operand to the ALU on the fifth, and send the addition result back to a result register on the sixth. The continued register-addition-specific control signals are shown below.
Cycle 4 - Reg_A_wr, ALU_ld
Cycle 5 - Reg_B_wr, ALU_ld
Cycle 6 - ALU_ADD_wr, REG_C_ld
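The following Python sketch walks through these three cycles, with each signal gated by whether the decoded instruction is an add. The signal names are copied from the schedule above; the function names, the list-based register file, and the example values are assumptions for illustration.

```python
ADD_SCHEDULE = {
    4: {"Reg_A_wr", "ALU_ld"},
    5: {"Reg_B_wr", "ALU_ld"},
    6: {"ALU_ADD_wr", "REG_C_ld"},
}


def control_signals(is_add, cycle):
    # Each signal is effectively "instruction is an add" AND "cycle pulse".
    return ADD_SCHEDULE.get(cycle, set()) if is_add else set()


def run_add(regs, a, b, c):
    operands = []    # stands in for the ALU's internal operand registers
    for cycle in (4, 5, 6):
        signals = control_signals(True, cycle)
        bus = None
        if "Reg_A_wr" in signals:
            bus = regs[a]
        if "Reg_B_wr" in signals:
            bus = regs[b]
        if "ALU_ADD_wr" in signals:
            bus = operands[0] + operands[1]
        if "ALU_ld" in signals:
            operands.append(bus)
        if "REG_C_ld" in signals:
            regs[c] = bus
    return regs


regs = run_add([5, 7, 0, 0], a=0, b=1, c=2)
idle = control_signals(False, 4)    # a non-add instruction raises nothing
```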
It is worth mentioning that the signals for each register load and write are generated in part by the instruction in the instruction register. Each instruction typically has a few bits that specify the registers accessed by the operation, and these are combined with the current state of the processor to generate the appropriate internal load enable and bus selection signals.
If the instruction varied a slight amount, and indicated that the target register C was the program counter, whatever result value was computed would be written there, and would then become the address of the next instruction read from memory, effectively forming a jump or "goto." Furthermore, we can gate the program counter load enable with other simple logic, to load the program counter only under certain circumstances, such as only when the result of the last computation was negative or zero. In that case, the processor will still go through the appropriate actions of obtaining the value to write to the program counter, although the write itself may or may not occur. This forms the basis of how conditional program jumps are implemented in circuitry, and how a simple circuit can make decisions based on human-written software.
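A minimal Python sketch of this gating follows. The condition names ("always", "zero", "negative") and the flag dictionary are hypothetical; the point is only that the jump target is always produced, while the final load into the program counter is conditional.

```python
def pc_load_enable(is_jump, cycle_pulse, condition, flags):
    """Gate the program counter's load enable with a condition flag."""
    if condition == "always":
        allowed = 1
    elif condition == "zero":
        allowed = flags["zero"]
    elif condition == "negative":
        allowed = flags["negative"]
    else:
        raise ValueError(condition)
    return is_jump & cycle_pulse & allowed


def finish_jump(pc, target, condition, flags):
    # The target has already been computed and placed on the bus;
    # only the load into the program counter depends on the flags.
    if pc_load_enable(1, 1, condition, flags):
        return target
    return pc


flags = {"zero": 1, "negative": 0}    # the last result was zero
taken = finish_jump(pc=0x10, target=0x40, condition="zero", flags=flags)
not_taken = finish_jump(pc=0x10, target=0x40, condition="negative", flags=flags)
```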
Other assembly operations such as memory reads and writes can be performed in a similar manner, by laying out the explicit flow of data within the datapath, and then writing a list of the appropriate signals to occur on the appropriate cycles. An instruction to read from a specified address in memory can be implemented similarly to the first three control cycles, which fetch one instruction from memory. The difference is that we use the loaded instruction to identify which register contains the address to read, send that to the address register, toggle the read operation, and read the result back on the cycle after. The appropriate control signals to do that could be similar to the following:
Cycle 4 - Reg_A_wr, MEM_ADDR_ld
Cycle 5 - MEM_READ
Cycle 6 - MEM_DATA_wr, REG_B_ld
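As with the fetch, this schedule can be traced in a few lines of Python. The signal names are taken from the list above; the function name, the list-based registers and memory, and the example values are assumptions, and the read is again modeled as completing within its own cycle.

```python
def load_from_memory(regs, memory, a, b):
    """Read memory at the address held in register a into register b."""
    mem_addr = None
    mem_data = None
    schedule = [
        {"Reg_A_wr", "MEM_ADDR_ld"},     # cycle 4
        {"MEM_READ"},                    # cycle 5
        {"MEM_DATA_wr", "REG_B_ld"},     # cycle 6
    ]
    for signals in schedule:
        bus = None
        if "Reg_A_wr" in signals:
            bus = regs[a]
        if "MEM_DATA_wr" in signals:
            bus = mem_data
        if "MEM_ADDR_ld" in signals:
            mem_addr = bus
        if "REG_B_ld" in signals:
            regs[b] = bus
        if "MEM_READ" in signals:
            mem_data = memory[mem_addr]
    return regs


regs = load_from_memory([3, 0], [10, 20, 30, 40, 50], a=0, b=1)
```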
Note that although we did not use the ALU in this operation, it would be easy to form an operation, using more processor cycles, which could set the address register by means of an arithmetic computation such as the addition of two registers (indexed addressing), or even indirect addressing, using the result of a previous read (similar to how pointers are followed in C). In reality, the sky is the limit as to what is possible inside a processor. Additional functionality can be added by attaching additional components to the data bus, and any number of instructions can be implemented using simple logic.
Ultimately, the processor's performance is measured not only by the robustness of the assembly/machine language, but also by how quickly those simple instructions can be executed. More complicated instructions will require more clock cycles to perform their needed actions and may slow the chip's instruction issue rate. On the other hand, more involved instructions may prove useful if they allow the programmer or compiler to write a program in fewer instructions, which might execute faster than a larger number of simpler ones. In the early designs, there was no clear answer as to which approach provided the best performance, and designs were left up to engineering teams to find the best balance.
Historical Implications
It is worthwhile to note that many early microprocessors optimized performance on the idea that programs executing in as few instructions as possible would ultimately lead to faster execution. So many companies sold microprocessors not just on speed and memory capacity, but also on the number of different instructions their machines supported. Some took this further and modified their control logic to run from a simple control RAM inside the processor that was user modifiable. By adding or changing the "micro-instructions" stored in this "micro-code," users could easily implement their own instructions in software. Even today, many modern Intel microprocessors allow you to change the embedded microcode, but not necessarily to the degree that early processors did, as the structure of modern processors has changed substantially from the aforementioned design.
As simple a concept as this was, this sort of design suffered from several drawbacks. First, it was slow, executing one instruction perhaps every half dozen clock cycles, with complicated processors likely taking longer. Second, large variance in instruction sets would invariably lead to portability problems, with some groups unofficially making their own additions to then-standard assembly languages. Lastly, the original simple designs left much of the computational logic of the microprocessor idle when doing operations in a serial fashion.
Designs were improved over time, splitting buses to increase the flow of information through the processor, improving throughput and reducing total execution time. However, the next major revolution in processor performance came from developing control and data flow designs that allowed the microprocessor to execute several instructions at once, issuing later instructions several cycles before earlier ones had fully completed their execution. This design philosophy was known as pipelining, and focused on attempting to execute one machine instruction on every clock cycle. As could be expected, this improved the speed of these designs considerably and gave the early pipelined research processors the distinction of completely dwarfing the speed and performance of their microcoded counterparts, at a fraction of the size.
Of course, this in turn pales in comparison to more modern superscalar designs that may execute several instructions per clock cycle. For reference, the most recent processors from the likes of Intel can execute up to four. But the magic of how that works will have to wait for the next issue.







