interrupts: external events that cause the processor to stop processing the current process, and after putting the machine in a consistent state, vector to the starting address of the service routine specified by the corresponding vector.
exceptions: events that are internal to the running process, such as an illegal opcode, page fault, floating point underflow, etc., that cause the processor to stop processing the current program and deal with the exceptional event. Like interrupts, it is necessary to first put the machine in a consistent state and then vector to the starting address of the exception service routine. The service routine that is executed depends on the exception which has occurred. Each exception (or set of exceptions) has its own vector to tell the machine which service routine to execute.
service routine (handler): see above.
interrupt vector: the bit pattern identifying the particular interrupt, which is used to initiate the correct service routine. (A small C sketch of a vector table appears after this group of terms.)
exception vector: see interrupt vector.
consistent state: the state of the machine between the execution of instructions. In the middle of execution of an instruction, the internal state of the processor can be whatever the microarchitecture wishes. Even general purpose registers can be clobbered if the microarchitecture wishes. BUT, at the end of each instruction, the contract with the software requires that the state of the machine be recoverable to the state produced by the program up to that point. That point is the consistent state.
power failure: loss of power, causing a very high priority interrupt.
machine check: the highest priority interrupt, indicating that the processor is generating bogus information. Cause: a hardware malfunction.
sticky bit: a condition code that, once set, retains its value until interrogated under program control.
interrupt/exception priority level: the priority at which an event requests interruption of the running program. Usually also the priority at which the service routine runs. Priority depends on the urgency of the event.
maskable: the ability to ignore an event so as to defer handling it, or even deny handling altogether.
medium: the material containing the information -- the hole in a punched card, the magnetic material on the disk, etc.
device: the I/O unit that houses all the stuff that causes the information to be stored and transformed from one form to another. A disk drive, for example.
transducer: the thing that transduces information from one medium to another. From light passing through a punched card to electrical signals. From the magnetized material on a disk track to electrical signals.
polling: I/O activity controlled by the processor executing its program; the processor repeatedly checks the device's status to learn when the device needs service.
interrupt driven I/O: I/O activity controlled by the device requesting that the processor interrupt what it is doing.
DMA: direct memory access. Allows memory and the disk (for example) to transfer information without tying up the processor.
I/O processor: a special purpose processor that manages I/O activity, including doing some processing on the information being transferred.
dedicated bus: a bus dedicated to a single purpose.
platter: colloquial word for one of the surfaces of a disk.
track: on a platter, a complete circle containing bits of information.
areal density: the number of bits per unit length along a track.
disk head: the part of the mechanical arm that contains the ability to read the magnetic material stored on the track, or write to the track.
cylinder: the set of tracks, one per surface, that the heads can access without moving the arm.
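Referring back to the interrupt vector and service routine entries above, here is a minimal C sketch, not from the original glossary, of what "vectoring" means: the vector number indexes a table whose entries are the starting addresses of the service routines. The vector numbers and handler names are invented for illustration; a real machine performs the lookup in hardware or microcode, so dispatch() below only stands in for that step.

    #include <stdio.h>

    #define NUM_VECTORS 256

    /* A service routine (handler) is just a routine the machine transfers
       control to after putting itself in a consistent state. */
    typedef void (*service_routine_t)(void);

    static void illegal_opcode_handler(void) { puts("illegal opcode"); }
    static void page_fault_handler(void)     { puts("page fault"); }
    static void keyboard_handler(void)       { puts("keyboard interrupt"); }

    /* The vector table: entry i holds the starting address of the service
       routine for vector number i.  These vector numbers are arbitrary. */
    static service_routine_t vector_table[NUM_VECTORS] = { 0 };

    static void dispatch(unsigned vector)
    {
        /* "Vectoring": use the vector as an index, fetch the starting
           address, and start executing there. */
        if (vector < NUM_VECTORS && vector_table[vector] != NULL)
            vector_table[vector]();
    }

    int main(void)
    {
        vector_table[0x01] = illegal_opcode_handler;
        vector_table[0x02] = page_fault_handler;
        vector_table[0x80] = keyboard_handler;

        dispatch(0x02);   /* simulate a page fault exception */
        dispatch(0x80);   /* simulate a keyboard interrupt   */
        return 0;
    }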
seek time: the time it takes the head to get to the desired track to read or write the information.
rotation time: the time it takes the correct starting point on the track to come under the head where it can be read/written.
disk crash: when the head actually hits the spinning platter ...and scores the surface, usually. BAD.
disk block: the unit of storage accessed from the disk. 512 bytes, for example.
redundancy: the concept of using more than n bits to store n bits of information in order to recover in case one or more of the bits get corrupted. Parity provides one extra bit of redundancy. ECC provides on the order of log2 n + 1 extra bits of redundancy.
RAID-0: striped disk array with no redundancy.
mirroring: aka RAID-1. Two disks to store the information of one. Each bit has its mirror bit.
striping: interleaving, when applied to disks.
RAID-1: see mirroring.
RAID-2: array with ECC coding across the disks.
RAID-3: array with parity coding. All parity bits on the same disk.
RAID-4: array with larger units of interleaving, so an entire file can be accessed on the same disk.
RAID-5: RAID-4, except the parity blocks are stored across all the disks, getting rid of the hot spot of a dedicated parity disk. (A small C sketch of XOR parity appears after this group of terms.)
data lines: lines containing data bits.
address lines: ditto, except address bits.
control lines: ditto, except control bits.
multiplexed bus: a bus whose lines carry address bits at some times and data bits at other times.
pending bus: a bus with the property that once the bus is grabbed for a transaction, the bus is tied up (often twiddling its thumbs) until the transaction has completed.
split-transaction bus: a bus that, instead of twiddling its thumbs, is released to allow other transactions to use it while the first transaction waits for its second half.
pipelined bus: a split-transaction bus where the second halves of transactions occur in the same order as their first halves.
synchronous bus: a clocked bus.
asynchronous bus: no clock; bus cycles take as long as they need to take. Since there is no clock, handshaking is required.
handshaking: the interaction that starts off each bus cycle, required since there is no clock.
arbitration: the act of determining who gets the bus for the next bus cycle.
central arbitration: arbitration done by one central arbiter (e.g., a PAU).
distributed arbitration: each device controller has enough information to know whether or not it gets the bus the next cycle.
priority arbitration unit (PAU): the central arbiter.
bus cycle: one unit of bus transaction time.
bus master: the device controller in charge of the current bus cycle.
slave: the other controller that participates in the bus transaction, under the control of the bus master.
vertical priority: priority determined by the particular BR line.
horizontal priority: priority determined by proximity to the PAU, among all controllers having the same vertical priority.
device controller: that which interfaces the device to the bus and manages all accesses to/from the device.
bus request: signal from a controller requesting a bus cycle.
bus grant: signal from the PAU granting the bus.
daisy chaining: mechanism by which all controllers with the same vertical priority receive the bus grant in order, each passing it on to the next if it does not want it.
burst mode: successive data transfers in the same bus cycle.
SACK: signal output by the bus grantee signifying it will be the next bus master.
MSYN: signal used by the bus master in synchronizing the transaction.
SSYN: signal used by the slave in synchronizing the transaction.
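Referring back to the redundancy and RAID entries above, here is a small C sketch, my illustration rather than anything from the glossary, of XOR parity as used in RAID-3/4/5: the parity block is the bitwise XOR of the data blocks, so the block of any single failed disk can be rebuilt from the parity and the surviving disks. The disk count and block size below are arbitrary choices.

    #include <stdint.h>
    #include <string.h>
    #include <assert.h>

    #define NDISKS  4        /* data disks; arbitrary for illustration */
    #define BLOCK   512      /* bytes per disk block */

    /* Compute the parity block: XOR of the corresponding bytes of every
       data block (the one extra block of redundancy). */
    static void compute_parity(uint8_t data[NDISKS][BLOCK], uint8_t parity[BLOCK])
    {
        memset(parity, 0, BLOCK);
        for (int d = 0; d < NDISKS; d++)
            for (int i = 0; i < BLOCK; i++)
                parity[i] ^= data[d][i];
    }

    /* Rebuild the block of one failed disk by XORing the parity block
       with the blocks of all the surviving disks. */
    static void rebuild(uint8_t data[NDISKS][BLOCK], const uint8_t parity[BLOCK],
                        int failed)
    {
        memcpy(data[failed], parity, BLOCK);
        for (int d = 0; d < NDISKS; d++)
            if (d != failed)
                for (int i = 0; i < BLOCK; i++)
                    data[failed][i] ^= data[d][i];
    }

    int main(void)
    {
        uint8_t data[NDISKS][BLOCK], parity[BLOCK], saved[BLOCK];

        for (int d = 0; d < NDISKS; d++)            /* make up some data */
            for (int i = 0; i < BLOCK; i++)
                data[d][i] = (uint8_t)(d * 31 + i);

        compute_parity(data, parity);

        memcpy(saved, data[2], BLOCK);              /* pretend disk 2 dies */
        memset(data[2], 0, BLOCK);
        rebuild(data, parity, 2);

        assert(memcmp(saved, data[2], BLOCK) == 0); /* recovered exactly */
        return 0;
    }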
fundamental mode: asynchronous operation of a sequential machine wherein at most one input signal changes its value at a time.
pipelining: the assembly-lining of instruction processing. Each instruction goes through multiple steps (stages), from fetch to retirement/completion of the instruction.
pipeline stages: the set of steps each instruction goes through in the pipeline.
pipeline bubble: a hole in the pipeline, caused by the inability of instructions to flow through the pipeline. For example, following a conditional branch, it may not be possible to know which instruction to fetch next; this causes a pipeline bubble. A load followed immediately by a use of the loaded value can also cause a pipeline bubble.
branch misprediction penalty: the number of cycles wasted due to a branch misprediction, where everything speculatively fetched must be thrown away before we can start down the correct instruction path.
data dependency: one of the three data dependencies: flow, anti, output.
flow dependency: often called a read-after-write (RAW) hazard. Example: a + (b * c). Before we can do the add, we must do the multiply. That is, the result of the multiply must be WRITTEN before it can be READ as a source for the add.
anti-dependency: using a register to store more than one value. Before we write the new value to that register, all instructions which need to read the old value have to have read it. Sometimes called a write-after-read (WAR) hazard.
output dependency: values written to the same register must be written in program order. Sometimes called a write-after-write (WAW) hazard.
control dependency: a dependency caused by control instructions, such as branches. Which instruction is executed next depends on which way a branch goes; that is an example of a control dependency.
data forwarding: information is sent from where it is produced to where it is needed without first going to a register. Speeds up processing.
sequential programming model: instructions are expected to execute in the order specified by the programmer.
stale data: data that was stored in a location, but the location has since been reassigned, so the old data is no longer valid for new instructions that come along.
in-order execution: a microarchitecture that executes instructions in program order.
scoreboard: a hardware interlock that prevents an instruction from sourcing stale data.
out-of-order execution: a microarchitecture that executes instructions when they are ready for execution, regardless of their order in the program.
speculative execution: activity that is carried out before the hardware knows that it must (mandatorily) be carried out. For example, instructions fetched as a result of a branch prediction.
advantages/disadvantages of condition codes
branch prediction: rather than wait for the condition code upon which a branch instruction is based to be resolved (taken or not taken), creating a hole in the pipeline, the hardware guesses the outcome and fetches based on that guess. The guess is called a branch prediction.
always taken/always not taken: the prediction is determined by the designers of the chip.
compile-time predictor: the guess is determined by the compiler, usually as a result of profiling.
run-time predictor: the guess is determined by the hardware on the basis of what has been happening during the running of the program.
BTFN: guess taken if the branch is to an earlier location in the code, not taken if the branch is to a location later in the program. This is consistent with the behavior of for and while constructs in high level languages. Best example of a compile-time predictor. (A one-line C sketch of this rule appears after this group of terms.)
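As promised under BTFN, a one-line illustration of the rule, assuming program addresses are just unsigned integers standing in for the program counter: a backward branch (target below the branch's own address) is predicted taken, a forward branch predicted not taken.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* BTFN: Backward Taken, Forward Not taken.  Backward branches usually
       close for/while loops, so guessing "taken" for them is usually right. */
    static bool btfn_predict_taken(uint32_t branch_pc, uint32_t target_pc)
    {
        return target_pc < branch_pc;   /* backward branch => predict taken */
    }

    int main(void)
    {
        /* Loop-closing branch at 0x1010 back to 0x1000: predict taken (1). */
        printf("%d\n", btfn_predict_taken(0x1010, 0x1000));
        /* Forward branch skipping ahead: predict not taken (0). */
        printf("%d\n", btfn_predict_taken(0x1010, 0x1040));
        return 0;
    }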
last time predictor: predict that whatever the branch did last time, it will do this time.
2-bit saturating counter: advance the counter depending on what the branch does; predict according to the current high bit of the counter. (A small C sketch appears after this group of terms.)
saturating arithmetic: arithmetic which does not wrap around. That is, the highest value serves the role of infinity; i.e., infinity + 5 = infinity, therefore highest value + 5 = highest value.
register renaming: also called aliasing. Assigning unique register names to independent uses of the same register.
reservation station: a storage location for an instruction awaiting its operands before it can execute.
Tomasulo's algorithm: the algorithm which assigns register aliases, allowing an instruction to move to a reservation station while maintaining all producer/consumer relationships. This maintenance of the producer/consumer relationships is the most important single thing needed to allow instructions to actually execute out of order.
tag: another word for the alias or distinct identifier assigned to each instance of a register.
in-order retirement: retirement of instructions occurs in the order of their location in the program. Necessary to obey the sequential programming model.
register alias table (RAT): the table of aliases used by the Tomasulo algorithm to assign tags. Each register has a valid bit, a tag, and a value. Depending on the valid bit, either the tag or the value is correct.
common data bus (CDB): in the data path of the IBM 360/91, the bus used to distribute values and their corresponding tags to the registers in the RAT and to the reservation stations that are waiting for the value uniquely assigned to that tag.
scheduling algorithm: an algorithm used to decide which of several ready instructions should be executed next.
fire when ready: schedule the instruction for execution (that is, send it to a functional unit) when all the data that it needs is available.
data flow: the paradigm wherein instructions are executed when their data is available, rather than when they show up in program order.
data flow graph: a directed graph showing all the instructions as nodes and the producer/consumer relationships as arcs.
nodes: instructions.
edges: or arcs. Each shows a result produced by one instruction and sourced by another.
producer: the instruction producing the result that is sent to various consumers.
consumer: an instruction that requires the result as a source.
restricted data flow: data flow in the Tomasulo sense, wherein only that subset of the data flow graph that corresponds to instructions that have been decoded but not yet retired is maintained. I coined the term in 1984 to differentiate it from the data flow graph of the entire program.
vectorizable: a loop in which each iteration is completely independent of the other iterations.
vector processing: the paradigm we discussed in class for handling vectorizable loops.
SIMD: single instruction stream, multiple data sets. Sometimes called data parallel. Two forms: vector processors and array processors.
vector load: multiple values loaded as a result of a single load instruction.
vector register: a register consisting of multiple components, useful for storing the result of a vector load, vector add, etc.
vector length register: identifies how many of the components of a vector register, or vector operation, are operative.
row-major order: an array stored in memory in the order: first the components of the first row, then the components of the second row, etc.
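The 2-bit saturating counter entry above can be made concrete with a short sketch, again my illustration: the counter counts up on a taken branch and down on a not-taken branch, saturating at 0 and 3, and the prediction is the counter's high bit. A real run-time predictor keeps a table of such counters indexed by bits of the branch address.

    #include <stdbool.h>
    #include <stdio.h>

    /* One 2-bit saturating counter per branch.
       States 0,1 = predict not taken; states 2,3 = predict taken. */
    typedef unsigned char counter2_t;

    static bool predict(counter2_t c)
    {
        return (c & 0x2) != 0;                    /* high bit of the counter */
    }

    static counter2_t update(counter2_t c, bool taken)
    {
        if (taken)  return (c < 3) ? c + 1 : 3;   /* saturate at 3 */
        else        return (c > 0) ? c - 1 : 0;   /* saturate at 0 */
    }

    int main(void)
    {
        counter2_t c = 1;                         /* start weakly not-taken */
        bool outcomes[] = { true, true, false, true, true, true };

        for (int i = 0; i < 6; i++) {
            printf("predict %s, actual %s\n",
                   predict(c) ? "taken" : "not taken",
                   outcomes[i] ? "taken" : "not taken");
            c = update(c, outcomes[i]);
        }
        return 0;
    }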
column-major order: an array stored in memory in the order: first the components of the first column, then the second column, etc.
stride: the distance in memory between the locations whose values go into successive components of a vector register.
vector stride register: register containing the current stride value.
vector chaining: instruction processing wherein the results of one vector operation can be forwarded to the next vector operation without waiting for all components of the first vector instruction to be completed.
scalar computer: what you are used to. No vector registers, no vector loads, stores, adds, etc.
reciprocal approximate: a quick way to do division. a divided by b is replaced by a times (1 divided by b). The reciprocal approximate unit takes b as input and produces an approximation to 1 divided by b.
Backup registers: Cray program-controlled cache, sort of. The eight S registers are backed up by 64 T registers. Instead of going to memory again and again, one can do a vector load to the T registers.
Software managed cache: a tongue-in-cheek description of the Cray T registers.
Loop buffers: in the Cray machines, an instruction buffer containing enough storage to accommodate an entire loop.
Stripmining: the process of handling more than 64 iterations of a loop, 64 at a time.
Fixed point arithmetic: the generalization of integer arithmetic. The decimal point (or binary point) is always in the same place; in the case of integer arithmetic, at the far right. No bits of a data element are wasted identifying where the binary point is.
2's complement: one representation of integers.
1's complement: ditto.
signed-magnitude: ditto.
word length: the size of the ALU, the size of the registers, the default value of sizes.
short integer: an integer of the size of the word length.
long integer: an integer whose size is a multiple of the word length. Not supported directly by the ISA; however, software can be written to deal with long integers. Example in class: with a word length of 32 bits, the x86 can operate on 320-bit integers by means of a procedure that processes the values 32 bits at a time. Takes a loop that will iterate 10 times.
Binary Coded Decimal (BCD): an encoding of integers such that each decimal digit is encoded in 4 bits. The number of digits is usually arbitrary, which means one needs two pieces of information to specify a BCD number: starting address and number of digits.
Shift and Add multiplier: classical way to do binary multiplication. Since multiplying by 1 simply means adding in the multiplicand, and multiplying by 0 means adding zero, both after aligning (shifting) properly, multiplication is usually described as a series of shifts and adds, one for each bit in the multiplier. (A small C sketch follows.)
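Finally, a small sketch of the shift-and-add multiplier just described, my illustration rather than any particular machine's datapath: for each bit of the multiplier, add the appropriately shifted multiplicand into the product if that bit is 1.

    #include <stdint.h>
    #include <assert.h>

    /* Shift-and-add multiplication of two 32-bit unsigned values, producing
       a 64-bit product: one add (or not) per multiplier bit. */
    static uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier)
    {
        uint64_t product = 0;
        uint64_t shifted = multiplicand;       /* multiplicand aligned to bit 0 */

        for (int bit = 0; bit < 32; bit++) {
            if (multiplier & (1u << bit))      /* multiplier bit is 1: add */
                product += shifted;
            shifted <<= 1;                     /* align with the next bit */
        }
        return product;
    }

    int main(void)
    {
        assert(shift_add_multiply(6, 7) == 42);
        assert(shift_add_multiply(0xFFFFFFFFu, 0xFFFFFFFFu)
               == (uint64_t)0xFFFFFFFFu * 0xFFFFFFFFu);
        return 0;
    }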