interrupts: external events that cause the processor to stop processing the current process, and after putting the machine in a consistent state, vector to the starting address of the service routine specified by the corresponding vector.
exceptions: events that are internal to the running process, such as an illegal opcode, page fault, floating point underflow, etc., that cause the processor to stop processing the current program and deal with the exceptional event. Like interrupts, it is necessary to first put the machine in a consistent state and then vector to the starting address of the exception service routine. The service routine that is executed depends on the exception which has occurred. Each exception (or set of exceptions) has its own vector to tell the machine which service routine to execute.
service routine (handler): see above.
interrupt vector: the bit pattern identifying the particular interrupt, which is used to initiate the correct service routine. (A small C sketch of a vector table appears after this group of terms.)
exception vector: see interrupt vector.
consistent state: the state of the machine between the execution of instructions. In the middle of execution of an instruction, the internal state of the processor can be whatever the microarchitecture wishes. Even general purpose registers can be clobbered if the microarchitecture wishes. BUT, at the end of each instruction, the contract with the software requires that the state of the machine be recoverable to the state produced by the program up to that point. That point is the consistent state.
power failure: loss of power, causing a very high priority interrupt.
machine check: the highest priority interrupt, indicating that the processor is generating bogus information. Cause: a hardware malfunction.
sticky bit: a condition code that, once set, retains its value until interrogated under program control.
interrupt/exception priority level: the priority at which an event requests interruption of the running program. Usually also the priority at which the service routine runs. Priority depends on the urgency of the event.
maskable: the ability to ignore an event so as to defer handling it, or even deny handling altogether.
medium: the material containing the information -- the hole in a punched card, the magnetic material on the disk, etc.
device: the I/O unit that houses all the stuff that causes the information to be stored and transformed from one form to another. A disk drive, for example.
transducer: the thing that transduces information from one medium to another. From light passing through a punched card to electrical signals. From the magnetized material on a disk track to electrical signals.
polling: I/O activity controlled by the processor executing its program; the processor repeatedly checks the device's status to learn when the device needs service.
interrupt driven I/O: I/O activity controlled by the device requesting that the processor interrupt what it is doing.
DMA: direct memory access. Allows memory and the disk (for example) to transfer information without tying up the processor.
I/O processor: a special purpose processor that manages I/O activity, including doing some processing on the information being transferred.
dedicated bus: a bus dedicated to a single purpose.
platter: colloquial word for one of the surfaces of a disk.
track: on a platter, a complete circle containing bits of information.
areal density: the number of bits per unit length along a track.
disk head: the part of the mechanical arm that contains the ability to read the magnetic material stored on the track, or write to the track.
cylinder: the set of tracks, one per surface, that the heads can access without moving the arm.
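Referring back to the interrupt vector and service routine entries above, here is a minimal C sketch, not from the original glossary, of what "vectoring" means: the vector number indexes a table whose entries are the starting addresses of the service routines. The vector numbers and handler names are invented for illustration; a real machine performs the lookup in hardware or microcode, so dispatch() below only stands in for that step.

    #include <stdio.h>

    #define NUM_VECTORS 256

    /* A service routine (handler) is just a routine the machine transfers
       control to after putting itself in a consistent state. */
    typedef void (*service_routine_t)(void);

    static void illegal_opcode_handler(void) { puts("illegal opcode"); }
    static void page_fault_handler(void)     { puts("page fault"); }
    static void keyboard_handler(void)       { puts("keyboard interrupt"); }

    /* The vector table: entry i holds the starting address of the service
       routine for vector number i.  These vector numbers are arbitrary. */
    static service_routine_t vector_table[NUM_VECTORS] = { 0 };

    static void dispatch(unsigned vector)
    {
        /* "Vectoring": use the vector as an index, fetch the starting
           address, and start executing there. */
        if (vector < NUM_VECTORS && vector_table[vector] != NULL)
            vector_table[vector]();
    }

    int main(void)
    {
        vector_table[0x01] = illegal_opcode_handler;
        vector_table[0x02] = page_fault_handler;
        vector_table[0x80] = keyboard_handler;

        dispatch(0x02);   /* simulate a page fault exception */
        dispatch(0x80);   /* simulate a keyboard interrupt   */
        return 0;
    }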
seek time: the time it takes the head to get to the desired track to read or write the information.
rotation time: the time it takes the correct starting point on the track to come under the head where it can be read/written.
disk crash: when the head actually hits the spinning platter ...and scores the surface, usually. BAD.
disk block: the unit of storage accessed from the disk. 512 bytes, for example.
redundancy: the concept of using more than n bits to store n bits of information in order to recover in case one or more of the bits get corrupted. Parity provides one extra bit of redundancy. ECC provides on the order of log2 n + 1 extra bits of redundancy.
RAID-0: striped disk array with no redundancy.
mirroring: aka RAID-1. Two disks to store the information of one. Each bit has its mirror bit.
striping: interleaving, when applied to disks.
RAID-1: see mirroring.
RAID-2: array with ECC coding across the disks.
RAID-3: array with parity coding. All parity bits on the same disk.
RAID-4: array with larger units of interleaving, so an entire file can be accessed on the same disk.
RAID-5: RAID-4, except the parity blocks are stored across all the disks, getting rid of the hot spot of a dedicated parity disk. (A small C sketch of XOR parity appears after this group of terms.)
data lines: lines containing data bits.
address lines: ditto, except address bits.
control lines: ditto, except control bits.
multiplexed bus: a bus whose lines carry address bits at some times and data bits at other times.
pending bus: a bus with the property that once the bus is grabbed for a transaction, the bus is tied up (often twiddling its thumbs) until the transaction has completed.
split-transaction bus: a bus that, instead of twiddling its thumbs, is released to allow other transactions to use it while the first transaction waits for its second half.
pipelined bus: a split-transaction bus where the second halves of transactions occur in the same order as their first halves.
synchronous bus: a clocked bus.
asynchronous bus: no clock; bus cycles take as long as they need to take. Since there is no clock, handshaking is required.
handshaking: the interaction that starts off each bus cycle, required since there is no clock.
arbitration: the act of determining who gets the bus for the next bus cycle.
central arbitration: arbitration done by one central arbiter (e.g., a PAU).
distributed arbitration: each device controller has enough information to know whether or not it gets the bus the next cycle.
priority arbitration unit (PAU): the central arbiter.
bus cycle: one unit of bus transaction time.
bus master: the device controller in charge of the current bus cycle.
slave: the other controller that participates in the bus transaction, under the control of the bus master.
vertical priority: priority determined by the particular BR line.
horizontal priority: priority determined by proximity to the PAU, among all controllers having the same vertical priority.
device controller: that which interfaces the device to the bus and manages all accesses to/from the device.
bus request: signal from a controller requesting a bus cycle.
bus grant: signal from the PAU granting the bus.
daisy chaining: mechanism by which all controllers with the same vertical priority receive the bus grant in order, each passing it on to the next if it does not want it.
burst mode: successive data transfers in the same bus cycle.
SACK: signal output by the bus grantee signifying it will be the next bus master.
MSYN: signal used by the bus master in synchronizing the transaction.
SSYN: signal used by the slave in synchronizing the transaction.
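Referring back to the redundancy and RAID entries above, here is a small C sketch, my illustration rather than anything from the glossary, of XOR parity as used in RAID-3/4/5: the parity block is the bitwise XOR of the data blocks, so the block of any single failed disk can be rebuilt from the parity and the surviving disks. The disk count and block size below are arbitrary choices.

    #include <stdint.h>
    #include <string.h>
    #include <assert.h>

    #define NDISKS  4        /* data disks; arbitrary for illustration */
    #define BLOCK   512      /* bytes per disk block */

    /* Compute the parity block: XOR of the corresponding bytes of every
       data block (the one extra block of redundancy). */
    static void compute_parity(uint8_t data[NDISKS][BLOCK], uint8_t parity[BLOCK])
    {
        memset(parity, 0, BLOCK);
        for (int d = 0; d < NDISKS; d++)
            for (int i = 0; i < BLOCK; i++)
                parity[i] ^= data[d][i];
    }

    /* Rebuild the block of one failed disk by XORing the parity block
       with the blocks of all the surviving disks. */
    static void rebuild(uint8_t data[NDISKS][BLOCK], const uint8_t parity[BLOCK],
                        int failed)
    {
        memcpy(data[failed], parity, BLOCK);
        for (int d = 0; d < NDISKS; d++)
            if (d != failed)
                for (int i = 0; i < BLOCK; i++)
                    data[failed][i] ^= data[d][i];
    }

    int main(void)
    {
        uint8_t data[NDISKS][BLOCK], parity[BLOCK], saved[BLOCK];

        for (int d = 0; d < NDISKS; d++)            /* make up some data */
            for (int i = 0; i < BLOCK; i++)
                data[d][i] = (uint8_t)(d * 31 + i);

        compute_parity(data, parity);

        memcpy(saved, data[2], BLOCK);              /* pretend disk 2 dies */
        memset(data[2], 0, BLOCK);
        rebuild(data, parity, 2);

        assert(memcmp(saved, data[2], BLOCK) == 0); /* recovered exactly */
        return 0;
    }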
fundamental mode: asynchronous operation of a sequential machine wherein at most one input signal changes its value at a time.
pipelining: the assembly-lining of instruction processing. Each instruction goes through multiple steps (stages), from fetch to retirement/completion of the instruction.
pipeline stages: the set of steps each instruction goes through in the pipeline.
pipeline bubble: a hole in the pipeline, caused by the inability of instructions to flow through the pipeline. For example, following a conditional branch, it may not be possible to know which instruction to fetch next; this causes a pipeline bubble. A load followed immediately by a use of the loaded value can also cause a pipeline bubble.
branch misprediction penalty: the number of cycles wasted due to a branch misprediction, where everything speculatively fetched must be thrown away before we can start down the correct instruction path.
data dependency: one of the three data dependencies: flow, anti, output.
flow dependency: often called a read-after-write (RAW) hazard. Example: a + (b * c). Before we can do the add, we must do the multiply. That is, the result of the multiply must be WRITTEN before it can be READ as a source for the add.
anti-dependency: using a register to store more than one value. Before we write the new value to that register, all instructions which need to read the old value have to have read it. Sometimes called a write-after-read (WAR) hazard.
output dependency: values written to the same register must be written in program order. Sometimes called a write-after-write (WAW) hazard.
control dependency: a dependency caused by control instructions, such as branches. Which instruction is executed next depends on which way a branch goes; that is an example of a control dependency.
data forwarding: information is sent from where it is produced to where it is needed without first going to a register. Speeds up processing.
sequential programming model: instructions are expected to execute in the order specified by the programmer.
stale data: data that was stored in a location, but the location has since been reassigned, so the old data is no longer valid for new instructions that come along.
in-order execution: a microarchitecture that executes instructions in program order.
scoreboard: a hardware interlock that prevents an instruction from sourcing stale data.
out-of-order execution: a microarchitecture that executes instructions when they are ready for execution, regardless of their order in the program.
speculative execution: activity that is carried out before the hardware knows that it must (mandatorily) be carried out. For example, instructions fetched as a result of a branch prediction.
advantages/disadvantages of condition codes
branch prediction: rather than wait for the condition code upon which a branch instruction is based to be resolved (taken or not taken), creating a hole in the pipeline, the hardware guesses the outcome and fetches based on that guess. The guess is called a branch prediction.
always taken/always not taken: the prediction is determined by the designers of the chip.
compile-time predictor: the guess is determined by the compiler, usually as a result of profiling.
run-time predictor: the guess is determined by the hardware on the basis of what has been happening during the running of the program.
BTFN: guess taken if the branch is to an earlier location in the code, not taken if the branch is to a location later in the program. This is consistent with the behavior of for and while constructs in high level languages. Best example of a compile-time predictor. (A one-line C sketch of this rule appears after this group of terms.)
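As promised under BTFN, a one-line illustration of the rule, assuming program addresses are just unsigned integers standing in for the program counter: a backward branch (target below the branch's own address) is predicted taken, a forward branch predicted not taken.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* BTFN: Backward Taken, Forward Not taken.  Backward branches usually
       close for/while loops, so guessing "taken" for them is usually right. */
    static bool btfn_predict_taken(uint32_t branch_pc, uint32_t target_pc)
    {
        return target_pc < branch_pc;   /* backward branch => predict taken */
    }

    int main(void)
    {
        /* Loop-closing branch at 0x1010 back to 0x1000: predict taken (1). */
        printf("%d\n", btfn_predict_taken(0x1010, 0x1000));
        /* Forward branch skipping ahead: predict not taken (0). */
        printf("%d\n", btfn_predict_taken(0x1010, 0x1040));
        return 0;
    }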
last time predictor: predict that whatever the branch did last time, it will do this time.
2-bit saturating counter: advance the counter depending on what the branch does; predict according to the current high bit of the counter. (A small C sketch appears after this group of terms.)
saturating arithmetic: arithmetic which does not wrap around. That is, the highest value serves the role of infinity; i.e., infinity + 5 = infinity, therefore highest value + 5 = highest value.
register renaming: also called aliasing. Assigning unique register names to independent uses of the same register.
reservation station: a storage location for an instruction awaiting its operands before it can execute.
Tomasulo's algorithm: the algorithm which assigns register aliases, allowing an instruction to move to a reservation station while maintaining all producer/consumer relationships. This maintenance of the producer/consumer relationships is the most important single thing needed to allow instructions to actually execute out of order.
tag: another word for the alias or distinct identifier assigned to each instance of a register.
in-order retirement: retirement of instructions occurs in the order of their location in the program. Necessary to obey the sequential programming model.
register alias table (RAT): the table of aliases used by the Tomasulo algorithm to assign tags. Each register has a valid bit, a tag, and a value. Depending on the valid bit, either the tag or the value is correct.
common data bus (CDB): in the data path of the IBM 360/91, the bus used to distribute values and their corresponding tags to the registers in the RAT and to the reservation stations that are waiting for the value uniquely assigned to that tag.
scheduling algorithm: an algorithm used to decide which of several ready instructions should be executed next.
fire when ready: schedule the instruction for execution (that is, send it to a functional unit) when all the data that it needs is available.
data flow: the paradigm wherein instructions are executed when their data is available, rather than when they show up in program order.
data flow graph: a directed graph showing all the instructions as nodes and the producer/consumer relationships as arcs.
nodes: instructions.
edges: or arcs. Each shows a result produced by one instruction and sourced by another.
producer: the instruction producing the result that is sent to various consumers.
consumer: an instruction that requires the result as a source.
restricted data flow: data flow in the Tomasulo sense, wherein only that subset of the data flow graph that corresponds to instructions that have been decoded but not yet retired is maintained. I coined the term in 1984 to differentiate it from the data flow graph of the entire program.
vectorizable: a loop in which each iteration is completely independent of the other iterations.
vector processing: the paradigm we discussed in class for handling vectorizable loops.
SIMD: single instruction stream, multiple data sets. Sometimes called data parallel. Two forms: vector processors and array processors.
vector load: multiple values loaded as a result of a single load instruction.
vector register: a register consisting of multiple components, useful for storing the result of a vector load, vector add, etc.
vector length register: identifies how many of the components of a vector register, or vector operation, are operative.
row-major order: an array stored in memory in the order: first the components of the first row, then the components of the second row, etc.
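The 2-bit saturating counter entry above can be made concrete with a short sketch, again my illustration: the counter counts up on a taken branch and down on a not-taken branch, saturating at 0 and 3, and the prediction is the counter's high bit. A real run-time predictor keeps a table of such counters indexed by bits of the branch address.

    #include <stdbool.h>
    #include <stdio.h>

    /* One 2-bit saturating counter per branch.
       States 0,1 = predict not taken; states 2,3 = predict taken. */
    typedef unsigned char counter2_t;

    static bool predict(counter2_t c)
    {
        return (c & 0x2) != 0;                    /* high bit of the counter */
    }

    static counter2_t update(counter2_t c, bool taken)
    {
        if (taken)  return (c < 3) ? c + 1 : 3;   /* saturate at 3 */
        else        return (c > 0) ? c - 1 : 0;   /* saturate at 0 */
    }

    int main(void)
    {
        counter2_t c = 1;                         /* start weakly not-taken */
        bool outcomes[] = { true, true, false, true, true, true };

        for (int i = 0; i < 6; i++) {
            printf("predict %s, actual %s\n",
                   predict(c) ? "taken" : "not taken",
                   outcomes[i] ? "taken" : "not taken");
            c = update(c, outcomes[i]);
        }
        return 0;
    }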
column-major order: an array stored in memory in the order: first the components of the first column, then the second column, etc.
stride: the distance in memory between the locations whose values go into successive components of a vector register.
vector stride register: register containing the current stride value.
vector chaining: instruction processing wherein the results of one vector operation can be forwarded to the next vector operation without waiting for all components of the first vector instruction to be completed.
scalar computer: what you are used to. No vector registers, no vector loads, stores, adds, etc.
reciprocal approximate: a quick way to do division. a divided by b is replaced by a times (1 divided by b). The reciprocal approximate unit takes b as input and produces an approximation to 1 divided by b.
Backup registers: Cray program-controlled cache, sort of. The eight S registers are backed up by 64 T registers. Instead of going to memory again and again, one can do a vector load to the T registers.
Software managed cache: a tongue-in-cheek description of the Cray T registers.
Loop buffers: in the Cray machines, an instruction buffer containing enough storage to accommodate an entire loop.
Stripmining: the process of handling more than 64 iterations of a loop, 64 at a time.
Fixed point arithmetic: the generalization of integer arithmetic. The decimal point (or binary point) is always in the same place; in the case of integer arithmetic, at the far right. No bits of a data element are wasted identifying where the binary point is.
2's complement: one representation of integers.
1's complement: ditto.
signed-magnitude: ditto.
word length: the size of the ALU, the size of the registers, the default value of sizes.
short integer: an integer of the size of the word length.
long integer: an integer whose size is a multiple of the word length. Not supported directly by the ISA; however, software can be written to deal with long integers. Example in class: with a word length of 32 bits, the x86 can operate on 320-bit integers by means of a procedure that processes the values 32 bits at a time. Takes a loop that will iterate 10 times.
Binary Coded Decimal (BCD): an encoding of integers such that each decimal digit is encoded in 4 bits. The number of digits is usually arbitrary, which means one needs two pieces of information to specify a BCD number: starting address and number of digits.
Shift and Add multiplier: classical way to do binary multiplication. Since multiplying by 1 simply means adding in the multiplicand, and multiplying by 0 means adding zero, both after aligning (shifting) properly, multiplication is usually described as a series of shifts and adds, one for each bit in the multiplier. (A small C sketch follows.)
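Finally, a small sketch of the shift-and-add multiplier just described, my illustration rather than any particular machine's datapath: for each bit of the multiplier, add the appropriately shifted multiplicand into the product if that bit is 1.

    #include <stdint.h>
    #include <assert.h>

    /* Shift-and-add multiplication of two 32-bit unsigned values, producing
       a 64-bit product: one add (or not) per multiplier bit. */
    static uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier)
    {
        uint64_t product = 0;
        uint64_t shifted = multiplicand;       /* multiplicand aligned to bit 0 */

        for (int bit = 0; bit < 32; bit++) {
            if (multiplier & (1u << bit))      /* multiplier bit is 1: add */
                product += shifted;
            shifted <<= 1;                     /* align with the next bit */
        }
        return product;
    }

    int main(void)
    {
        assert(shift_add_multiply(6, 7) == 42);
        assert(shift_add_multiply(0xFFFFFFFFu, 0xFFFFFFFFu)
               == (uint64_t)0xFFFFFFFFu * 0xFFFFFFFFu);
        return 0;
    }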