Byte Addressable
Little endian
Big endian
ISA
Microarchitecture
Addressability
Data types
Fixed point
Unary
Binary
Decimal
Virtual Machine
Science of Trade-offs
Fault tolerant computing
Adders: Ripple carry, lookahead carry
Fan-in
Design points
Transformation Hierarchy
Design for high performance, low power, availability, security
Caches - historically uArch now ISA
Touch/Prefetch/Poststore instructions
Determinism - Cray machines. S & T registers
Memory Hierarchy
Fast vs big
run time vs compile time
recompile at run-time
x86 loop construct
Probe instruction
Fixed length vs Variable length
Pipeline
Flynn's bottleneck
Fetch stage
Decode stage
Wimpy decoders
Bits per instruction
LD/ST Architectures
Uniform decode
Huffman encoding of istream
Three kinds of instructions: operate, data movement, and control
NAND is complete logic
Test & Set
Condition codes
Condition code - Advantages and Disadvantages
Steering bit
0-Address machine
stack machine
1-Address machine
Accumulator
2-Address machine
3-Address machine
Special I/O instructions
Memory mapped I/O
Multiple condition code
Compound predicate -- collapsing branch
More registers - Advantages and Disadvantages
Power aware computing
Precision architecture
EditPC (VAX)
Index instruction (VAX)
IEEE Double Extended
Capability based machine
Bit vector
Position independence
Interrupt latency
Realtime
MIPS
vectoring interrupts
Processor status register
RISC
live out
Register Spilling
Addressing Modes
Indirect addressing
PC relative addressing
Binning
Overlaying the address space
Performance equation
Clock skew
Wave-pipelining
Critical path
Speed path
Cycles per instruction
Bread & Butter design
Hardwired vs. Microprogrammed Control
ROM vs. PLA
Control Store
Diode
"The Refrigerator"
PSR
Privilege vs. Priority
Trace bit
Compatibility bit
1's catching property
SRAM vs. DRAM
Page mode
Row Address Strobe
Column Address Strobe
Granularity
Segment Registers
DRAM refresh
SECDED
ECC bits (Parity/Checksum)
Hamming Distance
Interleaving memory
Tristated outputs
Unaligned access
Virtual Memory
Interleaving for concurrency
Interleaving for hiding latency
Thrashing
Von Neumann Architecture
Harvard Architecture
Pages
Frames
Resident
Access Control
4-levels of access
Translation
Working Set
Balance Set
Page Fault
Faults vs. Traps
Page Table Entry (PTE)
Program Region
Control Region
Valid bit
Protection bits
Modified bit
Reference bit
Page replacement
Physical Frame Number
System Page Table
Process Page Table
Base Register
Length Register

--------------------Exam 1 -----------------------

Virtual Memory
==============
Pages
Frames
Resident
Access Control
4-levels of access
Translation
Working Set
Balance Set
Page Fault
Faults vs. Traps
Page Table Entry (PTE)
Program Region
Control Region
Valid bit
Protection bits
Modified bit
Reference bit
Page replacement
Physical Frame Number
System Page Table
Process Page Table
Base Register
Length Register
VAX
Two-level translation
PxBR / SBR
Page fault latency
TLB
Context switching
Segmentation

Cache Memory
=============
memory hierarchy
Access Latency
prefetching
post-storing
touch instruction
spatial locality
temporal locality
tag store
data store
dirty bit
reference bit
hit ratio
cache block
index bits
Tag bits
Tag store entry
direct mapped cache
Set
n-way set associative cache
fully associative cache
byte within block
content-addressable memory (CAM)
write back / write through
replacement algorithms
perfect LRU
pseudo LRU
FIFO
Random
Victim/Next Victim
Tree (Triangle) replacement
Instruction/data caches
Virtual/physical caches
Interprocess communication
Cold start problem
Runtime Activation Record
Context switch
Write Back/Write through
Virtually indexed physically tagged
Write allocate
Sector cache
Uniprocessor cache consistency
Intelligent I/O
Direct memory access
Inclusion property

Interrupt/Exception
===================
interrupt
exception
precise exception
fault
trap
difference between interrupts & exceptions
cause
when to handle
mask
context
priority
polling (interrupt) / spin-waiting
interrupt enable bit
Autoincrement
architectural state
internal state
interrupt[exception] vector
machine check
power failure
trace bit
sticky flag

I/O
===
asynchronous/synchronous
handshaking protocol
ethernet
polling/interrupt/controller
Backdoor Bus
DMA (Direct Memory Access)
I/O processor
Access method of I/O
CAM/RAM/Random access/direct access (DASD)/sequential access
Interrupt priority Level
I/O structure
medium/device/controller
transaction
transfer/arbitration
central/distributive arbitration
master/slave
Request
Grant
SACK
data/address/control
Multiplexing
pending bus / split-transaction bus
priority arbitration unit
daisy chain
race condition
Areal Density
Rotation
Seek Time
UniBus
Burst
Starvation
Urgent preempts less urgent
Master Synch
Slave Synch
RAID 0 - 5
Performance/Reliability/Capacity
RAID 1: Mirroring
RAID 2: ECC
RAID 3: Parity and Interleaving
RAID 4: Parity with a file per disk
RAID 5: Parity spread (rotated) across the disks

Processor Speedups
==================
SSI
MSI
LSI
VLSI
Bit Slice
Wafer
Fault tolerant computing
Triple modular redundancy
Fault avoidance
Single point of failure
Pipeline Register
Fetch
Decode
Execute
Speed up
Flow dependency
RAW Hazard
Vector Processor
Scalar Processor
Superscalar
Super pipelined
non-deterministic caches
Vector load
Vector multiply
Instruction buffer
Loop buffer
Setup time / Hold time
Scoreboard
Strip-mining
Semantic Gap
uOps
ROps (AMD)
Load Context Instruction (VAX)
Vectorizable or Parallelizable
Recurrence relation
Vector length register
Row major order
Stride
Vector chaining
Out-of-order execution
Data forwarding
Scoreboarding
Resource problem
Stall
Tomasulo's algorithm
Output dependency
Register renaming
Register alias table
Reservation station

-----------------Exam 2---------------------------------

Processor Speed-ups
===================
Scoreboarding
Interlock
MIPS
Renaming
Reservation Station (Node Table)
Dataflow
Register Alias Table
Unknown address problem (Memory Disambiguation problem)
Program Order
Precise Exception
Retire
Dataflow graph
Data-driven Execution
Disadvantages of Dataflow
Barrier Synchronizer
Safe vs Queue
Irregular Parallelism
Restricted Dataflow
Window (Instruction In-flight)
Copy instruction
Reorder Buffer
Architectural Registers

Branch Predictor
================
Speculative vs. Mandatory
Delayed Branch
Delay Slot
Branch Prediction
Squashing
Predicate
Predicated Execution
Predicate Registers
CMOV
Always Not Taken
Always Taken
Profiling
Representative
Last-Time Predictor
Prediction Accuracy
Hysteresis
Two-bit Counter
Saturating Addition
Two-level Predictor
Adaptive Predictor
Branch History Register
Pattern History Table
Interference
Equivalence Classes
GAg, PAp, SAs ...
Hybrid Predictor

Floating Point Arithmetic
=========================
Sign
Exponent
Fraction
Normalized Number
Mantissa (significand)
Fixed-Point
Integer
Integer Execution Unit
Floating Point Execution Unit
Double Precision / Single Precision
IEEE Floating Point Standards
Inexact
Excess Code
Bias
Radix
Rounding
Chop
Round-up
Round-down
Round-to-nearest
Binade
ULP
Infinity
Decade
Hexade
Wobble
Error due to rounding
Round-to-Even
Sticky bit
NaN (Not A Number)
Divide-by-zero
Overflow / Underflow
Gradual Underflow
Error due to underflow
Subnormal numbers

Multiprocessing
===============
Simultaneous Multi-Threading (SMT)
HEP
Concurrency
Interconnections
Cost / Contention / Latency
Bus
Crossbar
Omega Network
Mesh
Hypercube
Tree
X-Tree
Ring
Speed-up
Horner's rule
Parallelizability
Sequential Bottleneck
Maximum Speedup
Amdahl's Law
Redundancy
Efficiency
Utilization

Cache Coherency
================
Directory Scheme
Snoopy Cache
Goodman Scheme (Write Once)
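
Worked example for the "Amdahl's Law", "Sequential Bottleneck", "Maximum Speedup", and "Efficiency" terms in the Multiprocessing list above. This is only a minimal sketch for review purposes: the function names are hypothetical, and efficiency is assumed here to mean speedup divided by processor count, which may differ from the definition used in lecture.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup over one processor when `parallel_fraction` of the work
    is spread evenly over `processors` and the rest runs serially."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction  # the sequential bottleneck
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def max_speedup(parallel_fraction: float) -> float:
    """Limit of the speedup as the processor count grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


def efficiency(parallel_fraction: float, processors: int) -> float:
    """Assumed definition: speedup per processor."""
    return amdahl_speedup(parallel_fraction, processors) / processors


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, round(amdahl_speedup(0.9, n), 3), round(efficiency(0.9, n), 3))
    print("limit:", round(max_speedup(0.9), 3))
```

With 90% of the work parallelizable, 10 processors give a speedup of roughly 5.3, and no processor count can push it past 10, because the serial 10% remains.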