A student writes:

	Dr. Patt,

	While you were going over the omega network, you asked us 
	to design different omega networks for n=64 nodes.  I did not 
	understand what you were asking.  Could you please explain it again.

	Thank you
	<<name withheld to protect ...>> 

Sure.  I guess that did get covered pretty fast.  Aater has already asked me to write up 
the stuff I put on the board covering the cost, contention and latency for all the interconnection 
structures we went over.  And, since 
Aater is the boss...   I will hand that out on Monday and answer any questions
before we get to cache coherency.  

With respect to omega networks, recall it is the "half-way" house between a bus and a full crossbar.  
Recall it is an array of cross-bar switches that allows you to get from any node on the left to any
node on the right by going through logk n columns of k by k crossbar switches.  The crux is that with
a k by k switch, you can go to k different places with one switch.  If the output of that switch goes 
to another k by k switch, you can get to k^2 places after two switches.  If the output of the second 
switch goes to another k by k switch, you can get to k^3 places after three switches.  Each column 
of k by k switches provides such paths for every node on the left (at the input).

In class, we showed the case n=8, and k=2.  To get to all 8 nodes with 2x2 switches, we need three 
columns of switches, since 2^3 = 8.

Suppose n=16, and k=4.  Then, to get to all 16 nodes, we only need two columns, since 4^2 = 16.  
I have constructed the network below, leaving you to "connect most of the dots" since my ascii terminal 
would make a mess trying.


             ---               ---
         ---|   |----a1--a1---|   |----
         ---|4x4|----a2  b1---|   |----
         ---|   |----a3  c1---|   |----
         ---|   |----a4  d1---|   |----
             ---               ---

             ---               ---
         ---|   |----b1  a2---|   |----
         ---|   |----b2--b2---|   |----
         ---|   |----b3  c2---|   |----
         ---|   |----b4  d2---|   |----
             ---               ---

             ---               ---
         ---|   |----c1  a3---|   |----
         ---|   |----c2  b3---|   |----
         ---|   |----c3--c3---|   |----
         ---|   |----c4  d3---|   |----
             ---               ---

             ---               ---
         ---|   |----d1  a4---|   |----
         ---|   |----d2  b4---|   |----
         ---|   |----d3  c4---|   |----
         ---|   |----d4--d4---|   |----
             ---               ---
 
What I asked you to do is compute, for n=64, what the cost and latency would be using 2x2, 4x4, 
8x8 and 64x64 crossbars.

Latency is simply the number of columns, since it takes 1 time unit to get through one k by k switch.

Cost is k^2 for each switch times the number of switches.

OK?

Hope you are enjoying the Thanksgiving weekend.

Yale Patt

ps. Anticipating another question, I will also write up the equations and example for S, E, U, 
and R that we covered in class on Monday.