DeadmeatGA
http://scv.bu.edu/SCV/Archive/IBM/BGL-BU.pdf
So the keyword is simple, inexpensive and low power. I doubt IBM would design a complex, super-expensive, and power-hungry chip for SCEI...
The Problem: A Hardware Perspective (Page 3)
- The current approach to large systems is to build clusters of large SMPs (NEC Earth Simulator, ASCI machines, Linux clusters)
- SMPs are typically designed for something else (sweet spot of market) and then combined "unnaturally"
- Very expensive switches for high performance
- Very high electrical power consumption: low computing power density
- Significant amount of resources (particularly in memory hierarchy) devoted to improving single-thread performance
- Would like a more modular/cellular approach, with a simple building block (or cell) that can be replicated ad infinitum as necessary; aggregate performance is important
- Cell should have low power/high density characteristics, and should also be cheap to build and easy to interconnect
Approach: Cellular Systems Architecture (Page 5)
- A homogeneous collection of simple independent processing units called cells, each with its own operating system image
- All cells have the same computational and communications capabilities (interchangeable from application or OS view)
- Integrated connection hardware provides a straightforward path to scalable systems with thousands/millions of cells
- Challenges:
  - Programmability (particularly for performance)
  - System management
  - Fault tolerance and high availability
  - Mapping of computations to cells
BlueGene/L Fundamentals (Page 14)
- A large number of nodes (65,536)
- Low-power nodes for density
- High floating-point performance
- System-on-a-chip technology
- Nodes interconnected as a 64x32x32 three-dimensional torus
- Easy to build large systems, as each node connects only to its six nearest neighbors; full routing in hardware
- Cross-section bandwidth per node is proportional to n^2/n^3 = 1/n (for an n x n x n torus)
- Auxiliary networks for I/O and global operations
- Applications consist of multiple processes with message passing
- Strictly one process/node
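A minimal Python sketch of the torus addressing described above. The helper names are hypothetical, and numbering nodes in row-major order (z varying fastest) is an assumption; the slides don't say how BlueGene/L ranks map to coordinates. It computes the six wrap-around neighbors of a node and counts the links severed when the torus is cut in half, which is where the n^2/n^3 scaling comes from: halving an n x n x n torus cuts 2n^2 links, shared among n^3 nodes.

```python
# Sketch of BlueGene/L-style torus addressing. Hypothetical helper names;
# row-major rank numbering is an assumption, not taken from the slides.
from math import prod
from typing import List, Tuple

Coord = Tuple[int, int, int]
DIMS: Coord = (64, 32, 32)  # torus extents quoted in the slides


def rank_to_coord(rank: int, dims: Coord = DIMS) -> Coord:
    """Linear node rank -> (x, y, z), with z varying fastest (assumed)."""
    _, ny, nz = dims
    return (rank // (ny * nz), (rank // nz) % ny, rank % nz)


def coord_to_rank(c: Coord, dims: Coord = DIMS) -> int:
    """Inverse of rank_to_coord."""
    _, ny, nz = dims
    x, y, z = c
    return (x * ny + y) * nz + z


def neighbors(c: Coord, dims: Coord = DIMS) -> List[Coord]:
    """The six nearest neighbors: one step each way along each axis,
    wrapping around at the edges (the wrap-around makes it a torus)."""
    result: List[Coord] = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(c)
            n[axis] = (n[axis] + step) % dims[axis]
            result.append((n[0], n[1], n[2]))
    return result


def bisection_links(dims: Coord = DIMS) -> int:
    """Links severed by a cut perpendicular to the longest axis, the
    narrowest way to halve the torus.

    The cut crosses prod(dims) / max(dims) rings along that axis, and each
    ring is severed twice (once in the middle, once at the wrap-around).
    """
    return 2 * prod(dims) // max(dims)


if __name__ == "__main__":
    print(prod(DIMS))         # 65536 nodes
    print(neighbors((0, 0, 0)))
    print(bisection_links())  # 2048 links for 64x32x32
    # Per-node cross-section bandwidth shrinks like 1/n for an n x n x n torus.
    for n in (8, 16, 32):
        print(n, bisection_links((n, n, n)) / n**3)  # 2n^2 / n^3 = 2/n
```

Because each node only ever talks to the six coordinates returned by `neighbors`, the wiring per node stays constant as the machine grows; what does not stay constant is the share of bisection bandwidth each node gets.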
BlueGene/L Fundamentals (continued)
- Machine should be dedicated to execution of applications, not system management
  - Avoid asynchronous events (e.g., daemons, interrupts)
  - Avoid complex operations on compute nodes
- The "I/O node": an offload engine
  - System management functions are performed in an (N+1)th node
  - I/O (and other complex operations) are shipped from compute node to I/O node for execution
  - Number of I/O nodes adjustable to needs (N=64 for BG/L)
  - This separation between application and system functions allows compute nodes to focus on application execution
- Communication to the I/O nodes must be through a separate tree interconnection network to avoid polluting the torus
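The function-shipping idea in these slides can be modeled in a few lines of Python. This is a hypothetical sketch, not IBM's implementation: the class and method names are invented, an in-memory dictionary stands in for a real file system, a direct method call stands in for the tree network, and grouping consecutive compute ranks onto one I/O node is an assumption. With N=64, a 65,536-node machine would need 65,536 / 64 = 1,024 I/O nodes.

```python
# Hypothetical model of function shipping: compute nodes do no I/O themselves;
# they package each call as a request and hand it to their I/O node.
from dataclasses import dataclass, field
from typing import Dict, List

N = 64  # compute nodes per I/O node, per the slides


def io_node_for(compute_rank: int, n: int = N) -> int:
    """Assumed grouping: consecutive blocks of n compute ranks share an I/O node."""
    return compute_rank // n


@dataclass
class IORequest:
    source: int        # rank of the compute node that issued the call
    op: str            # "write" or "size" in this toy model
    path: str
    data: bytes = b""


@dataclass
class IONode:
    ident: int
    # Stand-in for a real file system mounted on the I/O node.
    files: Dict[str, bytearray] = field(default_factory=dict)

    def execute(self, req: IORequest) -> int:
        """Run the shipped call locally and return a byte count."""
        if req.op == "write":
            buf = self.files.setdefault(req.path, bytearray())
            buf.extend(req.data)
            return len(req.data)
        if req.op == "size":
            return len(self.files.get(req.path, b""))
        raise ValueError(f"unsupported op {req.op!r}")


class ComputeNode:
    def __init__(self, rank: int, io_nodes: List[IONode]):
        self.rank = rank
        # No local file state: the node only knows which I/O node serves it.
        self.io_node = io_nodes[io_node_for(rank)]

    def write(self, path: str, data: bytes) -> int:
        # On the real machine this request would travel over the separate
        # tree network; here it is simply a method call.
        return self.io_node.execute(IORequest(self.rank, "write", path, data))


if __name__ == "__main__":
    print(65536 // N)  # 1024 I/O nodes for 65,536 compute nodes at N=64
    ios = [IONode(i) for i in range(2)]
    ComputeNode(5, ios).write("out.log", b"hello")   # served by I/O node 0
    ComputeNode(70, ios).write("out.log", b"world")  # served by I/O node 1
    print(ios[0].files, ios[1].files)
```

The point of the design shows up in `ComputeNode`: it holds no file-system state and runs no daemons, so the only work it does for I/O is building a request, which matches the goal of keeping compute nodes free of complex operations.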