Stream Processor @ 45nm = 10 watts & over 1 TFLOP

Brimstone

Here it is in writing. It's at the end of the document.


Using the scaling techniques presented in this paper, by
2007, stream processors with 1280 ALUs will be able to
provide a peak performance of over 1 TFLOPs while dissipating
less than 10 Watts. This presents several exciting
challenges in stream processing. One area of future work is
architectural optimizations that will enable even higher area
and energy efficiency, such as utilizing non-fully-connected
crossbars for the intracluster and intercluster switches. Another
area of future work is to compare these two scaling
techniques compare to multiple stream processors on a single
chip simultaneously executing different kernels of one
stream program. As software tools for exploiting these two
techniques mature, the performance and cost advantages of
these and other scaling techniques can be explored.
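
A rough sanity check of the quoted peak figures, as a small Python sketch. The 1280 ALUs, 1 TFLOP and 10 W come from the quote (treated here as exact round numbers); the one-FLOP-per-ALU-per-cycle rate is my assumption, not something the quote states.

```python
# Back-of-the-envelope check of the quoted peak figures.
PEAK_FLOPS = 1.0e12   # "over 1 TFLOPs" in the quote, taken as exactly 1e12
NUM_ALUS = 1280       # from the quote
POWER_WATTS = 10.0    # "less than 10 Watts", taken as the upper bound

# Assumption (not from the quote): each ALU completes one FLOP per cycle.
FLOPS_PER_ALU_PER_CYCLE = 1


def min_clock_hz(peak_flops, num_alus, flops_per_alu_per_cycle):
    """Clock rate needed for the ALUs to reach the given peak together."""
    return peak_flops / (num_alus * flops_per_alu_per_cycle)


def flops_per_watt(peak_flops, watts):
    """Peak energy efficiency in FLOPS per watt."""
    return peak_flops / watts


clock = min_clock_hz(PEAK_FLOPS, NUM_ALUS, FLOPS_PER_ALU_PER_CYCLE)
efficiency = flops_per_watt(PEAK_FLOPS, POWER_WATTS)

print(f"Clock needed:    {clock / 1e6:.2f} MHz")
print(f"Peak efficiency: {efficiency / 1e9:.0f} GFLOPS per watt")
```

Under that assumption the ALUs would need to run at roughly 780 MHz, and the peak efficiency works out to about 100 GFLOPS per watt. These are peak numbers only; they say nothing about sustained throughput.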

ftp://cva.stanford.edu/pub/publications/khailany_im_scalability.pdf


If Sony does launch the PS3 at the 45 nm node, watch out!
 
I can't tell you whether the PS3 will be capable of 1 TFLOP or not, as ultimately I have no idea how fast the BE/CPU will clock, except > 1GHz. But IMHO, the PS3's chipsets will be released on 45nm, and the release date of the PS3 will ultimately depend on that process being ready! :)
 
I thought the consensus was that Sony would launch using 65nm chips, then shrink the process later... did I miss something?

Anyway, I still feel 45nm anytime in 2006 (on such a relatively mass scale) would be a little risky (not impossible). Everything would have to be running "smoothly" many months before that time. I guess we'll see...
 
What kind of memory architecture is that stream processor supposed to use to be able to feed a 1Tflop processing rate? Or is it peaking at 1Tflop for a few ms before it runs out of cache space and then crashes back to ground again when processing becomes I/O bound?
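
To put rough numbers on the question above, here is a minimal Python sketch of the bandwidth needed to keep a 1 TFLOP peak fed. Everything except the 1 TFLOP figure is an assumption for illustration: the 4-byte word size, the reuse factors (FLOPs performed per word fetched from off-chip), and the 20 GB/s memory bandwidth are hypothetical values, not figures for any real chip.

```python
PEAK_FLOPS = 1.0e12  # the 1 TFLOP peak being discussed


def required_bandwidth(peak_flops, flops_per_word, word_bytes=4):
    """Off-chip bytes/s needed to sustain peak_flops when every fetched
    word is used for flops_per_word operations before being discarded."""
    return peak_flops / flops_per_word * word_bytes


def sustained_flops(peak_flops, bandwidth_bytes_per_s, flops_per_word,
                    word_bytes=4):
    """FLOPS actually reachable: the lower of the compute peak and what
    the memory system can feed."""
    fed = bandwidth_bytes_per_s / word_bytes * flops_per_word
    return min(peak_flops, fed)


# Hypothetical reuse factors: 1 = pure streaming with no on-chip reuse.
for reuse in (1, 10, 50):
    gb_per_s = required_bandwidth(PEAK_FLOPS, reuse) / 1e9
    print(f"{reuse:>3} FLOPs per word -> {gb_per_s:,.0f} GB/s needed")

# Hypothetical 20 GB/s memory system with no data reuse at all.
io_bound = sustained_flops(PEAK_FLOPS, 20e9, flops_per_word=1)
print(f"I/O-bound rate at 20 GB/s, no reuse: {io_bound / 1e9:.0f} GFLOPS")
```

The point of the sketch is only that the sustained rate depends heavily on how many operations each fetched word is reused for on chip, which is exactly what the question about running out of cache space is getting at.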
 
Megadrive1988 said:
if PS3 launches using 45 nm chips, what does that say about release timing?

If they use 45nm, the earliest launch would be late 2006, with early 2007 most likely.

If MS launches in 2005 and Sony doesn't launch till 2007, there are going to be some big problems for Sony. A year's head start they can handle, even 2 years they can handle, I think. It's just that their generation will be squeezed from both ends.

They may have the most advanced console tech-wise, but the other consoles will be halfway through their life by the time Sony launches.

Which is why I believe in an early 2006 launch at 65nm with a shrink to 45nm in 2007.
 
Well this patent to me looks like it hints that something is going to have a stream buffer cache. Maybe I'm looking at it totally wrong.



Methods and apparatus for controlling hierarchical cache memory


[0006] The particularities of the known algorithms for taking advantage of locality of reference, or any other concept, for controlling the storage of executable programs and/or data in a cache memory are too numerous to present in this description. Suffice it to say, however, that any given algorithm may not be suitable in all applications as the data processing goals of various applications may differ significantly.

[0010] Unfortunately, invalidating data in higher level cache memories, such as the L1 cache memory, as dictated by the conventional control technique results in an overall lower throughput for the microprocessing system.
Indeed, use of, for example, the L1 cache memory would not be optimized if the data therein were unnecessarily invalidated. This may result in cached instructions or highly accessed data in a loop-body being unnecessarily invalidated, as often happens when a very large data array is accessed.

[0011] Accordingly, there are needs in the art for new methods and apparatus for controlling a cache memory, which may include an L1 cache memory, an L2 cache memory and/or further lower level cache memories, in order to improve memory efficiency, increase processing throughput and improve the quality of the overall data processing performed by the system.
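
The problem described in [0010], where a large array access pushes loop-body data out of the cache, can be illustrated with a toy simulation. This is only a sketch under my own assumptions: a tiny fully associative LRU cache, made-up sizes, and a hypothetical "bypass" option that keeps streamed lines out of the cache. It is not the control method the patent claims.

```python
from collections import OrderedDict


def _insert(cache, key, capacity):
    """Insert a line, evicting the least recently used one if full."""
    cache[key] = True
    if len(cache) > capacity:
        cache.popitem(last=False)


def hot_hits(num_iters, hot_lines, stream_lines_per_iter, capacity,
             bypass_stream):
    """Count cache hits on the reused ("hot") lines of a loop that also
    walks through a large array one new line at a time."""
    cache = OrderedDict()
    hits = 0
    next_stream = 0
    for _ in range(num_iters):
        # Loop-body data that is touched again on every iteration.
        for h in range(hot_lines):
            key = ("hot", h)
            if key in cache:
                cache.move_to_end(key)  # mark as most recently used
                hits += 1
            else:
                _insert(cache, key, capacity)
        # Array data that is touched once and never again.
        for _ in range(stream_lines_per_iter):
            key = ("stream", next_stream)
            next_stream += 1
            if not bypass_stream:
                _insert(cache, key, capacity)
    return hits


polluted = hot_hits(10, 4, 8, capacity=8, bypass_stream=False)
protected = hot_hits(10, 4, 8, capacity=8, bypass_stream=True)
print("hot-line hits, stream data cached:  ", polluted)
print("hot-line hits, stream data bypassed:", protected)
```

With these made-up sizes, the streamed lines evict every hot line before it is reused, so the hot data never hits; keeping the stream out of the cache lets the hot lines hit on every iteration after the first.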


This part here makes me think of a Stream Register File and a Stream Buffer on the Imagine chip.

[0043] The further memory 104 may be a next lower level cache memory or main memory. Indeed, the invention contemplates any number of hierarchical cache memories between the microprocessor 101 and main memory.

[0044] The structure and operation of the apparatus 100 will be better understood with further reference to FIG. 2, which is a flow diagram illustrating certain actions carried out by, or in association with, the apparatus 100. For the purposes of discussion, it is assumed that the first level cache memory 102 is an L1 cache memory, the lower level cache memory 103 is an L2 cache memory, and the further memory 104 is main memory.


Here are some quotes from a paper on caches optimized for streaming.

A crucial problem in media and network processing is
achieving efficient memory performance. Stream-based
applications commonly spend 25-50% of their execution time in
memory stalls on standard cache-based memory systems. These
penalties arise because the memory access patterns for these
applications does not correspond well with standard memory
systems. Standard cache hierarchies were designed for general-purpose
applications, which retain most of their data in on-chip
data cache/memory and stream program instructions in from off-chip,
as shown in Figure 1.1a and Figure 1.1b, respectively.
These applications have good temporal and spatial locality, and
are ideally supported by cache-based memory hierarchies.
Conversely, media and stream-based applications stream much
of their data memory on and off chip and retain their instruction
code in on-chip instruction memory/cache [1,2]. Such streaming
memory patterns offer poor temporal locality and cause many
compulsory misses in standard cache-memory hierarchies.
Improved memory performance for media and stream processing
requires efficient support for streaming memory.

MULTI-LEVEL MEMORY PREFETCHING FOR MEDIA AND STREAM PROCESSING
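
For a sense of scale, the 25-50% memory-stall figure in the quote puts an upper bound on what better streaming support could buy. A small Python sketch, assuming (my simplification, not the paper's claim) that removed stall time is the only thing that changes and that compute time stays the same:

```python
def speedup(stall_fraction, stalls_removed=1.0):
    """Speedup when a share of the memory-stall time is eliminated.

    stall_fraction: share of execution time spent in memory stalls.
    stalls_removed: share of those stalls eliminated (1.0 = all of them).
    """
    remaining_time = 1.0 - stall_fraction * stalls_removed
    return 1.0 / remaining_time


# The two ends of the range given in the quote.
for fraction in (0.25, 0.50):
    print(f"{fraction:.0%} stalls, all removed:  {speedup(fraction):.2f}x")
    print(f"{fraction:.0%} stalls, half removed: {speedup(fraction, 0.5):.2f}x")
```

So even a perfect memory system would give somewhere between about 1.33x and 2x on such applications under this simple model.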


The "magic" of the Imagine stream processor is from the Stream Register File.

The way I understand it, the SRF is just a Level 1 cache designed differently from a traditional CPU's level 1 and level 2 cache hierarchy. What the "Methods and apparatus for controlling hierarchical cache memory" patent is hinting at is a cache hierarchy structure like that of the Imagine stream processor.

This paper calls the Imagine processor "a stream-based architecture with a bandwidth-efficient register organization". How are the registers and cache in CELL organized? Instead of a cache memory hierarchy, will it be more like a multi-level memory prefetch hierarchy?




A Bandwidth-Efficient Architecture for Media Processing



This patent here describes the same way vectors work on the Stanford stream processor, I think. Decentralized registers?

Apparatus and method for updating pointers for indirect and parallel register access
 