RAM latency in PS3, Xbox 360

So XDR is cheaper than DDR, in the long run. It's not about being technically superior. And the DDR is there because... I thought it was cheaper. And it's not like RSX couldn't be given an XDR interface. Two pools of 256 MB RAM (or one 512 MB pool) would surely be easier to manage and cheaper to source.

I'm confused! :? :LOL:
 
Shifty Geezer said:
So XDR is cheaper than DDR, in the long run. It's not about being technically superior. And the DDR is there because... I thought it was cheaper. And it's not like RSX couldn't be given an XDR interface. Two pools of 256 MB RAM (or one 512 MB pool) would surely be easier to manage and cheaper to source.

I'm confused! :? :LOL:


Or maybe they weren't sure which ones would become cheaper in the long run, so they just put half and half just to be sure! ;)
 
You may well be right. I remember discussions here on which console costs more to produce, and the talk was 'XDR's cheaper because Sony makes it themselves (and now we're told it's simpler too)' and 'DDR's cheaper because it doesn't have RAMBUS's markup and there's loads of sources competing'. I reckon Sony were watching, got as confused as I am, and whacked in a bit of both to be on the safe side. It's the only logical explanation :p
 
nAo said:
AlgebraicRing said:
What's the cost for copying memory from main memory to an SPE's local memory?
Please define 'cost'.
I'm just worried about the situation where all 7 SPE's need to fetch or write main memory. If simultaneous reads/writes are not possible then I've got to wait 500*7 or 3500 cycles for the last SPE to get memory... Or am I thinking about the situation wrong?
CELL mem controller can handle 128 simultaneous memory transactions, in order to better hide/reduce memory latencies, as more memory pages can be opened at the same time.

I don't really know what I am thinking of in terms of cost. I would need to look at the programming model more closely. But essentially, if I wanted to send a task to be computed by an SPE, what needs to be done to package up the main memory data (and code???) and send it to the SPE to be processed? And if the data is more than 256KB, is my SPE going to sit idle while waiting for another 500-cycle mem fetch? Can I optimize the situation by pre-fetching 128KB at a time, so that while the SPE is working on the first half of its local memory, I can be populating the second half of its local memory?

Let me give an overview of where I am coming from. I am researching for my professor whether or not we should port his language, SequenceL, to a multicore processor, specifically the Cell processor. The language is functional, so of course parallelization is easier at the compiler level, but the language's semantics are set up to make the parallelism explicit to the programmer's eye. My task is to determine whether or not creating a Cell-specific port would be worthwhile (i.e. that the Cell architecture would highlight any benefits or advantages to using the language). My hunch is that the structure of the language would make it very easy to create a compiler/scheduler for the Cell processor.

I need to present on the possibility of implementing SequenceL on a multicore architecture by next Friday, just a 10-15 minute presentation. I think the Cell is a perfect match because it gives the most parallel SIMD punch for the buck, and SequenceL is all about vectorization. I would just like to be more informed about the specifics of Cell, though. How well can I keep the SPEs churning with data processing? When are there going to be forced or required idle times? Etc.

Got any papers you could point me to about the memory access questions? I've read that the SPEs have 128 registers. But I would love to learn more about the memory controller. Got a link? :)
 
Ideally you want the open-source IBM docs, but they're not out yet. AFAIK the programmer has full control over memory access and schedules memory fetches in advance of needing them. This I think overcomes the latency, so the moment you need the data (if you've set it up efficiently) it's already on its way to the SPE. As I understand it, the requirements of the architecture have been pretty well thought out, and although implementation is more complex (or at least different) from throwing instructions at a Pentium/PPC, when done right there's no major disadvantage.

Incidentally I think the same goes for XeCPU, so by structuring your caching you can hide latency. I don't think writing for the different platforms is going to be too different.
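
Something like the usual software-prefetch pattern, I mean. A rough sketch in plain C, using GCC's __builtin_prefetch as a stand-in (the 360 toolchain exposes the same idea through its own dcbt-style intrinsic; the loop, the stride and the function name here are just made up for illustration):

/* Walk a big array, requesting data a few cache lines ahead so the
   miss is (hopefully) already resolved by the time we touch it. */
float sum_prefetched(const float *data, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        /* Prefetch ~256 bytes (a couple of cache lines) ahead. The right
           distance depends on the real miss latency, and prefetching
           past the end of the array is harmless. */
        __builtin_prefetch(&data[i + 64], 0, 0);
        acc += data[i];
    }
    return acc;
}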
 
AlgebraicRing said:
Can I optimize the situation by pre-fetching 128KB at a time, so that while the SPE is working on the first half of its local memory, I can be populating the second half of its local memory?
You can have 16 outstanding memory requests on each SPE, so to use an analogy to your example, local memory can be split into 16 banks, all being loaded while you work on... the 17th :p

Transactions can also be started from SPE and PPE sides, so you don't actually need any overhead on PPE to give SPE work. Don't know what the DMA setup and tag overhead is yet, that's a question I'd like to see answered myself also.
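
For what it's worth, double-buffering along those lines would look roughly like the sketch below, written against the MFC intrinsic names that have been floating around (mfc_get takes a local store address, an effective address, a size and a tag). Treat the exact names, signatures and the process() function as assumptions until the public docs show up:

#include <spu_mfcio.h>

#define CHUNK (16 * 1024)   /* 16KB, the maximum size of a single MFC transfer */

static volatile char buf[2][CHUNK] __attribute__((aligned(128)));

extern void process(char *data, unsigned size);   /* whatever work the SPE does */

/* Stream 'total' bytes from main memory (assumed a multiple of CHUNK):
   while the SPE chews on one buffer, the DMA engine fills the other. */
void stream(unsigned long long ea, unsigned total)
{
    unsigned cur = 0;

    mfc_get(buf[0], ea, CHUNK, 0, 0, 0);                  /* first fetch on tag 0 */

    for (unsigned off = CHUNK; off < total; off += CHUNK) {
        unsigned next = cur ^ 1;
        mfc_get(buf[next], ea + off, CHUNK, next, 0, 0);  /* start the next fetch on the other tag */

        mfc_write_tag_mask(1 << cur);                     /* wait only for the buffer we need now */
        mfc_read_tag_status_all();

        process((char *)buf[cur], CHUNK);
        cur = next;
    }

    mfc_write_tag_mask(1 << cur);                         /* drain the last buffer */
    mfc_read_tag_status_all();
    process((char *)buf[cur], CHUNK);
}

With 16 tags per SPE you can obviously go deeper than two buffers if the working set allows it.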
 
london-boy said:
Shifty Geezer said:
So XDR is cheaper than DDR, in the long run. It's not about being technically superior. And the DDR is there because... I thought it was cheaper. And it's not like RSX couldn't be given an XDR interface. Two pools of 256 MB RAM (or one 512 MB pool) would surely be easier to manage and cheaper to source.

I'm confused! :? :LOL:


Or maybe they weren't sure which ones would become cheaper in the long run, so they just put half and half just to be sure! ;)


Who knows, I doubt they even know. Rambus has said they wanted a premium for their RAM; it's also not in mass production and will basically be made for the PS3, as I see no other products slated to use it. So not only will it start off higher because of the premium, it will also come down in price more slowly with only one product using it.

GDDR RAM, on the other hand, is widely used and has been in mass production for years. It first shows up in high-end graphics cards and gets a premium price when it's first introduced at a new speed, but in a year or two that RAM is in the low-end parts selling at very low prices.

MS and Sony don't have to worry about the RAM supply drying up, as they can always use faster RAM, and it doesn't look like GDDR RAM will be phased out for another two years or so; even then it will still be at the high end.

So it's hard to say which one will be cheaper in the end, but for the first few years I believe it will be GDDR RAM, not XDR.
 
Okay, so can anyone give a definite answer on RAM latency for the PS3? I've seen a value of 150 clocks, but that was a totally unofficial source, so I have my doubts...
 
version said:
"Toshiba’s XDR memory chips are configured as 4Mb word x 8 banks x 16 bits, are available with 40ns, 50ns and 60ns cycle time and 27ns or 35ns latency and have 1.8V VDD."

http://www.xbitlabs.com/news/memory/display/20031225163917.html

:) That sentence, and any figures derived from it, have very little to do with the real-world CPU latency...

I'd suggest re-reading this very thread for some very real numbers...
 
Gubbi said:
Cycle time is just wrong. 40ns equates to a 25MHz cycle. And cycle time is higher than latency :rolleyes:

Cheers
Gubbi

You don't understand how DRAM works...

40ns cycle time is very aggressive. Most likely they are using 50ns or 60ns cycle time parts.

Cycle time generally refers to random access requiring a full precharge-RAS-CAS cycle for each access, while latency generally refers to either having the bank closed and clean (requiring only a RAS-CAS) or the page open (requiring only a CAS).
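
To put rough numbers on those Toshiba figures against a ~3.2GHz core clock (ignoring everything the memory controller, arbitration and cache-miss handling add on top):

40ns cycle time x 3.2 clocks/ns = 128 CPU clocks for a full random access
27ns latency    x 3.2 clocks/ns ~= 86 CPU clocks best case inside the chip

which is how you get from the chip-level numbers to the several-hundred-cycle cache-miss figures quoted elsewhere in this thread.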

Aaron Spink
speaking for myself inc.
 
DeanoC said:
I'd suggest re-reading this very thread for some very real numbers...

All I've found was this from ERP:
They're both comparable, GDDR is faster, but both are in the 500+ cycle range for a cache miss.

So, am I right to assume that both Cell and XCPU have a memory latency of ~500 clock cycles?
And is this really such a big drawback, because of the lack of OOE?
 
I believe that the 40ns refers to tRC, which IIRC is the amount of time it takes to open a different page (row) from the one currently open in a given bank of the memory chip. AFAIK this is usually some multiple (10+) of the DRAM clock period... I have no idea what the latency number corresponds to, but maybe it's best-case latency inside the chip (no bank conflicts, page miss, or R/W turnaround)?
 
Laa-Yosh said:
DeanoC said:
I'd suggest re-reading this very thread for some very real numbers...

All I've found was this from ERP:
They're both comparable, GDDR is faster, but both are in the 500+ cycle range for a cache miss.

So, am I right to assume that both Cell and XCPU have a memory latency of ~500 clock cycles?
And is this really such a big drawback, because of the lack of OOE?

OOOE wouldn't help you with a 500-cycle latency.

OOOE is more about hiding instruction latency and the latency from the L1 and L2 caches, which are still pretty significant.
 
ERP - but OOOE also issues independent loads/stores close to the memory instruction that missed the caches. If these nearby memory ops also miss the cache, and can be serviced in parallel to the initial cache miss, won't the total stall time be significantly reduced compared to the in-order machine? Or is this a rare/insignificant effect on most codes?
 
psurge said:
ERP - but OOOE also issues independent loads/stores close to the memory instruction that missed the caches. If these nearby memory ops also miss the cache, and can be serviced in parallel to the initial cache miss, won't the total stall time be significantly reduced compared to the in-order machine? Or is this a rare/insignificant effect on most codes?

Sure, it might save you 20 cycles out of 500...
The point is to hide instruction latencies and cache HIT latencies: even an L1 cache at 3+GHz isn't 0-cycle latency, and L2 is well into double figures.

Think about it: why do you think bigger caches make such a big difference on the OOO Pentiums?
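
Rough numbers, assuming a ~100-entry out-of-order window and ~3-wide issue (ballpark for desktop cores, not official figures for anything): covering a 500-cycle miss would take on the order of 1500 independent instructions in flight, but the window fills after roughly 100/3 ≈ 33 cycles and then the machine stalls just like the in-order one. At best you overlap a few tens of cycles, or a second miss that happens to fall inside the window (hence the ~20 out of 500 above). The latencies an OOO window genuinely covers are the 2-4 cycle L1 hits and the 10-20+ cycle L2 hits.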
 