RSX access to XDR latency issues again.

SPM said:
I think it is because XDR is more expensive. GDDR3 cannot connect to more than 2 devices (which is presumably why Xenon accesses GDDR3 via Xenos on the Xbox 360, as Cell does via RSX on the PS3). XDR is required for Cell because it connects to the PPE and 7 SPEs as well as the RSX or another Cell chip.
Um, no...

XDR memory devices do not connect to the SPEs or PPC directly of course, but to a memory controller in that case as well, which in turn connects to the internal databus of Cell, which has ports for the SPEs and PPE as well.

It's the same with GDDR3, it speaks with a memory controller, which in turn connects to the internal datapaths inside the chip.
 
Guden Oden said:
Um, no...

XDR memory devices do not connect to the SPEs or PPC directly of course, but to a memory controller in that case as well, which in turn connects to the internal databus of Cell, which has ports for the SPEs and PPE as well.

It's the same with GDDR3, it speaks with a memory controller, which in turn connects to the internal datapaths inside the chip.

Yes, but XDR is a point to point connection which allows up to 36 devices to be connected end to end with very low latency. GDDR3 is a multidrop connection (i.e. lines tapped off a bus line) which can't do this - only two devices can be connected to the bus. Flex i/o and the ringbus within Cell look like they are point to point as well, like XDR.

The whole issue about latency on the RSX to XDR connection raised earlier was that if GDDR3 were used, the latency would be excessive. In other words, end to end connections using GDDR3 controllers or multidrop bus connections to connect the RSX, PPE, 7 SPEs and possibly other Cell chips together would create too much latency.

It has been suggested that XDR, Flex i/o and the Cell ringbus are serial links - in other words, each link receives data, stores it in a register and then passes it on. This might not be the case. The articles suggest that the XDR point to point interface electronically eliminates reflections that occur at drops in multidrop connections, and that Flex i/o also adjusts phase alignment to allow extreme clock rates to be used. XDR, Flex i/o and Cell's ringbus may not be serial links at all, but a direct switched bus link passing all the way through, with reflection and phase compensating logic at every connection (i.e. an electronically compensated tap rather than a physical wire tap). I may be completely wrong of course, but that is how I read the two links below.

http://www.edn.com/article/CA629315.html?ref=nbra
http://www.rambus.com/products/flexio/index.aspx

One other advantage of using a non-unified memory is double the memory bandwidth - particularly if Cell is mainly using XDR while RSX is mainly using GDDR3.
 
_xxx_ said:
No, when you ramp up the clock, that leads to increased latency. So the highest clocked (overall fastest) version has the highest latency. Think of it like with DDR-->DDR2, about the same situation.

Everything else held equal, faster ram has lower latency.
For instance, CAS3 PC3200 has lower latency (in actual time) than CAS2 PC2100. In most situations, it seems that a net increase in bandwidth but a net increase in actual latency results in lower performance, whereas if they both get better or just latency decreases, performance increases. There are exceptions to this, such as applications that are more bandwidth dependent (video cards), but generally actual utilized bandwidth depends on latency anyway.
 
I was thinking of DDR-400 vs DDR2-800. Both have a DRAM core clock of 200 MHz, and both can be had with similar CAS latency measured in time: 1.5 cycles of the 200 MHz bus clock for DDR, 3 cycles of the 400 MHz bus clock for DDR2, both = 7.5 ns.
Okay... I was thinking you were comparing equal bus speeds rather than equal DRAM speeds (DDR-400 vs. DDR2-400).
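
For anyone who wants to check the DDR vs DDR2 numbers, here is a minimal sketch of the cycles-to-nanoseconds conversion. It assumes CAS latency is counted in bus clock cycles and that the bus clock is half the transfer rate for both DDR and DDR2; the function names are made up for illustration.

```python
def bus_clock_mhz(transfer_rate_mt_s):
    """Bus clock for a double-data-rate part: two transfers per clock.

    Assumption: this holds for both DDR and DDR2 (e.g. DDR-400 -> 200 MHz,
    DDR2-800 -> 400 MHz).
    """
    return transfer_rate_mt_s / 2.0


def cas_latency_ns(cas_cycles, clock_mhz):
    """Convert a CAS latency given in bus clock cycles to nanoseconds."""
    cycle_time_ns = 1000.0 / clock_mhz
    return cas_cycles * cycle_time_ns


# The two parts compared in the post above.
ddr_400 = cas_latency_ns(1.5, bus_clock_mhz(400))
ddr2_800 = cas_latency_ns(3, bus_clock_mhz(800))

print(f"DDR-400  CAS 1.5: {ddr_400:.2f} ns")
print(f"DDR2-800 CAS 3:   {ddr2_800:.2f} ns")
```

Under these assumptions both parts come out at the same absolute latency, which is the point of comparing equal DRAM speeds rather than equal bus speeds.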

Yes, but XDR is a point to point connection which allows up to 36 devices to be connected end to end with very low latency.
36 XDR DRAM devices through one channel controller. And those 36 devices come from the fact that one controller channel is 32 bits + 4 bits of ECC. So if you have x1 XDR DRAMs, each device covers 1 bit of that channel, so you can have 36 devices. I believe x8 and above all carry the ECC with them, so you get 2 x16 devices per channel; PS3 CELL has 2 channels, giving you 4 DRAMs. There's only a single XDR memory controller on CELL, and whether SPE or PPE, you push stuff onto a DMA command queue.
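
The device counts above can be worked through in a few lines. This is only a sketch of the arithmetic in this post, not something taken from a datasheet: the function name is made up, and the rule that x8 and wider devices carry their own ECC (so only the 32 data bits are split among them) is the belief stated above, treated here as an assumption.

```python
def devices_per_channel(device_width_bits, data_bits=32, ecc_bits=4):
    """Number of XDR DRAM devices needed to fill one controller channel.

    Assumption (from the post, unverified): devices that are x8 or wider
    carry ECC internally, so they only need to cover the data bits.
    Narrower devices have to cover the ECC bits of the channel as well.
    """
    if device_width_bits >= 8:
        lanes = data_bits
    else:
        lanes = data_bits + ecc_bits
    if lanes % device_width_bits != 0:
        raise ValueError("device width does not divide the channel evenly")
    return lanes // device_width_bits


PS3_CELL_CHANNELS = 2  # two channels on the PS3's CELL, per the post

print(devices_per_channel(1))                       # x1 devices per channel
print(devices_per_channel(16))                      # x16 devices per channel
print(PS3_CELL_CHANNELS * devices_per_channel(16))  # total DRAMs on PS3
```

With x1 parts this gives the 36-device figure, and with x16 parts it gives 2 per channel, or 4 DRAMs across the two channels.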

The articles suggest that the XDR point to point interface electronically eliminates reflections that occur at drops in multidrop connections, and that Flex i/o also adjusts phase alignment to allow extreme clock rates to be used.
Well, the skew compensation is part of it, but limited by the local implementation at the controller. The main thing, I figured, that enabled it to handle extreme clock rates was the tiny voltage swing (0.2V).
 
ShootMyMonkey said:
Presumably then, we should be expecting 2.5 ns in the PS3? That sounds unusual, in spite of the fact that it does divide evenly into the DRAM clock. 3.2 GHz transfer rate implies a DRAM clock of 400 MHz, and 2.5 ns would be 1 cycle at 400 MHz.

Also, given that the fastest clock should be 500 MHz (as the top rated XDR I know of is 4 GHz effective -- "XDR2" notwithstanding), again 3.33 doesn't add up. As it so happens, the slowest rated XDR is 2.4 GHz effective for a 300 MHz DRAM clock, which makes sense with a 3.33 ns latency. So I have to question whether this figure is supposed to be a "total" latency or just a latency between packet requests or something (i.e. that the whole thing is fully pipelined).


I'm not sure which speed DDR2 you're comparing against what speed DDR, but DDR2 has worse latency in absolute time as well, since the DRAM itself is clocked at half the speed of a DDR DRAM for the same bus speed (and since the latency in clock cycles is not exactly half, it works out as a net loss). In general, I think a DDR2 DIMM at a bus speed around 566 should have the same absolute latency as a DDR DIMM at 400.


That number has nothing to do with latency and is actually just a base cycle time number.
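
The cycle-time figures in the quoted post are easy to reproduce. A rough sketch, assuming XDR moves 8 bits per pin per DRAM clock (which is what the 3.2 GHz to 400 MHz relationship in the quote implies); the constant and function names are made up for illustration.

```python
XDR_TRANSFERS_PER_CLOCK = 8  # assumption: octal data rate, 3.2 GHz / 8 = 400 MHz


def dram_clock_mhz(effective_rate_ghz):
    """DRAM clock implied by an effective XDR transfer rate."""
    return effective_rate_ghz * 1000.0 / XDR_TRANSFERS_PER_CLOCK


def cycle_time_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


# The three speed grades mentioned in the quoted post.
for rate_ghz in (2.4, 3.2, 4.0):
    clock = dram_clock_mhz(rate_ghz)
    print(f"{rate_ghz} GHz effective -> {clock:.0f} MHz clock, "
          f"{cycle_time_ns(clock):.2f} ns per cycle")
```

So 3.33 ns and 2.5 ns line up with one clock period of the 2.4 GHz and 3.2 GHz parts, which fits reading them as base cycle times rather than access latencies.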
 