Latency is important for GPUs, too. You need to add on-die cache to compensate for any increase in latency.
In the case of XDR, we're talking about 1, maybe 2, core clock cycles, which is only a few percent more than the DDR latency. (*)
This may sound counterintuitive, but it's almost always a bad idea to increase cache size to reduce incremental latency: once a cache is in place, growing it will obviously reduce the miss rate, but you need to grow it significantly for that to have any effect, and even then it does nothing for the fetch latency of data that isn't already in the cache.
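A rough way to see this is the standard average-memory-access-time (AMAT) model. The numbers below are made-up assumptions purely for illustration, not measurements of any real part:

```python
# Hypothetical latencies, for illustration only.
HIT_LATENCY = 4      # cycles for an on-die cache hit
MISS_LATENCY = 200   # cycles to fetch from external DRAM on a miss

def amat(miss_rate):
    """Average memory access time in cycles for a given miss rate."""
    return HIT_LATENCY + miss_rate * MISS_LATENCY

# Doubling a cache typically shaves only a modest fraction off the
# miss rate, so the average barely moves:
for miss_rate in (0.10, 0.08, 0.07):
    print(f"miss rate {miss_rate:.0%}: AMAT = {amat(miss_rate):.0f} cycles")
```

Note that a miss still pays the full 200 cycles regardless of cache size, which is the point above: the cache never reduces the fetch latency of data that isn't in it.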
During the system architecture phase, cache design and sizing are almost always kept separate from latency mitigation, unless your system has single-threaded components that block on reads.
In a multi-threaded system, the way to reduce the impact of latency is to increase the number of outstanding reads. In practice, this usually requires little more than deepening the read data FIFO by the number of additional latency cycles.
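That sizing rule is essentially Little's law (requests in flight = issue rate × latency). A minimal back-of-the-envelope sketch, where the function name and numbers are my own assumptions for illustration:

```python
import math

def extra_fifo_depth(extra_latency_cycles, reads_per_clock=1.0):
    """Extra read-data FIFO entries needed to absorb added latency.

    With `reads_per_clock` reads issued per core clock, each extra
    cycle of latency adds that many more reads in flight, so the
    FIFO must grow by the same amount (rounded up to whole entries)
    to keep the read pipe full without stalling.
    """
    return math.ceil(extra_latency_cycles * reads_per_clock)

# e.g. 2 extra core clocks of latency at one read per clock:
print(extra_fifo_depth(2))        # -> 2 extra entries
# A half-rate read stream with 8 extra cycles:
print(extra_fifo_depth(8, 0.5))   # -> 4 extra entries
```

Which matches the observation above: one or two extra cycles of XDR latency cost only one or two FIFO entries.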
Edit: (*) When I say a few percent, I don't mean as measured at the IO pins of the chip, but the total average latency, measured from the time the read is issued to the time the data arrives at the point of consumption.