I found this interesting link on GDDR3 vs XDR issues; the relevant excerpt is quoted below.
http://www.edn.com/article/CA629315.html?ref=nbra
Although servers and supercomputers clearly benefit from high-capacity-memory systems, designers of these and other products must determine an optimal means to obtain added capacity. Adding devices to memory subsystems such as those in servers or graphics cards is conceptually straightforward regardless of the memory technology. DDR, DDR2, and GDDR devices are capable of multidrop topologies with certain limitations.
Multidrop topologies in a memory system are those in which each link of the data bus connects to more than one DRAM device. For DDR2 systems, you can connect as many as four devices on each data link. Because GDDR-family devices usually have higher peak data rates, signal-integrity issues typically prevent more than two connections per link, and even then only if the devices reside in close proximity, such as back to back on opposite sides of a pc board.
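To put rough numbers on those multidrop limits, here is a quick sketch for a 64-bit bus. The device widths and the 512-Mbit density are my own assumptions (typical parts: x8 for DDR2 DIMM-style layouts, x32 for GDDR), not figures from the article:

# Rough capacity ceilings implied by the multidrop limits above.
# Device width and density are assumed, only the loads-per-link
# figures (4 for DDR2, 2 for GDDR) come from the article.
def max_devices_and_mb(loads_per_link, bus_bits, device_bits, device_mbit=512):
    devices = loads_per_link * (bus_bits // device_bits)
    return devices, devices * (device_mbit // 8)   # (device count, MB)

print(max_devices_and_mb(4, 64, 8))    # DDR2, x8 parts:  (32 devices, 2048 MB)
print(max_devices_and_mb(2, 64, 32))   # GDDR, x32 parts: (4 devices,  256 MB)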
XDR offers a slightly different approach to scaling capacities. Although the address and command bus is a multidrop configuration, it can connect to 36 devices in sequence on a channel. One reason the XDR address channel supports more devices than DDR links is that DDR multidrop connections are usually stub topologies instead of sequential connections. Stubs generate reflections, which degrade signal quality; with sequential connections, you can electrically compensate for the added capacitive loading of each device along the channel, thereby minimizing impedance discontinuities and their resulting reflections.
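The reflection point is standard transmission-line behavior. A small sketch, with illustrative 50-ohm impedances that are not from the article:

# Reflection coefficient at an impedance discontinuity:
# gamma = (Z - Z0) / (Z + Z0). A stub tapping the bus makes that
# point look like two lines in parallel, so part of the incident
# wave reflects back toward the driver and degrades the signal.
def reflection_coefficient(z, z0=50.0):
    return (z - z0) / (z + z0)

# Two 50-ohm branches in parallel present 25 ohms at the tap:
print(f"{reflection_coefficient(25.0):+.2f}")   # -0.33, i.e. 33% reflected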
Each data link in an XDR system, however, is routed point to point; that is, each data link connects to only one port on the DRAM and one port on the host controller. To scale capacity anyway, XDR DRAM devices are programmable in width; for example, you can program a ×16 DRAM to act like a ×8, ×4, or ×2 device. Low-capacity systems program each DRAM wider, with more links connecting to each device. Adding capacity merely involves programming the devices to be narrower and connecting fewer data links to each device.
A 32-bit XDR interface can support as little as 64 Mbytes and as much as 1 Gbyte of memory.
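A quick sketch of where those endpoints come from, assuming 256-Mbit and 512-Mbit XDR densities (typical parts at the time; the article itself only states the 64-Mbyte and 1-Gbyte endpoints):

INTERFACE_WIDTH = 32   # data links on the host controller

for density_mbit in (256, 512):
    device_mb = density_mbit // 8
    for width in (16, 8, 4, 2):            # programmable x16 down to x2
        n = INTERFACE_WIDTH // width       # point-to-point links per device
        print(f"{density_mbit}-Mbit x{width:<2}: {n:2d} devices "
              f"= {n * device_mb:4d} MB")

# 256-Mbit x16 gives the 64-MB floor; 512-Mbit x2 gives the 1-GB ceiling.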
However, peak bandwidth is not the only parameter to consider when optimizing for bandwidth. Remember that efficiency refers to the percentage of a memory system's total aggregate bandwidth that a controller can actually use. Fewer banks and a higher tRC (row-cycle time) in a DRAM device yield more frequent bank conflicts. Bank conflicts drastically reduce the efficiency of a memory system by forcing potentially long periods of inactivity on the data bus. Write-to-read and read-to-write turnarounds also require long periods of inactivity on the data bus.
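A toy model of that bank-conflict effect (the timing values are illustrative round numbers, not real DRAM datasheet figures):

import random

def bus_efficiency(n_banks, t_rc=40.0, t_burst=4.0, n_requests=100_000, seed=1):
    """Toy model: back-to-back single-burst accesses to random banks.

    A bank activated at time T cannot be activated again until
    T + t_rc (row-cycle time). A request to a busy bank stalls the
    data bus until the bank recovers. Efficiency = transfer time /
    total elapsed time. Real DRAM timing has many more constraints.
    """
    rng = random.Random(seed)
    ready = [0.0] * n_banks        # earliest next activate time per bank
    now = busy = 0.0
    for _ in range(n_requests):
        bank = rng.randrange(n_banks)
        now = max(now, ready[bank])   # stall while the bank is in its tRC window
        ready[bank] = now + t_rc
        now += t_burst                # the burst occupies the data bus
        busy += t_burst
    return busy / now

for banks in (4, 8, 16, 32):
    print(f"{banks:2d} banks: {bus_efficiency(banks):.0%} bus utilization")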
Memory systems can experience reduced efficiency even with the data bus active 100% of the time. To keep internal pipelines full, DRAM devices implement a feature called prefetch, which allows the DRAM core to run slower than the DRAM interface. The prefetch of a DRAM technology essentially determines how much data transfers for any given transaction, commonly referred to as access granularity.
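In other words (the 64-bit GDDR3 bus width below is inferred from the article's 32-byte figure, not stated in the excerpt):

def access_granularity(width_bits, prefetch):
    """Bytes moved per transaction: interface width x prefetch depth."""
    return width_bits * prefetch // 8

print(access_granularity(64, 4))   # GDDR3, prefetch of 4 -> 32 bytes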
GDDR3 implements a prefetch of four, so on a 64-bit interface a single transaction yields 32 bytes of data, even in a configuration meant to allow fine access granularity. Graphics processors work largely with units called triangles, and, as graphics-processor generations mature, each triangle shrinks in size, because smaller triangles yield more realistic rendered images. A 32-byte transfer may therefore be more than is necessary to access a given triangle: if only one 4-byte triangle is needed from memory, 28 bytes of the corresponding access go to waste. Even though the bus is active during the entire transfer, the efficiency of the transfer is significantly reduced. New memory technologies, such as XDR2, are emerging to further enhance bandwidth efficiency for various applications. Figure 1 of the article shows the effective triangle transfer rate versus triangle size for GDDR3 at 1.6 GHz and XDR2 at 8 GHz. Designers optimizing a memory system for bandwidth must consider both peak bandwidth and efficiency to get the best performance out of the host processor.
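The waste the article describes, as arithmetic. The peak bandwidth here takes the quoted 1.6 GHz as a 1.6-Gbit/s-per-pin data rate on an assumed 64-bit bus; the XDR2 curve from the article's Figure 1 is not reproduced:

import math

def triangle_rate(peak_gb_s, granularity, tri_bytes):
    """Triangles/s when each triangle needs its own memory access.

    Every access moves a whole granularity's worth of data, so one
    triangle costs ceil(tri_bytes / granularity) full transactions
    no matter how few of the transferred bytes are useful.
    """
    per_tri = math.ceil(tri_bytes / granularity) * granularity
    return peak_gb_s * 1e9 / per_tri

peak = 1.6 * 64 / 8   # 12.8 GB/s peak for the assumed GDDR3 configuration
for tri in (4, 8, 16, 32):
    useful = tri / (math.ceil(tri / 32) * 32)
    print(f"{tri:2d}-byte triangles: {triangle_rate(peak, 32, tri)/1e6:6.0f} Mtri/s,"
          f" {useful:.0%} of moved bytes useful")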
It seems that XDR's characteristics are very different from GDDR3's, and that it can work much more efficiently than DDR when going through a number of buffered links (as in Cell's ring bus). In fact, XDR seems to have been selected for Cell for this very reason. I am wondering whether the assumption that latency issues prevent RSX from using the XDR RAM via the FlexIO interface, and Cell from texturing in the PS3, still holds, given that it is based on GDDR3 latency characteristics rather than those of XDR and FlexIO.
The FlexIO interface is designed to connect two Cell chips together without any additional logic (or more Cell chips with additional logic), so it should be similar in nature to the XDR data paths rather than to the massively parallel GDDR3 interface. Why step out to GDDR3 width and then narrow back down to XDR/Cell ring-bus width just to connect Cells together?