No topics on the new Rambus graphics memory yet?

Well, I'm not exactly one of them, but when I read things like "...and is on track to deliver the actual products in 2005 or 2006", I tend to go "meh". My guess is that we can consider ourselves lucky if we see anything of this on the second generation of SM4.0-class cards, and who knows what might happen between now and then. (*cough* TBDR *cough*)
 
The article doesn't really say much about the new chips but, FWIW, I believe that in the past with Rambus you got very high data rates but the latency was also very high. This may be tolerable if you have enormous data caches (eg CPUs) but is not so great for graphics chips.

All IMHO of course.
 
From what I understand, RAMBUS actually has decent latency much of the time, but it powers down chips that aren't currently in use so that they don't burn up (in the PC implementation, only one chip per RIMM is ever in use at once). The penalty for powering the chips up and down is very significant.

So, if I'm remembering correctly, it's not that the latency is huge with RAMBUS, it's that it's erratic. And on a GPU, which stripes data across all memory chips to maximize bandwidth usage, well, I just don't think it'd be feasible.
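
A quick toy model of that argument (all numbers invented, just to illustrate why a wake-up penalty hurts when every access is striped across mostly-sleeping chips):

```python
# Toy model of the "erratic latency" argument above.
# All numbers are invented for illustration, not real RDRAM timings.

BASE_LATENCY_NS = 40      # assumed latency when the target chip is already powered up
WAKEUP_PENALTY_NS = 100   # assumed extra cost of waking a powered-down chip

def average_latency_ns(chips_striped, chips_kept_powered=1):
    """Average access latency when accesses are spread evenly across
    `chips_striped` devices but only `chips_kept_powered` may stay awake."""
    hot = min(chips_kept_powered, chips_striped) / chips_striped
    cold = 1.0 - hot
    return hot * BASE_LATENCY_NS + cold * (BASE_LATENCY_NS + WAKEUP_PENALTY_NS)

# CPU-ish access pattern that mostly stays on one device, versus a GPU that
# stripes every access across 8 devices for bandwidth:
print(average_latency_ns(1))   # 40.0  ns, consistent
print(average_latency_ns(8))   # 127.5 ns, mostly paying the wake-up penalty
```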

And higher and higher frequencies can only go so far. We're really getting close to the physical limits of the frequencies used in modern computers. The next step won't be higher frequencies, but memory closer to the chips (i.e. on packaging or on die).
 
DDR does not have a latency of 2ns, that's complete and utter nonsense.

As for power saving being an issue, that's for PC use, where you could have one memory channel consisting of 32 devices (8 per side per RIMM, 2 RIMMs). An early RDRAM chip could draw up to 4W, and with a very liberal open-page memory controller the RIMMs would literally burn up unless power saving is used.

Does anyone know of any current graphics card with 32 memory devices on it? ;) 4-8 is standard today. That will not be a problem, I predict. Besides, these things surely don't draw as much power today as they used to.
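
Back-of-envelope on the power point, using the 4W worst-case figure above; the 1.5W per-device number for a modern graphics-class part is purely a guess on my part:

```python
# Worst-case power per memory subsystem, using the 4W-per-chip figure above.
# The 1.5W figure for a newer graphics-class device is purely an assumption.

def subsystem_power_watts(devices, watts_per_device):
    return devices * watts_per_device

# PC RDRAM channel: 2 RIMMs x 2 sides x 8 devices per side = 32 devices
print(subsystem_power_watts(32, 4.0))   # 128.0 W if everything ran flat out
# Typical graphics card: 4-8 devices; take 8 with an assumed lower draw
print(subsystem_power_watts(8, 1.5))    # 12.0 W
```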
 
Oh god, not hellbender again. I thought (read: hoped) he'd died or just plain given up.
Always uninformed, and he masks it by giving 20 reasons for one thing...

/me is still bewildered at hellbender over the "I'm right about NV30" thing.
Dude, you got it wrong - give it up.
 
Simon F said:
The article doesn't really say much about the new chips but, FWIW, I believe that in the past with Rambus you got very high data rates but the latency was also very high. This may be tolerable if you have enormous data caches (eg CPUs) but is not so great for graphics chips.

All IMHO of course.

The first generation definitely had latency issues, but I think that was fixed in the second-gen stuff. The other problem, IIRC, was that the interface for the first-gen parts consumed quite a lot of area; not sure if this was subsequently fixed, but the area may be less significant relative to today's monster GPUs anyway.

John
 
Guden Oden said:
DDR does not have a latency of 2ns, that's complete and utter nonsense.

Well, there are of course 2ns DDR chips, but I guess you were referring to the fact that the nanosecond rating is the clock period, not the "latency". Anyway, I think "latency" is generally an inappropriate word here (and confuses people easily) :/
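
To spell out what a "2ns" rating means versus latency (my own back-of-envelope; the CAS latency of 5 is an assumed, typical-ish value, not any particular chip's spec):

```python
# A "2ns" DDR chip: 2ns is the clock period, not the access latency.

clock_period_ns = 2.0
clock_mhz = 1000.0 / clock_period_ns      # 500 MHz clock
data_rate_mts = 2 * clock_mhz             # DDR: two transfers per clock -> 1000 MT/s

# The actual access latency is several clocks; assume a CAS latency of 5:
cas_latency_clocks = 5
cas_latency_ns = cas_latency_clocks * clock_period_ns   # 10 ns, and that ignores
                                                        # RAS-to-CAS, precharge, etc.
print(clock_mhz, data_rate_mts, cas_latency_ns)         # 500.0 1000.0 10.0
```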
 
PoGGeh said:
Oh god, not hellbender again. I thought (read: hoped) he'd died or just plain given up.
Always uninformed, and he masks it by giving 20 reasons for one thing...

/me is still bewildered at hellbender over the "I'm right about NV30" thing.
Dude, you got it wrong - give it up.

So instead of contributing to the thread, you spam, insult, and act condescendingly towards other members.
Begone, troll.
 
I don't see Rambus XDR having much of a chance against GDDR-3 and GDDR-4 in terms of volume.

What I really don't understand is: with all the research poured into CELL with the goal of getting it into network equipment and the like, Sony chose Rambus, yet RLDRAM is light years ahead of XDR in performance because its latency is so much lower and its read/write bandwidth is so high. So because Sony supports XDR, the market will have three alternative designs: XDR, FCRAM, and RLDRAM. XDR is supposed to have a cost advantage because its simple design allows a 4-layer PCB to be used. RLDRAM is going to be more in demand because it's clearly the performance champion for network applications, which should increase its volume and therefore lower its price.

RLDRAM II Offers Full Bus Utilization

For years, an obvious distinction has existed between static random access memory (SRAM) and dynamic random access memory (DRAM). SRAM, comprised of six-transistor (6T) cells, accesses data more quickly but requires a large silicon area per bit. For example, in a 90nm device, a single cell typically spans approximately one square micron; however, it can be accessed in just two or three clock cycles (depending on the perspective) at frequencies such as 250MHz. In contrast, mainstream DRAM is comprised of one-transistor (1T) cells yet has a relatively long data access time. DRAM's silicon area per bit is much smaller, typically 0.065 square microns at 90nm, or 1/15th the area of SRAM. This difference in cell sizes increases with each process generation.
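
A quick sanity check of those cell-size figures (not part of the original article):

```python
# Sanity check of the quoted 90nm cell sizes.
sram_cell_um2 = 1.0      # ~1 um^2 per 6T SRAM cell
dram_cell_um2 = 0.065    # ~0.065 um^2 per 1T DRAM cell
print(sram_cell_um2 / dram_cell_um2)   # ~15.4, i.e. roughly "1/15th the area"
```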

SRAM's last remaining advantage is its ability to randomly access the array. A typical high-speed SRAM is capable of a new random access every clock cycle. The question for DRAM becomes, can relatively slow DRAM mimic faster SRAM and thus eliminate SRAM's final advantage?

The inflection point in the evolution of memory technology is the introduction of a low-latency memory leveraging inexpensive DRAM cells that operate at very high frequencies (e.g. 533MHz clock), very fast request rates (e.g. 2ns), and reasonably good random access repeat rates (e.g. 15ns). This new memory, reduced latency DRAM (RLDRAM), in particular its second generation, RLDRAM II, bridges the gap between SRAM and traditional DRAM.
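
Relating those three figures to one another (a back-of-envelope sketch, not part of the original article; the 8-bank detail is borrowed from later in the piece):

```python
# How the quoted RLDRAM II figures relate to one another.
clock_mhz = 533
clock_period_ns = 1000.0 / clock_mhz       # ~1.88 ns: the "2ns" request rate,
                                           # i.e. a new request every clock
data_rate_mts = 2 * clock_mhz              # DDR data bus -> ~1066 MT/s

trc_ns = 15                                # random repeat rate to the SAME bank
requests_per_trc = trc_ns / clock_period_ns  # ~8 requests fit inside one tRC window,
                                             # which is what an 8-bank device needs to
                                             # keep issuing a request every clock
print(round(clock_period_ns, 2), data_rate_mts, round(requests_per_trc, 1))
```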

Overview of Low-Latency DRAMs

Currently, two DRAM families - fast cycle RAM (FCRAM) and RLDRAM - target low random access latencies and fast bank cycle times, though, in the opinion of the author, RLDRAM offers vastly more flexibility and covers a far broader range of practical applications.

The first generation of RLDRAM targets a 600MHz data rate and 25ns tRC. RLDRAM II adds several advanced concepts to numerous features already demonstrated in technologies such as QDR SRAM and DDR SRAM. Micron introduced the first instantiation of this architecture in August 2003. Although published device goals specified an 800MHz data rate and 20ns tRC, a 900MHz data rate and 16ns tRC were demonstrated.

Network switches, routers and line cards need better DRAM solutions. The "trick" is not offering bandwidth - many devices do - but the means to make it sustainable. RLDRAM II's reduced tRC enables higher data availability than standard DRAM.

This large, diverse segment can best be described with a concrete example. A high-definition television (HDTV) is a consumer electronics device under great pricing pressure. HDTV's price will fall rapidly as consumer demand develops, but presently its numerous memory buses make it too expensive for the average consumer. To make its price more attractive, one device must provide the scratchpad memory for all its software functions; act as shadow memory for code; satisfy the high scan rate; and provide memory for processes such as decoding, standards conversion, etc. RLDRAM's low latency/fast tRC is essential to satisfy these numerous, simultaneous demands. The 36-bit wide architecture is suitable for 3 x 12-bit color.

Performance Comparison Scenario


[Figure: performance versus read:write ratio for RLDRAM, FCRAM, DDR2 and GDDR3; graphic not reproduced]
This operating scenario examines device sensitivity to the read:write ratio. For example, at a R:W of 4:1, RLDRAM would read from bank 0, then bank 1, then bank 2, then bank 3, and then write to bank 4, with banks always available. For DDR2, the same 4:1 ratio assumes that bursts come from the already open row: activate, read, read, read, read, precharge, activate, write, precharge. Assumptions are those most favorable to each device. Results are shown in the figure. The curves in the figure are most easily distinguishable at a read:write ratio of 1:1. It is plain that at any given frequency, RLDRAM outperforms any competing solution. For highly data-streamed applications, GDDR3 is the performance winner, but for read:write ratios of 1:1, RLDRAM SIO with 4- or 8-word bursts comes out on top. Further, for most applications having read:write ratios of 2:1 or greater (and 1:2 or less), RLDRAM common I/O (CIO) devices outperform all other solutions, regardless of clock frequency. In this comparison we assumed the availability of 333MHz FCRAMs, though such parts have yet to be introduced to the market.
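
For illustration, here is a deliberately naive version of the kind of utilization counting such curves come from; the clock counts are assumed round numbers, not real DDR2 or RLDRAM timings, and command overheads are treated as fully serialized, which real controllers avoid:

```python
# Deliberately naive utilization count for the 4:1 read:write scenario above.
# Clock counts are assumed, illustrative values, NOT real DDR2/RLDRAM timings,
# and command overheads are (unrealistically) treated as fully serialized.

def utilization(data_clocks, overhead_clocks):
    return data_clocks / (data_clocks + overhead_clocks)

BURST_CLOCKS = 2   # a 4-word burst on a double-data-rate bus occupies 2 clocks

# RLDRAM with banks always available: read b0..b3, then write b4; no visible
# activate/precharge, so the data bus never idles.
rldram = utilization(data_clocks=5 * BURST_CLOCKS, overhead_clocks=0)

# DDR2 sequence from the text: activate, read x4, precharge, activate, write,
# precharge, with an assumed 4 clocks for each activate and each precharge.
ddr2 = utilization(data_clocks=5 * BURST_CLOCKS,
                   overhead_clocks=2 * 4 + 2 * 4)

print(rldram, round(ddr2, 2))   # 1.0 vs ~0.38 in this toy model
```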

Many other scenarios have been analyzed which for the sake of brevity cannot be discussed in detail. One such scenario is particularly interesting. The above discussion ignores most internal DRAM resource availability issues such as a bank being busy at the time of a request. When this is accounted for, it can be demonstrated that the 8-bank architecture of the RLDRAM presents additional performance advantages. In a particular scenario of 16-word read followed by 16-word write requests, RLDRAM at 4-word burst outperforms DDR2 SDRAM by 3.8x and outperforms FCRAM by 1.52x at the same clock frequency. This performance margin is extended when maximum frequencies are considered. Comparing 266MHz DDR2 and FCRAM with 400MHz RLDRAM, the RLDRAM outperforms DDR2 by 5.7x and outperforms FCRAM by 2.28x.
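
Those maximum-frequency figures are just the same-clock ratios scaled by the clock ratio (a quick check, not part of the original article):

```python
# The quoted maximum-frequency ratios follow from the same-clock ratios
# scaled by the 400MHz / 266MHz clock ratio.
same_clock_vs_ddr2 = 3.8
same_clock_vs_fcram = 1.52
clock_ratio = 400 / 266                       # ~1.5

print(round(same_clock_vs_ddr2 * clock_ratio, 1))    # ~5.7
print(round(same_clock_vs_fcram * clock_ratio, 2))   # ~2.29
```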

DDR2 SDRAM

1.8V HSTL-style receivers provide maximum system performance. They have tighter input specifications relative to Vref than SSTL_18 receivers, with compatible voltage and Vref levels. Differential clocks should be employed with DDR2 devices since they cannot tolerate single-ended clocks like RLDRAM II can. The output clock situation is complicated and needs special care. QK/QK# on RLDRAM differs from the optional output clocks RDQS/RDQS# on some versions of DDR2. The bus controller should have low-impedance drive capability, compliant with SSTL_18 standards if high loads are expected. Of course, simulations should be performed with the desired topology to verify signal integrity and termination requirements.

An important distinction for systems requiring error detection and/or correction is that RLDRAM is based on nine bits while DDR2 is based on eight. So a 72-bit wide bus requires four x18 RLDRAMs or four x16s plus one x8 DDR2 SDRAM. Four devices are far easier to route than five. RLDRAM performs its work in 17 clock cycles, whereas DDR2 requires 36 clock cycles before the command sequence can be repeated. This scenario is as favorable as a comparison gets for DDR2, because it is assumed that burst operations can continue for an already open resource, whereas for RLDRAM II, it is assumed that the next data comes from a bank that will be available. Statistically, the RLDRAM II assumption is more likely to be true than that of DDR2.
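
The device-count arithmetic for that 72-bit (64-bit data plus 8-bit ECC) bus (a quick check, not part of the original article):

```python
# Device counts for a 72-bit (64-bit data + 8-bit ECC) bus.
rldram_devices = 72 // 18            # four x18 RLDRAMs
ddr2_devices = 72 // 16 + 1          # four x16 DDR2s cover 64 bits, plus one x8
print(rldram_devices, ddr2_devices)  # 4 vs 5 devices to route
print(4 * 18, 4 * 16 + 8)            # both arrangements reach 72 bits
```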

QDR SRAM

Having "invented" the QDR SRAM, we naturally considered its feature set when we created the RLDRAM SIO device. At the time, we intended RLDRAM to be used on QDR-style buses, especially when the required density cannot be achieved with SRAM; when the cost of SRAM is too high; and when the SRAM frequency is inadequate. RLDRAM is available today at 400MHz clock, whereas QDR SRAM is only available up to 250MHz. RLDRAM actually outperforms QDR SRAM when data can be so ordered as to have sufficient availability. This is quite feasible if data is "chunked" into larger groups. Some systems' performances are limited entirely by the random command repeat rate. From that standpoint, with a tRC of 20ns, RLDRAM is equivalent to a 50MHz 4-word burst QDR SRAM. If, however, larger data groups can be used, the tRC limit fades. QDR II needs no assumptions; it will always respond to random requests. The comparison assumes availability of 300MHz QDR II SRAMs, which, so far, do not exist, and shows the slower 300MHz RLDRAMs, although faster parts have been produced. The challenge to the designer is data ordering and request constraining such that full bus utilization can be sustained. If achieved, the rewards are dramatic: 100% bus utilization, increased frequency, and lower cost.

RLDRAM II offers many advantages, including the industry's fastest tRC (16ns to 20ns), full bus utilization at 2-, 4-, and 8-word data burst lengths, and the lowest bus turnaround of any memory device previously produced. It is available in x9, x18, and x36 versions. The RLDRAM SIO version is the only lower-cost alternative to QDR SRAM. An SIO permits 100% bus utilization in situations having balanced (or nearly balanced) read-to-write ratios. The device has a flexible 1.5V/1.8V I/O. The outputs are impedance controlled for wide support of different system and loading topologies. On-die termination provides clean high-frequency operation. RLDRAM is optimized for the lowest system cost. (It also offers the lowest system power consumption, owing to its low 1.8V core voltage, high internal segmentation, high bank count, and smaller data access sizes, as compared with DDR2, for example.) It is scalable to higher frequencies and lower tRC values.

by J. Thomas Pawlowski, Senior Director of Architecture Development, Micron Technology, Inc, USA

(May 2004 Issue, Nikkei Electronics Asia)

http://neasia.nikkeibp.com/nea/200405/mspe_305291.html
 
Haven't they given up on decreasing latency starting with DDR2/GDDR2+ so they can focus on throughput? If so, XDR and GDDRx might have comparable latencies.

The first generation definitely had latency issues, but I think that was fixed in the second-gen stuff. The other problem, IIRC, was that the interface for the first-gen parts consumed quite a lot of area; not sure if this was subsequently fixed, but the area may be less significant relative to today's monster GPUs anyway.

John

The concern was the lost space on the memory side, not the CPU side, IIRC. And then there was the issue of paying Rambus a royalty, which had everyone calling their attorneys. :devilish:
 
Rambus's latency depends on whether the memory is soldered directly to the board or installed in modules; since graphics boards don't use modules, the latency will be reduced greatly. IIRC, when Rambus initializes it finds the latency to the RAM chip furthest away from the controller and sets the latency of all chips to this value. Since soldered-down RAM can be placed both closer to the controller and closer together, the latency might even wind up lower than DDR.
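
A minimal sketch of that leveling behaviour (per-chip latency values are invented): every device gets programmed to the worst-case latency found at initialization, so clustering the chips close to the controller drags the shared value down.

```python
# Toy model of Rambus-style latency leveling: every device is programmed to the
# worst-case (furthest) latency found at initialization. Values are invented.

def leveled_latency_ns(per_chip_latencies_ns):
    return max(per_chip_latencies_ns)

rimm_based_pc = [35, 40, 45, 50, 55, 60, 65, 70]   # long channel through RIMM slots
soldered_graphics = [30, 31, 32, 33]               # chips clustered around the GPU

print(leveled_latency_ns(rimm_based_pc))      # 70 -> everyone runs at the worst case
print(leveled_latency_ns(soldered_graphics))  # 33 -> much lower common latency
```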
 