RLDRAM II Offers Full Bus Utilization
For years, an obvious distinction has existed between static random access memory (SRAM) and dynamic random access memory (DRAM). SRAM, composed of six-transistor (6T) cells, accesses data more quickly but requires a large silicon area per bit. For example, in a 90nm device, a single cell typically spans approximately one square micron but can be accessed in just two or three clock cycles (depending on the perspective) at frequencies such as 250MHz. In contrast, mainstream DRAM is composed of one-transistor (1T) cells and has a relatively long data access time. DRAM's silicon area per bit is much smaller, typically 0.065 square microns at 90nm, or about 1/15th the area of SRAM. This difference in cell size increases with each process generation.
SRAM's last remaining advantage is its ability to randomly access the array. A typical high-speed SRAM is capable of a new random access every clock cycle. The question becomes: can relatively slow DRAM mimic faster SRAM and thus eliminate SRAM's final advantage?
The inflection point in the evolution of memory technology is the introduction of a low-latency memory that leverages inexpensive DRAM cells yet operates at very high frequencies (e.g., 533MHz clock), with very fast request rates (e.g., 2ns) and reasonably good random access repeat rates (e.g., 15ns). This new memory, reduced latency DRAM (RLDRAM), and in particular its second generation, RLDRAM II, bridges the gap between SRAM and traditional DRAM.
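As a rough illustration of how the request rate and the random access repeat rate relate, the sketch below estimates how many independent banks are needed to issue a new request in every request slot without returning to any bank before its repeat time has elapsed. The function name and the simple round-robin model are assumptions made for illustration; they are not taken from an RLDRAM data sheet.

```python
import math

def banks_needed(repeat_ns, request_ns):
    """Banks required to issue one request every request_ns when a
    given bank can only be re-accessed every repeat_ns.
    Assumes requests rotate evenly across banks (round-robin)."""
    return math.ceil(repeat_ns / request_ns)

# Example figures quoted in the text: 15 ns repeat rate, 2 ns request rate.
print(banks_needed(15.0, 2.0))
```

With the example figures, this simple model yields eight banks, which is consistent with the 8-bank architecture discussed later in this article.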
Overview of Low-Latency DRAMs
Currently, two DRAM families - fast cycle RAM (FCRAM) and RLDRAM - target low random access latencies and fast bank cycle times, though, in the opinion of the author, RLDRAM offers vastly more flexibility and covers a far broader range of practical applications.
The first generation of RLDRAM targets a 600MHz data rate and 25ns tRC. RLDRAM II adds several advanced concepts to numerous features already demonstrated in technologies such as QDR SRAM and DDR SRAM. Micron introduced the first instantiation of this architecture in August 2003. Although published device goals specified an 800MHz data rate and 20ns tRC, a 900MHz data rate and 16ns tRC were demonstrated.
Network switches, routers and line cards need better DRAM solutions. The "trick" is not offering bandwidth, which many devices do, but sustaining it. RLDRAM II's reduced tRC enables higher data availability than standard DRAM.
This large, diverse segment can best be described with a concrete example. A high-definition television (HDTV) is a consumer electronics device under great pricing pressure. HDTV's price will fall rapidly as consumer demand develops, but presently its numerous memory buses make it too expensive for the average consumer. To make its price more attractive, one device must provide the scratchpad memory for all its software functions; act as shadow memory for code; satisfy the high scan rate; and provide memory for processes such as decoding and standards conversion. RLDRAM's low latency and fast tRC are essential to satisfy these numerous, simultaneous demands. The 36-bit wide architecture is suitable for 3 x 12-bit color.
Performance Comparison Scenario
This operating scenario examines device sensitivity to the read:write ratio. For example, at an R:W ratio of 4:1, RLDRAM would read from bank 0, then bank 1, then bank 2, then bank 3, and then write to bank 4, with banks always available. For DDR2, the same 4:1 ratio assumes that bursts come from the already open row: activate, read, read, read, read, precharge, activate, write, precharge. The assumptions are those most favorable to each device. Results are shown in the figure. The curves in the figure are most easily distinguishable at a read:write ratio of 1:1. It is plain that at any given frequency, RLDRAM outperforms any competing solution. For highly data-streamed applications, GDDR3 is the performance winner, but for read:write ratios of 1:1, RLDRAM SIO with 4- or 8-word bursts comes out on top. Further, for most applications having read:write ratios of 2:1 or greater (and 1:2 or less), RLDRAM common I/O (CIO) devices outperform all other solutions, regardless of clock frequency. In this comparison we assumed availability of 333MHz FCRAMs, though these have yet to be introduced into the market.
Many other scenarios have been analyzed that, for the sake of brevity, cannot be discussed in detail. One such scenario is particularly interesting. The above discussion ignores most internal DRAM resource availability issues, such as a bank being busy at the time of a request. When this is accounted for, it can be demonstrated that the 8-bank architecture of the RLDRAM presents additional performance advantages. In a particular scenario of 16-word read requests followed by 16-word write requests, RLDRAM at 4-word burst outperforms DDR2 SDRAM by 3.8x and outperforms FCRAM by 1.52x at the same clock frequency. This performance margin is extended when maximum frequencies are considered. Comparing 266MHz DDR2 and FCRAM with 400MHz RLDRAM, the RLDRAM outperforms DDR2 by 5.7x and outperforms FCRAM by 2.28x.
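The step from the same-clock figures to the maximum-frequency figures can be checked with a short calculation. The sketch below assumes that performance scales linearly with clock frequency and that the "266MHz" parts run at a nominal 266.67MHz; both are assumptions made for illustration, and the function name is hypothetical.

```python
def scaled_advantage(same_clock_ratio, fast_mhz, slow_mhz):
    """Advantage when the faster device runs at fast_mhz and the slower
    one at slow_mhz, assuming performance scales linearly with clock."""
    return same_clock_ratio * fast_mhz / slow_mhz

RLDRAM_MHZ = 400.0
OTHER_MHZ = 800.0 / 3.0  # assumption: "266MHz" is nominally 266.67 MHz

vs_ddr2 = scaled_advantage(3.8, RLDRAM_MHZ, OTHER_MHZ)
vs_fcram = scaled_advantage(1.52, RLDRAM_MHZ, OTHER_MHZ)
print(f"vs DDR2: {vs_ddr2:.2f}x, vs FCRAM: {vs_fcram:.2f}x")
```

Under these assumptions, the 1.5x clock ratio reproduces the 5.7x and 2.28x figures quoted above.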
DDR2 SDRAM
1.8V HSTL-style receivers provide maximum system performance. They have tighter input specifications relative to Vref than SSTL_18 receivers with compatible voltage and Vref levels. Differential clocks should be employed with DDR2 devices because, unlike RLDRAM II, they cannot tolerate single-ended clocks. The output clock situation is complicated and needs special care: QK/QK# on RLDRAM differs from the optional output clocks RDQS/RDQS# on some versions of DDR2. The bus controller should have low-impedance drive capability, compliant to SSTL_18 standards if high loads are expected. Of course, simulations should be performed with the desired topology to verify signal integrity and termination requirements.
An important distinction for systems requiring error detection and/or correction is that RLDRAM is based on nine bits while DDR2 is based on eight. A 72-bit wide bus therefore requires four x18 RLDRAMs, or four x16 DDR2 SDRAMs plus one x8. Four devices are far easier to route than five. RLDRAM performs its work in 17 clock cycles, whereas DDR2 requires 36 clock cycles before the command sequence can be repeated. This scenario is as favorable as a comparison gets for DDR2, because it is assumed that burst operations can continue for an already open resource, whereas for RLDRAM II, it is assumed that the next data comes from a bank that will be available. Statistically, the RLDRAM II assumption is more likely to be true than that of DDR2.
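The device-count argument for a 72-bit bus can be sketched as a simple fill calculation. The helper below is hypothetical and uses a greedy, widest-device-first strategy, which is an assumption chosen for illustration rather than a board design rule.

```python
def fill_bus(bus_bits, widths):
    """Fill a bus of bus_bits using the given device widths, widest first.
    Returns a list of (count, width) pairs; raises if the bus cannot be
    filled exactly with the widths supplied."""
    remaining = bus_bits
    parts = []
    for width in sorted(widths, reverse=True):
        count, remaining = divmod(remaining, width)
        if count:
            parts.append((count, width))
    if remaining:
        raise ValueError(f"{remaining} bits left unfilled")
    return parts

rldram = fill_bus(72, [18])     # nine-bit-based widths
ddr2 = fill_bus(72, [16, 8])    # eight-bit-based widths
print(rldram, sum(count for count, _ in rldram))
print(ddr2, sum(count for count, _ in ddr2))
```

For a 72-bit bus this gives four x18 devices in the first case, and four x16 devices plus one x8 device (five in total) in the second.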
QDR SRAM
Having "invented" the QDR SRAM, we naturally considered its feature set when we created the RLDRAM SIO device. At the time, we intended RLDRAM to be used on QDR-style buses, especially when the required density cannot be achieved with SRAM; when the cost of SRAM is too high; and when the SRAM frequency is inadequate. RLDRAM is available today at 400MHz clock, whereas QDR SRAM is only available up to 250MHz. RLDRAM actually outperforms QDR SRAM when data can be so ordered as to have sufficient availability. This is quite feasible if data is "chunked" into larger groups. Some systems' performance is limited entirely by the random command repeat rate. From that standpoint, with a tRC of 20ns, RLDRAM is equivalent to a 50MHz 4-word burst QDR SRAM. If, however, larger data groups can be used, the tRC limit fades. QDR II needs no assumptions; it will always respond to random requests. The comparison assumes availability of 300MHz QDR II SRAMs, which, so far, do not exist, and shows the slower 300MHz RLDRAMs, although faster parts have been produced. The challenge to the designer is to order data and constrain requests such that full bus utilization can be sustained. If achieved, the rewards are dramatic: 100% bus utilization, increased frequency, and lower cost.
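The equivalence between tRC and a random command repeat rate is a reciprocal: one random access per tRC. A minimal sketch, with a hypothetical function name, using the two tRC values mentioned in this article:

```python
def equivalent_random_mhz(trc_ns):
    """Random command repeat rate in MHz for a given tRC in ns
    (one random access to the same bank per tRC)."""
    return 1000.0 / trc_ns

for trc in (20.0, 16.0):
    print(f"tRC {trc:.0f} ns -> {equivalent_random_mhz(trc):.1f} MHz")
```

A 20ns tRC corresponds to 50MHz, and a 16ns tRC to 62.5MHz. This limit applies only when requests are fully random; it does not apply when data is grouped so that successive requests fall on available banks.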
RLDRAM II offers many advantages, including the industry's fastest tRC (16ns to 20ns), full bus utilization at 2-, 4-, and 8-word data burst lengths, and the lowest bus turnaround of any memory device previously produced. It is available in x9, x18, and x36 versions. The RLDRAM SIO version is the only lower-cost alternative to QDR SRAM. An SIO permits 100% bus utilization in situations having balanced (or nearly balanced) read-to-write ratios. The device has a flexible 1.5V/1.8V I/O. The outputs are impedance controlled for wide support of different system and loading topologies. On-die termination provides clean high-frequency operation. RLDRAM is optimized for the lowest system cost. (It also offers the lowest system power consumption, owing to its low 1.8V core voltage, high internal segmentation, high bank count, and smaller data access sizes, as compared with DDR2, for example.) It is scalable to higher frequencies and lower tRC values.
by J. Thomas Pawlowski, Senior Director of Architecture Development, Micron Technology, Inc, USA
(May 2004 Issue, Nikkei Electronics Asia)