Embedded DRAM: When Absolutely Nothing Else Will Do!
One of the most impressive papers I saw at ISSCC 2001 described a 3D graphics engine that integrates 32 Mbyte of high-performance embedded DRAM. The device is apparently a development vehicle for a next-generation graphics platform from Sony, possibly a follow-on to the PlayStation 2. Developed in conjunction with United Memories Inc. of Colorado, it is quite ambitious even for the 0.18 µm, five-level-metal process used. It consists of two groups of eight 16 Mb DRAM macros, each occupying 15.75 mm², arranged around a common central region containing the 3D graphics engine. The DRAM macro is distinguished by its separate 256-bit-wide read and write data paths, which operate in double data rate (DDR) mode at up to 714 MHz for a data transfer rate of 1.43 Gbps per macro pin. Each macro thus provides a peak read or write bandwidth of almost 46 GB/s.
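The per-macro figure follows directly from the data path width and the DDR clock. As a minimal sketch of that arithmetic (the function and parameter names are my own, and decimal units with 1 GB = 10^9 bytes are assumed):

```python
def per_pin_gbps(clock_mhz, transfers_per_clock=2):
    """Data rate of one data pin in Gbit/s; DDR moves two bits per clock."""
    return clock_mhz * transfers_per_clock / 1000


def peak_bandwidth_gbytes(width_bits, clock_mhz, transfers_per_clock=2):
    """Peak one-direction bandwidth of a single data path in GB/s."""
    bits_per_second = width_bits * clock_mhz * 1_000_000 * transfers_per_clock
    return bits_per_second / 8 / 1e9


if __name__ == "__main__":
    # 256-bit read (or write) path at the 714 MHz maximum quoted above.
    print(f"per pin:   {per_pin_gbps(714):.2f} Gbps")
    print(f"per macro: {peak_bandwidth_gbytes(256, 714):.1f} GB/s")
```

Because the read and write paths are separate, this figure applies to each direction independently.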
The complete device operates its 16 DRAM macros at 500 MHz for a peak overall read or write bandwidth of 512 GB/s. The DRAM macros can also perform simultaneous read and write operations. To avoid separate read and write column address paths into the macro, a special first-in, first-out (FIFO) column address buffer stores addresses for future write operations. This permits a straightforward implementation of read-modify-write (RMW) cycles, as shown in Figure 8.
Figure 8 Sony Graphics Engine Embedded DRAM Read-Modify-Write Feature
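The role of the FIFO column address buffer can be illustrated with a toy cycle-by-cycle model. This is only a sketch of the general idea, not the circuit described in the paper: the function name, the fixed `modify_latency`, and the dictionary standing in for the DRAM array are all assumptions made for illustration, and the model ignores hazards such as reading an address whose write is still pending. Each column address enters once, on the read side, and is replayed from the FIFO a few cycles later for the write.

```python
from collections import deque


def simulate_rmw(read_addresses, modify_latency, memory, modify):
    """Toy model: one column address stream drives reads, and a FIFO
    replays each address modify_latency cycles later for the write.

    Returns a list of (cycle, read_address, write_address) tuples, with
    None marking an idle port in that cycle.
    """
    pending = deque(read_addresses)  # addresses still to be read
    fifo = deque()                   # (issue_cycle, address, new_data)
    trace = []
    cycle = 0
    while pending or fifo:
        write_addr = None
        # Write port: retire the oldest buffered address once it is old enough.
        if fifo and cycle - fifo[0][0] >= modify_latency:
            _, write_addr, data = fifo.popleft()
            memory[write_addr] = data
        # Read port: fetch data and queue the address for its later write.
        read_addr = pending.popleft() if pending else None
        if read_addr is not None:
            fifo.append((cycle, read_addr, modify(memory[read_addr])))
        trace.append((cycle, read_addr, write_addr))
        cycle += 1
    return trace


if __name__ == "__main__":
    mem = {0: 10, 1: 20, 2: 30, 3: 40}
    for row in simulate_rmw([0, 1, 2, 3], 2, mem, lambda v: v + 1):
        print(row)
    print(mem)
```

Once the pipeline fills, every cycle carries both a read and a write, which is how the separate data paths are kept busy simultaneously with only one address stream entering the macro.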
The read-modify-write feature allows the device to approach 1000 GB/s of effective merged pixel fill bandwidth. This capability shows what a powerful tool architected memories like custom embedded DRAM are for system designers targeting specific applications such as 3D graphics. It contrasts sharply with discrete memory solutions built from commodity DRAM devices. Not only do commodity DRAMs lack separate read and write data paths, but they also impose read-to-write and write-to-read bus turnaround penalties that can reduce their effective bandwidth significantly below its peak value. It would nominally take 320 Direct Rambus channels (requiring about 15,000 signal, power, and ground pins!) to match just the read bandwidth of the Sony device. The bandwidth of the Sony device compared to several other game consoles and 3D graphics engines for PC applications is shown in Figure 9.
Figure 9 Comparison of 3D Rendering Bandwidth
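The device-level numbers quoted above can be cross-checked with a few lines of arithmetic. This sketch assumes decimal units (1 GB = 1000 MB) and a Direct Rambus channel peak of 1.6 GB/s (16 bits at 800 MT/s); the pins-per-channel value is simply what the article's 15,000-pin estimate implies, not a figure taken from a Rambus specification.

```python
MACROS = 16
WIDTH_BITS = 256           # read (or write) data path width per macro
CLOCK_MHZ = 500            # operating clock of the macros in the full device
TRANSFERS_PER_CLOCK = 2    # DDR

# Peak one-direction bandwidth of the whole device, in MB/s (integer math).
device_mbytes_per_s = MACROS * WIDTH_BITS * TRANSFERS_PER_CLOCK * CLOCK_MHZ // 8

# Reads and writes proceed at the same time during RMW cycles.
merged_mbytes_per_s = 2 * device_mbytes_per_s

# Assumed peak of one Direct Rambus channel: 2 bytes x 800 MT/s.
RAMBUS_CHANNEL_MBYTES_PER_S = 1600

# Ceiling division: channels needed to match the read bandwidth alone.
channels = -(-device_mbytes_per_s // RAMBUS_CHANNEL_MBYTES_PER_S)

# Implied by the article's estimate of about 15,000 pins in total.
pins_per_channel = 15_000 / channels

if __name__ == "__main__":
    print(f"device read bandwidth: {device_mbytes_per_s / 1000:.0f} GB/s")
    print(f"merged RMW bandwidth:  {merged_mbytes_per_s / 1000:.0f} GB/s")
    print(f"Rambus channels:       {channels}")
    print(f"pins per channel:      {pins_per_channel:.1f}")
```

The merged figure is a theoretical ceiling of 1024 GB/s, which is consistent with the device approaching 1000 GB/s in practice.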
The size (252 mm²) and power (8 W for reads, 11 W for writes, and 18 W for RMW cycles) of the 16 embedded DRAM macros alone make it obvious that this device is entirely unsuitable for high-volume production, especially in a consumer electronics product like a game console. Perhaps Sony can use the current 0.18 µm device in workstation graphics applications and await a future shrink to a 0.13 µm process to bring it into the cost realm of discretionary consumer products.