New Memory Interface for R5x0...

DaveBaumann said:
The point about the die size is that, while the process shrinks, the pads don't - a slightly smaller physical die doesn't lend itself to more connections (hence pads), so I would guess at external physical bus widths being similar.

However, while the die size is said to be slightly smaller, at 90nm things are relatively going to be much larger (number of transistors in relation to die size). Something you might want to think about is what issues there might be with the current memory buses in chips as the complexity, and hence relative size, increases.

IIRC the core size goes down ~50% if the transistor count remains the same when moving from 130nm to 90nm.
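That matches the ideal scaling. A quick back-of-the-envelope check (assuming a perfect linear shrink, which pads and analog structures famously don't follow - which is Dave's point):

```python
# Ideal area scaling from 130nm to 90nm, assuming a perfect linear shrink.
old_node, new_node = 130.0, 90.0
area_ratio = (new_node / old_node) ** 2
print(f"area ratio: {area_ratio:.2f}")  # ~0.48, i.e. roughly half the die area
```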
 
If ATI pushed a 512-bit bus, I would also be surprised, seeing as they're still using 128-bit for the low end. And I'm guessing that was a cost-based decision?
 
When Dave objected to 512-bit, I thought the same as he said later: a smaller core doesn't jibe with adding lots of pins.

And then I went in the other direction. My first thought was Yellowstone (as Mariner said), but that interface never made it AFAIK - at least not into any memories. But Rambus is now trying to sell another bus called XDR DRAM. I don't think it's just a rebranded Yellowstone.

However, I doubt that that's it. Not really for any technical reason, but because Rambus isn't "kosher". (How's that for a good reason. :D)

I believe more in Xmas' proposition. "Advanced memory interface" == "full memory virtualization".
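For what it's worth, "full memory virtualization" usually means the chip works on virtual addresses and a page table decides where each page actually lives. A toy sketch of the idea - all names and numbers here are my own illustration, nothing confirmed about R5x0:

```python
# Toy sketch of what "full memory virtualization" usually implies: the GPU
# sees virtual addresses and a page table maps each page to local memory,
# system memory, or "not resident" (fetched on demand).
# Everything here is a hypothetical illustration, not ATI's design.

PAGE_SIZE = 4096  # assumed page granularity

# virtual page number -> (memory pool, physical base address)
page_table = {
    0x0: ("local",  0x10000),
    0x1: ("system", 0x80000),
    # page 0x2 is not resident; touching it would trigger a fetch
}

def translate(virtual_addr):
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    entry = page_table.get(page)
    if entry is None:
        raise LookupError(f"page fault: virtual page {page:#x} not resident")
    pool, base = entry
    return pool, base + offset

print(translate(0x0123))  # ('local', 65827)  -> local memory at 0x10123
print(translate(0x1456))  # ('system', 525398) -> system memory at 0x80456
```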
 
How about dual 4-channel controllers, each servicing half the quads?

How about a much bigger on-die cache servicing the controllers?
 
Which of course makes the top-end design scalable to the midrange and low end as far as functionality goes (a toy sketch of the scaling follows below):

Half the quads and one controller for the midrange.

A quarter of the quads and one controller for the low end.
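Something like this, if I'm reading the idea right - the counts below are my own guesses for illustration, not known R5x0 configurations:

```python
# Hypothetical scaling of the quads-per-controller idea above.
# Tier names and counts are illustrative guesses, not leaked specs.
CONFIGS = {
    "high end": {"quads": 4, "controllers": 2},  # dual controllers, half the quads each
    "midrange": {"quads": 2, "controllers": 1},  # half the quads, one controller
    "low end":  {"quads": 1, "controllers": 1},  # a quarter of the quads
}

for tier, cfg in CONFIGS.items():
    ratio = cfg["quads"] / cfg["controllers"]
    print(f"{tier}: {cfg['quads']} quad(s), {cfg['controllers']} controller(s),"
          f" {ratio:g} quad(s) per controller")
```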
 
Wouldn't full memory virtualization cut out the need for two dedicated sets of memory for the individual processors on a multi-processor graphics board?

If that were the case, it would make building a dual-core board far more cost-effective and give it a much simpler layout. It would also certainly be more cost-effective than an SLI setup, even if your system already had an SLI-capable motherboard.
 
Something to consider is that the memory bus is not just about feeding pixels - there are many different "clients" within a chip...
 
In regards to the people talking about on-die cache, it's possible they could use the MoSys embedded RAM (1T-SRAM) like ArtX/ATI used in the GameCube. Their engineers already have experience with it, and if memory serves, the MoSys design uses a single transistor to create an SRAM-like cell instead of the standard six transistors. I dunno how the MoSys design does performance-wise these days compared to the other options. Now... back to my quiet B3D lurking.
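The transistor arithmetic is what makes that attractive. A rough comparison, assuming the textbook 6T SRAM cell against a 1-transistor cell (and ignoring the refresh and control overhead that 1T-SRAM actually needs):

```python
# Rough cell-count comparison for a 2 MB on-die array.
# Assumes a textbook 6T SRAM cell vs. a 1-transistor cell; real 1T-SRAM
# adds refresh/control overhead not counted here.
CAPACITY_BITS = 2 * 1024 * 1024 * 8   # 2 MB
sram_6t = CAPACITY_BITS * 6
mosys_1t = CAPACITY_BITS * 1
print(f"6T SRAM: {sram_6t / 1e6:.0f}M transistors")   # ~101M
print(f"1T cell: {mosys_1t / 1e6:.0f}M transistors")  # ~17M
```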
 
DaveBaumann said:
Something to consider is that the memory bus is not just about feeding pixels - there are many different "clients" within a chip...
Are they building a little CPU in their GPU? :|
 
My guess is GDDR-4. Last time around, ATI worked with Micron sorting out the issues with GDDR-3, and since then ATI got the X-Box 2 contract, which included them designing the memory interface.

The GDDR-4 ecosystem might reflect a lot of Rambus XDR's positive qualities.
 
DaveBaumann said:
Something to consider is that the memory bus is not just about feeding pixels - there are many different "clients" within a chip...

Well, there are a couple of ways to interpret this:

A move away from unified memory for vertex, texture, and buffer data.

Enhancements to the internal interfaces between the various on-chip requesters and the memory controller. Maybe something to deal with the NxM complexity issue.

Maybe breaking the memory into different interfaces using different DRAM technology. There is little point in buying 1GB of ultra-high-speed DRAM just to handle textures.

Different memory controller logic employing deep re-ordering capabilities (sketched below).

Different memory technology, increasing bandwidth per pin.

In the end I don't see them breaking up the memory space into separate interfaces, in part because of the complications that can entail.
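On the re-ordering point: a deep-reordering controller typically prefers pending requests that hit the currently open DRAM row over strict arrival order. A toy sketch of that policy - illustrative only, not ATI's actual logic:

```python
# Toy sketch of "deep re-ordering": service pending requests that hit the
# currently open DRAM row before paying to open a new one (an
# open-row-first / FR-FCFS-style policy). Illustrative only.
from collections import deque

def schedule(requests, open_row):
    """requests: iterable of (client, row, column); returns service order."""
    pending = deque(requests)
    order = []
    while pending:
        # Oldest request hitting the open row wins; otherwise the oldest overall.
        hit = next((r for r in pending if r[1] == open_row), None)
        chosen = hit if hit is not None else pending[0]
        pending.remove(chosen)
        order.append(chosen)
        open_row = chosen[1]  # the row we just serviced is now the open row
    return order

reqs = [("texture", 7, 0), ("z-buffer", 3, 4), ("texture", 7, 1), ("vertex", 3, 9)]
print(schedule(reqs, open_row=7))
# Both texture hits to row 7 drain first, then the two row-3 requests together.
```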

Aaron Spink
speaking for myself inc.
 
Lots of transistors and roughly the same number of pins. Unless they got some of Intel's BBUL magic.

So I'm thinking maybe an increment in the memory used, i.e. GDDR-4, and possibly a memory array on die.

As for clients, that could very well be a reference to the framebuffer.
 
aaronspink said:
A move away from unified memory for vertex, texture, and buffer data.

Enhancements to the internal interfaces between the various on-chip requesters and the memory controller. Maybe something to deal with the NxM complexity issue.

Maybe breaking the memory into different interfaces using different DRAM technology. There is little point in buying 1GB of ultra-high-speed DRAM just to handle textures.

Different memory controller logic employing deep re-ordering capabilities.

Different memory technology, increasing bandwidth per pin.

In the end I don't see them breaking up the memory space into separate interfaces, in part because of the complications that can entail.
I hope they use magnetic transistors (?) to achieve this.
 
pc999 said:
Could it be this?

http://theinquirer.net/?article=18556


I know it is The Inquirer but...

That's good speculation.

The modules consist of two DDR components, one running at normal speed and the other with a 90-degree delay clock. Each DRAM is linked to a FET, which acts as a switch so that two clock-staggered DRAM components can together spit out 4 bits of data for every 10-nanosecond clock cycle. In effect, QBM delivers 4 bits per I/O cycle instead of 2 bits with normal DDR and 1 bit with standard SDRAM.

Rambus Inc. has already shown how it can pump out 4 bits per clock cycle with its future Quad Rambus Signaling Level, but Kentron's approach is fundamentally different. Rambus uses a chip interface technology that splits the voltage so that bits are represented in four voltage increments at every clock. QBM relies on the use of an external switching mechanism to toggle between two devices so that one bit comes out every quarter clock cycle.

"Instead of packing bits on the voltage, we're trying to pack bits on the time access and pack them tighter in one clock cycle," said Badawi Dweik, applications engineering manager for Kentron.

That feature could help DDR compete against current Rambus DRAMs for main memory for Pentium 4 processors, which use a quad-pumped interface, said InQuest analyst Bert McComas, who is organizing the Platform Conference.

"The 200-MHz (effective) DDR is extremely convenient because it yields the exact same front-side bus as the Pentium 4. So you can use the slowest DDR and apply QBM to get 3.2 Gbytes/s. Otherwise you would need 128-bit-wide DDR or two channels of Rambus," McComas said.

Skeptics may question the efficacy of running the memory with such tight timing parameters, McComas said, but that could be solved by using DDR parts that are rated faster than they are actually run in the system. "It's not like they can't specify a 266-MHz part to run at 200 by 2. That will buy them a lot more guardbanding," he said.

Aside from better speed, the FETs serve to reduce the capacitive load of the system, which becomes more stressed with faster bus speeds and higher memory densities. "The FET switches allow you to access only the memory device that needs to be accessed," Goodman said.

QBM can also be applied to a 128-bit memory bus and can serve to double the data rate of SyncFlash flash memory devices, he added.

http://www.eet.com/story/industry/semiconductor_news/OEG20010111S0021

So this Quad Band Memory technology is different from how Rambus achieves high data rates, and it can be applied to flash memory. There has already been speculation that the Xbox 2 will have a flash drive instead of a hard drive. Right now, with XDR, Rambus is at an octal data rate.
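Here's a toy timing model of what the article describes - two DDR devices, the second clocked 90 degrees behind, with the FET switch toggling between them - plus McComas's bandwidth figure. Purely illustrative, not real silicon:

```python
# Toy model of QBM as described: two DDR devices, the second clocked
# 90 degrees (a quarter cycle) behind the first; a FET switch routes
# whichever device is driving, so the bus sees 4 bits per clock per pin.

def ddr_edges(clock_period, phase):
    """The two instants (rising and falling edge) a DDR device drives data."""
    return [phase, phase + clock_period / 2]

CLOCK = 10.0  # ns, i.e. the article's 100 MHz (200 MHz effective DDR) clock
dev_a = ddr_edges(CLOCK, phase=0.0)        # drives at 0 ns and 5 ns
dev_b = ddr_edges(CLOCK, phase=CLOCK / 4)  # drives at 2.5 ns and 7.5 ns

events = sorted((t, dev) for dev, edges in (("A", dev_a), ("B", dev_b))
                for t in edges)
for t, dev in events:
    print(f"t={t:4.1f} ns: 1 bit per pin from device {dev}")
print(f"total: {len(events)} bits per pin per {CLOCK:.0f} ns cycle")

# McComas's number: 100 MHz clock x 4 bits/clock/pin x 64-bit bus
bandwidth = 100e6 * 4 * 64 / 8
print(f"bandwidth: {bandwidth / 1e9:.1f} GB/s")  # 3.2 GB/s
```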
 
I really am curious as to what the R520 will actually end up offering. I don't think it's nearly as big a jump as a lot of people seem to think.
 
cristic said:
Couldn't a small amount of embedded RAM be used as a cache for pixel and vertex shader programs? I.e., minimizing the penalty of loading them from onboard memory... I mean, shaders are smaller than textures.

I think they already have that kind of cache, made of SRAM.

Pixel and vertex shaders are really small - we are talking about 1024-instruction limits; if one instruction is 32 bits, the whole "big program" takes 4 KB. AFAIK an "average program" is still something like 10-20 instructions -> 40-80 bytes.
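A quick sanity check of those numbers (the 32-bit instruction word is an assumption for illustration, not a vendor spec):

```python
# Sanity check of the shader-size numbers above; the 32-bit instruction
# word is an assumption, not a vendor spec.
BYTES_PER_INSTR = 32 // 8               # 4 bytes per instruction
max_program = 1024 * BYTES_PER_INSTR    # the "big program" case
typical = [n * BYTES_PER_INSTR for n in (10, 20)]
print(f"1024-instruction limit: {max_program} bytes ({max_program // 1024} KB)")
print(f"typical 10-20 instruction shader: {typical[0]}-{typical[1]} bytes")
```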
 
In an interview with the Engineering Times web-site, Badawi Dweik, Kentron's applications engineering manager, said the company was planning to release the QBM-2 specification and modules in the first half of 2005, bringing high-speed DDR2 800MHz options to the market. In order not to depend on third-party chipset makers, Kentron plans to install the supporting logic straight on the memory modules.

http://www.xbitlabs.com/news/memory/display/20040629105414.html

So QBM-2, which is based on DDR2, is scheduled for 2005.



[Attached image: QBM-roadmap.gif - QBM roadmap]
 