NVidia DDR-II - what speed and bandwidth will it have?

And if the NV30 uses a 128-bit bus, it seems reasonable to assume that their next part will double that to 256bits, and will be designed around the possibilities such a memory subsystem will offer. It's a technological knee, and I suspect many will prefer to shop after nVidia is past it, rather than before. A factor of two in bandwidth is nothing to sneeze at, and should allow for a corresponding increase in performance, if the rest of the design is balanced to take advantage of it.


Good enough reason for me to hold off on the NV30 and see what the NV35 has. Now, I'm not for endlessly waiting for the next leap in technology; if you know there's a better chip around the corner, and wait, then wait, then wait, you'll never have anything. However, in this case, being very interested in the NV3X architecture, I would just get the refresh in the spring if it's going to have a 256-bit bus and further refinements.
 
A 256-bit bus was just what the doctor ordered... especially DDR II-powered 256-bit bus R300s.

High resolution gaming with eye candy cranked.
 
But there's no reason why you can't do the same on a 128-bit bus.

After all, consider this. With DDR2, would you agree that the NV30 could match the R300 if neither FSAA nor anisotropic were used? It should easily be able to do that.

Then, if you look at anisotropic filtering, that doesn't consume memory bandwidth so much as fillrate, so there's no reason the NV30 couldn't still perform well here either.

The real question comes when you're talking about FSAA. But it is still possible to equal (or outperform) the R300's FSAA performance on a 128-bit bus.

Here's one way it might be done:

Every time a triangle completely covers a pixel, only one color value is written, and only one z-value with a direction (similar to Z3...32-bit z + 32 bits for direction). You'd probably have to have an 8-bit stencil elsewhere in memory for this to work.

This would make it so that every FSAA mode, in an ideal circumstance, would take less memory bandwidth than 2x FSAA. In a real-world circumstance in today's games, I doubt even 8x FSAA would usually take more memory bandwidth than 2x FSAA.
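The savings claimed above can be put in rough numbers with a toy model. This is only a sketch of the scheme described in the post: the 4-byte color, 4-byte z sample, 12-byte compressed pixel (color plus 64-bit z+direction), and the 5% edge-pixel fraction are all assumptions, not measured figures.

```python
def fsaa_write_bytes(samples, edge_fraction, compressed=True):
    """Average bytes written per pixel for an N-sample FSAA mode.

    Assumes 4-byte color and 4-byte z per sample. A fully covered
    pixel under the compressed scheme stores one color plus a 64-bit
    z+direction value; only edge pixels pay the full per-sample cost.
    """
    uncompressed = samples * (4 + 4)   # N color samples + N z samples
    if not compressed:
        return uncompressed
    covered = 4 + 8                    # 1 color + z-with-direction
    return edge_fraction * uncompressed + (1 - edge_fraction) * covered

edge = 0.05  # assumed fraction of pixels crossed by a triangle edge
for n in (2, 4, 8):
    print(n, "samples:", fsaa_write_bytes(n, edge), "bytes/pixel")
```

With those assumptions, even 8x FSAA averages about 14.6 bytes per pixel, under the 16 bytes of uncompressed 2x, which is the comparison being made here.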

You should open up to the possibility that it's more about how smart the architecture is than about the raw bandwidth available.
 
Chalnoth said:
Every time a triangle completely covers a pixel, only one color value is written, and only one z-value with a direction (similar to Z3...32-bit z + 32 bits for direction). You'd probably have to have an 8-bit stencil elsewhere in memory for this to work.

This would make it so that every FSAA mode, in an ideal circumstance, would take less memory bandwidth than 2x FSAA. In a real-world circumstance in today's games, I doubt even 8x FSAA would usually take more memory bandwidth than 2x FSAA.
You're losing me here: How can every FSAA mode take less bandwidth than 2x FSAA? 2x might be compressible, too.
You should open up to the possibility that it's more about how smart the architecture is than about the raw bandwidth available.
Good thing the R300 was designed intelligently and has lots of bandwidth.
 
OpenGL guy said:
You're losing me here: How can every FSAA mode take less bandwidth than 2x FSAA? 2x might be compressible, too.

Sorry, meant less bandwidth than uncompressed 2x.

You should open up to the possibility that it's more about how smart the architecture is than about the raw bandwidth available.
Good thing the R300 was designed intelligently and has lots of bandwidth.

I agree. The R300 is very good with its memory bandwidth (it does more with it than the GeForce4 Ti 4600 does, which is impressive).

But, that doesn't mean that it's not possible to do better.
 
Chalnoth said:
You never know.

The NV30 just might show that a 256-bit bus is indeed overkill...

First off, the NV30 might do very well indeed with a fast 128-bit memory system, but that is not to say that the same design decision would have been optimal for, say, the R300.

Second, define "overkill". It would seem unlikely that the NV30 would never be limited by available bandwidth. It's probably more a question of in what circumstances it is limited, how often these are likely to be encountered, and how much of an impact a higher bandwidth solution would have in those cases.

Another design may choose to make bandwidth limiting more rare. There is a continuous spectrum between "always bandwidth limited" and "never bandwidth limited", and a manufacturer may choose to place the design anywhere between these poles. If it weren't for cost, we would stick to "never limited", but the history of gfx-cards so far has been closer to the other pole, trying to exploit the available bandwidth fully. Is a design that runs into bandwidth limiting less severely than another "overkill"?

Entropy
 
umm, I'm clueless, need DDR-II info

so DDR-II will debut at 500 MHz actual clock speed? Wow, I wonder how much higher they'll be able to take that.
 
I'm still not clear as to the difference between DDR and DDR II, and I'm trying to hash it out here (guess which one is me ;) ). Is DDR II simply DDR with more transferred per clock? If so, then is it the equivalent of QDR (I)?

And if DDR II is, in fact, QDR, then are we looking at a NV30 with more bandwidth than R300?

300 * 2 * 256 / 8 = 19,200 MB/s
vs.
225 * 4 * 128 / 8 = 14,400 MB/s OR 450 * 4 * 128 / 8 = 28,800 MB/s

Or is DDR-II just DDR retooled to work at higher frequencies?
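The back-of-the-envelope numbers above follow one formula, which can be wrapped in a small helper. The clock speeds and the QDR-style 4-transfers-per-clock figure are the hypothetical values from the post, not confirmed specs:

```python
def bandwidth_mb_s(clock_mhz, transfers_per_clock, bus_bits):
    """Peak bandwidth in MB/s: clock * transfers per clock * bus bytes."""
    return clock_mhz * transfers_per_clock * bus_bits // 8

# R300-style: 300 MHz, DDR (2 transfers/clock), 256-bit bus
print(bandwidth_mb_s(300, 2, 256))   # 19200 MB/s
# Hypothetical QDR-style NV30: 4 transfers/clock on a 128-bit bus
print(bandwidth_mb_s(225, 4, 128))   # 14400 MB/s
print(bandwidth_mb_s(450, 4, 128))   # 28800 MB/s
```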

(Let's see if I'm being too impatient, and if someone will answer at TR before they do here. BTW, Wavey, is that you posting as AG Dave?)
 
Entropy said:
Another design may choose to make bandwidth limiting more rare. There is a continuous spectrum between "always bandwidth limited" and "never bandwidth limited", and a manufacturer may choose to place the design anywhere between these poles. If it weren't for cost, we would stick to "never limited", but the history of gfx-cards so far has been closer to the other pole, trying to exploit the available bandwidth fully. Is a design that runs into bandwidth limiting less severely than another "overkill"?
That's a good point.

Traditionally, most cards have been bandwidth limited: unable to achieve their maximum fill rate except under the simplest of situations. Take the GeForce 1. It could do 4 pixels per clock with a 120 MHz engine clock, and the DDR version had a memory clock of 150 MHz, which gives you a maximum bandwidth of 4.8 GB/s. Now, let's say you're trying to hit the maximum fillrate with 32-bit color and 32-bit Z enabled, and assume a Z pass ratio of 50%. This gives us: 120 * (4 (pixels per clock) * 0.50 + 4 (Z reads) + 4 (Z writes) * 0.5) * 4 (bytes per pixel) = 3.8 GB/s. Thus, the GeForce 1 wasn't bandwidth limited in this scenario. Toss in texturing and TnL and things might change, but I won't delve into that.

Now, look at the GeForce 2 GTS. The engine was clocked at 200 MHz (very good for the time) and the DDR memory clock was 166 MHz. A second TMU per pipe was also added, but I won't consider that here. As you can see, the engine clock increased by 66%, yet the available bandwidth only increased by 10.7%! There's not a chance in the world that the GeForce 2 wasn't bandwidth limited in many cases.
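The arithmetic in the two posts above can be checked with a short sketch. The 128-bit DDR bus, 4 pixels per clock, 50% Z pass ratio, and 4 bytes per access are the assumptions stated in the posts:

```python
def available_mb_s(mem_clock_mhz, bus_bits=128, transfers=2):
    """Peak DDR memory bandwidth in MB/s on a 128-bit bus."""
    return mem_clock_mhz * transfers * bus_bits // 8

def required_mb_s(engine_mhz, pixels_per_clock=4, z_pass=0.5, bytes_per=4):
    """Bandwidth needed at max fillrate with 32-bit color and Z.

    Color writes for passing pixels + Z reads for all pixels
    + Z writes for passing pixels, as in the post's formula.
    """
    accesses = (pixels_per_clock * z_pass      # color writes
                + pixels_per_clock             # Z reads
                + pixels_per_clock * z_pass)   # Z writes
    return engine_mhz * accesses * bytes_per

# GeForce 1 DDR: 3840 MB/s needed vs. 4800 MB/s available
print(required_mb_s(120), available_mb_s(150))
# GeForce 2 GTS: 6400 MB/s needed vs. 5312 MB/s available
print(required_mb_s(200), available_mb_s(166))
```

The sign flips between the two chips: the GeForce 1 has headroom in this scenario, while the GeForce 2 GTS needs more bandwidth than its memory can supply, which is the point being made.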

This is why companies have moved toward bandwidth-saving techniques such as tile-based rendering (it's probably better to call it "infinite planes rendering"), HyperZ, and others.
 
Well, that settles that. :) Thanks.

Edit: Do you mind explaining the differences between both SIO and CIO DDR II and Burst 2 and 4? TIA.
 
Pete said:
Edit: Do you mind explaining the differences between both SIO and CIO DDR II and Burst 2 and 4? TIA.
Lol, I am not a HW guy :)

However, I think Burst 2 and Burst 4 simply specify how much data you get on a single burst. With Burst 4, you get twice as much data per burst as with Burst 2. This doesn't mean twice the bandwidth; it just means the burst is sustained for twice as many cycles.
 
Burst 2 = Better for accessing many smaller pieces of data in separate places in memory.

Burst 4 = Better for accessing sequential data.
 
g__day said:
So it's 1 GHz or better per pin * 32 pins of data / 8 bits per byte to give 4 GB/sec throughput. Meaning the bus must be at least 64 bits for address and data (assuming 32 pins for address).
Actually, a 128 Mbit chip with a 32-bit word arrangement only needs 22 address pins. (128 Mbit / 32 bit = 2^27 / 2^5 = 2^22 words)
If you pair two chips to share one bus, this means one additional "chip select" bit. And of course some bits for read/write and burst modes...
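The address-pin arithmetic above generalizes to any capacity and word size; a two-line sketch makes the 2^22 figure explicit:

```python
import math

def address_pins(capacity_mbit, word_bits):
    """Address lines needed to index every word in a memory chip."""
    words = capacity_mbit * 2**20 // word_bits   # 128 Mbit = 2^27 bits
    return int(math.log2(words))

print(address_pins(128, 32))  # 22: 2^27 bits / 2^5 bits per word = 2^22 words
```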
 
Burst 2 = Better for accessing many smaller pieces of data in separate places in memory.

Burst 4 = Better for accessing sequential data.

Going to burst 4 is probably necessary because it gives DDR II time to breathe. The timing windows get smaller as MHz increases, and going to burst 4 makes sure the RAM gets its data in the allotted time period.
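The trade-off between the two burst lengths can be illustrated with a toy efficiency model. This assumes a 128-bit (16-byte) bus and that every request occupies whole bursts; both are simplifications for illustration, not DDR-II specifics:

```python
def burst_efficiency(useful_bytes, burst_len, bus_bytes=16):
    """Fraction of transferred bytes actually wanted, when a request
    of useful_bytes must occupy whole bursts of burst_len transfers."""
    burst_bytes = burst_len * bus_bytes
    bursts = -(-useful_bytes // burst_bytes)     # ceiling division
    return useful_bytes / (bursts * burst_bytes)

# A small scattered 32-byte access on a 128-bit bus:
print(burst_efficiency(32, 2))  # 1.0 -> burst 2 wastes nothing
print(burst_efficiency(32, 4))  # 0.5 -> burst 4 moves 64 bytes for 32
```

For long sequential reads the two converge, which matches the rule of thumb above: burst 2 suits scattered small accesses, burst 4 suits streaming.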
 
Umm... interesting discussion. However, it seems a 256-bit data bus for the NV30 is directly confirmed by nVidia.
Quote from http://nvmax.com/Articles/Previews/NV30_SNEAK_PREVIEW/:
The only other point we can share is that NV30 will use a 256 Bit memory bus and DDR2 memory.
Unfortunately, no details are given on the memory subsystem.
It will be interesting to see how the NV30 will fare against the R300. When it's out, both of them should employ DDR2 memory on a 256-bit-wide data bus.

ciao,
Marco
 
So that sounds like the NV30 will have 32 GB/sec of memory bandwidth, a 400 MHz or better GPU, and improved HSR algorithms. Wow!
 
OK, so it'll be called the Eclipse, right? At those speeds, it'll pretty much block out the sun for other 3D cards. ;)
 