Are you ready? (NV30 256-bit bus?)

Looking back historically, it seems obvious that the largest increases in performance with new chips have come from two-fold (or more) increases in bandwidth.

i.e. Voodoo 1 -> Voodoo 2 = clock doubled (as well as an extra TMU).
(Single) Voodoo 2 -> Voodoo 3 = clock (and bus) doubled
Voodoo 3 -> Voodoo 5 = Chips doubled
GeForce SDR -> GeForce DDR = Memory clock 'effectively' doubled
Radeon 8500/GF4Ti 4600 -> Radeon 9700 = bus doubled

I applaud NVidia if they manage to improve the efficiency of the renderer enough to more than double the performance of their previous generation of chips (even though it is expected they will have a 60% or so increase in bandwidth due to the faster DDR2).

ATI can, in turn, increase their memory bandwidth by up to a further 60% over the R9700 with (relatively) small alterations to their current chips. This would give them double the bandwidth of a 128-bit NV30 and require NVidia to have a very efficient architecture indeed to compete.
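For a rough sense of the numbers: peak memory bandwidth is simply bus width times effective data rate. A minimal sketch follows - the clock figures in it are purely illustrative, not confirmed specs for either part:

Code:
#include <stdio.h>

/* Peak bandwidth in GB/s: bus width in bytes (bits/8) times the
 * effective data rate in transfers per second (clock x 2 for DDR/DDR2). */
static double bandwidth_gbs(int bus_bits, double effective_mhz)
{
    return (bus_bits / 8.0) * effective_mhz * 1e6 / 1e9;
}

int main(void)
{
    /* Illustrative clocks only - not actual NV30/R9700 specs. */
    printf("256-bit @  600 MHz effective: %.1f GB/s\n", bandwidth_gbs(256, 600.0));
    printf("128-bit @ 1000 MHz effective: %.1f GB/s\n", bandwidth_gbs(128, 1000.0));
    return 0;
}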
 
As Parhelia has shown, bandwidth without balanced fillrate or occlusion optimisation isn't the answer on its own - the best results have come from a balance of both, e.g. the GF2 only came into its own with the Ultra, since it initially had fillrate to spare.
 
Interestingly, when Kyro went with SDR, this was generally regarded as a Good Thing(TM) on this board. They didn't need the bandwidth that DDR had to offer at the time.
Similarly, nV might not need the bandwidth of a 256-bit bus at this time. So why did they still go with DDR2? Maybe simply because putting 128-bit DDR2 on the board (vs 256-bit DDR1) was much cheaper?
What would the R9700 do with the extra bandwidth DDR2 provides if the core can't keep up? High-res AA & AF performance improvements? Maybe nV's AA & AF implementations aren't very bandwidth dependent at all. Maybe that's what they mean by their "smarter pixels": the focus is shifting more to core performance, not raw memory bandwidth.
 
Does anybody really expect the NV30 to be a deferred renderer? The consensus seems to be that it will be an IMR with greatly improved efficiency, so the question is: will it be efficient enough to overcome the assumed bandwidth deficiency compared to the R9700?

As I understand it, all AA & AF implementations are bandwidth dependent to some extent, so I expect there is a limit on how efficient they can make these two processes. I seem to remember somebody mentioning something about "colour buffer compression" - is this a way in which they could get much improved efficiency? Similarly, if they can use 16bpp fp colour at twice the speed of 32bpp fp, is it possible that NV30 could run current games a great deal faster than many expect, albeit with reduced precision?

Kirk's PR-laden waffle aside, I expect we'll learn much more on the 18th!
 
Mariner said:
Similarly, if they can use 16bpp fp colour at twice the speed of 32bpp fp, is it possible that NV30 could run current games a great deal faster than many expect, albeit with reduced precision?
Having 16-bit FP for colour implies 16 bits per component, so it would require twice the storage/bandwidth of a "current" game that used 4x8 bits for RGBA.

EDIT: This is, of course, assuming that you wanted to store these higher precision values in an external buffer.
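To put rough numbers on that, here is a minimal sketch of the colour buffer storage involved (the resolution is purely illustrative; only the 2:1 ratio matters):

Code:
#include <stdio.h>

int main(void)
{
    /* Illustrative resolution */
    const long pixels = 1024L * 768L;

    long rgba8_bytes   = pixels * 4;  /* 4 channels x 8-bit integer = 4 bytes/pixel */
    long rgba16f_bytes = pixels * 8;  /* 4 channels x 16-bit float  = 8 bytes/pixel */

    printf("RGBA8 colour buffer:     %ld KB\n", rgba8_bytes / 1024);
    printf("RGBA fp16 colour buffer: %ld KB\n", rgba16f_bytes / 1024);
    return 0;
}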
 
16 bit fp is better termed 64 bit fp (4 channels x 16 bits/channel). Most current chips and games use 32 bit integer color (4 channels x 8 bits/channel); 16 bit integer color (an average of 4 bits per channel) is not used much anymore.

Current software, which was designed for 32 bit integer color, will most likely look the same (or perhaps slightly better, depending on the computations) when running with 64 bit, 96 bit, or 128 bit fp color internally. I would think the differences between all the fp precisions on current software would be indistinguishable.

So if 32 bit integer color seems good enough, why go to 64 bit fp, 96 bit fp, or 128 bit fp? The reason is the future. Although pixel shaders do many color computations, their needs are really no different from those of vertex shaders. In both cases you need to perform the same geometric, vector, and lighting computations. In the past, the geometric and vector computations were done only at the vertices and then interpolated across the pixels for speed. However, this shortcut produces incorrect results and limits the kinds of computations that can be done. For photorealistic effects you need to be able to perform any type of geometric, lighting, or other physical computation at every pixel, not just at the vertices. This requires that pixel shaders have the same instructions and floating point precision as the vertex shaders. Some of the computations, especially the lighting, can get by with less precision (64 bit fp color is usually enough).

Ideally, vertex shaders and pixel shaders would both offer the same general floating point computation capabilities and precisions so that developers wouldn't be bothered with the differences.
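As a toy illustration of why integer intermediates run out of precision (just a sketch, not any chip's actual pipeline): accumulate a hundred very dim per-pixel light contributions. With 8-bit integer intermediates each contribution rounds to zero; in floating point they add up.

Code:
#include <stdio.h>

int main(void)
{
    const float base = 200.0f;         /* surface colour on a 0..255 scale */
    const float attenuation = 0.002f;  /* each light is very dim */

    unsigned char sum8 = 0;
    float         sumf = 0.0f;

    for (int light = 0; light < 100; light++) {
        /* 200 * 0.002 = 0.4, which truncates to 0 in 8-bit integer */
        unsigned char contrib8 = (unsigned char)(base * attenuation);
        sum8 = (unsigned char)(sum8 + contrib8);
        sumf += base * attenuation;
    }

    printf("8-bit integer accumulation:  %d\n", sum8);    /* prints 0    */
    printf("floating point accumulation: %.1f\n", sumf);  /* prints 40.0 */
    return 0;
}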
 
SA said:
16 bit fp is better termed 64 bit fp (4 channels x 16 bits/channel). Most current chips and games use 32 bit integer color (4 channels x 8 bits/channel); 16 bit integer color (an average of 4 bits per channel) is not used much anymore.

I thought 16 bit was usually 5 bits per color channel? Educate me in further detail if I'm missing something...

Current software, which was designed for 32 bit integer color, will most likely look the same (or perhaps slightly better, depending on the computations) when running with 64 bit, 96 bit, or 128 bit fp color internally. I would think the differences between all the fp precisions on current software would be indistinguishable.

Well, since I'm here, I'll say I think this is because they are intentionally limited to work with lower precision, not necessarily because the specific engine isn't capable of effects that would benefit from it. I'd think Quake III, for example, is an engine capable of being pushed further in this way. I still don't think the benefits from higher precision need take long at all for any developer who has already done work with pixel shaders, so I don't think the "future" clause you mention later is all that far away (well, not much further away than the APIs offering the enhanced support anyways :-?).
 
SA said:
16 bit fp is better termed 64 bit fp.

I strongly disagree.
It would be consistent with the previous nomenclature, but the previous nomenclature was bad as well. 8 bit, 16 bit fp, or 32 bit fp per colour component is better. The reason is that 32/64/128 bit fp is nomenclature that has been used in computing for several decades, and it is not likely to change just because a new kid on the block likes to brag with big numbers.

When you used 8 bit integers per colour, the system typically still handled 32-bit chunks of data, where the additional 8 bits could be used or ignored. With 32 bits per colour, this isn't necessarily the way you would always go about it. Should RGB without alpha be called 96 bit fp? What then of 4x24 bits fp, would that be another kind of 96 bit fp?

Use a nomenclature that is unambiguous and consistent with the meaning of the term in all other fields of computing. Enough of this silly numbers game. It smacks of the old console wars.

Entropy
 
I did mean to indicate that I was talking about 16 bit fp per component but obviously didn't make myself clear at the time! :)

As SA indicates, this could be considered as accumulating to 64-bit altogether, so am I right in assuming this could be faster (though less accurate) than the 96-bit/128-bit used by R300?

If this were the case, this might go some way towards making up for the bandwidth inequalities between NV30 and R300 where the extra precision wasn't necessary.

Disclaimer: I haven't blown the dust off my copy of Computer Graphics by Foley, Van Dam et al since I left University over 7 years ago so I could be way off base with many of my assumptions due to stuff I have forgotten.
 
Question: texture reads from memory for texture passes - how much information is exchanged when that occurs? Is it just the 32 bits (8 each for RGB + alpha), or is it more?
 
demalion said:
SA said:
16 bit fp is better termed 64 bit fp (4 channels x 16 bits/channel). Most current chips and games use 32 bit integer color (4 channels x 8 bits/channel); 16 bit integer color (an average of 4 bits per channel) is not used much anymore.

I thought 16 bit was usually 5 bits per color channel? Educate me in further detail if I'm missing something...

"Average" is the key word here. 16 / 4 = 4.

demalion said:
I don't think the "future" clause you mention later is all that far away (well, not much further away than the APIs offering the enhanced support anyways :-?).

That API is available already ... in OpenGL at least.
 
Confusion about 16 bit colour probably arises from the 16 bit colour depth for 2D (desktop). It uses 5/6/5, or 5/5/5 on the Mac.
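For illustration, a quick sketch of how those desktop pixels pack (the usual 5/6/5 layout, plus the 5/5/5 variant):

Code:
#include <stdio.h>

/* Pack 8-bit-per-channel RGB into the usual 16 bit 5/6/5 desktop format. */
static unsigned short pack_rgb565(unsigned char r, unsigned char g, unsigned char b)
{
    return (unsigned short)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* The 5/5/5 variant (the "32K colours" / Mac layout); the top bit is unused. */
static unsigned short pack_rgb555(unsigned char r, unsigned char g, unsigned char b)
{
    return (unsigned short)(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}

int main(void)
{
    printf("5/6/5: 0x%04x\n", pack_rgb565(255, 128, 64));
    printf("5/5/5: 0x%04x\n", pack_rgb555(255, 128, 64));
    return 0;
}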

Cheers
Gubbi
 
In the newly posted interview said:
In the past few generations of NVIDIA’s products, both bandwidth and computation affect performance. Overclocking either memory speed or core speed improves performance, so that means that sometimes rendering is limited by memory bandwidth, and other times rendering is limited by pipeline processing power. To truly move to the next level in terms of performance and features, we’ll need to increase both. A wider memory interface is one way to increase bandwidth, but there are other ways. As you mention, DDR2 is another way of building a higher throughput memory system. I think that DDR2 is going to be really exciting for the graphics community, since it brings the potential of more memory bandwidth per signal pin. This is a good trend, regardless of how many bits wide the datapath is!

There are costs associated with both increasing the datapath width and increasing the computational core. If you look at the new programmable features on the OpenGL and DirectX, you’ll see that a lot more floating point math is required, which requires both bandwidth and computation growth. Both of these will be increased in the next generation. We’ll move to 256bit when we feel that the cost and performance balance is right.

(Bolded by me)

I don’t think this is particularly open to interpretation.
 
Gubbi said:
Confusion about 16 bit colour probably arises from the 16 bit colour depth for 2D (desktop). It uses 5/6/5, or 5/5/5 on the Mac.

Cheers
Gubbi

Actually, if I remember correctly, back in 486/Pentium times there were 15 bit modes on some GFX cards (afaik it was even part of the VESA 2.0 standard). It was called 32K colours mode and used a 5/5/5 layout.
 
Mariner said:
I did mean to indicate that I was talking about 16 bit fp per component but obviously didn't make myself clear at the time! :)

As SA indicates, this could be considered as accumulating to 64-bit altogether, so am I right in assuming this could be faster (though less accurate) than the 96-bit/128-bit used by R300?
I believe you are confusing internal precision with external precision. Unless the application explicitly asks for a higher precision buffer (such as 64-bit or 128-bit), the extra precision is only kept internally. In other words, R300 takes absolutely no performance hit because of its internal 24 bits per component architecture.
 