NV30 vs R300?

What I think everyone should realize is that Nvidia is basically on the spot now. Everyone expects their next part to be faster, so if it's not, or if it's only barely faster, or faster in just some ways, it will be considered a major embarrassment for them. It's an unenviable position that ATi has put them in, and really, if you think about it, it's the first time they've been in it (against 3dfx they were the underdog, so they didn't have to worry about letting people down, much like ATi until now; after that they were the leader and no one was challenging them).

Also, I really think people are giving a little too much credence to what sources at Nvidia are saying. These guys aren't exactly unbiased, and of course they will say the NV30 is going to be faster regardless of whether it actually is (maybe they don't even know yet).

Hopefully the NV30 will be faster, and then the R400 will be faster than it and so on. The thing is, the potential is there for either company to blow it at any time. And, like I said, Nvidia's on the spot. If they release the NV30 and it's good, then they'll put ATi on the spot again. But if they only release a card comparable to the R300 it's going to look bad for them and they'll basically have lost the R300 vs NV30 round (just because they came out later).

Of course all this is great for us, since we directly benefit from ATi taking the fight to Nvidia. The best thing that could honestly happen is if the NV30 is around the same speed as the R300, because then we'd have a price war on our hands! :)
 
Another possibility is that the NV30 will be feature-loaded, with unlimited flexibility in everything and an even more impressive feature set than the R300, but be very slow.

Scenario:
NV30 = GeForce 1 SDR

In other words, NVidia introduces lots of cool new stuff, far beyond DX9, but it won't be taken advantage of in games for two years. It's an uphill developer-adoption issue, and everyone knows the cool new features won't really perform as well as needed until the NV40.

Just a possibility.
 
Xmas said:
There's one reason why I don't think it's a deferred renderer.

Quote from NVidia's "What comes after 4?"
Yes so...
• Consider laying down the Z buffer first
• Draw your objects into front-to-back order
• But this isn’t a per-poly sort...
• This allows you to minimise the overall cost by not spending time on unseen pixels
They wouldn't recommend something like this if their upcoming architecture didn't benefit from it.

Full quote:
Code:
Isn’t Renderman shading slow?
• Yes so...
   • Consider laying down the Z buffer first
   • Draw your objects into front-to-back order
       • But this isn’t a per-poly sort...
• This allows you to minimise the overall cost by not spending time on unseen pixels
• But this means you pass all the vertex data thru the GPU twice per frame
   • And that means you need very fast vertex engines

Why couldn't a deferred renderer do this? I do recall something about the ImgTech implementation suffering when geometry is submitted in sorted front-to-back order, but why would every deferred renderer implementation suffer? My understanding of what constitutes deferred rendering is a bit hazy (and there is no glossary explanation, hint hint ;) ).

It basically seems to me that, as opposed to taking every triangle (after transform and vertex shading) and filling it with textures/pixel shader program output and then lighting it, a deferred renderer sorts all the triangles after transforming everything, then only processes lighting and texturing (and all the rest, if we ever get a tiler that supports it) on triangles that weren't discarded in the sorting. IMRs have been adding technology between the triangle and texturing/shading steps to approach the efficiency of a tile based renderer (if you believe one of the phrases used in a preview about "Early Z", the R300 is actually there, but I wouldn't bet money on that at the moment), so perhaps the definitions are hazy nowadays.
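
Something like this rough pseudocode is the picture I have in my head (the types and structure here are made up purely for illustration, and hugely simplified compared to real hardware):
Code:
/* Made-up sketch of the two approaches; everything here is illustrative. */

typedef struct { float x, y, z, w; } Vertex;
typedef struct { Vertex v[3]; } Triangle;

/* Immediate mode renderer: shade each triangle as it arrives,
   even if a later triangle ends up covering it. */
void render_immediate(Triangle *tris, int n)
{
    for (int i = 0; i < n; i++) {
        /* transform and set up tris[i] */
        /* for each covered pixel: z-test, then texture/shade/light it */
    }
}

/* Deferred (tile based) renderer: transform and bin everything first,
   resolve visibility per tile, then shade only the surviving pixels. */
void render_deferred(Triangle *tris, int n, int num_tiles)
{
    for (int i = 0; i < n; i++) {
        /* transform tris[i] and store the post-transform data
           (the scene buffer) binned by the tiles it touches */
    }
    for (int t = 0; t < num_tiles; t++) {
        /* resolve visibility for every pixel in tile t (on chip) */
        /* texture/shade/light only the front-most fragments */
    }
}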

In any case, I fail to see, at this point, why the above phrasing excludes any form of deferred rendering...except perhaps that they mention a Z-buffer. Is it just that there is no way that term would be used for any deferred rendering system? If so, is there room for them to call it a Z-buffer, for programmers to treat it as a Z-buffer, but for it to be implemented in a way that wouldn't preclude deferred rendering?

Or perhaps it was only the primitive processing Democoder mentions that tied them up for this long...but where is all that "3dfx mojo" then? :LOL: Or was 3dfx perhaps working on primitive processing?

EDIT: trying to get indentation to work on my quote.
 
demalion said:
In any case, I fail to see, at this point, why the above phrasing excludes any form of deferred rendering...except perhaps that they mention a Z-buffer. Is it just that there is no way that term would be used for any deferred rendering system? If so, is there room for them to call it a Z-buffer, for programmers to treat it as a Z-buffer, but for it to be implemented in a way that wouldn't preclude deferred rendering?
It doesn't exclude anything, but if NV30 is going to be a TBR you wouldn't read that stuff in their paper, for sure.
That technique could be called software-driven deferred rendering..and even if Nvidia could optimize NV30 to handle it, you have to transform the geometry at least twice.
And that little bird also told me they aren't going to save all the transformed geometry in some off/on chip buffer.

ciao,
Marco
 
nAo said:
demalion said:
In any case, I fail to see, at this point, why the above phrasing excludes any form of deferred rendering...except perhaps that they mention a Z-buffer. Is it just that there is no way that term would be used for any deferred rendering system? If so, is there room for them to call it a Z-buffer, for programmers to treat it as a Z-buffer, but for it to be implemented in a way that wouldn't preclude deferred rendering?
It doesn't exclude anything, but if NV30 is going to be a TBR you wouldn't read that stuff in their paper, for sure.
That technique could be called software-driven deferred rendering..and even if Nvidia could optimize NV30 to handle it, you have to transform the geometry at least twice.
And that little bird also told me they aren't going to save all the transformed geometry in some off/on chip buffer.

ciao,
Marco

That little bird comment makes me think their focus is on really high triangle setup throughput...which makes sense in conjunction with primitive processing, as I (think I) can now see how that might not work too well with instructions that can alter/destroy primitives, and how primitive processing can place some high demands on triangle setup.

Hmmm...that even sort of ties in with the multi-chip HW T&L engine stuff that was thrown around, but didn't someone give a reason for discounting that?
 
Read carefully, demalion.
Xmas said:
They wouldn't recommend something like this if their upcoming architecture didn't benefit from it.

I didn't say a deferred renderer can't do this. It simply wouldn't benefit from it. In fact, it would suffer.

They recommend laying down the Z buffer first and drawing objects in front-to-back order "to minimise the overall cost by not spending time on unseen pixels".
On a deferred renderer, that would simply be a waste of time.
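
For reference, the technique they're recommending looks roughly like this on an ordinary immediate mode renderer (just a sketch using standard OpenGL calls; draw_scene_geometry() is a made-up placeholder for submitting the scene, roughly sorted front to back):
Code:
#include <GL/gl.h>

/* Hypothetical placeholder: submits the scene's triangles. */
extern void draw_scene_geometry(void);

void render_with_z_prepass(void)
{
    glEnable(GL_DEPTH_TEST);

    /* Pass 1: lay down the Z buffer only. No colour writes, so almost
       no shading cost is spent here. */
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    draw_scene_geometry();

    /* Pass 2: resubmit the geometry. Only fragments matching the stored
       depth survive, so the expensive shading is spent only on visible
       pixels, at the cost of pushing all the vertex data through twice. */
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);
    glDepthFunc(GL_EQUAL);
    draw_scene_geometry();
}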
 
What about the little people?

Well, although the good-sized market of gamers is usually intelligent about its purchases, you also have to take into consideration the general public who have no clue what's what.... I myself know several people who would just go and buy whatever is best and leave it at that... So there will definitely be some market for the R300 in the next few months.

Although I'm not sure what the planned release dates are yet, things will REALLY start cranking up with these cards when UT2003 and Doom 3 come out. As I said, I have no clue on the release dates for those, but if either lands anytime before the NV30, you can expect to see a good rise in R300 sales, considering people such as myself with a crappy GF2 MX are gonna wanna go big for the big games :)
 
demalion said:
Why couldn't a deferred renderer do this? I do recall something about the ImgTech implementation suffering when geometry is submitted in sorted front-to-back order, but why would every deferred renderer implementation suffer? My understanding of what constitutes deferred rendering is a bit hazy (and there is no glossary explanation, hint hint ;) ).

Because those were recommendations to developers. Neither of the optimizations recommended would be of any use to a TBR.

Also, the biggest continuing problem with TBRs is handling excessively high triangle counts. Since a TBR must store all post-transform data (which can take up quite a bit of space...), there will be significant stalls in games whose average scene comes too close to the scene buffer's capacity. The stalls get more severe when FSAA is used.
 
Chalnoth,

Aren't you taking into consideration only currently available TBR solutions?
 
Well, I can see fine why a certain implementation of a deferred renderer wouldn't benefit from that, but I didn't see why every implementation of a deferred renderer would necessarily suffer...to me it seemed to presume that nVidia wouldn't do something new that still fit the name "deferred renderer".

I DO, however, think I understand (now) the point that they wouldn't recommend that developers do this since it duplicates the function of what anything that could be called a deferred renderer would try to do, and they wouldn't have emphasized anything that would benefit other architectures more than it benefits their next generation architecture. It seems a pretty valid reason why NV30 wouldn't be a deferred renderer, as Xmas says.

Chalnoth: even though it seems there is good reason why it isn't deferred rendering, what buffer size would be necessary to prevent stalls for the near future? How about on-the-fly "free" geometry compression for storage...it seems to me that geometry data is highly compression-friendly. Also, it fits in with abstracting primitives on the GPU pretty well.

What exactly is nVidia likely to be incorporating from 3dfx? M buffer?

I didn't see any corrections to my limited understanding of deferred/immediate mode rendering...is it fairly accurate?
 
Ailuros said:
Chalnoth,

Aren't you taking into consideration only currently available TBR solutions?

No. All deferred renderers operate on one concept:

After T&L is finished, cache the entire scene and separate it into tiles.

It becomes obvious that there will be problems with especially complex scenes.

Consider this, for a moment. Today's z-buffers are 4 bytes per pixel.

Uncompressed, a scene buffer may require 90 bytes per vertex. Since a smart scene buffer would throw out unused vertex data, and would try to use a system similar to a vertex buffer, let's drop that all the way down to 20 bytes per vertex, and a 1:1 triangle to vertex ratio (I consider this very, very conservative as the scene buffer also must store other data such as rendering context for each pass...).

In this way, the scene buffer will outgrow the z-buffer once you have more than about one triangle per five screen pixels. But what about overdraw? Let's say there's an overdraw of four. At that crossover point, the average rendered triangle size would then be closer to 20 pixels for similar memory size requirements.

At 1600x1200, that works out to roughly 384,000 triangles per scene, which at 60 fps is a triangle rate already at the edge of a GeForce2's theoretical peak.
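
Working the same assumptions through in code (the numbers are purely illustrative, nothing measured):
Code:
#include <stdio.h>

int main(void)
{
    /* Illustrative figures only, matching the assumptions above. */
    const double screen_pixels  = 1600.0 * 1200.0;  /* 1,920,000 */
    const double z_bytes_per_px = 4.0;              /* 32-bit depth */
    const double bytes_per_tri  = 20.0;             /* 20 B/vertex, 1:1 tri:vertex */
    const double overdraw       = 4.0;

    double z_buffer_bytes = screen_pixels * z_bytes_per_px;         /* 7.68 MB */
    double parity_tris    = z_buffer_bytes / bytes_per_tri;         /* 384,000 */
    double avg_tri_size   = screen_pixels * overdraw / parity_tris; /* 20 px   */

    printf("z-buffer size:        %.2f MB\n", z_buffer_bytes / 1e6);
    printf("scene buffer parity:  %.0f triangles/frame\n", parity_tris);
    printf("avg rendered size:    %.0f pixels/triangle\n", avg_tri_size);
    printf("at 60 fps:            %.1f Mtris/s\n", parity_tris * 60.0 / 1e6);
    return 0;
}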

You also have to consider that for optimal performance, the scene buffer must be large enough so that games are going to be very, very unlikely to exceed the maximum size.

In the end what you come up with is:
Extreme memory size requirements.
Potentially inefficient memory bandwidth as triangle counts increase and triangles are shared among different tiles.
Extreme performance hit when scene buffer is overrun.

Some (DaveB) have argued that these problems are solvable through things like geometry compression. Personally, I don't see why post-transform geometry should be any more compressible than a z-buffer, and I find it rather certain that at this point in time, geometry rates are going to be increasing much faster than fillrate.
 
Some (DaveB) have argued that these problems are solvable through things like geometry compression. Personally, I don't see why post-transform geometry should be any more compressible than a z-buffer, and I find it rather certain that at this point in time, geometry rates are going to be increasing much faster than fillrate.

Well, you could do quantization of your position and normal data (or pretty much any vector, really). The Z-compression on some of the more current GPUs does that IIRC, and the GCN and PS2 use quantization on all vector elements to alleviate bandwidth consumption when moving and storing data between discrete components. It could possibly be applied between the transform and setup stages (treating the stages as discrete processes). You could look into meshify algorithms (I've done some on the PS2, getting as low as .06 to 0.7 verts/tri per mesh patch). With those sorts of ratios you're looking at a 200,000 poly scene consuming only 600KB...
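
Just to give an idea of the kind of quantization I mean (a made-up example, not what any particular piece of hardware actually does): snap positions to 16-bit fixed point relative to a patch's bounding box, which cuts a 12-byte float position down to 6 bytes.
Code:
#include <stdint.h>

typedef struct { float   x, y, z; } Vec3f;
typedef struct { int16_t x, y, z; } Vec3q;

/* Map each component from [box_min, box_max] onto the signed 16-bit range. */
Vec3q quantize(Vec3f p, Vec3f box_min, Vec3f box_max)
{
    Vec3q q;
    q.x = (int16_t)((p.x - box_min.x) / (box_max.x - box_min.x) * 65535.0f - 32768.0f);
    q.y = (int16_t)((p.y - box_min.y) / (box_max.y - box_min.y) * 65535.0f - 32768.0f);
    q.z = (int16_t)((p.z - box_min.z) / (box_max.z - box_min.z) * 65535.0f - 32768.0f);
    return q;
}

/* Inverse mapping; the setup stage would run this when it needs the vertex. */
Vec3f dequantize(Vec3q q, Vec3f box_min, Vec3f box_max)
{
    Vec3f p;
    p.x = (q.x + 32768.0f) / 65535.0f * (box_max.x - box_min.x) + box_min.x;
    p.y = (q.y + 32768.0f) / 65535.0f * (box_max.y - box_min.y) + box_min.y;
    p.z = (q.z + 32768.0f) / 65535.0f * (box_max.z - box_min.z) + box_min.z;
    return p;
}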

There's also the possibility of using the concept of a primitive processor in reverse to pack and unpack data, basically using a variation of the meshify techniques to quantize objects in the scene and store them as a procedural description to run when the setup engine makes a request. But that's kinda WAAAAY out there in terms of feasibility (and perhaps necessity) given the rate of progression of today's hardware.
 
archie4oz said:
You could look into meshify algorithms (I've done some on the PS2, getting as low as .06 to 0.7 verts/tri per mesh patch). With those sorts of ratios you're looking at a 200,000 poly scene consuming only 600KB...

Did you mean .6 to .7? Anyway, yes, it is possible to have close to a 2:1 tri/vertex ratio. Still, with your 200,000 poly scene consuming only 600KB, are you only considering vertex position data, or all the other vertex attributes as well (like texture alignment, normals, lighting values, etc.)?
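
(For anyone wondering where the 2:1 figure comes from: a big regular grid mesh approaches two triangles per shared vertex. A throwaway example:)
Code:
#include <stdio.h>

int main(void)
{
    /* A regular grid of w x h vertices holds 2*(w-1)*(h-1) triangles,
       so the triangle:vertex ratio approaches 2:1 for large meshes. */
    int w = 100, h = 100;
    int verts = w * h;                   /* 10,000 */
    int tris  = 2 * (w - 1) * (h - 1);   /* 19,602 */
    printf("%d verts, %d tris, ratio %.2f\n", verts, tris, (double)tris / verts);
    return 0;
}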

There's also the possibility of using the concept of a primitive processor in reverse to pack and unpack data, basically using a variation of the meshify techniques to quantize objects in the scene and store them as a procedural description to run when the setup engine makes a request. But that's kinda WAAAAY out there in terms of feasibility (and perhaps necessity) given the rate of progression of today's hardware.

I agree...would require tons of power in the primitive processor, as you would end up re-processing primitives many times over.
 
Re: What about the little people?

Zap said:
I myself know several people who would just go and buy whatever is best and leave it at that... So there will definitely be some market for the R300 in the next few months.

Although I'm not sure what the planned release dates are yet, things will REALLY start cranking up with these cards when UT2003 and Doom 3 come out. As I said, I have no clue on the release dates for those, but if either lands anytime before the NV30, you can expect to see a good rise in R300 sales, considering people such as myself with a crappy GF2 MX are gonna wanna go big for the big games :)

I don't know any such buyers myself...I guess I would if I didn't know them, though... :p
Meaning that they usually come to me for advice, and I wouldn't recommend the best available to anyone...(or wouldn't have before DX9 cards anyhow....and it of course all boils down to what they actually need...)

I don't know a specific date either, but I'm sure the NV30 is out well before Doom 3....with UT2K3, though...maybe not....I'm guessing around October for its release...(so 50/50 or so, depending on who's faster, DE or nVidia)

Edit:
By "out in October" I meant UT2K3...not very clear, I realized on re-reading...
 
Personally, even if I had the money to upgrade to an R300 in the next month, I wouldn't buy it right away. I would certainly take a "wait and see" attitude about the stability and drivers of the product. But if many people on forums like this one began to have good experiences with it, I might go for one, money permitting.

I don't believe that the looming spectre of the NV30 should influence anybody's decisions until the card is at least announced.
 
Chalnoth.... what do you suppose the chances are of ATI having a refresh of the Radeon 9700 available by the time the NV30 hits the market? I suspect those chances are high. Further, I suspect it would give the NV30 (assuming it is faster than the Radeon 9700) a serious run for its money. Or am I assuming too much, and you say nay? Just curious as to your thoughts regarding a refresh of the Radeon 9700.
 
Did you mean .6 to .7?

:eek: Ooops! Yeah, .6 to .7 is correct...

are you only considering vertex position data, or all other vertex attributes? (Like texture alignment, normals, lighting value, etc.).

Meshify primarily deals with managing position data, but in my cases I had normal and lighting values included as well. No texture attributes were included since it was purely an exercise in geometry management, although the data quantization could easily be applied there too...
 