Well, that's only textures. You need multiples of that because for each frame you're going to have tests and blends on many of the alphas.
MDolenc said:Hold on a minute here!
Why the hell would GeForceFX need to store transformed vertices back into video memory and then read them again at the triangle setup stage?? The ONLY thing this could be good for is their render-to-vertex-array functionality, but you don't need to do this GENERALLY in every case!? Where did you get this info from (and how sure are you about it)?
DaveBaumann said:Well, that's only textures. You need multiples of that because for each frame you're going to have tests and blends on many of the alphas.
So, that means... 8X AF, when enabling Antialiasing, is... 95% free.
Nice! All we want to know, now, is its quality.
LeStoffer said:Remember how some new Det. drivers made GF4's Nature FPS do a massive jump? It still makes me believe that we're talking about a vertex shader power limitation here.
Or it could be that FutureMark is so darn stupid that they forgot to turn off Alpha Testing when it's no longer required? Because that kills Early Z.
Which means that it would give a major fillrate & memory bandwidth boost if a driver optimization could intelligently determine if Alpha Testing is still required.
It's unlikely, but maybe, maybe...
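Just to make the idea concrete, here's a rough D3D8-style sketch (hypothetical app-side code with made-up helper names - not anything FutureMark actually does): keep the test enabled only while the alpha-tested geometry is drawn, then switch it off again so Early Z can work for the rest of the frame.

Code:
#include <d3d8.h>

// Hypothetical helpers standing in for the game's own draw calls.
void DrawAlphaTestedGeometry(IDirect3DDevice8* device);
void DrawOpaqueGeometry(IDirect3DDevice8* device);

void RenderFrame(IDirect3DDevice8* device)
{
    // Alpha-tested pass (leaves, fences, ...): the test is needed here.
    device->SetRenderState(D3DRS_ALPHATESTENABLE, TRUE);
    device->SetRenderState(D3DRS_ALPHAREF, 0x80);
    device->SetRenderState(D3DRS_ALPHAFUNC, D3DCMP_GREATEREQUAL);
    DrawAlphaTestedGeometry(device);

    // Everything else: turn the test back off, so Early Z can reject
    // occluded pixels before they cost fillrate and bandwidth.
    device->SetRenderState(D3DRS_ALPHATESTENABLE, FALSE);
    DrawOpaqueGeometry(device);
}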
Uttar
Uttar said:http://www.3dcenter.org/artikel/2002/11-19_b.php
1280x1024, 4X AA, 8X AF: 40.6
Compared to Maximum PC score:
1600x1200, 2X AA, no AF: 41
I'd guess 1600x1200 2X AA takes as much bandwidth as 1280x1024 4X AA, because color compression is more efficient with 4X AA, since more subpixels are identical.
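Quick sanity check on the raw numbers (just counting color samples, ignoring Z and textures): 1280x1024 4X actually has about 37% more samples than 1600x1200 2X, so the "takes as much bandwidth" guess only works if 4X color compression wins back that difference.

Code:
#include <cstdio>

int main()
{
    // Raw color sample counts per frame, before any compression.
    long long s1600_2x = 1600LL * 1200 * 2;  // 3,840,000 samples
    long long s1280_4x = 1280LL * 1024 * 4;  // 5,242,880 samples

    printf("1600x1200 2X AA: %lld samples\n", s1600_2x);
    printf("1280x1024 4X AA: %lld samples (+%.0f%%)\n",
           s1280_4x, 100.0 * (s1280_4x - s1600_2x) / s1600_2x);
    return 0;
}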
So, that means... 8X AF, when enabling Antialiasing, is... 95% free.
Nice! All we want to know, now, is its quality.
Unless nVidia's homemade score at 3DCenter is inflated by running it under god-like conditions or something... But that would be surprising.
Uttar
LeStoffer said:Remember how some new Det. drivers made GF4's Nature FPS do a massive jump? It still makes me believe that we're talking about a vertex shader power limitation here.
Uttar said:...I'd guess 1600x1200 2X AA takes as much bandwidth as 1280x1024 4X AA, because color compression is more efficient with 4X AA, since more subpixels are identical.
So, that means... 8X AF, when enabling Antialiasing, is... 95% free....
Uttar said:Or it could be that FutureMark is so darn stupid that they forgot to turn off Alpha Testing when it's no longer required? Because that kills Early Z.
...
It's unlikely, but maybe, maybe...
Uttar
You have to realize that 3DMark is not an open-source test, which is the main reason why it's so damn hard to understand what the heck is going on in each test... Anyway, my main annoyance with 3DMark 2001 is that it takes some major study before anyone really understands what the heck each test really tests. Take the car chase - high detail: it's very interesting if you want to measure your computer's memory subsystem performance (FSB, RAM bandwidth, etc.), and even now I'm not sure that the shader benchmark (Nature) really measures shader performance and not memory bandwidth.
It's okay that the final mark is a sum, but please at least give people a benchmark that measures a single aspect in a gaming situation next time. Okay, Futuremark?
Uttar said:MDolenc said:Hold on a minute here!
Why the hell would GeForceFX need to store transformed vertices back into video memory and then read them again at the triangle setup stage?? The ONLY thing this could be good for is their render-to-vertex-array functionality, but you don't need to do this GENERALLY in every case!? Where did you get this info from (and how sure are you about it)?
This is not only the GFFX, it's also that way for every other nVidia GPU on the market. Probably for ATI/Matrox/... ones too.
Note, however, that this isn't anywhere near as major as Z reads or color writes.
Uttar
Mintmaster said:MDolenc is right. There is no need to transform all vertices first and then read them. You need to do this for tile-based deferred rendering, but not for an IMR like all other video cards. I think this is one of the main reasons for not doing TBR.
You read the vertices either from video card memory or AGP (I think the latter is more common) in the order given by the primitive type, and then you transform them. The vertex cache is there to try and avoid retransforming recently transformed vertices, thus achieving a max of 2 triangles per vertex. Once you form a primitive, you draw it, and you have no need to store it anywhere because you are done with it. Of course there are FIFOs to buffer out culled/clipped/tiny triangles, but these are on the chip and don't consume any bandwidth.
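To make the vertex cache part concrete, here's a toy simulation of my own (nothing vendor-specific, just a plain FIFO of recently transformed vertex indices): a hit means the vertex is reused straight from the cache instead of being fetched and transformed again.

Code:
#include <cstdio>
#include <deque>
#include <vector>

// Toy post-transform vertex cache: a small FIFO of recently
// transformed vertex indices. A hit = no re-transform needed.
struct VertexCache {
    std::deque<unsigned> fifo;
    size_t capacity;
    explicit VertexCache(size_t cap) : capacity(cap) {}

    bool lookup(unsigned index) {
        for (unsigned cached : fifo)
            if (cached == index) return true;   // already transformed
        fifo.push_back(index);                  // transform it, then cache it
        if (fifo.size() > capacity) fifo.pop_front();
        return false;
    }
};

int main() {
    // Two triangle strips over a small grid of vertices (toy data).
    std::vector<unsigned> indices = {0,4,1,5,2,6,3,7,  4,8,5,9,6,10,7,11};
    VertexCache cache(10);                      // e.g. a 10-entry FIFO

    int transforms = 0;
    for (unsigned idx : indices)
        if (!cache.lookup(idx)) ++transforms;

    printf("%zu index fetches, %d vertex transforms\n",
           indices.size(), transforms);
    return 0;
}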
Uttar said:MDolenc said:Hold on a minute here!
Why the hell would GeForceFX need to store transformed vertices back into video memory and then read them again at the triangle setup stage?? The ONLY thing this could be good for is their render-to-vertex-array functionality, but you don't need to do this GENERALLY in every case!? Where did you get this info from (and how sure are you about it)?
This is not only the GFFX, it's also that way for every other nVidia GPU on the market. Probably for ATI/Matrox/... ones too.
And yes, I've got a source for this. I'm *not* inventing it.
It comes directly from an nVidia patent:
http://patft.uspto.gov/netacgi/nph-...Vidia.ASNM.&OS=AN/nVidia&RS=AN/nVidia
The following is speculation. However, it seems very logical to me, and it's based on reliable information. It could be wrong, but I'd be surprised if there were more than a few mistakes in it. And some of it is not speculation, but known fact.
DrawIndexedPrimitive transforms every vertex in the VB from "BaseVertexIndex + MinIndex" to "BaseVertexIndex + MinIndex + NumVertices".
It puts all of those transformed vertices in memory.
Then, once they're all there, it begins reading the indices, which are in memory in an Index Buffer (yes, that also takes memory bandwidth - probably at least 1 GB/s in extreme cases, but I didn't do any serious calculation here).
It probably reads 128 bits of indices at a time (it's a 128-bit bus) and puts them in a very small cache.
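Written out as a toy model, the speculated flow looks like this (pure speculation on my part - every function and name below is made up, not taken from any driver or from the patent text):

Code:
#include <cstdio>
#include <vector>

struct Vertex            { float pos[3]; };
struct TransformedVertex { float clip[4]; };

// Stand-in for the real vertex program.
TransformedVertex RunVertexShader(const Vertex& v) {
    return { { v.pos[0], v.pos[1], v.pos[2], 1.0f } };
}

int main() {
    const unsigned NumVertices = 8;
    std::vector<Vertex> vertexBuffer(NumVertices);
    std::vector<unsigned short> indexBuffer = {0,1,2, 2,1,3, 4,5,6, 6,5,7};

    // Pass 1 (speculated): transform the whole vertex range of the call
    // and write the results to a transformed-vertex area in video memory.
    std::vector<TransformedVertex> transformedArea(NumVertices);
    for (unsigned v = 0; v < NumVertices; ++v)
        transformedArea[v] = RunVertexShader(vertexBuffer[v]);

    // Pass 2 (speculated): read the index buffer back 128 bits at a time
    // (8 x 16-bit indices on a 128-bit bus), keep the chunk in a tiny
    // cache, and fetch the already-transformed vertices for setup.
    const size_t indicesPerChunk = 128 / 16;
    unsigned triangles = 0;
    for (size_t i = 0; i < indexBuffer.size(); i += indicesPerChunk)
        for (size_t j = i; j < i + indicesPerChunk && j < indexBuffer.size(); ++j) {
            TransformedVertex tv = transformedArea[indexBuffer[j]];
            (void)tv;                           // would feed triangle setup here
            if ((j + 1) % 3 == 0) ++triangles;
        }

    printf("%u triangles set up from %zu indices\n",
           triangles, indexBuffer.size());
    return 0;
}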
Yes, this was long. But you seemed so surprised that I really wanted to prove this and explain it.
Anyway, I could have made mistakes here. If anyone finds one, please say so and explain it to me if possible. I'd love to increase my understanding of this.
Uttar
Uttar said:Reading vertices from AGP? Oh, sure
Even if the VB is dynamic (which means data is sent to it over AGP every frame), it's stored in video memory. The only difference is that it's written to video memory every frame.
And the very proof that it's not read from AGP: dynamic data can be read multiple times, so it's kept in video memory.
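For reference, this is the dynamic-VB pattern I'm talking about, as a D3D8-style sketch (hypothetical snippet, just showing the usage: the app locks with DISCARD and rewrites the buffer every frame; it doesn't say anything about where the driver puts it).

Code:
#include <d3d8.h>
#include <cstring>

// D3D8-style sketch of a dynamic vertex buffer rewritten every frame.
IDirect3DVertexBuffer8* CreateDynamicVB(IDirect3DDevice8* device, UINT bytes)
{
    IDirect3DVertexBuffer8* vb = NULL;
    device->CreateVertexBuffer(bytes,
                               D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY,
                               D3DFVF_XYZ,
                               D3DPOOL_DEFAULT,
                               &vb);
    return vb;
}

void FillEachFrame(IDirect3DVertexBuffer8* vb, const void* data, UINT bytes)
{
    BYTE* dst = NULL;
    // D3DLOCK_DISCARD: get a fresh region to write into, without stalling on the GPU.
    if (SUCCEEDED(vb->Lock(0, bytes, &dst, D3DLOCK_DISCARD))) {
        memcpy(dst, data, bytes);
        vb->Unlock();
    }
}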
Mintmaster said:MDolenc is right. There is no need to transform all vertices first and then read them. You need to do this for tile based deferred rendering, but not for an IMR like all other video cards.
alexsok said:If I'm not mistaken, A02 stepping is already back from the fab ...
Uttar said:Or it could be that FutureMark is so darn stupid that they forgot to turn off Alpha Testing when it's no longer required? Because that kills Early Z.
Which means that it would give a major fillrate & memory bandwidth boost if a driver optimization could intelligently determine if Alpha Testing is still required.
It's unlikely, but maybe, maybe...
Uttar