Hyp-X: Good job on showing how stupid I am.
Thanks for correcting me so well. I haven't written the test programs yet, but I'll be sure to write them in a few days when I get the time.
However, I still think transformed vertices are put in memory - just that it doesn't put every single vertex of a DIP call in there at the same time.
Basically, it would use memory as some type of cache there. Not a huge cache, but probably larger than the other ones (otherwise, why use memory?)
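The idea can be sketched with a toy simulation of a small post-transform vertex cache. The FIFO replacement policy, the cache size, and the index stream are all assumptions made up for illustration - the actual hardware details aren't public:

```python
# Sketch: simulating a small post-transform vertex cache (FIFO).
# Cache size and index stream are invented for illustration; real
# hardware sizes and replacement policies are not documented.

from collections import deque

def simulate_vertex_cache(indices, cache_size):
    """Count how many vertices must actually be transformed for an
    indexed draw call, given a FIFO post-transform cache."""
    cache = deque(maxlen=cache_size)  # oldest entry evicted first
    transforms = 0
    for idx in indices:
        if idx not in cache:
            transforms += 1       # miss: run the vertex through T&L
            cache.append(idx)
    return transforms

# Two triangles sharing an edge: vertices 1 and 2 are reused.
indices = [0, 1, 2, 2, 1, 3]
print(simulate_vertex_cache(indices, cache_size=4))  # 4
print(simulate_vertex_cache(indices, cache_size=1))  # 5
```

A bigger cache catches more of the reuse between neighbouring triangles, which is exactly why a memory-backed cache (larger but slower) could make sense.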
I've found a much more recent nVidia patent saying this. It's dated June 28, 2002:
http://appft1.uspto.gov/netacgi/nph...=PG01&s1=nVidia.AS.&OS=AN/nVidia&RS=AN/nVidia
Here's the quote:
[0215] FIG. 25 is a diagram illustrating the method by which the sequencers of the transform and lighting modules 52 and 54 are capable of controlling the input and output of the associated buffers in accordance with the method of FIG. 24. As shown, the first set of buffers, or input buffers 400, feed transform module 52 which in turn feed the second set of buffers, or intermediate buffers 404, 406. The second set of buffers 404, 406 feed lighting module 54 that drains to memory 2550.
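As a rough reading of that paragraph, the buffer arrangement can be sketched as a chain of small FIFOs between stages. The buffer sizes and the stand-in transform/lighting functions below are pure assumptions - the patent gives no concrete numbers here:

```python
# Sketch of the quoted buffer arrangement: input buffers feed the
# transform module, intermediate buffers feed the lighting module,
# which drains to memory. Sizes and math are invented placeholders.

from collections import deque

INPUT_BUFFER_SIZE = 4  # "first set of buffers" (assumed size)

def transform(v):
    # Stand-in for the transform module: a trivial scale.
    return v * 2

def light(v):
    # Stand-in for the lighting module: add a constant term.
    return v + 1

def run_pipeline(vertices):
    input_buf = deque()
    intermediate_buf = deque()  # "second set of buffers"
    memory = []                 # lighting "drains to memory"
    for v in vertices:
        input_buf.append(v)
        if len(input_buf) == INPUT_BUFFER_SIZE:
            while input_buf:
                intermediate_buf.append(transform(input_buf.popleft()))
            while intermediate_buf:
                memory.append(light(intermediate_buf.popleft()))
    # Drain whatever is left at the end of the draw call.
    while input_buf:
        intermediate_buf.append(transform(input_buf.popleft()))
    while intermediate_buf:
        memory.append(light(intermediate_buf.popleft()))
    return memory

print(run_pipeline([1, 2, 3, 4, 5]))  # [3, 5, 7, 9, 11]
```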
However, it doesn't make it 100% sure yet. It only describes "one embodiment of the present invention" - so in their actual GPUs they could do it differently, I think.
But if we take it as correct, one question remains: since it would thus be a very small cache, why put it in memory at all? I don't know...
Maybe due to texture changes, which would cause a stall? So being able to transform more vertices in advance could be essential. But that sounds like a ridiculous and illogical explanation to me...
Any idea? Or maybe a way to prove I'm wrong again?
Aww, I've already been wrong about so much in so few posts... I must really look like a moron.
Uttar
EDIT: Small addition about Alpha Testing/Early Z:
Hmm, yeah, maybe it could delay the Z writes. But then, wouldn't that Early Z test potentially have been wasted bandwidth? So maybe it wouldn't be very useful either. I'm not sure - I've never seen an nVidia/ATI document on this, so maybe it doesn't kill it.
But no matter what, it wastes something. If the pixel is rejected after having done Early Z, you've done a useless Z read. If you skip Early Z and the pixel would have been rejected by it, you've lost fillrate.
So it's all about which is more important: bandwidth or fillrate. Most of the time it would be fillrate, but there could be rare exceptions.
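To make that trade-off concrete, here's a toy expected-waste model. All the probabilities, byte counts and costs are invented illustrative numbers, not measurements of any real GPU:

```python
# Toy model of the bandwidth-vs-fillrate trade-off described above.
# Probabilities, byte counts and shading costs are made-up numbers.

def wasted_bandwidth_with_early_z(pixels, p_alpha_reject, z_read_bytes=4):
    # Doing the Early Z read for a pixel the alpha test rejects anyway
    # spends Z-read bandwidth for nothing.
    return pixels * p_alpha_reject * z_read_bytes

def wasted_fillrate_without_early_z(pixels, p_z_occluded, shade_cost=1):
    # Skipping Early Z means Z-occluded pixels get shaded for nothing.
    return pixels * p_z_occluded * shade_cost

pixels = 1_000_000
print(wasted_bandwidth_with_early_z(pixels, p_alpha_reject=0.25))  # 1000000.0
print(wasted_fillrate_without_early_z(pixels, p_z_occluded=0.5))   # 500000.0
```

With these made-up numbers, skipping Early Z wastes half a million pixel shades while keeping it wastes a megabyte of Z reads - which one hurts more depends entirely on whether the app is bandwidth- or fillrate-limited.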
And anyway, if it had to potentially delay Z writes, it would have to keep a flag saying whether it does or not. And a lot of other things. Maybe that isn't supported by today's hardware? (or maybe it is. Or maybe it's only supported by the R300/NV30. Or maybe only the R300. Or maybe it'll only be supported in the R400. Or... you see the idea)
It shouldn't be too hard to add, but you have to think of adding it first (and see whether it's worth the cost).
Uttar