Report says that the X1000 series is not fully SM3.0?

DemoCoder said:
With VTF, you only render to texture if you need to update the texture every frame (e.g. iterated physics), and when you do, the possibility exists to do it only once and reuse it with multiple vertex streams. With R2VB you update both the texture you are sampling (e.g. iterated physics) AND must render a new vertex buffer. Every new vertex stream requires a separate render.

That's incorrect. There's no more need to update the texture every frame with R2VB than there is with VTF. You can render to it once, then reuse it as many times as you want. This is not at all different from VTF. Everything with regard to the texture update is exactly the same. The only difference is how it's accessed in the vertex shader: with VTF the vertex shader accesses it through a sampler, while R2VB reinterprets the texture memory as a vertex buffer.

DemoCoder said:
The VTF requires extra instructions to figure out texture addresses, but the R2VB requires extra texture samples to lookup the input vertices.

Huh, "lookup the input vertices"? What's that supposed to mean?

DemoCoder said:
The assumption that vertex textures are created by the rasterizer is, I think, flawed. That may be the case in some instances, with the assumption of physics done on the GPU, but it is also likely that these are done by the CPU, which is certainly more likely the case on XB360/PS3, as the very system architecture was designed to allow CPU-driven procedural synthesis.

That's the whole purpose of VTF to begin with: so that you can feed back results from the end of the pipe to the beginning. If you're updating stuff on the CPU there's absolutely no reason to use textures. Use a vertex buffer.

DemoCoder said:
If R2VB is such a huge win, why don't we just get rid of vertex shading on GPUs, get rid of unified shading, and just make pure PS rasterization cards? I think it is the wrong model, and that the current segmented model (vertex shader, pixel shader, and future geometry/tessellation shaders) is the correct one going forward.

It doesn't get rid of vertex shading or setup, but you can certainly shift workload over from the vertex pipes to the fragment pipes. On a unified architecture this is not needed. That doesn't mean that R2VB isn't something you'll see in the future, since DX10 in many ways builds on similar concepts.
 
cho said:
Does R2VB need to run pixel shader instructions? Is it transparent when developers use VTF (i.e. could the ATI driver convert VTF into R2VB on the X1000 series)?

R2VB and VTF are two different features, so it's not transparent in that sense, but it's easy to support both. The number of pixel shader instructions needed for R2VB and VTF is exactly the same. The first difference comes when you access your newly rendered-to texture.
 
Humus said:
You can render to it once, then reuse it as many times as you want. This is not at all different from VTF.

You missed the point. Every reuse applied to another vertex stream generates another rendering pass. This pass is saved with VTF. The two situations are not identical.

The fundamental point is that applying the VT to the vertices requires no intermediate storage or bandwidth, as the resultant perturbed vertices go straight to the rasterizer. It's a streaming paradigm. R2VB is not a streaming paradigm. It commits a full result, no matter the size, to the framebuffer before the next stage.

Huh, "lookup the input vertices"? What's that supposed to mean?

If you are permuting one vertex stream, using a previously rendered displacement map, and writing the resultant stream, how does the input vertex stream get sent to the R2VB program? Unless you're storing it in constant registers, this vertex stream is stored in a texture. (I guess you could render "points", but is point rendering going to be as efficient on the GPU as rendering a quad?) Thus, an R2VB pass must sample two textures: the original vertex plus the displacement. With VTF, only one texture lookup is needed; the vertex input is handled by specialized vertex fetching logic.

That's the whole purpose of VTF to begin with, so that you can feedback results from the end of the pipe to the beginning. If you're updating stuff on the CPU there's absolutely no reason to use textures. Use a vertex buffer.

I disagree. Certain physics operations are more efficiently implemented on the CPU, but that doesn't necessarily mean that you can't offload the application of the result of those computations to the GPU. I might store a force field that is calculated by explosive physics on the CPU. I might then upload this vertex texture to the GPU and use it to affect 10,000 onscreen elements.


It doesn't get rid of vertex shading or setup, but you can certainly shift workload over from the vertex pipes to the fragment pipes. On a unified architecture this is not needed. That doesn't mean that R2VB isn't something you'll see in the future, since DX10 in many ways builds on similar concepts.

I never said it got rid of setup, but it could certainly be used to get rid of vertex shading. My point is, we have an abstraction: vertices. Developers should deal with high-level shading concepts that deal with that abstraction. Requiring them to write multipass logic to make their pixel pipeline perform operations on their vertices is, I think, a bad design. Regardless of how the hardware implements VS/VTF (it could use R2VB behind the scenes), I do not feel that R2VB is the right interface to present this functionality to developers. I have no problem with it existing for when developers (especially middleware vendors) need to fall back and take full control of the pipeline, especially with GPGPU stuff, but I think promoting this as the replacement for lookup capability in vertex shaders is ugly and, I would argue, not as efficient as it could be.
 
Excuse the interruption, but I wonder if anyone with a R520 could try to run PowerVR's Cloth simulation demo. It uses vertex texturing and dynamic branching.
 
Ailuros said:
Excuse the interruption, but I wonder if anyone with a R520 could try to run PowerVR's Cloth simulation demo. It uses vertex texturing and dynamic branching.

Fails with an error message (as expected):

"No compatible Vertex Texturing Texture Format Found !
Please run with -RefDevice command line option"
 
Dave Baumann said:
Demirug, my point is that I already know what you're saying (as I have published in the article), and we are kind of saying the same things; I just don't believe they went into it without checking with MS first, it would be too risky a manoeuvre - they are saying that MS are happy for them to expose R2VB this way, although they weren't under similar circumstances with GI. My understanding is that DCT checks the VT caps bit, but doesn't mandate any texture format, so it just won't run any VT tests if it's not there (but evidently passes even with 0 tests run!).

As long as we can't find anybody at Microsoft who can tell us more, we can both be right. It seems that we estimate the risk in different ways.

I have checked the current DCT and it looks like there is no vertex fetch test at all.
 
DemoCoder, I am on the same road as you.

If I look at what is planned for D3D 10, I see that a primary aim of the new API is to reduce the number of API calls needed to get something working. VT and R2VB can be used to do the same things, but VT will need fewer API calls. As a reduced number of API calls gives me more CPU time for other things, VT is simply the better way. We use GI only for the same reason, but in the case of VT vs R2VB the VT solution is easier to develop, too.
 
DemoCoder said:
You missed the point. Every reuse applied to another vertex stream generates another rendering pass. This pass is saved with VTF. The two situations are not identical.

The fundamental point is that applying the VT to the vertices requires no intermediate storage or bandwidth, as the resultant perturbed vertices go straight to the rasterizer. It's a streaming paradigm. R2VB is not a streaming paradigm. It commits a full result, no matter the size, to the framebuffer before the next stage.
You seem to have forgotten that you can mux sources from multiple vertex buffers into a single input stream to vertex processing.
You do not need to write full vertices when rendering to a VB. You can easily source the constant parts from one VB and the dynamic parts from another, and they'll arrive alike (as vertex attributes) in vertex processing.
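Here's a minimal HLSL sketch of the idea Xmas describes, assuming a hypothetical water grid where the static attributes come from one vertex buffer and the dynamic displacement from another (the rendered-to buffer in the R2VB case). All names are hypothetical; the stream-to-semantic mapping itself is set up on the API side via the vertex declaration, not in this shader:
Code:
// Sketch only: the vertex shader sees one merged input, regardless of how
// many vertex buffers feed it. Which buffer supplies which semantic is
// decided by the (API-side) vertex declaration.
struct VsInput
{
    float2 gridPos  : POSITION0;  // static grid position, sourced from stream 0
    float2 texCoord : TEXCOORD0;  // static texture coordinates, stream 0
    float  height   : TEXCOORD1;  // dynamic displacement, stream 1
                                  // (with R2VB: the rendered-to texture rebound
                                  // as a vertex buffer)
};

float4x4 worldViewProj;

float4 main(VsInput v) : POSITION
{
    // Both streams arrive as ordinary vertex attributes, so the shader is
    // oblivious to where the data was produced.
    float3 pos = float3(v.gridPos.x, v.height, v.gridPos.y);
    return mul(float4(pos, 1.0), worldViewProj);
}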
 
Humus, I'd definitely have to agree with DemoCoder here.

I don't think vertex texturing will be replaced by R2VB, and I agree with DC's thought that for the most part the opposite will happen. It just makes sense to stick with the current organizational model of the rendering pipeline. R2VB just unnecessarily complicates things - well, it will be unnecessary once latency-free VTF comes around.

That being said, I don't think VTF on G70/NV40 is of much benefit to end users, because it's not really fast enough, so developers working on near-term games will likely ignore it. Even Pacific Fighters seems to be using it more as displacement mapping with a dynamic noise texture than for real water simulation, which can be done with a vertex shader a la ATI's Nature demo. For developers, though, it's definitely good to be using VTF on G70/NV40/RSX/Xenos for future game engine work.

I personally think ATI should just do a bit of driver work and convert any VS with VTF into a pixel shader that renders to a texture and passes the result through the VS via R2VB. It will benefit them in the future because they have a leg up on NVidia with the unified shader architecture, which has two major benefits: load balancing and ultra-efficient VTF. You don't want to promote R2VB if it helps NVidia more than you in the future.
 
#1
To throw some much needed context in here.
NVIDIA documentation said:
Section 2.1.2

While 1D and 2D texture targets for vertex textures are supported, the 3D, cube
map, and rectangle texture targets are not hardware accelerated for vertex
textures.

Just these formats are accelerated for vertex textures: GL_RGBA_FLOAT32_ARB,
GL_RGB_FLOAT32_ARB, GL_ALPHA_FLOAT32_ARB, GL_LUMINANCE32_ARB,
GL_INTENSITY32_ARB, GL_FLOAT_RGBA32_NV, GL_FLOAT_RGB32_NV,
GL_FLOAT_RG32_NV, or GL_FLOAT_R32_NV.

<...>

Because vertices are processed as ideal points, vertex texture accesses require an explicit LOD to be computed; otherwise the base level is sampled.

<...>

NVIDIA’s GeForce 6 Series and NV4xGL-based Quadro FX GPUs do not hardware
accelerate linear filtering for vertex textures. If vertex texturing is otherwise hardware
accelerated, GL_LINEAR filtering operates as GL_NEAREST. The mipmap minification
filtering modes (GL_LINEAR_MIPMAP_LINEAR, GL_NEAREST_MIPMAP_LINEAR, or
GL_LINEAR_MIPMAP_NEAREST) operate as if GL_NEAREST_MIPMAP_NEAREST so as not to
involve any linear filtering. Anisotropic filtering is also ignored for hardware-accelerated vertex textures.
Source (PDF-Download)

To summarize: the hardware supports only FP32 texture formats for vertex textures. It does not support FP16 textures. It does not support any "traditional" integer texture formats. You may only use 2D and 1D textures. No cube maps, no 3D maps.

There's no filtering ... but you can have point-sampled mipmapping. For that you need to compute an explicit LOD to sample beyond mip level 0, as there's no concept of texture minification and magnification for vertices.
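For reference, here's a minimal SM3.0 HLSL sketch of what such a vertex texture fetch looks like; the displacement-map setup and all names are hypothetical, and only point sampling is assumed given the restrictions quoted above:
Code:
// Minimal vertex-texturing sketch (vs_3_0 HLSL). The sampler must be bound to
// one of the supported FP32 formats; filtering degrades to nearest on NV40/G70.
sampler2D displacementMap : register(s0);  // vertex texture sampler
float4x4  worldViewProj;

float4 main(float4 pos : POSITION, float2 uv : TEXCOORD0) : POSITION
{
    // tex2Dlod takes a float4; the w component selects the mip level
    // explicitly, since there is no automatic LOD for vertices (0 = base level).
    float height = tex2Dlod(displacementMap, float4(uv, 0.0, 0.0)).r;
    pos.y += height;
    return mul(pos, worldViewProj);
}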

This, folks, is the feature set of the "legit" vertex texturing implementation on NV40 and G70. Good luck finding a reasonable use case for this. And much better luck finding a reasonable use case that is also not more elegantly handled with render-to-vertex-buffer (not using R2VB as I don't wish to refer to ATI's whatever-it-is specifically).

#2
When OpenGL guy is so confident that Microsoft is okay with the exploited VT backdoor plus the R2VB FourCC thing, I have no reason not to believe him. He should know. And if that's not enough, apparently Wavey Dave was told the same thing pre-NDA expiration.
"<...> however ATI maintain that Microsoft are happy for them to expose this in such a manner."
From here.

I mean, why not? Think about it. Look beyond the PC space and you'll quickly remember a recent big business deal between Microsoft and ATI. That should lend enough political power to just pull this off.
As the whole thing is rather atypical of Microsoft, there's a need for an explanation and this fits nicely IMO.

#3
It is a bad thing that Microsoft has any say at all about what graphics IHVs should implement and how. It is not beneficial to the industry and never has been; it just creates an awful lot of confusion, and everybody's perfectly fine time and energy gets absorbed, over and over again.
If you always thought the ARB was bad because nothing ever gets officially done, and everything that's not officially done gets done in fragmented and incompatible ways, yay, then what's this, I wonder.
 
I'd actually prefer it if ATI offered some kind of driver workaround whereby, if you request a VS with VT, it does R2VB behind the scenes. This would preserve the developer abstraction and make NVidia's implementation still look bad (unless NVidia did it as well). I prefer the OGL model where, if you don't support something directly in HW, you try to emulate it with a workaround. In the case of VS+VT->R2VB, the emulation should be straightforward and performant.

They could still keep R2VB for people who specifically want to use it.
 
It seems to me that a unified shader architecture should have come before vertex texturing. When you look at it, the traditional split between vertices and pixels, given a "common language" and a requirement to access textures, just looks nuts. Dark ages.

Also, regarding DemoCoder's points about R2VB "breaking" the streaming paradigm - frame rendering itself consists of multiple passes these days, so the streaming paradigm has already had its chastity belt unlocked.

Jawed
 
DemoCoder said:
I'd actually prefer it if ATI offered some kind of driver workaround whereby, if you request a VS with VT, it does R2VB behind the scenes. This would preserve the developer abstraction and make NVidia's implementation still look bad (unless NVidia did it as well). I prefer the OGL model where, if you don't support something directly in HW, you try to emulate it with a workaround. In the case of VS+VT->R2VB, the emulation should be straightforward and performant.

They could still keep R2VB for people who specifically want to use it.
Actually, isn't it rather peculiar that ATI didn't "hide" this "fault" in SM3 support by implementing it in the driver?

ATI has known that it would be like this for years, so it's not as if they haven't had the time.

On the other hand the noises they're making imply that if there's demand, they'll make the driver emulate it.

In the end it would seem that ATI has de-prioritised this stuff because developer demand is so low. Though you could argue that developer demand for dynamic branching is low, too. Being kind you could say that's a case of ATI being pragmatic about what developers really want.

Jawed
 
DemoCoder said:
It is not ok to implement a subset of a spec, and then claim full support.

Well, equally it's not OK then to claim full support for a spec, but to implement elements of that spec that are so slow or buggy, or both, that devs won't be supporting those elements in their software. That's what I'm talking about when I say "empty marketing bullets." I'd rather an IHV leave something out of a spec than implement it just to sell chips while knowing full well that their current implementation is not particularly useful.
 
DemoCoder said:
I'd actually prefer it if ATI offered some kind of driver workaround whereby, if you request a VS with VT, it does R2VB behind the scenes. This would preserve the developer abstraction and make NVidia's implementation still look bad (unless NVidia did it as well). I prefer the OGL model where, if you don't support something directly in HW, you try to emulate it with a workaround. In the case of VS+VT->R2VB, the emulation should be straightforward and performant.

They could still keep R2VB for people who specifically want to use it.

This seems like the obvious solution, and given they seem to have told Wavey this is an option, I can imagine they might have even assured MS of the same thing --"Hey, if the whining starts from devs, we'll do it. Pinky swear." And if the current tempest-in-a-teacup results in this sooner rather than later, then all to the good. Sometimes tempests can have useful results. Which is a relief, given how often we engage in them. :LOL:
 
That discussion is really very sad...
How many can claim that their position would be the same if the same thing had been done by the other IHV? :p
 
I'm seeing a lot of people label the forums here as pro-ATI and anti-nV, and yet I'm not seeing it to the degree that some are (NV4x was roundly praised here, AFAIK, as was G70). Or are you saying ATI appears to be lazy or disinclined to take the initiative, whereas nV typically charges ahead (this would seem harder to support, as innovation seems to be somewhat cyclical)?

In this specific case, it would appear that ATI has a potential workaround for their lack of "native" VT support, so it would seem they can support it if given enough encouragement/lip. Where's the problem, considering the consensus is that nV's implementation is too slow to be really useful?

I agree, though, that considering how long R520 has been delayed, you'd think they could've taken care of this already so as to avoid another excuse (no matter how friggin' awesome parallax occlusion mapping looks :)).
 
How many can claim that their position would be the same if the same thing had been done by the other IHV? :p
I recall a few threads on MET vs MRT from a few years back.
 
DemoCoder said:
You missed the point. Every reuse applied to another vertex stream generates another rendering pass. This pass is saved with VTF. The two situations are not identical.

The fundamental point is that applying the VT to the vertices requires no intermediate storage or bandwidth, as the resultant perturbed vertices go straight to the rasterizer. It's a streaming paradigm. R2VB is not a streaming paradigm. It commits a full result, no matter the size, to the framebuffer before the next stage.

I think your whole argument against R2VB comes from a misunderstanding of what it does and doesn't do. No, you don't have to commit a full result. As I said, it's in no way different from VTF in that respect. To take an example, I implemented a water sample at work that does all the physics on the GPU using R2VB, very similar to what my old Water demo did, except now it's using real geometry instead of being a texture-space effect.

VTF method:
Code:
 - Render physics to texture.
 - Render visual results. Vertex shader displaces water surface.
   * Stream0: Vertex array containing position
   * VT Sampler0: Texture from first pass.

R2VB method:
Code:
 - Render physics to texture.
 - Render visual results. Vertex shader displaces water surface.
   * Stream0: Vertex array containing position
   * Stream1: Texture from first pass reinterpreted as vertex buffer.
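To make the comparison concrete, here's a hedged HLSL sketch of what the second pass's vertex shader could look like in each case; the names are hypothetical, but the point is that the pixel shader pass producing the physics texture is identical, and only the way the displacement reaches the vertex shader differs:
Code:
// Hypothetical second-pass vertex shaders for the water example above.
float4x4 worldViewProj;

// VTF: the displacement is fetched from the physics texture with an explicit LOD.
sampler2D physicsTex : register(s0);
float4 mainVTF(float4 pos : POSITION, float2 uv : TEXCOORD0) : POSITION
{
    pos.y += tex2Dlod(physicsTex, float4(uv, 0.0, 0.0)).r;
    return mul(pos, worldViewProj);
}

// R2VB: the same memory arrives as a second vertex stream, so the shader
// reads it as an ordinary vertex attribute instead of sampling it.
float4 mainR2VB(float4 pos : POSITION, float height : TEXCOORD0) : POSITION
{
    pos.y += height;
    return mul(pos, worldViewProj);
}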

DemoCoder said:
If you are permuting one vertex stream, using a previously rendered displacement map, and writing the resultant stream, how does the input vertex stream get sent to the R2VB program? Unless you're storing it in constant registers, this vertex stream is stored in a texture. (I guess you could render "points", but is point rendering going to be as efficient on the GPU as rendering a quad?) Thus, an R2VB pass must sample two textures: the original vertex plus the displacement. With VTF, only one texture lookup is needed; the vertex input is handled by specialized vertex fetching logic.

And the reasoning here seems to follow the same misunderstanding as earlier (no idea how rendering points got into the picture, though). The VTF method will access one vertex stream and one texture; R2VB will access two vertex streams, where one stream was previously known as a texture, that is, the exact same memory VTF would access.

DemoCoder said:
I disagree. Certain physics operations are more efficiently implemented on the CPU, but that doesn't necessarily mean that you can't offload the application of the result of those computations to the GPU. I might store a force field that is calculated by explosive physics on the CPU. I might then upload this vertex texture to the GPU and use it to affect 10,000 onscreen elements.

Or you can upload the very same memory to a vertex buffer and apply it more efficiently, that's what I'm saying. It's not faster to upload it to a texture than to a vertex buffer. If anything, it's potentially slower since the driver may want to swizzle it, whereas vertex buffers are always linear memory buffers.

DemoCoder said:
I never said it got rid of setup, but it could certainly be used to get rid of vertex shading. My point is, we have an abstraction: vertices. Developers should deal with high-level shading concepts that deal with that abstraction. Requiring them to write multipass logic to make their pixel pipeline perform operations on their vertices is, I think, a bad design.

Again, there's no difference in number of passes needed.

DemoCoder said:
Regardless of how the hardware implements VS/VTF (it could use R2VB behind the scenes), I do not feel that R2VB is the right interface to present this functionality to developers. I have no problem with it existing for when developers (especially middleware vendors) need to fall back and take full control of the pipeline, especially with GPGPU stuff, but I think promoting this as the replacement for lookup capability in vertex shaders is ugly and, I would argue, not as efficient as it could be.

No one is saying R2VB is the replacement for VTF. They're two different features, but with a certain overlap. There's nothing wrong with the VTF feature as such. It's great, but the question is whether VTF can be efficiently implemented today. We don't have a unified architecture, so to make this feature useful we'd have to spend an assload of chip space on it. If you can cover the vast majority of usage cases, and be much faster at it, using only a driver update, then that's IMO a better solution for now. Unless you think the whole idea of StreamOut in DX10 is a bad idea, I don't see a reason to think R2VB is a bad interface.
 