Dio said:
At least if they were called hexas, octs, dodecs, hexadecs or icosas it would be less confusing with a rarely used primitive type.

Good idea, except that "octopipe" might be confused with a James Bond movie.

Ailuros said:
I can see an NV3x being merely twice as fast as a K2 in Fablemark, yet then again the latter already has as many Z/stencil units as NV40 should have.

Keep in mind that Fablemark is also not optimized for an NV3x-style architecture. It does ambient lighting at the same time as the first z-pass, which destroys the NV3x's ability to accelerate that first z-pass. This is not the case with DOOM3, which will be one of the few game engines to use stencil shadow volumes.

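For readers who want the pass structure spelled out, here is a rough C++/OpenGL-flavoured sketch of the kind of frame being described: depth laid down first with colour writes off, then per-light stencil shadow volumes and additive lighting. This is an illustrative outline only, not DOOM3's actual renderer; the helper functions are hypothetical and the stencil increment/decrement setup is omitted for brevity.

```cpp
// Illustrative sketch only -- not DOOM3's actual code. drawOpaqueGeometry,
// drawShadowVolume and drawLitGeometry are hypothetical helpers.

// Pass 1: z-fill with no colour writes. Hardware with a z-only fast path
// can run this pass at an accelerated rate.
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);
drawOpaqueGeometry();

for (int i = 0; i < numLights; ++i) {
    // Pass 2: mark shadowed pixels in the stencil buffer using the light's volume.
    glClear(GL_STENCIL_BUFFER_BIT);
    glEnable(GL_STENCIL_TEST);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);
    glStencilFunc(GL_ALWAYS, 0, ~0u);
    drawShadowVolume(i);              // increments/decrements stencil per fragment

    // Pass 3: additive lighting only where stencil == 0 (not in shadow), and only
    // on the frontmost surface (depth func EQUAL against the z-fill pass).
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthFunc(GL_EQUAL);
    glStencilFunc(GL_EQUAL, 0, ~0u);
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);
    drawLitGeometry(i);
    glDisable(GL_BLEND);
    glDepthFunc(GL_LESS);
}
glDisable(GL_STENCIL_TEST);
```
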
Quote:
There's a chance that IHVs will in the future leave AA to the ISVs, which I don't disagree with. MSAA similar in efficiency up to how many samples exactly? 4x, maybe 8x in a very generous case?

Over 100MB of buffer consumption alone for 6x MSAA.

Huh? I really don't understand what you're trying to say in that first paragraph. Anyway, as for the buffer consumption, high triangle densities can require similar amounts of buffer space for a TBDR's scene buffer, and I would still rather have fixed resource requirements than variable ones.

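As a sanity check on that 100MB figure, here is the arithmetic as a small runnable C++ snippet. The resolution (1600x1200) and the formats (32-bit colour plus 32-bit depth/stencil per sample, no framebuffer compression) are assumptions on my part; the post itself doesn't state them.

```cpp
// Rough multisample memory estimate. Resolution and formats are assumptions;
// real hardware may compress or lay the buffers out differently.
#include <cstdio>

int main() {
    const long long w = 1600, h = 1200;
    const long long samples = 6;                       // 6x MSAA
    const long long colorBytes = 4, zStencilBytes = 4; // per sample

    long long msaaBuffers = w * h * samples * (colorBytes + zStencilBytes);
    long long resolveAndFront = w * h * colorBytes * 2;  // resolved back + front buffer

    std::printf("multisampled colour + Z/stencil: %lld MB\n", msaaBuffers >> 20);
    std::printf("with resolve and front buffer:   %lld MB\n",
                (msaaBuffers + resolveAndFront) >> 20);
    return 0;
}
```

Under those assumptions the total does land just over 100MB; at 1024x768 the same calculation gives well under half that.
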
Quote:
Presupposition: there is a z-only pass.

That's up to software developers. As shaders get longer, I expect initial z-passes to become common (see the sketch below this exchange). This will have a twofold benefit:
- IMR's also don't need to overwrite values. See above.

Quote:
No they never actually do. Who are you kidding anyway?

Well, I suppose IMR's will need to overwrite values for blending, but they won't need to in most circumstances. So yes, I suppose blending is one inherent benefit to TBDR's that just can't be solved in an IMR, but I'm not sure it's enough...

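To make the z-only pre-pass idea concrete, here is a minimal C++/OpenGL-flavoured sketch (an assumed structure, not taken from any particular engine): lay down depth with colour writes off, then run the expensive shading pass with the depth test set to EQUAL, so each visible pixel is shaded and its colour written exactly once.

```cpp
// Minimal z-prepass sketch. drawScene() is a hypothetical helper.

// Pass 1: depth only. No colour is written, so nothing written here is ever
// overwritten later; this is also the pass that z-only fast paths can accelerate.
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);
drawScene();            // submit geometry with a trivial shader

// Pass 2: full shading. GL_EQUAL means only the frontmost fragment per pixel
// passes, so the long pixel shader runs once per visible pixel and the colour
// buffer is written once, with no overdraw.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_FALSE);
glDepthFunc(GL_EQUAL);
drawScene();            // submit geometry with the expensive shader
```
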
Quote:
Why is it necessary on the other hand to unify grids in future APIs? Maybe, just maybe, because there has to be a more efficient way to handle very long shaders coupled with very short shaders at the same time?

I get the feeling that you were really tired (or something) when you made this post. Regardless, the benefit of total decoupling would show up when rendering is vertex-limited during part of the frame and pixel-limited during another part; without decoupling, you end up wasting both pixel and vertex power.

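To put a number on that imbalance argument, here is a toy C++ model. The workloads are invented purely for illustration: half the frame is vertex-heavy, half is pixel-heavy.

```cpp
// Toy model: with coupled vertex and pixel units, each half of the frame runs at
// the speed of its slower stage; with full decoupling/buffering, work from the two
// halves can overlap and only the total per-stage work matters. Numbers invented.
#include <algorithm>
#include <cstdio>

int main() {
    double vertexWork[2] = {8.0, 2.0};   // first half of the frame is vertex-heavy
    double pixelWork[2]  = {2.0, 8.0};   // second half is pixel-heavy

    double coupled = 0.0;
    for (int i = 0; i < 2; ++i)
        coupled += std::max(vertexWork[i], pixelWork[i]);   // slower stage gates each half

    double decoupled = std::max(vertexWork[0] + vertexWork[1],
                                pixelWork[0] + pixelWork[1]); // stages only gate the frame

    std::printf("coupled: %.0f, decoupled: %.0f\n", coupled, decoupled);  // 16 vs 10
    return 0;
}
```
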
Quote:
Sad part is that without any real hardware it is and will remain a moot point. It isn't that IMRs are supposedly near to "perfection", either; it's just one particular brand.

Which I never said. They've just either solved or are on the road to solving most of the problems that TBDR solves, so I don't see much reason to go all the way to TBDR and add on other potential problems.

Chalnoth said:
Keep in mind that Fablemark is also not optimized for an NV3x-style architecture. It does ambient lighting at the same time as the first z-pass, which destroys the NV3x's ability to accelerate that first z-pass.

I think you need to think a little more clearly about the Fablemark case. If it did a Z-only pass first instead, it would still have to do a colour pass later to do the ambient lighting. Therefore, the Z-only pass is purely an additional consumer of time, assuming that their geometry is reasonably well front-to-back sorted. So this case is not inefficient - it is more efficient without the Z-only pass on all hardware.

Dio said:
Z-only passes are quite inefficient unless you are doing stencil shadows with a global lighting algorithm, as Doom3 is. Or unless there's no front-back sorting, but that's just sheer laziness and inefficiency.

Out of curiosity - what if you managed to efficiently batch TONS of DIP calls which would otherwise have to be separate? This could become more frequent if branching became more economical. Then you couldn't do good front-back sorting, and if your pixel shaders were long, an early Z pass would be more, let us say, profitable.

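For what "batching TONS of DIP calls" might look like in practice, here is a C++ fragment for illustration only. The `device`, `objects`, and the Object fields are hypothetical placeholders; the two Direct3D 9 calls used (DrawIndexedPrimitive and SetVertexShaderConstantF) are real, but this is a sketch of the idea, not anyone's actual code.

```cpp
// Unbatched: one DIP per object. Lots of per-call CPU overhead, but each object
// can be sorted front-to-back individually.
for (size_t i = 0; i < objects.size(); ++i) {
    const Object& obj = objects[i];
    device->SetVertexShaderConstantF(0, obj.constants, obj.constantRegCount);
    device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, obj.baseVertex, 0,
                                 obj.vertexCount, obj.startIndex, obj.triCount);
}

// Batched: everything shares one vertex/index buffer, and a per-vertex object index
// selects constants (or a branch selects the material) inside the shader. One DIP,
// far less CPU overhead -- but the submission order inside the batch is fixed, so
// fine-grained front-to-back sorting is lost.
device->SetVertexShaderConstantF(0, allObjectConstants, totalConstantRegs);
device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0,
                             totalVertexCount, 0, totalTriangleCount);
```
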
Dio said:
I think you need to think a little more clearly about the Fablemark case. If it did a Z-only pass first instead, it would still have to do a colour pass later to do the ambient lighting. Therefore, the Z-only pass is purely an additional consumer of time, assuming that their geometry is reasonably well front-to-back sorted. So this case is not inefficient - it is more efficient without the Z-only pass on all hardware.

That's only because the program uses no shaders. With shaders you can do ambient lighting along with other light passes. This would be particularly true with a rendering technique that uses MRT's to do all lighting by drawing a single screenspace quad.

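The MRT idea mentioned there is essentially deferred shading: write surface attributes out to multiple render targets during the geometry pass, then compute all lighting, ambient included, by drawing one screenspace quad that reads them back. A toy C++ model of what that single full-screen pass evaluates per pixel; the G-buffer layout and the simple Lambert lighting are assumptions for illustration.

```cpp
// Toy CPU model of "one screenspace quad does all the lighting from MRTs".
struct GBufferTexel { float albedo[3]; float normal[3]; };
struct Light        { float dir[3];    float color[3];  };

// Per-pixel work of the full-screen lighting pass: ambient plus every light,
// reading only the already-written G-buffer. Ambient never needs its own
// geometry pass.
void shadePixel(const GBufferTexel& g, const Light* lights, int numLights,
                const float ambient[3], float out[3]) {
    for (int c = 0; c < 3; ++c)
        out[c] = ambient[c] * g.albedo[c];
    for (int i = 0; i < numLights; ++i) {
        float ndotl = 0.0f;
        for (int c = 0; c < 3; ++c)
            ndotl += g.normal[c] * -lights[i].dir[c];
        if (ndotl < 0.0f) ndotl = 0.0f;
        for (int c = 0; c < 3; ++c)
            out[c] += ndotl * lights[i].color[c] * g.albedo[c];
    }
}
```
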
Dio said:
Yes, and you can fill in your Z buffer at the same time, and if you're reasonably front to back sorted it will STILL be significantly faster on all hardware. That's even before you start using a complex vertex shader, where Z-passes get a lot slower and/or more complicated.

Except any directional lighting will be done after shadow calculations. Why not bake ambient lighting into one of these passes?

Chalnoth said:
Except any directional lighting will be done after shadow calculations. Why not bake ambient lighting into one of these passes?

Sigh... if you've got a lighting algorithm that absolutely requires full Z information and you can't do anything else beforehand, then yes, a Z-only pass might make sense, in that you have to form the entire Z buffer before you can get to the screen buffer anyway.

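For clarity on the "bake ambient into one of these passes" point, a small C++ sketch of the two alternatives. The names (shadowMask, ndotl, and so on) and the simple lighting model are placeholders of mine, not anyone's actual shader.

```cpp
// Separate ambient pass: an extra pass over the scene just to write ambient.
//   pass 0: colour  = ambient * albedo
//   pass 1: colour += shadowMask * ndotl * lightColour * albedo   (per light)

// Ambient baked into the first shadowed directional pass: the extra pass goes away,
// since the first lighting pass writes ambient plus its own contribution.
float firstLightPassColour(float albedo, float ambient,
                           float shadowMask, float ndotl, float lightColour) {
    return albedo * (ambient + shadowMask * ndotl * lightColour);
}
```
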