far cry ps3 and stuff

HDR is not simply about accumulating color values until they add up to more than 1.0f. If that's all you do, you'll just get a washed-out scene. HDR also includes "tone mapping", which maps the over-bright scene back into the displayable range. You will need shaders to do this.

Apart from tone mapping, there's also the bloom effect, which simulates the blooming and color bleeding of extremely bright areas of the scene. You'll again need shaders to do this efficiently. So at some point you WILL NEED shaders to "do" HDR. FP blending is NOT a replacement "technique" for shaders; it only simplifies things a little.
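
Just to make "tone mapping" concrete, here's a minimal sketch of a Reinhard-style tone mapping pixel shader in HLSL. The floating-point scene texture, the exposure constant and the shader name are illustrative assumptions, not anything tied to Far Cry or a particular demo:

[code]
// Minimal Reinhard-style tone mapping sketch (ps_2_0-level HLSL).
// Assumes the HDR scene was rendered to a floating-point render target,
// bound here as sceneTex; "exposure" is a hypothetical app-set constant.
sampler2D sceneTex : register(s0);
float exposure;

float4 ToneMapPS(float2 texCoord : TEXCOORD0) : COLOR
{
    float3 hdr = tex2D(sceneTex, texCoord).rgb * exposure;

    // Reinhard operator: maps [0, inf) into [0, 1) instead of clamping,
    // so over-bright areas keep detail rather than washing out.
    float3 ldr = hdr / (1.0 + hdr);

    return float4(ldr, 1.0);
}
[/code]

Bloom would then be a separate bright-pass plus blur added back on top of this, which is why both effects want shader support.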
 
Evildeus said:
I still miss the contradiction. It solves it in many situations but not all. What happens in the other situations? What if devs use branching in the other situations?

I'm saying that both solve the problem, but if you're using one there's no point in using the other. If you've already culled unlit fragments with the stencil test, there's no longer any need to do an early out from a ps3.0 shader.

This technique is really quite general. It can be used in any situation where you're optimizing with an if-statement in ps3.0. Just render the "if" in a separate shader in the first pass, render the "then" in the second pass, and the "else" in a third pass if you need it. Since the hardware does early culling of fragments based on the stencil test, the cost of this is hardly any higher than if it had been a single pass.
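
Here's a rough sketch of what that pass layout could look like, assuming D3D9-style stencil states and a clip()-based "if" pass; the condition itself is just a placeholder, not the shader from any actual demo:

[code]
// --- Pass 1: run only the "if" over the whole scene ----------------------
// Draw with colour writes disabled and stencil set to
//   D3DRS_STENCILFUNC = D3DCMP_ALWAYS, D3DRS_STENCILPASS = D3DSTENCILOP_REPLACE,
//   D3DRS_STENCILREF = 1,
// so every fragment that survives the clip() tags its pixel with stencil = 1.

sampler2D inputTex : register(s0);   // placeholder input for the test
float threshold;                     // placeholder constant

float4 IfPassPS(float2 uv : TEXCOORD0) : COLOR
{
    // clip() discards the fragment when its argument is negative,
    // i.e. when the "if" condition is false.
    clip(tex2D(inputTex, uv).r - threshold);
    return 0;
}

// --- Pass 2: the "then" branch --------------------------------------------
// Render the real shader with D3DRS_STENCILFUNC = D3DCMP_EQUAL, ref = 1 and
// stencil writes off; early stencil culling rejects the "false" pixels
// before the pixel shader runs.

// --- Pass 3 (only if needed): the "else" branch ----------------------------
// Same setup, but with D3DRS_STENCILFUNC = D3DCMP_NOTEQUAL.
[/code]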

The situations where this can't be applied are atypical scenarios; I have a hard time coming up with one. I was going to say Mandelbrot rendering, where you drop out as soon as length(z) > 2, but now that I think of it you can apply this technique even in that situation. It will be a bit trickier, with rendering back and forth between render targets, and you'll probably have to use a granularity of several loop iterations per pass to see any speed-up, but it should be doable. In that case ps3.0 may win, but in the common situations, such as speeding up lighting, I'm confident this technique will come pretty darn close to ps3.0 or even beat it, depending on how costly dynamic branches are on nVidia's hardware.
 
Chalnoth said:
It seems to me that this is one example where dynamic branching could dramatically reduce the number of passes. It seems like you could do the above algorithm in one pass with PS 3.0 (well, two if you include the initial z-pass... but I suppose it really does depend upon what you were using to eliminate lights... what were you thinking about, specifically?). You may also need to clear the stencil buffer without clearing the z-buffer with the above algorithm.

Don't stare yourself blind at the number of passes; that's the whole point of this technique: the overhead of multipass is near zero. Even if ps3.0 can do it in a single pass, it's not doing much less actual work. (Well, two passes actually, because even with ps3.0 you'd still want to do that depth-only pass.)

Imagine a scene with a light. The light has a limited radius, so only half the scene is lit by it. Any part of the scene that's beyond, for instance, 100 units is completely in the dark, and thus need not be shaded.

So in ps3.0 you simply check if length(lightVec) < 100; if so you go through the usual lighting code, otherwise you immediately return zero. This means the total workload is the if-statement for the whole scene plus lighting for half the scene.

With my technique, you first render the same if-statement for the whole scene. Then, in the next pass, you draw the lighting where stencil = 1. The total workload is the if-statement for the whole scene plus lighting for half the scene, which is the same as in the ps3.0 case.
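
For concreteness, here's roughly how the two variants could look in HLSL. The 100-unit radius is the one from above; the constant names, the Lighting() helper (a stand-in N.L diffuse term) and the exact shader setup are just illustrative assumptions:

[code]
float3 lightPos;   // hypothetical per-light constant

// Stand-in for the real lighting code (plain N.L diffuse as a placeholder).
float4 Lighting(float3 normal, float3 lightVec)
{
    return saturate(dot(normal, normalize(lightVec)));
}

// --- ps3.0 version: one pass, dynamic branch per fragment -----------------
float4 LightPS30(float3 worldPos : TEXCOORD0,
                 float3 normal   : TEXCOORD1) : COLOR
{
    float3 lightVec = lightPos - worldPos;
    if (length(lightVec) < 100.0)
        return Lighting(normal, lightVec);   // the expensive part
    else
        return 0;                            // outside the light's radius
}

// --- Stencil version, pass 1: the same test, run for the whole scene ------
// Colour writes off; stencil set to ALWAYS / REPLACE / ref = 1.
float4 LightRangePS(float3 worldPos : TEXCOORD0) : COLOR
{
    clip(100.0 - length(lightPos - worldPos));   // keep only pixels inside the radius
    return 0;
}

// --- Stencil version, pass 2 -----------------------------------------------
// The full lighting shader, with no branch at all, drawn with
// D3DRS_STENCILFUNC = D3DCMP_EQUAL, ref = 1, so early stencil culling
// rejects everything outside the radius before the shader ever runs.
[/code]

Either way the distance test runs for every pixel and the lighting only for the lit half; the difference is only whether the rejection happens in the branch or in the stencil hardware.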

So it boils down to the question of what's more costly: cycles spent on dynamic branching, or cycles spent on early culling with stencil. And I'm not so sure dynamic branching will turn out to be the winner of that battle, because stencil culling is really fast. I'm not sure whether our hardware does something similar to Hierarchical-Z with stencil too and culls full tiles (maybe someone who knows can fill in), but my guess is that it does, because the cost really is very low.
 
I've heard someone say that your technique isn't really new. I think nVIDIA can switch back and forth between your technique and dynamic branching, depending on which is more suitable for the given situation.

I still think there are a lot of advantages to dynamic branching (other than this situation) yet to be discovered. We haven't used anywhere near half the effects PS2.0 is capable of. I'm sure there are a lot of useful effects that can be dramatically sped up by dynamic branching.
 
DSN2K said:
I knew waiting to buy this would work out better.

Im going play Far Cry in its fall glory.... 8)
If you had a Radeon 9500 or better, you could have been enjoying Far Cry in its "fall glory" (sic) all along. ;)

-FUDie
 
pat777 said:
I've heard someone say that your technique isn't really new. I think nVIDIA can switch back and forth between your technique and dynamic branching, depending on which is more suitable for the given situation.

I still think there are a lot of advantages to dynamic branching (other than this situation) yet to be discovered. We haven't used anywhere near half the effects PS2.0 is capable of. I'm sure there are a lot of useful effects that can be dramatically sped up by dynamic branching.

Well, I'm sure someone has thought of it before. It's pretty simple, so it wouldn't surprise me. If nothing else, I have to credit my colleague Guennadi, who initially brought up the idea.

Sure, this technique would work on nVidia cards too (assuming they do top-of-the-pipe stencil culling), but there's no way the driver can just decide to turn a dynamic branching shader into this technique.
 
Humus, wouldn't your technique place more of a burden on the vertex shader and memory bandwidth? I also remember that R3xx/R4xx's Hi-Z can only be efficient when both the depth and stencil buffers are cleared. And here's another question: what if you want to simulate nested branches? Does the technique still apply?

edit: just realized the Hi-Z wouldn't be a problem, since you have a depth-fill pass. ;)
 
FYI:
In addition to the improvements mentioned in the updated changelog, the 1.2 patch will enable much-anticipated Shader Model 3.0 support for NVIDIA's new GeForce 6-series cards. Leveraging Microsoft's DirectX 9.0c, SM 3.0 support will enable gamers with supported hardware to achieve unprecedented levels of realism in their Far Cry experience.
http://www.farcry.ubi.com/
 
trinibwoy said:
I seriously doubt a 9500 can run Far Cry at its best.
My point was that you don't have to wait for a patch to access all the features of the engine. NVIDIA cards are running with PS 1.1 in place of many (all?) PS 2.0 effects.

-FUDie
 
Humus said:
Well, I'm sure someone has thought of it before. It's pretty simple, so it wouldn't surprise me. If nothing else, I have to credit my colleague Guennadi, who initially brought up the idea.

Sure, this technique would work on nVidia cards too (assuming they do top-of-the-pipe stencil culling), but there's no way the driver can just decide to turn a dynamic branching shader into this technique.


This technique isn't new. See:

Peercy, M. S., Olano, M., Airey, J., and Ungar, P. J. 2000. Interactive Multi-Pass Programmable Shading. In Proceedings of ACM SIGGRAPH 2000.

It is also mentioned in:

Purcell, T. J., Buck, I., Mark, W. R., and Hanrahan, P. 2002. Ray Tracing on Programmable Graphics Hardware. In Proceedings of ACM SIGGRAPH 2002.

and in dozens of other papers...


It also isn't a replacement for flow control in pixel shaders. Simulating flow control using early tests (like stencil or Z-buffer) requires multi-pass rendering, which means you have to submit all the geometry multiple times. If your vertex shaders are complex or you have a lot of geometry, it isn't a solution.

Also, simulating nested loops, nested ifs and jumps isn't an easy problem, and will certainly break your shader into 10 or 15 passes.
 
Sorry, but I fail to see what's new in Humus's technique. Developers do that kind of 'trick' all the time. It's a pretty basic and well-known technique :oops:
 
Humus, if possible I'd like to take a look at your demo's source code; maybe I can convert it to SM3.0, and then we can see which method is better. :LOL:
 
FUDie said:
My point was that you don't have to wait for a patch to access all the features of the engine. NVIDIA cards are running with PS 1.1 in place of many (all?) PS 2.0 effects.
All? :D Stop that, it's not even funny anymore. Everybody in the world knows how you love ATI by now...
 