Anyone else wondering if 6800 Ultra is really the 6800?

John Reynolds · Apr 15, 2004

Johnny Rotten said:
There are some fairly high profile titles though having or promising SM3.0 support though (Far Cry, STALKER, HL2) and apparently the difference in Far Cry (based on a 3rd person impression mind you) at least is significant.

That screenshot DC grabbed of Far Cry shows a very noticeable difference compared how the game looks on my system.

DemoCoder · Apr 15, 2004

Im uploading the "blurred" 6megapixel shot to Bjorn's site, plus the original 6megapixel of the one you saw.

The blurred version shows FarCry's beach, but there are stones all over it, and the stones are *displacement mapped* terrain using vertex textures. Also, HL2 has displacement mapping in it, but it is done by the CPU applying/transforming the vertex texture before it hits the GPU. This could probably be accelerated by the NV40 and would be a relatively easy patch.

Also in the FC Beach shot, there are dynamic soft shadows everywhere, whereas in regular FC, they are "baked in" textures. It's really hard to se anything in the blurred shot. Don't yell at me if you download it and it sucks. I wasn't prepared for the FC demo when it came up and had to adjust my camera really quick.

Pete · Apr 15, 2004

A hint at PS3.0 branching penalties?

Or merely a glimpse at beta drivers? I thought the following was relevant to the PS2/3 squabble, though I may be off base.

Google's & my translation of [url=http://www.hardware.fr/articles/491/page6.html said:
Hardware.fr's 6800U preview[/url]]We thus had resolved to test PS3.0 via some small shaders written in assembly by our care (since the HLSL compiler doesn't yet manage PS3.0). We however didn't have time to test this point in-depth. We concentrated on the cost of the conditionals while avoiding ransacking the performances in advance. For example we carried out a conditional carried out on a value which depends on the triangle. The 4 pipelines of each quad engine can thus function at the same time. We have, of course, avoided any texture use. This test resembled this:

If xxxx
Red screen
Else
Green screen
Endif

It's the simplest conditional we could realize. It took not less than 9 passes in the pipelines, which is enormous. We were expecting 2 passes in hoping that nV envisaged its architecture so that that could be done in only one. This result thus disappointed us, but it is possible that it is ascribable merely to immature drivers: time will tell!

KimB · Apr 15, 2004

DemoCoder said:
The untold story of the NV40 is the video processor. If you look at the diagrams, this thing is effectively a Scalar+SIMD semi-general purpose CPU/DSP on-core running at a different clock rate, with it's own memory controller and read/write memory heap. I think this thing can do alot more than video encoding/decoding and is an untapped potential. Think "video processor -> render to vertex buffer" and physics simulation, or audio simulation.

I haven't yet paid that close attention to the "video processor." Are we sure it's not just the pixel and/or vertex pipelines operating in a different mode?

DemoCoder · Apr 15, 2004

Its separate, atleast according to some people I talked to at the event.

KimB · Apr 15, 2004

Re: A hint at PS3.0 branching penalties?

Pete said:
Or merely a glimpse at beta drivers? I thought the following was relevant to the PS2/3 squabble, though I may be off base.

Google's & my translation of [url=http://www.hardware.fr/articles/491/page6.html said:

Hardware.fr's 6800U preview[/url]]...
It's the simplest conditional we could realize. It took not less than 9 passes in the pipelines, which is enormous. We were expecting 2 passes in hoping that nV envisaged its architecture so that that could be done in only one. This result thus disappointed us, but it is possible that it is ascribable merely to immature drivers: time will tell!

Click to expand...

Do we know what kind of branch was used, though? Was it a dynamic or static branch? I would expect that kind of a performance hit from a dynamic branch, but not from a static branch (notice the statement about ~2 cycle performance: that's the performance quoted by nVidia for VS dynamic branching).

Regardless, more interesting benchmarks would compare static branching vs. changing the shader, as well as dynamic branching vs. length of possibly skipped code (i.e. to see about where the crossover in performance is), possibly including textures into the mix.

KimB · Apr 15, 2004

DemoCoder said:
Its separate, atleast according to some people I talked to at the event.

Interesting. But I really wouldn't expect it to approach the power of the pixel/vertex pipelines in that case, so I think it'd be more useful for simulation work to take advantage of the pixel and/or vertex pipelines.

DemoCoder · Apr 15, 2004

Even if it was 1/8-1/16 the speed, it might be useful for simulation or tesselation.

bloodbob · Apr 15, 2004

In reply to Joe DeFuria

So longer shaders are now faster than shorter ones?
I'm sure a long shaders is WAY WAY WAY faster then a long shader that has been decompossed into multiple passes so it can use shorter shaders espically when storing more then about 4 registers per pass as you need to use multiple render targets ect ect ect.

How's the performance looking on those infinitely long shaders?
Well of course infinitely long shaders are never gonna finish but I'm sure that REALLY long shaders are gonna be useful. If you are using the GPU as a stream computer this is gonna run alot faster then your good old X86 for some simulations are prime example of this is calculating fractals.
A prime example of this can be seen in "Fast Floating Fractal Fun" in which the author has full optimised the calculation for modern X86 cpu with SSE/SEE II/3d now/3d now+ ect and has also implemented it in opengl.
http://www.sourceforge.net/projects/ffff

And how fast is the hardware at performing these operations, branching, etc,
Okay lets just dump PS3.0 until we can do really complex shaders with heaps of branching at 120 fps with 32x AA and 32x AF and then we will introduce it like 10 years from now.

Joe DeFuria · Apr 15, 2004

bloodbob said:
In reply to Joe DeFuria

So longer shaders are now faster than shorter ones?
I'm sure a long shaders is WAY WAY WAY faster then a short shader that has been decompossed into multiple passes espically when storing more then about 4 registers per pass as you need to use multiple render targets ect ect ect.

The question is are such shaders going to have real-time frame rates?

Okay lets just dump PS3.0 until we can do really complex shaders with heaps of branching at 120 fps with 32x AA and 32x AF and then we will introduce it like 10 years from now.

That's not the point, but I think you know that.

bloodbob · Apr 15, 2004

Joe DeFuria said:
bloodbob said:

In reply to Joe DeFuria

So longer shaders are now faster than shorter ones?
I'm sure a long shaders is WAY WAY WAY faster then a short shader that has been decompossed into multiple passes espically when storing more then about 4 registers per pass as you need to use multiple render targets ect ect ect.

Click to expand...

The question is are such shaders going to have real-time frame rates?

All depends on the case doesn't it? currently long shaders is anything in excess of 96 instruction and many shaders can run in real time for that. Though I can't promise when you decompose the shaders to run with in the PS_20 requirements they would still run in real time. I'm sure if you go look even in ATI's render monkey or ASHLI you will see numerous examples of Long shaders.

A long shader could be use heaps of registers have say 12 texture look ups but this shader also might only be accessing 16x16 textures and only used on a small porition of the screen. Then you could use that shader again with 2048x2048 texture and you could render it to a 2048x2048 texture.
Of course that latter prolly isn't going to run at real time but the if it is decompossed into shorter shader it might run over 10 times worse.

KimB · Apr 15, 2004

Joe DeFuria said:
The question is are such shaders going to have real-time frame rates?

The first branching benchmark so far seems to indicate ~9 cycle overhead for dynamic branching in the pixel shader (with no textures). Granted, I am making an assumption here (that it was dynamic branching that was tested: I don't think this was stated), but if true, then branching will become better than compares once ~10 instructions are skipped on the average pixel.

So, branching performance will depend upon the algorithm, but we already knew that. If the number of instructions skipped is greater than the branching overhead, it will be a performance win. If the number of instructions skipped is smaller, then the developer would be better off with a "execute all branches, choose correct result" algorithm instead. Hopefully compilers will take care of this in the near future.

And as for branching shaders being realtime, note that on the Unreal Engine 3 presentation at the LAN part (video courtesy of DemoCoder), it was stated that the shaders used had ~50-100 instructions. A shader of that length definitely has the capacity to benefit from dynamic branching, if a branch is called for in the algorithm.

The biggest question in my mind is, is there an overhead for static branching in the pixel shader? If so, what is it? Will there be an overhead in changing the constant that determines which branch to take? Or will there be an overhead each time the pixel shader reaches the branch instruction?

elroy · Apr 15, 2004

DemoCoder said:
Its separate, atleast according to some people I talked to at the event.

Yeah, it is separate. That what it said in Dave's preview anyway.

KimB · Apr 15, 2004

elroy said:
Yeah, it is separate. That what it said in Dave's preview anyway.

Ah, thanks. I looked over so many reviews, I didn't have time to go through every page of every one. I looked at that page, and it was very informative. It looks like the video processor is of integer precision. So, I guess I probably won't use it if an MPEG-4 codec is released that utilizes that power: too low in precision....I'm a freak when it comes to maximum quality on my DIVX encodes.

DemoCoder · Apr 15, 2004

Integer codecs don't neccessarily mean a loss of quality.

KimB · Apr 15, 2004

DemoCoder said:
Integer codecs don't neccessarily mean a loss of quality.

Well, since I don't really know the math that goes into a MPEG-4 encode, I can't really give an intelligent comment on that. All I can say is that since when I'm encoding a movie, any small errors that creep up will be there for good, so I'd rather let my computer encode for a few hours while I'm gone or sleeping than deal with slightly lower quality.

Now, I guess I will have to say that it would be awesome for my TV card. If nVidia releases a codec that allows realtime MPEG-4 encoding on my CPU with a GeForce 6x00, then I would definitely put it to use.

Pete · Apr 15, 2004

Two nice Far Cry SM3.0 shots.

Ardrid · Apr 15, 2004

So much for PS3.0 not making a difference.

Dave Baumann · Apr 15, 2004

Is that "PS3.0 making a difference" or "What the application is doing makingf a difference"?

Ardrid · Apr 15, 2004

Well, what do you think, Dave?

Anyone else wondering if 6800 Ultra is really the 6800?

John Reynolds

Ecce homo

DemoCoder

Pete

Moderate Nuisance

KimB

DemoCoder

KimB

KimB

DemoCoder

bloodbob

Trollipop

Joe DeFuria

bloodbob

Trollipop

KimB

elroy

KimB

DemoCoder

KimB

Pete

Moderate Nuisance

Ardrid

Dave Baumann

Gamerscore Wh...

Ardrid

Similar threads