More info about RSX from NVIDIA

Rockster said:
Is this referring to free norm, or something else?
Probably. But there should be even more, like free format conversion, swizzling, masking, etc.
 

Does this make sense?

"I crunched the numbers for anyone who's interested. I don't have time to evaluate what they all mean right now... go for it.

VS = Vertex Shader
PS = Pixel Shader

Instructions / ALU

RSX = 2 per VS X 8 VS = 16
RSX = 5 per PS X 24 PS = 120

Xenos = 2 per ALU X 48 ALU = 96

Operations / ALU

RSX = 5 per VS X 8 VS = 40
RSX = 10 per PS X 24 PS = 240

Xenos = 5 per ALU X 48 ALU = 240

FLOPS / ALU

RSX = 10 per VS X 8 VS = 80
RSX = 27 per PS X 24 PS = 648

Xenos = 10 per ALU X 48 ALU = 480

Instructions / clock

RSX = 16 per clock per VS X 8 VS = 128
RSX = 120 per clock per PS X 24 PS = 2880

Xenos = 96 per clock per ALU X 48 ALU = 4608

Operations / clock

RSX = 40 per clock per VS X 8 VS = 320
RSX = 240 per clock per PS X 24 PS = 5760

Xenos = 240 per clock per ALU X 48 ALU = 11,520

FLOPS / clock

RSX = 80 per clock per VS X 8 VS = 640
RSX = 648 per clock per PS X 24 PS = 15,552

Xenos = 480 per clock per ALU X 48 ALU = 23,040

Instructions / second

RSX = 8.8B per VS per second X 8 VS = 70.4B
RSX = 66B per PS per second X 24 PS = 1584B

Xenos = 48B per second per ALU X 48 ALU = 2304B

Operations / second

RSX = 22B per second per VS X 8 VS = 176B
RSX = 132B per second per PS X 24 PS = 3168B

Xenos = 120B per second per ALU X 48 ALU = 5760B

Floating Point Operations / Second

RSX = 44B per second per VS X 8 VS = 352B
RSX = 356B per second per PS X 24 PS = 8544B

Xenos = 240B per second per ALU X 48 ALU = 11,520B

But just remember: you can't compare RAW numbers of a standard shader straight across to a Unified shader, as Unified shaders are FAR more efficient. The difference is that in a real-world gaming scenario, a standard shader doesn't have a chance of actually attaining those figures... a Unified shader can."
 
I know, but NVIDIA adds significant flops for the free normalize. ATI says 33% more in these units, so 480 x 33% ≈ 158 additional flops?

Do we know if the G70 vertex units have direct access to filtered textures?
 
dukmahsik, you don't multiply by the number of ALUs, since that has already been done. That whole post is wrong. Sorry.
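To make the double counting concrete, here's a quick Python sketch. It takes the quoted post's per-unit figures at face value (2 instructions per VS, 5 per PS, 2 per Xenos ALU) and infers the clocks from that post's own numbers (8.8B / 16 = 550 MHz for RSX, 48B / 96 = 500 MHz for Xenos); those clocks and the helper name peak_per_second are my assumptions for illustration, not an official spec sheet.

```python
# Assumption: per-unit figures come from the quoted post; clocks are inferred
# from its own per-second numbers (8.8B / 16 and 48B / 96), not official specs.
RSX_CLOCK_HZ = 550e6
XENOS_CLOCK_HZ = 500e6


def peak_per_second(per_unit_per_clock, units, clock_hz):
    """Peak rate = per-unit rate x number of units x clock (unit count applied once)."""
    return per_unit_per_clock * units * clock_hz


# (per-unit instructions per clock, unit count, clock) as listed in the quoted post
parts = {
    "RSX VS": (2, 8, RSX_CLOCK_HZ),
    "RSX PS": (5, 24, RSX_CLOCK_HZ),
    "Xenos ALU": (2, 48, XENOS_CLOCK_HZ),
}

for name, (per_unit, count, clock) in parts.items():
    once = peak_per_second(per_unit, count, clock)
    twice = once * count  # what the post did: multiply by the unit count again
    print(f"{name}: {once / 1e9:.1f}B instr/s, double-counted {twice / 1e9:.1f}B")
```

The "once" column is just the 8.8B, 66B and 48B the post itself lists; the "double-counted" column reproduces its 70.4, 1584 and 2304 figures, which is why those come out exactly 8x, 24x and 48x too high.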
 
dukmahsik said:
But just remember: you can't compare RAW numbers of a standard shader straight across to a Unified shader, as Unified shaders are FAR more efficient.
Is it not pertinent at this point to note that Unified Shaders are only theoretically 'far' more efficient? I don't doubt they are, but it's an unproven tech AFAIK. It needs trials in the field to determine how well they perform relative to dedicated shaders.
 
Shifty Geezer said:
dukmahsik said:
But just remember: you can't compare RAW numbers of a standard shader straight across to a Unified shader, as Unified shaders are FAR more efficient.
Is it not pertinent at this point to note that Unified Shaders are only theoretically 'far' more efficient? I don't doubt they are, but it's an unproven tech AFAIK. It needs trials in the field to determine how well they perform relative to dedicated shaders.

Sadly, those trials in the field probably won't happen until R600 hits the streets. Might have to wait 18 months :!:

Jawed
 
Is it not pertinent at this point to note that Unified Shaders are only theoretically 'far' more efficient? I don't doubt they are, but it's an unproven tech AFAIK. It needs trials in the field to determine how well they perform relative to dedicated shaders.
In theory, you can point out that while RSX would have some 44 GFLOPS dedicated to VS and 356 GFLOPS dedicated to PS, any fraction of R500's 240 GFLOPS can go to either VS or PS. And it is generally true that between vertex and pixel shaders, vertex shaders tend to be far larger.
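A toy model of that argument, as a Python sketch. It assumes a frame is nothing but shader ALU work that can be split arbitrarily, with no bandwidth, setup, texturing or scheduling limits, and it just plugs in the 44 / 356 / 240 GFLOPS peaks from this thread. The function names are hypothetical.

```python
# Toy model (assumption): a frame is pure shader ALU work, perfectly divisible,
# and nothing else limits it. Peak rates are the GFLOPS figures from this thread.
RSX_VS_GFLOPS = 44.0
RSX_PS_GFLOPS = 356.0
XENOS_GFLOPS = 240.0


def dedicated_time(vertex_gflop, pixel_gflop):
    """Seconds per frame when VS and PS work each run on their own units."""
    # The frame is only done when the slower of the two pools finishes.
    return max(vertex_gflop / RSX_VS_GFLOPS, pixel_gflop / RSX_PS_GFLOPS)


def unified_time(vertex_gflop, pixel_gflop):
    """Seconds per frame when any ALU can take either kind of work."""
    return (vertex_gflop + pixel_gflop) / XENOS_GFLOPS


def dedicated_utilization(vertex_gflop, pixel_gflop):
    """Fraction of the dedicated design's total peak that is actually kept busy."""
    busy = vertex_gflop + pixel_gflop
    total_peak = RSX_VS_GFLOPS + RSX_PS_GFLOPS
    return busy / (total_peak * dedicated_time(vertex_gflop, pixel_gflop))


for vertex_share in (0.05, 0.11, 0.25, 0.50):
    v, p = vertex_share * 10.0, (1.0 - vertex_share) * 10.0  # a 10 GFLOP frame
    print(f"VS share {vertex_share:.0%}: "
          f"dedicated {dedicated_time(v, p) * 1e3:.1f} ms "
          f"({dedicated_utilization(v, p):.0%} busy), "
          f"unified {unified_time(v, p) * 1e3:.1f} ms")
```

In this model the dedicated split only runs flat out when the vertex share matches 44:356 (about 11%), while the unified pool is fully busy by construction. That idealization is exactly what's being questioned above: real unified hardware pays some scheduling cost the toy ignores.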
 
UltraShadow II?

Will this help? I haven't seen any article that investigates the UltraShadow feature thoroughly. Is it all it's touted to be?
 
V3 said:
UltraShadow II?

Will this help? I haven't seen any article that investigates the UltraShadow feature thoroughly. Is it all it's touted to be?

It might be because it's proprietary to NVIDIA, hence not widely used. But in a closed box, that kind of issue evaporates... there is no standard.

Doom3 uses it, IIRC, which is one of the reasons it performs so much better on NVIDIA cards.

ShootMyMonkey said:
And it is generally true that between vertex and pixel shaders, vertex shaders tend to be far larger.

Do you mean the total vertex load typically outweighs the total pixel load? I'm sure that's not the case. There's a reason why pixel shaders have always outnumbered vertex shaders: the load is "typically" weighted toward pixel shading.
 
ERP said:
Exactly how many FLOPs are NVidia counting the free normalise as?
They don't say, but the chart implies 9 ops, which would make it the standard DOT4 + RSQRT.

Titanio said:
But in a closed box, that kind of issue evaporates... there is no standard.
True, it's just that this is a feature for accelerating a technique that's kind of a dead end IMO. I'd rather have a few more free PCF samples and no stencil accelerator.
 
Fafalada said:
ERP said:
Exactly how many FLOPs are NVidia counting the free normalise as?
They don't say, but the chart implies 9 ops, which would make it the standard DOT4 + RSQRT.

Did I ever mention PSP can do this sequence in just 3 clocks? :oops:

Yes, that's what I don't get.

7 ops for the DP, 1 for the RSQ, and 4 for the scale; that makes 12 in my book.
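Both tallies can be reproduced with a small Python sketch of normalize(v) = v * rsqrt(dot(v, v)). The counting conventions here (a MAD counted as two flops, RSQ counted as one op, 3- vs 4-component vectors, whether the final scale is counted) are assumptions; nobody in the thread knows which one NVIDIA uses, and normalize_flops is a hypothetical helper.

```python
def normalize_flops(components, mad_as_two_flops=False, include_scale=True):
    """Tally scalar flops for normalize(v) = v * rsqrt(dot(v, v)).

    The counting conventions are assumptions, not NVIDIA's stated method.
    """
    if mad_as_two_flops:
        dot = 2 * components                  # dot product issued as MADs, 2 flops each
    else:
        dot = components + (components - 1)   # n multiplies + (n - 1) adds
    rsq = 1                                   # reciprocal square root counted as one op
    scale = components if include_scale else 0  # v * rsq: one multiply per component
    return dot + rsq + scale


print(normalize_flops(4))  # 12: ERP's DP4 + RSQ + 4-wide scale
print(normalize_flops(4, mad_as_two_flops=True, include_scale=False))  # 9: DOT4 as MADs + RSQRT
print(normalize_flops(3))  # 9: 3-component DP3 + RSQ + 3-wide scale
```

So "9 ops" fits more than one reading: DOT4 counted as four MADs plus RSQRT, or a full 3-component normalize including the scale.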

Also, is it now a free 32-bit normalise, or still just 16-bit?
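For a feel of what the 16-bit vs 32-bit question means in practice, here's a Python sketch that normalizes the same vector while rounding every intermediate to IEEE half or single precision, using the standard struct module's 'e' and 'f' formats. It only models storage precision; it does not model how RSX actually computes RSQ or rounds internally, and the helper names and test vector are made up for illustration.

```python
import math
import struct


def round_to(fmt):
    """Return a function that rounds a float to IEEE 'e' (half) or 'f' (single)."""
    def rnd(x):
        return struct.unpack("<" + fmt, struct.pack("<" + fmt, x))[0]
    return rnd


def normalize(v, rnd):
    """v * rsqrt(dot(v, v)), rounding every intermediate with rnd (an assumption)."""
    v = [rnd(c) for c in v]
    dot = 0.0
    for c in v:
        dot = rnd(dot + rnd(c * c))
    inv_len = rnd(1.0 / rnd(math.sqrt(dot)))  # rsqrt modelled as 1 / sqrt
    return [rnd(c * inv_len) for c in v]


def length_error(v):
    """Distance of the result from unit length, measured in double precision."""
    return abs(math.sqrt(sum(c * c for c in v)) - 1.0)


half, single = round_to("e"), round_to("f")
vec = (0.3, 0.5, 0.8)
print("fp16 length error:", length_error(normalize(vec, half)))
print("fp32 length error:", length_error(normalize(vec, single)))
```

Half precision has a 10-bit mantissa versus 23 bits for single, so the fp16 result typically drifts from unit length by a few parts in ten thousand, which can matter for things like specular exponents.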
 
Titanio said:
Doom3 uses it, IIRC, which is one of the reasons it performs so much better on NVIDIA cards.

D3 doesn't (well, there's an option to turn it on, but it makes practically no difference). D3's preference for NVIDIA hardware is solely due to the double-rate z-only pass that hardware is capable of.

Jawed
 
Jawed said:
Titanio said:
Doom3 uses it, IIRC, which is one of the reasons it performs so much better on NVIDIA cards.

D3 doesn't (well, there's an option to turn it on, but it makes practically no difference). D3's preference for NVIDIA hardware is solely due to the double-rate z-only pass that hardware is capable of.

Jawed

I see. I thought I saw it used as an example under UltraShadow II, though I'm not sure how that translates into real-world performance gains. I don't know of many other games using it either!