R420 may beat NV40 in Doom 3 with anti-aliasing

pat777 said:
At this point (things may change), I'm expecting that Splinter Cell - X will only support SM 1.1 and SM 3.0 when it comes out.

It's not certain but it's likely. If Splinter Cell X doesn't support SM 2.0, R420 will have to use the 1.1 version.
Hmmm, are they coding it to Xbox 1 and 2? In that case it would appear ATi has an SM3.0 part.
 
I beg to differ there. Any PS 3.0 code can be made into 2.0, something you could not do with PS 1.4 and 1.1. Certainly, you can't magically turn one into the other, but it's not the toughest thing ever either.
 
FUDie said:
radar1200gs said:
Albuquerque said:
Ok, so just one more post:

Performance is no substitute for core features. See: 3dfx, Ruby.
I would surely agree, if you're lacking features that are mainstream -- please show me a mainstream SM3 application, or an announced SM3 application that has no fallback to PS 1.x (let alone PS 2.x) :)

And in reverse, featureset is no excuse for poor performance either ;)
The thing is, SM3.0 will allow you to fall back, but something like PS1.4 won't.
PS 1.4 allows just as much of a fallback to PS 1.x as PS 3.0 does for PS 2.0: i.e. none whatsoever. The developer has to explicitly code the shaders and app to allow the fallback.

Wrong again, Radar.

-FUDie
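To put FUDie's point in concrete terms, here is a minimal Direct3D 9 sketch (not from the thread; the shader file names and the helper are purely hypothetical): the app has to ship a separately authored shader for each target and pick one from the caps, because nothing falls back automatically.

```cpp
// Minimal sketch: in Direct3D 9 a fallback only exists if the developer ships
// a shader for each target and picks one at runtime. File names hypothetical.
#include <d3d9.h>

const char* ChoosePixelShaderVariant(IDirect3DDevice9* device)
{
    D3DCAPS9 caps;
    device->GetDeviceCaps(&caps);

    // The fallback chain has to be authored explicitly; the runtime will not
    // translate a 3.0 shader down to 2.0, or a 1.4 shader down to 1.1.
    if (caps.PixelShaderVersion >= D3DPS_VERSION(3, 0))
        return "effect_ps30.psh";   // hand-written 3.0 version
    if (caps.PixelShaderVersion >= D3DPS_VERSION(2, 0))
        return "effect_ps20.psh";   // separately authored 2.0 version
    if (caps.PixelShaderVersion >= D3DPS_VERSION(1, 4))
        return "effect_ps14.psh";   // 1.4 version (often fewer passes than 1.1)
    return "effect_ps11.psh";       // 1.1 multi-pass fallback
}
```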

Maybe he is suggesting something different. For example, several Shader 3.0 operations can fit into Shader 2.0 code,

and Shader 2.0 can sometimes fall back to Shader 1.4 code.

But in many cases Shader 1.4 can't fall back to 1.1.

I could be completely wrong, but the way Skuzzy described it to me, most Shader 2.0 operations can be done in Pixel Shader 1.4 (the current Shader 2.0 ops), albeit with less precision, but still in a single pass.

But many Shader 1.4 operations can't be done in Shader 1.1 without multiple passes. Maybe he's assuming Shader 3.0/2.0 would be similar.
 
SM3.0 is a superset of SM2.0.

Pixel-shader-wise, almost everything in SM3.0 can be done with extra effort in SM2.x.

The limitations bite harder in the vertex shaders because of vertex texturing, but even without that there are plenty of VS programs you could run on NV3x and NV40 that R3xx and R42x would not be capable of executing, because those parts rely on the CPU for looping and branching.
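A rough sketch of how an app might test for that in Direct3D 9 terms (the function name is just illustrative): VS 3.0 guarantees dynamic flow control, while on a VS 2.x part you have to look at the optional caps.

```cpp
// Rough sketch under D3D9 assumptions; the function name is illustrative.
#include <d3d9.h>

bool CanRunBranchyVertexShader(IDirect3DDevice9* device)
{
    D3DCAPS9 caps;
    device->GetDeviceCaps(&caps);

    // VS 3.0 guarantees dynamic flow control; a VS 2.x part only has it if the
    // optional cap says so. Where it is absent, looping/branching work falls
    // back to the CPU, as described above.
    if (caps.VertexShaderVersion >= D3DVS_VERSION(3, 0))
        return true;
    return caps.VS20Caps.DynamicFlowControlDepth > 0;
}
```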
 
Yes, I chose to stick specifically with shader-related examples because of how my earlier comments were attacked, and I've already argued with them about FP filtering and HDR before.
 
Well, personally I don't think the two should be separated. Unless/until we see some SM3 hardware that doesn't support FP filtering/blending, the features might as well be linked together.
 
Chalnoth said:
Well, personally I don't think the two should be separated. Unless/until we see some SM3 hardware that doesn't support FP filtering/blending, the features might as well be linked together.

No they are not.
 
Yeah, but Chalnoth's point is that de facto they are. Lots of features aren't really linked per spec but controlled by caps bits, yet it makes sense for developers to target self-defined profiles which include a bag of features and a level of performance.

e.g. "shader X only runs on HW which meets the following criteria: SM a, and fillrate performance of B pixels/s) Otherwise, you have to handle a combinatorial explosion of features.

If the next chips that ATI and others ship with FP blending support also happen to support SM3.0, then for all intents and purposes, all SM3.0 HW is FP blending hardware. It makes little difference that "in theory" an SM2.0 chip could have FP blending, because developers can only work with HW that is available in the market, and they are not likely to write a code path they cannot test on any available hardware.


Technically, I could check D3DPS20CAPS for gradient, predicates, >64 instructions, etc. on an SM2.0 chip. There is nothing in DX that says, for example, that you can't support higher limits, but in practice, does anyone bother? No, and there's even HW in existence to justify it.

The simple fact of the matter is, until other SM3.0 HW hits the market, developers are effectively going to start with the premise that FP blending is available in addition to SM3.0. If some other vendor ships SM3.0 hardware which bifurcates the market, then they'll have to find a way to fall back.

Basically, developers *want* to assume that next-gen SM3.0 HW treats floating point as a "first class citizen" with as few limitations as possible. It makes development much easier.
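For the record, the checks DemoCoder is talking about do exist in the API. A hedged sketch follows; the cap bits and format queries are standard Direct3D 9, everything else (names, layout) is just illustrative.

```cpp
// Sketch of querying the "extended" PS 2.0 cap bits and the FP16 usage queries.
#include <cstdio>
#include <d3d9.h>

void QueryOptionalCaps(IDirect3D9* d3d, IDirect3DDevice9* device)
{
    D3DCAPS9 caps;
    device->GetDeviceCaps(&caps);

    // The extended PS 2.0 features are exposed as cap bits, not as a shader
    // version: gradient instructions, predication, and extra instruction slots.
    bool gradients  = (caps.PS20Caps.Caps & D3DPS20CAPS_GRADIENTINSTRUCTIONS) != 0;
    bool predicates = (caps.PS20Caps.Caps & D3DPS20CAPS_PREDICATION) != 0;
    bool over64     = caps.PS20Caps.NumInstructionSlots > 64;

    // FP16 blending and filtering are likewise separate queries against the
    // render-target format, independent of the shader model reported.
    bool fp16Blend = SUCCEEDED(d3d->CheckDeviceFormat(
        D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, D3DFMT_X8R8G8B8,
        D3DUSAGE_RENDERTARGET | D3DUSAGE_QUERY_POSTPIXELSHADER_BLENDING,
        D3DRTYPE_TEXTURE, D3DFMT_A16B16G16R16F));
    bool fp16Filter = SUCCEEDED(d3d->CheckDeviceFormat(
        D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, D3DFMT_X8R8G8B8,
        D3DUSAGE_QUERY_FILTER, D3DRTYPE_TEXTURE, D3DFMT_A16B16G16R16F));

    printf("gradients=%d predicates=%d >64 slots=%d fp16 blend=%d fp16 filter=%d\n",
           gradients, predicates, over64, fp16Blend, fp16Filter);
}
```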
 
Point taken Democoder.

Sadly though, around here you often have trouble enough getting one concept at a time to sink into certain heads, let alone several at once.
 
radar1200gs said:
Point taken Democoder.

Sadly though, around here you often have trouble enough getting one concept at a time to sink into certain heads, let alone several at once.

Oh shut up Greg, you are in no position to talk.
 
Technically, I could check D3DPS20CAPS for gradient, predicates, >64 instructions, etc. on an SM2.0 chip. There is nothing in DX that says, for example, that you can't support higher limits, but in practice, does anyone bother? No, and there's even HW in existence to justify it.

Given the software support for PS2.0 I think it's a bit early to say this at the moment – sure, there appear to be no apps that do it at the moment, but then there are few apps that support PS2.0 at all. Given the profiling nature of HLSL I think you are more likely to see developers support compilation to the most capable render target for the hardware in use, such that an app with shaders that exceed PS2.0's instruction limits doesn't go straight to SM3.0 but can be compiled (possibly single pass) to PS2a / PS2b targets – I believe Futuremark may have already said something along these lines, and I assume they are taking their cue from games developers, as they have done in the past.

Plus, given that developers generally shied away from PS2.0 altogether on the hardware precedent you speak of, I think we can agree that they are unlikely to be using shaders that exceed base PS2.0's capabilities.

However, as for the initial point, I'm just stating that there is nothing actually tying them together, and while 100% of the PS3.0 hardware does support both, we'll have to wait and see what the other 3 (to potentially 5) vendors do as well. We've had a similar situation with centroid sampling, in that initially it was tied to SM3.0, but because one of the Shader 2.0 architectures does support it, developers are checking for it and using it separately.
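Something like the profiling Dave describes can be sketched with D3DX (assuming that library is in use); the source file name and entry point below are placeholders.

```cpp
// Sketch of compiling to the most capable target the device reports.
#include <d3d9.h>
#include <d3dx9.h>

HRESULT CompileForBestProfile(IDirect3DDevice9* device, ID3DXBuffer** shaderOut)
{
    // D3DXGetPixelShaderProfile returns the best pixel shader target the
    // device supports, e.g. "ps_2_0", "ps_2_a", "ps_2_b" or "ps_3_0", so a
    // long shader can land on a 2.x profile instead of jumping straight to 3.0.
    const char* profile = D3DXGetPixelShaderProfile(device);

    return D3DXCompileShaderFromFile(
        "shader.fx",    // hypothetical HLSL source file
        NULL, NULL,     // no defines, no include handler
        "MainPS",       // hypothetical entry point
        profile,        // most capable supported target
        0,              // default compile flags
        shaderOut, NULL, NULL);
}
```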
 
DemoCoder said:
Technically, I could check D3DPS20CAPS for gradient, predicates, >64 instructions, etc. on an SM2.0 chip. There is nothing in DX that says, for example, that you can't support higher limits, but in practice, does anyone bother? No, and there's even HW in existence to justify it.

The simple fact of the matter is, until other SM3.0 HW hits the market, developers are effectively going to start with the premise that FP blending is available in addition to SM3.0. If some other vendor ships SM3.0 hardware which bifurcates the market, then they'll have to find a way to fall back.

Basically, developers *want* to assume that next-gen SM3.0 HW treats floating point as a "first class citizen" with as few limitations as possible. It makes development much easier.

Developers that do not check D3DPS20CAPS and assume anything about the hardware/drivers are breaking their own application. If they start to make assumptions based on what exists now, they are not prepared to deal with new hardware or new drivers for existing hardware.

Just consider Farcry with the 6800 - why is it running the SM1.x path when it should be able to run the SM2.0 path just fine, and why does it require a patch to change that behaviour?

Code for the chosen API and use the exposed features. Making special codepaths for different chips will ultimately break the application down the line.
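A minimal illustration of that point (the enum and function names are hypothetical): keying the render path off the shader version the driver exposes means new hardware, or a new driver for old hardware, lands on the best path automatically, with no per-chip special cases to patch later.

```cpp
// Illustrative only: key the render path off the exposed caps, not the chip.
#include <d3d9.h>

enum RenderPath { PATH_SM1X, PATH_SM20, PATH_SM30 };

RenderPath ChooseRenderPath(IDirect3DDevice9* device)
{
    D3DCAPS9 caps;
    device->GetDeviceCaps(&caps);

    // A new chip, or a new driver for an old chip, that reports PS 2.0/3.0
    // gets the better path automatically; no per-chip special case needed.
    if (caps.PixelShaderVersion >= D3DPS_VERSION(3, 0)) return PATH_SM30;
    if (caps.PixelShaderVersion >= D3DPS_VERSION(2, 0)) return PATH_SM20;
    return PATH_SM1X;
}
```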
 
DaveBaumann said:
radar1200gs said:
Point taken Democoder.

Sadly though, around here you often have trouble enough getting one concept at a time to sink into certain heads, let alone several at once.

Oh shut up Greg, you are in no position to talk.
How about you? I've gotten several choice replies from you where you are clearly wrong.

Perhaps if you started getting serious about keeping the forums serious we wouldn't have fanboys wreaking havoc in the forums and arguing nonsense in the first place.

However your bias towards one company (that you claim doesn't exist) obviously stops you from doing that.
 
radar1200gs said:
Perhaps if you started getting serious about keeping the forums serious we wouldn't have fanboys wreaking havoc in the forums and arguing nonsense in the first place.

:oops:

Words fail me.

If people would just be fans of technology, period!, rather than only technology if and when it's delivered by their preferred company, boards such as B3D's would be a much more enjoyable, productive resource.

But, yes, IMO Dave needs to break out the ban stick and remove a handful of fuckwits from this board. And you, sir, should be at the top of that list.
 
Some data.

ATI Ashli 1.4.0.0 Renderman shaders compiled to OpenGL ARB_fragment_program
and ARB_vertex_program.

The NV30 and NV40 targets seem to produce the same output as R300, so
I've omitted them. I'm more interested in the comparison between R300
and R420. This isn't every shader; I picked out some of those that
require multiple passes with the R300 family.

Chip - Program - Passes - Total instructions
------------------------------------------------
R300 - Brushedmetal - 2 Passes - Instructions Pixel: 117, Vertex: 57
R420 - Brushedmetal - 1 Pass - Instructions Pixel: 89, Vertex: 28

R300 - FakeFur - 4 Passes - Instructions Pixel: 188, Vertex: 71
R420 - FakeFur - 1 Pass - Instructions Pixel: 143, Vertex: 19

R300 - Flame - 3 Passes - Instructions Pixel: 124, Vertex: 30
R420 - Flame - 2 Passes - Instructions Pixel: 189, Vertex: 27

R300 - SpiderWeb - 4 Passes - Instructions Pixel: 119, Vertex: 37
R420 - SpiderWeb - 1 Pass - Instructions Pixel: 101, Vertex: 8

R300 - Batsignal - 4 Passes - Instructions Pixel: 197, Vertex: 54
R420 - Batsignal - 1 Pass - Instructions Pixel: 188, Vertex: 21

R300 - Uber - 4 Passes - Instructions Pixel: 196, Vertex: 60
R420 - Uber - 1 Pass - Instructions Pixel: 149, Vertex: 21


To me this appears to be quite promising, in particular the number
of vertex program instructions. I've no idea what the actual
performance is like, but eliminating passes probably is more
advantageous than reducing the instruction count.

All in all the R420 appears to be a far more capable shader
architecture, independent of the number of pipelines and the
differences in memory bandwidth and clockspeed.
 
radar1200gs said:
And the grounds for banning me would be :?:

Cough:

Perhaps if you started getting serious about keeping the forums serious we wouldn't have fanboys wreaking havoc in the forums and arguing nonsense in the first place.

But the communicative disconnect here is, I'm quite certain, a situation that Reverend once described rather aptly: "The problem with fanboys is that they don't realize they are just that" <paraphrased>.
 
glw said:
To me this appears to be quite promising, in particular the number of vertex program instructions. I've no idea what the actual performance is like, but eliminating passes probably is more advantageous than reducing the instruction count.

All in all the R420 appears to be a far more capable shader architecture, independent of the number of pipelines and the differences in memory bandwidth and clockspeed.

If you can check the Ashli shader code, you might want to check whether any of the vertex programs are using SIN/COS, as these will be handled natively by R420 but would have to be achieved with multi-instruction macros on R300 (I've also been told that the VS compiler in the drivers already uses these instructions natively for R420).
 