_xxx_ said: Didn't know about GFFX supporting it though. I suppose it was slow as a**, maybe just somehow emulated in SW?
No, it was in hardware and was one of the major reasons for NV30's lackluster performance IIRC.
geo said: Edit: Inq broke the existence of this new benchmark here: http://www.theinquirer.net/default.aspx?article=31799 So I think it reasonable to assume Faud has been in contact with them for some time.
trinibwoy said: Yeah, I can see that perspective. But would you really like to set that precedent for a publication like the Inquirer that trades in conspiracy and rumor? As a PR manager reading that Inquirer piece, what would your reaction be? Would you really want to validate that drivel with a response?
The Baron said: "NVIDIA cards are locked at FP16 lawls!" Er, I get the feeling that given today's shader-intensive games, we'd be able to, y'know, see rampant artifacting in just about any screenshot when it was compared to an X1900 screenshot. Why am I the first person to think of this?
Maybe because you didn't read the first reply to this thread?
trinibwoy said: No, it was in hardware and was one of the major reasons for NV30's lackluster performance IIRC.
You recall incorrectly. NV30's lackluster performance was primarily due to other factors.
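As an aside on The Baron's artifacting question: a back-of-the-envelope way to see how far FP16 results can drift from FP32 is simply to throw away the mantissa bits a half float doesn't have. The C++ sketch below does that; toFp16ish is a rough, invented stand-in (it ignores FP16's smaller exponent range and does no proper rounding), so the numbers are illustrative only.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Crude model of FP16 (s10e5) precision: keep only the top 10 of a float's
// 23 mantissa bits. Ignores FP16's reduced exponent range and rounding, so
// this is an illustration, not a real float-to-half conversion.
float toFp16ish(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFFE000u;                // zero the low 13 mantissa bits
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}

int main() {
    // Normalize the same vector at full precision and at "half" precision.
    float v[3] = { 0.123456f, 7.654321f, 0.000987f };
    float h[3] = { toFp16ish(v[0]), toFp16ish(v[1]), toFp16ish(v[2]) };

    float lenF = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    float lenH = toFp16ish(std::sqrt(h[0] * h[0] + h[1] * h[1] + h[2] * h[2]));

    for (int i = 0; i < 3; ++i) {
        float full = v[i] / lenF;
        float half = toFp16ish(h[i] / lenH);
        std::printf("fp32 %.7f  fp16-ish %.7f  diff %.7f\n", full, half, full - half);
    }
    return 0;
}
```

Per operation the relative error is on the order of one part in two thousand; whether that ever shows up on screen depends on how long the math chain is before the result lands in an 8-bit framebuffer.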
Jawed said: A fair way down this long page (3/4) you can see that FP16 still has the potential to make a massive performance difference with G71:
http://www.digit-life.com/articles2/video/g71-part2.html
in the Cook-Torrance, Parallax Mapping and Frozen Glass shaders, the latter two regardless of whether the arithmetic- or texturing-intensive versions of the shaders are used.
That article is a few months old now, but at least it indicates that the driver wasn't forcing _PP on the code being executed.
Don't forget the free FP16 normalize...
MDolenc said: Don't forget the free FP16 normalize...
Which I think was the heart of the problem with ambient lighting in D3, wasn't it?
Jawed said: It wouldn't be computationally expensive to make the driver force _PP on ALL instructions. Don't forget that shader code is always "compiled" by the driver in realtime, anyway.
I can see a case for hypothesising that the driver does indeed force _PP - NVidia's driver team can then create game profiles that identify exceptions - or provide explicit shader code to be substituted (i.e. mixed _PP and normal precision).
But the iXBT/Digit-Life evidence implies that if there is driver-forcing, it's new.
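Jawed's "force _PP on ALL instructions" has a documented application-side counterpart in the D3DX HLSL compiler: the D3DXSHADER_PARTIALPRECISION flag asks for every instruction in the emitted bytecode to carry the _pp modifier. The sketch below (Windows plus the DirectX 9 SDK, linked against d3dx9.lib) only demonstrates that switch; whether a driver quietly applies the same blanket demotion during its own recompile is exactly the open question in this thread.

```cpp
// Windows + DirectX 9 SDK; link with d3dx9.lib.
#include <d3dx9.h>
#include <cstdio>
#include <cstring>

// A trivial ps_2_0 pixel shader, just so there is something to compile.
static const char* kShader =
    "float4 main(float2 uv : TEXCOORD0) : COLOR0\n"
    "{\n"
    "    return float4(uv, 0.0f, 1.0f) * 0.5f;\n"
    "}\n";

int main() {
    LPD3DXBUFFER code = nullptr;
    LPD3DXBUFFER errors = nullptr;

    // D3DXSHADER_PARTIALPRECISION forces every computation in the resulting
    // shader to partial precision, i.e. every instruction is emitted _pp.
    HRESULT hr = D3DXCompileShader(kShader, (UINT)std::strlen(kShader),
                                   nullptr, nullptr, "main", "ps_2_0",
                                   D3DXSHADER_PARTIALPRECISION,
                                   &code, &errors, nullptr);
    if (FAILED(hr)) {
        std::printf("compile failed: %s\n",
                    errors ? (const char*)errors->GetBufferPointer() : "unknown");
        return 1;
    }

    std::printf("compiled %lu bytes of ps_2_0 bytecode with _pp forced\n",
                (unsigned long)code->GetBufferSize());
    code->Release();
    if (errors) errors->Release();
    return 0;
}
```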
Jawed said: Which I think was the heart of the problem with ambient lighting in D3, wasn't it?
That doesn't mean that FP16 normalize is broken or doesn't have enough precision. It says clearly "That works on all surfaces except ambient ones where our cubemap isn't a normal cube map." FP32 normalize would also break that, simply because the cubemap was not a normalization cubemap.
NVidia was forcing the free FP16 normalise (shader replacement) in preference to a cubemap lookup.
http://www.beyond3d.com/forum/showthread.php?t=23195
Jawed
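A toy model of why that particular substitution bit back: replacing a cubemap lookup with an arithmetic normalize is only an identity when the cubemap really is a normalization cubemap. Both sample functions and the encoding below are invented for illustration (they are not Doom 3's actual data); the point is just that the swap is harmless in one case and changes the shader's output in the other.

```cpp
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// Hypothetical stand-in for a true normalization cubemap: texel(dir) == normalize(dir).
Vec3 sampleNormalizationCubemap(Vec3 dir) { return normalize(dir); }

// Hypothetical stand-in for an ambient-pass cubemap that stores precomputed
// lighting data rather than normalized directions (arbitrary encoding here).
Vec3 sampleAmbientCubemap(Vec3 dir) {
    Vec3 n = normalize(dir);
    return { 0.5f * n.x + 0.5f, 0.5f * n.y + 0.5f, 0.5f * n.z + 0.5f };
}

int main() {
    Vec3 dir = { 0.3f, -0.7f, 0.2f };

    // Swapping normalize(dir) in for the first lookup changes nothing...
    Vec3 a = sampleNormalizationCubemap(dir);
    Vec3 b = normalize(dir);

    // ...but swapping it in for the second one would discard the lighting data.
    Vec3 c = sampleAmbientCubemap(dir);

    std::printf("norm cubemap    %f %f %f\n", a.x, a.y, a.z);
    std::printf("arithmetic      %f %f %f\n", b.x, b.y, b.z);
    std::printf("ambient cubemap %f %f %f\n", c.x, c.y, c.z);
    return 0;
}
```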
geo said: The below is only tangentially involved with whatever it is (still unclear) the Inq is pointing at. I'm not suggesting this is what's happening, particularly given Jawed's excellent point showing that full precision is still biting G71 in spots. Automating _pp has just always been a fascination of mine.
We know NV has had three years to push _pp around. I wonder if it might be technically doable, with some extended effort over that time, to come up with an automated algo in the driver that reliably predicts (>95% accuracy) some subset of shaders that could be forced to _pp. If a shader fails whatever your algo is looking for, you leave it alone. That's key. The "reliably predict" part only needs to hold when the algo *does* decide a particular shader can safely be _pp'ed. Then the question would be what percentage of shaders it finds it can do that with.
Well, while it might be possible, I doubt that nVidia would ever do it, because it would significantly lengthen shader compile time. Modern games use a very large number of shaders, making shader compile time somewhat important.
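For concreteness, a toy sketch of the kind of conservative per-shader check geo is describing. The instruction representation and the reject rules are invented; a real driver would be looking at D3D shader bytecode and would know far more about each op. The property that matters is the one geo calls key: any doubt and the shader is left at full precision.

```cpp
#include <string>
#include <vector>

// Invented, simplified instruction record; a real analysis would work on
// the actual shader bytecode.
struct Instr {
    std::string op;          // e.g. "texld", "mad", "nrm"
    bool feedsTexcoord;      // result feeds a texture-coordinate calculation
    bool usesLargeConstant;  // constant outside FP16's comfortable range
};

// Only demote to _pp if nothing looks precision-sensitive; otherwise leave
// the shader alone (the conservative default geo insists on).
bool canForcePartialPrecision(const std::vector<Instr>& shader) {
    int dependentChain = 0;
    for (const Instr& i : shader) {
        if (i.feedsTexcoord) return false;      // FP16 texcoords shimmer/crawl
        if (i.usesLargeConstant) return false;  // FP16 tops out around 65504
        dependentChain = (i.op == "texld") ? 0 : dependentChain + 1;
        if (dependentChain > 16) return false;  // long math chains accumulate error
    }
    return true;
}

int main() {
    std::vector<Instr> shader = {
        { "texld", false, false },
        { "mad",   false, false },
        { "nrm",   false, false },
    };
    return canForcePartialPrecision(shader) ? 0 : 1;
}
```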
Chalnoth said: Well, while it might be possible, I doubt that nVidia would ever do it, because it would significantly lengthen shader compile time. Modern games use a very large number of shaders, making shader compile time somewhat important.
geo said: Well, this is why I mentioned a multi-core CPU as a possible enabler.
Edit: Oh. I guess I never really thot of where this "compile" is happening... or rather old-skewled myself into thinking it must be on the CPU... but I guess I really don't know that. Where are shader compiles done?
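For what it's worth, both stages happen on the CPU: the game (or D3DX) compiles HLSL down to D3D bytecode, and the driver then translates that bytecode into native microcode when the shader object is created or first used. Which is why the multi-core point isn't crazy; independent shaders could be farmed out to worker threads. A minimal sketch with std::async, where compileShader is a made-up placeholder for whatever the driver's backend step actually does:

```cpp
#include <future>
#include <string>
#include <vector>

// Made-up placeholder for the driver's backend compile step
// (D3D bytecode -> native microcode).
struct CompiledShader { std::string native; };

CompiledShader compileShader(const std::string& bytecode) {
    return { "native:" + bytecode };  // stand-in for the real work
}

// Farm independent shader compiles out to worker threads so a large shader
// count doesn't serialize on a single core.
std::vector<CompiledShader> compileAll(const std::vector<std::string>& shaders) {
    std::vector<std::future<CompiledShader>> jobs;
    jobs.reserve(shaders.size());
    for (const auto& s : shaders)
        jobs.push_back(std::async(std::launch::async, compileShader, s));

    std::vector<CompiledShader> out;
    out.reserve(jobs.size());
    for (auto& j : jobs)
        out.push_back(j.get());
    return out;
}

int main() {
    std::vector<std::string> shaders = { "shaderA", "shaderB", "shaderC" };
    return compileAll(shaders).size() == 3 ? 0 : 1;
}
```

Whether the extra analysis geo has in mind would still blow the per-shader compile budget Chalnoth is worried about is a separate question.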