NVIDIA GameWorks, good or bad?

The AMD version has unbounded memory requirements, which means it can unpredictably fail on complicated scenes. This makes developers somewhat wary of using AMD's OIT algorithm.
Besides the stability improvement, I think the Intel version is likely to have better performance characteristics, as well.
This comment about developers' wariness of using the Per-Pixel Linked List algorithm is unfounded. While TressFX does have theoretically unbounded memory requirements, in practice they can be controlled in game situations by ensuring the initial allocation is large enough for the minimum camera distance, among other tricks. On modern GPUs (or even consoles) with large amounts of memory, allocating a decent amount (e.g. a couple of hundred MBs) is a reasonable investment for those devs who want the highest-quality hair effect in their game. Nonetheless, AMD has also presented two alternatives for memory-constrained situations: tiled mode and mutex (although those tend to have a performance impact compared to the default solution).
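
To put a rough number on that "couple of hundred MBs", here is a back-of-the-envelope sketch; the node layout and the average fragments-per-pixel figure are illustrative assumptions, not TressFX's actual defaults.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Illustrative per-pixel linked list (PPLL) node: packed color,
    // depth and a "next" index. Real layouts vary per implementation.
    const std::uint64_t nodeSizeBytes    = 12;   // 4B color + 4B depth + 4B next
    const std::uint64_t width            = 1920;
    const std::uint64_t height           = 1080;
    // Assumed average number of overlapping hair fragments per pixel
    // at the closest supported camera distance (a tunable guess).
    const std::uint64_t avgFragsPerPixel = 8;

    const std::uint64_t nodeBuffer = width * height * avgFragsPerPixel * nodeSizeBytes;
    const std::uint64_t headBuffer = width * height * 4;  // one 32-bit head index per pixel

    std::printf("node buffer: %.1f MB\n", nodeBuffer / (1024.0 * 1024.0));
    std::printf("head buffer: %.1f MB\n", headBuffer / (1024.0 * 1024.0));
    // ~190 MB + ~8 MB with these assumptions, i.e. in the "couple of
    // hundred MBs" range mentioned above. If a frame ever produces more
    // fragments than the pool holds, the extras are typically dropped.
}
```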

I wouldn't be surprised if a PixelSync variant ran faster on Intel hardware, but this is a fairly moot point since only Intel supports PixelSync at the moment. Adaptive OIT is also a different algorithm, which makes the comparison difficult (still relevant, though).
With TressFX 2 we are now at sub-ms cost for a model at medium range at 1080p resolution on an R9 280X.
 
Are you seeing interest in TressFX from developers? So far Tomb Raider is the only game that supports it, as far as I'm aware.
 
Definitely. High-quality hair/fur simulation and rendering is a very active area that's currently being evaluated by developers. I would say that it is in the top 3 areas of active research along with Physically Based Rendering and Global Illumination.
I can only mention the games that have publicly announced support for TressFX: Tomb Raider of course, plus Lichdom and Star Citizen.
 
Thanks. I didn't know about Star Citizen.
 
While TressFX does have theoretically unbounded memory requirements, in practice they can be controlled in game situations by ensuring the initial allocation is large enough

Just a question: you do have a virtual memory manager in the GPU, as far as I understand, in all GCN-class cards.
What prevents you from just having a CPU interrupt/message/whatever schedule and send the missing page to the GPU? You would not even need to reserve hundreds of MB, or at least only reserve the maximum reasonable for typical usage.
 
This comment about developers' wariness of using the Per-Pixel Linked List algorithm is unfounded. While TressFX does have theoretically unbounded memory requirements, in practice they can be controlled in game situations by ensuring the initial allocation is large enough for the minimum camera distance, among other tricks.

OIT is about much more than hair. Perhaps the instability issues can be worked around for TressFX, but all the caveats and tricks make the linked-list algorithm for OIT definitely something to be wary of for the general case.

PixelSync makes more sense - it's simpler, more flexible, and higher performance. I think we will see more of it in the future.
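
For readers who have not followed the details: the linked-list approach is unbounded because it appends one node per transparent fragment, so consumption tracks scene-dependent overdraw rather than pixel count. Here is a rough CPU-side sketch of the idea; the layout, sizes and names are illustrative, not taken from any shipping implementation.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Minimal CPU-side model of a per-pixel linked list (PPLL) append.
struct Node { std::uint32_t color; float depth; std::uint32_t next; };

constexpr std::uint32_t kInvalid = 0xFFFFFFFFu;

std::vector<Node>                       nodes(16'000'000);   // fixed pool allocated up front
std::vector<std::atomic<std::uint32_t>> heads(1920 * 1080);  // one list head per pixel
std::atomic<std::uint32_t>              nodeCounter{0};

void resetFrame() {
    nodeCounter = 0;
    for (auto& h : heads) h = kInvalid;  // every pixel starts with an empty list
}

// Called once per *transparent fragment*, not once per pixel. Total node
// consumption therefore scales with scene-dependent overdraw, which is
// exactly why the memory requirement is unbounded in theory.
void appendFragment(std::uint32_t pixel, std::uint32_t color, float depth) {
    const std::uint32_t idx = nodeCounter.fetch_add(1);
    if (idx >= nodes.size())
        return;  // pool exhausted: the fragment is dropped (the failure mode discussed above)
    const std::uint32_t prevHead = heads[pixel].exchange(idx);  // push onto this pixel's list
    nodes[idx] = {color, depth, prevHead};
}

int main() {
    resetFrame();
    appendFragment(/*pixel=*/0, /*color=*/0xFFFFFFFFu, /*depth=*/0.5f);
}
```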
 
What prevents you from just having a CPU interrupt/message/whatever schedule and send the missing page to the GPU?
Even if the hardware supports faulting on a page fault like that (it may or may not, I don't know), it's likely to be reeaaaallly slow. Definitely not something you want to be doing in the middle of a frame.

Also as with most things in games, you really want to handle the worst case. If there's a case where you're going to need a huge amount of data, might as well allocate it up front. As Nick mentioned, it usually makes more sense to just allocate a big buffer and then constrain the game/view in ways that avoid the really bad situations if possible (i.e. hair or other fairly localized effects).
 
As Nick mentioned, it usually makes more sense to just allocate a big buffer and then constrain the game/view in ways that avoid the really bad situations if possible
(and, crucially, allocates a fixed amount of memory per pixel rather than across the entire scene).
Don't those two quotes contradict?

ps:
I think I corrected you on this last time (assuming they are what I think)... these are not Intel CPU-specific features (i.e. AVX2), they are Intel *GPU* features.
I blame Codemasters for calling the exe with the features GRIDAutosport_avx.exe
 
Don't those two quotes contradict?

Why would they? One is a function of pixels, the other is not.
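
To make that concrete with illustrative numbers (assuming a fixed K-entries-per-pixel scheme on one side, e.g. an AOIT-style approximation, and a shared node pool for a per-pixel linked list on the other): one budget depends only on resolution, the other moves with however many transparent fragments the scene happens to generate.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const std::uint64_t pixels = 1920ull * 1080ull;

    // Fixed-function-of-pixels scheme: K compressed entries per pixel,
    // regardless of scene content (K and the entry size are assumptions).
    const std::uint64_t k = 8, entryBytes = 8;
    std::printf("per-pixel scheme: %6.1f MB (constant)\n",
                pixels * k * entryBytes / (1024.0 * 1024.0));

    // Shared-pool scheme (per-pixel linked list): one node per transparent
    // fragment, so the requirement moves with overdraw.
    const std::uint64_t nodeBytes = 12;
    for (unsigned long long overdraw : {4ull, 16ull, 64ull})
        std::printf("linked list     : %6.1f MB at %2llux overdraw\n",
                    pixels * overdraw * nodeBytes / (1024.0 * 1024.0), overdraw);
}
```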

Even if the hardware supports faulting on a page fault like that (it may or may not, I don't know), it's likely to be reeaaaallly slow. Definitely not something you want to be doing in the middle of a frame.
well, "virtual memory" should support something like that, at least, I suppose.
For being slow, I do not think it would be much slower than PRT, no? As long as TressFX is not synchronous in the pipeline (aka its in the compute pipe) should be ok, I guess. Ah well, just speculating, didnt check yet the blogged sources pointed before.
 
As for being slow, I do not think it would be much slower than PRT, no?
PRT/tiled resources do not allocate pages on the fly - all the allocating/mapping is still done from the CPU between frames, etc. Hitting real page faults or otherwise stopping GPU execution to wait on OS/driver handling/remapping and so on is the larger performance concern.

It's worth noting that what current GPUs call "virtual memory" is not exactly the same thing as CPU virtual memory as it is typically defined and managed in an OS. The broad strokes in terms of there being hardware page tables that map virtual -> physical are similar but a lot of the details are different.
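
As a purely conceptual sketch of that difference (all the names below are hypothetical, not a real driver or D3D interface): with tiled resources the application updates the tile-to-page mapping on the CPU between frames, and during the frame the GPU only consults an already-built mapping; an unmapped tile reads as a null page instead of triggering demand paging mid-frame.

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

// Hypothetical illustration only: virtual tile -> physical page mapping.
using TileId       = std::uint32_t;
using PhysicalPage = std::uint32_t;

struct TileMapping {
    std::unordered_map<TileId, PhysicalPage> table;
};

// CPU side, between frames: decide which tiles the next frame needs and
// (re)map them before any GPU work that touches them is submitted.
void updateMappingsForNextFrame(TileMapping& mapping,
                                const std::vector<TileId>& neededTiles,
                                std::vector<PhysicalPage>& freePages) {
    for (TileId t : neededTiles) {
        if (mapping.table.count(t) || freePages.empty())
            continue;                         // already resident, or out of budget
        mapping.table[t] = freePages.back();  // the remap happens here, on the CPU timeline
        freePages.pop_back();
    }
}

// "GPU" side, during the frame: an unmapped tile simply reads as a null
// page; nothing stalls mid-frame waiting for the OS/driver to page data
// in, which is the key difference from CPU-style demand paging.
PhysicalPage translate(const TileMapping& mapping, TileId t) {
    auto it = mapping.table.find(t);
    return it == mapping.table.end() ? 0u : it->second;
}

int main() {
    TileMapping mapping;
    std::vector<PhysicalPage> freePages{1, 2, 3, 4};
    updateMappingsForNextFrame(mapping, {7, 9}, freePages);
    std::printf("tile 7 -> page %u, tile 8 -> page %u\n",
                translate(mapping, 7), translate(mapping, 8));
}
```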

As long as TressFX is not synchronous in the pipeline (i.e. it's in the compute pipe) it should be OK, I guess.
The part that does the OIT stuff is in the standard 3D pipe (i.e. rendering the hair primitives). Only the physics parts are done in compute, and even there I'd be concerned about stalling on long-latency stuff like CPU handling of page faults.

@Davros: I'm not sure what you mean... you may have to clarify your question. And regarding Codemasters, yeah that is confusing :) I guess they are assuming that anything that supports pixel synchronization right now also supports AVX; while that happens to be true, the two are unrelated features.
 
Is there any chance that, by the release of DX12 hardware, these exclusive options will be available to run on AMD/NVIDIA GPUs?

Actually NVIDIA has had the recording feature for more than 6 months now (it's called ShadowPlay); AMD has yet to catch up to it on both compatibility and functionality.

It's funny, but at PAX East I sat through an NVIDIA panel (and won a mouse) and they didn't talk about it once; they talked a lot about playing on that little handheld, though.

But that's what I'm talking about: compete with those types of features, don't just hinder performance for everyone.
 
VCE recording in Sony/MS was not exampled in its usage by AMD, which was NOT able to even test it over hundreds of different settings, right? Come on...
That's irrelevant; it's in a beta stage for a reason: it doesn't support all resolutions, has inconsistent bit rates and frame rates, only works with some games, has quality issues in others, and has no desktop recording option or FPS counter, etc. You can read about it all here:
http://www.anandtech.com/show/8224/hands-on-with-amds-gaming-evolved-client-game-dvr

It's funny, but at PAX East I sat through an NVIDIA panel (and won a mouse) and they didn't talk about it once
It's been available since Oct 2013; they talked about it a lot in the past.
 
If you don't see how this policy is in direct contradiction to the rhetoric you've been spouting then you're in too deep... And yeah on this point I do have a clear agenda: to point out that you're not backing up your claims about being "open" and such with actual actions. People deserve to know the incongruity with your PR.

Intel is clearly the most open here, and as such I think it's fair to criticize the situations where AMD is not when you guys started this whole openness PR campaign... If you truly cared about people being able to get easy access to the best and most efficient TressFX code you'd put it up on github and allow people to contribute/branch/etc. directly. We do that.
Thank you for this post. I've been a bit annoyed with the whole AMD marketing spiel over the past ~year completely rewriting the meaning of "open".
 
I know I'm late but wanted to say that was an excellent follow-up article at ExtremeTech. Mantle and GameWorks tackle very different problems and are so different in scope they probably don't belong in the same conversation.

Given past discussions on PhysX I don't quite get why folks are angry that GW runs on all DX11 hardware. Isn't that a good thing? I can think of a few alternative scenarios but their merit is questionable.

1) IHVs stay out of the middleware game completely and let developers fend for themselves. However, that in itself would not guarantee "fairness" or performance parity as developers and middleware providers can also favor one architecture over another. It also doesn't guarantee that developers would include effects of similar quality on their own.

2) nVidia should continue investing millions of dollars in middleware and give it away for free, source code included. That's silly for obvious reasons and there's no precedent for it. As someone mentioned TressFX may be comparable to HairWorks but AMD has nothing on the scale of the entire GameWorks library.

As it stands today there's no proof that GameWorks is anything but good for consumers of all hardware. It frees developer resources to work on things unique to their games instead of reinventing the wheel for commodity effects. In the absence of such proof all the negative spin is just scare tactics.

Now if we see a pattern of GW titles tanking on AMD hardware then there will be something to shout about...
 
There are a lot of companies that contribute significantly to various kinds of open source software. Developing good software that is related to yours or your hardware can be beneficial even if you share it openly.
 
Yes of course. However GW is a full featured product with included tooling and support and millions of dollars in R&D. It's not just a few lines of code on GitHub :)
 
Thank you for this post. I've been a bit annoyed with the whole AMD marketing spiel over the past ~year completely rewriting the meaning of "open".
I didn't go back in the thread far enough to find Andrew's quote, so I may be taking this out of context, but your response is ironic in its timing considering AMD recently opened up Mantle to Khronos. I agree the term open has been abused a bit recently though, as Mantle wasn't and technically isn't open. A new Khronos API potentially based on Mantle will be open, though.
 
Sure, and I have nothing against a new Khronos API being described as "open". Because that's what it is. However, I found and continue to find the Mantle marketing as "open" obnoxious, because it is both highly effective (looking at general internet discourse on the topic) and highly misleading - I see nothing ironic about that.
 
How is it misleading, when they've made clear the "open" part isn't happening before the API is ready? Sure, it's one company controlling the development of the API and not a consortium like Khronos, but still (and others, as far as I've understood it, can create extensions, like AMD now has GCN extensions to Mantle).
 