AMD: Speculation, Rumors, and Discussion (Archive)

Can you tell me if I can use the CF function with the RX and a 7850?




As far as I know, AC in Pascal is still disabled, and some say that Nvidia's DX11 support is so good that they couldn't get better performance out of DX12, but those are rumours without data to back them up.
I doubt it, as the 7850 isn't in the mainline driver profile anymore. But the 290 is GCN, as is the RX.
 
Well, the 7850 is GCN too, but the first iteration. I'm really thinking of getting the 480. I still play on a 7850 since I play at 1080p and I don't mind tweaking the settings to hit 60 or 30 FPS, but now I'm playing Paragon and my card just can't handle it even at 720p, all low, no shadows... The game isn't fully optimized yet, but I feel the need to be able to play at full settings, and the 480 seems able to do that (or a 970/980 or 380/390).
 
You'd see a decrease in performance, I'd wager. Although there are tests where they used a Radeon 290X together with the APU graphics and saw speed-ups, I think it would be very little, and it's just a better idea to jump completely to an RX.
 

Let's say that mixing generations, memory capacities and performance levels in CFX/SLI has never really given good results. Memory capacity is reduced to that of the lower-capacity card, and in CFX the faster GPU will always finish rendering its frame first, so the second one ends up nearly unused. It should work, but don't expect much from it in terms of performance.

The only time I mixed different GPUs was the X1950XTX (GDDR4) with the X1900XTX (GDDR3), with excellent results, but both GPUs were overclocked to the same core speed, and the memory capacity was the same apart from its speed.

Basically, if both GPUs have close performance (say a 290 and a 290X) with the same memory capacity, of course, this works well.
 
Yes, the NDA ends on the 29th.

Well, talking about CF, I was referring to the new DX12 functionality, explicit multi-GPU? So I could keep the 7850 doing post-processing or adding effects while the new card takes on the hard work. But I don't know if it will work on non-DX12 GPUs.

Sent from my HTC One via Tapatalk
 
You will need title-by-title developer support for this. And yeah, both GPUs have to understand and speak the DX12 API.
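If it helps, here's a rough C++ sketch (my own illustration, not anything from the thread or from AMD/Nvidia) of the very first step an explicit multi-adapter title has to take under DX12: enumerate every adapter and see whether a D3D12 device can even be created on it. Whether an older card like the 7850 shows up here depends entirely on its driver exposing a DX12 path.

```cpp
#include <dxgi1_4.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>
#include <cstdio>

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<IDXGIFactory4> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
        return 1;

    std::vector<ComPtr<ID3D12Device>> devices;
    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
            continue; // skip WARP / software adapters

        // An adapter that can't give us a feature level 11_0 device
        // simply can't take part in an explicit multi-GPU setup.
        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device))))
        {
            wprintf(L"DX12-capable adapter %u: %s\n", i, desc.Description);
            devices.push_back(device);
        }
    }

    // Even with two devices in hand, the application itself has to split the
    // frame, copy intermediate results between adapters and synchronize with
    // fences; nothing happens automatically, hence the need for per-title
    // developer support.
    return devices.size() >= 2 ? 0 : 1;
}
```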
 
I still don't see how this contradicts what I said. I never said that Nvidia absolutely doesn't support async; we're talking about which one is better at it. The AMD GPU has ACEs, which are apparently important for async, but that's all I know.
So, from the second quote, posting the relevant part with a bit of bolding on my part:
From an API point of view, async compute is a way to provide an implementation with more potential parallelism to exploit. It is pretty analogous to SMT/hyper-threading: the API (multiple threads) is obviously supported on all hardware, and depending on the workload and architecture it can increase performance in some cases where the different threads are using different hardware resources. However, there is some inherent overhead to multithreading, and an architecture that can get high performance with fewer threads (i.e. high IPC) is always preferable from a performance perspective.

When someone says that an architecture does or doesn't support "async compute/shaders" it is already an ambiguous statement (particularly for the latter). All DX12 implementations must support the API (i.e. there is no caps bit for "async compute", because such a thing doesn't really even make sense), although how they implement it under the hood may differ. This is the same as with many other features in the API.
[...]
Without that context you're effectively in the space of making claims like "8 cores are always better than 4 cores" (regardless of architecture) because they can run 8 things simultaneously. Hopefully folks on this site understand why that's not particularly useful.

Let me try and rephrase it: With AC, you get a higher percentage of your theoretical TFLOPS throughput in, say, games' frames per second, provided that card, driver, API and application use this feature. What could be better than that, right?

Maybe, if you got this higher percentage of your TFLOPS regardless of API and application?

Hence the analogy to hyperthreading. It's good that it's there, but it can only exploit bubbles in the execution pipeline. If there weren't any bubbles, hyperthreading would yield nothing at best. And it has the requirement that there's enough multithreading going on. In a single-thread application, hyperthreading couldn't do squat, regardless of how many bubbles of emptiness are in the pipeline.
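
To make that concrete, here's a minimal C++ sketch (my illustration, assuming an already-created ID3D12Device) of what "async compute" amounts to at the DX12 API level: a second, compute-only queue next to the graphics queue, plus a fence for cross-queue synchronization. The API only exposes the opportunity for overlap; whether the hardware actually fills its pipeline bubbles with that extra work is entirely up to the driver and the architecture.

```cpp
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Create one graphics (direct) queue and one compute-only queue on the same device.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;      // graphics + compute + copy
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // compute + copy only
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Work submitted to the two queues may run concurrently, get time-sliced,
    // or be serialized: the API makes no promise, which is why "supports async
    // compute" is such an ambiguous claim.
}

// Cross-queue ordering is expressed with fences: here the graphics queue waits
// until the compute queue has signalled that its dispatches are finished.
void SyncExample(ID3D12Device* device,
                 ID3D12CommandQueue* gfxQueue,
                 ID3D12CommandQueue* computeQueue)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    computeQueue->Signal(fence.Get(), 1); // reached once the compute work is done
    gfxQueue->Wait(fence.Get(), 1);       // graphics queue stalls here until then
}
```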
 
You can easily turn this argument around. There is also some inherent (hardware) "overhead" in creating an architecture that provides high performance with fewer threads. And since SMT/hyperthreading generally provides a performance uplift for throughput tasks, it is basically always preferable from a performance perspective, even on a design A that delivers higher performance at a low number of threads than another design B. It's basically independent. Whether the integration is worth the effort on design A or design B is another question. It's basically a similar question to whether the effort of implementing changes to design B so it gets design A's performance characteristics at a low thread count is worth it ;).
 
Hence the analogy to hyperthreading. It's good that it's there, but it can only exploit bubbles in the execution pipeline. If there weren't any bubbles, hyperthreading would yield nothing at best. And it has the requirement that there's enough multithreading going on. In a single-thread application, hyperthreading couldn't do squat, regardless of how many bubbles of emptiness are in the pipeline.
So you're saying async is like a fix for already existing problems that Nvidia doesn't have. The 'bubbles'. But if that's the case, why would Nvidia GPUs perform worse with async on than with it off?
 
The "bubbles" aren't the same "bubbles" as on AMD GPUs. ACEs don't mean much to async compute; they might be important to AMD hardware, but Intel's and nV's architectures don't need them, as they use different command processors to do the same things as the different command units in GCN.
 
So you're saying async is like a fix for already existing problems that Nvidia doesn't have. The 'bubbles'.
There are occasional pipeline bubbles on nV GPUs, too. The question is how common they are and how much the GPUs could profit from running compute shaders simultaneously on the same SMs to fill some of them. They would benefit from this capability, I'm sure of it. Maybe not as much as Radeons, but they would benefit, too.
The "bubbles" aren't the same "bubbles" as on AMD GPUs. ACEs don't mean much to async compute; they might be important to AMD hardware, but Intel's and nV's architectures don't need them, as they use different command processors to do the same things as the different command units in GCN.
The command processors don't have much to do with the pipeline bubbles within the SMs. There is untapped performance potential in the case of nV GPUs.
 
You can easily turn this argument around. There is also some inherent (hardware) "overhead" in creating an architecture that provides high performance with fewer threads.
I think there is room for a lot of nuance here.
There is a debate about overhead in a more workload-related or algorithmic sense, and then there's hardware overhead. The hardware is, to a significant extent, engineered as much as possible to be a "so?" kind of overhead, where it is only permitted so many nanoseconds, so many mm2, or so much energy out of a budget that scales somewhat fitfully.

The "overhead" for high performance in one thread is a known area of seriously diminishing returns.
However, the debate between "fewer" and "more" threads is much less clear-cut in the middle, since these GPUs are heavily SMT through much of their processing. There are very clear downsides if you wander into "too many": caches/interconnects/memory controllers/control can thrash or congest in ways where serialization turns out to be preferable, and there are diminishing returns on the parallel resources' contribution to hardware footprint (if the front end is highly parallel, then there is generally a back end broad enough to support it).
The SMT analogy taken from a CPU context breaks down because so much of this is so vastly parallel in the back end, while the front end pipelines so deeply, that internally it turns into a question of how well two different resource types are load-balanced and how much of the overall scalar component each one contributes, if you try to apply Amdahl's law to parallel systems that are, aside from specific points, very close in overall parallelism.
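(For reference, the Amdahl's law being alluded to, with p the parallelizable fraction of the work and N the parallel width, is S(N) = 1 / ((1 - p) + p/N); the point above is that for these GPUs the interesting question is less about N and more about how the residual serial fraction 1 - p is split between the two resource types.)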

So you're saying async is like a fix for already existing problems that Nvidia doesn't have. The 'bubbles'. But if that's the case, why would Nvidia GPUs perform worse with async on than with it off?
Some of the tweets and back-and-forth over AOTS seem to indicate that there is an additional device check, besides the async flag in the ini, that is effectively disabling it for Nvidia anyway.
The game is also rather variable, although the minor loss seems to be mostly consistent for whatever reason.
At least some of the 1080 results actually made it a wash or vacillated with tiny losses or gains, and one answer about this indicated that Pascal actually might have architectural quirks that benefit from some of the changes in the game's behavior with async on despite the device check.
However, the messaging on this has been inconsistent.
If there's a demerit for AOTS as a DX12 benchmark, or as an experimental tool in my opinion, it's this sort of potential non-orthogonality and the inability to really control for the factors the knobs are labeled for. Some of the confused discussion about it also makes me uncertain how the different paths are structured, and whether they are comparable. Having a flag for async that can be overridden by the software is one thing, but that it might be overridden imperfectly makes it seem like the innards are a bit "leaky" for drawing conclusions--particularly if we find out that different vendors/chips behave unexpectedly in different ways (and they have).
 
Sorry to keep harping on, but you're all saying I should stop fretting about Async when looking at the 1070 or 1080? Because if that's the case, it's hard not to get the Nvidia card since they generally have way better past-proofing with DX11 and they won't be behind on future-proofing with DX12.
 
With the limited number of DX12 games out so far, and what we know about those games and who put more effort into dev support, we can't really get a good picture of what Pascal can do with async compute yet.
 