AMD: Speculation, Rumors, and Discussion (Archive)

In 2 years the landscape won't change too much... from a tech point of view we will be getting new cards around that time, but the gen that just launched will be what's still out there.
With the current integration trend it will, if Nvidia doesn't get their CPUs up to par with AMD and Intel for designing MCMs. Low-end, and soon likely even mid-range, parts will all be SFF, and that's the vast majority of the market.

You forgot to mention that Async is here to fix some architectural inefficiencies in the first place; those inefficiencies are highly architecture and balance dependent.
Is that really the case though? If developers would just get their act together and always design identical workloads we wouldn't have that problem. So is async really for covering inefficiencies or a method of making hardware more adaptable?

I don't see how the first variant is worse unless you are silicon or bandwidth bound, since it's universal and works out of the box everywhere, and I don't see how the second variant is better, since it can't be done automatically and has tons of restrictions.
The first is worse because it's far more difficult to add transistors to silicon at runtime. There is simply no way to reasonably predict the balance of all the workloads that will be encountered in the real world. Async should provide a handful of techniques to work around inefficiencies and simplify development. It really is pushing GPUs more towards multi-tasking than the single-core philosophy they currently use: adapting to the workload encountered rather than designing for every possible situation beforehand.
 
390X at 160 bucks on eBay incoming? :runaway:

If the RX is as fast as or faster than a 390X, then I doubt it will sell even at 160. $40 in savings, but almost twice the power usage, much more noise and so forth.

I'd buy an RX 480, put an AIO water cooler on it and enjoy whisper-silent gaming
 
You forgot to mention that Async is here to fix some architectural inefficiencies in the first place; those inefficiencies are highly architecture and balance dependent. As always there are several solutions to the same problem: you can decrease low-utilization pass time by simply adding more FFP hardware, or you can rely upon developers to insert some compute work via Async for you. I don't see how the first variant is worse unless you are silicon or bandwidth bound, since it's universal and works out of the box everywhere, and I don't see how the second variant is better, since it can't be done automatically and has tons of restrictions.
Actually, GCN also allows a lightweight high-priority task to swoop into an otherwise occupied CU without the need to preempt the running wavefronts and save the state of the complete CU (as appears to be the case with Pascal). That works immediately if the required resources are still free (we are talking about lightweight things to stuff onto the CUs), or at the latest when one of the currently running wavefronts terminates (which usually happens at a relatively high frequency). Alternatively, one could think of preempting just a single wavefront (or any required number) to get a new high-priority one on there. Generally, it shouldn't be necessary to preempt the whole CU and save the complete state (which is potentially ~350kB per CU).
I imagine there may be some scenarios where this is a nice feature, as it potentially reduces the latency of such small lightweight tasks, especially if you have a lot of them and it is hard to bundle them efficiently. NVidia mentioned about 100µs of latency for their preemption feature (but it is unclear what it includes); that's a pretty long time. If you could get around a preemption in such cases, it would definitely be a win, irrespective of what performance you could get with fewer threads.
Or in other words: The developers may come up with other rendering schemes or algorithms otherwise not performant enough. It adds flexibility.
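
To put the developer-facing side of this into concrete terms, here is a minimal sketch (my own illustration using standard D3D12 calls, not anything from this post beyond the idea): the application exposes latency-critical work on a separate compute queue marked high priority, and whether the GPU slots it into free CU slots as described above or falls back to preemption is left to the driver and hardware.

Code:
// Minimal D3D12 sketch: a normal graphics queue plus a separate
// high-priority compute queue for small, latency-critical dispatches.
// Command-list recording and the rest of the frame are omitted.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    // Regular graphics queue for the frame's rendering work.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> gfxQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    // Separate compute queue marked high priority; how the GPU squeezes this
    // work in (free CU slots vs. preemption) is the hardware's business.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type     = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    computeDesc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_HIGH;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Work recorded elsewhere would be submitted to each queue with
    // ExecuteCommandLists(); the two streams can overlap on hardware that
    // runs graphics and compute concurrently.
    return 0;
}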
 
From what I've seen so far, Async justifies its existence because when the developers make proper use of it, the GPU offers a substantially larger potential for delivering a better end result than if it wasn't there, wrt performance per mm^2 or performance per transistor.

IMO, nvidia went around this by pumping up the geometry performance in their GPUs and using gameworks/twimtbp to push developers toward geometry instead of compute everywhere possible, so much that they made sure that compute is never a bottleneck in any (GW-infected) game and Fiji is barely any faster than Hawaii.

In the end, this doesn't mean that nvidia is "more efficient" at compute workloads. They just moved the bottleneck to something else in order to make their GPUs perform better.


My greatest criticism of AMD in this is that they should have focused more on boosting geometry performance during the GCN 1-3 transitions instead of non-revolutionary changes and hoping that developers like Epic would suddenly adopt their way.
 
From what I've seen so far, Async justifies its existence because when the developers make proper use of it, the GPU offers a substantially larger potential for delivering a better end result than if it wasn't there, wrt performance per mm^2 or performance per transistor.

IMO, nvidia went around this by pumping up the geometry performance in their GPUs and using gameworks/twimtbp to push developers toward geometry instead of compute everywhere possible, so much that they made sure that compute is never a bottleneck in any (GW-infected) game and Fiji is barely any faster than Hawaii.

In the end, this doesn't mean that nvidia is "more efficient" at compute workloads. They just moved the bottleneck to something else in order to make their GPUs perform better.


My greatest criticism of AMD in this is that they should have focused more on boosting geometry performance during the GCN 1-3 transitions instead of non-revolutionary changes and hoping that developers like Epic would suddenly adopt their way.


Well, with another round of consoles under their belt using GCN again, developers like Epic will have no choice but to start adopting their way, or the engines will lose favor compared to engines that take advantage of it. No one is going to want to lose out on making the best-looking console games for all the consoles out there.
 
It doesn't mean that the feature wasn't requested by Sony. Custom console APUs are not something you can do in a year, as Kaotik suggests; AMD had those contracts for years before mass production started in 2013. We don't know anything about Async in Cayman or whether it was good enough for actual use; all we can say for sure is that it would have been much worse to have it broken in GCN rather than in Cayman. Prototyping in hardware is always a good thing, so they had this feature in Cayman for a reason too. It does not mean that Cayman was somehow future-proofing and then all of a sudden got swapped for a completely different architecture.

So you're suggesting that Sony requested features to be included in GCN 4-5 years before the PS4 launched? GCN was available to the public in 2011, which means development on it likely started sometime in 2008-2009. The PS4 came out in 2013. In other words, they requested it before any planning on the PS4 had even started?

I find your assertion highly unlikely.

Regards,
SB
 
GCN also allows a lightweight high-priority task to swoop into an otherwise occupied CU without the need to preempt the running wavefronts and save the state of the complete CU
I would call this fine-grain async, as opposed to coarse-grain async, which is mainly useful for low-occupancy passes like a shadow pass. I wonder whether fine-grain async could be beneficial for Maxwell and Pascal, which are already much faster with the same number of FLOPs.
I also wonder what happens with vertex attributes in GCN. Those are stored in shared memory, since attribute interpolation was moved to the ALUs in Cypress; wouldn't vertex attributes have to be spilled into memory before the CU could proceed with a compute shader? If they have to be spilled first, how long does it take?
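
For the coarse-grain case, the developer-facing pattern is straightforward; here is a rough D3D12-style sketch (illustrative only, and the queue, command list and fence names are placeholders assumed to be created elsewhere): the shadow pass goes to the graphics queue, an independent dispatch goes to an async compute queue, and a fence orders the point where the graphics queue consumes the compute results.

Code:
// Sketch of coarse-grain async: overlap a low-occupancy shadow pass with an
// independent compute dispatch, then fence before the results are consumed.
// The queues, command lists and fence are assumed to be created elsewhere.
#include <windows.h>
#include <d3d12.h>

void SubmitOverlappedWork(ID3D12CommandQueue* gfxQueue,
                          ID3D12CommandQueue* computeQueue,
                          ID3D12GraphicsCommandList* shadowPassList,
                          ID3D12GraphicsCommandList* computeList,
                          ID3D12Fence* fence,
                          UINT64& fenceValue)
{
    // Shadow rendering is mostly geometry/depth work and leaves ALUs idle,
    // which is what makes it a natural partner for async compute.
    ID3D12CommandList* gfx[] = { shadowPassList };
    gfxQueue->ExecuteCommandLists(1, gfx);

    // The compute work has no dependency on the shadow pass, so it can run
    // concurrently on hardware that supports async compute.
    ID3D12CommandList* comp[] = { computeList };
    computeQueue->ExecuteCommandLists(1, comp);

    // Later graphics work that reads the compute output waits on the fence.
    computeQueue->Signal(fence, ++fenceValue);
    gfxQueue->Wait(fence, fenceValue);
}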
 
So you want to say consoles are holding back PC game development? You can always look at it that way too. The horsepower on the PC side of things is much higher than on consoles, yet we haven't seen or used that untapped potential much.

I understood it as "AMD is generally future-proofed and it hurts them because I can keep my cards longer. Nvidia's HW gets behind".

Consoles punch above their weight because they are overall well built (but quirky) and it's a fixed target that makes for great optimization potential.
 
With the current integration trend it will, if Nvidia doesn't get their CPUs up to par with AMD and Intel for designing MCMs. Low-end, and soon likely even mid-range, parts will all be SFF, and that's the vast majority of the market.

This is the same crap AMD has been spewing since they bought out ATi and started the whole iGPU push, and I find this claim unfounded.

Is that really the case though? If developers would just get their act together and always design identical workloads we wouldn't have that problem. So is async really for covering inefficiencies or a method of making hardware more adaptable?

Depends on how you look at making hardware more adaptable.
 
I understood it as "AMD is generally future-proofed and it hurts them because I can keep my cards longer. Nvidia's HW gets behind".

Consoles punch above their weight because they are overall well built (but quirky) and it's a fixed target that makes for great optimization potential.


They don't punch above their weight beyond what we can do with PC graphics. Hell no, you sound like this guy I was talking to about 2 years ago at a bar; the topic came to gaming somehow, and he thought the PS3 had better graphics than computer games, citing the sweat on Kobe Bryant's forehead. At that point I started ignoring him and went back to why I sat at that specific part of the bar: to talk to the girl to the left.

The landscape of creating games changed. PC was the pinnacle in pushing graphics and games forward, but as the Xbox and later the PS3 came about, and with the increase in piracy, PC games became a shell of themselves when it came to potential profits, so developers and publishers switched to console games as their primary focus. This is what you are seeing. In the short term, whichever IHV has the console contracts does have some advantage, but usually the other IHV adapts within a generation and negates that advantage.

Everything comes down to the almighty $
 
Well, with another round of consoles under their belt using GCN again, developers like Epic will have no choice but to start adopting their way, or the engines will lose favor compared to engines that take advantage of it. No one is going to want to lose out on making the best-looking console games for all the consoles out there.

The problem is that this generation proved that using GCN on the consoles doesn't mean there won't be any gameworks-infection that screws up performance on the PC side. ROTR, Arkham Knight, Just Cause 3 and most Ubisoft games had really bad performance on AMD hardware, whereas their console counterparts worked just fine.
 
I would call this fine-grain async, as opposed to coarse-grain async, which is mainly useful for low-occupancy passes like a shadow pass. I wonder whether fine-grain async could be beneficial for Maxwell and Pascal,
As I said, I'm pretty sure one can come up with scenarios where it would be very beneficial, irrespective of the other properties of the architecture, i.e. also for Pascal.
I also wonder what happens with vertex attributes in GCN. Those are stored in shared memory, since attribute interpolation was moved to the ALUs in Cypress; wouldn't vertex attributes have to be spilled into memory before the CU could proceed with a compute shader? If they have to be spilled first, how long does it take?
They just stay there. It's part of the context of the wavefront, and if you don't preempt, you don't have to save the context to memory; there is simply no need for that. Rarely will any shader (especially graphics ones) use the full amount of available shared memory (it is actually impossible to use more than half of it for a single wavefront). In the worst case (the wavefronts of the high-priority task don't fit on any CU), one has to wait until the appropriate number of currently running wavefronts finish so the freed space can be used. As I said, this usually happens at a pretty fast rate (usually much less than the 100µs latency nV mentioned for preemption).
Just to throw out some numbers: imagine you have a heavy post-processing shader completely occupying the GPU. Assume just 1080p (which equals 32,400 wavefronts to run when invoking one instance per pixel) and that it runs for a full 16ms (that's an awful lot of time just for PP at this low resolution, but we are making up a worst case here, right?). That tells you that, on average, a wavefront has to complete every 0.5µs (it is a bit more tricky in practice, as it will fluctuate wildly and not be a steady flow of completed wavefronts, but you get the idea). That means that, on average, every 0.5µs you have an opportunity to shove in a wavefront from another shader on some CU without any additional switching penalty.
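
The arithmetic behind those numbers, in case anyone wants to plug in their own resolution or pass time (a standalone C++ snippet; the 64-lane GCN wavefront size is the only figure added to what the post already states):

Code:
// 1920x1080 pixels, one invocation per pixel, 64 lanes per wavefront,
// and a 16 ms post-processing pass occupying the whole GPU.
#include <cstdio>

int main()
{
    const double pixels     = 1920.0 * 1080.0;      // 2,073,600 invocations
    const double waveSize   = 64.0;                 // lanes per GCN wavefront
    const double passTimeUs = 16000.0;              // 16 ms in microseconds

    const double wavefronts = pixels / waveSize;          // 32,400
    const double intervalUs = passTimeUs / wavefronts;    // ~0.49 us

    std::printf("wavefronts: %.0f, avg completion interval: %.2f us\n",
                wavefronts, intervalUs);
    return 0;
}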
 
This is the same crap AMD has been spewing since they bought out ATi and started the whole iGPU push, and I find this claim unfounded.
So you'd argue Intel is making a legitimate effort to take over the discrete GPU business because they started making their own graphics? Why even carry a graphics division if it's not necessary for their CPUs? All the players seem to see the writing on the wall. Most people I've seen looking for a replacement desktop are trending towards SFF and mobile/tablet. Gamers and the professional market being the exception there because the performance isn't there yet. Stackable memory finally makes a lot of that possible if you consider the board real estate consumed by memory or expansion cards to hold even more memory for their devices.
 
The problem is that this generation proved that using GCN on the consoles doesn't mean there won't be any gameworks-infection that screws up performance on the PC side. ROTR, Arkham Knight, Just Cause 3 and most Ubisoft games had really bad performance on AMD hardware, whereas their console counterparts worked just fine.
Sure, but those choices may have been made before the console design wins, or early on when the console user base didn't matter. Now there are 60M+ GCN parts out there in consoles, and it seems like Neo, NX and Scorpio will carry that further ahead. So Nvidia will have to show up with more money than before to get the same GameWorks wins.

If AMD can make convincing PC parts (the RX line seems like a good start), they could make it even harder for Nvidia to buy support moving forward if they crack Nvidia's add-in-board dominance.
 
So you'd argue Intel is making a legitimate effort to take over the discrete GPU business because they started making their own graphics? Why even carry a graphics division if it's not necessary for their CPUs? All the players seem to see the writing on the wall. Most people I've seen looking for a replacement desktop are trending towards SFF and mobile/tablet. Gamers and the professional market being the exception there because the performance isn't there yet. Stackable memory finally makes a lot of that possible if you consider the board real estate consumed by memory or expansion cards to hold even more memory for their devices.

There are too many other variables in play for the iGPU market to take over the dGPU market. Have we seen a shift in market share for iGPU vs. dGPU since the integration of CPU and GPU? It hasn't really changed much. Not only didn't it change, the higher GPU segments have actually increased, lol. This is the total opposite of what you are saying.
 
Awwww yiissss those 3dmark11 results we were all hoping to see:

http://www.3dmark.com/3dm11/11263084


Hawaii reference:
http://www.guru3d.com/articles-pages/msi-radeon-r9-390x-gaming-8g-oc-review,22.html
 
There are too many other variables in play for the iGPU market to take over the dGPU market. Have we seen a shift in market share for iGPU vs. dGPU since the integration of CPU and GPU? It hasn't really changed much. Not only didn't it change, the higher GPU segments have actually increased, lol. This is the total opposite of what you are saying.

we've def seen the low end ($50-$100) boards shrink or disappear.
 
we've def seen the low end ($50-$100) boards shrink or disappear.

Overall, market-share-wise dGPUs haven't changed much, and those sales have gone to the mainstream and performance segments.

Have there been changes within the segments? Yes, but not overall. Of the graphics card companies, at least one has changed its focus and has succeeded thus far.
 
Could the VLIW architecture already do it in CTM, or just the Xbox 360 part?
Cayman was the first part to implement async compute.
http://www.rage3d.com/reviews/video/amd_hd6970_hd6950_launch_review/index.php?p=7

It doesn't mean that the feature wasn't requested by Sony. Custom console APUs are not something you can do in a year, as Kaotik suggests; AMD had those contracts for years before mass production started in 2013. We don't know anything about Async in Cayman or whether it was good enough for actual use; all we can say for sure is that it would have been much worse to have it broken in GCN rather than in Cayman. Prototyping in hardware is always a good thing, so they had this feature in Cayman for a reason too. It does not mean that Cayman was somehow future-proofing and then all of a sudden got swapped for a completely different architecture.


In my book, future-proof things are those which work out of the box and don't require years of "future-proofing". If developers had used tessellation in reasonable amounts for Kepler or Maxwell, those would have been "future-proof" in comparison with GCN right from the start - http://www.ixbt.com/video4/images/gp104/ts5_island.png :LOL:
Async compute in Cayman would have worked similarly to GCN had it ever been exposed by software. If this feature had been added for a console, it wouldn't have been put in hardware two generations early. AMD's failing here was not having a software plan during Cayman's lifetime. Unifying hardware and software under Raja will ideally make this situation less likely to happen. I don't think it's fruitful to argue about future-proofing, so I'll let others debate that.

My greatest criticism of AMD in this is that they should have focused more on boosting geometry performance during the GCN 1-3 transitions instead of non-revolutionary changes and hoping that developers like Epic would suddenly adopt their way.
Nvidia certainly has had great geometry performance since Fermi and they keep improving, but falling behind in clock speed and voltage was a bigger problem for AMD than geometry performance.

I would call this fine-grain async, as opposed to coarse-grain async, which is mainly useful for low-occupancy passes like a shadow pass. I wonder whether fine-grain async could be beneficial for Maxwell and Pascal, which are already much faster with the same number of FLOPs.
I also wonder what happens with vertex attributes in GCN. Those are stored in shared memory, since attribute interpolation was moved to the ALUs in Cypress; wouldn't vertex attributes have to be spilled into memory before the CU could proceed with a compute shader? If they have to be spilled first, how long does it take?
The fine-grained switching is just how AMD's architecture has always worked, and as a consequence async compute fits well with the architecture. It wasn't implemented to make up for a fundamental inefficiency, but it may provide more benefit with a GCN-type architecture than with Nvidia's. It's hard to tell without really understanding how Nvidia's architectures like Pascal switch between work. Pascal seems to have improved over Kepler, but likely still works differently than AMD's GCN.

As Gipsel said, vertex attributes are loaded into shared memory prior to PS launch and they stay there until the PS no longer needs them. If a compute shader needs the shared memory space it can execute on that CU at that time. This is why AMD doesn't mention supporting graphics context switching. The state isn't saved off to memory.
 
A scalar unit (for wave-invariant storage & math) would bring nice gains for common CUDA code as well. There's a 3-year-old paper about it:
http://hwacha.org/papers/scalarization-cgo2013-talk.pdf

A scalar unit would also save register space and power. Automatic compiler analysis (as presented in the paper) is nice, but I don't trust compiler magic. I would prefer to have language keywords for wave-invariant variables, something like invariant(N), where N is a power-of-two number describing the granularity. Of course a better language would help. HLSL hasn't changed much since the days of SM 2.0 pixel and vertex shaders (designed for 1:1 inputs and outputs and no cross-lane cooperation).
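
As a toy illustration of what scalarization buys (a plain C++ mock-up of a 64-lane wave, not real shader code; the names and numbers are made up): a value that depends only on per-wave state would otherwise occupy a register in every lane, while a scalar unit computes and stores it once per wave.

Code:
// Toy CPU-side mock-up of wave-invariant scalarization. A value derived only
// from per-wave state (here waveId) is identical in all 64 lanes; without
// scalarization it occupies a vector register (64 copies), with a scalar
// unit it occupies a single scalar register.
#include <array>
#include <cstdio>

constexpr int kWaveSize = 64;

int main()
{
    const int waveId = 7;                       // identical for every lane

    // Non-scalarized: 64 per-lane copies of the same value.
    std::array<int, kWaveSize> perLaneCopy;
    for (int lane = 0; lane < kWaveSize; ++lane)
        perLaneCopy[lane] = waveId * 16;        // e.g. a made-up tile base index

    // Scalarized: computed and stored once for the whole wave.
    const int scalarCopy = waveId * 16;

    std::printf("vector storage: %zu bytes, scalar storage: %zu bytes\n",
                sizeof(perLaneCopy), sizeof(scalarCopy));
    return 0;
}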

Honestly, I absolutely don't understand why Nvidia hasn't already done it. I was pretty sure it would already happen with Pascal, but since it's still not there, I'll now bet on Volta. It's a logical evolution, it was extremely logical when AMD introduced it, and it will be a logical fit for the Nvidia architecture.
 