You completely missed the point here, which was fairly obvious. High-bandwidth links between interposers have nothing to do with this; high-bandwidth links on an interposer do. By far the biggest limitation of AFR/SFR is moving data between GPUs, which in the past has largely occurred over PCIe or, to some degree, a proprietary link: bandwidth measured in single-digit GB/s, with significant latency. AFR on a single GPU is pipelining. SFR, I guess, would be some form of partitioning, which would likely still work acceptably with some synchronization.
LOL, that is not where the limitation is. It's not the data between the two GPUs, it's the way the rasterizers of each GPU need to communicate with each other, and that can't be done with multiple GPUs right now! And it's all due to the way the current lighting models are done in games. That has to change for mGPU tech to take off, no way around that!
Developers would likely need to change their programming model for a higher number of cores or GPUs. 2-4 GPUs, as has occurred in the past, is possible. Scalability for Navi will likely be more about submitting the scene as a bunch of largely independent tiles.
Again, where is the problem coming from? Look above.
Why would rasterizers working on independent tasks need to share lots of data? Worst case they could just duplicate the work, culling portions that are irrelevant (SFR). A task they are perfectly suited to doing. Best case rasterization hardware on each GPU is interconnected. No different than doubling current hardware units. The simplest solution is just having each GPU rasterize separate frames (AFR) if pipelined appropriately.
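To make that worst-case SFR idea concrete, here's a minimal sketch (Python, purely my own illustration, not real driver or API code) of each GPU duplicating the triangle list and then culling whatever falls outside its assigned strip of the screen:

```python
# Illustrative SFR-style work split: every "GPU" sees the full triangle
# list, but keeps only triangles whose screen-space bounding box overlaps
# its assigned horizontal strip. A triangle spanning the split line is
# kept by both GPUs (duplicated work, but no cross-GPU communication).
# Triangles are ((x, y), (x, y), (x, y)) in screen coordinates.

def bbox(tri):
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return min(xs), min(ys), max(xs), max(ys)

def cull_for_region(triangles, y_top, y_bottom):
    """Keep triangles whose bounding box overlaps [y_top, y_bottom)."""
    kept = []
    for tri in triangles:
        _, ymin, _, ymax = bbox(tri)
        if ymax >= y_top and ymin < y_bottom:
            kept.append(tri)
    return kept

def split_frame(triangles, height, num_gpus=2):
    """Duplicate the triangle list per GPU, then cull per strip."""
    strip = height // num_gpus
    return [cull_for_region(triangles, i * strip, (i + 1) * strip)
            for i in range(num_gpus)]
```

The point of the sketch: a triangle crossing the split shows up in both GPUs' lists, so the cost of independence is redundant geometry work rather than mid-frame communication.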
Because the rasterizers work on all of a model's polygons if the model crosses over a tile, and it will, many times over. That causes problems if they can't communicate with each other, and across two separate GPUs they just can't do it right now. It has NOTHING to do with bandwidth needs; it has everything to do with control silicon and cache. You don't want to be sending this data to outside memory, let alone to another GPU; it causes too much latency for the pixel shaders to do their work, and thus you end up with no gains over the one card (SFR). That is what is happening, isn't it? Sending that data and introducing that latency is pretty much why SFR gives no tangible benefit over a single GPU.
AFR is different: current engines' lighting needs information from previous frames, that information is not saved, so AFR breaks down. There's no such thing as "pipelined differently"; the pipeline doesn't change what it does, what changes is what the program needs. That has nothing to do with the graphics pipeline; the graphics pipeline doesn't change, nor should it. And this is where I am coming from: if they don't make changes to the graphics pipeline stages, the need for mGPU technologies will not just disappear, which is what you are saying is going to happen. It won't happen, because if they do change the graphics pipeline, there will be a major change in the way current and older game engines work. It's going to be a slow process, gen to gen, not something that happens in two generations of cards.
Seems like it. Or at least your interpretation of what he said publicly.
Yes, you are making things up on AMD's behalf. They haven't stated anything like this, nor has nV for that matter, because they know they're nowhere close to doing the things you're describing.
So SLI and CrossFire have never existed without specific dev involvement? It's not typically done, at least not well, because you end up with data dependencies between frames. With no dependencies, AFR is pretty simple to schedule. The biggest hitch has always been the need to move a resource from one adapter to another to be used by the next frame, a pretty simple task for the synchronization primitives in DX12 and Vulkan. They seem to work really well at keeping compute shaders from executing before their resources are ready. The limitations you speak of seem mostly relegated to DX11, when rendering wouldn't be pipelined. If an app held off presenting the next frame until the current one completed, I guess you could be correct.
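A toy timing model makes the dependency argument concrete (Python, my own illustration; the frame time and copy latency numbers are made up, and this is not any real API): with no inter-frame dependency, two GPUs pipeline frames and completions come twice as fast; with a dependency that forces a cross-adapter copy, each frame waits on the previous one and the second GPU buys you nothing.

```python
# Toy AFR scheduler: frame f runs on GPU f % 2. If "dependent" is set,
# frame f cannot start until the previous frame's output has been copied
# across adapters (copy_latency), modeling a cross-frame resource read.
def afr_schedule(num_frames, frame_time, copy_latency, dependent):
    finish = [0.0, 0.0]   # when each GPU last becomes free
    prev_finish = 0.0     # when the previous frame completed
    completions = []
    for f in range(num_frames):
        start = finish[f % 2]
        if dependent and f > 0:
            # wait for previous frame's resource plus the cross-GPU copy
            start = max(start, prev_finish + copy_latency)
        prev_finish = start + frame_time
        finish[f % 2] = prev_finish
        completions.append(prev_finish)
    return completions
```

With `frame_time=10` and `copy_latency=2`, the independent case completes frames at 10, 10, 20, 20 (two per frame time), while the dependent case completes them at 10, 22, 34, 46: an effective 12 per frame, i.e. worse than a single GPU.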
What is that data? Look above. That fix has not been done yet, and it WON'T be done any time soon.
This is why it's obvious you have no clue what you're talking about. You stated that elements of a GPU can't be directly connected to one another, an obviously false supposition, as it's the basis of all chips. SMs/CUs don't communicate wirelessly, last I checked; they are wired together to send signals. Routing through a chip, an interposer, or actual wires is irrelevant outside of communication speeds.
I never stated any such thing, but there you go again with something that's BS.
Where is the freakin' link that says multiple GPUs will work fine without any type of mGPU technology in the next two years? Made up? You stated they are already for sale. It must be a figment of your imagination!
The problem I feel I've pretty well laid out. I'm just not sure you understand what the past limitations actually were or why things were done the way they were.
In your twisted little mind that goes against any sane understanding of why the technologies evolved the way they have.....
Where did I mention compute, or even graphics, in regard to P100? All I indicated was that an interposer was used for connecting HBM memory, and I'm not sure you understand the point of including it. The interposer exists because running that volume of traces through a PCB is prohibitive, the same reason running lots of traces to connect multiple GPUs would likely require an interposer. While I'm not saying this is what Vega is doing, elements of each GPU could be directly connected to each other as if on the same chip.
No, you didn't, nor did I state you did. I just stated that you are lumping graphics and compute together and making assumptions (the first part of "assumptions" is "ass"). Shit, this is simple stuff: the reason it hasn't been done before had nothing to do with the bandwidth between the chips, it had everything to do with the chips communicating with each other. Actually, if this problem were solved, they wouldn't need the extra bandwidth (or reduced bandwidth bottlenecks) that an interposer would provide to see many tangible benefits. It would make programmers' lives that much easier to not have to worry about mGPU technologies.
is this retardville here, I must be in the wrong forum.