AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Maybe he has a different source, or is pulling all the various info together to make an educated guess.
Lol, you obviously don't know how this WCCF clickbait works... This guy simply roams all the known "tech" forums, message boards, etc. like Anand, B3D, P3D, Chiphell, Fudzilla, random Taiwanese pages, etc. When he is done, he compiles the info, makes changes here 'n' there and presents it as his own on that website of his.

After it's all done he just collects the click-$$$ generated by uninformed ppl. In fact, I'm wondering why linking to that page hasn't been banned yet.
 
One thing that's been bothering me about wccftech is how much they've been re-spinning old stories as new based on nothing.

I used to go there as a place to get the latest rumors on time, but the amount of clickbait headlines that lead to old info is getting ridiculous. Especially with graphics cards.
 
The information regarding the SIMD goes back to an AMD patent from September 2014; in fact, the diagram on WCCF looks very much like how it was presented in the patent. However, can it be assumed this is actually to do with Vega and not Navi? *shrug*
Given the same irregularities in the font and line choices, I'd say it's actually the same diagram as in the patent, with a few of the numeric tag lines taken out before overlaying it on the background image.

The patent leaves open a number of possibilities and implementations, so knowing to what extent if any a future GPU might implement the claims is unclear.
Getting better utilization out of the physical lanes is a decent goal, although deciding how to allocate or move resources is trickier.

The most straightforward kind of underutilization with potentially the least problem with thrashing may be after pixel/quad coverage is determined for a wavefront. The dead lanes would be known at the time of resource allocation in the CU.
Potentially, other patents or rendering methods with better visibility/culling at a tile or bin level could generate more wavefronts where this is detectable.
The branch divergence or backpressure scenarios can be tougher to figure out on the fly, or at least tougher to handle if switching gears has a software/migration cost versus more complicated forwarding.

One item that doesn't slot neatly into the existing wavefront scheme (there are others) is the outlined register file and SIMD unit allocation where there are 14 ALU lanes with dedicated register files.
GCN's current implementation is locked into the 64-item wavefront+16-wide SIMD+4 cycle cadence, and it has the operand and ALU bandwidth to match.
The patent's SIMD units cannot run 64 work items in that cadence, and the scalar ALUs do not have register files drawn that would allow them to work in concert with the SIMD units. The best-case throughput GCN has with fully-utilized wavefronts is out of this diagram's capability.
 
The information regarding the SIMD goes back to an AMD patent from September 2014; in fact, the diagram on WCCF looks very much like how it was presented in the patent. However, can it be assumed this is actually to do with Vega and not Navi? *shrug*
http://pdfaiw.uspto.gov/.aiw?Docid=20160085551&homeurl=http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1%26Sect2=HITOFF%26d=PG01%26p=1%26u=%252Fnetahtml%252FPTO%252Fsrchnum.html%26r=1%26f=G%26l=50%26s1=%252220160085551%2522.PGNR.%26OS=DN/20160085551%26RS=DN/20160085551&PageNum=&Rtype=&SectionNum=&idkey=0637B8F2CF80
Text info on patent:
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&s1="20160085551".PGNR.&OS=DN/20160085551&RS=DN/20160085551

Maybe he has a different source, or is pulling all the various info together to make an educated guess.
Cheers

Yeah, I posted this here a while back. I think wccf guys might have done an article on it a while back too. But it's too big of a speculation to stake his reputation on; there's supposed to be another article as well soon so we will know soon enough.

otoh Koduri would've been shouting about this from the rooftops.
 
I would seriously doubt that AMD would say H1/2017 for Vega if both chips were going to be out before the end of Q1/2017.

I don't think both chips will be out by Q1 2017. It is more likely that they release Vega 10 by early Q1 2017, because AMD lacks firepower in the high-end. NV will likely release their Pascal refresh during Computex, and that would be a good time for AMD to answer in kind with Vega 11.

They might do a slight refresh of Vega 10 to counter a possible improved version of GP102 for consumers.
 
Well there are many business factors here too:

A) We know AMD is cash strapped,
B) The Zen launch is more important to them for the time being, from the standpoint of AMD's survival,
C) R&D expenditure from AMD did not increase until this last quarter, possibly for GPUs. That means Vega is out of the loop here with the increased R&D; the increase would be for Navi.

What does this all correlate to?

Each tape-out for a chip will cost 20 million or so dollars; doing so many so quickly without money coming in? It really sounds like a Hail Mary at this point to expect Vega to match up with Pascal, let alone a refresh of Pascal. That is a tall order to begin with as it is now.

From Fiji they need a 75% performance boost to get to Titan P levels and drop power consumption by 25%.

Effectively you are looking at over a 100% swing in perf/watt!
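A quick sanity check of that arithmetic, taking the 75% and 25% figures above at face value:

```python
# Perf/watt swing implied by +75% performance at -25% power (figures from the post above).
perf_gain = 1.75   # 1.75x the performance of Fiji
power_cut = 0.75   # 0.75x the power of Fiji

perf_per_watt = perf_gain / power_cut
print(perf_per_watt)               # ~2.33x
print((perf_per_watt - 1) * 100)   # ~133%, i.e. "over a 100% swing"
```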

That is huge. They couldn't do it with most of the Polaris lineup in general (a few applications here and there, but not that much on average), but those 6 extra months are going to make that much of a difference? And that improvement was over Hawaii/Grenada, not over Fiji; Polaris's perf/watt gain over Fiji is pretty much flat. That means Vega would need to hit more than 2x perf/watt over Polaris! Guess what, AMD's own perf/watt projections don't get to that. And when you consider that the 2.5x on that chart was not the average for Polaris, man, I don't know how anyone can see it being competitive on the power angle. I can see performance, they might be able to do it, but at the cost of more power consumption than its competitors.

The most Vega 10 and 11 would possibly do is go against CURRENT Pascal at the performance and enthusiast levels. I don't believe the crazy clock rumors for the Pascal refresh, because unless nV did more work on their transistor layouts and design to increase clock speeds while maintaining current power consumption, I don't see the refresh being anything like what VideoCardz has stated. My projection for performance is a small bump of 10% per bracket, with a power envelope that stays the same or goes slightly higher.
 
Good points, Razor. Either way, I still think it is likely that they will release Vega 10, the big GPU, in early 2017, because the high end is bleeding for them. They might not do a refresh of Vega 10 for the reasons you mentioned. Vega 11 would likely be launched at the same time as the Pascal refresh, because otherwise AMD has no real answer. Unless they're doing their own Polaris refresh of course. That is also a possibility.
 
NV could easily slash prices and raise stock clocks closer to 2000, they could also easily stick g5x on a 1070 but I don't know how much good it would do really

If Vega 10 achieves the reported 12tflops it'll probably perform in 1080 (oc) range in most games, in some it'll pull ahead - probably.

They could call it gtx 2000 series but I hope they don't, I have the itch to upgrade but I promised myself I'd wait for a 20tflop card :p
 
NV could easily slash prices and raise stock clocks closer to 2000, they could also easily stick g5x on a 1070 but I don't know how much good it would do really

I'm not sure how "easy" that would be. My 1070 won't come anywhere close to 2000 MHz. If it were that easy, at least some of the AIBs would have done so by now, but none have that I'm aware of.

Regards,
SB
 
If Vega 10 achieves the reported 12tflops it'll probably perform in 1080 (oc) range in most games, in some it'll pull ahead - probably.

If the reports are true, the 12 TFLOPs figure points to a server/workstation GPU, which is traditionally a good notch slower on core clock than the gaming part.
 
NV could easily slash prices and raise stock clocks closer to 2000, they could also easily stick g5x on a 1070 but I don't know how much good it would do really

I don't think they can increase clocks like that. Maybe another 200 to 300 MHz, but that would be getting mighty close to the 1080, which won't have that much room to move up unless they sacrifice boost clocks.
 
If the reports are true, the 12 TFLOPs figure points to a server/workstation GPU, which is traditionally a good notch slower on core clock than the gaming part.

Yeah, normally, but yields may come into this and might mean the fully unlocked GPU gets reserved for the server market. A good example is Nvidia with GP102: the Tesla P40 (3840 CUDA cores) compared to the TitanXP (3584 CUDA cores).
The P40 has lower clocks but actually higher FP32 than any of the other Nvidia cards, at 11.7 TFLOPs, due to being fully unlocked.
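For reference, those FP32 numbers fall straight out of the usual 2 x shaders x clock formula. The boost clocks below are the published figures as I remember them, so treat them as approximate; the P40's base clock is lower, but TFLOP figures are normally quoted at boost.

```python
def fp32_tflops(cuda_cores, boost_clock_ghz):
    # 2 FLOPs per core per clock (an FMA counts as two operations)
    return 2 * cuda_cores * boost_clock_ghz / 1000.0

print(fp32_tflops(3840, 1.53))  # Tesla P40, fully enabled GP102: ~11.8 TFLOPs
print(fp32_tflops(3584, 1.53))  # TitanXP, cut-down GP102: ~11.0 TFLOPs
```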

So similar situation may end up happening for AMD.
Cheers
 
GCN's current implementation is locked into the 64-item wavefront+16-wide SIMD+4 cycle cadence, and it has the operand and ALU bandwidth to match.
I'm leaning towards 16+1 or 16+1+1 with SMT for a SIMD. A scalar unit would work on a 4-item wave at the same cadence as a SIMD if clocked identically. It could jump in for any scalar workloads within a wave somewhat transparently, and run a full wave at a higher cadence if required. It's not quite the variable SIMD sizes (which were still pow2 multiples), but it would build on the same technology and be more flexible/efficient. All the multiples only make sense if expecting a lot of divergence approximating those sizes in the future, at the expense of throughput.

We know AMD is cash strapped
Seems like they've raised plenty of cash lately. A billion or so with that last stock offering and restructuring? Two weeks we'll know what kind of profit, if any, they posted. Hard to imagine they are that strapped for cash.

R&D expenditure from AMD did not increase until this last quarter, possibly for GPUs. That means Vega is out of the loop here with the increased R&D; the increase would be for Navi.
It's still GCN, so not like they needed to start from scratch. While there is some work to be done, many of the proposed new features shouldn't be that expensive to design and implement. Just tweak the existing design a bit. Some of those costs are likely shared with Zen as well.

If Vega 10 achieves the reported 12tflops
That seems a reasonable ballpark if you consider the 380 (4 TFLOPs) -> 480 (5.8 TFLOPs) jump and then run the same math on Fiji (8.6 TFLOPs) with some architectural improvements. Some of the partner boards would have already pushed that gain a bit higher. A dual could be interesting, as the die sizes could effectively exceed known yield curves. Fiji was ~600mm2 and barely fit on the interposer, but two 400mm2 dies could be bonded to yield an effective 800mm2 or larger die, if they could make it fit; a size that isn't effective to fabricate as a single chip.
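Running that same scaling in numbers (the TFLOP figures are the commonly quoted ones; this is a ballpark, not a leak):

```python
# Tonga -> Polaris 10 TFLOP scaling, applied naively to Fiji.
r9_380 = 4.0    # R9 380, ~4 TFLOPs FP32
rx_480 = 5.8    # RX 480, ~5.8 TFLOPs FP32
fury_x = 8.6    # Fiji / Fury X, ~8.6 TFLOPs FP32

scaling = rx_480 / r9_380
print(scaling)            # ~1.45x generational gain
print(fury_x * scaling)   # ~12.5 TFLOPs, roughly the rumored Vega 10 figure
```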
 
12 TFLOPs is not reasonable when you consider that it's supposedly on a 64 CU chip, which means 1.4-1.5 GHz clock speed on a server-grade GPU. It'd be more reasonable if AMD were getting out a chip with bigger specs than Fiji.
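The clock that implies, assuming the usual 64 shaders per CU and 2 FLOPs per shader per clock:

```python
# Clock needed for 12 TFLOPs FP32 on a 64 CU part (64 shaders/CU assumed).
cus, shaders_per_cu, target_tflops = 64, 64, 12.0
required_ghz = target_tflops * 1000 / (cus * shaders_per_cu * 2)
print(required_ghz)   # ~1.46 GHz, in line with the 1.4-1.5 GHz above
```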
 
12 TFLOPs is not reasonable when you consider that it's supposedly on a 64 CU chip, which means 1.4-1.5 GHz clock speed on a server-grade GPU. It'd be more reasonable if AMD were getting out a chip with bigger specs than Fiji.
What's a CU though? Add more SIMDs and that TFLOP number goes up. More useful would be knowing the chip size, as I doubt it becomes radically more dense.
 
I'm still baffled why everyone is now assuming Vega 11 is the smaller chip, something like a Polaris 10 replacement and even for laptops. Sure, the latest rumors thrown around by a couple of not-so-reputable sites suggest that, but for just as long they suggested that Vega 11 would be bigger than Vega 10; IIRC they even threw around designs up to 96 CUs at some point.
 
Seems like they've raised plenty of cash lately. A billion or so with that last stock offering and restructuring? Two weeks we'll know what kind of profit, if any, they posted. Hard to imagine they are that strapped for cash.

A billion dollars is nothing when you have to fund R&D for a high-end CPU and GPU and maintain your current headcount. And no, they don't have a billion dollars; their current cash reserves, after their contract change with GF, are much lower than that, and they also have to maintain 600 million in cash for their debt repayments. So them being cash strapped is actually a very loose term; it's more like close to no cash at all.

It's still GCN, so not like they needed to start from scratch. While there is some work to be done, many of the proposed new features shouldn't be that expensive to design and implement. Just tweak the existing design a bit. Some of those costs are likely shared with Zen as well.

Zen has nothing to do with the GCN R&D expenditure. And yet, we can see the difference when nV puts a good deal of resources into R&D and comes out with Maxwell to Pascal, and we can see what happens when AMD tries to go from Tonga to Polaris. A big difference in outcome, and this is not purely R&D but plans that have been in motion at nV for years beforehand. R&D is only as good as what your plans are for your vision of the product. Polaris was supposed to be the disruptor of the market, as AMD's marketing wanted us to believe before its release, yet it just came out with a similar upgrade path to what we saw in the past. Nothing special. AMD has been relegated to second-rate products again, something they haven't seen since the r600/rv670 lineup. As for the second part of your quote about dual-GPU cards, they did that before too and failed miserably.


That seems a reasonable ballpark if you consider the 380 (4 TFLOPs) -> 480 (5.8 TFLOPs) jump and then run the same math on Fiji (8.6 TFLOPs) with some architectural improvements. Some of the partner boards would have already pushed that gain a bit higher. A dual could be interesting, as the die sizes could effectively exceed known yield curves. Fiji was ~600mm2 and barely fit on the interposer, but two 400mm2 dies could be bonded to yield an effective 800mm2 or larger die, if they could make it fit; a size that isn't effective to fabricate as a single chip.

And why would they want to do that? So they can fail when mGPU doesn't work? Yes, back to this topic again: putting two GPUs on an interposer, you still have the same limitations as two boards. Having 2 GPUs never worked out versus 1 GPU in the past, yet AMD would want to try to go down that road again, when the software doesn't support it? That is by far foolish for them to do.

Hasn't AMD/ATi learned anything from their past? Added to this, when you have your back against the wall, you don't go into high-risk propositions where the likelihood of failure can spell immediate changes to your group (graphics). You go for solid, low-risk paths which will garner sales and money, and then later on expand into the risk areas, just like AMD did in the past after their r600 debacle. We are not talking about a one-generation screw up here, where AMD can just bounce back; we are looking at something that started with the release of the 6x0 series from nV and is still continuing today. That is 4 generations of cards.

Just a blast from the past: after the r600 screw up, it took AMD 2 gens to get back in the race with nV, and that was because nV didn't innovate their architecture for the Tesla products (the 2x0 line) and then messed up with Fermi and its power consumption.

So in essence, after AMD bought out ATi, they have only had success when nV has failed to deliver or didn't innovate. Do you see that with Pascal? I didn't see that; I saw them push something they already had even further. With all the talk about Polaris's perf/watt increases, Pascal took its thunder, because it showed AMD they are still nowhere near where they have to be to be competitive in that category.
 
I'm still baffled why everyone is now assuming Vega 11 is the smaller chip, something like a Polaris 10 replacement and even for laptops. Sure, the latest rumors thrown around by a couple of not-so-reputable sites suggest that, but for just as long they suggested that Vega 11 would be bigger than Vega 10; IIRC they even threw around designs up to 96 CUs at some point.
It could be both if Vega 11 is 48 CUs and the top end card a dual Vega 11 (96 CUs). I'm not sure people are assuming as much as using the prior naming conventions to designate the chips. Even with the names flipped they are conceptually the same.

Zen has nothing to do with the GCN R&D expenditure. And yet, we can see the difference when nV puts a good deal of resources into R&D and comes out with Maxwell to Pascal, and we can see what happens when AMD tries to go from Tonga to Polaris. A big difference in outcome, and this is not purely R&D but plans that have been in motion at nV for years beforehand.
They create cards that are comparable to each other? 480 and 1060 are neck and neck for the most part. Good chance that balance changes as DX12 and Vulkan get designed around.

you still have the same limitations as two boards
No you don't. If that were the case, why does even Nvidia use an interposer for P100? Why does Fiji? The reasoning why it would work with an interposer as opposed to separate cards is pretty obvious. It's also the foundation of everything AMD seems to be working on in the CPU and GPU markets. We now have synchronization primitives with fences designed into the API, tiled resources that could map out memory in different pools, and an ample bus width to connect them so why not? Microsoft is wasting a lot of cash on multi-adapter development on DX12 if your argument here is true. Or is your entire argument here based on Nvidia didn't do it and are so far ahead it must be pointless? By the same argument any GPU can't have multiple CU/SMs inside, yet all of them do.
 
They create cards that are comparable to each other? 480 and 1060 are neck and neck for the most part. Good chance that balance changes as DX12 and Vulkan get designed around.

LOL, close with what, a 35% power difference? What do you think that comes from? Thin air? And just like the Maxwell 2 midrange, the 1060 isn't the best perf/watt card in the Pascal lineup...

It's not all about performance; it's about everything the card can do, and Polaris's perf/watt is at Maxwell 2 level, which is just 2 years too late.

And if we throw Fiji in here too: if it didn't have the water cooler to keep its temps at 50C, it would have used well over 300 watts of power, and let's not forget the use of HBM, which gave it some power advantages too. Figure roughly 1 extra watt of power per 1 degree of chip temperature increase (thereabouts) if leakage is in check; if leakage isn't in check, which btw it probably was not (that is why it couldn't overclock), it would be more than a 1-to-1 increase.

Don't try to fool me into thinking AMD has caught up with Polaris; they are still a gen behind. And Pascal actually increased their perf/watt by more than Maxwell/2 did.

No you don't. If that were the case, why does even Nvidia use an interposer for P100? Why does Fiji? The reasoning why it would work with an interposer as opposed to separate cards is pretty obvious. It's also the foundation of everything AMD seems to be working on in the CPU and GPU markets. We now have synchronization primitives with fences designed into the API, tiled resources that could map out memory in different pools, and an ample bus width to connect them so why not? Microsoft is wasting a lot of cash on multi-adapter development on DX12 if your argument here is true. Or is your entire argument here based on Nvidia didn't do it and are so far ahead it must be pointless? By the same argument any GPU can't have multiple CU/SMs inside, yet all of them do.


Wait a minute, really? I think you need to read up on what the limitations are for gp100 and why it needs the high bandwidth connect between the interposers. It only solves 50% of the problem for games, the data transport; that doesn't stop the need for methods like AFR or SFR. Compute needs are totally different than gaming needs, and for gaming all the same problems exist. If it was that easy, AMD and nV would have done it a long time ago. Raja wouldn't have stated they need programmers to change the way they are working with engines prior to Navi for scalability. The problem is that for gaming or any 3D/2D work, you need the rasterizers to be able to share the data, and even the high bandwidth interconnects don't take care of that.

BTW, still waiting for the link where they solved the multi-GPU problems? There is no such link, otherwise it would have been talked about here. I guess the lead of the RTG just doesn't know what he is talking about. Maybe something like his dual rx480 comparison to the gtx 1080? Or are you saying you know more than he does about their own tech and roadmaps?

Microsoft isn't wasting a ton of money on mGPU; actually, they haven't really been pushing mGPU to developers AT ALL! It's up to developers to implement. MS doesn't need to push it: if a developer is implementing SLI or Crossfire anyway, it would be better to do mGPU, because it will work on all IHVs and gives them much more control. But the problem isn't there; the problem is developers aren't creating engines that are mGPU friendly, at least not yet.

If consoles are the driving force for games, the need for mGPU just won't be there. And yeah, consoles are the driving force for game development for now. AMD's own strategy has hamstrung them in the PC gaming world if, as you say, they are going to go to multiple GPUs on an interposer. Outside of a niche product for top benchmark scores, I see no need for multiple GPUs on the same interposer, at least for gaming cards; HPC is a different story, and that isn't about graphics.

If you know the pipeline stages and why it can't be done without programmers involved, then you will see why it hasn't been done; it has nothing to do with who has done what so far, it is due to hard limitations of the pipeline stages vs. programming. And since you went down the road of claiming that having multiple CUs and SMs is the same thing as having multiple GPUs, yeah, I can full well say you have no idea what you are talking about. You keep circle jerking around the actual problem, not understanding why the problem even exists in current games today and has since the advent of multi-GPU technologies. mGPU is pretty much SFR and AFR tech, nothing really different; it's just that MS has now included them in their API, so IHVs don't need to worry about driver support for individual titles and there is a common code base.

We are not talking about compute here (or as you like to say, scientific applications), which is why your bringing up gp100 and its needs is just totally dismissive of the topic at hand, which is multiple GPUs for games; it doesn't matter if they are on the same interposer or whatever fricken configuration, the same mGPU/AFR/SFR problems exist. Until the programming side is solved for more than 50% of the major game developers, multiple GPUs on an interposer, or on the same card, or two cards in a system will never be worthwhile in any type of system below an enthusiast-grade system for benchmarking. It's not the type of card/system you make a halo product out of; it has never worked well in the past and never will, just a waste of resources and money, which AMD is short of.
 
Wait a minute, really? I think you need to read up on what the limitations are for gp100 and why it needs the high bandwidth connect between the interposers. It only solves 50% of the problem for games, the data transport; that doesn't stop the need for methods like AFR or SFR. Compute needs are totally different than gaming needs, and for gaming all the same problems exist.
You completely missed the point here, which was fairly obvious. High bandwidth links between interposers have nothing to do with this. High bandwidth links on an interposer do. By far the biggest limitation of AFR/SFR is moving data between GPUs, which in the past has largely occurred over PCIe or a proprietary link to some degree: bandwidth measured in single GB/s with significant latency. AFR on a single GPU is pipelining. SFR I guess would be some form of partitioning, which would likely still work acceptably with some synchronization.
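To put some rough numbers on that difference (ballpark figures only: ~16 GB/s for a PCIe 3.0 x16 link, a few hundred GB/s for an interposer-class link of the sort HBM already uses, and a plain 4K RGBA8 colour buffer as the payload):

```python
# Rough time to move one 4K RGBA8 framebuffer between GPUs over different links.
frame_bytes = 3840 * 2160 * 4   # ~33 MB

def transfer_ms(num_bytes, link_gb_per_s):
    return num_bytes / (link_gb_per_s * 1e9) * 1000

print(transfer_ms(frame_bytes, 16))    # PCIe 3.0 x16: ~2.1 ms, a big slice of a 16 ms frame
print(transfer_ms(frame_bytes, 256))   # interposer-class link: ~0.13 ms, close to negligible
```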

Raja wouldn't have stated they need programmers to change the way they are working with engines prior to Navi for scalability.
Developers would likely need to change their programming model with a higher quantity of cores or GPUs. 2-4 GPUs, as has occurred in the past, is possible. Scalability for Navi will likely be more about submitting the scene as a bunch of largely independent tiles.

The problem is that for gaming or any 3D/2D work, you need the rasterizers to be able to share the data, and even the high bandwidth interconnects don't take care of that.
Why would rasterizers working on independent tasks need to share lots of data? Worst case they could just duplicate the work, culling portions that are irrelevant (SFR). A task they are perfectly suited to doing. Best case rasterization hardware on each GPU is interconnected. No different than doubling current hardware units. The simplest solution is just having each GPU rasterize separate frames (AFR) if pipelined appropriately.
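A minimal sketch of the "duplicate the work, cull what's irrelevant" idea for SFR; the geometry and split are made up, just to show each GPU keeping only the triangles that touch its slice of the screen:

```python
# Naive SFR: split the screen into vertical slices, one per GPU, and have each
# GPU drop triangles whose screen-space bounds miss its slice entirely.
SCREEN_W = 1920

def assign_slices(triangles, num_gpus=2):
    """triangles: list of (min_x, max_x) screen-space bounds; returns one list per GPU."""
    slice_w = SCREEN_W / num_gpus
    buckets = [[] for _ in range(num_gpus)]
    for min_x, max_x in triangles:
        for gpu in range(num_gpus):
            lo, hi = gpu * slice_w, (gpu + 1) * slice_w
            if max_x >= lo and min_x < hi:   # overlaps this GPU's slice of the screen
                buckets[gpu].append((min_x, max_x))
    return buckets

print(assign_slices([(0, 300), (800, 1100), (1500, 1900)]))
# [[(0, 300), (800, 1100)], [(800, 1100), (1500, 1900)]]
# The straddling triangle gets rasterized by both GPUs; everything else is done once.
```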

Or are you saying you know more than he does about their own tech and roadmaps?
Seems like it. Or at least your interpretation of what he said publicly.

If you know the pipeline stages and why it can't be done without programmers involved, then you will see why it hasn't been done; it has nothing to do with who has done what so far, it is due to hard limitations of the pipeline stages vs. programming.
So SLI and Crossfire have never existed without specific dev involvement? It's not typically done, at least not well, because you end up with data dependencies between frames. With no dependencies, AFR is pretty simple to schedule. The biggest hitch has always been the need to move a resource from one adapter to another to be used by the next frame, a pretty simple task for the synchronization primitives in DX12 and Vulkan. They seem to work really well keeping compute shaders from executing before their resources are ready. These limitations you speak of seem mostly relegated to DX11, where rendering wouldn't be pipelined. If an app held off presenting the next frame until the current one completed, I guess you could be correct.
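A toy model of why those inter-frame dependencies are the thing that hurts AFR; the 16 ms frame time and 2 ms copy cost are invented numbers, just to show the shape of the problem:

```python
# Toy AFR schedule: 2 GPUs alternate frames. If frame N+1 consumes a resource
# written by frame N, it has to wait for that frame plus a cross-GPU copy;
# otherwise the two GPUs simply overlap their work.
FRAME_MS, COPY_MS, GPUS = 16.0, 2.0, 2

def total_time_ms(frames, dependent):
    gpu_free = [0.0] * GPUS
    prev_done = 0.0
    for f in range(frames):
        gpu = f % GPUS
        start = gpu_free[gpu]
        if dependent and f > 0:
            start = max(start, prev_done + COPY_MS)  # wait on last frame's resource
        prev_done = start + FRAME_MS
        gpu_free[gpu] = prev_done
    return prev_done

print(total_time_ms(8, dependent=False))  # 64 ms  -> the hoped-for ~2x over one GPU
print(total_time_ms(8, dependent=True))   # 142 ms -> worse than a single GPU (128 ms)
```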

And since you went down the road of claiming that having multiple CUs and SMs is the same thing as having multiple GPUs, yeah, I can full well say you have no idea what you are talking about.
This is why it's obvious you have no clue what you're talking about. You stated that elements of a GPU can't be directly connected to one another. An obviously false supposition as it's the basis of all chips. SM/CUs don't communicate wirelessly last I checked. They are wired together to send signals. Routing through a chip, interposer, or actual wires is irrelevant outside of communication speeds.

You keep circle jerking around the actual problem, not understanding why the problem even exists in current games today and has since the advent of multi-GPU technologies. mGPU is pretty much SFR and AFR tech, nothing really different; it's just that MS has now included them in their API, so IHVs don't need to worry about driver support for individual titles and there is a common code base.
The problem I feel I've pretty well laid out. I'm just not sure you understand what the past limitations actually were or why things were done the way they were.

We are not talking about compute here (or as you like to say, scientific applications), which is why your bringing up gp100 and its needs is just totally dismissive of the topic at hand, which is multiple GPUs for games; it doesn't matter if they are on the same interposer or whatever fricken configuration, the same mGPU/AFR/SFR problems exist.
Where did I mention compute, or even graphics, in regard to P100? All I indicated was that an interposer was used for connecting HBM memory, and I'm not sure you understood the point of including it. The interposer exists because running that volume of traces through a PCB is prohibitive. For the same reason, running lots of traces to connect multiple GPUs would likely require an interposer. While I'm not saying this is what Vega is doing, elements of each GPU could be directly connected to each other as if on the same chip.
 