dGPU vs APU spin-off

  • Thread starter Deleted member 13524
  • Start date
Status
Not open for further replies.
Anytime soon = never.

Cheers

So you think that by e.g. 2040, which is as much time passed between now and 1990, processors and graphics processors will be separate entities?
 
Is that a counter-argument? If IHVs are using 4x the dies, the cost of silicon has gone down, by a lot.

And they won't be stacking GPU dies anytime soon, the power density would be ridiculous.

Cheers
Not necessarily if they get solid performance out of having many dice. If halving the power, stacking two may be practical, but it seems more likely we'll have memory stacked on logic. Low power on "reasonable" power dice for power density issues with the shortest connections possible. Then grid many of those on an interposer. Effectively fabricating larger dice than would be reasonable today. The stacking would only be practical if you started targeting low power nodes where the thermal issues were less prevalent.
 
There is absolutely no reason a discrete could could have a wider bus than an APU given HBM and an interposer.
There is, more power and pushing more bandwidth.
Couple months from now? Maybe a working APU with HBM demoed inside of a month?
Wait, what? you mean an APU with more memory bandwidth than dGPUs will appear a couple of months from now?
Maybe the different CPU+GPU combination part, but CPUs are becoming increasingly irrelevant to gaming performance as the GPU takes over more acceleration.
Not true, games are increasingly sensitive to multi core count and clock speeds, (recent BF1 is a good example), pushing more frames require a vast amount of CPU resources, you can't put a Titan X/1080 class APU with a core i5 class CPU and hope you will achieve massive performance!

There is absolutely no reason you couldn't have a CPU socket with an APU pulling down 500W.
Yes there is, design complexity, flexibility of use and cost. On the same argumen, there is absolutely no reason why a dGPU would be made with insane clocks consuming upward of a 500w! With every inch you scale an APU, you can scale a dGPU two inches larger, if you want to play the no limits game, it applies to dGPUs too!
Only difference is you need to adjust the cooling and power delivery of the system to accommodate it.
Exactly, a very hard process and expensive as well.

You need to forget about the whole idea that an identical piece of silicon will perform differently based on where you place it.
And you need to remember there are limitations on every design choice you make, everything comes with a cost, nothing is free, a piece of silicon will have lower clocks in a laptop or console for power considerations, lower clocks or fused off units for workstations for durability, every design has a set of priorities derived from it's form factor, target audience and cost.

if discrete GPUs and APUs use the EXACT same piece of silicon, why would you expect them to perform differently? Your entire argument is based on the fallacy that they can't be equal.
APUs will have less clocks, less units, more constrained power limit, and less memory bandwidth, just like they always have been.

Your entire argument is based on the fantasy that any design can be made with no consideration for power, cost and usability. That is simply not true. Your whole argument is based on PR slides and wishful thinking, evangelizing every new manufacturing technique like it's the second coming, wait for the implementation first before celebrating what might not even be that competitive with what we have today! if APUs became stronger and cheaper to make due to these techniques, you can bet dGPUs will become hugely more powerful as a result as well! They will pack even more units and memory bandwidth, progress is not made unilaterally, it applies to both ends.
 
There is, more power and pushing more bandwidth.
Power is an artificial limitation you are injecting into the argument. The memory systems could be identical and able to perform however designed. Do laptops limit power for battery life? Yes, but they don't have to. A giant APU in a desktop design wouldn't be much different than a discrete card.

Wait, what? you mean an APU with more memory bandwidth than dGPUs will appear a couple of months from now?
Equal is more appropriate, although no reason it couldn't be higher. Articles already mention 1024 core Zen+Vega APUs with 4GB@128GB/s of HBM2 and ~2TFLOPs of compute. The high end server variant should be 4096 cores with 32GB HBM2. Not designed as a consumer part, but I'm sure consumers could get one. That's a full 1TB/s of bandwidth on an APU, not including system memory. That part theoretically lands above P100 in total available bandwidth thanks to the system memory. Last I checked P100 has more bandwidth than any other discrete card, so yes I think we will see an APU with more bandwidth than discrete GPUs. I'd expect APUs comparable to a 1080 around the middle of next year.

Yes there is, design complexity, flexibility of use and cost. On the same argumen, there is absolutely no reason why a dGPU would be made with insane clocks consuming upward of a 500w! With every inch you scale an APU, you can scale a dGPU two inches larger, if you want to play the no limits game, it applies to dGPUs too!
Why I'm pointing out that power is an artificial design limitation. There is absolutely no need to limit an APU to say <100W. Once an interposer is in play, similar to P100 and Fiji, there is already a huge sunk cost. Adding one more chip for the CPU won't cost much more. Flexibility may be a concern, but at that point you are dealing with different form factors and performance characteristics. The system incorporating that APU could be far smaller and more power efficient thanks to decreased bus lengths.

Exactly, a very hard process and expensive as well.
Power delivery and cooling is relatively easy. Certainly no worse than what is already occurring in either system. RX480 had power circuitry for 600-800W as I recall. Cooling systems can already vary wildly with large blowers, water loops, etc. Regardless of where the heat is coming from it still needs removed. It's just a question of designing one or two power delivery and cooling systems.

And you need to remember there are limitations on every design choice you make, everything comes with a cost, nothing is free, a piece of silicon will have lower clocks in a laptop or console for power considerations, lower clocks or fused off units for workstations for durability, every design has a set of priorities derived from it's form factor, target audience and cost.
Again, artificial limitation and irrelevant to that peak performance of the part. No reason to compare thermal constraints of a laptop to a standard desktop.

APUs will have less clocks, less units, more constrained power limit, and less memory bandwidth, just like they always have been.
Do you have any engineering background or information to back that up? You're again imposing an artificial limitation because you personally deem it to be true. Your argument here is incredibly naive and narrowminded from an engineering point of view. I'll stick with all the evidence of trends since the computer was first invented. I doubt the discrete market shrinking so much since the introduction of APUs was a fluke. By your argument that would never occur as discrete would always be cheaper, faster, and easier to design. Even P100 is a prime example of the design needing to shrink to get around bandwidth and power limitations. The close proximity of HBM to a GPU saves a lot of power, reduces latency, and in turn provides more bandwidth. It really is that simple. If that weren't the case why did P100 even bother with HBM as opposed to GDDR5X like the rest of the line?

Your entire argument is based on the fantasy that any design can be made with no consideration for power, cost and usability. That is simply not true. Your whole argument is based on PR slides and wishful thinking, evangelizing every new manufacturing technique like it's the second coming, wait for the implementation first before celebrating what might not even be that competitive with what we have today! if APUs became stronger and cheaper to make due to these techniques, you can bet dGPUs will become hugely more powerful as a result as well! They will pack even more units and memory bandwidth, progress is not made unilaterally, it applies to both ends.
My argument is based on engineering and recorded history. I'd hardly call that a fantasy. Simple engineering fundamentals and experience make the argument, not PR slides. The entire industry is already moving in this direction. HBM is being used for a lot more than just graphics cards and there is a reason for that. Same with stacked dice and other SoC designs.

I'm starting to think this was actually a troll post due to just how incredibly ignorant it is.
 
I'd expect APUs comparable to a 1080 around the middle of next year.
Not gonna happen, but we'll see, next year is not that far away though.

There is absolutely no need to limit an APU to say <100W
Same thing, absolutely no reason to limit dGPUs to 300W, if through new manufacturing technologies, APUs were able to push to 300W, then dGPUs will break the 300W mark to surpass them.
Adding one more chip for the CPU won't cost much more.
I beg to differ, it will add significant cost, especially if both the CPU and GPU are powerful.
Again, artificial limitation and irrelevant to that peak performance of the part. No reason to compare thermal constraints of a laptop to a standard desktop.
Maybe irrelevant in the absolute sense, but relevant to the matter of APUs replacing dGPUs. Power, heat and cost will limit mainstream APUs. and make super mega APUs unattractive compared to traditional dGPUs.
The close proximity of HBM to a GPU saves a lot of power, reduces latency, and in turn provides more bandwidth.
No argument here, HBM benefits will be beneficial to all, dGPUs will just push HBM further than APUs though, weather through clocks, bus width, or using a higher version of it when applicable.
I'm starting to think this was actually a troll post due to just how incredibly ignorant it is.
I won't tolerate this kind of language, your post is nothing but speculations, with no data or products to back it up, calling other's posts trolls and ignorance is a problem on your part. If you can't come up with convincing data or "logical points" other than technology will reach there eventually, then I suggest you don't engage in such argument. We've heard this kind of wishful thinking five years ago, and it was a fluke and some pipe dream, I see no indication of that changing whatsoever.
 
3-They can't be touched at the moment, no single desktop APU comes close to them..

I have to modify that for you slightly. No single desktop APU comes close to the lowest end dGPU today (Polaris 11/GP107). I know you meant it but I just want to emphasize that. And for the majority of the use cases of these APUs..the iGPUs are already overpowered. Expecting them to increase in performance by orders of magnitude in a few years time is pure fantasy.

Also..you have far more patience than I do!

Edit: One more point. A "big" APU might be a good fit for certain applications..but it is a big risk..and would require significant technical and financial resources. IF AMD was doing well and IF it was cash rich..I might have had some hope of them doing it. But given their current position, do not expect them to do anything which entails more than a moderate amount of risk.
 
Last edited:
your post is nothing but speculations, with no data or products to back it up

...

Articles already mention 1024 core Zen+Vega APUs with 4GB@128GB/s of HBM2 and ~2TFLOPs of compute.
http://www.bitsandchips.it/52-english-news/7622-rumor-two-versions-of-raven-ridge-under-development
zXYY2p.png


(16*64 = 1024 ALUs)

Official roadmap:

iVCxZJl.jpg



The high end server variant should be 4096 cores with 32GB HBM2. Not designed as a consumer part, but I'm sure consumers could get one.

From mid-2015:
j7OcuRR.png


And a tidbit from that article:

A straight-CPU system was considered as an exascale candidate, but AMD believes the requisite power envelope is unattainable in this design. It has also considered a system with external discrete GPU cards connected to CPUs, but believes an integrated chip is superior for the following reasons:

+ Lower overheads (both latency and energy) for communicating between the CPU and GPU for both data movement and launching tasks/ kernels.

+ Easier dynamic power shifting between the CPU and GPU.

+ Lower overheads for cache coherence and synchronization among the CPU and GPU cache hierarchies that in turn improve programmability.

+ Higher flops per m3 (performance density).


And something that appeared in leaked slides two years ago:

WUB9zUu.jpg


Which BTW is Zeppelin and that very same name appeared in a Linux Kernel update 9 months ago.




No single desktop APU comes close to the lowest end dGPU today (Polaris 11/GP107).
Even going around this purely artificial "it must be on PC desktop" limitation (the PS4 Pro is releasing in a week with a GPU a lot faster than P11 or GP107), care to show any proof that Oland and GK208 aren't being produced any more? I'm fairly certain I've seen fairly new laptops being announced with a GK208 Geforce 920M.
 
I have to modify that for you slightly. No single desktop APU comes close to the lowest end dGPU today
And here we have people saying they will come close to the 1080 in less than a year!

(16*64 = 1024 ALUs)
.
.
.
Official roadmap:
All of the examples you listed actually proves the existing point, all are designed within limited power budget, all will be downclocked massively compared to their dGPUs counterparts, both on the GPU and memory side.
 
I'm starting to think this was actually a troll post due to just how incredibly ignorant it is.
I can say your persistence is admirable..


The high end server variant should be 4096 cores with 32GB HBM2. Not designed as a consumer part, but I'm sure consumers could get one.
I think the largest limitation for the consumer would be how large and how many chips could you fit in an interposer for socket AM4.

A good exercise would be comparing the area of the AM4 socket to e.g. the area of Fiji's interposer. I think I did the calculations once and the result was close to Fiji having 2* the area of AM4.
Fiji is made on 28nm and it has 4 stacks of HBM. That said, I think if they "halved" Fiji's die area for 14FF and used only 2 stacks HBM2 (still good for 16GB though 8GB is more likely due to SK Hynix's current production lines), they could probably still fit a 4-core Zen in there.
This would put "the largest consumer APU" possible for AM4 able to surpass both Polaris 10 and GP106.

2018 and 7nm would/will open a whole new plethora of possibilities for socket AM4 though.
 
Even going around this purely artificial "it must be on PC desktop" limitation (the PS4 Pro is releasing in a week with a GPU a lot faster than P11 or GP107), care to show any proof that Oland and GK208 aren't being produced any more? I'm fairly certain I've seen fairly new laptops being announced with a GK208 Geforce 920M.

Because..for the nth time..we are comparing desktop/mobile APUs to desktop/mobile dGPUs. We have discussed this enough..everything else is an Apples to Oranges comparison.

So you want me to compare Skylake/Kaby Lake with Oland/GK208? Why? Because it makes APUs look better? I specifically compared the latest APUs to the latest dGPUs. Otherwise I could very well have compared GP107/Polaris 11 to Haswell. Should I ask you to show any proof that Haswell is not being produced anymore? :rolleyes: Sure..lets throw a 920M into the discussion for no reason..and say "I'm fairly certain I've seen fairly new laptops"..and which is irrelevant anyway.
 
Not gonna happen, but we'll see, next year is not that far away though.
“We’re bringing the APU concept fully into the server realm,” Norrod stated. “These are high performance server APUs offering not just high-performance CPU cores and memory but multi-teraflops GPU-capability, providing a level of performance for machine learning, a level of performance for finite element analysis, and a level of performance for memory bandwidth for reverse time migration algorithms that the oil companies use to do reservoir simulation.”
https://www.hpcwire.com/2015/05/07/amd-refreshes-roadmap-transitions-back-to-hpc/
AMD has roadmapped an APU comparable to P100 for the server market. Primarily to provide full access to system memory as opposed to a lengthy bus. Similar concept as the SSG as they address similar problems, but that likely targets a different market. I wouldn't be surprised if AMD, Intel, and Nvidia design a discrete card with DIMMs in addition to HBM that can function almost as standalone processors for more scalable systems. Doesn't seem at all unreasonable for them to fill in the gaps with lesser models, especially if the designs are based around MCMs. Just a matter of designing an interposer for each configuration, which is a trivial step no different than a PCB. The same silicon that is used in a dGPU gets used in the APU, but instead of communicating through the PCB and motherboard, it happens on the interposer.

https://www.overclock3d.net/news/cpu_mainboard/amd_2015-2020_roadmaps_emerge/1

Granted this roadmap is over a year old now, but "Smooth and Efficient Transition from dGPU to SoC" with 200-300W APUs in the server market I'd expect to be fairly high performance parts. Difficult to know how large the GPU will be, but I'm sure it's something comparable to most current 16 core CPUs on the market. More likely for consumers would seem to be 8 core Zen APUs with varying Vega models. Ton of possibilities though, and little reason not to consider manufacturing all of them considering the ease of doing so. Interchangeable parts with MCM are great.

Same thing, absolutely no reason to limit dGPUs to 300W, if through new manufacturing technologies, APUs were able to push to 300W, then dGPUs will break the 300W mark to surpass them.
That's what I'm getting at, neither is superior to the other. The only difference is the design constraints, in which case a CPU socket is far more relaxed. Performance in regards to processing and power will likely be superior on the APU because of shorter signal length. Traditional APUs have been used in power constrained designs, but they aren't locked into those designs. Case in point that server APU mentioned above.

I beg to differ, it will add significant cost, especially if both the CPU and GPU are powerful.
Adding one more chip to an interposer should add an insignificant installation cost. If you have a GPU and 4 stacks of HBM, adding a CPU/one more chip while you have a system in place to add chips isn't difficult. The cooling and power delivery will be a bit different, but a 300W discrete card and 300W socket aren't all that different. In fact the socket is likely easier to cool as the area around it can be designed around cooling it. Just a matter of specifying clearance for a 120mm fan/heatsink that wouldn't be practical for a discrete card with design constraints. In fact you could probably specify a socket with a significantly larger cooling apparatus far more easily than changing add in board parameters. Just compare the sizes of current CPU (150mm^3) coolers to that of a GPU. On a GPU one of those things would block every slot in your system.

No argument here, HBM benefits will be beneficial to all, dGPUs will just push HBM further than APUs though, weather through clocks, bus width, or using a higher version of it when applicable.
Once the interposer is in play, bus width largely ceases to be a limitation. Neither design should have an advantage, although the APU likely has an IO advantage with internalized components. Integrated components won't count against your socket pin count beyond power/ground and possibly some signaling depending on purpose(display output, network).

I won't tolerate this kind of language, your post is nothing but speculations, with no data or products to back it up, calling other's posts trolls and ignorance is a problem on your part. If you can't come up with convincing data or "logical points" other than technology will reach there eventually, then I suggest you don't engage in such argument. We've heard this kind of wishful thinking five years ago, and it was a fluke and some pipe dream, I see no indication of that changing whatsoever.
PR2MWq116-002.png

http://jonpeddie.com/press-releases...sktop-discrete-gpu-shipments-increased-5.3-fr
Intel would have started shipping IGPs in 2010, AMD around the latter half of 2011. Intel doesn't ship discrete. For the sake of comparison, I'll call AMD's share fairly consistent, excluding 2014 onward. Intel appears to be gaining while Nvidia losing from 2010 onward with a gain at AMDs expense in 2014. For a company with no discrete part to be gaining share, it has to come at some expense to discrete. Even if discrete sales volume increases, if it fails to keep pace with market growth it is in decline.

As for computers getting smaller and more integrated, find your nearest electronic device and compare it to a model a few years older.

A good exercise would be comparing the area of the AM4 socket to e.g. the area of Fiji's interposer. I think I did the calculations once and the result was close to Fiji having 2* the area of AM4.
Fiji is made on 28nm and it has 4 stacks of HBM. That said, I think if they "halved" Fiji's die area for 14FF and used only 2 stacks HBM2 (still good for 16GB though 8GB is more likely due to SK Hynix's current production lines), they could probably still fit a 4-core Zen in there.
This would put "the largest consumer APU" possible for AM4 able to surpass both Polaris 10 and GP106.
HBM2 is ~40mm2, AM4 is 1600mm2, and fiji was ~1000mm2 interposer on 2500mm2 package (P100 is ~3000mm2 package). With 4 stacks of HBM2 you still have 840mm2 of interposer to stick chips using a fiji sized interposer. Even if you spend 100mm2 on Zen cores (I think they're ~5mm2 each so that's 20 cores), you'd still have room for a die larger than Fiji at ~600mm2. If you assume a similar layout to Fiji, they could probably fit a pair of 8 core zen CPUs alongside a GPU equivalent to 2.5x Polaris 10 in transistors. Performance ratio is probably higher as the fixed function stuff won't scale linearly and HBM2 will save some power. Not to mention Polaris was tight on bandwidth and any architectural changes with Vega.
 
HBM2 is ~40mm2, AM4 is 1600mm2, and fiji was ~1000mm2 interposer on 2500mm2 package (P100 is ~3000mm2 package). With 4 stacks of HBM2 you still have 840mm2 of interposer to stick chips using a fiji sized interposer. Even if you spend 100mm2 on Zen cores (I think they're ~5mm2 each so that's 20 cores), you'd still have room for a die larger than Fiji at ~600mm2. If you assume a similar layout to Fiji, they could probably fit a pair of 8 core zen CPUs alongside a GPU equivalent to 2.5x Polaris 10 in transistors.

But you're assuming they could fill the whole 1600mm^2 of AM4 with an interposer. Could they?
If Fiji's interposer only occupied 40% of the whole package, what makes you think an AM4 APU could fill close to 100% of the package area? Surely there's a reason why AMD needed a 2500mm^2 package to put a 1000mm^2 interposer, otherwise the package could have been smaller and cheaper, right?

Perhaps a better approximation would be to assume that an Interposer on AM4 would also occupy ~40% of those 1600mm^2, meaning ~650mm^2 total area. This is still good for 2*HBM2 stacks (80mm^2), ~120mm^2 CPUs and 450mm^2 for the GPU. 450mm^2 is still about 2xPolaris 10, or approximately what a Vega 10 is expected to be. Easily a 10 TFLOP GPU.

Of course, this would be a ~300W APU, so AMD would have to be very careful about what kind of validation they would allow for AM4 cooling systems, not to mention they would need to somehow impose some overclocking limitations.
Interestingly, anandtech once tested the performance of several AiO coolers by simulating CPUs with a 340W TDP, and these were temperature the results using low fan speeds (7V limit):

mV1obV.png


50Âş using the cheapest solution ($60 I think) while limiting the fan speed.
It doesn't look like such an APU would require a very high-end cooling solution.



Because..for the nth time..we are comparing desktop/mobile APUs to desktop/mobile dGPUs. We have discussed this enough..everything else is an Apples to Oranges comparison.
No, you are. We are discussing technology advancement and integration in a much broader sense based on long-term roadmaps from IHVs. Because it would be utterly ridiculous not to do so if we're talking about a 10-year lifespan.
And if you refuse to compare x86 APUs to x86 APUs because one is soldered and the other goes into a socket, then you're 100% free to leave the discussion.
What you're not entitled to is change the subject of the discussion. The thread's title is "dGPU vs. APU spin-off", not "PC dGPU vs. PC APU spin-off", or as what you're trying to mold into "current PC dGPU vs. current PC APU".
You can create your own thread about current dGPUs and APUs and discuss whatever you want there.
 
No, you are. We are discussing technology advancement and integration in a much broader sense based on long-term roadmaps from IHVs. Because it would be utterly ridiculous not to do so if we're talking about a 10-year lifespan.
And if you refuse to compare x86 APUs to x86 APUs because one is soldered and the other goes into a socket, then you're 100% free to leave the discussion.
What you're not entitled to is change the subject of the discussion. The thread's title is "dGPU vs. APU spin-off", not "PC dGPU vs. PC APU spin-off", or as what you're trying to mold into "current PC dGPU vs. current PC APU".
You can create your own thread about current dGPUs and APUs and discuss whatever you want there.

So as usual (and not unexpectedly) you change your argument and the goal posts when you fail to prove a point or are proven wrong. Below are your first two posts on this thread.
Well discrete graphics cards will probably be extinct on the (~10 years?) long term. We'll eventually just be switching high-performance SoCs/APUs with HBM/HMC/H-whatever on desktops. If nvidia as a handheld SoC maker is failing, nvidia as an automotive SoC maker could be their future.
We'll have this talk again when socket APUs with interposers, HBM and >8 TFLOPs GPUs arrive at the consumer market.

Go ahead and buy your fictional socketable PS4 APU and use it on your fictional PC motherboard. This is my last post on this thread.
 
Just in the off-chance this is a genuine misunderstanding, I'll explain one more time:

- We are discussing technology advancement and integration based on long-term roadmaps from IHVs to guess whether it'll make sense for discrete GPUs to exist 10 years from now.

10 (ten) years from now
, meaning it's completely futile to be stuck on the notion of what is and isn't available today on socketed APUs for consumers. It doesn't make any sense to somehow disallow talking about the current APUs on consoles on the premise that "they don't exist in socket form today" when we're talking about what will be available within the next several years. There's a socket transition happening within the next few months that will allow even higher performance APUs on the PC, in socket form. There's a mobile 15-35W Raven Ridge SoC with a FP32 throughput between the PS4 and Xbone in the roadmaps to be released within half a year, yet we shouldn't be talking about the console APUs?

It's too bad that you can't seem to go over that artificial limitation you decided to put on existing x86 APUs because they're not in sockets, especially because we're talking about future products.
 
But you're assuming they could fill the whole 1600mm^2 of AM4 with an interposer. Could they?
If Fiji's interposer only occupied 40% of the whole package, what makes you think an AM4 APU could fill close to 100% of the package area? Surely there's a reason why AMD needed a 2500mm^2 package to put a 1000mm^2 interposer, otherwise the package could have been smaller and cheaper, right?
I'd agree the area estimate is a bit ambitious. As stated it would be 1000mm2 out of 1600mm2 which may be reasonable still. The 1000mm2 is more or less what is required to fit all the chips involved with fiji. I'd have to double check, but the actual package size may just be a result of a standardized component. 40mm^2 for Fiji, 55mm^2 for P100 for example.

Of course, this would be a ~300W APU, so AMD would have to be very careful about what kind of validation they would allow for AM4 cooling systems, not to mention they would need to somehow impose some overclocking limitations.
That's actually far less of an issue than you would think. Insufficient cooling and it simply throttles down. Inadequate power would likely be detected and also limit performance. No different than current card designs. I'd guess we see PEG connectors located alongside the auxiliary 12V connector. Will be interesting because even USB 3.1 and thunderbolt will probably want dedicated connectors for that 100W supply requirement. Those connectors could also be used for delivering video output.

It doesn't look like such an APU would require a very high-end cooling solution.
Considering they're already running 80C and there are designs with dual(triple may be possible) 150mm fans blowing through giant heatpipes it won't take much. GPU fans are typically ~80mm with half the displacement of a typical case fan.
 
Just as only relatively few motherboards support the 220W bulldozer CPU today, only a few will support an hypothetical uber 250W APU. Cooling it will be difficult like it is today with the 220W CPU. Meanwhile a 300W dGPU can be powered directly by the PSU and its cooling system is simpler and better than the APU's. And a PC with a high-end dGPU would be cheaper since it wouldn't require a more robust and expensive motherboard as the high-end APU would. It would also be more upgradable.

It's also possible that dGPUs will incorporate fast single-thread cores in the future and become dAPUs.
 
That's actually far less of an issue than you would think. Insufficient cooling and it simply throttles down. Inadequate power would likely be detected and also limit performance. No different than current card designs.
I'm not saying it's an issue as in people would start to see their systems blowing up in flames either.
But if they don't say anything about the cooling conditions and people use 100W heatsinks in 300W APUs then the thing will overheat, throttle down and get terrible performance.


Just as only relatively few motherboards support the 220W bulldozer CPU today, only a few will support an hypothetical uber 250W APU.
Irrelevant to discuss AM3 motherboards. They're not taking any AM4 APUs.

Meanwhile a 300W dGPU can be powered directly by the PSU
And motherboards for high-performance APUs can just simply get additional 6/8-pin power connectors.
Which happens to have already been showcased in AMD's Naples motherboard earlier this year:
w7OQqjq.jpg



and its cooling system is simpler and better than the APU's.
So you're actually convinced that the graphics cards' cooling systems, which have to move the hot air perpendicularly or against a PCB that is placed at the top, preventing the hot air from going up, is simpler and better than a CPU whose ATX and ITX standards were designed to accomodate large coolers and facilitate the hot air to go up?


Cooling it will be difficult like it is today with the 220W CPU
Nope.
 
Irrelevant to discuss AM3 motherboards. They're not taking any AM4 APUs.


And motherboards for high-performance APUs can just simply get additional 6/8-pin power connectors.
Which happens to have already been showcased in AMD's Naples motherboard earlier this year:
w7OQqjq.jpg




So you're actually convinced that the graphics cards' cooling systems, which have to move the hot air perpendicularly or against a PCB that is placed at the top, preventing the hot air from going up, is simpler and better than a CPU whose ATX and ITX standards were designed to accomodate large coolers and facilitate the hot air to go up?



Nope.
You still have to move a lot of power through the motherboard regardless of socket type.
And since you mention the power connectors on "motherboards for high-performance APUs" then I assume that the average/majority of AM4 boards won't have all of those power connectors (or phases/VRMs/whatever). So, if you have a PC with one of those cheaper AM4 boards (and consequently a lower-end APU) and decide to play some AAA games, then you have to upgrade your motherboard and APU, or you could just buy a decent dGPU and use it with your current board/APU. I think the dGPU route would be cheaper with less hassle.
APUs today have way worse performance than dGPUs. That may change in the future (or not), but APUs will always have worse upgrade paths than CPUs + dGPUs, which is very important on desktop PCs.
But I admit that the CPU/APU cooling thing isn't as bad as I thought.
 
AMD has roadmapped an APU comparable to P100 for the server market.
Ton of possibilities though,
We will see how that implementation goes, expect massively reduced clocks though.

Performance in regards to processing and power will likely be superior on the APU because of shorter signal length.
Maybe, I doubt it amounts to any significant performance difference, likely will remain negligible. None the less, dGPUs have vastly more performance at hand to offset any potential benefits from shorter signal lengths.

Traditional APUs have been used in power constrained designs, but they aren't locked into those designs. Case in point that server APU mentioned above.
We still yet to see that server APU actually take off and succeed in providing a good alternative, I will reserve judgement until it's actually released, thank you very much.

but a 300W discrete card and 300W socket aren't all that different.
They are not, however a 300W dGPU has more graphics performance than a 300w socket CPU +GPU. And again If APUs started pushing to 300W limits, dGPUs will strive to exceed it significantly.

Once the interposer is in play, bus width largely ceases to be a limitation. Neither design should have an advantage,
dGPUs will have the advantage of faster memory clocks. And again, vast memory bandwidth is useless in APUs that lack the threshold of GPU and CPU power, just look at FuryX compared to the 980Ti, didn't give it the edge outside of a few corner cases.

Intel would have started shipping IGPs in 2010 .. For a company with no discrete part to be gaining share, it has to come at some expense to discrete.
There is no denying Intel iGPUs made lower half dGPUs irrelevant, but that's because the vast majority of people don't play games with this kind of hardware, they want a device to play videos and do word processing, low end dGPUs used to be the only alternative here, not anymore now obviously, so iGPUs stepped in and took that role.

However iGPUs didn't take the role of hardware that plays video games, as evident by the dwindling AMD APUs share (it's basically a flat line since 2004), and the general disinterest in them compared to Intel's iGPUs, Evangelists of dGPU's demise like to confuse the issue and blur the line in this particular point, but the market trend is clear. dGPUs have started to grow in the middle and high end as well, driven by the growth in the gaming sector.
 
Last edited:
It's also possible that dGPUs will incorporate fast single-thread cores in the future and become dAPUs.
That actually seems increasingly likely, although dGPU probably isn't the correct term at that point. Once you do that you more or less have a blade server or cluster.

But if they don't say anything about the cooling conditions and people use 100W heatsinks in 300W APUs then the thing will overheat, throttle down and get terrible performance.
The results would be interesting. Consider typical undervolting. While absolute performance will likely decrease, you would still end up with amazing perf/watt as a result of lots of low clocked silicon.

You still have to move a lot of power through the motherboard regardless of socket type.
And since you mention the power connectors on "motherboards for high-performance APUs" then I assume that the average/majority of AM4 boards won't have all of those power connectors (or phases/VRMs/whatever). So, if you have a PC with one of those cheaper AM4 boards (and consequently a lower-end APU) and decide to play some AAA games, then you have to upgrade your motherboard and APU, or you could just buy a decent dGPU and use it with your current board/APU. I think the dGPU route would be cheaper with less hassle.
APUs today have way worse performance than dGPUs. That may change in the future (or not), but APUs will always have worse upgrade paths than CPUs + dGPUs, which is very important on desktop PCs.
But I admit that the CPU/APU cooling thing isn't as bad as I thought.
That power can be localized or engineered into the design with wider traces. That still may be preferable to having wires running everywhere.

As for board quality there are a lot of factors there. Does if have a 16x PCIE slot of the proper generation? Is your CPU fast enough to drive a newer GPU? Is the ram fast enough? Is your PSU large enough to accommodate a larger GPU. Upgrade paths are still interesting and I wouldn't say one is better than the other as there are so many considerations. APU would be far more ideal for a small form factor system. Another consideration is the elements of a dGPU that are effectively stripped off in the case of an APU. So upgrading to a newer APU wouldn't require the board components, power circuitry, and cooler of a full dGPU. So the APU upgrade may in fact be cheaper. From a developer PoV the APU would also be a much more predictable platform on par with a console as your CPU and GPU capabilities may not vary as much. Some applications that don't need peak performance may benefit from that.

APU+dGPU would probably yield some of the best performance we have seen as even a small APU could be augmenting your processing power. Sound, AI, physics on a second card would free up resources for the dGPU and benefit from the better access to system memory and the CPU.

dGPUs will have the advantage of faster memory clocks. And again, vast memory bandwidth is useless in APUs that lack the threshold of GPU and CPU power, just look at FuryX compared to the 980Ti, didn't give it the edge outside of a few corner cases.
Memory capabilities and clocks in the case of HBM won't be any different. The only reason the current difference exists is because of pin limitations on the package. An APU with pins for system memory and video memory all on one package would perform the same as two discrete packages. It would actually be faster with the shorter connection between them. The only reason it hasn't occurred is the prohibitively large pin count that would be required. Designs simply had limited bandwidth to feed APU graphics. The interposer and stacked ram solve that problem.

The FuryX and 980ti comparison isn't that great. There were some obvious deficiencies in Fiji to fit on the package which is why those corner cases exist. When not held back by geometry or memory capacity performance is generally really good. Doom would seem a prime example of this. In the case of a Fiji based APU with a large interconnect, that memory capacity issue likely goes away. 3GB/s PCIE to say 30GB/s low latency interconnect goes a long way. With the APU there would be a solid argument of a GPU with VRAM+system memory capacity of available memory.

The performance differential was actually more than I expected; reading a file from the SSG SSD array was over 4GB/sec, while reading that same file from the system SSD was only averaging under 900MB/sec, which is lower than what we know 950 Pro can do in sequential reads. After putting some thought into it, I think AMD has hit upon the fact that most M.2 slots on motherboards are routed through the system chipset rather than being directly attached to the CPU. This not only adds another hop of latency, but it means crossing the relatively narrow DMI 3.0 (~PCIe 3.0 x4) link that is shared with everything else attached to the chipset.

http://www.anandtech.com/show/10518/amd-announces-radeon-pro-ssg-fiji-with-m2-ssds-onboard

Using the SSG here as it is a reasonably close approximation of the capability and demonstrates the benefit. In the case of graphics, the developers job becomes easier as all resources can effectively be loaded. The need to manage a constrained memory pool largely goes away, and that's been one of the largest stumbling blocks of DX12/Vulkan I've seen so far. All the stuttering and frame drops because resources weren't loaded in time. It's the same model consoles currently use.

They are not, however a 300W dGPU has more graphics performance than a 300w socket CPU +GPU.
Not necessarily, but it would likely be situational. The APU would be free of all the memory management work and able to stream resources far more quickly. The dGPU on the other hand would have a bit more headroom to clock higher. So situations where CPU limited, bound my VRAM, PCIE issues, etc, the APU would likely be superior.

However iGPUs didn't take the role of hardware that plays video games
http://www.gamasutra.com/blogs/TimM...forecast_revised_with_24B_exits_last_year.php
Depends on your definition of video game market. PC only, not quite. Throw in consoles and iGPUs are the vast majority. Throw in mobiles and the share gets even smaller. "Online" seems a bit ambiguous in that example though. Likely includes MMOs, but online gambling etc is there too.
 
Status
Not open for further replies.
Back
Top