dGPU vs APU spin-off

Had forgotten about this thread.

Again, consoles are not desktop APUs. I was very specific in my wording: desktop APUs have yet to use any memory other than DDR3. Don't try to force a contradiction that exists only in your mind.

From a previous post of your own making.

HBM might be great for an APU, but further down the line HBM will be even more powerful on dGPUs: faster clocks, wider buses, etc. Come to think of it, there is nothing stopping anyone from putting GDDR5X on APUs now; after all, it is cheaper than HBM and competitive as well. But nobody did it.

This is even without mentioning any new memory technology that might appear down the line and make HBM obsolete. Then we will have a repeat of the current situation: APUs playing catch-up with dGPUs all over again.

All that is mentioned there is "APU". Your own words. Now stop contradicting yourself. As I posted previously: while it isn't GDDR5X, the PS4 APU, which is basically a PC APU, uses GDDR.

But I suppose it's easier to move the goalposts than to admit you were incorrect.

You are dancing around the fire here. They are not upgradeable/replaceable like desktop APUs; they don't have two different memory pools like desktop APUs; they have custom hardware and point-to-point connections/mainboards unlike desktop APUs; they have a much larger system bus and more bandwidth unlike desktop APUs. Hence they are not desktop APUs; they are another category altogether, no matter how hard you try to morph them into one.

What? You mean like the Intel NUC and other related PCs? Laptops? Tablets? AIOs?

Most of which do not feature upgradeable APUs or CPUs. Most of which do not have split memory pools, using main memory for both the CPU and GPU. Which reminds me of something similar. What was it? Oh yeah, the PS4/XBO APUs, which use x86 cores paired with PC GPU cores. Hmmm. Just like many PCs. And in fact, XBO is running a Windows (PC) OS. Meanwhile the PS4 is running FreeBSD (a PC OS). Additionally, the XBO uses DDR3 for both the CPU and GPU, exactly like the vast majority of PC laptops, tablets, AIOs, Intel NUCs, and their clones. In other words, NOT split memory pools.

As well, since when did bus width determine whether something is a computer or not? Are workstations using 8 banks of memory not PCs, then, since they have a significantly larger memory bus than desktop computers?

Regards,
SB
 
Perhaps AMD wants to kill dGPUs. There's a forum guy who runs a 7850 2GB, so about PS4 / Xbone performance level, but there's no proper Linux support in Ubuntu 16.04 and Mint 18 because GCN 1.0 is considered too old. So games run like shit and stutter.
But well, Trinity/Richland are dead too, though still being sold.
Can't we get a decade of uninterrupted support?
 
All that is mentioned there is "APU". Your own words. Now stop contradicting yourself. As I posted previously: while it isn't GDDR5X, the PS4 APU, which is basically a PC APU, uses GDDR.
You probably didn't read the previous pages; this post is a continuation of a discussion about why consoles are not desktop APUs. No contradiction here. You simply chimed into the middle of the discussion with no regard for the previous points, and I think it's fair for me not to account for that possibility in every post I make. A thread is a stream of thoughts, not a set of headlines to be quoted at first glance.
What? You mean like the Intel NUC and other related PCs? Laptops? Tablets? AIOs?
Last time I checked, none of those have dGPUs, with the exception of laptops with dGPUs of course.

Why are we bringing "consoles are PCs" into the discussion?! No one is even debating that point! Consoles are PCs, but they are still not desktop PCs or laptops with dGPUs. They are another category built for gaming, so they don't need to generate profit from the hardware itself, but from an entire ecosystem of games, online features and accessories.

For desktop APUs to repeat the same success and dethrone dGPUs from the market for high-quality, high-performance gaming, they would have to follow the same formula as consoles, so it goes like this:

dGPUs are dominant in performance, but expensive.
APUs need to capture the desktop market and the market for laptops with dGPUs.
APUs would need to be built with expensive tech to do that: they would need to maintain upgradeability, adequate CPU power, good cooling solutions, expensive memory technology, a large system bus, and (possibly) a unified memory pool on top of that. A tall order and a bunch of huge obstacles. I don't see it happening at all; it is not feasible or profitable, and a dGPU will be far more powerful and profitable than such a system.

And in fact, XBO is running a Windows (PC) OS. Meanwhile the PS4 is running FreeBSD (a PC OS).
Since when does an OS determine the type of device? A PC can run Android; does that make it a mobile platform? A tablet/phablet with an x86 chip runs both Android and Windows, and that's not a PC.

Most of which do not feature upgradeable APUs or CPUs. Most of which do not have split memory pools, using main memory for both the CPU and GPU.
Again, the bread and butter of gaming dGPUs is desktop PCs and laptops with dGPUs, none of which have any of those console features.

To summarize this whole thing, and strip it of confusion, mixed arguments and unnecessary wishful thinking:

1. dGPUs are powerful hardware built for power gaming, beyond trashy graphics and low-fps slideshows.
2. They are used in desktop PCs, laptops with dGPUs, workstations and supercomputers.
3. They can't be touched at the moment; no single desktop APU comes close to them.
4. You need powerful desktop APUs to dethrone dGPUs.
5. Consoles are not desktop APUs; they are another market category and don't directly compete with PCs here. Bringing them into the discussion is like saying consoles will obliterate PCs and PC gaming is doomed, which has been proven false. Do we need to go through that futile discussion again?
6. Current consoles have PC-like architecture, but they hold major advantages over current desktop APUs, among them a shared memory pool, bigger memory bandwidth, and custom hardware, none of which are available to desktop APUs. They are also built specifically for gaming and to maximize profits from aspects other than pure hardware; the same can't be said about desktop APUs.
7. You can't have powerful enough desktop APUs because dGPUs are always several steps ahead, hardware- and software-wise.
8. Super-mega APUs are not feasible: they are expensive and clumsy, not profitable, and have several technical hurdles to overcome.
9. Hence, dGPUs are here to stay.
 
While it isn't GDDR5X, the PS4 APU, which is basically a PC APU, uses GDDR.
Friendly reminder that Kaveri had a GDDR5 option at some point, so a Windows PC using GDDR5 for system memory isn't that much of a pipe dream.
Hacking the PS4 to run Windows would be a blast.

As well, since when did bus width determine whether something is a computer or not? Are workstations using 8 banks of memory not PCs, then, since they have a significantly larger memory bus than desktop computers?
Wait, my X79 system isn't a PC? I've been playing Steam games on what, a console??
Here I thought I was Masterrace and turns out I'm a lowly peasant!

Can't we get a decade of uninterrupted support?
Erm. For Linux? No?
There's just no market for Linux gaming that would warrant even one year of support for games, let alone 10 years.
Why does he want to play games in Linux anyway?
 
6. Current consoles have PC-like architecture, but they hold major advantages over current desktop APUs, among them a shared memory pool, bigger memory bandwidth, and custom hardware, none of which are available to desktop APUs. They are also built specifically for gaming and to maximize profits from aspects other than pure hardware; the same can't be said about desktop APUs.
All APUs should have a shared memory pool. That isn't really unique to consoles. Even an APU with separate memory pools will likely have far more bandwidth between the pools than PCIE would allow. There's no reason a large APU can't be created, just that the technology isn't quite there. Bandwidth and pin counts are the real bottleneck for a large APU. HBM addresses that concern. MCMs address a lot of the cost and configuration concerns.
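To put rough numbers on the pin/bandwidth point, here is a minimal back-of-the-envelope sketch; the DDR4-3200 and single-stack HBM2 figures below are generic round numbers used for illustration, not specs quoted from anyone in this thread:

```
#include <stdio.h>

/* Rough peak-bandwidth arithmetic for the pin-count argument above.
 * Figures are illustrative round numbers, not vendor datasheet values. */
int main(void)
{
    /* Dual-channel DDR4-3200: 2 x 64-bit channels at 3.2 GT/s */
    double ddr4_gbs = 2 * (64 / 8.0) * 3.2;   /* ~51 GB/s  */

    /* One HBM2 stack: 1024-bit interface at ~2.0 Gb/s per pin */
    double hbm2_gbs = (1024 / 8.0) * 2.0;     /* ~256 GB/s */

    printf("DDR4, dual channel: %.0f GB/s over 128 data pins\n", ddr4_gbs);
    printf("HBM2, one stack   : %.0f GB/s over a 1024-bit interposer bus\n", hbm2_gbs);
    return 0;
}
```

The point being that the interposer moves the wide bus off the package pins entirely, which is exactly the bottleneck a socketed APU runs into.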

7. You can't have powerful enough desktop APUs because dGPUs are always several steps ahead, hardware- and software-wise.
Why exactly are they several steps ahead? There is absolutely no reason a difference must exist beyond GPUs traditionally evolving faster than CPUs. In the case of MCMs that issue goes away.

8. Super-mega APUs are not feasible: they are expensive and clumsy, not profitable, and have several technical hurdles to overcome.
It's already on AMD's roadmap and confirmed by the CEO. The hurdles have already been overcome, and all the tech companies see that as the future; even Nvidia's research arm saw it 5 years ago. You mention they aren't feasible and there are still technical hurdles, yet they should be on the market inside of 8 months? I believe the IEEE paper described an APU with 16 Zen cores, a 4096-core Vega GPU, and 16/32 GB of HBM.

Erm. For Linux? No?
There's just no market for Linux gaming that would warrant even one year of support for games, let alone 10 years.
Why does he want to play games in Linux anyway?
The PS4 is technically a *nix (FreeBSD-based) box. Emulation of Windows games also works rather well on Linux, with the exception of graphics APIs. If a game is written against OpenGL or Vulkan, porting is pretty straightforward.
 
just that the technology isn't quite there. Bandwidth and pin counts are the real bottleneck for a large APU.
Exactly.
HBM addresses that concern.
By the time HBM makes it into APUs, dGPUs will have had a faster version of HBM with a wider bus, maybe even a newer generation of it.
Why exactly are they several steps ahead?
They can push more watts, attain higher clocks, go wider and pack more ALUs, use wider memory buses, attach bigger capacities of VRAM, etc.! They can also be paired with more powerful CPUs, which reduces the chance of being CPU-bottlenecked.
MCMs address a lot of the cost and configuration concerns.
I somehow doubt that. It's an old argument made anew, and it will also introduce a new set of problems like thermal management and complicated manufacturing. MCMs can actually add to the cost if the right manufacturing infrastructure isn't available.
I believe the IEEE paper described an APU with 16 Zen cores, a 4096-core Vega GPU, and 16/32 GB of HBM.
How about clocks? How will this thing be cooled, or kept within reasonable power consumption levels? Nevertheless, discrete versions of the same CPU and GPU will have higher clocks and more performance available to them.

You are postulating that dGPU progress will slow to a crawl to allow APUs to catch up. That's not going to happen, not with new visual challenges emerging every year! Or are we postulating that the visual frontier will remain the same 10 years from now?! That's just against progress!
 
We'll have APUs with separate memory pools; that may be inevitable in the long term.
Year 20xx : you have 64GB or 128GB of HBM as "near RAM", and "far RAM" can be configured from 0B to 2TB or 4TB.

Zen + Vega Opteron is a bit like that, next-gen Xeon Phi is like that on a single chip (but not really a GPU at all). Nvidia Volta + IBM POWER9 is fairly or exactly like that (but a fully loaded machine costs $100k). It's the "h" in hUMA.
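For the Xeon Phi flavour of near/far RAM this already has a software face today: the memkind library exposes KNL's on-package MCDRAM as a separate high-bandwidth pool via hbw_malloc(). A minimal sketch, assuming the memkind library is installed and MCDRAM is configured as flat (addressable) memory; the buffer size and usage are arbitrary placeholders:

```
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>   /* memkind's high-bandwidth allocator (KNL MCDRAM) */

int main(void)
{
    size_t n = 1 << 20;

    /* "near RAM": ask for the on-package MCDRAM pool */
    double *near_buf = (double *)hbw_malloc(n * sizeof(double));

    /* "far RAM": ordinary DDR allocation */
    double *far_buf = (double *)malloc(n * sizeof(double));

    if (!near_buf || !far_buf) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* ... keep bandwidth-critical working sets in near_buf,
       bulk/cold data in far_buf ... */

    hbw_free(near_buf);
    free(far_buf);
    return 0;
}
```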

Soon, the AM4 socket and Bristol Ridge (Excavator) will be out. Excavator/GCN 1.2 APUs were supposed to have seamless access to dGPU memory. Perhaps you could put an R9 380 or Fury in there and do some experiments of, well, academic interest.
 
We'll have APUs with separate memory pools; that may be inevitable in the long term.
Year 20xx : you have 64GB or 128GB of HBM as "near RAM", and "far RAM" can be configured from 0B to 2TB or 4TB.

Zen + Vega Opteron is a bit like that, next-gen Xeon Phi is like that on a single chip (but not really a GPU at all). Nvidia Volta + IBM POWER9 is fairly or exactly like that (but a fully loaded machine costs $100k). It's the "h" in hUMA.

Soon, the AM4 socket and Bristol Ridge (Excavator) will be out. Excavator/GCN 1.2 APUs were supposed to have seamless access to dGPU memory. Perhaps you could put an R9 380 or Fury in there and do some experiments of, well, academic interest.
What is interesting from the HPC side of things is that both Intel with Phi (KNL and onwards) and Nvidia (with Pascal and the latest CUDA, SM_52 upwards) can now overcome the offload-data issues whose elimination was going to be one of the biggest benefits of an APU's simplified unified memory. An APU will still have performance benefits in theory (although one also needs to consider multi-node hyperscaling/scale-out, etc.), but now there are real-world alternatives that will continue to evolve (such as what is hinted at with Volta).
And this is also being extended to Linux/Red Hat.
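On the Nvidia side, the mechanism behind that point is CUDA managed (unified) memory: one allocation visible to both CPU and GPU, with Pascal adding on-demand page migration so explicit offload copies disappear. A minimal sketch; the kernel, sizes and launch configuration are placeholders of my own, not anything from the products mentioned above:

```
#include <stdio.h>
#include <cuda_runtime.h>

// Trivial placeholder kernel: scale an array in place on the GPU.
__global__ void scale(float *x, int n, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main()
{
    const int n = 1 << 20;
    float *x = NULL;

    // One allocation visible to both host and device. On Pascal and later,
    // pages migrate on demand instead of requiring explicit cudaMemcpy
    // "offload" copies.
    cudaMallocManaged(&x, n * sizeof(float));

    for (int i = 0; i < n; ++i) x[i] = 1.0f;      // touched on the CPU

    scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);  // touched on the GPU
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);                  // back on the CPU, no memcpy
    cudaFree(x);
    return 0;
}
```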
Cheers
 
Nick of SwiftShader fame claimed AVX2 would be the death of the dGPU;
he keeps denying it even though I posted a direct quote from him.

Edit: my bad, he said AVX2 would be the end of GPGPU. Oops.
 
Nick of SwiftShader fame claimed AVX2 would be the death of the dGPU;
he keeps denying it even though I posted a direct quote from him.

Edit: my bad, he said AVX2 would be the end of GPGPU. Oops.
IIRC, many people were (and still are) saying that about AVX512, not AVX2.
And it wouldn't be the death of the dGPU, but the death of the GPU. Or at least of the current form of GPUs' ALUs, because fixed-function ROPs and other units apparently still need to exist for efficiency.

In the end, it's just a suggestion that the GPU and CPU ALUs will "merge" (Larrabee attempted just that not long ago). Which is probably going to happen too, though probably over a longer timescale.
 
IIRC, many people were (and still are) saying that about AVX512, not AVX2.
And it wouldn't be the death of the dGPU, but the death of the GPU. Or at least of the current form of GPUs' ALUs, because fixed-function ROPs and other units apparently still need to exist for efficiency.

In the end, it's just a suggestion that the GPU and CPU ALUs will "merge" (Larrabee attempted just that not long ago). Which is probably going to happen too, though probably over a longer timescale.
Yeah, it already seems to be evolving: the latest Phi (KNL) is a processor rather than just a coprocessor; along with the other design changes it no longer needs a host Xeon CPU, so it can self-boot, with a larger step change coming in the next generation.
Quite a few people are interested in implementing things that way; how it takes off remains to be seen, though.
Cheers
Edit:
Good summary presentation from Intel on KNL.
https://www.nersc.gov/assets/Uploads/KNL-ISC-2015-Workshop-Keynote.pdf
 
Yet in mobile we've seen Cortex-A57 + Cortex-A53 cluster pairings and many such combinations. What I mean is that there could be convergence between CPU and GPU without precluding the use of internally quite different cores or tiles, even in the same chip.
 
The design points are completely different.

CPUs (and their floating point units) are optimized for low latency operations, GPUs for throughput. Both are limited by power budget.

A CPU spends a lot of power and transistors to lower apparent latency: out-of-order execution, multi-level caches, speculative load/store units, branch prediction, and execution units with very low latencies. A GPU doesn't; it is optimized for throughput. This means it can have thousands of instructions in flight; the worst-case instructions are probably gathers, and everything else just has to be fast enough.

If you want to fuse a GPU SIMD unit into a CPU, you either increase its speed and suffer excessive power consumption (AVX-512!), or you don't and suffer excessive latencies. If you choose the latter, you have to enlarge the ROB/issue queues, which comes with a power and speed penalty.

And all this effort to save a bit of Si real estate, which is almost free today.

Cheers
 
The design points are completely different.

But they were a lot more different before than they are today.
CPUs are starting to become more parallel through multi-core and multi-threading and GPUs are getting higher single-threaded performance through e.g. larger caches.

I won't deny they're very different today, but you can't deny they've been getting closer.
 
You do realize the interposer completely eliminates those issues, right? With HBM present, the pin/bandwidth issues go away entirely. The design gets better performance in the process, thanks to reduced latency and power consumption.

By the time HBM makes it into APUs, dGPUs will have had a faster version of HBM with a wider bus, maybe even a newer generation of it.
A couple of months from now? Maybe a working APU with HBM demoed inside of a month? That might even beat discrete GPUs to HBM2, although it seems likely they could be shown at the same time. There is absolutely no reason a discrete card could have a wider bus than an APU, given HBM and an interposer.

They can push more watts, attain higher clocks, go wider and pack more ALUs, use wider memory buses, attach bigger capacities of VRAM, etc.! They can also be paired with more powerful CPUs, which reduces the chance of being CPU-bottlenecked.
None of what you just claimed here is true. Maybe the different CPU+GPU combination part, but CPUs are becoming increasingly irrelevant to gaming performance as the GPU takes over more acceleration. If you take the exact same piece of silicon and place it on an interposer on a discrete card or on an APU, why would you expect it to perform differently? Because the CPU is sucking up an extra 35W or thereabouts? Cooling that is trivial, and the design benefits from better access to system memory and lower power consumption thanks to eliminating a lengthy bus.

How about clocks? How will this thing be cooled, or kept within reasonable power consumption levels? Nevertheless, discrete versions of the same CPU and GPU will have higher clocks and more performance available to them.
Why would it be any different? The only difference that exists is because of an artificial limitation you choose to place on the comparison. There is absolutely no reason you couldn't have a CPU socket with an APU pulling down 500W. The only difference is that you need to adjust the cooling and power delivery of the system to accommodate it, not too different from what laptops, servers, and SFF designs already do. You need to forget the whole idea that an identical piece of silicon will perform differently based on where you place it.

You are postulating that dGPU progress will slow to a crawl to allow APUs to catch up. That's not going to happen, not with new visual challenges emerging every year! Or are we postulating that the visual frontier will remain the same 10 years from now?! That's just against progress!
No, I'm postulating that discrete GPUs will need to adopt CPUs and larger memory systems in order to increase performance. A design choice all IHVs have indicated is the future, and for a very good reason: shorter lines of communication! Again, if discrete GPUs and APUs use the EXACT same piece of silicon, why would you expect them to perform differently? Your entire argument is based on the fallacy that they can't be equal.

And all this effort to save a bit of Si real estate, which is almost free today.
It seems more likely that real estate costs will be increasing as IHVs start stacking dice: lower the voltage to the most power-efficient point on the curve and use multiple dice. Take a 1V piece of silicon and drop the voltage to 500mV while stacking 4 of them to maintain the same power envelope, for example. Maybe each die has 30% less performance, but you make up for it with 4x the transistors. This only works with stacking, as making a die 4x larger often has impractical yields.
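Taking the usual dynamic-power rule of thumb (P roughly proportional to C·V²·f) at face value, that trade works out as follows; the 500mV and 30% figures are the ones from the paragraph above, everything else is a crude first-order estimate that ignores leakage and scaling overheads:

```
#include <stdio.h>

/* First-order dynamic power model: P ~ C * V^2 * f (leakage ignored). */
int main(void)
{
    const double v0 = 1.0, f0 = 1.0;   /* baseline: 1 V, normalized clock    */
    const double v1 = 0.5, f1 = 0.7;   /* per post: 500 mV, ~30% lower clock */
    const int    dice = 4;

    double p_per_die = (v1 * v1) / (v0 * v0) * (f1 / f0); /* ~0.175x baseline */
    double p_total   = dice * p_per_die;                  /* ~0.7x baseline   */
    double perf      = dice * f1;                         /* ~2.8x, if it scales */

    printf("power per slow die : %.2fx of the single fast die\n", p_per_die);
    printf("stack of %d         : %.2fx power, ~%.1fx aggregate throughput\n",
           dice, p_total, perf);
    return 0;
}
```

So four slowed-down dice land around 0.7x of the original power budget while offering roughly 2.8x the aggregate throughput, which is the whole appeal of trading voltage and clocks for transistor count.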
 
But they were a lot more different before than they are today.
CPUs are starting to become more parallel through multi-core and multi-threading and GPUs are getting higher single-threaded performance through e.g. larger caches.

I won't deny they're very different today, but you can't deny they've been getting closer.

True, but I don't think GPUs will improve single thread performance further. Everything is built around tolerating memory latencies. You need a ton of instructions in flight to cover this latency, so you need a ton of register state. The register file is just a large chunk of SRAM with register semantics (think CELL local store). It is high bandwidth, but also high latency (compared to CPUs, extremely high latency). Since data access is high latency, there is zero reason to make instruction execution low latency, so execution is in-order and high cycle count, because that is the most power efficient way.

To significantly lower latency you would need to revamp the entire structure of the GPU, which would cost a lot of power and complexity, and that would hurt the parallel workloads GPUs normally run.
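As a concrete illustration of "a ton of instructions in flight to cover latency": a memory-bound GPU kernel is normally launched with far more threads than there are execution units, and the hardware simply switches to another warp whenever one is stalled on a load. A minimal grid-stride sketch; the sizes and launch dimensions are arbitrary:

```
#include <cuda_runtime.h>

// Memory-bound copy kernel. Each load can take hundreds of cycles;
// the SM hides that latency by keeping many warps resident and
// scheduling whichever one has its data ready.
__global__ void copy(const float *in, float *out, size_t n)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        out[i] = in[i];   // in-order, high-latency access; no OoO machinery needed
}

int main()
{
    const size_t n = 1 << 24;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    // Deliberately oversubscribed: thousands of warps in flight so the
    // scheduler always has runnable work while other warps wait on DRAM.
    copy<<<2048, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

None of the register state that makes this work needs to be low-latency, which is why the same structures are a poor fit bolted onto a latency-optimized CPU core.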

Cheers
 
It seems more likely that real estate costs will be increasing as IHVs start stacking dice: lower the voltage to the most power-efficient point on the curve and use multiple dice. Take a 1V piece of silicon and drop the voltage to 500mV while stacking 4 of them to maintain the same power envelope, for example. Maybe each die has 30% less performance, but you make up for it with 4x the transistors. This only works with stacking, as making a die 4x larger often has impractical yields.

Is that a counter-argument? If IHVs are using 4x the dies, the cost of silicon has gone down, by a lot.

And they won't be stacking GPU dies anytime soon, the power density would be ridiculous.

Cheers
 
True, but I don't think GPUs will improve single thread performance further.
Of course they will, at the very least because their clocks have been steadily going up for each node iteration.

And they won't be stacking GPU dies anytime soon, the power density would be ridiculous.
No one suggested the GPU and CPU would fuse soon either.
 
Of course they will, at the very least because their clocks have been steadily going up for each node iteration.

Well, so will CPU clocks. Single-thread performance relative to CPUs won't improve.

You won't see multi-ported, single-cycle-access register files in GPUs, you won't see branch speculation, you won't see speculative load/store units trying to eliminate potential RAW hazards, you probably won't see multi-level prefetching, you won't see cannonball scheduling, etc.

Cheers
 