Is everything on one die a good idea?

DavidGraham said:
The future is stuffed full of crazy things that necessitate dGPUs
You seem to be confusing words; nothing you mentioned requires them.

DavidGraham said:
Progress requires more data and thus processing, not the other way around.
I never stated otherwise?


EDIT: Just to put this in context... Let's budget 32mm^2 for 8 CPU cores at 10nm. That is 6.4% of a 500mm^2 chip's die area. Best case, you get 6.4% more performance for eschewing an APU design (assuming you are not power or thermal limited). In the real world the difference is probably smaller; in fact, you may very likely come out ahead overall in heterogeneous workloads with the APU design. The next question is cost: does a company simply charge more to maintain profits despite the die size penalty, or does it go to the trouble of producing an entirely distinct chip?
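As a quick sanity check on those numbers (a minimal sketch; the 32mm^2 and 500mm^2 figures are just the assumptions from the paragraph above):

```python
# Back-of-the-envelope die-area budget, using the assumed figures above.
cpu_area_mm2 = 32.0     # assumed: 8 CPU cores at 10nm
total_die_mm2 = 500.0   # assumed: big-chip die size

cpu_fraction = cpu_area_mm2 / total_die_mm2
print(f"CPU share of the die: {cpu_fraction:.1%}")   # 6.4%
print(f"Best-case dGPU advantage from dropping the CPU cores: "
      f"{cpu_fraction:.1%} extra shader area (assuming no power/thermal limit)")
```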
 
Why are you assuming they will not?!!
Because they CAN; technology is always pushing boundaries.
Do you seriously think dGPUs will have the same memory bandwidth as APUs?

I don't see why not. The PS4 already sports an APU with the bandwidth of a midrange GPU. And that can't even be considered a high end APU, as it had to fit into the budget of a 399 USD device that also had to meet a power profile suitable for quiet operation in a living room environment.

Imagine what you could do if you had the budget for a high end device: an APU around 200-400 USD which doesn't need to worry about the power constraints of a quiet living room environment. It's certainly possible for it to have the same memory controller as a current generation high end GPU.
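For a sense of scale, the usual peak-bandwidth arithmetic is just bus width times per-pin data rate. A minimal sketch, using the commonly cited PS4 configuration (256-bit GDDR5 at 5.5 Gbps per pin) and an illustrative high end dGPU setup for comparison:

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps_per_pin):
    """Peak memory bandwidth in GB/s: bus width in bytes * per-pin rate."""
    return (bus_width_bits / 8) * data_rate_gbps_per_pin

# PS4-style APU: 256-bit GDDR5 at 5.5 Gbps per pin (commonly cited figures)
print(peak_bandwidth_gbs(256, 5.5))   # ~176 GB/s

# Illustrative high end dGPU of the era: 384-bit GDDR5 at 7 Gbps per pin
print(peak_bandwidth_gbs(384, 7.0))   # ~336 GB/s
```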

The only reason it isn't done at the moment is that the market for it isn't that large. You'd basically be appealing to a demographic that is skeptical about an integrated solution's ability to fill the shoes of a dedicated GPU. Not only that, you'd be trying to win buyers from the very demographic which doesn't see the point of an APU.

It's going to take time until the desktop market is ready for such a product. I could perhaps see it succeed now in a laptop, but there is still a very serious risk for such a product, on top of the power constraints. Those power constraints already preclude high end desktop graphics solutions from a product like that.

Regards,
SB
 
Not only that, you'd be trying to win buyers from the very demographic which doesn't see the point of an APU.

...Mantle is quite late as a project, as is asymmetric GPGPU.
IMHO, if you could have easily used one GPU for the 3D pipeline and a second for GPGPU, you'd have seen a market shift toward APUs.
 
The big problem is that high bandwidth in general is more or less incompatible with slotted memory, so a move to a high performance APU will require RAM soldered to the motherboard. This is a nonstarter in the desktop world, but makes sense for laptops and tablets. I suppose it's reasonable to have, say, 32 GB of stacked memory used as a cache and then DDR expansion slots for the other 128 GB.
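To put rough numbers on the slotted-vs-soldered gap (a hedged sketch; the DDR3 and GDDR5 figures are typical configurations, the stacked-memory line is purely illustrative):

```python
def peak_gbs(channels, bus_bits, transfer_rate_mts):
    """Peak bandwidth in GB/s: channels * bus width in bytes * MT/s."""
    return channels * (bus_bits / 8) * transfer_rate_mts / 1000

# Slotted dual-channel DDR3-1600: what a typical desktop gets today
print(peak_gbs(2, 64, 1600))    # ~25.6 GB/s

# Soldered 256-bit GDDR5 at 5500 MT/s (PS4-style)
print(peak_gbs(1, 256, 5500))   # ~176 GB/s

# One illustrative 1024-bit stacked-DRAM stack at 1000 MT/s
print(peak_gbs(1, 1024, 1000))  # ~128 GB/s per stack
```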

Modern GPUs are rapidly becoming CPUs optimized for massive parallelism - just look at the docs for the latest version of CUDA to see what I mean (OpenCL is a very long way behind in this regard). This means there's not much room for adding large numbers of cores to your CPU - if your algorithm has that much parallelism, it's likely to be more efficient on a GPU architecture. Between this and the sequential performance wall that CPUs have hit in the last decade, there's not much room for the CPU to do anything other than shrink with each process, which leaves more and more room for the GPU.

The real problem I see with having high performance APUs is Intel/Nvidia/AMD. Intel is not likely to release a 400+mm² consumer part any time soon, so they'll have a very hard time competing with a high end GPU, especially when 50mm² of that die goes to the CPU. Nvidia doesn't have x86 IP rights, so they're not going to be releasing a desktop APU any time soon (though the rise of ARM makes things very interesting - if WinRT ever matures to the level where you can write serious programs with it, and free from the Windows App Store at that, an ARM based system becomes a real possibility). AMD is the most likely to release a high end APU (the consoles are already close to this), but their problem is that they've fallen behind Intel in CPUs and Nvidia in GPUs, especially when you look at the software side of things.
 
Ray tracing, physics and particle simulations, ultra crazy resolutions (beyond 4K and 6K), multiple monitors, hologram decks, 3D, VR goggles (like the Oculus Rift), etc. The future is stuffed full of crazy things that necessitate dGPUs, and those are only the things we know about; in 10 years' time there will probably be more, so dGPUs are here to stay. Progress requires more data and thus processing, not the other way around.

Some of those things might actually work better with APUs, thanks to tighter integration.

Still, I don't think many people are claiming that 600mm², 300W monster dGPUs meant for ultra-high-end boards, including dual-GPU ones, will disappear any time soon. I certainly am not. I mean, I currently game at 5760×1080, and when I replace my monitors with 4K ones I'll be running an 11520×2160 (24.8 MPixel!) setup. Needless to say, that's going to require some massive graphics computing power, nothing an APU is likely to be able to manage.
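Sanity-checking my own arithmetic there (nothing more than the resolutions quoted above):

```python
width, height = 11520, 2160
pixels = width * height
print(f"{pixels / 1e6:.2f} MPixel")                             # ~24.88 MPixel
print(f"{pixels / (1920 * 1080):.0f}x a single 1080p display")  # 12x
```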

But most people don't do that. Most people use moderate resolutions (which tend to increase over time, of course) and are content with 100~250mm² GPUs. Those won't make much sense for long.
 
I never stated otherwise?
Yes you did. The problem with your theory is that you insist CPUs will stagnate, thus making room for more massive GPUs. I disagree.

Just to put this in context... Let's budget 32mm^2 for 8 CPU cores at 10nm.
There it is, the flaw in your whole argument: many years into the future, you still expect CPUs to only have 8 cores. That is not compatible with the statement I made; progress requires more processing, not less.

But most people don't do that. Most people use moderate resolutions (which tend to increase over time, of course) and are content with 100~250mm² GPUs. Those won't make much sense for long.
In 10 years' time, at least one of the things I mentioned is bound to take off as mainstream tech.

I don't see why not. The PS4 already sports an APU with the bandwidth of a midrange GPU. And that can't even be considered a high end APU, as it had to fit into the budget of a 399 USD device that also had to meet a power profile suitable for quiet operation in a living room environment.
Exactly. They combined a low-clocked CPU with a low-clocked GPU, not just because of noise concerns or power consumption, but durability as well: having two big, fast chips next to each other increases the percentage of faulty units. They can't have that anymore, not with the shadow of the Red Ring of Death still looming on the horizon to this day.

Not to mention consoles are a different economic problem: Sony and MS sell them at zero margin because they can recoup the cost elsewhere. You can't make that same argument for consumer class hardware.
 
You could argue that GPUs will need to become relatively more powerful, not less: we're only at the start of a massive increase in resolutions. And that's a quadratic thing. Eventually it will top out, but not before bandwidth and calculation requirements for GPUs have gone up by much more than those required for the CPU.
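To spell out the quadratic part (plain arithmetic on common resolutions; the point is how the per-frame pixel load scales while the per-frame CPU work mostly doesn't):

```python
# Pixel count grows with the square of the linear resolution; the CPU-side
# work per frame (game logic, draw-call submission) is largely independent.
base = 1280 * 720
for w, h in [(1280, 720), (1920, 1080), (3840, 2160), (7680, 4320)]:
    print(f"{w}x{h}: {w * h / 1e6:5.1f} MPixel ({w * h / base:4.1f}x the 720p load)")
```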
 
DavidGraham said:
Yes you did
Umm... where? Please find the exact post and quote it. Otherwise you are flat out lying.

DavidGraham said:
the problem with your theory is that you insist CPUs will stagnate
I never said this either. Please stop putting words in my mouth.
 
I'm sure you can put an APU + memory stacks on a socket (a wide or slightly bigger one for the better models, like a Pentium Pro). But please keep 64-bit or 128-bit DDR4 there, depending on whether it's a low end or higher end platform. Then run PCIe lanes, video outputs, random I/O, etc. through the socket pins.

I'm not very pleased by the notion of a 200W APU, though. Have you seen an OEM consumer desktop PC? The cases haven't changed in a decade: often a micro-ATX tower with one rear 80mm fan (if it's installed at all), a PSU with either an 80mm or 120mm fan, and some low cost CPU heatsink (though much bigger than in the 90s) like the Intel stock one.
That is all quite decent: this design allowed building Pentium 4 Prescott machines that actually worked.

At 220W or 225W, it seems like folly. Not everyone buys $150 cases and huge heatpipe heatsinks with fans that blow sideways (and maybe fail to cool the VRMs if there's no other airflow). OEMs use the cheapest motherboards, and even among low end boards for the parts market we've had boards that only take a 95 watt CPU (if that, as FX 4xxx can be troublesome), not a 125 watt one.

Now you can have a socketed APU and upgrade or downgrade whatever the hell you want, but that flexibility might be useless if you want to buy a 220W APU and your motherboard only takes a 100W one. Or the cost of the bigger power circuitry gets passed on to everyone, while the cooling remains barely adequate.
I fear it would be all too easy to end up with lots of throttling or instability due to power and cooling... on a desktop.
 
Yes. One PCB with an APU socket, power conversion components, I/O only for peripherals (USB, HDMI, audio out, etc.). I guess even external RAM is bound to disappear eventually when the APUs start carrying enough memory.
On desktops, Mini-ITX should be standard by then, IMO.

Of course, such chips are likely to cost less than the CPU+GPU+RAM discrete equivalents, but not much less since I think the tech companies will take advantage of the cheaper BoM to make more money in the end.

I would like to keep Micro-ATX at least. Not everyone wants a Mac mini or a Mac Pro, and PCs may serve very different needs and configurations. I like being able to open the side panel and put a hard drive in without it looking like I'm disassembling and reassembling the system. I even like the card slots (though I only use two cards currently).

PCs have always had that maximal flexibility and cheapness (well, except when they were 50x more expensive than a C64) because of the card slots. You can add anything, be it additional network cards, 10GbE networking, a card with four RS232 ports, more USB controllers or whatnot, and they're all real devices on PCIe/PCI, with low latency and low overhead.

To take it to the extreme... I would not like ending up with a single micro-USB 3.0 port, HDMI/DisplayPort, and that's all.
 
You could argue that GPUs will need to become relatively more powerful, not less: we're only at the start of a massive increase in resolutions. And that's a quadratic thing. Eventually it will top out, but not before bandwidth and calculation requirements for GPUs have gone up by much more than those required for the CPU.

I'm not so sure. It's definitely happening for phones and tablets, but for PCs I'm afraid it's not, or at least much slower. The sheer number of laptops with 1366×768 displays, even 15.6" ones, is (depressingly) staggering when you consider how many phones have 1920×1080 displays, or even 2560×1440 ones now (though that's possibly pointless).

I'm sure 4K monitors will become more or less standard at some point, but I'm not sure it will happen at a pace that will require a faster increase in GPU power than is usual.
 
I'm not so sure. It's definitely happening for phones and tablets, but for PCs I'm afraid it's not, or at least much slower.

Well, I'm usually morosely pessimistic about these things, but I also can't help playing contrarian :)
The past few years may well be an anomaly brought about by the recession. I don't have graphs, so maybe my memory is playing selectivity tricks on me, but hard drive densities, which I have been waiting to see revive, seem to be growing again. My 2TB-drive-bearing NAS, which I populated back in 2010, may finally be due for an upgrade. 6TB Reds (w/ NASware 3.0) are (or will shortly be) shipping, and Seagate is sampling 8TB drives. Monoprice and ASUS both have (or will shortly, in the case of Monoprice) reasonably priced 4K monitors, although I'm waiting for an IPS panel before I take the plunge. 4K live content is coming -- the AX100 is a pretty affordable camcorder, and although I personally need 60fps, I'd have already bought one if I didn't shoot so many sports-related vids. 10GbE is threatening to sputter to life (and I'll need it if I plan to edit the 4K files I store on the NAS) -- there was a series of articles written about that over on SmallNetBuilder.

My real problem is CPU cycle costs. There is at least a small sliver of a reason to be optimistic that the present situation is a diversion from baseline growth. Just, y'know, none based on any of the rumors regarding Intel's legacy desktop line ;^| OTOH, if the "news" nowadays is that Broadwell is going into tablets, maybe we can hope that the workstation parts fall in price and become the new desktop platform. We certainly need the bandwidth if we hope to feed more cores....
 
Haswell is in tablets already, with the CPU and chipset next to each other on a package. Maybe it's just the Microsoft tablet, but it's there.

The lineup is already staged: processors ending in -Y are the very low power, likely tablet, parts.
-U is about 15 watts (ultrabook laptops), -M is regular laptops, -T is low power desktop, -S is slightly low power desktop (capped at 65 watts), and no suffix (or K) is the regular stuff.

So there's no reason in particular for Intel to let its range "slide down" so you get higher end stuff for cheaper.
I think they're going to add other steps to the range, even. In 2015 on the desktop you'll have Skylake-S (4 cores, 16 PCIe lanes, overclocking disabled) < Broadwell-K (4 cores, L4 cache, 16 PCIe lanes) < Haswell-E (6 cores, 24 PCIe lanes) < Haswell-E (6 cores, 40 PCIe lanes).

I think those are the numbers of PCIe lanes managed directly by the CPUs, not counting the DMI link.
 
It's not even clear what is best for you lol.
If you're a memory bandwidth/latency junkie, Broadwell-K seems great.
If you want cores, Haswell-E.
If you're worried about PCIe bandwidth (graphics card at a full x16 3.0, an M.2 PCIe SSD and 10GbE at once), then Haswell-E.

But Skylake has a new GPU and new, very wide CPU instructions (AVX-512) which might be useful in niches like video editing, so it could be good for your needs once your software implements it. :p
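To put the lane worry into numbers, here's a rough sketch of the PCIe 3.0 budget for a build like the one above (x16 graphics, x4 M.2 SSD, plus a 10GbE NIC; ~985 MB/s per lane is the usual ballpark after encoding overhead, and the per-device lane counts are assumptions):

```python
MBPS_PER_LANE = 985  # PCIe 3.0: 8 GT/s with 128b/130b encoding, ballpark

devices = {
    "GPU (x16)":         16,
    "M.2 PCIe SSD (x4)":  4,
    "10GbE NIC (x4)":     4,
}

total = sum(devices.values())
print(f"Lanes wanted: {total} (vs. 16 on a mainstream CPU, 40 on Haswell-E)")
for name, lanes in devices.items():
    print(f"  {name}: ~{lanes * MBPS_PER_LANE / 1000:.1f} GB/s peak")
```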
 
It's not even clear what is best for you lol.

Oh, there's a definite hierarchy.
I need cores. Badly. Some of my timelines are yielding somewhere in the neighborhood of 2-4qps real-time, which is just painful. I also need lanes -- 10GbE, a gfx card (titling and some effects), a capture card, and a native YUV output card. I don't edit locally (it's a pain copying files around), so my need for an SSD is not all that high (but that IS why I need/want 10GbE). I doubt I'm memory bound until I start dealing with 4K non-long-GOP formats.

But Skylake has a new GPU and new, very wide CPU instructions (AVX-512) which might be useful in niches like video editing, so it could be good for your needs once your software implements it. :p

They require SSSE3, not AVX. There's always compatibility to consider, so AVX-512 support is not something I'd expect in the short-to-medium term :) They do support Quick Sync, which is helpful when generating video, but not terribly useful otherwise.

I'd dearly love to see Edius ported to take advantage of gfx cards. Not sure how likely that is, given their continued resistance. Their Quick Sync doc indicates that *memory bandwidth* is, in fact, an issue. I'm not sure how much I buy that, with full-frame 1080-60p video taking, what, 1/2 GB/s? Nevertheless, it may be useful as evidence in the larger question under discussion: http://www.grassvalley.com/docs/App...onal/edius/PRV-4140M_EDIUS_SandyBridge_AN.pdf
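For what it's worth, the back-of-the-envelope behind that figure (assuming uncompressed 8-bit video; 4:2:2 at 2 bytes/pixel is a common full-frame editing format, other pixel formats scale accordingly):

```python
def video_rate_gbs(width, height, fps, bytes_per_pixel):
    """Uncompressed video data rate in GB/s."""
    return width * height * fps * bytes_per_pixel / 1e9

# Full-frame 1080p60 in 8-bit 4:2:2 (2 bytes per pixel)
print(video_rate_gbs(1920, 1080, 60, 2))   # ~0.25 GB/s

# Same frame size in 8-bit 4:4:4 / RGB (3 bytes per pixel)
print(video_rate_gbs(1920, 1080, 60, 3))   # ~0.37 GB/s
```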

-Dave
 
I'm not so sure. It's definitely happening for phones and tablets, but for PCs I'm afraid it's not, or at least much slower. The sheer number of laptops with 1366×768 displays, even 15.6" ones, is (depressingly) staggering when you consider how many phones have 1920×1080 displays, or even 2560×1440 ones now (though that's possibly pointless).

I'm sure 4K monitors will become more or less standard at some point, but I'm not sure it will happen at a pace that will require a faster increase in GPU power than is usual.

Don't forget that as game engines become more realistic you have to put more work into each pixel. Given a choice between higher quality rendering vs. more pixels, I'll have to go with the rendering.
 
Don't forget that as game engines become more realistic you have to put more work into each pixel. Given a choice between higher quality rendering vs. more pixels, I'll have to go with the rendering.

Sure, but that's not a new phenomenon.
 
I have been thinking about this a lot lately because Linus Torvalds said something like dGPUs will no longer be made 10 years from now. I have always thought Intel (but not just Intel) went down the wrong path with iGPUs (due to the versatility of programmable rendering and how decent its performance could be), so I want to hear what the experts here have to say about them (advantages, disadvantages, and alternatives).
Ten years from now we'll have AVX-1024, and the CPU and GPU will unify.

GPUs will cease to exist, but only as we know them today. Fixed-function GPUs have long been dead and buried. Non-unified GPUs have long been dead and buried. Both have been replaced by hugely inefficient fully programmable unified computing devices. Of course they're only inefficient in terms of power and area if they were implemented on the silicon processes used back when GPUs were fixed-function and non-unified. Nobody has shed a tear about their complete disappearance, because silicon technology advanced and we didn't lose anything in absolute terms. Programmability and unification instead brought us a lot more functionality, and offered higher efficiency for things the old architectures weren't designed for.

So the death of the GPU will be a joyous moment as well. We'll get a new breed of processors that fully supersede its functionality and will extract maximum amounts of ILP, TLP and DLP from any code you throw at it. Of course you can also think of it as a continuation of the GPU, or of the CPU for that matter, but the way we know either of them today will cease to exist.

This future is inevitable due to both the Memory Wall and Amdahl's Law. The Memory Wall stems from computing power increasing at a faster rate than memory bandwidth. This has been a consistent law through each decade of computing. It has resulted in GPUs with memory interfaces running at 6+ GHz. Continuing down that path to satisfy the GPU's bandwidth hunger is impossible. It needs to vastly reduce the number of threads being processed in parallel, so that it can benefit from hierarchical caches. This makes it inherently more CPU-like. It will also help with Amdahl's Law by requiring less parallelization and processing sequential dependencies faster. Meanwhile, nothing is stopping the CPU from becoming much more GPU-like by widening its vector units and adding SIMT capabilities like gather/scatter and lane predication.
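One hedged way to see the squeeze: divide peak compute by peak DRAM bandwidth to get the arithmetic intensity (FLOPs per byte) a chip needs just to keep its ALUs fed. The figures below are purely illustrative, not specs of any real product:

```python
# Roofline-style arithmetic: required FLOPs per DRAM byte to stay busy.
# Numbers are made up to show the trend, not real product specs.
chips = {
    "GPU, circa 2008":        (1.0e12, 100e9),   # 1 TFLOPS, 100 GB/s
    "GPU, circa 2014":        (5.0e12, 300e9),   # 5 TFLOPS, 300 GB/s
    "hypothetical 2020 part": (20e12, 800e9),    # 20 TFLOPS, 800 GB/s
}
for name, (flops, bandwidth) in chips.items():
    print(f"{name}: needs ~{flops / bandwidth:.0f} FLOPs per DRAM byte")
# Compute grows faster than bandwidth, so the required data reuse per byte
# climbs -- which is exactly what caches (and fewer threads) provide.
```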

So everything on one die isn't just a good idea, it's the only way forward. Both CPUs and GPUs started out as multiple chips (e.g. the i387SX coprocessor, the Pentium II's cartridge with separate L2 cache chips, or the Voodoo 2's two TMUs and one FBI). Integration has provided many benefits, and we certainly haven't seen the end of it yet. Even at the chip level there's a lot of integration and unification opportunity left. Heterogeneous computing is merely a step in between: it moves past functional separation by exploiting functional overlap, but ultimately leads to homogeneous unification.
 
This future is inevitable due to both the Memory Wall and Amdahl's Law. The Memory Wall stems from computing power increasing at a faster rate than memory bandwidth. This has been a consistent law through each decade of computing. It has resulted in GPUs with memory interfaces running at 6+ GHz. Continuing down that path to satisfy the GPU's bandwidth hunger is impossible. It needs to vastly reduce the number of threads being processed in parallel, so that it can benefit from hierarchical caches. This makes it inherently more CPU-like. It will also help with Amdahl's Law by requiring less parallelization and processing sequential dependencies faster. Meanwhile, nothing is stopping the CPU from becoming much more GPU-like by widening its vector units and adding SIMT capabilities like gather/scatter and lane predication.

The Memory Wall is exacerbated by the co-location of code and data and MIMD-style, multi-core execution. In that sense, the future may be more GPU-like than CPU. It isn't clear to me how serial architectures are better at dealing with higher latencies either. The reason (well, one of them) why we only have quad-core Intel CPUs is that there isn't the bandwidth on the 115X sockets [one of the reasons why I only compared the 860 to the 4770s, instead of the 9X0s, which are on a different class socket entirely]. Whatever you think about the GPUs and memory bandwidth, CPUs are already up against the wall.

I do think we're in agreement that the issue here is memory bandwidth. I think you'll get no argument from anyone that putting a large amount of memory very near the CPU and very near the GPU would be awesome. I would love to have 100 MIMD cores and 10k SIMD cores to play with. It's less clear to me that I need to have a large number of MIMD cores co-located with my SIMD cores. Even if I accept that there's a large crossover between those worlds, it isn't clear to me whether the world belongs to traditional CPUs with vector style extensions, or GPUs with (for example) a "real" core per SMX.

I do hope that we get a few years of CPUs with vector instructions, and GPUs with arm/whatever cores on them. That's where all the fun is! Once hardware gets homogenized, we have to put our coding straightjackets back on :(
 
The Memory Wall is exacerbated by the co-location of code and data and MIMD-style, multi-core execution. In that sense, the future may be more GPU-like than CPU. It isn't clear to me how serial architectures are better at dealing with higher latencies either. The reason (well, one of them) why we only have quad-core Intel CPUs is that there isn't the bandwidth on the 115X sockets [one of the reasons why I only compared the 860 to the 4770s, instead of the 9X0s, which are on a different class socket entirely]. Whatever you think about the GPUs and memory bandwidth, CPUs are already up against the wall.

I do think we're in agreement that the issue here is memory bandwidth. I think you'll get no argument from anyone that putting a large amount of memory very near the CPU and very near the GPU would be awesome. I would love to have 100 MIMD cores and 10k SIMD cores to play with. It's less clear to me that I need to have a large number of MIMD cores co-located with my SIMD cores. Even if I accept that there's a large crossover between those worlds, it isn't clear to me whether the world belongs to traditional CPUs with vector style extensions, or GPUs with (for example) a "real" core per SMX.

I do hope that we get a few years of CPUs with vector instructions, and GPUs with arm/whatever cores on them. That's where all the fun is! Once hardware gets homogenized, we have to put our coding straightjackets back on :(

Are those two things really all that different? Isn't it mostly semantics at this point?
 