|
|
#1 |
|
Itchy
Join Date: Feb 2002
Location: United Queendom
Posts: 2,605
|
Hi there
I was wondering why it takes so long to design a new CPU, yet the GPU designers are cranking out new models at a very quick pace (typically every 18 months). Intel have their tick-tock strategy and AMD is not able to compete, but even looking at Intel's strategy, the latest major revision was the Core 2, and the i7 has been an evolution of that design. With GPUs we have what seem like major revolutions a lot quicker. I understand that GPU architects are still adding features as well as performance, whereas CPU designers have the simpler task of adding more performance. Why does GPU design take less time than CPU design from a purely technical point of view? Is it the fabs, validation, or what the CPU and GPU are tasked with? Any comments would be appreciated. |
|
|
|
|
|
#2 |
|
hardware monkey
Join Date: Mar 2007
Posts: 3,831
|
I'd say there's a lot less validation that goes into a GPU than a CPU, since GPUs are specialized processors they don't need to be tested with eleventy billion applications. Also GPUs can produce errors and still be "good enough" whereas CPUs need to return the correct result for any given calculation 100% of the time.
There are a lot more reasons than that, but with all the engineers we have around here I think they'd be better suited to fill in the other details. |
|
|
|
|
|
#3 |
|
Junior Member
Join Date: Nov 2006
Posts: 27
|
Don't really know but I'd guess that CPUs are a mature technology while GPUs are still on a steep IP growth curve. As ShaidarHaran noted, CPUs are also significantly constrained by backwards compatibility and that requires them to behave in certain ways. The interface to GPUs, on the other hand, is more abstract and the required behavior is far less constrained. They're basically large array processors. Force them to work on random memory with tons of branching and I bet their performance would suffer significantly.
|
|
|
|
|
|
#4 |
|
Member
Join Date: Aug 2005
Posts: 170
|
Maybe that's why Larrabee is taking so long??
|
|
|
|
|
|
#5 | |
|
Senior Member
Join Date: Jun 2003
Posts: 2,073
|
Quote:
I'm not so sure the appearance matches the reality as much as people think. First, I think it is appropriate to question the actual rate at which "major" revisions to GPUs are done. GPUs are currently on a development cycle much closer to that of CPUs than most people realize. In the old days this wasn't as true, but currently they are primarily constrained by fab advances, just like everyone else. Second, the GPU development cycle has slowed down due to the increase in verification required. As GPUs become more and more programmable, the amount of time required for verification will only go up. Third, in the past GPUs could get away with being fairly buggy and relying on software to cover up the issues, which, with the increase in programmability, is no longer possible. Also, a lot of the so-called "major" revisions are actually fairly minor. If you look at the last major introduction of new GPU designs from both Nvidia and ATI, you pretty much have to go back to ~G80 and ~RV670 to find any significant difference. In addition, I would wager that the levels of physical optimization between GPUs and CPUs are totally different, along with various other aspects like DPM (defects per million), etc. There will always be different overhead when you know you will be shipping in the hundreds of millions vs the millions for a given design. There are even significant differences between mainstream CPUs and non-mainstream CPUs. I'd wager that both Intel and AMD have lower DPM targets than something like Power5/6/7, where they can worry about catching the marginal DPM in the machine burn-in vs in the fab on the tester. Let's put it this way: I haven't ever received a dead CPU, but I have received several marginal/dead graphics cards, and have had several graphics cards that died while the rest of the system was fine.
__________________
Aaron Spink speaking for myself inc. |
|
|
|
|
|
|
#6 |
|
Senior Member
Join Date: Dec 2004
Location: France
Posts: 2,476
|
CPUs are more constrained by power, and sockets can't change every year. They all have to be suitable for OEMs, be it consumer or server parts, and have to be available for a longer time.
A GPU sits on a card in an expansion slot. It only has to conform to PCIe specifications; otherwise the board can be anything. GPUs get away with multiple additional power connectors, and there's a much bigger power range from low end to high end. They do whatever they want. There's backwards compatibility on GPUs too, but it's small stuff (VGA, CGA, text mode, VESA) in the 2D engine and firmware; the rest (DirectX 5, 6, 7 etc., OpenGL) is done through software (drivers, implementation). |
|
|
|
|
|
#7 |
|
Now Officially a Top 10 Poster
Join Date: May 2006
Location: Maastricht, The Netherlands
Posts: 8,734
|
I've learnt a little while ago that Larrabee is basically a GPU, so that's not the reason.
Apart from that, the demands on a GPU are first of all simply higher, so the drive for improvements is larger, but GPUs are also more specialised towards vector-style calculations, which allows for more specialised in-order designs that are much easier to turn into multi-core configurations. We're slowly getting there now in the desktop space, but Windows (and desktop software development in general) has so far not done a great job of providing an environment that can easily benefit from multiple cores. Apple's work on Snow Leopard (Grand Central Dispatch) is interesting in that regard. |
|
|
|
|
|
#8 | ||
|
Senior Member
|
Quote:
Quote:
|
||
|
|
|
|
|
#9 |
|
Senior Member
Join Date: Jan 2008
Posts: 268
|
I think some issues have been mentioned here, but I'll add a few:
1. Abstraction layers: GPUs have a complete software abstraction layer (DX, OGL, PTX) that includes a compiler. CPUs have the x86 ISA, which is constantly changing and not well codified, and you don't control the compiler.
2. Validation: Validation is easier because of 1, and also easier since correctness isn't quite so important and you can fix many things in your compiler.
3. Legacy code: GPU legacy code is almost all emulated/JIT'd. CPUs don't have that option.
4. Control logic: CPUs have way, way more control logic than GPUs. Control logic is where all the complexity lives; datapaths are pretty easy.
5. Component vs. system: GPU vendors design systems, CPUs are components. CPUs need to provide way more visibility into their operation (e.g. TDP, power reqs), whereas GPUs don't always need to. For instance, NV fully specifies all their high end cards. This also feeds back into the platform stability that was mentioned earlier.
6. Full custom vs. semi custom design: GPUs have to be rapidly ported to new processes (half nodes) on a regular basis. They tend to be less custom design than a CPU, which means lower design time. Note that GPUs are descended from ASICs and at one point were mostly synthesized, while CPUs used to have lots of dynamic logic that required huge amounts of manual effort.
7. RAS: CPUs have lots of reliability, availability and serviceability features. GPUs don't.
Anyway, this is just a list of a few items.
DK
__________________
www.realworldtech.com |
|
|
|
|
|
#10 |
|
Itchy
Join Date: Feb 2002
Location: United Queendom
Posts: 2,605
|
Thanks for the replies everyone.
Recently Jen-Hsun Huang mentioned that GPU processing power was set to increase 570x whereas CPU power would increase 3x over the same time frame of six years. Apart from being marketing drivel and exaggeration, there is some basis in what Jen-Hsun Huang was saying, in so far as GPU performance is increasing faster than CPU performance over a given time frame and GPUs will become more useful in other applications. If so, is it still not possible to design a Crusoe-type CPU that emulates and relies on software for the rather mundane task of backward compatibility, and then perhaps increase performance? Also, what is to stop AMD/Intel designing chipsets and memory controllers that take advantage of the latest DRAM technologies, for example? I have heard that the next step in DRAM for mainboards is DDR4, but we know it was not very successful for GPUs, whereas GDDR5 is. Why are AMD/Intel not able to take advantage of the new technologies faster? The incubation time from concept to reality seems incredibly long and (thus) much more expensive compared to GPU manufacture. I believe DDR5 availability for mainboards would follow if AMD/Intel built their CPUs around it relatively quickly. |
|
|
|
|
|
#11 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,700
|
CPUs and GPUs have very different design targets as far as memory support is concerned.
CPUs must support expandable DIMM-based memory pools of commodity RAM. DDR3 is the latest technology for that. GDDR5 on GPUs is soldered to the PCB, offers no expandability, does not offer ECC, and has very limited capacity. As it stands, the current form of GDDR5 is wholly inappropriate for CPUs.
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#12 |
|
Junior Member
Join Date: Jul 2005
Posts: 22
|
He only said that GPU processing power would increase 1.5 times per year over the next six years (i.e. 50% faster each year), or about 11.4 times as powerful in six years.
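The compounding behind these figures is easy to check. Below is a minimal Python sketch; the function names are made up for illustration, and the inputs are just the numbers quoted in this thread (1.5x per year, 570x and 3x over six years), not independently verified claims.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total multiplier after compounding `rate_per_year` for `years` years."""
    return rate_per_year ** years


def implied_annual_rate(total_multiplier: float, years: int) -> float:
    """Per-year multiplier that compounds to `total_multiplier` over `years` years."""
    return total_multiplier ** (1.0 / years)


if __name__ == "__main__":
    years = 6  # time frame quoted in the thread
    print(f"1.5x per year for {years} years -> {compound_growth(1.5, years):.2f}x total")
    print(f"570x over {years} years -> {implied_annual_rate(570, years):.2f}x per year")
    print(f"3x over {years} years -> {implied_annual_rate(3, years):.2f}x per year")
```

Run as a script, this shows that 1.5x per year compounds to roughly 11.4x in six years, while a 570x total over the same period would need close to a tripling every year, so the two figures cannot describe the same growth rate.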
|
|
|
|
|
|
#13 | ||
|
Senior Member
Join Date: Jan 2003
Location: Ghent, Belgium
Posts: 1,332
|
Quote:
Sure, we've seen some spectacular performance increases from GPUs in the last decade. But let's not forget we went from passively cooled chips to multi-GPU systems that require a 1000 Watt power supply. At the same power consumption, things are really not that impressive. Actually GPUs are hitting a second wall as well: die size. Even if you manage to keep power consumption in check, they can't keep increasing the die size the way they have before as a means to increase performance. And while multi-die solutions can increase yields, they increase packaging cost, require inter-die communication and don't help wafer cost. So die size growth slows too. Also, GPUs have to spend an increasing amount of die space on control logic, registers and caches to become more programmable and flexible. They have to invest transistors in things that CPUs can already take for granted. Last but not least: contrary to GPUs, the CPU actually has headroom for improvement beyond Moore's Law. Back in the Pentium 4 days a doubling of the number of transistors did not nearly double the performance. Nowadays the focus isn't on achieving the highest possible clock frequency any more, but on actually optimizing performance per Watt. Given the starting point, there has been a spectacular increase in computational density and there's still potential for a lot more. AVX and FMA alone would increase performance fourfold with only a minor increase in transistor count! What Jen-Hsun probably referred to was double-precision performance. But from a technology standpoint a 570-fold increase doesn't mean anything. They could have had a single double-precision ALU at 1 Hz in their current chip and claim a gazillion-fold increase in their next chip. Meh. Once you start comparing apples to apples, CPUs are gaining performance faster than GPUs are, and will continue to do so for at least the entire next decade. In the end they're bound by the same physical laws. Quote:
Last edited by Nick; 01-Sep-2009 at 13:02. |
||
|
|
|
|
|
#14 | |
|
Senior Member
|
Quote:
FMA is a one-off thing. You can scale vector widths, but there is a severe performance cliff there too. LRB's float16 was chosen after taking a lot of such factors into consideration. And so while we could have float32 AVX some day, I doubt it will happen. |
|
|
|
|
|
|
#15 | |||
|
Senior Member
Join Date: Jan 2003
Location: Ghent, Belgium
Posts: 1,332
|
Quote:
Quote:
Either way, sure, they haven't forgotten about single-threaded performance. But I don't see why they should. Clock frequency and IPC are still important factors, and if they can increase performance more by spending transistors on those things instead of more cores, then that's a more optimal design. NVIDIA also placed its bets on fewer shader cores that are clocked higher, and RV790 was all about keeping a small chip size while cranking up the frequency. Quote:
Yes, that's because CPUs are lousy to begin with. But that's what the headroom argument is about. GPUs have the same kind of headroom for double-precision operations, but that matters much less (except when doing scientific calculations). For single-precision operations GPUs no longer have any "one off" up their sleeves. In fact, by making each ALU capable of double-precision operations they lower computational density for single-precision... |
|||
|
|
|
|
|
#16 |
|
Senior Member
|
|
|
|
|
|
|
#17 | |
|
Senior Member
Join Date: Jan 2003
Location: Ghent, Belgium
Posts: 1,332
|
Quote:
CPUs still have to cater for single-threaded software, which means that simultaneously improving effective IPC and adding more cores is the best strategy. But that's just today's situation. In the future, people will care less about performance improvements of ancient single-threaded software, and focus on the performance of the multi-threaded software they run. Besides, it's not like Intel or AMD invested a lot in IPC lately. Core 2 Duo versus Core i7, and K9 versus K10: they spent only a couple of percent of transistors on IPC and twice the entire transistor budget on doubling the number of cores. In that perspective things have changed in revolutionary ways since the Pentium 4 / K8. It's not unlikely for Larrabee's successor to follow the same route to some extent. If they see an opportunity to increase effective utilization by a significant amount with few extra transistors, then that's obviously better than spending it on additional badly utilized cores. In fact I expect that to become more important over time, since even the most embarrassingly parallel workload doesn't scale perfectly. So at some point it's more efficient to focus on making threads run faster than to add more cores to run more threads. We're already seeing this effect with graphics on the GPU. An increase in core count lowers relative utilization, even when everything else scales by the same factor. |
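One way to picture the "faster threads versus more cores" trade-off is Amdahl's law. That model is not named in the post, so treat the following as a rough Python sketch under assumed numbers: the 95% parallel fraction and the 1.2x per-thread gain are arbitrary illustrative values, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def utilization(parallel_fraction: float, cores: int) -> float:
    """Achieved speedup divided by core count (1.0 would be perfect scaling)."""
    return amdahl_speedup(parallel_fraction, cores) / cores


if __name__ == "__main__":
    p = 0.95               # assumed parallel fraction of the workload
    per_thread_gain = 1.2  # assumed gain from making every thread run faster
    for cores in (4, 8, 16, 32, 64):
        base = amdahl_speedup(p, cores)
        doubled_cores = amdahl_speedup(p, 2 * cores)
        faster_threads = base * per_thread_gain  # speeds up serial and parallel parts
        better = "faster threads" if faster_threads > doubled_cores else "more cores"
        print(f"{cores:3d} cores: utilization {utilization(p, cores):.2f}, "
              f"2x cores -> {doubled_cores:.1f}x, "
              f"1.2x threads -> {faster_threads:.1f}x ({better})")
```

With these assumptions, doubling the core count wins at low core counts, but utilization keeps falling, and at 64 cores the modest per-thread improvement gives the larger overall speedup.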
|
|
|
|
|
|
#18 | |
|
Itchy
Join Date: Feb 2002
Location: United Queendom
Posts: 2,605
|
Quote:
Integrating memory controllers and PCI-E controllers, adding more cache, QPI/HT3.0: all of these help with IPC in given situations. |
|
|
|
|
|
|
#19 | ||||
|
Senior Member
|
Quote:
Quote:
Quote:
Quote:
GPUs do have a trick up their sleeve though (not sure if they'll follow that trail, for various reasons): increasing clock speed. RV770 runs at 750 MHz, GT200 runs at ~1.5 GHz. They can still scale along that axis. And last but not least, don't underestimate the burden of backward compatibility. For GPUs, all code gen is dynamic / at runtime. So they can aggressively drop old and useless crap. |
||||
|
|
|
|
|
#20 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,700
|
To give an example, the 65nm single-core Pentium 4 was capable of 4 SP ops per clock in an area of 80 mm².
The 45nm Nehalem, with 4 cores, is capable of 8 SP ops per core per clock, for a total of 32 per clock in an area of 263 mm². The P4 had a paltry op density of .05 per mm². Nehalem has .12 per mm², more than double. For reference, the 80nm R600 sported 320 units on a 430 mm² die. The 55nm RV770 increased this count to 800 on a 256 mm² die. This gives us .74 versus 3.1 per mm², respectively, before counting FMAD as double. The clocks are somewhat lower for the later GPU, while the CPUs could be found in the same speed grades. So as we can see, the recent history of CPUs shows them growing FLOPs by a staggering ~2x, while GPUs scaled at a mere 4x on an inconsequential order of magnitude difference in base capability.
__________________
Dreaming of a .065 micron etch-a-sketch. Last edited by 3dilettante; 01-Sep-2009 at 20:39. |
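The density figures in the post above can be reproduced with a few lines of Python. This is only a sketch: the op counts and die areas are taken straight from the post, and the table and function names are made up for illustration.

```python
# (SP ops or ALU units per clock, die area in mm^2), as quoted in the post
CHIPS = {
    "Pentium 4 (65nm)": (4, 80.0),
    "Nehalem (45nm)": (4 * 8, 263.0),  # 4 cores x 8 SP ops per core per clock
    "R600 (80nm)": (320, 430.0),
    "RV770 (55nm)": (800, 256.0),
}


def op_density(ops_per_clock: int, area_mm2: float) -> float:
    """Ops per clock per square millimetre of die."""
    return ops_per_clock / area_mm2


densities = {name: op_density(*spec) for name, spec in CHIPS.items()}
cpu_gain = densities["Nehalem (45nm)"] / densities["Pentium 4 (65nm)"]
gpu_gain = densities["RV770 (55nm)"] / densities["R600 (80nm)"]

if __name__ == "__main__":
    for name, density in densities.items():
        print(f"{name:18s} {density:.2f} ops/clock/mm^2")
    print(f"CPU density gain: {cpu_gain:.1f}x, GPU density gain: {gpu_gain:.1f}x")
```

This gives a density gain of about 2.4x for the CPU pair and about 4.2x for the GPU pair, with the GPUs starting from roughly an order of magnitude higher density, before counting FMAD as two ops and ignoring clock speed.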
|
|
|
|
|
#21 | |
|
Regular
Join Date: Feb 2002
Location: California
Posts: 4,729
|
Quote:
The fact that CPUs must operate with commodity DIMMs is not really relevant; it's purely a manufacturing decision by the PC industry that sacrifices performance for flexibility. The vast majority of consumers don't really care how the RAM is wired up. Lots of people buy iPhones and Macs with non-removable batteries or hard-to-change RAM, and people buy consumer electronics goods and consoles built with hardwired manufacturing techniques. Laptops seem to be extremely popular these days, and people often just buy a whole new machine rather than try to upgrade it. There is an assumption that what people love about PCs is how interchangeable the parts are, but I think that assumption really only applies to a niche market, and that the broad market for computers could be very vertically integrated (like Apple has done) without most people giving a shit. The reality is, an ALU is cheaper than a CPU core, and if your problem is embarrassingly parallel, then packing in the ALUs is better than throwing in more general-purpose cores. For most of the workloads modern PCs run (outside games and multimedia), there is very little gain from additional CPU-core-level parallelism. Moreover, with the move to cloud-based computing and the web browser dominating the CPU time of most PCs out there, the purely single-threaded nature of JavaScript and the browser core means a lot of power is simply going to waste. You could put 64 Nehalem cores on a chip and it wouldn't speed up the subjective latency of most applications that people use on a daily basis, but scaling GPGPUs certainly does lead to a very measurable difference in games. Thus, I would say, scaling CPU cores these days matters more for server environments, like Google's data centers, and on the client side, at this point, no one will notice much of a difference except in the few applications that tax a system, e.g. games, multimedia, content-creation apps. |
|
|
|
|
|
|
#22 | |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,700
|
Yes. I was debating about adding a smiley.
Although the 65nm Cedar Mill was a 2006 chip, cutting the timeframe down. Penryn would have a better ratio, as Nehalem's uncore reduced the amount of die devoted to CPU cores. If I started with Penryn, however, CPUs would have regressed. Quote:
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
|
#23 |
|
Regular
Join Date: Feb 2002
Location: California
Posts: 4,729
|
I'm glad, usually your analysis is super rock-solid, so when I saw this, I was like "oh, wait, he's obviously joking" given the adjectives used, but I doubted myself and thought maybe there was actually a valid argument I wasn't seeing and that the CPUs really were making more impressive gains.
|
|
|
|
|
|
#24 | |
|
Senior Member
Join Date: Jun 2003
Posts: 2,073
|
Quote:
__________________
Aaron Spink speaking for myself inc. |
|
|
|
|
|
|
#25 |
|
Senior Member
Join Date: Jun 2003
Posts: 2,073
|
They can, but with their design styles and experience it is likely to be a net negative, as it will increase their power and their design cycles in addition to increasing their design complexity.
__________________
Aaron Spink speaking for myself inc. |
|
|
|
|
|