Old 31-Aug-2009, 01:01   #1
Tahir2
Itchy
 
Join Date: Feb 2002
Location: United Queendom
Posts: 2,605
Default GPU vs CPU Architecture Evolution

Hi there

I was wondering why it takes so long to design a new CPU yet the GPU designers are cranking out new models at a very quick pace (typically 18 months).

Intel have their tick tock strategy, AMD is not able to compete but even looking at Intel's strategy the latest major revision was the Core 2 and the i7 has been an evolution of that strategy.

With GPUs we have what seem like major revolutions a lot quicker. I understand that GPU architects are still adding features as well as performance, whereas CPUs then have the simpler task of adding more performance.

Why does GPU design take less time than CPU design from a purely technical point of view? Is it the fabs, validation, what the CPU and GPU are tasked with?

Any comments would be appreciated.
Old 31-Aug-2009, 01:40   #2
ShaidarHaran
hardware monkey
 
Join Date: Mar 2007
Posts: 3,831
Default

I'd say there's a lot less validation that goes into a GPU than a CPU: since GPUs are specialized processors, they don't need to be tested with eleventy billion applications. Also, GPUs can produce errors and still be "good enough", whereas CPUs need to return the correct result for any given calculation 100% of the time.

There's a lot more reasons than that but with all the engineers we have around here I think they'd be better suited to fill in the other details.
Old 31-Aug-2009, 04:37   #3
mikegi
Junior Member
 
Join Date: Nov 2006
Posts: 27
Default

Don't really know but I'd guess that CPUs are a mature technology while GPUs are still on a steep IP growth curve. As ShaidarHaran noted, CPUs are also significantly constrained by backwards compatibility and that requires them to behave in certain ways. The interface to GPUs, on the other hand, is more abstract and the required behavior is far less constrained. They're basically large array processors. Force them to work on random memory with tons of branching and I bet their performance would suffer significantly.
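To put a rough shape on the branching point, here is a minimal Python sketch. It assumes a simplified lock-step model in which a group of lanes shares one instruction stream and has to run both sides of a branch whenever its lanes disagree. The function name, the 32-lane width and the cycle costs are made up for illustration; real hardware is more subtle than this.

```python
def group_cycles(takes_branch, then_cost, else_cost):
    """Cycles for one lock-step group of lanes to get through an if/else.

    Toy model (an assumption, not any vendor's real scheduler): every lane
    in the group shares one instruction stream, so each side of the branch
    is executed once if at least one lane needs it.
    """
    cycles = 0
    if any(takes_branch):        # at least one lane wants the 'then' side
        cycles += then_cost
    if not all(takes_branch):    # at least one lane wants the 'else' side
        cycles += else_cost
    return cycles


LANES = 32                       # hypothetical group width
THEN_COST, ELSE_COST = 10, 10    # hypothetical cycle counts

coherent = [True] * LANES                        # every lane agrees
divergent = [i % 2 == 0 for i in range(LANES)]   # neighbouring lanes disagree

print("coherent :", group_cycles(coherent, THEN_COST, ELSE_COST))
print("divergent:", group_cycles(divergent, THEN_COST, ELSE_COST))
```

In this toy model the divergent group pays for both paths while the coherent group pays for one, which is the kind of penalty the post is guessing at.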
Old 31-Aug-2009, 04:42   #4
CNCAddict
Member
 
Join Date: Aug 2005
Posts: 170
Default

Maybe that's why Larrabee is taking so long??
Old 31-Aug-2009, 09:47   #5
aaronspink
Senior Member
 
Join Date: Jun 2003
Posts: 2,073
Default

Quote:
Originally Posted by Tahir2 View Post
Hi there

I was wondering why it takes so long to design a new CPU yet the GPU designers are cranking out new models at a very quick pace (typically 18 months).

Intel have their tick tock strategy, AMD is not able to compete but even looking at Intel's strategy the latest major revision was the Core 2 and the i7 has been an evolution of that strategy.

With GPUs we have what seem like major revolutions a lot quicker. I understand that GPU architects are still adding features as well as performance, whereas CPUs then have the simpler task of adding more performance.

Why does GPU design take less time than CPU design from a purely technical point of view? Is it the fabs, validation, what the CPU and GPU are tasked with?

Any comments would be appreciated.

I'm not so sure the appearance matches the reality as much as people think.

First, I think it is appropriate to question the actual rate at which "major" revisions to GPUs are done. GPUs are currently on a development cycle much closer to that of CPUs than most people realize. In the old days this wasn't as true, but currently they are primarily constrained by fab advances just like everyone else.

Second, GPUs have slowed down in development cycle due to the increase in verification required. As they become more and more programmable, the amount of time required for verification will only go up.

Third, in the past GPUs could get away with being fairly buggy and relying on software to cover up the issues, which is no longer possible with the increase in programmability.


Also, a lot of the so-called "major" revisions are actually fairly minor. If you look for the last major introduction of new GPU designs from both Nvidia and ATI, you pretty much have to go back to ~G80 and ~RV670 to find any significant difference.

In addition, I would wager that the levels of physical optimization between GPUs and CPUs are totally different, along with various other aspects like DPM, etc. There will always be different overhead when you know you will be shipping in the hundreds of millions vs the millions for a given design. There are even significant differences between mainstream CPUs and non-mainstream CPUs. I'd wager that both Intel and AMD have lower DPM targets than something like Power5/6/7, where they can worry about catching the marginal DPM in the machine burn-in vs in the fab on the tester.

Let's put it this way: I haven't ever received a dead CPU, but I have received several marginal/dead graphics cards. And I have had several graphics cards that died while the rest of the system was fine.
__________________
Aaron Spink
speaking for myself inc.
Old 31-Aug-2009, 14:43   #6
Blazkowicz
Senior Member
 
Join Date: Dec 2004
Location: France
Posts: 2,476
Default

CPUs are more constrained by power, and sockets can't change every year. They all have to be suitable for OEMs, be it consumer or server parts, and have to be available for a longer time.

A GPU sits on a card in an expansion slot; it only has to conform to PCIe specifications, otherwise the board can be anything. They get away with multiple additional power connectors and there's a much bigger power range from low end to high end. They do whatever they want.

There's backwards compatibility on GPUs too, but it's small stuff (VGA, CGA, text mode, VESA) in the 2D engine and firmware; the rest (DirectX 5, 6, 7 etc., OpenGL) is done through software (drivers, implementation).
Old 31-Aug-2009, 16:00   #7
Arwin
Now Officially a Top 10 Poster
 
Join Date: May 2006
Location: Maastricht, The Netherlands
Posts: 8,734
Default

Quote:
Originally Posted by CNCAddict View Post
Maybe that's why Larrabee is taking so long??
I've learnt a little while ago that Larrabee is basically a GPU, so that's not the reason. The real reason I think for LRB is that it's not just a hardware innovation, but much more a software one. They basically have to write something like Catalyst for it, but completely from scratch.

Apart from that, the demands on a GPU are first of all simply higher, so the drive for improvements is larger, but they are also more specialised towards vector-style calculations, which allows for more specialised in-order designs that are much easier to turn into multi-core configurations. We're slowly getting there now in the desktop space, but Windows (and desktop software development in general) has so far not done a great job in providing an environment that can easily benefit from multiple cores. Apple's work on Snow Leopard (Grand Central Dispatch) is interesting in that regard.
Old 31-Aug-2009, 16:06   #8
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 2,365
Default

Quote:
Originally Posted by Blazkowicz View Post
CPU are more constrained by power and sockets can't change every year. They all have to be suitable for OEMs, be it consumer or server parts, and have to be available for a longer time.
I disagree. Intel changes its platform almost every year. They are the only makers of their chipsets now, so Intel sure as hell can change sockets every year. Power is the fundamental limitation in VLSI circuits today, and it applies equally to CPUs and GPUs. GPU server farms/clusters aren't popular today, so GPUs can get away with much larger power budgets. As it is, they are already stretching the PCIe specs.

Quote:
A GPU takes place on a card in an expansion slot, it only has to conform to PCIe specifications, else the board can be anything.. They get away with multiple additional power connectors and there's a much bigger power range from low end to high end. They do whatever they want.
Mobos on the other hand have to conform to only the form factor. Outside that, they are pretty free to guzzle as much power as they want.
__________________
The views presented here are my own and do not represent my present or past employers' views in any way.
My blog
Eigen : simd done right
Old 31-Aug-2009, 23:06   #9
dkanter
Senior Member
 
Join Date: Jan 2008
Posts: 268
Default

I think some issues have been mentioned here, but I'll add a few:

1. Abstraction layers

GPUs have a complete software abstraction layer (DX, OGL, PTX) that includes a compiler. CPUs have the x86 ISA, which is constantly changing and not well codified and you don't control the compiler.

2. Validation

Validation is easier because of 1, also easier since correctness isn't quite so important and you can fix many things in your compiler.

3. Legacy code

GPU legacy code is almost all emulated/JIT'd. CPUs don't have that option.

4. Control logic

CPUs have way way more control logic than GPUs. Control logic is where all the complexity lives, datapaths are pretty easy.

5. Component vs. system

GPU vendors design systems, CPUs are components. CPUs need to provide way more visibility into their operation (e.g. TDP, power reqs), whereas GPUs don't always need to. For instance, NV fully specifies all their high end cards.

This also feeds back into the platform stability that was mentioned earlier.

6. Full custom vs. semi custom design

GPUs have to be rapidly ported to new processes (half nodes) on a regular basis. They tend to be less custom design than a CPU, which means lower design time. Note that GPUs are descended from ASICs and at one point were mostly synthesized, while CPUs used to have lots of dynamic logic that required huge amounts of manual effort.

7. RAS

CPUs have lots of reliability, availability and serviceability features. GPUs don't.

Anyway, this is just a list of a few items.

DK
__________________
www.realworldtech.com
Old 31-Aug-2009, 23:46   #10
Tahir2
Itchy
 
Join Date: Feb 2002
Location: United Queendom
Posts: 2,605
Default

Thanks for the replies everyone.

Recently Jen-Hsun Huang mentioned that GPU processing power was set to increase 570x whereas CPU power would increase 3x over the same time frame of six years.

Apart from being marketing drivel and exaggeration, there is some basis in what Jen-Hsun Huang was saying, in so far as GPU performance is increasing faster than CPU performance over a given time frame and GPUs will become more useful in other applications.

If so, is it still not possible to design a Crusoe type CPU that emulates and relies on software for the rather mundane tasks of backward compatibility and then perhaps increasing performance?

Also, what is to stop AMD/Intel designing chipsets and memory controllers that take advantage of the latest DRAM technologies, for example? I have heard that the next step in DRAM for mainboards is DDR4, but we know it was not very successful for GPUs, whereas GDDR5 is. Why are AMD/Intel not able to take advantage of the new technologies faster? The incubation time from concept to reality seems incredibly long and (thus) much more expensive compared to GPU manufacture. I believe DDR5 availability for mainboards would follow if AMD/Intel built their CPUs around it relatively quickly.
Old 01-Sep-2009, 00:10   #11
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,700
Default

CPUs and GPUs have very different design targets as far as memory support is concerned.

CPUs must support expandable DIMM-based memory pools of commodity RAM.
DDR3 is the latest technology for that.

GDDR5 on GPUs is soldered to the PCB, offers no expandability, does not offer ECC, and has very limited capacity.
As it stands, the current form of GDDR5 is wholly inappropriate for CPUs.
__________________
Dreaming of a .065 micron etch-a-sketch.
Old 01-Sep-2009, 00:40   #12
Brodda Thep
Junior Member
 
Join Date: Jul 2005
Posts: 22
Default

Quote:
Originally Posted by Tahir2 View Post
Recently Jen-Hsun Huang mentioned that GPU processing power was set to increase 570x whereas CPU power would increase 3x over the same time frame of six years.
He only said that GPU processing power would increase 1.5 times per year over the next 6 years (i.e. 50% faster each year), or about 11.4 times as powerful in six years.
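The compounding is easy to check. A small Python sketch, with variable names of my own choosing; the last line works out, purely for comparison, what yearly factor would be needed to reach 570x in six years.

```python
# Compound growth: a fixed yearly factor applied over several years.
annual_factor = 1.5      # "1.5 times per year", as stated in the post
years = 6

total_factor = annual_factor ** years
print(f"{annual_factor}x per year over {years} years = {total_factor:.1f}x")

# For comparison only: the yearly factor that 570x in six years would imply.
implied_annual = 570 ** (1 / years)
print(f"570x over {years} years would need about {implied_annual:.2f}x per year")
```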
Old 01-Sep-2009, 08:45   #13
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ghent, Belgium
Posts: 1,332
Default

Quote:
Originally Posted by Tahir2 View Post
Recently Jen-Hsun Huang mentioned that GPU processing power was set to increase 570x whereas CPU power would increase 3x over the same time frame of six years.

Apart from being marketing drivel and exaggeration, there is some basis in what Jen-Hsun Huang was saying, in so far as GPU performance is increasing faster than CPU performance over a given time frame and GPUs will become more useful in other applications.
That's nonsense. GPU performance is hitting a concrete wall: power consumption.

Sure, we've seen some spectacular performance increase from GPUs in the last decade. But let's not forget we went from passively cooled chips to multi-GPU systems that require a 1000 Watt power supply. At the same power consumption, things are really not that impressive.

Actually GPUs are hitting a second wall as well: die size. Even if you manage to keep power consumption in check they can't keep increasing the die size the way they have before as a means to increase performance. And while multi-die solutions can increase yields it increases packaging cost, requires inter-die communication and doesn't help wafer cost. So die size growth slows too.

Also, GPUs have to spend an increasing amount of die space on control logic, registers and caches to become more programmable and flexible. They have to invest transistors in things that CPUs can already take for granted.

Last but not least: contrary to GPUs, the CPU actually has headroom for improvement beyond Moore's Law. Back in the Pentium 4 days a doubling of the number of transistors did not nearly double the performance. Nowadays the focus isn't on achieving the highest possible clock frequency any more, but on actually optimizing performance per watt. Given the starting point, there has been a spectacular increase in computational density and there's still potential for a lot more. AVX and FMA alone would increase performance fourfold with only a minor increase in transistor count!
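One way to arrive at a fourfold figure is to count peak single-precision operations per vector instruction: going from 128-bit SSE to 256-bit AVX doubles the lanes, and a fused multiply-add counts as two operations per lane. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and this counts per-instruction peak only, ignoring issue ports, memory bandwidth and everything else that decides real performance.

```python
FLOAT_BITS = 32  # single precision


def peak_flops_per_instruction(vector_bits, fused_multiply_add):
    """Peak single-precision operations one vector instruction can perform."""
    lanes = vector_bits // FLOAT_BITS
    ops_per_lane = 2 if fused_multiply_add else 1   # FMA = multiply + add
    return lanes * ops_per_lane


sse = peak_flops_per_instruction(128, fused_multiply_add=False)
avx_fma = peak_flops_per_instruction(256, fused_multiply_add=True)

print("128-bit, no FMA  :", sse)
print("256-bit, with FMA:", avx_fma)
print("ratio            :", avx_fma / sse)
```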

What Jen-Hsun probably referred to was double-precision performance. But from a technology standpoint a 570-fold increase doesn't mean anything. They could have had a single double-precision ALU at 1 Hz in their current chip and claim a gazillion-fold increase in their next chip. Meh.

Once you start comparing apples to apples, CPUs are gaining performance faster than GPUs are, and will continue to do so for at least the entire next decade. In the end they're bound by the same physical laws.
Quote:
If so, is it still not possible to design a Crusoe type CPU that emulates and relies on software for the rather mundane tasks of backward compatibility and then perhaps increasing performance?
Why? Current x86 architectures are already RISC internally. The decoding of x86 instructions to RISC instructions is not that costly. The real bottleneck is ILP. Just look at Itanium. Even though it's a massive chip it's getting some serious competition from Intel's own multi-core x86 processors.

Last edited by Nick; 01-Sep-2009 at 13:02.
Old 01-Sep-2009, 09:35   #14
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 2,365
Default

Quote:
Originally Posted by Nick View Post
Once you start comparing apples to apples, CPUs are gaining performance faster than GPUs are, and will contintue to do so for at least the entire next decade. In the end they're bound by the same physical laws.
While I agree with many of your points, I think CPUs have much less room for perf growth than GPUs. GPUs still have a lot of fixed-function hardware, which will go away and be replaced by ALUs. CPUs are still being designed to increase their single-threaded IPC more than their core count. Core count is doubling once every 4 years (at a given price point) while transistors double every two years.
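Taking those two doubling periods at face value, the gap compounds. A rough Python sketch, where the function name and the chosen year marks are only for illustration:

```python
def growth(years, doubling_period):
    """Growth factor after `years` for something that doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


for years in (4, 8, 12):
    transistors = growth(years, doubling_period=2)   # doubling every 2 years
    cores = growth(years, doubling_period=4)         # doubling every 4 years
    print(f"after {years:2d} years: transistors x{transistors:.0f}, "
          f"cores x{cores:.0f}, transistors per core x{transistors / cores:.0f}")
```

Under those assumed rates the transistor budget per core itself doubles every four years, which is the room being spent on single-threaded performance.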

FMA is a one-off thing. You can scale vector widths, but there is a severe performance cliff there too. LRB's 16-wide float vectors were chosen after taking a lot of such factors into consideration. So while we could have 32-wide float AVX some day, I doubt it will happen.
Old 01-Sep-2009, 11:20   #15
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ghent, Belgium
Posts: 1,332
Default

Quote:
Originally Posted by rpg.314 View Post
I think CPU's have much less room for perf growth than GPU's. GPU's still have a lot of fixed function hardware, which will go away and be replaced by ALU's.
There's only so much logic you can replace with ALU's. While I firmly believe that for instance texture units will eventually go away, they'll be largely replaced by generic gather units. In raw numbers, performance will go down for the same transistor count. Only in practice performance can potentially go up thanks to unification leading to higher utilization. But this thread is about the architecture and thus mainly about the raw numbers. And that's where CPUs still have lots of headroom and GPUs do not.
Quote:
CPUs are still being designed to increase their single-threaded IPC more than their core count. Core count is doubling once every 4 years (at a given price point) while transistors double every two years.
We've had dual-cores at 90 / 65 / 45 nm, and we'll have quad-core at 65 / 45 / 32 nm. Octa-core starts at 45 nm. So they seem pretty much on schedule to me to double the number of cores every time the transistor density doubles.

Either way, sure, they haven't forgotten about single-threaded performance. But I don't see why they should. Clock frequency and IPC are still important factors and if they can increase performance more by spending transistors on those things instead of more cores then that's a more optimal design. NVIDIA also placed its bets on less shader cores that are clocked higher, and RV790 was all about keeping a small chip size while cranking up the frequency.
Quote:
FMA is a one off thing.
It's all a one-off thing. In the end it's about increasing computational density. So when we look at AVX+FMA, it represents a phenomenal increase in computational density. Achieving four times higher throughput or more with hardly any extra transistors is not something GPUs can still do.

Yes, that's because CPUs are lousy to begin with. But that's what the headroom argument is about. GPUs have the same kind of headroom for double-precision operations, but that matters much less (except when doing scientific calculations). For single-precision operations GPUs no longer have any "one off" up their sleeves. In fact, by making each ALU capable of double-precision operations they lower computational density for single-precision...
Old 01-Sep-2009, 13:43   #16
MfA
Senior Member
 
Join Date: Feb 2002
Posts: 4,270
Default

Quote:
Originally Posted by Nick View Post
Either way, sure, they haven't forgotten about single-threaded performance. But I don't see why they should.
Just so we are absolutely clear here, you consider Larrabee a dead end?
Old 01-Sep-2009, 15:56   #17
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ghent, Belgium
Posts: 1,332
Default

Quote:
Originally Posted by MfA View Post
Just so we are absolutely clear here, you consider Larrabee a dead end?
Not at all. You have to look at all the software these devices aim to run in their lifetime. Larrabee tries to be the ultimate GPGPU. It aims to efficiently run any parallel workload you can throw at it, and double as a GPU. So naturally it has lots of cores and limited per-thread performance.

CPUs still have to cater for single-threaded software, which means that simultaneously improving effective IPC and adding more cores is the best strategy. But that's just today's situation. In the future, people will care less about performance improvements of ancient single-threaded software, and focus on the performance of the multi-threaded software they run. Besides, it's not like Intel or AMD invested a lot in IPC lately. Core 2 Duo versus Core i7, and K9 versus K10: they spent only a couple of percent of transistors on IPC and twice the entire transistor budget on doubling the number of cores. In that perspective things have changed in revolutionary ways since the Pentium 4 / K8.

It's not unlikely for Larrabee's successor to follow the same route to some extent. If they see an opportunity to increase effective utilization by a significant amount with few extra transistors, then that's obviously better than spending them on additional badly utilized cores. In fact I expect that to become more important over time, since even the most embarrassingly parallel workload doesn't scale perfectly. So at some point it's more efficient to focus on making threads run faster than to add more cores to run more threads. We're already seeing this effect with graphics on the GPU. An increase in core count lowers relative utilization, even when everything else scales by the same factor.
Old 01-Sep-2009, 17:48   #18
Tahir2
Itchy
 
Join Date: Feb 2002
Location: United Queendom
Posts: 2,605
Default

Quote:
Besides, it's not like Intel or AMD invested a lot in IPC lately.
Intel recently integrated the memory controller on die which helps with IPC. Unless your definition of IPC is pure logic dedicated to processing data internally in the "logic" units of the CPU.

Integrating memory controllers, PCI-E controllers, adding more cache, QPI/HT3.0 all of these help with IPC in given situations.
Old 01-Sep-2009, 18:00   #19
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 2,365
Default

Quote:
Originally Posted by Nick View Post
There's only so much logic you can replace with ALU's. While I firmly believe that for instance texture units will eventually go away, they'll be largely replaced by generic gather units. In raw numbers, performance will go down for the same transistor count. Only in practice performance can potentially go up thanks to unification leading to higher utilization. But this thread is about the architecture and thus mainly about the raw numbers. And that's where CPUs still have lots of headroom and GPUs do not.
I do not see why you think CPUs have a lot of room for growth. Agreed, GPUs are severely reticle-size and power limited, but CPUs must serve the ultra-low-cost market too, which on the GPU side is being eaten up by IGPs.

Quote:
We've had dual-cores at 90 / 65 / 45 nm, and we'll have quad-core at 65 / 45 / 32 nm. Octa-core starts at 45 nm. So they seem pretty much on schedule to me to double the number of cores every time the transistor density doubles.
Yeah, we have octa-cores at 45 nm, but at what price? My claim was qualified with "... at the same price point". I don't think we had quad cores at ~$150 in 2006. The mainstreaming of quad cores has taken longer than the shrink time of 2 years.

Quote:
It's all a one-off thing. In the end it's about increasing computational density. So when we look at AVX+FMA, it represents a phenomenal increase in computational density. Achieving four times higher throughput or more with hardly any extra transistors is not something GPUs can still do.
How do you figure AVX and FMA can be done without a non-trivial increase in transistors? Can I have some explanation? Unless you are speaking of a 128-bit-wide AVX unit, just like we had a 64-bit-wide SSE unit prior to Conroe, in which case there is absolutely no perf increase at all. And IIRC, AVX won't have FMA.

Quote:
Yes, that's because CPUs are lousy to begin with. But that's what the headroom argument is about. GPUs have the same kind of headroom for double-precision operations, but that matters much less (except when doing scientific calculations). For single-precision operations GPUs no longer have any "one off" up their sleeves. In fact, by making each ALU capable of double-precision operations they lower computational density for single-precision...
And they will remain lousy simply because they need to care about single-threaded IPC, which is still what matters (maybe not the most). GPUs don't even look at that market and hence can run waaay faster at workloads designed for them.

GPUs do have a trick up their sleeve though (not sure if they'll follow the trail, for various reasons): increasing clock speed. RV770 runs at 750 MHz, GT200 runs at ~1.5 GHz. They can still scale along that axis.

And last but not least, don't underestimate the burden of backward compatibility. For GPUs, all code gen is dynamic, at runtime. So they can aggressively drop old and useless crap.
Old 01-Sep-2009, 20:34   #20
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,700
Default

To give an example, the 65nm single-core Pentium 4 was capable of 4 SP ops per clock in an area of 80 mm².
The 45nm Nehalem, with 4 cores, is capable of 8 SP ops per core per clock, for a total of 32 per clock in an area of 263 mm².

The P4 had a paltry op density of .05 per mm².
Nehalem has .12 per mm², more than double.

For reference, the 80nm R600 sported 320 units on a 430 mm² die.
The 55nm RV770 increased this count to 800 on a 256 mm² die.
This gives us .74 versus 3.1, respectively, before counting FMAD as double.
The clocks are somewhat lower for the later GPU, while the CPUs could be found in the same speed grades.

So as we can see, the recent history of CPUs shows them growing FLOPs by a staggering ~2x, while GPUs scaled at a mere 4x on an inconsequential order of magnitude difference in base capability.
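For anyone who wants to redo the division, a short Python sketch using only the per-clock op counts and die areas quoted in the post. The dictionary layout and function name are mine; the GPU "units" are treated as one op per clock each, as the post does before counting FMAD as double.

```python
# (ops per clock, die area in mm^2), as quoted in the post above
chips = {
    "Pentium 4 (65nm)": (4, 80),
    "Nehalem (45nm)": (4 * 8, 263),   # 4 cores x 8 SP ops per core
    "R600 (80nm)": (320, 430),
    "RV770 (55nm)": (800, 256),
}


def op_density(ops_per_clock, area_mm2):
    """Ops per clock per square millimetre of die."""
    return ops_per_clock / area_mm2


density = {name: op_density(*spec) for name, spec in chips.items()}
for name, value in density.items():
    print(f"{name:18s} {value:.2f} ops/clock/mm^2")

cpu_gain = density["Nehalem (45nm)"] / density["Pentium 4 (65nm)"]
gpu_gain = density["RV770 (55nm)"] / density["R600 (80nm)"]
print(f"CPU density gain: {cpu_gain:.1f}x, GPU density gain: {gpu_gain:.1f}x")
```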

Last edited by 3dilettante; 01-Sep-2009 at 20:39.
Old 01-Sep-2009, 21:20   #21
DemoCoder
Regular
 
Join Date: Feb 2002
Location: California
Posts: 4,729
Default

Quote:
So as we can see, the recent history of CPUs shows them growing FLOPs by a staggering ~2x, while GPUs scaled at a mere 4x on an inconsequential order of magnitude difference in base capability.
Are you being sarcastic? The P4 was released in 2000, Nehalem in 2008. So by your metric, it took them 8 years to double it. Meanwhile, in your GPU comparison, it took AMD/ATI only 2 years to quadruple it. At 3 GHz, that means they went from 12 GFLOPS to 96 GFLOPS in 8 years, for an 8x increase. I don't even want to bother comparing the R200 to the RV770. If you look at bandwidth in 2000 vs 2008 available to CPUs vs GPUs, it's a similar story.
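Spelling out that arithmetic in Python, as a sketch only: it takes the per-clock figures from the previous post (4 and 32 SP ops per clock) and assumes both chips at 3 GHz, as the post does. The function name is made up.

```python
def peak_gflops(ops_per_clock, clock_ghz):
    """Peak throughput in GFLOPS: ops per clock times clock rate in GHz."""
    return ops_per_clock * clock_ghz


p4 = peak_gflops(4, 3.0)         # single-core Pentium 4
nehalem = peak_gflops(32, 3.0)   # 4 cores x 8 SP ops per clock
increase = nehalem / p4
years = 8

# Equivalent steady yearly growth factor over the 8-year span
annual = increase ** (1 / years)

print(f"P4: {p4:.0f} GFLOPS, Nehalem: {nehalem:.0f} GFLOPS, increase: {increase:.0f}x")
print(f"that is roughly {annual:.2f}x per year over {years} years")
```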

The fact that CPUs must operate with commodity DIMMs is not really relevant; it's purely a manufacturing decision by the PC industry that sacrifices performance for the sake of flexibility. The vast majority of consumers don't really care how the RAM is wired up. Lots of people buy iPhones and Macs with non-removable batteries or hard-to-change RAM, and people buy consumer electronics goods and consoles built with hardwired manufacturing techniques. Laptops seem to be extremely popular these days and people often just buy a whole new machine rather than try to upgrade it. There is an assumption that what people love about PCs is how interchangeable the parts are, but I think that assumption really only applies to a niche market, and that the broad market for computers could be very vertically integrated (like Apple has done) without most people giving a shit.

The reality is, an ALU is cheaper than a CPU core, and if your problem is embarrassingly parallel, then packing in the ALUs is better than throwing in more general purpose cores. For most of the workloads modern PCs do (outside games and multimedia), there is very little gain from additional CPU-core level parallelism.

Moreover, with the move to cloud-based computing and the web browser dominating the CPU time of most PCs out there, the purely single-threaded nature of Javascript and the browser core means a lot of power is simply going to waste. You could put 64 Nehalem cores on a chip, and it wouldn't speed up the subjective latency of most applications that people use on a daily basis, but scaling GPGPUs certainly does lead to a very measurable difference in games.

Thus, I would say, scaling CPU cores these days matters more for server environments, like Google's data-centers, and for the client-side, at this point, no one will notice much of a difference except on the few applications that tax a system, e.g. games, multimedia, content-creation apps.
Old 01-Sep-2009, 21:45   #22
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,700
Default

Quote:
Originally Posted by DemoCoder View Post
Are you being sarcastic?
Yes. I was debating about adding a smiley.

Although the 65nm Cedar Mill was a 2006 chip, cutting the timeframe down.
Penryn would have a better ratio, as Nehalem's uncore reduced the amount of die devoted to CPU cores.
If I started with Penryn, however, CPUs would have regressed.


Quote:
The fact that CPUs must operate with commodity DIMMs is not really relevant; it's purely a manufacturing decision by the PC industry that sacrifices performance for the sake of flexibility.
I think servers might share more of the blame. They gobble up very large amounts of RAM, and they are the market that pays the fat margins for CPUs. I don't think laptops have eclipsed this segment yet.
Old 01-Sep-2009, 22:01   #23
DemoCoder
Regular
 
Join Date: Feb 2002
Location: California
Posts: 4,729
Default

Quote:
Originally Posted by 3dilettante View Post
Yes. I was debating about adding a smiley.
I'm glad, usually your analysis is super rock-solid, so when I saw this, I was like "oh, wait, he's obviously joking" given the adjectives used, but I doubted myself and thought maybe there was actually a valid argument I wasn't seeing and that the CPUs really were making more impressive gains.
Old 01-Sep-2009, 22:49   #24
aaronspink
Senior Member
 
Join Date: Jun 2003
Posts: 2,073
Default

Quote:
Originally Posted by Tahir2 View Post
Intel recently integrated the memory controller on die which helps with IPC. Unless your definition of IPC is pure logic dedicated to processing data internally in the "logic" units of the CPU.

Integrating memory controllers, PCI-E controllers, adding more cache, QPI/HT3.0 all of these help with IPC in given situations.
Those are all effectively NNC (net no change) things. The logic was going to be somewhere; by bringing it on chip you reduce the number of external chips you need and lower overall total power requirements, while increasing performance in some cases.
Old 01-Sep-2009, 22:52   #25
aaronspink
Senior Member
 
Join Date: Jun 2003
Posts: 2,073
Default

Quote:
Originally Posted by rpg.314 View Post
GPU's do have a trick up their sleeve though (not sure if they'll follow the trail because of various reasons), increase clock speed. rv770 runs at 750Mhz, GT200 runs at ~1.5G. They can still scale along that axis.
They can but with their design styles and experience it is likely to be a net negative as it will increase their power and their design cycles in addition to increasing their design complexity.