Larrabee: Samples in Late 08, Products in 2H09/1H10

I think we agree, that's just semantics. What defines the market segment once the product is released is its price; but if you want to actually make money, what determines its price is performance. The reason I looked at it that way is that AP was suggesting Intel would squeeze NVIDIA out of the mid-range; my argument was that this can only happen if Intel is competitive on performance-per-$, because otherwise their parts simply won't sell.

Sure. I'd point out though that Intel has the advantage of bundling GPU with CPU.

Sorry to be the Devil's Advocate, but both Intel and AMD's Fusion-like graphics products in 2009 are 3 chips/2 packages architectures. NVIDIA has been 2 chips/2 packages (including the CPU) since Q3 2006 for AMD and Q3 2007 for Intel.

I don't think the 3 chip versus 2 chip argument is compelling. The cost to a system builder is largely a function of the package, rather than # of dice. An extra discrete in the system costs a lot more...simply moving a few million transistors from one side to another doesn't change the cost for a system vendor much.

Additionally, I think this is the kind of situation that Intel is very good at exploiting. Once upon a time there used to be discrete network cards for desktops...but that got sucked into the chipset.

The only point to doing it this way is to save about ~1 dollar and >=1 watt on the bus. That feels like a very backwards way to handle the problem given some of the very interesting advances in bus technology in the last few years. Actually, there's another slightly more cynical reason in Intel's case: they'd probably like to manufacture southbridges at Dalian/China on an old process instead.

Perhaps. Also, no matter what happens - going across a bus is a fundamentally expensive exercise in terms of power. The more localized your data transfers the cooler the system.

EDIT: Of course, in the longer-term, I agree on-CPU-chip integration makes the most sense. But that's not what either of those companies will do in 2009 apparently, let alone 2008. Well, unless AMD is being more aggressive than their latest public roadmaps imply, that'd be interesting...

Fair enough.

But volume doesn't really matter. Right now, the gross profit in the >=$150 segment of the market is significantly higher than in the <$150 segment of the market. I only expect this to become more true over time as other components become less expensive for a mid-range gaming PC (CPU, DRAM, Chipset, etc.)

Volume always matters. We've discussed this at RWT, and I doubt either of our views have changed over the last few weeks (although I'm hoping yours have : ) ).

Just because NVIDIA is fabless does not mean they don't worry about volume. There are still the costs associated with tools, developing your custom circuit libraries, etc. etc.


That's exactly the dynamic that I suspect happened in 1H07 when NVIDIA surprised everyone with incredibly strong earnings despite weak seasonality. DRAM prices crashed and CPU prices kept getting lower, so for a given segment of the market the GPU's ASPs went up.
So you believe that lower CPU prices --> consumers spend more on GPUs?

It's not strictly unimaginable, but I'm very skeptical of that. As I said, it's all about performance-per-dollar (and per-watt) both at the high-end *and* the low-end. If they can't win in the high-end, then the only place where they could still win is in the commodity 'performance doesn't matter' market. And honestly, I don't think Jen-Hsun cares:

The problem is that a lot of people don't care about graphics performance. If in 5 years from now, 90% of the people are satisfied with integrated graphics, that's a really bad position to be in.

DK
 
Volume always matters. We've discussed this at RWT, and I doubt either of our views have changed over the last few weeks (although I'm hoping yours have : ) ).

Just because NVIDIA is fabless does not mean they don't worry about volume. There are still the costs associated with tools, developing your custom circuit libraries, etc. etc.
I thought I'd reply to this in a separate post so that my argument is as clear as it can be. Here goes...

Operating Income = (Volume * (ASPs - Unit Cost)) - (R&D Costs + Administrative Costs).
Fabless: Unit Cost = (Wafer Price / Units per Wafer).
Fab: Unit Cost = (Wafer Cost / Units per Wafer) + (Factory Cost / Volume).
Fab Gross Profit = (Volume * (ASPs - Smaller Unit Cost)) - Factory Cost.
Factory Cost = f(Target Volume) where f(2x) < 2f(x).

As can clearly be seen from these relationships, profit for fab-heavy companies is a superlinear function of volume, while it is a linear function for fabless companies. Clearly, that means it's still important; but some of the dynamics from the CPU world you try to extrapolate to GPUs simply don't apply.

For a fabless company, selling 1M chips with a gross profit of $100 per chip has the exact same financial results as selling 100M chips with a gross profit of $1 per chip. In the world of fab heavyweights, that's obviously not the case (depending on your definition of gross profits, but you get the point). Of course, you'd rather sell 2M chips with a gross profit of $100 per chip, but that's another debate completely and has nothing to do with my argument.
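To make those relationships concrete, here's a minimal sketch in Python; every number (ASPs, wafer prices, the factory cost function) is made up purely for illustration. Doubling volume exactly doubles the fabless profit, while the fab owner's profit grows by more than 2x because the factory cost is sublinear in volume.

```python
# Toy model of the relationships above; all numbers are made up for illustration.

def fabless_profit(volume, asp, wafer_price, units_per_wafer):
    """Fabless: unit cost is just the foundry wafer price amortized per die."""
    unit_cost = wafer_price / units_per_wafer
    return volume * (asp - unit_cost)                 # linear in volume

def fab_profit(volume, asp, wafer_cost, units_per_wafer, factory_cost_fn):
    """Fab owner: per-die wafer cost is lower, but the factory itself must be paid for."""
    unit_cost = wafer_cost / units_per_wafer
    return volume * (asp - unit_cost) - factory_cost_fn(volume)

# Factory cost grows sublinearly with target volume: f(2x) < 2*f(x).
factory_cost = lambda volume: 5e7 * (volume / 1e6) ** 0.7

for v in (1e6, 2e6, 4e6):
    print(f"volume {v:>9,.0f}:  fabless {fabless_profit(v, 150, 5000, 100):>12,.0f}"
          f"   fab {fab_profit(v, 150, 3000, 100, factory_cost):>12,.0f}")
```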

As a side note, it *is* noteworthy that this claim becomes slightly invalid if the fabless company we're talking about accounts for a significant share of its foundry's volume. However, I don't think NVIDIA is responsible for much more than 10% of TSMC's business right now, so that's not a very significant factor yet.
 
Sure. I'd point out though that Intel has the advantage of bundling GPU with CPU.
Agreed, and that's certainly not negligible. Fighting Intel/AMD bundling is what Hybrid SLI is all about though: the idea is that as long as NVIDIA has superior discrete GPU solutions, NVIDIA chipsets also are at an advantage. That's why every single future NV chipset will include an IGP.

OEMs often use the same base setup for different standard configs; some only have an IGP, others have a discrete GPU. What do you think happens when your discrete GPU configs are at a disadvantage because your IGP is from Intel? Another advantage is that you can now sell more chipsets, since people now have a good reason to use your chipsets with your GPUs even if they're not doing SLI.

In theory, that strategy is pure genius IMO. In practice, their current execution with MCP78 feels very subpar, so we'll see how it actually turns out.

I don't think the 3 chip versus 2 chip argument is compelling. The cost to a system builder is largely a function of the package, rather than # of dice. An extra discrete in the system costs a lot more...simply moving a few million transistors from one side to another doesn't change the cost for a system vendor much.
If that was completely true, we'd see a lot more SiPs in the handheld market than we do today. I agree that the cost difference isn't very large, but it's definitely there.

Perhaps. Also, no matter what happens - going across a bus is a fundamentally expensive exercise in terms of power. The more localized your data transfers the cooler the system.
Agreed, I was simply pointing out that perhaps it wasn't worth all the fuss - especially as a superior architecture might save you a lot more power than that. Of course, if you combined better architecture and better locality, then that's ideal.

So you believe that lower CPU prices --> consumers spend more on GPUs?
I don't think a non-gamer is going to see it that way, no. But there were two important factors in 2007:
- Many IGPs not being optimal for Vista and H.264/VC1 Video Decoding. This and the reduction in cost for CPUs/DRAM likely made discrete GPUs more attractive for OEMs in non-gaming systems.
- The performance gap between $200 and $300 GPUs was huge (8600 GTS 256MiB vs 8800 GTS 320MiB). Given the lower cost of CPUs and DRAM, this likely encouraged mid-range gamers to buy a $300-400 GPU instead of a $200 one with what they saved on their other components.

Going into 2008, CPU and DRAM prices aren't going down as fast, $150-$200 GPUs are incredibly good deals, and IGPs are becoming 'good enough' once again. That implies 2007 was likely more of an exception than the rule, IMO. Although there still are some interesting factors going against that:
- Intel's G45 has proper video decoding, but G43 still doesn't. That risks making competing IGPs or discrete GPUs more attractive to a part of the market.
- The performance in the high-end is going to go up LOL-fast, and that's probably an understatement.

I guess the big thing to watch in 2008 is whether NVIDIA's IGPs for Intel CPUs start taking a serious amount of marketshare. I'll make a bold prediction here: By the end of 2008, they'll have 20 to 25% of the Intel IGP market. That's not just a random number, I actually estimated that based on the market share of a few key OEMs that tend to be pro-NV, and the void that VIA will leave in the market once they exit in April 2008.

The problem is that a lot of people don't care about graphics performance. If in 5 years from now, 90% of the people are satisfied with integrated graphics, that's a really bad position to be in.
Nearly 10 years ago, VIA thought that very shortly, a good chunk of the PC market would become a commodity and that single-chip solutions would dominate. But it turns out they lost their bet; there won't even have been a single decent single-chip PC solution this decade.

Either way, the problem here is the exact same for CPUs: how can you justify more than a dual-core for Joe Consumer? That's also a very bad spot to be in. In the end, that part of the industry will become an integration-focused commodity market competing exclusively based on price and brand awareness.

However, GPUs do have one advantage: they still have the potential to become more flexible and thus increase the size of their target market. That's what GPGPU is all about; but as Baron said, you don't want to think just about HPC. You also want to think about what GPGPU can do in the consumer space. That's where the fight will be in the next 5 years, IMO.
 
Sorry to be the Devil's Advocate, but both Intel and AMD's Fusion-like graphics products in 2009 are 3 chips/2 packages architectures. NVIDIA has been 2 chips/2 packages (including the CPU) since Q3 2006 for AMD and Q3 2007 for Intel.

The only point to doing it this way is to save about ~1 dollar and >=1 watt on the bus. That feels like a very backwards way to handle the problem given some of the very interesting advances in bus technology in the last few years.

Probably irrelevant, but just wondered anyway. Would this approach not also bring advantages in the form of moving the IGP closer to the memory bus, since on both AMD and Intel based designs in this time frame the memory controller will be on the CPU?
 
Probably irrelevant, but just wondered anyway. Would this approach not also bring advantages in the form of moving the IGP closer to the memory bus, since on both AMD and Intel based designs in this time frame the memory controller will be on the CPU?
That's what I meant by saving on the bus, although as you're probably thinking there's also the advantage of lower latency. But that's not too big of a problem for GPUs so I didn't focus on it.
Oh, and the memory controller won't be on-chip for the dual-core in that timeframe: http://www.beyond3d.com/content/news/540
 
That's what I meant by saving on the bus, although as you're probably thinking there's also the advantage of lower latency. But that's not too big of a problem for GPUs so I didn't focus on it.
Oh, and the memory controller won't be on-chip for the dual-core in that timeframe: http://www.beyond3d.com/content/news/540

Ah yes, looks like you are right there, although an IGP and MC in one package is still closer than having the IGP sit on the northbridge and the memory controller in the CPU...

I was thinking mostly in terms of latency, but also about bandwidth, since if that article is correct then the bandwidth between the NB and CPU is reduced vs. current parts.
 
In theory. But even writing a fast Direct3D 10 software rasterizer isn’t an easy job.

Here in the ivory tower of academia, we have a phrase for this: "It's just a simple matter of programming". :) No really, I totally understand that software is difficult. But it is soft (malleable, changeable, upgradable), where hardware is, well, hard. Harder to change, to add new features to.

On a different note, another trend in computer hardware is from special-purpose chips to more general-purpose chips. We've seen that for media processing. Audio processing/compression/decompression used to need a special DSP chip (in fact, the NeXT Stations had an on-board DSP for just that reason). Today? All done on a general-purpose processor. The same is true with video. Special-purpose chips were used, but today they are much more general purpose. Perhaps they are still tailored to the specific domain, but there is a much larger (and more flexible) software component involved.

GPUs have shown a similar progression. From totally fixed function to pretty general purpose. Prior generations had separate vertex and pixel shaders (I think those are the right terms, but forgive me if they aren't), but the G80 uses just a single type of unit for both. GPGPU is a result of that generalization.

So, as CPUs become multicore (like GPUs) and GPUs become more general purpose, they might eventually blend together to become the same thing. I don't think both Intel and NVIDIA can win that battle (unfortunately). Perhaps Intel and NVIDIA will be so focused on each other that AMD/ATI can outflank them and gain market share. Unfortunately, I'm betting that Goliath will make a good showing on this one.
 
I have to admit, I don't know the top 10 CE schools for sure, but I'd guess that means:

Stanford, MIT, Berkeley, UIUC, UT-Austin, UWisconsin-Madison, UWashington, UMichigan, perhaps Harvard and Princeton.
David

When I said "big-10", I actually meant schools in the Big Ten. I realize it was ambiguous.

That said, your list is a good list of some of the top CE schools. I wouldn't include Harvard, but I would include Princeton. For computer architecture specifically, I also include UC-San Diego (UCSD) and Georgia Tech.
 
Memory bandwidth limitations of future CPU-GPUs

Probably irrelevant, but just wondered anyway. Would this approach not also bring advantages in the form of moving the IGP closer to the memory bus, since on both AMD and Intel based designs in this time frame the memory controller will be on the CPU?

One of the reasons IGPs are slow is they use the system's main DDR (or whatever) DRAM, which has much less bandwidth than GDDRx memory. The GDDR family of memory has really high bandwidth, but it is more expensive per bit. Near-term GPUs will have, what, 50GB/second or 100GB/second (or more) of memory bandwidth? Current high-end CPUs have maybe 6 or 12 GB/second of memory bandwidth. That's almost a 10x difference in bandwidth.

For fully-integrated graphics to be competitive with discrete GPUs, the CPU manufacturers are going to need to either (1) find something better than DDR for memory bandwidth, or (2) put on enough cache that memory bandwidth is no longer the bottleneck, or (3) use some combination of DDR and GDDR memory controllers on chip, or (4) bite the bullet and use all GDDR.

I think that #1 and #2 aren't going to happen any time soon. Option #3 is pretty interesting from an architectural design perspective. Perhaps the main OS could manage the GDDR memory by deciding where to allocate each page of physical memory. Or, the hardware could detect "high bandwidth" pages and automatically migrate those pages to the GDDR. Option #4, at least right now, is too expensive.
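Just to make option #3 a bit more concrete, here's a toy sketch in Python of that page-migration policy. It isn't any real OS or hardware mechanism (the pool size and access stream are invented), just the basic idea: sample per-page access counts over an interval, then promote the hottest pages into the small GDDR pool.

```python
# Toy sketch of the "migrate hot pages to GDDR" idea (option 3). Not a real OS or
# hardware mechanism, just the policy: count accesses per physical page over an
# interval, then keep the hottest pages in the small high-bandwidth GDDR pool and
# everything else in plain DDR.
from collections import Counter

GDDR_PAGES = 4            # pretend the GDDR pool only holds 4 pages
access_counts = Counter()
location = {}             # page -> "GDDR" or "DDR"

def record_access(page):
    access_counts[page] += 1

def rebalance():
    hot = {page for page, _ in access_counts.most_common(GDDR_PAGES)}
    for page in access_counts:
        location[page] = "GDDR" if page in hot else "DDR"
    access_counts.clear()  # start a fresh sampling interval

# Simulated access stream: framebuffer-like pages (0-3) are hammered every step,
# an ordinary CPU page (7) is only touched occasionally.
for step in range(1000):
    for page in (0, 1, 2, 3):
        record_access(page)
    if step % 50 == 0:
        record_access(7)

rebalance()
print(location)            # pages 0-3 land in GDDR, page 7 stays in DDR
```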

The multi-core CPU manufacturers need to figure this out anyway, because an 8-core (16-thread) or more system is also going to need more bandwidth than DDR can comfortably supply. In the past, CPU programs typically didn't need that much memory bandwidth (multi-megabyte caches work well for many CPU programs), but with multi-core CPUs that is less and less true. Two effects are causing a memory bandwidth squeeze. First, the cache per core is actually decreasing or staying the same. Second, the number of threads/cores generating misses is going up.

Oh, there is one more option. Intel has been playing around with die stacking. I've been rambling on about "integration" earlier. The actual DRAM is another prime candidate for being sucked into the package. Apparently some reasonable fraction (10-20%) of the die area (and power consumption) of DRAM is just for the high-speed circuits for moving the data on and off chip. If Intel were to stack some DRAM on top of a multi-core CPU (or GPU) chip, that would give ridiculous amounts of memory bandwidth for less overall power (and die area). Last I heard someone from Intel talk about this, however, they were discouraged, because the DRAM manufacturers weren't interested, and Intel didn't really want to get back into the business of fabricating DRAM.

Consider a future system with four layers of DRAM on top of a CPU. That would likely provide a few GBs of memory capacity, maybe 20 or 30ns access time, and something like a terabyte of memory bandwidth. Yea, heat is going to be an issue, but that can be mitigated somewhat (DRAM is lower power than a CPU, and DRAMs typically don't have hot spots), and I think we'll get there eventually. Heck, combine that with a solid-state disk of some sort, and a GB or two of DRAM might be enough (as swapping to and from disk could be very fast).
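For what it's worth, here's the back-of-envelope behind numbers like that. Every figure (layer count, capacity per layer, interface width, transfer rate) is an assumption I picked for illustration, not anything from a roadmap.

```python
# Back-of-envelope for the stacked-DRAM numbers above. Every figure here is an
# assumption for illustration only, not a real product spec.
layers          = 4
gbit_per_layer  = 8       # 1 GB per DRAM layer
bus_width_bits  = 4096    # very wide on-package/TSV interface
transfer_rate   = 2e9     # 2 GT/s per pin

capacity_gb  = layers * gbit_per_layer / 8
bandwidth_tb = bus_width_bits / 8 * transfer_rate / 1e12

print(f"capacity ~{capacity_gb:.0f} GB, bandwidth ~{bandwidth_tb:.1f} TB/s")
# -> capacity ~4 GB, bandwidth ~1.0 TB/s, in the ballpark of the post above
```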

Obviously, this isn't the 2009/10 timeframe of Larrabee, but maybe by 2015 or so?
 
Here in the ivory tower of academia, we have a phrase for this: "It's just a simple matter of programming". :) No really, I totally understand that software is difficult. But it is soft (malleable, changeable, upgradable), where hardware is, well, hard. Harder to change, to add new features to.

Aside from building some simple logic circuits I was always a software guy. But I understand the problems of modern hardware design.

On a different note, another trend in computer hardware is from special-purpose chips to more general-purpose chips. We've seen that for media processing. Audio processing/compression/decompression used to need a special DSP chip (in fact, the NeXT Stations had an on-board DSP for just that reason). Today? All done on a general-purpose processor. The same is true with video. Special-purpose chips were used, but today they are much more general purpose. Perhaps they are still tailored to the specific domain, but there is a much larger (and more flexible) software component involved.

Creative will disagree. :D

But you are right that sound processing is moving into the domain of the CPU. XAudio 2 doesn't support dedicated sound hardware anymore. IMHO this is a direct result of the advent of multi-core CPUs. Sound processing is something that can easily be moved to another core.

GPUs have shown a similar progression. From totally fixed function to pretty general purpose. Prior generations had separate vertex and pixel shaders (I think those are the right terms, but forgive me if they aren't), but the G80 uses just a single type of unit for both. GPGPU is a result of that generalization.

I agree, but to boost GPGPU we need generalization on the software side too. Currently OpenGL and Direct3D are simply not the best tools for the job.

So, as CPUs become multicore (like GPUs) and GPUs become more general purpose, they might eventually blend together to become the same thing. I don't think both Intel and NVIDIA can win that battle (unfortunately). Perhaps Intel and NVIDIA will be so focused on each other that AMD/ATI can outflank them and gain market share. Unfortunately, I'm betting that Goliath will make a good showing on this one.

Don't get me wrong. I am all for more programmability. If someone can build a CPU that can beat a GPU when it comes to 3D graphics I will be more than happy.

I will be even happier if Intel's move reduces the gap between a common PC and a gaming PC. This gap causes big headaches today.

But all this requires that Intel understands the needs of applications that use GPUs. Something they have failed at in the past. In the 3D business, building the hardware is only the beginning. The software part, called the driver, becomes more and more important.
 
The same is true with video. Special-purpose chips were used, but today they are much more general purpose. Perhaps they are still tailored to the specific domain, but there is a much larger (and more flexible) software component involved.
I think that's mostly right, but there still are a lot more fixed-function hardware blocks out there than you seem to think there are. That's the difference between companies claiming VGA H.264 @ 400mW and those claiming 720p H.264 @ 40mW... ;)

Programmable elements remain important because any block that's idling during the decoding process is still wasting power. So what you really want is a delicate mix of fixed-function blocks and programmable engines. Or well, that's the theory - in practice, there always are those ambitious enough to challenge the traditional wisdom, such as Icera with their full-custom pure-software 3G baseband chip.

GPUs have shown a similar progression. From totally fixed function to pretty general purpose. Prior generations had separate vertex and pixel shaders (I think those are the right terms, but forgive me if they aren't), but the G80 uses just a single type of unit for both. GPGPU is a result of that generalization.
Those are the right terms, and agreed... :)

So, as CPUs become multicore (like GPUs) and GPUs become more general purpose, they might eventually blend together to become the same thing.
Current CPUs are oriented towards maximizing serial throughput by extracting ILP out of increasingly complex workloads. GPUs are massively parallel and latency tolerant. I honestly think those two things are just so fundamentally different you can't merge them; which is why Larrabee is in-order and won't be competing against Nehalem.

But there's another aspect to this: MIMD vs SIMD. That's the other fundamental distinction between current CPUs and current GPUs: even G80 is fundamentally made of 16-wide SIMD processors, and that fundamentally limits what you can do on it. So the question is, are GPUs going to become full-MIMD? The answer is... maybe.

If you want that to happen, you need a way to keep the control overhead really low, and that's going to require some really exotic tricks in terms of ISA, possibly beyond what academia is looking into (which shouldn't be very surprising since there doesn't seem to be that much focus on massively parallel processors out there - sad, really, because companies like Ambric are proving it's a very interesting field even beyond GPUs).
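To put the SIMD point from the previous paragraph in the simplest possible terms, here's a toy Python model of a 16-wide SIMD group versus ideal MIMD cores hitting a divergent branch. Real GPU scheduling is considerably more involved; this just shows why divergence makes the SIMD group pay for both sides of the branch.

```python
# Toy illustration of the SIMD-vs-MIMD point above: on a 16-wide SIMD processor,
# lanes that take different branches have to be executed one path at a time, so
# the cost of a divergent branch is roughly the sum of both paths, not the max.
SIMD_WIDTH = 16

def simd_cost(lane_takes_branch, cost_then, cost_else):
    taken     = sum(lane_takes_branch)
    not_taken = SIMD_WIDTH - taken
    cost = 0
    if taken:      cost += cost_then   # whole group steps through the 'then' path
    if not_taken:  cost += cost_else   # ...and then through the 'else' path
    return cost

def mimd_cost(lane_takes_branch, cost_then, cost_else):
    # Fully independent cores just pay for whichever path they take (ideal case).
    return max(cost_then if t else cost_else for t in lane_takes_branch)

lanes = [i % 2 == 0 for i in range(SIMD_WIDTH)]   # half the lanes diverge
print("SIMD:", simd_cost(lanes, 100, 100))        # 200 - both paths serialized
print("MIMD:", mimd_cost(lanes, 100, 100))        # 100
```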

I don't think both Intel and NVIDIA can win that battle (unfortunately). Perhaps Intel and NVIDIA will be so focused on each other that AMD/ATI can outflank them and gain market share. Unfortunately, I'm betting that Goliath will make a good showing on this one.
I agree AMD definitely has a lot of potential there. However, it also depends a lot on execution and how aggressive these companies are timeframe-wise. We'll see how it turns out in the next few years...
 
Nearly 10 years ago, VIA thought that very shortly, a good chunk of the PC market would become a commodity and that single-chip solutions would dominate. But it turns out they lost their bet; there won't even have been a single decent single-chip PC solution this decade.

I think VIA's problem was that their x86 CPU (and perhaps GPUs) just wasn't in the same league as their x86 (and GPU) competitors. I just don't think they found a good market niche.

That doesn't mean they were wrong in thinking that single-chips would dominate. They either executed poorly or were just off by a decade or so. :) As I said earlier in the thread, I personally find that it is easier to predict what will happen than *when* it will happen. It seems like most of the key to making money requires knowing the "when" part as well. :(

Either way, the problem here is the exact same for CPUs: how can you justify more than a dual-core for Joe Consumer? That's also a very bad spot to be in.

Yea, the CPU manufacturers are scared out of their minds. But---so far---their strategy has been working okay. The server space is just fine. The real key is how quickly interesting desktop applications start to really use the multi-core goodness. Video compression is already there. Photoshop is either there or soon to be there. Hopefully games that really exploit multi-core are under development (those reading this board would know more about that than me). One thing in the CPU manufacturers' favor is that it is much easier for programmers to migrate from two cores to four cores than from one core to two cores. That is, once you already have a program that exploits multiple cores, tuning it to run under more cores is easier than the initial conversion.
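To illustrate that last point, here's a minimal Python sketch; encode_frame is a hypothetical stand-in for real per-frame work. Once the program is already structured as independent tasks feeding a pool, going from 2 workers to 4 (or however many cores you have) is just a parameter change.

```python
# A small illustration of the "2 cores -> 4 cores is easier than 1 -> 2" point:
# once the work is already expressed as independent tasks submitted to a pool,
# the core count is just a parameter, with no further restructuring required.
import os
from concurrent.futures import ProcessPoolExecutor

def encode_frame(frame_id):
    # Stand-in for real per-frame work (video compression, filtering, ...).
    return sum(i * i for i in range(50_000)) + frame_id

def encode_video(frames, workers):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_frame, frames))

if __name__ == "__main__":
    frames = range(64)
    # Going from 2 workers to 4 (or os.cpu_count()) is a one-line change.
    results = encode_video(frames, workers=os.cpu_count() or 2)
    print(len(results), "frames encoded")
```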

However, GPUs do have one advantage: they still have the potential to become more flexible and thus increase the size of their target market. That's what GPGPU is all about; but as Baron said, you don't want to think just about HPC. You also want to think about what GPGPU can do in the consumer space. That's where the fight will be in the next 5 years, IMO.

I think the debate about multi-core CPUs vs GPUs for graphics is a really interesting one. However, when it comes to GPGPU, it seems a many-core CPU has lots of advantages over GPUs. I guess at some point with GPGPU, GPUs or multi-core CPUs will basically look the same (lots of cores). I suspect that programmers will either find the locks-and-threads model familiar enough or that the various GPGPU languages can be compiled to a multi-core CPU just as well. GPUs will likely still have the edge in peak performance, but real-world performance might go to the CPUs.
 
Either way, the problem here is the exact same for CPUs: how can you justify more than a dual-core for Joe Consumer? That's also a very bad spot to be in. In the end, that part of the industry will become a integration-focused commodity market competing exclusively based on price and brand awareness.

Awesomely, that's a fairly big part of my job at the moment. Answers on a postcard? :???:
 
But all this requires that Intel understands the needs of applications that use GPUs. Something they have failed at in the past. In the 3D business, building the hardware is only the beginning. The software part, called the driver, becomes more and more important.

Intel is a big company. Like all big companies, they don't have one big monolithic hive mind (Google might be the only exception to that today, but I digress).

Larrabee has no lineage back to Intel's IGP. The folks doing Larrabee came right from some of Intel's top CPU design groups in Oregon. It was actually funded by Intel venture capital as an internal skunk works project. The Larrabee guys fully understand how important the software part of the equation is. Heck, they have bet everything on the idea that they can do graphics computations on mostly general-purpose CPUs. To make that claim credible, they have likely had software guys on board from the very beginning.

I don't think that Intel's IGP lack of performance says *anything* about Larrabee.
 
...possibly beyond what academia is looking into (which shouldn't be very surprising since there doesn't seem to be that much focus on massively parallel processors out there - sad, really...

I agree that massively parallel processing is out of vogue in academic research. To academia, "massively parallel" means "big scientific computations that only the government and a few scientists on federal grants really care about".

The problem is that so much work has been done on massively parallel processors over the last four decades, and---for the most part---nothing came of it. It just refused to take off. Every decade startup companies come and go (Thinking Machines and KSR are great examples of ones that came out of MIT). Academics basically gave up on it.

Yet, here we are with GPUs. Whereas the Core 2 Duo really doesn't look that different from a CDC 6600 or old IBM 360/91 (you just have to squint a bit), GPUs look radically different from *any* computer produced in that sort of volume. There has been a quiet revolution going on, and very few people noticed until the last few years. Graphics was an application in which parallelism both made sense and had enough volume to drive down costs. With GPGPU or GPCPU, either way we actually have high-volume, low-cost, massively parallel chips for doing general-purpose computations.

This is a radical departure from the past. This will make the future very exciting.
 
I think VIA's problem was that their x86 CPU (and perhaps GPUs) just wasn't in the same league as their x86 (and GPU) competitors. I just don't think they found a good market niche.
It's not just that. Integration doesn't make sense when a 'good enough' CPU is ~100mm² and a 'good enough' GPU is also ~100mm². The cost savings there really aren't as obvious as with two smaller chips.

Anyway, if you want a good example of a company that doesn't think integration will kill them, that's CSR, the leader in bluetooth single-chip solutions. Their expectation, and I agree with them, is that in 5 years they'll still have a good 40%+ of the market (versus their current 50%). The key is superior architecture (better performance, lower power, lower costs) and minimizing package cost and size to the extreme. And that's with chips that only cost about $3!

The real key is how quickly interesting desktop applications start to really use the multi-core goodness. Video compression is already there. Photoshop is either there or soon to be there.
The real key is... what prevents NVIDIA/AMD from accelerating those exact same applications on GPUs? Photoshop isn't just going to support dual-core CPUs, it's also going to support OpenGL and CUDA for GPGPU processing... ;)

As for video encoding, things like motion estimation are very well suited to modern GPU shader cores. Other parts of the encoding process are totally undoable there, but I'm sure a small special-purpose unit would do the job. I'd be very surprised if both AMD and NV weren't working on this as we speak. (Well, maybe not, it's a Sunday after all... but you get the point!)
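For the curious, here's a rough Python/NumPy sketch of why motion estimation fits data-parallel hardware so well. The block size, search range and frame contents are all made up; the point is that every candidate offset is an independent sum-of-absolute-differences (SAD), which is exactly what a GPU would evaluate across many threads at once instead of in this serial double loop.

```python
# Toy motion-estimation example: exhaustive block matching by SAD.
# Each candidate offset is independent, so a GPU can evaluate them all in parallel.
import numpy as np

def best_motion_vector(block, ref, cy, cx, search=8):
    """Exhaustive search around (cy, cx) in the reference frame."""
    bh, bw = block.shape
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = ref[cy + dy:cy + dy + bh, cx + dx:cx + dx + bw]
            sad = int(np.abs(block.astype(int) - cand.astype(int)).sum())
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
block = ref[10:18, 12:20].copy()                     # pretend this 8x8 block moved by (2, 4)
print(best_motion_vector(block, ref, cy=8, cx=8))    # -> ((2, 4), 0)
```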

Hopefully games that really exploit multi-core are under development (those reading this board would know more about that than me). One thing in the CPU manufacturers' favor is that it is much easier for programmers to migrate from two cores to four cores than from one core to two cores.
Yeah, I wouldn't be too worried about gaming. There definitely are uses for many-core CPUs there; although if GPUs became massively parallel MIMD machines, that could change radically (AI, Physics, etc. could be 100% offloaded from the CPU).

However, when it comes to GPGPU, it seems a many-core CPU has lots of advantages over GPUs.
I don't really understand how a processor that tries to extract maximum ILP can achieve superior aggregate performance in many-core configurations. After all, the control overhead as a function of average ILP tends to be superlinear...

I guess at some point with GPGPU, GPUs or multi-core CPUs will basically look the same (lots of cores).
See my previous post: the key differences are ILP maximization vs latency-tolerance and MIMD vs SIMD. It's not just a matter of programmability; those are fundamental traits of the architecture and the workloads it is tuned towards. You just can't get away from that.

GPUs will likely still have the edge in peak performance, but real-world performance might go to the CPUs.
GPUs have traditionally been much more capable of reaching their peak performance than CPUs, both in synthetic workloads and real-world scenarios. And that shouldn't be very surprising given their ILP is much more constant and they are inherently latency tolerant!
 
"The Act also affected copyright terms for copyrighted works published prior to January 1, 1978, also increasing their term of protection by 20 years, to a total of 95 years from publication."

That said, you can only copyright the actual floorplan or RTL for a chip. You can't copyright an operation or any such thing. That is what patents are for.

ArchitectureProfessor, please consider that he was not using the proper terms. Mask right protection has characteristics of both a copyright and a patent. In fact it is closer to utility-model patents than to regular patents, because they don't show a unity of innovation; they just increment on an existing design.

Let's see an example, from the USA law, which is basically similar to everywhere in the world because of the TRIPS agreement:

§ 904. Duration of protection
(a) The protection provided for a mask work under this chapter shall commence on the date on which the mask work is registered under section 908, or the date on which the mask work is first commercially exploited anywhere in the world, whichever occurs first.
(b) Subject to subsection (c) and the provisions of this chapter, the protection provided under this chapter to a mask work shall end ten years after the date on which such protection commences under subsection (a).
(c) All terms of protection provided in this section shall run to the end of the calendar year in which they would otherwise expire.

So, let's view some examples, which will prove to you that NVIDIA can come up with a full-fledged GPU that can compete with AMD and Intel, and so NVIDIA can go on pursuing their plans of world domination, if they wish. http://download.intel.com/technology/architecture/new-instructions-paper.pdf , page 5.

See that even SSE will be available on Jan 1st 2010, and SSE2 on Jan 1st 2011. AMD's x86-64 will be available even earlier, on Jan 1st 2010, and that's what matters most.

Can you explain to me how NVIDIA will be beaten to death?
 
As can clearly be seen from these relationships, profit for fab-heavy companies is a superlinear function of volume, while it is a linear function for fabless companies. Clearly, that means it's still important; but some of the dynamics from the CPU world you try to extrapolate to GPUs simply don't apply.

One way to think of Larrabee is as a fabless design for which Intel is acting as a foundry. :) No, really. Intel has enough CPU volume that adding a few graphics chips into the mix is just fine.

In the past, Intel was of the opinion that they would get lower return on investment by fabricating anything other than CPUs on its high-end fabs. That includes DRAM, graphics, chipsets, etc. I think Intel is slowly changing its tune on that for Larrabee.

As a side note, it *is* noteworthy that this claim becomes slightly invalid if the fabless company we're talking about accounts for a significant share of its foundry's volume. However, I don't think NVIDIA is responsible for much more than 10% of TSMC's business right now, so that's not a very significant factor yet.

Most of TSMC's business doesn't require the bleeding-edge process technology that NVIDIA might want sometimes. Sun Microsystems has been fabless for a long time (or maybe forever) with TI fabricating its SPARC server chips. The problem with that is TI's semiconductor process is tuned for DSPs and embedded chips, not big server chips. I've heard former Sun designers say that using the TI fab really handicapped them in terms of competing on cost & performance with Intel and IBM.

NVIDIA might have similar problems with TSMC.

I don't know much about TSMC's fabrication process. What I do know is that Intel's (and most everyone else's) 65nm process wasn't really very good. Transistors were leaky, had more variation in speed and power, and weren't that much faster than 90nm. It seems that metal gates and a lot of other really advanced geometric tricks with the transistors have made Intel's 45nm process *much* better than its predecessor.

How will TSMC, IBM, and AMD's 45nm process compare to Intel's? I'm actually not sure. What I do know, is that Intel finally has its ducks in a row (good process technology, 64-bit x86, a return to reasonable micro-architectures, good multi-cores, advanced system interconnects, and on-chip memory controllers). All of Intel's blunders that allowed AMD to catch up (especially 64-bit x86 and on-chip memory controllers), they have basically fixed.

So, the end question becomes: how much advantage (in terms of cost/power/performance) will Intel's process technology give it over NVIDIA? 10%? 20%? More?
 
ArchitectureProfessor, please consider that he was not using the proper terms. Mask right protection has characteristics of both a copyright and a patent. In fact it is closer to utility-model patents than to regular patents, because they don't show a unity of innovation; they just increment on an existing design.

Interesting. I've never before known that masks have special rules for them. Are there any rules that say Intel must give anyone their masks? Can't they just keep them fully in house?

In addition, anything that is patented is still under patent, right? Just because the specific mask loses protection, does that invalidate the patent?

I'm actually honestly asking, as I've never heard about this before.

So, let's view some examples, which will prove to you that NVIDIA can come up with a full-fledged GPU that can compete with AMD and Intel, and so NVIDIA can go on pursuing their plans of world domination, if they wish.... Can you explain to me how NVIDIA will be beaten to death?

My arguments above have been on technical grounds, not based on intellectual property issues.

AMD and Intel have an IP cross-licensing agreement. Yet AMD is behind Intel right now. Why? Well, it isn't for intellectual property reasons.

The actual engineering matters (as well as non-technical market forces).
 
...the key differences are ILP maximization vs latency-tolerance and MIMD vs SIMD. It's not just a matter of programmability; those are fundamental traits of the architecture and the workloads it is tuned towards.

When I've used the term "CPU", I'm not just talking about large out-of-order multiple-issue beasts (like the Core 2, for example). I'm also talking about simpler, less-superscalar, in-order cores. Cores like the three-core Xenon used in the Xbox 360. Cores like the in-order single-issue cores in the eight-core Niagara. Cores like Intel's Silverthorne (its new super low-power x86 chip).

I completely agree that a multi-core of Core 2s (something like Nehalem) simply isn't going to compete with a GPU for flop per watt or flop per dollar.

However, there has been lots of talk about chips (somewhat like Cell) that have a few bigger cores and many smaller "offload" cores. It is hard to tell exactly what Intel is working on right now (they kill off projects and start new ones up all the time), but at least one project I've heard of was working on just that sort of same-ISA but different-sized-cores system.

It may be that Intel has already missed its window and that GPGPU will take off and swamp their multi-core roadmap. That would be an interesting development. Without Larrabee on the horizon, I would be worried about just that happening. But at least with Larrabee we know that at least someone at Intel is paying attention. :)

One more thing to consider: what if some important parallel algorithms have a hard time scaling past 32 or 64 cores? What if some computations can't take advantage of the 128+ units organized in a SIMD fashion on the GPU? In those cases, CPUs (maybe even beefier CPUs) might have some advantages.
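A quick Amdahl's-law style calculation makes that worry concrete; the serial fractions below are purely illustrative.

```python
# Amdahl's-law style illustration of the scaling worry above: if even a small
# fraction of the work is serial, piling on cores (or SIMD lanes) stops paying
# off long before you reach 128+ units. The fractions are made up.
def speedup(serial_fraction, cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for serial in (0.01, 0.05, 0.10):
    row = ", ".join(f"{cores:>4} cores: {speedup(serial, cores):5.1f}x"
                    for cores in (4, 32, 128, 1024))
    print(f"serial {serial:4.0%} -> {row}")
```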

Seymour Cray once said something like: "If you're plowing a field, would you rather have two strong oxen or 1024 chickens?"
 