Larrabee: Samples in Late 08, Products in 2H09/1H10

How does that explain Intel's dominance of the low-end graphics market? Intel has captured much of the low-end graphics market, but it hasn't (yet) captured any of the high-end market.

Don't overrate the Intel-dominated IGP market. Intel has played dirty tricks in the past to get its IGP chipsets into the market. One result is the many systems that ship with both an IGP and a discrete GPU. Besides, outside of the casual market, game developers don't care about Intel's graphics solutions.

Therefore they are not much better off than a new player when it comes to graphics solutions for gamers. To make things worse, Intel has a bad reputation when it comes to drivers.

Here's another big claim. Once GPU/CPUs just become more generic MIMD/SIMD cores, what Microsoft does with DirectX just won't matter as much. You want some new feature, just buy a cross-platform library from a company that does it. No need to wait for Microsoft to bless it. This is why Intel's acquisition of Havok is so interesting...

Once you have enough flops, why bother going through DirectX? Or rather, why DirectX versus some other graphics libraries from some other company?

Unification. I am sure that there are not many developers out there who want a second Glide disaster. The PC game market is not large enough to limit yourself to a subset of the systems out there. Therefore any API that will not work with all hardware solutions of the same generation would hardly be used.
 
I have a question for you. As a game developer, do you find that you're forced to use lock-free or other techniques that don't use locks to fully synchronize access to shared memory?
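To be concrete about what I mean by that, here is a minimal sketch of the two approaches (a hypothetical shared counter in C++, written with the atomics proposed for the upcoming C++ standard; the names are made up for illustration):

Code:
#include <atomic>
#include <mutex>

std::mutex g_lock;
long g_counter_locked = 0;

// Conventional approach: take a lock around every shared update.
void add_with_lock(long n) {
    std::lock_guard<std::mutex> guard(g_lock);
    g_counter_locked += n;
}

std::atomic<long> g_counter_lockfree(0);

// Lock-free approach: a single atomic read-modify-write, no blocking.
// Cheaper under contention, but only practical for simple shared state.
void add_lock_free(long n) {
    g_counter_lockfree.fetch_add(n, std::memory_order_relaxed);
}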

IMHO games still struggle too much with multi-core support in general. Once we get better at that, we can start worrying about the cycles we lose to lock-based synchronization. And even then we need to be very careful about what we are doing; there are too many developers working on a single game codebase today.
 
Unification. I am sure that there are not many developers out there who want a second Glide disaster.

Not being a game developer, I'm not sure what the glide disaster was. Can you say more?

Also, is DirectX used for game development for systems like the Wii and PS3? If not, how do developers target all these platforms? I assume that DirectX works well for PC and XBox, but what about the other systems? How much does Microsoft control?

The PC game market is not large enough to limit yourself to a subset of the systems out there. Therefore any API that will not work with all hardware solutions of the same generation would hardly be used.

I was thinking more along the lines of Havok and AGEIA. Both companies make libraries that work on all the major gaming systems.

With AGEIA, they have a business model in which they give away the library for free to developers, as it runs best on their physics acceleration hardware (but it does run okay on other hardware). Then they make their money from selling the acceleration hardware to consumers. Seems like a pretty cool business model.

I'm not sure about Havok's business model, but now that they have been bought by Intel, it seems like Havok's engine will likely be highly optimized for Larrabee.

BTW, I've heard that AGEIA is for sale. It seems like the only eligible suitors would be AMD/ATI or NVIDIA. I think Intel wouldn't be allowed to buy them for anti-trust reasons (as Intel already owns Havok). It may be a bit off topic, but I wonder which company is going to purchase AGEIA? Might it be AMD/ATI to mirror Intel's purchase of Havok? Or NVIDIA?
 
How does that explain Intel's dominance of the low-end graphics market? Intel has captured much of the low-end graphics market, but it hasn't (yet) captured any of the high-end market.
That 'dominance' is largely a myth nowadays: a) The double-attach rate is very high (discrete GPUs being added on top of IGPs), which means it's actually NV which has the largest overall 'used' GPU share. b) Much of Intel's IGP volume is in the enterprise market where GPUs are mostly useless.

Either way, Intel's chipsets fight in an ultra-low-end 'commodity' section of the market where performance doesn't really matter. That might not be completely true of their next-gen chipset coming out in Q2, the G45. We'll see how that one goes and how much market share it obtains in the consumer market; call me highly skeptical for now, though, especially given at least one non-design-win I'm privy to.

I don't think you understood my point. There is no magical reason for Intel to be able to compete in the mid-range but not in the high-end. You need to lose your CPU mindset: that's a very normal thing to happen there because you can't just double your IPC by having twice as many units. But since graphics is massively parallel, you can just double the number of execution units and you'll ~double your performance. That means per-unit performance is completely irrelevant and that what matters is performance/(mm²||$) and performance/watt.

As for volume: what matters in the fabless world is gross profit, not volume. You amortize fabs via volume, but you amortize R&D via gross profits. It's an incredibly important distinction that the likes of David Kanter still don't seem to have properly understood (;)) but ah well. Of course, higher volume increases gross profit - but so do higher ASPs. So the factors are once again slightly different there.

I'm kinda tired of CPU experts being completely unable to consider the graphics market dynamics properly. If you still don't agree with me, may I suggest rereading what I said until you do? :D

Here's another big claim. Once GPU/CPUs just become more generic MIMD/SIMD cores, what Microsoft does with DirectX just won't matter as much. You want some new feature, just buy a cross-platform library from a company that does it. No need to wait for Microsoft to bless it. This is why Intel's acquisition of Havok is so interesting...
I think it's much more accurate to say that Direct3D will become significantly more lightweight. As long as there is more than one graphics processor manufacturer, there will be a need for standardization because unlike with x86, they certainly aren't going to agree on an ISA!

Now, if you made that claim a bit more specific, I might agree with you: if every manufacturer had a GPGPU API that exposed the chip in a better way, I could imagine AAA game engine developers creating highly optimized paths for the different manufacturers using those proprietary APIs. I don't think that's very likely either, but heh.

Once again, the key word here is 'standardization'. Your average developer doesn't want to have completely different rendering engines for every form of hardware out there.

This is part of what I mean about the "GPU mindset". Those into GPUs think about a specific way of doing things: what sort of computations work well, what sort don't (anything with fine-grain synchronization), what hardware structures do and don't pay off ("caches don't work as well for GPUs as for CPUs"), and who provides the middleware (DirectX versus whatever else).
I'm not sure I fully understand or agree with these assertions. Caches don't work well? They work very nicely, thank you very much - there's just no point having bigger ones because data accesses in the graphics world tend to be either very regular or very random.

Also, I'm still not sure you fully understand what DirectX does. You DO realize it's just an abstraction layer to the hardware, and as far from a renderer as can be, right? Maybe I just don't understand your phrasing here, sorry if that's the case. As for fine-grain synchronization, certainly that won't work in DirectX - but depending on your definition, that's supported in CUDA in a perfectly good way.
 
Not being a game developer, I'm not sure what the glide disaster was. Can you say more?

Glide was the proprietary API for the 3dfx Voodoo 3D accelerators. It didn't work with any other 3D accelerators. In the beginning that wasn't a big problem, as 3dfx dominated the market. But then more and more companies started to build 3D accelerators and 3dfx lost market share. If your game was built on Glide, your customer base started to melt away, too.

Also, is DirectX used for game development for systems like the Wii and PS3? If not, how do developers target all these platforms? I assume that DirectX works well for PC and XBox, but what about the other systems? How much does Microsoft control?

Consoles have their own APIs. But if you look at the number of game copies sold for PCs versus the various consoles, this isn't a big problem.

With AGEIA, they have a business model in which they give away the library for free to developers, as it runs best on their physics acceleration hardware (but it does run okay on other hardware). Then they make their money from selling the acceleration hardware to consumers. Seems like a pretty cool business model.

This is not cool, this is desperation. After they tried to sell the physics engine, they first opened it up for non-commercial projects and finally for everyone. This is their last hope of getting more games to support their hardware.

I'm not sure about Havok's business model, but now that they have been bought by Intel, it seems like Havok's engine will likely be highly optimized for Larrabee.

Havok sells its engine to game developers.
 
That 'dominance' is largely a myth nowadays: a) The double-attach rate is very high (discrete GPUs being added on top of IGPs), which means it's actually NV which has the largest overall 'used' GPU share. b) Much of Intel's IGP volume is in the enterprise market where GPUs are mostly useless.

Ok, that all seems reasonable to me. I'll concede that Intel's IGP really isn't used by anyone that cares at all about GPU performance.

I don't think you understood my point. There is no magical reason for Intel to be able to compete in the mid-range but not in the high-end. You need to lose your CPU mindset: that's a very normal thing to happen there because you can't just double your IPC by having twice as many units.

I agree that scaling a GPU from mid-range to high-end is easier than for a single-core CPU. Things are changing in the CPU world with the advent of multi-core chips. If multi-core really becomes successful (in that many applications use multiple cores), you could imagine that exact same scaling applying to CPUs, too. That is, CPUs are becoming like GPUs in this way (as well as in many others).

As for volume: what matters in the fabless world is gross profit, not volume. You amortize fabs via volume, but you amortize R&D via gross profits.

I agree that fabless design greatly reduces the volume needed. However, you still need to amortize other fixed costs (the hardware design team, the driver development, etc.). I agree that the volumes for GPUs are such that even if their volume dropped significantly, they would still be able to amortize these fixed costs.

I'm kinda tired of CPU experts being completely unable to consider the graphics market dynamics properly. If you still don't agree with me, may I suggest rereading what I said until you do? :D

I think you're missing a key point: eventually (in, say, ten years), there won't be a "discrete GPU" market. Zero. Integration is *the* dominant trend in computing hardware. Either we'll have big GPUs with a little CPU in the corner (NVIDIA wins), or we'll have big multi-core CPUs with a little GPU logic (Intel wins). My prediction is that high-end systems will include multiple identical copies of these GPU/CPU chips, but we won't have today's "one CPU/one GPU" divide.

Think back to the last time we had this kind of processor/co-processor split. It wasn't GPUs, it was FPUs! The Intel 386 (and other processors of the day) couldn't fit a floating-point unit on the chip. So there were separate floating-point co-processors (FPUs). Then, a few generations later, they could fit an FPU on the CPU, and the separate FPUs simply went away.

When I say that Intel will take away the mid-range, I'm not talking about the "discrete GPU mid-range". I'm talking about on-chip GPUs taking more and more market share away from discrete GPUs.

Consider the following situation. What if Intel starts including 8 or 16 Larrabee cores in addition to a few bigger x86 cores on *every* chip it sells. Given a reasonable power and area budget, it seems like these chips would have good enough GPU performance. Perhaps it would have half or 1/4th the performance of a discrete GPU. If a system needs more GPU performance, the system could just include two of these chips (dual-socket motherboards are pretty common these days). ATI/AMD will likely follow a similar strategy. In such an environment, I think fewer PC purchasers will be willing to pay as much for a discrete GPU from NVIDIA. Certainly some hard-core gamers will, but not as many. I think such trends favor AMD/ATI and Intel.

Then again, perhaps the desktop market is irrelevant. Maybe the next war will be fought entirely in the mobile and game console space. In such spaces, x86 compatibility is mostly irrelevant. Perhaps some new upstart will come along and push Intel off its throne by taking over the embedded space. Perhaps this has already happened...

Of course, these markets are even more cost conscious than desktop systems. In many of these systems there isn't even room for two chips (example: Apple's new MacBook Air that uses a special chip from Intel with a smaller package). To push down costs in the game console market, they will also be looking for more integrated solutions. This all goes double for mobile gaming (PSP, gameboy).

Integration is an unstoppable force.

Also, I'm still not sure you fully understand what DirectX does. You DO realize it's just an abstraction layer to the hardware, and as far from a renderer as can be, right?

Umm.. no, I really didn't know that. As you've castigated me for, I really am a multi-CPU guy trying to learn something about GPUs. GPUs and CPUs are on a collision course, and I'm trying to figure out how that might play out. Since they seem to be on this collision course, I think that CPU guys (such as myself) and GPU guys (such as those on this board) can learn a lot from each other.

As for fine-grain synchronization, certainly that won't work in DirectX - but depending on your definition, that's supported in CUDA in a perfectly good way.

I know a bit about CUDA, but my impression was that thread-to-thread synchronization (such as using locks) was still significantly more expensive than it would be on a multi-core CPU. Is that not the case? I do know that GPUs have some support for fast barriers that coordinate many threads, which gives GPUs the edge on that sort of thing. But I was under the impression that even finer-grain synchronization was expensive (or even really hard to express). Of course, graphics hardware is changing quickly enough that this might no longer be the case.
 
DirectX 10.1, DirectX 11 and xbit labs article

I read something from xbit labs that I think helps clarify some of the pros/cons of the Larrabee approach.

From an article titled Intel Promises to Sample Larrabee Processors in Late 2008:

"If Intel’s discrete graphics processors emerge in late 2009 or early 2010, then it is unclear whether these GPUs support DirectX 10.1 feature-set, or are made according to Microsoft DirectX 11 specification."

As Larrabee is just general-purpose x86 cores, the hardware will support whatever version of DirectX comes along. As such, I'm not sure the above quote makes any sense for Larrabee. Intel can (and will) just update the software portion of its drivers or whatever, and that will be that. Sure, you could imagine that some of the features of DX11 might be more efficient on later generations of Larrabee hardware, but they would likely run pretty well on the first generation of Larrabee, too. If Microsoft comes out with some new physics acceleration standard (as they have been rumored to do), Intel will just write the software and Larrabee will likely do just fine.
 
Then again, perhaps the desktop market is irrelevant. Maybe the next war will be fought entirely in the mobile and game console space. In such spaces, x86 compatibility is mostly irrelevant. Perhaps some new upstart will come along and push Intel off its throne by taking over the embedded space.
As far as I can see, x86 compatibility is very, very convenient for many. Intel, AMD and sheer convenience are all backing x86, and with enough thrust, everything can fly ;) - can you count how many times x86 was declared soon-to-be-dead while RISC/MIPS/IA64 would obviously win? ;)

Of course, better tools can mask the differences between platforms (Peakstream-type middleware that could compile to AMD/Intel/GPU/Cell/Larrabee/Fusion/what have you). But there is tremendous infrastructure around x86. As you said yourself - even though Larrabee was for Intel a (yet another ;)) "clean beginning with the future in mind", they still didn't drop x86, because of the tools and infrastructure. Maybe they learned their lesson with Itanium :cool:?

Edit:
I really believe in x86 compatibility: with fast-evolving platforms there really is a need for a "stable API" to provide some kind of upgrade path; it really is not cool when you lose your programs after an upgrade. Take the latest consoles, for example: wouldn't you love it if upgrading the console were like upgrading a PC/Linux/Windows box, where most things just work? How long did it take before the XBox/PS3/iPod had a reasonable amount of stable connectivity and software available?

If you target Windows, you target a very large consumer space and you can hope your app still works after 10 years (Win98 apps on Vista, this year's software on Win2000). If you target Blackberry/NGage, you target only the owners of this generation of devices.

We'll soon have Cell2, Larrabee and Fusion along with regular CPUs. All of these are evolving platforms, with something new every 2 years. If you were a developer or user, how much would you pay (in % of performance, perhaps) for backward/forward compatibility?
 
While I wouldn't strictly exclude that possibility, you seem to be taking a controversial position just for the sake of it, instead of actually considering what would get us there.

What defines the mid-range is not absolute performance, but relative performance compared to the high-end. What makes a company/architecture able to compete in the mid-range is its performance/dollar. And if you can compete in performance/dollar, let us not forget that graphics is a wonderfully scalable task, so that means you can also compete in the high-end.

Anyone who claims Intel could 'squeeze NVIDIA into a niche high-end role' needs to seriously reconsider the math behind his argument, and you're certainly not the only one out there making that absolutely senseless argument. It's simple, really: either Intel beats NVIDIA/AMD everywhere, or they beat them nowhere.

NVIDIA's current market cap is $15B. Why should they consider refocusing on a market that's probably at least two orders of magnitude smaller than their current ones? Given the ambitions of those at the head of the company, I would be more willing to consider bankruptcy than that. And honestly, if anyone told me NVIDIA was going bankrupt in five years right now, I'd laugh him off.

I think it's rather... original to say GPUs have forced game developers to think about computation in a specific way. It would be much more accurate to say that the last several decades of graphics research have forced GPU engineers to architect GPUs the way they are. Ironically, those algorithms were often developed with CPUs in mind rather than SIMD machines.

But there definitely are many very interesting algorithms that won't work on semi-traditional SIMD machines like GPUs. Many of these have been known for a very long time, while others are still being discovered; and if hardware that supported them at much higher speeds became mainstream, then I'm sure the state of the art would advance even quicker there.

Either way, what makes Larrabee really interesting is MIMD and SIMD on the same chip. The fact it's on the same core doesn't even really matter; it has advantages, but it also has disadvantages. I don't know exactly what NV and AMD's roadmaps are, but what I recently realized is that I'd be very surprised if neither had any support whatsoever for scalar MIMD processing in the 2010/DX11 timeframe. However, if it turned out that neither will, then I'd certainly have to take Larrabee even more seriously than I am right now.

EDIT: Just making sure this post isn't deemed overly aggressive - it certainly wasn't meant that way! :)

Arun, some points:

1. What defines mid-range versus high-end versus low-end is not performance, but price. Price is usually closely related to relative performance (which is what you claimed), but not always. For instance, x86 was always low-end, while RISC was always high-end (in terms of price). However, after a while, x86 ended up with superior performance due to the advantage of having 10X the volume.

For instance, integrated graphics, which costs an OEM $1-6 extra on the BOM, is always going to be low-end, which in turn defines where the mid-range and high-end must be to extract a premium (admittedly, software requirements play into this as well).

2. It's very easy for me to see Intel forcing NVIDIA out of the low-end GPU market for Intel based systems. It's conceivable, but not quite as easy to see AMD doing the same thing for AMD based systems. In general, NVIDIA is least attractive at the low-end relative to the competition who can integrate their GPUs into the CPU package (or eventually the die - once that happens it's really game over at the low-end).

3. The low-end of the graphics market will continue to get exponentially more powerful, as will the high-end. The only question is how much will "the market's" appetite for performance grow. It's quite conceivable that 5-10 years from now, nobody has discrete GPUs except for gamers. That would be a huge problem for NVIDIA because it would reduce their volumes.

4. I can easily see Intel beating out NV at the low-end, but losing in the mid-range and high-end.

DK
 
Yes, memory ordering models are probably the most confusing thing in processor design.



Intel recently released a document that updates the official x86 memory ordering model. This newest model is stricter than what they have described in the past. The new document explicitly states that load/load re-ordering in x86 is *disallowed*. This really helps clarify the memory ordering rules of x86:


So, x86 does allow some memory order relaxations, but not ones that commonly cause problems.
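To make that concrete: the one relaxation x86 does keep is that a load may be satisfied before an older store to a *different* address has become globally visible (the store-buffer case). Here's a hedged sketch of the classic litmus test, using C++ atomics with relaxed ordering to stand in for plain x86 loads and stores (purely illustrative, not from Intel's document):

Code:
#include <atomic>
#include <thread>
#include <cstdio>

std::atomic<int> x(0), y(0);
int r1 = 0, r2 = 0;

void t1() {
    x.store(1, std::memory_order_relaxed);   // plain store to x
    r1 = y.load(std::memory_order_relaxed);  // may be satisfied before the
                                             // store above is visible to t2
}

void t2() {
    y.store(1, std::memory_order_relaxed);
    r2 = x.load(std::memory_order_relaxed);
}

int main() {
    std::thread a(t1), b(t2);
    a.join(); b.join();
    // On x86 the outcome r1 == 0 && r2 == 0 is allowed (and observable),
    // even though load/load and store/store reordering are disallowed.
    std::printf("r1=%d r2=%d\n", r1, r2);
}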



Yes, this is quite a problem. Java now has a formal memory model that says explicitly what the compiler is and is not allowed to do. C++ is working on a specification, but it isn't official yet. They both boil down to this: the compiler generally won't re-order memory operations across lock acquires and lock releases. The really tricky case is what happens without strict locking (say, you forget a lock or you're doing lock-free synchronization); that is where everything becomes really ugly in a hurry. For these cases Java (and C++, I suppose) uses reads and writes of "volatile" variables as partial memory ordering points. For example, I think some of the rules would prevent re-ordering of two accesses to volatile variables. The language-level issues of memory ordering are really complicated, so take anything I've said above with a grain of salt (I'm much more familiar with the hardware-level memory ordering models).
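As a rough sketch of what those language-level ordering points buy you, here's the earlier litmus test written with the sequentially consistent atomics along the lines of what the C++ draft proposes (roughly the role Java's volatile plays; syntax may differ in the final standard, so treat this as an illustration only):

Code:
#include <atomic>

std::atomic<int> x(0), y(0);   // operations default to sequential consistency
int r1 = 0, r2 = 0;

void t1() {
    x.store(1);     // acts as an ordering point: the compiler and hardware
    r1 = y.load();  // may not move the load ahead of the store
}

void t2() {
    y.store(1);
    r2 = x.load();
}

// With these semantics the r1 == 0 && r2 == 0 outcome is forbidden; at
// least one thread must see the other's store. The compiler/hardware pays
// for this with the appropriate fence or locked instruction on each access.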

The real problem is that x86 doesn't have a memory ordering model per se - it's really more like an accretion of implementation decisions that someone managed to coalesce together into one big blob.

That and there isn't a single textbook out there that I've seen which adequately addresses MOMs. H&P does a horrible job describing consistency and that's just for caches...not I/O, etc.

In general, the MOM for x86 was described to me as: "Everything had better behave like a 486", by Mike Haertel, who is definitely one of the x86 encyclopedias out there.

I'd point out that x86's MOM gives you a lot of what you want, see this thread for more details:
http://www.realworldtech.com/forums/index.cfm?action=detail&id=68137&threadid=67433&roomid=2
http://www.realworldtech.com/forums/index.cfm?action=detail&id=68194&threadid=67433&roomid=2

As Brannon points out, there is a real cost to having a more software-friendly MOM. You need CAMs to hold every in-flight store in an x86 system, instead of much more area- and power-efficient SRAMs. That being said, I haven't designed an x86, so I don't know exactly what the penalty is relative to, say, IA64.

Anyway, while I'm tossing gasoline on the fire...

There's a 3rd option out there, which is transactional memory, and it appears to be 'the sexy topic' in research at the moment. That being said, I'm not sure the research guys have found a solid application of TM which is a win.

DK
 
No, but I know most of those guys. These days they all seem to work for Google or D.E. Shaw's Protein Folding Supercomputing group. :) Not many of them went into academia.

I'd like to stay somewhat anonymous, so let's just say that I graduated with my PhD in the last ten years from one of the big-10 schools known for computer engineering (e.g., U of Illinois, U of Wisconsin, U of Michigan, etc.). I'm now a professor at a top-20 computer science department on the East coast.

I have to admit, I don't know the top 10 CE schools for sure, but I'd guess that means:

Stanford, MIT, Berkeley, UIUC, UT-Austin, UWisconsin-Madison, UWashington, UMichigan, perhaps Harvard and Princeton.

The real problem is that most faculty don't list a whole lot about their dissertations, since generally you are judged based on your students' work.

David
 
Ok, that all seems reasonable to me. I'll concede that Intel's IGP really isn't used by anyone that cares at all about GPU performance.

But that's a growing percentage of the market. I remember when EVERYONE needed a discrete 2D card (NVIDIA TNT anyone?), and some had a 2D plus a 3D card (thanks for that idea Voodoo!).

Ultimately, the folks who need discrete GPUs are gamers. Right now, there's an uptick b/c of Vista.

I agree that scaling a GPU from mid-range to high-end is easier than for a single-core CPU. Things are changing in the CPU world with the advent of multi-core chips. If multi-core really becomes successful (in that many applications use multiple cores), you could imagine that exact same scaling applying to CPUs, too. That is, CPUs are becoming like GPUs in this way (as well as in many others).

I agree that fabless design greatly reduces the volume needed. However, you still need to amortize other fixed costs (the hardware design team, the driver development, etc.). I agree that the volumes for GPUs are such that even if their volume dropped significantly, they would still be able to amortize these fixed costs.

It's worse than that. Volume has a huge positive impact on binning. Intel can afford to have a bin that is +30-50% over nominal frequency and still have enough parts to sell in reasonable volumes. NVIDIA and ATI kind of do that as well, which is why it would be hard to start up a graphics company today.

Intel currently doesn't bin integrated GPUs, but perhaps they might in the future...

I think you're missing a key point: eventually (in, say, ten years), there won't be a "discrete GPU" market. Zero. Integration is *the* dominant trend in computing hardware. Either we'll have big GPUs with a little CPU in the corner (NVIDIA wins), or we'll have big multi-core CPUs with a little GPU logic (Intel wins). My prediction is that high-end systems will include multiple identical copies of these GPU/CPU chips, but we won't have today's "one CPU/one GPU" divide.

Think back to the last time we had this kind of processor/co-processor split. It wasn't GPUs, it was FPUs! The Intel 386 (and other processors of the day) couldn't fit a floating-point unit on the chip. So there were separate floating-point co-processors (FPUs). Then, a few generations later, they could fit an FPU on the CPU, and the separate FPUs simply went away.

The one problem that I could see is heat. 3D packaging lets you have separate specialized dice (one analog, one logic, one DRAM). But mid-range GPUs put out as much power as CPUs, and that forces separation.

When I say that Intel will take away the mid-range, I'm not talking about the "discrete GPU mid-range". I'm talking about on-chip GPUs taking more and more market share away from discrete GPUs.

Consider the following situation. What if Intel starts including 8 or 16 Larrabee cores in addition to a few bigger x86 cores on *every* chip it sells. Given a reasonable power and area budget, it seems like these chips would have good enough GPU performance. Perhaps it would have half or 1/4th the performance of a discrete GPU. If a system needs more GPU performance, the system could just include two of these chips (dual-socket motherboards are pretty common these days). ATI/AMD will likely follow a similar strategy. In such an environment, I think fewer PC purchasers will be willing to pay as much for a discrete GPU from NVIDIA. Certainly some hard-core gamers will, but not as many. I think such trends favor AMD/ATI and Intel.

Such trends favor folks with CPUs. NVIDIA could probably pick one up cheaply if they wanted to.

Then again, perhaps the desktop market is irrelevant. Maybe the next war will be fought entirely in the mobile and game console space. In such spaces, x86 compatibility is mostly irrelevant. Perhaps some new upstart will come along and push Intel off its throne by taking over the embedded space. Perhaps this has already happened...

Of course, these markets are even more cost conscious than desktop systems. In many of these systems there isn't even room for two chips (example: Apple's new MacBook Air that uses a special chip from Intel with a smaller package). To push down costs in the game console market, they will also be looking for more integrated solutions. This all goes double for mobile gaming (PSP, gameboy).

Integration is an unstoppable force.

I generally agree, but as you pointed out...there are many ways to integrate and the folks that come from below are also an issue. As you said, cell phones may be the way forward.

DK
 
Either way, Intel's chipsets fight in an ultra-low-end 'commodity' section of the market where performance doesn't really matter. That might not be completely true of their next-gen chipset coming out in Q2, the G45. We'll see how that one goes and how much market share it obtains in the consumer market; call me highly skeptical for now, though, especially given at least one non-design-win I'm privy to.
I tend to share your skepticism towards future Intel offerings. Compared to discrete offerings, IGPs will always have a fraction of the bandwidth, and they will share it with more and more CPU cores if current trends continue - especially since many techniques used in modern games tend to be very heavy on bandwidth.
 
I think you're missing a key point: eventually (in, say, ten years), there won't be a "discrete GPU" market. Zero. Integration is *the* dominant trend in computing hardware. Either we'll have big GPUs with a little CPU in the corner (NVIDIA wins), or we'll have big multi-core CPUs with a little GPU logic (Intel wins). My prediction is that high-end systems will include multiple identical copies of these GPU/CPU chips, but we won't have today's "one CPU/one GPU" divide.
Think back to the last time we had this kind of processor/co-processor split. It wasn't GPUs, it was FPUs! The Intel 386 (and other processors of the day) couldn't fit a floating-point unit on the chip. So there were separate floating-point co-processors (FPUs). Then, a few generations later, they could fit an FPU on the CPU, and the separate FPUs simply went away.
GPUs are far more complex systems than FPUs were. I would be very hesitant to write off Nvidia. For example, a while back, when I wrote graphics and driver code for a popular OS, several coworkers predicted that the latest Intel CPU and its memory interface were so fast that it would mean the end of discrete graphics companies like ATI. I brought up a couple of points at the time:

1) An optimized system will always outperform a more generic one

2) ATI's survival depends on discrete systems outperforming generic CPU-based solutions

What happened? Well, the year was 1992 or 1993 and the blazingly fast new CPU from Intel was the original Pentium-60! :oops:

I think my points apply even when a CPU company outright buys a GPU company. The CPU company will eventually merge the GPU guts into their CPU and/or support chips and the result will be an underperforming mess. I hope that Nvidia stays independent and continues to produce high-performance graphics hardware.

Mike
 
There's a 3rd option out there, which is transactional memory, and it appears to be 'the sexy topic' in research at the moment. That being said, I'm not sure the research guys have found a solid application of TM which is a win.
Admittedly I haven't followed transactional memory that much (maybe even not as much as I should have), but I've always heard that performance tanks once you have nested transactions. Has anyone managed to reasonably deal with this problem yet?

(also, it's damn good to see David Kanter and Matt Pharr posting!)
 
3. The low-end of the graphics market will continue to get exponentially more powerful, as will the high-end. The only question is how much will "the market's" appetite for performance grow. It's quite conceivable that 5-10 years from now, nobody has discrete GPUs except for gamers. That would be a huge problem for NVIDIA because it would reduce their volumes.
Well, that's certainly the endgame of the whole GPGPU thing--making high-end GPUs attractive to more than just gamers and people who need data visualization. I've always thought that the consumer killer app is image processing, because once you can say "Photoshop is 32x faster with a Gxxx," you start selling a whole lot of cards. Of course, once your CPU has 16 or 32 cores it's hard to really estimate how much faster a GPU will be, but assuming you have truly parallel algorithms (e.g., not task-based parallelism, but true data parallelism and language constructs that allow you to take advantage of that), a GPU in that same timeframe should be faster. The real question here is, how programmable will GPUs be?
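To pin down the task-parallel versus data-parallel distinction, here's a toy C++ sketch (the job names and the brightness example are invented purely for illustration):

Code:
#include <vector>
#include <thread>
#include <cstddef>

// Stand-in jobs, just so the sketch compiles.
void mix_audio() {}
void step_physics() {}
void update_ai() {}

// Task parallelism: a fixed handful of *different* jobs. This scales to a
// few cores at best, no matter how wide the hardware gets.
void frame_tasks() {
    std::thread audio(mix_audio), physics(step_physics), ai(update_ai);
    audio.join(); physics.join(); ai.join();
}

// Data parallelism: the *same* operation over millions of pixels. This is
// the part that keeps scaling with however many lanes/cores you have,
// which is why image processing maps so well onto a GPU.
void brighten(std::vector<float>& pixels, float gain) {
    for (std::size_t i = 0; i < pixels.size(); ++i)   // every iteration independent
        pixels[i] *= gain;
}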

Of course, I like where Ct is going with that, even if I think CUDA and Brook+ will probably beat them to the nested data parallelism on shipping hardware part.
 
Admittedly I haven't followed transactional memory that much (maybe even not as much as I should have), but I've always heard that performance tanks once you have a transaction inside a transaction. Has anyone managed to reasonably deal with this problem yet?

(also, it's damn good to see David Kanter and Matt Pharr posting!)

Heh, I'd humbly suggest that Matt Pharr and I should be listed in different categories. I have a good understanding of CPU architecture, but I hardly design them : )

I appreciate your warm welcome though - thanks!

Performance for transactional memory is a very tricky subject and, unfortunately, there are so many groups doing research in that area that it's very hard to figure out what is realistic. The two leading groups are at UW-Madison and Stanford, though, and my understanding is that their models are relatively similar; they just differ in how aggressive they are about assuming whether transactions will collide or not.

One interesting point is that most TM work aims to make programming easier, not necessarily to deliver higher performance.
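The textbook illustration of that point is composing two individually thread-safe objects. A hedged sketch follows: the lock-based version is ordinary C++, while the transactional block uses hypothetical syntax, since no mainstream compiler ships anything like it.

Code:
#include <mutex>

struct Account {
    std::mutex m;
    long balance = 0;
};

// Lock-based version: correct, but only because we acquire both locks
// through a deadlock-avoiding mechanism. Forget that discipline anywhere
// in the codebase and two opposite transfers can deadlock.
void transfer_locked(Account& from, Account& to, long amount) {
    std::lock(from.m, to.m);  // acquires both without risking deadlock
    std::lock_guard<std::mutex> g1(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance   += amount;
}

// Transactional version (hypothetical syntax): the programmer just states
// what must appear atomic, and the TM system detects conflicts and
// retries. Easier to write and compose; whether it is ever *faster* is
// exactly the open question.
//
// void transfer_tm(Account& from, Account& to, long amount) {
//     atomic {
//         from.balance -= amount;
//         to.balance   += amount;
//     }
// }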

DK
 
I know a bit about CUDA, but my impression was that thread-to-thread synchronization (such as using locks) was still significantly more expensive than it would be on a multi-core CPU. Is that not the case? I do know that GPUs have some support for fast barriers that coordinate many threads, which gives GPUs the edge on that sort of thing. But I was under the impression that even finer-grain synchronization was expensive (or even really hard to express). Of course, graphics hardware is changing quickly enough that this might no longer be the case.
Global synchronization in CUDA (which is probably what you mean by thread-to-thread synchronization) is incredibly expensive using traditional compare-and-swap and such (assuming you support the Compute 1.1 spec, which added atomic instructions for global memory). Local synchronization can use shared memory, which is infinitely faster; however, like you said, it's very hard to express that effectively.

(side note: thinking of threads in the CUDA sense is tricky and seems a bit counterproductive to me. I've found it a lot easier to visualize a CUDA thread warp as a thread that operates on a vector, and each CUDA thread as an element in the vector. The problem there is that blocks don't have an easy analogy!)
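A rough sketch of that mental model in plain C++ (not actual CUDA, and a made-up SAXPY kernel, just to illustrate the warp-as-vector view):

Code:
#include <cstddef>

const int WARP_SIZE = 32;

// One warp behaves like a single thread with one program counter that
// marches a 32-wide vector through the kernel; each CUDA "thread" is one
// lane of that vector.
void saxpy_one_warp(float a, const float* x, float* y, std::size_t warp_base) {
    for (int lane = 0; lane < WARP_SIZE; ++lane) {   // the 32 lanes
        std::size_t i = warp_base + lane;            // per-lane index
        y[i] = a * x[i] + y[i];                      // same op on every lane
    }
}

// A divergent branch then amounts to running the loop body twice with
// complementary sets of lanes masked off, which is where the cost of
// divergence comes from.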

As I mentioned a few posts ago, I don't think this will be the case for too long as evidenced by the more interesting parallel languages that have emerged in the past ten or fifteen years (NESL, X10, Cilk). The increasing flexibility of D3D will drive flexibility in GPGPU languages (or potentially vice-versa), and I think we'll see more than just the relatively restricted programming paradigms available now from graphics IHVs by the time Larrabee ships.

Also, thanks for the info, dkanter--it's definitely something I need to look into more rigorously. The most I see about transactional memory is what I see on fliers for upcoming lectures, which invariably conflict with some class...

P.P.S. -- this is my favorite thread in a very long time. :)
 
1. What defines mid-range versus high-end versus low-end is not performance, but price.
I think we agree; that's just semantics. What defines the market segment once the product is released is its price, but if you want to actually make money, what determines that price is performance. The reason I looked at it that way is that AP was suggesting Intel would squeeze NVIDIA out of the mid-range; my argument was that this can only happen if Intel is competitive on performance-per-$, because otherwise their parts simply won't sell.

2. It's very easy for me to see Intel forcing NVIDIA out of the low-end GPU market for Intel based systems. It's conceivable, but not quite as easy to see AMD doing the same thing for AMD based systems. In general, NVIDIA is least attractive at the low-end relative to the competition who can integrate their GPUs into the CPU package (or eventually the die - once that happens it's really game over at the low-end).
Sorry to be the Devil's Advocate, but both Intel and AMD's Fusion-like graphics products in 2009 are 3 chips/2 packages architectures. NVIDIA has been 2 chips/2 packages (including the CPU) since Q3 2006 for AMD and Q3 2007 for Intel.

The only point to doing it this way is to save about ~1 dollar and >=1 watt on the bus. That feels like a very backwards way to handle the problem given some of the very interesting advances in bus technology in the last few years. Actually, there's another slightly more cynical reason in Intel's case: they'd probably like to manufacture southbridges at Dalian/China on an old process instead.

EDIT: Of course, in the longer-term, I agree on-CPU-chip integration makes the most sense. But that's not what either of those companies will do in 2009 apparently, let alone 2008. Well, unless AMD is being more aggressive than their latest public roadmaps imply, that'd be interesting...

The low-end of the graphics market will continue to get exponentially more powerful, as will the high-end. The only question is how much will "the market's" appetite for performance grow. It's quite conceivable that 5-10 years from now, nobody has discrete GPUs except for gamers. That would be a huge problem for NVIDIA because it would reduce their volumes.
But volume doesn't really matter. Right now, the gross profit in the >=$150 segment of the market is significantly higher than in the <$150 segment of the market. I only expect this to become more true over time as the other components of a mid-range gaming PC (CPU, DRAM, chipset, etc.) become less expensive.

That's exactly the dynamic that I suspect happened in 1H07 when NVIDIA surprised everyone with incredibly strong earnings despite weak seasonality. DRAM prices crashed and CPU prices kept getting lower, so for a given segment of the market the GPU's ASPs went up. Of course, I could be wrong here! :)

4. I can easily see Intel beating out NV at the low-end, but losing in the mid-range and high-end.
It's not strictly unimaginable, but I'm very skeptical of that. As I said, it's all about performance-per-dollar (and per-watt) both at the high-end *and* the low-end. If they can't win in the high-end, then the only place where they could still win is in the commodity 'performance doesn't matter' market. And honestly, I don't think Jen-Hsun cares:
Jen-Hsun in November 2006 said:
And the market that we will serve, the market we will continue to serve is where we believe we can capture the value from the products that we sell. You know that we're not fanatical about revenues; we're fanatical about profits. We don't need to go chase profitless prosperity. We're just not going to do it. We are going to target the segments of the marketplace where our work and our brand is valued. And we will leave the other segments to other people.
 