Larrabee and Intel's acquisition of Neoptica

I think Intel hired those guys (and the Havok ones, http://www.fool.com/investing/value/2007/09/17/intel-wreaks-havok.aspx, and the Ct language team) to build Larrabee's API.

Perhaps it's going to be DX and OGL compatible, plus use that API for things not yet implemented in DX (for example, raytracing or physics)... although I would prefer just to use Ct to program my own raytracing or physics.

One thing is clear... Intel is moving its chess pieces before launching the serious attack!

Hmmm, weren't there rumors about a year or so ago that MS was possibly looking at licensing Havok's GPU physics as the basis of a possible Direct Physics in DX11 or beyond?

IF... big IF. But IF MS was indeed serious about something like that, Intel buying out Havok gets them quite a few things: licensing/leverage with MS, for one, and a leg up on whatever parts of a future DX might include Direct Physics. After all, while the specs would eventually have to be opened up to anyone interested in making DX Physics compatible hardware, Intel would have quite a head start, having had access to it and done research on it all this time.

Intel is already successful in the integrated market. However, if they want to make a splash in the consumer market they would presumably have to have good gaming performance, which at this time would mean a well-performing DX-whatever card.

Then again, as some have speculated above, it's looking more and more likely that this is a move to strengthen a future Larrabee that might be aimed at the HPC market by weakening the competition through removal of key avenues of penetration/implementation. Edit - and that's just the type of behavior I fear. While competition is good for the consumer, this could be a pro-active move to limit or remove competition in future products, thus trapping the consumer into only having one viable source... Intel.

Regards,
SB
 
I would not expect “DirectPhysics” anytime soon, if it ever happens. It is too hard to make a general API for this field. The biggest problem with physics today is performance, and to solve it we simply need more math power, something that GPUs can already offer. But using GPUs for this causes another problem: we need a low-latency way to send data to and from the GPU. No one wants to wait multiple frames for the physics to respond. Therefore I am more in favor of a general DirectX extension that allows using the math power of additional chips (like GPUs) or the SSE units in a general way.
 
I would not expect “DirectPhysics” anytime soon, if it ever happens. It is too hard to make a general API for this field. The biggest problem with physics today is performance, and to solve it we simply need more math power, something that GPUs can already offer. But using GPUs for this causes another problem: we need a low-latency way to send data to and from the GPU. No one wants to wait multiple frames for the physics to respond. Therefore I am more in favor of a general DirectX extension that allows using the math power of additional chips (like GPUs) or the SSE units in a general way.

There were job postings @ MS a while back which specifically mentioned the term "Direct Physics" (it was in quotes too, so the name wasn't decided yet).
 
Compared to the pipeline depth of ye average GPU, how is that relevant?

I'm not sure, which is why I'm wondering.

Let's assume PCI-E is integrated on the CPU.
That takes out the latency of hopping over the northbridge to get to the CPU.

I haven't found many hard numbers on latency over PCI-E.
This link, expounding on how HyperTransport is a superior interconnect, is the only one I could find.

http://www.hypertransport.org/docs/...rTransport_PCIe_in_Communications_Systems.pdf

For an 8-byte read request, PCI-E 1.0 would take (not counting the latency of the DRAM's response in this test) about 220 ns.
For 2 KiB and larger packets, it's about 830 ns for the total back and forth, not counting the latency of the device sitting between request and response.
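As a back-of-the-envelope sketch of what those figures mean per frame (only the 220 ns / 830 ns numbers come from the paper; the 60 fps budget and everything else here is my own illustrative assumption):

```cpp
#include <cstdio>

// How many fully serialized PCI-E 1.0 round trips fit in one 60 fps frame,
// using the latency figures quoted above.
int main() {
    const double small_read_ns = 220.0;       // ~8-byte read round trip
    const double large_read_ns = 830.0;       // ~2 KiB packet round trip
    const double frame_ns      = 1e9 / 60.0;  // ~16.7 ms frame budget

    std::printf("8-byte round trips per frame: %.0f\n", frame_ns / small_read_ns);
    std::printf("2 KiB round trips per frame:  %.0f\n", frame_ns / large_read_ns);
    return 0;
}
```

So tens of thousands of strictly serialized round trips fit in a frame; the real question is how many a tightly coupled algorithm would actually issue.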

Whether this is acceptable really depends on how intensely linked the GPU and CPU sides of the equation are.

The latency doesn't matter so much when traffic is mostly one-way, but the text hints at more exotic algorithms that would either hit bandwidth or latency walls.

Data dependencies might be workable if they are very rare and very small.
Any control dependencies, if they ever arise between GPU and CPU with more complex algorithms, would require running multiple contexts to overlap execution.
The GPU's hardware pipeline itself can't really hide that latency.
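To illustrate the "multiple contexts" idea, here's a minimal sketch; submit() and readback() are hypothetical stand-ins for whatever asynchronous interface the driver would expose, not a real API:

```cpp
#include <vector>

struct Batch { std::vector<float> bodies; };  // placeholder payload

// Hypothetical asynchronous interface (illustration only).
void submit(const Batch&) { /* enqueue work on the GPU, return immediately */ }
void readback(Batch&)     { /* block until this batch's results arrive */ }

// Two independent contexts in flight: batch B's submission overlaps
// batch A's round trip, so the CPU isn't idle for a full transfer per batch.
void simulate_frame(Batch& a, Batch& b) {
    submit(a);
    submit(b);
    readback(a);
    readback(b);
}
```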
 
There were job postings @ MS a while back which specifically mentioned the term "Direct Physics" (it was in quotes too, so the name wasn't decided yet).

As a DirectX MVP I know about this job posting. The description talked about using GPUs for physics.

But this doesn’t change my view on this project. Limiting the usage of the math power to physics jobs only would not work in the end.
 
The biggest problem with physics today is performance...
The biggest problem with physics today is that it's hype and everyone thinks CPU performance is the problem.

Recent games like Crysis and Unreal Tournament 3 have almost exactly the same amount of physics as Far Cry and Unreal Tournament 2004. It's simply gameplay bound, and only for extreme synthetic physics-heavy tech demos (read: CellFactor) might CPU performance become a problem. But it's not like ten times more performance is the solution for real games, at least not for long...
...and to solve it we simply need more math power, something that GPUs can already offer.
A 3 GHz Core 2 Quad has more GFLOPS than a GeForce 8600's shaders, without requiring a trip over the PCIe bus.

CPUs are just highly underestimated, and horribly abused. The same people that use an interpreted scripting language for their games are the ones that call the CPU too slow for physics. I've done a bit of profiling on modern physics engines, and while I expected the bottlenecks to be lean SSE code, it was often old x87 code with incredibly slow square roots and divisions.
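For what it's worth, this is the kind of difference being described (a sketch, not taken from any particular engine): the scalar version may compile to x87 fsqrt/fdiv on older toolchains and settings, while the SSE version uses the fast reciprocal square root approximation plus one Newton-Raphson refinement step.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cmath>

// Scalar path: older compilers/settings may emit x87 fsqrt and fdiv here,
// each costing tens of cycles.
float inv_length_scalar(float x, float y, float z) {
    return 1.0f / std::sqrt(x * x + y * y + z * z);
}

// SSE path: rsqrtps gives a ~12-bit approximation in a few cycles;
// one Newton-Raphson step brings it close to full single precision.
__m128 inv_length_sse(__m128 x, __m128 y, __m128 z) {
    __m128 len2 = _mm_add_ps(_mm_add_ps(_mm_mul_ps(x, x), _mm_mul_ps(y, y)),
                             _mm_mul_ps(z, z));
    __m128 r     = _mm_rsqrt_ps(len2);   // fast approximation
    __m128 half  = _mm_set1_ps(0.5f);
    __m128 three = _mm_set1_ps(3.0f);
    // r = 0.5 * r * (3 - len2 * r * r), one refinement step
    return _mm_mul_ps(_mm_mul_ps(half, r),
                      _mm_sub_ps(three, _mm_mul_ps(len2, _mm_mul_ps(r, r))));
}
```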

In my opinion the "physics problem" just solves itself if CPUs keep scaling the way they do and physics engines are properly optimized (Intel's acquisition of Havok can only be a good thing). I also think Larrabee is just an experiment for future generations of CPUs. More cores, but simpler ones. Larrabee has to prove whether or not lower single-thread performance is an option. All Intel wants to do is determine what has to end up in the future generations of CPUs they can sell to the masses. And if they have an intermediate product they can sell to the HPC market, they have nothing to lose. In this light AMD's strategy of buying ATI might even be more brilliant.

Intel has roadmaps with server chips up to 32 cores in 2010, a strategy that isn't in question, but for desktop and mobile chips the decision is much harder. Nehalem's 4-core architecture with Hyper-Threading is in my eyes yet another experiment to see how many threads software developers can put to work. The decisions Intel has to make have a gigantic impact on what computers will look like over the next decade...
 
The biggest problem with physics today is that it's hype and everyone thinks CPU performance is the problem.

Recent games like Crysis and Unreal Tournament 3 have almost exactly the same amount of physics as Far Cry and Unreal Tournament 2004. It's simply gameplay bound, and only for extreme synthetic physics-heavy tech demos (read: CellFactor) might CPU performance become a problem. But it's not like ten times more performance is the solution for real games, at least not for long...

Far Cry -> Crysis shows a readily apparent increase in physics effects. How can you say they "have the same amount"? The sheer number of object interactions and the destructibility of the environments alone make that obvious to anyone that's ever played both games, or even just seen a video of them for that matter.

A 3 GHz Core 2 Quad has more GFLOPS than a GeForce 8600's shaders, without requiring a trip over the PCIe bus.

Remind us again what the price difference is between those two components, and which the average machine is more likely to have.
 
CPUs are just highly underestimated, and horribly abused. The same people that use an interpreted scripting language for their games are the ones that call the CPU too slow for physics. I've done a bit of profiling on modern physics engines, and while I expected the bottlenecks to be lean SSE code, it was often old x87 code with incredibly slow square roots and divisions.

What a wonderful post. I confess I am not sure about the FLOP counts in your post, but with this paragraph in particular I agree so much.
 
The reek of geek musk and the ringing of nerd antlers clashing in here is getting a bit much. Dial it back a notch, fellas.
 
What a wonderful post. I confess I am not sure about the FLOP counts in your post, but with this paragraph in particular I agree so much.
Conroe has 8 SP flops/cycle per core, so a 3 GHz quad is 96 GFlops total. The 8600 GT, on the other hand, has about 76 GFlops without the MUL and 113 GFlops with it. For physics, on the G84 specifically, it would be fair not to count it at all, which is what I presume Nick did.
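For reference, the rough peak arithmetic behind those figures, assuming the usual 32 SPs at ~1.19 GHz shader clock for the 8600 GT:

```cpp
// Peak single-precision throughput, rough numbers.
constexpr double core2quad_3ghz = 4 /*cores*/ * 8 /*SP flops/cycle*/ * 3.0  /*GHz*/; // = 96 GFlops
constexpr double g84_mad_only   = 32 /*SPs*/  * 2 /*MAD*/           * 1.19 /*GHz*/;  // ~76 GFlops
constexpr double g84_with_mul   = 32 /*SPs*/  * 3 /*MAD+MUL*/       * 1.19 /*GHz*/;  // ~114 GFlops
```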

Of course, it is noteworthy that two Conroe dies represent 2x143mm² (=286mm²) on 65nm, while G84 is about 170mm² on 80nm. So you might say the G84's perf/mm² (on the same process) would be about twice that of the Core 2 Quad, and that's before considering that Intel's process uses more advanced (and costly!) materials, increasing the cost per mm².

Of course, GFlops figures aren't everything, and for physics workloads you would probably expect the CPU to be at an efficiency advantage. However, I think if you look at a roadmap for Sandy Bridge (aka Gesher, the new 32nm microarchitecture), you'll see it likely won't go up to much more than 500 GFlops or so in the desktop market (16 flops/cycle per core, 8 cores, 4 GHz+). Suddenly, this will look a lot less impressive even compared to a mid-range GPU...
 
Of course, GFlops figures aren't everything, and for physics workloads you would probably expect the CPU to be at an efficiency advantage. However, I think if you look at a roadmap for Sandy Bridge (aka Gesher, the new 32nm microarchitecture), you'll see it likely won't go up to much more than 500 GFlops or so in the desktop market (16 flops/cycle per core, 8 cores, 4 GHz+). Suddenly, this will look a lot less impressive even compared to a mid-range GPU...

Last I heard Sandy Bridge was scheduled to hit around 200 DP GFLOPs. Although, I guess with Conroe/Penryn (quad) hitting near 100 GFLOPs and Nehalem doubling the core count, 200 GFLOPs for Sandy Bridge does seem a bit low.
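Under the assumptions quoted above (8 cores, 16 SP flops/cycle per core, 4 GHz+), and assuming DP runs at half the SP rate, the two figures being compared work out roughly as:

```cpp
constexpr double sandy_bridge_sp = 8 /*cores*/ * 16 /*SP flops/cycle*/ * 4.0 /*GHz*/; // = 512 SP GFlops
constexpr double sandy_bridge_dp = 8           *  8 /*DP flops/cycle*/ * 4.0;         // = 256 DP GFlops
constexpr double penryn_quad_sp  = 4           *  8                    * 3.0;         // =  96 SP GFlops today
```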
 
Far Cry -> Crysis shows a readily apparent increase in physics effects. How can you say they "have the same amount"? The sheer number of object interactions and the destructibility of the environments alone make that obvious to anyone that's ever played both games, or even just seen a video of them for that matter.
The same amount in relative terms, not absolute terms. It just scales slowly, the way it has always scaled. In my eyes there is no real sudden need for more physics processing. Physics is so hyped that game developers actively look for additional physics to cram in. You can stretch it a certain amount, but a snapping leaf is not a falling tree and a hand grenade is not an A-bomb.
Remind us again what the price difference is between those two components, and which the average machine is more likely to have.
I picked these parts to show that CPUs are not that extremely far behind when it comes to programmable floating-point performance. I'm not even sure AGEIA's chip has this amount of GFLOPS (530 million sphere-sphere collisions per second is not that impressive). CPUs are very slow compared to GPUs when it comes to graphics, but this is mainly because of the fully pipelined texture samplers. I don't know of any specialised component that would speed up physics by a large amount. It mainly needs multiplications and additions, and ever since CPUs have gone multi-core they're not at any significant disadvantage.
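To put the 530 million figure in perspective, a single sphere-sphere test is only about a dozen floating-point operations (a sketch; the struct layout is assumed for illustration):

```cpp
struct Sphere { float x, y, z, r; };

bool overlaps(const Sphere& a, const Sphere& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    float rsum = a.r + b.r;
    // ~11 ops (3 subs, 1 add, 4 muls, 2 adds, 1 compare), so 530 million
    // tests per second is on the order of 5-6 GFLOPS of useful math.
    return dx * dx + dy * dy + dz * dz <= rsum * rsum;
}
```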

Anyway, CPU prices rise steeply, roughly exponentially, with performance. So you can get an ever so slightly slower Q6600 for 280 US$ or less. Also, you need a CPU anyway, so instead of buying, say, an E6600 and two 8600s you could buy a Q6600 and one 8600 for roughly the same money. And this way you also accelerate games that are not that physics-heavy but, for instance, A.I.-heavy (using an interpreted scripting language)...

But feel free to disagree. This is just my personal opinion, at the moment. I believe that dedicated physics has no long-term future, and that Larrabee will determine what future CPUs will look like.
 
Of course, it is noteworthy that two Conroe dies represent 2x143mm² (=286mm²) on 65nm, while G84 is about 170mm² on 80nm. So you might say the G84's perf/mm² (on the same process) would be about twice that of the Core 2 Quad, and that's before considering that Intel's process uses more advanced (and costly!) materials, increasing the cost per mm².
Yes, it's an unfair comparison technology-wise. But this is reality. Intel will always have an advantage there. From a higher perspective that's simply because there is more need to have the fastest possible general-purpose processors.

Perf/trans also appears to be going up for CPUs, while for GPUs it's stagnating or even declining. Core 2 doubled the SIMD execution unit width at a relatively small transistor cost. And if Intel chooses the path of 'more cores but simpler ones' then theoretical perf/trans is going to go up further. Larrabee's in-order cores achieve exactly that. Also note that Hyper-Threading can lower the need for huge caches and thus increase perf/trans as well. So whereas CPUs have plenty of options to catch up with Moore's law, GPUs are bumping into non-architectural limitations like heat dissipation.
 
Also note that Hyper-Threading can lower the need for huge caches and thus increase perf/trans as well.
What do you mean by that? An additional thread should give you more opportunities to thrash the cache.
 
What do you mean by that? An additional thread should give you more opportunities to thrash the cache.
In the case of a cache miss the CPU can simply switch over to another thread without too big a performance hit. Of course cache thrashing will increase the needed memory bandwidth, but I guess 3-channel DDR3 with an IMC helps at least a bit.
 
Yes, it's an unfair comparison technology-wise. But this is reality. Intel will always have an advantage there. From a higher perspective that's simply because there is more need to have the fastest possible general-purpose processors.
And you expect this to remain true... for how long? I'm on an 8800 GTX here, and I bought an E4300 with the hope of overclocking it massively. For a variety of reasons that didn't happen, so it's often at stock (the northbridge voltage needed for OC is ridiculous) or less than 40% above it. And if anything, I'm shocked by the fact that I practically never feel the need for anything more, even in games; when the CPU really is too slow, it's likely my 8800 GTX is too.

Perf/trans also appears to be going up for CPUs, while for GPUs it's stagnating or even declining.
I think you're in for a very tough reality check in the next 2 years... As I said, Intel's Sandy Bridge in late 2010 will likely come as an 8-core chip at most in the desktop market. That represents 500 SP GFlops or 250 DP GFlops.

We'll see GPUs reaching 2 TFlops on 45nm in a single-chip configuration, and likely for less than Sandy Bridge will come out at. I suspect this will be in late 2008, but I don't know NV and ATI's roadmaps well enough to tell you that. Anyway, given TSMC's plans, I'd also expect the first 32nm GPUs at around the same time as Sandy Bridge or very slightly later, and you're likely talking 5+ TFlops there. That's an order of magnitude for roughly the same segment of the market...

Larrabee's in-order cores achieve exactly that. Also note that Hyper-Threading can lower the need for huge caches and thus increase perf/trans as well. So whereas CPUs have plenty of options to catch up with Moore's law, GPUs are bumping into non-architectural limitations like heat dissipation.
Oh, because CPUs are *not* bumping into heat dissipation limits, perhaps? I'd argue the PC architecture makes GPUs a more viable target for 200W+ TDPs anyway, since they sit on a discrete PCB with plenty of room for cooling. However, it obviously remains a limitation, both marketing-wise and technologically.

Realistically though, the only reason why GPUs are so hot is that perf/mm² remains very important. If you were willing to sacrifice a bit more perf/mm² in exchange for perf/watt, there should be no problem whatsoever creating a GPU with much lower wattage for a given level of performance. If heat does become such a limitation, it will only be a temporary obstacle, resulting in a one-time drop in perf/mm².

As for Larrabee: it obviously has a lot of potential for math-heavy computations, and especially physics. I said as much several times in posts and news pieces - however, I'd still argue it's not a "CPU", because it's not the center of the PC architecture. It is, at best, a general-purpose coprocessor with a bit of fixed-function hardware around it when the target market requires that.
 
Not really relevant; how does this lower the need for larger caches?

Assuming SoEMT (switch-on-event multithreading), it's just as hoho said. In the case of a cache miss, a core can simply switch to another thread until the data for the previous thread is retrieved.

A side-effect of this is that smaller caches could theoretically be used. Alternatively, it can make a CPU core with a medium-sized cache deliver the performance of a core with a larger cache. This all assumes a multi-thread-friendly environment, of course.
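As a toy illustration of the effect (all numbers made up purely for illustration, not a claim about any real core): each thread misses the cache every 50 instructions, a miss takes 200 cycles, and the core runs whichever thread has its data ready instead of stalling.

```cpp
#include <cstdio>

long run(int threads) {
    const long N = 100000;          // instructions per thread
    const long MISS_EVERY = 50;     // one cache miss per 50 instructions
    const long MISS_PENALTY = 200;  // cycles until the missed data returns

    long done[2] = {0, 0};          // instructions retired per thread
    long ready_at[2] = {0, 0};      // cycle at which each thread can run again
    long cycle = 0;

    while (done[0] < N || (threads > 1 && done[1] < N)) {
        for (int t = 0; t < threads; ++t) {
            if (done[t] < N && ready_at[t] <= cycle) {
                ++done[t];                      // retire one instruction
                if (done[t] % MISS_EVERY == 0)  // this one missed the cache
                    ready_at[t] = cycle + MISS_PENALTY;
                break;                          // one instruction per cycle
            }
        }
        ++cycle;                                // if no thread was ready, this is a stall cycle
    }
    return cycle;
}

int main() {
    std::printf("1 thread : %ld cycles for 100k instructions\n", run(1));
    std::printf("2 threads: %ld cycles for 200k instructions\n", run(2));
    return 0;
}
```

With one thread, almost every miss is fully exposed; with two, most of the latency overlaps with the other thread's work, which is what lets a smaller (more miss-prone) cache be tolerated.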
 