Old 29-May-2007, 03:33   #1
B3D News
Beyond3D News
 
Join Date: May 2007
Posts: 440
Intel on data-parallel languages and raytracing

Intel is currently researching data-parallel languages, reports EETimes. These languages would most likely target massively parallel architectures such as Larrabee and Terascale, putting Intel in direct competition with NVIDIA and AMD in the GPGPU market.
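
For anyone wondering what "data-parallel" actually buys you: the idea is to express a computation as the same operation applied independently to every element of a collection, so the runtime can spread it across however many cores or vector lanes happen to exist. The EETimes piece doesn't show Intel's syntax, so the sketch below is only an illustration in plain C++ with explicit threads; parallel_map and the strided partitioning are invented for the example.

Code:
// Illustrative sketch only: what a data-parallel "map" boils down to,
// written with explicit C++ threads. A data-parallel language would let
// you state just the per-element operation and hide the rest.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// Apply op(i) for every index in [0, n), split across hardware threads.
template <typename Op>
void parallel_map(std::size_t n, Op op) {
    unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([=] {
            for (std::size_t i = w; i < n; i += workers) op(i);  // strided partition
        });
    }
    for (auto& t : pool) t.join();
}

int main() {
    std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);
    const float a = 3.0f;
    // Every element is independent, which is exactly the property a
    // data-parallel language (or a GPU) exploits.
    parallel_map(x.size(), [&](std::size_t i) { y[i] = a * x[i] + y[i]; });
    std::printf("y[0] = %.1f\n", y[0]);  // 3*1 + 2 = 5.0
}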

Read the full news item
B3D News is offline   Reply With Quote
Old 29-May-2007, 10:01   #2
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,929
Default

I don't want to sound like I'm bashing Intel's efforts (200 researchers? zomg, ftw, etc.) but it is interesting to note that these comments were made at the same time and by the same person who complained about the lack of effort by software makers to make their programs more parallel.

These comments were primarily aimed at large companies (such as Microsoft), but they hardly apply exclusively to those. The argument presented is fundamentally flawed imo: "if one company doesn't do it, a competitor will"... That has been Intel's motto for a long time, and it was certainly true 10+ years ago, when Joe Consumer's computing experience was clearly limited by CPU performance.

Nowadays, things are different. Client workloads are not very CPU-limited at all. Look at Windows Vista, the latest version of Office and the handful of other apps that pretty much everyone uses. The only really mass-market apps I can think of that might benefit from higher performance are antiviruses, and fixed-function hardware that humiliates any CPU at that task has made its debut in recent months/years. You'd expect that to eventually be integrated into chipsets and become a commodity.

I'm not arguing that a number of apps won't benefit from multithreading. They obviously will. Games will benefit massively - that's just a matter of time - and a large number of non-mass-market applications will too. That doesn't justify the purchase of an octo-core CPU for Joe Consumer though, and the problem is that I fail to see not only what justifies it today, but what will justify it in 5+ years when it has become a commodity. The only interesting emerging workloads that might become more important (such as voice recognition) seem to benefit more from throughput cores (or even GPU cores) than from CPU cores.

Of course, Intel must partially realize that, thus Larrabee. The big question there is its perf/cost$ (including perf/mm²), as unlike traditional CPUs it won't have any inherent advantage in terms of backwards compatibility etc. - but assuming that Intel plays its process advantage properly, and that the architecture is efficient enough... We'll see. Outside of graphics and HPC, it will also obviously depend on how fast emerging applications grow in importance, and how well they manage to benefit from data-parallel architectures. As for HPC and gaming, both perf/cost$ and perf/watt are obviously the key factors.
__________________
"[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions."
Arun is offline   Reply With Quote
Old 29-May-2007, 16:08   #3
Blazkowicz
Senior Member
 
Join Date: Dec 2004
Posts: 4,923
Default

Quote:
Originally Posted by Arun Demeure View Post
I'm not arguing that a number of apps won't benefit from multithreading. They obviously will. Games will benefit massively - that's just a matter of time - and a large number of non-mass-market applications will too. That doesn't justify the purchase of an octo-core CPU for Joe Consumer though, and the problem is that I fail to see not only what justifies it today, but what will justify it in 5+ years when it has become a commodity. The only interesting emerging workloads that might become more important (such as voice recognition) seem to benefit more from throughput cores (or even GPU cores) than from CPU cores.
Well, business as usual we might say - Joe didn't need a Pentium 4 either.
At least Joe's CPU should be reasonably power efficient, with 6 or 7 idle cores in a good sleep mode and IGP/"fusion units" taking care of video playback and maybe some other throughput things, as you say.
Blazkowicz is online now   Reply With Quote
Old 29-May-2007, 18:01   #4
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156
Default

Well, I've used this example a couple of times now, including in my contribution to our commentary on Carmean's presentation... but I think Intel is still rather bitter --and judging by this, possibly a little alarmed-- at how their hyperthreading technology was out in the world for several years and really did not gain the kind of traction that might have made a significant difference for them, at a time when AMD was pretty much kicking their butt with enthusiasts/gamers.

Then along comes X2 and suddenly there are game patches that note that the game will see significant improvements on dual-core processors *and* Intel's old HT-enabled P4s. What does that tell you? It tells me that Intel did a lousy job of evangelizing HT to what should have been their showcase ISV audience for *several years*.
__________________
"We'll thrash them --absolutely thrash them."--Richard Huddy on Larrabee
"Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future." --Pat Gelsinger, Intel
"Christ, this is Beyond3D; just get rid of any f**ker talking about patterned chihuahuas! Can the dog write GLSL? No. Then it can f**k off." --Da Boss
Geo is offline   Reply With Quote
Old 29-May-2007, 18:58   #5
Demirug
Senior Member
 
Join Date: Dec 2002
Posts: 1,326
Default

Well, Intel tries very hard, but at this point most developers still believe that faster cores are just around the corner. After it became clear that the free lunch was over, they were in a state of shock. Multithreaded programming was not on the skill list of most developers, and if you were not one of the lucky teams that got direct help from Intel or AMD, you had to learn it the hard way on your own.
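
A throwaway sketch (just an illustration, not anything from those teams) of the kind of lesson that had to be learned the hard way: two threads bumping a shared counter. Without the lock the result is silently wrong, and nothing in the single-threaded skill set warns you about it.

Code:
// Minimal example of the classic first multithreading bug: a shared counter.
// Comment out the lock_guard and the final count becomes unpredictable.
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    long counter = 0;
    std::mutex m;

    auto work = [&] {
        for (int i = 0; i < 1000000; ++i) {
            std::lock_guard<std::mutex> lock(m);  // remove this -> data race
            ++counter;
        }
    };

    std::thread a(work), b(work);
    a.join();
    b.join();
    std::printf("counter = %ld (expected 2000000)\n", counter);
}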
__________________
GPU blog
Demirug is offline   Reply With Quote
Old 29-May-2007, 19:20   #6
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,448
Default

Quote:
Originally Posted by Geo View Post
Then along comes X2 and suddenly there are game patches that note that the game will see significant improvements on dual-core processors *and* Intel's old HT-enabled P4s. What does that tell you? It tells me that Intel did a lousy job of evangelizing HT to what should have been their showcase ISV audience for *several years*.
Or HT wasn't really all that great, and using the P4 to spearhead the usage of SMT was not the best way to get people to multithread their code.

The performance gains with HT were noticeably mixed; it was a much cruder implementation of SMT than those found in other multithreaded processors of the time.
SMT can hurt overall performance if done badly, and the P4 had a number of other architectural weaknesses that were worsened by HT, particularly the way HT crudely divided shared buffers in half between the threads, and the way multiple threads interacted with the P4's complex scheduling hardware.

HT wasn't even fully fledged until several revisions into Netburst, and the best example of Netburst HT was Prescott. That core's other issues dragged HT down as well.

Dual-core is much more reliable when it comes to maintaining single-threaded performance, and unlike the subset of HT-enabled P4s, it was a market-wide shift.

Dual cores most likely reached market just in time to benefit from the initial forays with HT, and in general they could be counted on to give consistent performance gains.
__________________
Dreaming of a .065 micron etch-a-sketch.
3dilettante is offline   Reply With Quote
Old 29-May-2007, 21:55   #7
Tim Murray
the Windom Earle of GPUs
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,277
Default

Quote:
Originally Posted by 3dilettante View Post
Or HT wasn't really all that great, and using the P4 to spearhead the usage of SMT was not the best way to get people to multithread their code.
The problem isn't with HT, though, is it? Isn't it more of a problem with SMT slaughtering cache coherence in general? I remember benchmarks that showed that HT should be disabled on any machine running... Apache, I think? because performance tanked when the number of cache hits decreased enormously.

I am waiting to see just what the graphics-oriented things they're working on are, though. A hybrid raytracer/rasterizer is all well and good, but it will never, ever catch on unless you have a fantastic API that is wonderful for developers to use. And then you need a killer app...
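
As an aside on why raytracing keeps being mentioned alongside data-parallel hardware: every pixel is an independent ray, so the work parallelizes trivially. The toy below is only a sketch and has nothing to do with Intel's renderer or any real API - it casts one ray per character against a single sphere, and each iteration of the pixel loop could just as well run on its own core or vector lane.

Code:
// Toy ray caster: one ray per "pixel" against a unit sphere, drawn as ASCII.
// Purely illustrative -- the point is that the per-pixel work is independent.
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Does a ray from the origin along dir hit a unit sphere centred at c?
// Solve |t*dir - c|^2 = 1 for t and check for a root with t > 0.
static bool hitsSphere(Vec3 dir, Vec3 c) {
    float a = dot(dir, dir);
    float b = dot(dir, c);
    float disc = b * b - a * (dot(c, c) - 1.0f);
    return disc >= 0.0f && (b + std::sqrt(disc)) > 0.0f;
}

int main() {
    const int W = 48, H = 24;
    const Vec3 centre = {0.0f, 0.0f, 3.0f};
    for (int y = 0; y < H; ++y) {            // each pixel is independent work
        for (int x = 0; x < W; ++x) {
            Vec3 dir = {(x - W / 2) / (0.5f * W), (y - H / 2) / (0.5f * H), 1.0f};
            std::putchar(hitsSphere(dir, centre) ? '#' : '.');
        }
        std::putchar('\n');
    }
}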
Tim Murray is offline   Reply With Quote
Old 30-May-2007, 11:58   #8
Gubbi
Senior Member
 
Join Date: Feb 2002
Posts: 2,862
Default

Quote:
Originally Posted by Tim Murray View Post
The problem isn't with HT, though, is it? Isn't it more of a problem with SMT slaughtering cache coherence in general? I remember benchmarks that showed that HT should be disabled on any machine running... Apache, I think? because performance tanked when the number of cache hits decreased enormously.
That was mostly for Northwood P4s, which suffered heavily from trace cache and D$ thrashing. Prescott added a lot of measures to better support SMT: a 4x bigger D$ with higher associativity, two trace caches (one for each active context), and many more registers. Of course, all these measures were negated in large part by Prescott's basic performance parameters: twice the D$ load-to-use latency (from 2 cycles to 4) and the much longer pipeline with its associated mispredict latency (although the branch predictor in Prescott was better).

SMT on OOO processors looks like it's dead. In essence, you have to enlarge your I and D caches, add a scheduler for each thread (because one big one holding all instructions in flight is simply too slow), and provide architected registers for each thread. All these structures become slower, so you add pipe stages (or run at a lower clock), which hurts single-thread performance - and in the end all that effort is for better utilization of the execution units, which in a modern CPU take up less than 25% of the die (even including massive SIMD FP units). Better to just replicate the entire core and get a guaranteed 2x speedup on independent threads.

Cheers
__________________
I'm pink, therefore I'm spam
Gubbi is online now   Reply With Quote
Old 31-May-2007, 13:34   #9
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,448
Default

Quote:
Originally Posted by Tim Murray View Post
The problem isn't with HT, though, is it? Isn't it more of a problem with SMT slaughtering cache coherence in general? I remember benchmarks that showed that HT should be disabled on any machine running... Apache, I think? because performance tanked when the number of cache hits decreased enormously.
SMT was used successfully in IBM's POWER5. It wasn't for every workload, but it didn't have as many adverse effects as HT did for Netburst.

IBM's method was more flexible, with software-controlled priority levels as well as hardware mechanisms that balanced overall instruction flow to keep a stalled thread from getting in the way of other threads.

POWER5 was also a wider design than Netburst, it had larger caches, and it wasn't as aggressively speculative as P4.
IBM had more spare units that could be used, and it didn't fill its queues as readily with instructions that would have to be replayed.

There were a number of things that were characterized as glass jaws for the Netburst architecture. One big one was its highly speculative instruction scheduling and replay mechanism.

The design's long pipeline and emphasis on speculation made it so that the chip would issue an instruction several cycles before it was known if a cache access would hit.
It is usually the case that cache misses happen more often with SMT, and P4's smaller caches tended to feel the impact more than most.
Since many instructions would be issued incorrectly, the P4's replay mechanism would loop the instructions back into the pipeline on every cache miss and access to memory.
Not only that, but it was possible for the replay loop to get clogged by multiple replays during long dependency chains.

For single-threaded performance, replay could be a headache, since the P4 would sometimes for no visible reason take hundreds or thousands of cycles to complete a simple stretch of code.
For SMT, the massive amount of speculation consumed limited resources on a rather narrow core.

SMT should have filled in stall cycles in the long P4 pipeline. The problem was that it had competition from the replay mechanism, which also tried to fill stall cycles. In pathological cases, the replay mechanism would fill stall cycles with instructions that inevitably stalled again and again.

Prescott theoretically improved its threading resources and replay mechanism.
It was not enough to correct for the thermal ceiling that killed its clock scaling.

edit:
There is an interesting article on the replay mechanism on xbit:

http://www.xbitlabs.com/articles/cpu...ay/replay.html

There's a section on its influence on hyperthreading that also includes a comparison between Prescott and Northwood that shows how much Prescott improved HT.
__________________
Dreaming of a .065 micron etch-a-sketch.

Last edited by 3dilettante; 31-May-2007 at 22:23.
3dilettante is offline   Reply With Quote
Old 01-Jun-2007, 14:43   #10
INKster
Senior Member
 
Join Date: Apr 2006
Location: Io, lava pit number 12
Posts: 2,108
Default

Huh ?!?

"Larrabee" as a joint Intel-Nvidia effort.
I wonder what this would mean for the future "Fusion" products (beyond the initial "simple" IGP/CPU on the same die)...
INKster is offline   Reply With Quote
Old 01-Jun-2007, 17:35   #11
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156
Default

I'm not taking that one to the bank yet. I can see why Intel would like Nvidia's participation. . .it's less clear to me why Nvidia would want to play ball unless there's a pretty sizeable revenue/royalty stream associated with it for them. Would Intel make that kind of deal? Doesn't seem in character for them.
__________________
"We'll thrash them --absolutely thrash them."--Richard Huddy on Larrabee
"Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future." --Pat Gelsinger, Intel
"Christ, this is Beyond3D; just get rid of any f**ker talking about patterned chihuahuas! Can the dog write GLSL? No. Then it can f**k off." --Da Boss
Geo is offline   Reply With Quote
Old 01-Jun-2007, 17:43   #12
INKster
Senior Member
 
Join Date: Apr 2006
Location: Io, lava pit number 12
Posts: 2,108
Default

Quote:
Originally Posted by Geo View Post
I'm not taking that one to the bank yet. I can see why Intel would like Nvidia's participation. . .it's less clear to me why Nvidia would want to play ball unless there's a pretty sizeable revenue/royalty stream associated with it. Would Intel make that kind of deal? Doesn't seem in character for them.
Extraordinary circumstances call for extraordinary partnerships, and the AMD/ATI merger was certainly one of them.
Hannibal, in his late April article about "Larrabee" at arstechnica.com, shared some insider info about it which I found a bit... suspicious, given the constant references to the G80 architecture.
This could be why Intel "named certain names" while excluding the R600 (aside from the fact that they are competitors, Intel could have used the AMD "Fusion" project as a bashing bullet for "why our solution is better than theirs").

Let's wait and see.
INKster is offline   Reply With Quote
Old 01-Jun-2007, 17:46   #13
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156
Send a message via MSN to Geo
Default

Oh, I'm not ruling it out. I'm just saying when I read it I didn't exactly go "Ah, of course. . ."
__________________
"We'll thrash them --absolutely thrash them."--Richard Huddy on Larrabee
"Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future." --Pat Gelsinger, Intel
"Christ, this is Beyond3D; just get rid of any f**ker talking about patterned chihuahuas! Can the dog write GLSL? No. Then it can f**k off." --Da Boss
Geo is offline   Reply With Quote
Old 01-Jun-2007, 18:08   #14
Panajev2001a
Senior Member
 
Join Date: Mar 2002
Posts: 3,187
Default

Quote:
Originally Posted by Arun Demeure View Post
I'm not arguing that a number of apps won't benefit from multithreading. They obviously will. Games will benefit massively - that's just a matter of time - and a large number of non-mass-market applications will too. That doesn't justify the purchase of an octo-core CPU for Joe Consumer though, and the problem is that I fail to see not only what justifies it today, but what will justify it in 5+ years when it has become a commodity. The only interesting emerging workloads that might become more important (such as voice recognition) seem to benefit more from throughput cores (or even GPU cores) than from CPU cores.
Arun, many do agree that not a lot of everyday applications are going to benefit from having faster and faster cores, and more of those cores running in parallel - but that may only be true for each application taken on its own.

A lot of people, intentionally or not (tons of programs installed as start-up items that do work while the PC is idle, or that steal a few cycles here and there), are running more and more programs/processes in parallel, and for users such as myself multi-core systems do bring a benefit: the whole environment feels more responsive.
__________________
"Any idea worth a damn is already patented... twice" -Mfa
Panajev2001a is offline   Reply With Quote
Old 01-Jun-2007, 18:22   #15
nutball
Senior Member
 
Join Date: Jan 2003
Location: en.gb.uk
Posts: 1,623
Default

Quote:
Originally Posted by Panajev2001a View Post
A lot of people, intentionally or not (tons of programs installed as start-up items that do work while the PC is idle, or that steal a few cycles here and there), are running more and more programs/processes in parallel, and for users such as myself multi-core systems do bring a benefit: the whole environment feels more responsive.
This is an oft-stated argument. It scales to... maybe two cores on the desktop for a typical user. Four cores tops, and even then not for a typical user. It's not really a good justification for 8 or 16 CPU cores becoming the default option when buying a PC (unless our favourite operating system vendor can come up with new and even prettier ways to waste our computing resources for us).
__________________
2+2 is not a matter of opinion.
nutball is online now   Reply With Quote
Old 01-Jun-2007, 18:23   #16
AlNets
Posts may self-destruct
 
Join Date: Feb 2004
Location: In a Mirror Darkly
Posts: 15,178
Default

Quote:
Originally Posted by Panajev2001a View Post
A lot of people, intentionally or not (tons of programs installed as start-up items that do work while the PC is idle, or that steal a few cycles here and there), are running more and more programs/processes in parallel, and for users such as myself multi-core systems do bring a benefit: the whole environment feels more responsive.

Even so, I find the hard drive to be quite limiting when carrying out multiple tasks, despite the dual core. Ideally, I suppose, you'd have multiple programs on multiple hard drives to run those tasks from. But there's too much thrashing/conflicting use of the single hard drive that many computers have (e.g. laptops, or template-built computers from Dell or HP).
AlNets is offline   Reply With Quote

Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1. The time now is 09:17.


Powered by vBulletin® Version 3.8.6
Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.