Old 29-Nov-2007, 21:27   #26
neliz
MSI Man
 
Join Date: Mar 2005
Location: In the know
Posts: 4,885
Default

http://www.mdronline.com/watch/watch...607&on=1#item2

(courtesy of L'Inq)
__________________
I miss you CJ, 1976 - 2010
Old 30-Nov-2007, 07:07   #27
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

Quote:
Originally Posted by Demirug View Post
The biggest problem with physics today is the performance...
The biggest problem with physics today is that it's a hype and everyone thinks CPU performance is the problem.

Recent games like Crysis and Unreal Tournament 3 have almost exactly the same amount of physics as Far Cry and Unreal Tournament 2004. It's simply gameplay bound, and only in extreme, synthetic, physics-heavy tech demos (read: CellFactor) might CPU performance become a problem. But it's not like ten times more performance is the solution for real games, at least not for long...
Quote:
...and to solve it we simply need more math power. Something that GPUs already can offer.
A 3 GHz Core 2 Quad has more GFLOPS than a GeForce 8600's shaders, without requiring a trip over the PCIe bus.

CPUs are just highly underestimated, and horribly abused. The same people that use an interpreted scripting language for their games are the ones that call the CPU too slow for physics. I've done a bit of profiling on modern physics engines, and while I expected the bottlenecks to be lean SSE code, it was often old x87 code with incredibly slow square roots and divisions.

In my opinion the "physics problem" just solves itself if CPUs keep scaling the way they do and physics engines are properly optimized (Intel's acquisition of Havok can only be a good thing). I also think Larrabee is just an experiment for future generations of CPUs. More cores, but simpler ones. Larrabee has to prove whether or not lower single-thread performance is an option. All Intel wants to do is determine what has to end up in the future-generation CPUs they can sell to the masses. And if they have an intermediate product they can sell to the HPC market, they have nothing to lose. In this light AMD's strategy of buying ATI might even be more brilliant.

Intel has roadmaps with server chips up to 32 cores in 2010, a strategy that needs no question, but for desktop and mobile chips the decision is much harder. Nehalem's 4-core architecture with Hyper-Threading is in my eyes yet another experiment to see how many threads software developers can put to work. The decisions Intel has to make have a gigantic impact on what computers will look like the next decade...
Old 30-Nov-2007, 15:29   #28
ShaidarHaran
hardware monkey
 
Join Date: Mar 2007
Posts: 3,900
Default

Quote:
Originally Posted by Nick View Post
The biggest problem with physics today is that it's a hype and everyone thinks CPU performance is the problem.

Recent games like Crysis and Unreal Tournament 3 have almost exactly the same amount of physics as Far Cry and Unreal Tournament 2004. It's simply gameplay bound, and only in extreme, synthetic, physics-heavy tech demos (read: CellFactor) might CPU performance become a problem. But it's not like ten times more performance is the solution for real games, at least not for long...
Farcry->Crysis has a readily apparent increase in physics effects. How can you say they "have the same amount"? The sheer number of object interactions and destructibility of the environments alone makes that obvious to anyone that's ever played both games, or even just seen a video of them for that matter.

Quote:
Originally Posted by Nick View Post
A 3 GHz Core 2 Quad has more GFLOPS than a GeForce 8600's shaders, without requiring a trip over the PCIe bus.
Remind us again what the price difference is between those two components, and which the average machine is more likely to have.
Old 30-Nov-2007, 15:36   #29
compres
Member
 
Join Date: Jun 2003
Location: Germany
Posts: 553
Default

Quote:
Originally Posted by Nick View Post
CPUs are just highly underestimated, and horribly abused. The same people that use an interpreted scripting language for their games are the ones that call the CPU too slow for physics. I've done a bit of profiling on modern physics engines, and while I expected the bottlenecks to be lean SSE code, it was often old x87 code with incredibly slow square roots and divisions.
What a wonderful post. I confess I am not sure about the FLOP counts in your post, but with this paragraph in particular I agree so much.
Old 30-Nov-2007, 19:48   #30
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156
Default

The reek of geek musk and the ringing of nerd antlers clashing in here is getting a bit much. Dial it back a notch, fellas.
__________________
"We'll thrash them --absolutely thrash them."--Richard Huddy on Larrabee
"Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future." --Pat Gelsinger, Intel
". . .its taking us longer than we would have liked to get a [Crossfire game] profiling system out there" --Terry Makedon, ATI, July 2006
"Christ, this is Beyond3D; just get rid of any f**ker talking about patterned chihuahuas! Can the dog write GLSL? No. Then it can f**k off." --Da Boss
Old 30-Nov-2007, 19:50   #31
Mark
aka Ratchet
 
Join Date: Apr 2002
Location: Newfoundland, Canada
Posts: 606
Default

Quote:
Originally Posted by Geo View Post
The reek of geek musk and the ringing of nerd antlers clashing in here is getting a bit much. Dial it back a notch, fellas.
See, Geo, you really are more personable than I am.
__________________
confident, cocky, lazy, dead.
Old 30-Nov-2007, 20:24   #32
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,877
Default

Quote:
Originally Posted by compres View Post
What a wonderful post. I confess I am not sure about the FLOP counts in your post, but with this paragraph in particular I agree so much.
Conroe has 8 SP flops/cycle per core, so a 3 GHz quad-core comes to 96 GFlops total. The 8600 GT, on the other hand, has about 76 GFlops without the MUL and 113 GFlops with it. For physics, on the G84 specifically, it would be fair not to count it at all, which is what I presume Nick did.
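
For anyone who wants to check the arithmetic, here is a rough sketch of the peak-FLOPS calculation. The function name is made up for illustration, and the 8600 GT's 32 shader units and 1.19 GHz shader clock are an assumption about that card's specs rather than figures from this thread. A MAD is counted as 2 flops, and the extra MUL brings it to 3.

```python
def peak_gflops(units, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: units * clock (GHz) * flops per unit per cycle."""
    return units * clock_ghz * flops_per_cycle

# 3 GHz Core 2 Quad: 4 cores, 8 single-precision flops per core per cycle
cpu = peak_gflops(4, 3.0, 8)

# GeForce 8600 GT (assumed: 32 shader units at a 1.19 GHz shader clock)
gpu_mad = peak_gflops(32, 1.19, 2)      # MAD only
gpu_mad_mul = peak_gflops(32, 1.19, 3)  # MAD plus the extra MUL

print(f"Core 2 Quad: {cpu:.0f} GFLOPS")
print(f"8600 GT, MAD only: {gpu_mad:.0f} GFLOPS")
print(f"8600 GT, MAD + MUL: {gpu_mad_mul:.0f} GFLOPS")
```

Under those assumptions the CPU lands at 96 and the GPU at roughly 76 and 114, which is close to the figures quoted above. These are theoretical peaks only; sustained throughput on real physics code will be lower on both sides.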

Of course, it is noteworthy that two Conroe dies represent 2x143mm² (=286mm²) on 65nm, while G84 is about 170mm² on 80nm. So you might say the G84's perf/mm² (on the same process) would be about twice that of the Core 2 Quad, and that's before considering that Intel's process uses more advanced (and costly!) materials, increasing the cost per mm².
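
A quick sketch of how that "about twice" estimate can be reproduced. It assumes ideal area scaling with the square of the feature size (real shrinks usually do worse) and uses the MAD-only figure of 76 GFlops for G84; the function name is hypothetical.

```python
def scaled_area(area_mm2, from_nm, to_nm):
    """Die area after an idealized shrink: area scales with the square of feature size."""
    return area_mm2 * (to_nm / from_nm) ** 2

cpu_gflops, cpu_area = 96.0, 2 * 143.0     # Core 2 Quad: two dies on 65nm
gpu_gflops = 76.0                          # G84, MAD only
gpu_area_65 = scaled_area(170.0, 80, 65)   # G84 shrunk from 80nm to 65nm (idealized)

cpu_density = cpu_gflops / cpu_area        # GFLOPS per mm^2
gpu_density = gpu_gflops / gpu_area_65
ratio = gpu_density / cpu_density

print(f"G84 at 65nm: {gpu_area_65:.0f} mm^2")
print(f"GFLOPS/mm^2 ratio (GPU/CPU): {ratio:.2f}")
```

With these inputs the ratio comes out right around 2, so the estimate holds as long as you accept the idealized shrink and leave the MUL out.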

Of course, GFlops figures aren't everything, and for physics workloads you would probably expect the CPU to be at an efficiency advantage. However, I think if you look at a roadmap for Sandy Bridge (aka Gesher/new 32nm microarch), you'll see it likely won't go up to much more than 500GFlops or so in the desktop market (16 flops/core, 8 cores, 4GHz+). Suddenly, this will look a lot less impressive even compared to a mid-range GPU...
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles)
"[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions."
Old 01-Dec-2007, 00:20   #33
ShaidarHaran
hardware monkey
 
Join Date: Mar 2007
Posts: 3,900
Default

Quote:
Originally Posted by Arun View Post
Of course, GFlops figures aren't everything, and for physics workloads you would probably expect the CPU to be at an efficiency advantage. However, I think if you look at a roadmap for Sandy Bridge (aka Gesher/new 32nm microarch), you'll see it likely won't go up to much more than 500GFlops or so in the desktop market (16 flops/core, 8 cores, 4GHz+). Suddenly, this will look a lot less impressive even compared to a mid-range GPU...
Last I heard Sandy Bridge was scheduled to hit around 200 DP GFLOPs. Although, I guess with Conroe/Penryn (quad) hitting near 100 GFLOPs and Nehalem doubling the core count, 200 GFLOPs for Sandy Bridge does seem a bit low.
Old 01-Dec-2007, 06:30   #34
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

Quote:
Originally Posted by ShaidarHaran View Post
Farcry->Crysis has a readily apparent increase in physics effects. How can you say they "have the same amount"? The sheer number of object interactions and destructibility of the environments alone makes that obvious to anyone that's ever played both games, or even just seen a video of them for that matter.
The same amount in relative terms, not absolute terms. It just scales slowly the way it has always scaled. In my eyes there is no real sudden need for more physics processing. Physics is so much a hype that game developers actively look for additional physics to cram in. You can stretch it a certain amount, but a snapping leaf is not a falling tree and a hand grenade is not an A-bomb.
Quote:
Remind us again what the price difference is between those two components, and which the average machine is more likely to have.
I picked these parts to show that CPUs are not that extremely far behind when it comes to programmable floating-point performance. I'm not even sure AGEIA's chip has this amount of GFLOPS (530 million sphere-sphere collisions per second is not that impressive). CPUs are very slow compared to GPUs when it comes to graphics. This is mainly because of the fully pipelined texture samplers. But I don't know of any specialised component that would speed up physics by a large amount. It mainly needs multiplications and additions, and ever since CPUs have gone multi-core they're not at any significant disadvantage.
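
To put the sphere-sphere figure in perspective, here is a minimal sketch of a naive overlap test. It is not AGEIA's or any engine's actual code; the function name and the count of roughly 11 floating-point operations per test are assumptions for illustration, and real engines spend far more effort on broad-phase culling and contact resolution.

```python
def spheres_overlap(c1, r1, c2, r2):
    """Naive sphere-sphere test: compare squared distance with squared radius sum.

    Only subtractions, multiplications and additions are needed; no square root.
    """
    dx, dy, dz = c1[0] - c2[0], c1[1] - c2[1], c1[2] - c2[2]
    dist_sq = dx * dx + dy * dy + dz * dz
    r_sum = r1 + r2
    return dist_sq <= r_sum * r_sum

# Assumed cost: 3 sub + 3 mul + 2 add + 1 add + 1 mul + 1 compare = 11 operations
FLOPS_PER_TEST = 11
tests_per_second = 530e6
gflops_needed = tests_per_second * FLOPS_PER_TEST / 1e9

print(spheres_overlap((0, 0, 0), 1.0, (1.5, 0, 0), 1.0))
print(spheres_overlap((0, 0, 0), 1.0, (3.0, 0, 0), 1.0))
print(f"{gflops_needed:.2f} GFLOPS for 530 million naive tests per second")
```

Under that operation count, 530 million tests per second works out to a single-digit GFLOPS number, well below the peak figures discussed in this thread. Memory access, not arithmetic, would likely dominate in practice.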

Anyway, CPU prices are strongly exponential. So you can get an ever so slightly slower Q6600 for 280 US$ or less. Also, you need a CPU anyway, so instead of buying, say, an E6600 and two 8600s you could buy a Q6600 and one 8600 for roughly the same money. And this way you also accelerate games that are not that physics heavy but, for instance, A.I. heavy (using an interpreted scripting language)...

But feel free to disagree. This is just my personal opinion, at the moment. I believe that dedicated physics has no long-term future, and Larrabee will determine what future CPUs will look like.
Old 01-Dec-2007, 06:56   #35
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

Quote:
Originally Posted by Arun View Post
Of course, it is noteworthy that two Conroe dies represent 2x143mm² (=286mm²) on 65nm, while G84 is about 170mm² on 80nm. So you might say the G84's perf/mm² (on the same process) would be about twice that of the Core 2 Quad, and that's before considering that Intel's process uses more advanced (and costly!) materials, increasing the cost per mm².
Yes, it's an unfair comparison technology-wise. But this is reality. Intel will always have an advantage there. From a higher perspective that's simply because there is more need to have the fastest possible general-purpose processors.

Perf/trans also appears to be going up for CPUs, while for GPUs it's stagnating or even lowering. Core 2 doubled the SIMD execution unit width, at a relatively small transistor cost. And if Intel chooses the path of 'more cores but simpler ones' then theoretical perf/trans is going to go up further. Larrabee's in-order cores achieve exactly that. Also note that Hyper-Threading can lower the need for huge caches and thus increase perf/trans as well. So whereas CPUs have plenty of options to catch up with Moore's law, GPUs are bumping into non-architectural limitations like heat dissipation.
Old 01-Dec-2007, 07:14   #36
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
Default

Quote:
Originally Posted by Nick View Post
Also note that Hyper-Threading can lower the need for huge caches and thus increase perf/trans as well.
What do you mean by that? An additional thread should give you more opportunities to trash a cache
__________________
[twitter]
More samples, we need more samples! [Dean Calver]
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way
Old 01-Dec-2007, 08:24   #37
hoho
Senior Member
 
Join Date: Aug 2007
Location: Estonia
Posts: 1,218
Default

Quote:
Originally Posted by nAo View Post
What do you mean by that? An additional thread should give you more opportunities to trash a cache
In the case of a cache miss, the CPU can simply switch over to another thread without too big a performance hit. Of course cache thrashing will increase the needed memory bandwidth, but I guess 3-channel DDR3 with an IMC helps at least a bit.
Old 01-Dec-2007, 10:23   #38
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,877
Default

Quote:
Originally Posted by Nick View Post
Yes, it's an unfair comparison technology-wise. But this is reality. Intel will always have an advantage there. From a higher perspective that's simply because there is more need to have the fastest possible general-purpose processors.
And you expect this to remain true... how long? I'm on an 8800GTX here, and I bought an E4300 with the hope of overclocking it massively. For a variety of reasons it didn't work out, so it's often at stock (northbridge voltage for OC is ridiculous) or less than 40% higher. And if anything, I'm shocked by the fact I practically don't ever feel the need for anything more, even in games; when the CPU really is too slow, it's likely my 8800GTX is too.

Quote:
Perf/trans also appears to be going up for CPUs, while for GPUs it's stagnating or even lowering.
I think you're in for a very tough reality check in the next 2 years... As I said, Intel's Sandy Bridge in late 2010 will likely come as an 8-core chip at most in the desktop market. That represents 500 SP GFlops or 250 DP GFlops.

We'll see GPUs reaching 2TFlops on 45nm in a single-chip configuration, and likely for less than Sandy Bridge will come out at. I suspect this will be in late 2008, but I don't know NV and ATI's roadmaps enough to tell you that. Anyway, given TSMC's plans, I'd also expect the first 32nm GPUs at around the same time as Sandy Bridge or very slightly later, and you're likely talking 5+ TFlops there. That's an order of magnitude for roughly the same segment of the market...
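
A back-of-the-envelope sketch of that comparison, using only the projections from this thread (8 cores, 4 GHz, 16 SP flops per core per cycle for Sandy Bridge, and 5 TFlops for a 32nm GPU). None of these are confirmed specs, and the function name is made up for illustration.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS."""
    return cores * clock_ghz * flops_per_cycle

sandy_sp = peak_gflops(8, 4.0, 16)  # projected: 8 cores, 4 GHz, 16 SP flops/core/cycle
sandy_dp = sandy_sp / 2             # a SIMD register holds half as many DP values
gpu_32nm_sp = 5000.0                # projected 5 TFlops, expressed in GFLOPS

print(f"Sandy Bridge SP: {sandy_sp:.0f} GFLOPS, DP: {sandy_dp:.0f} GFLOPS")
print(f"GPU/CPU ratio (SP): {gpu_32nm_sp / sandy_sp:.1f}x")
```

The SP and DP results are where the rounded 500 and 250 figures come from, and the ratio sits just under 10x, hence "an order of magnitude".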

Quote:
Larrabee's in-order cores achieve exactly that. Also note that Hyper-Threading can lower the need for huge caches and thus increase perf/trans as well. So whereas CPUs have plenty of options to catch up with Moore's law, GPUs are bumping into non-architectural limitations like heat dissipation.
Oh, because CPUs are *not* bumping into heat dissipation limits, perhaps? I'd argue the PC architecture makes GPUs a more viable target for 200W+ TDPs anyway, since they're a discrete PCB and have plenty of room for cooling. However, it obviously remains a limitation, both marketing-wise and technologically.

Realistically though, the only reason why GPUs are so hot is because perf/mm² remains very important. If you were willing to sacrifice a bit more of perf/mm² in exchange for perf/watt, there should be no problem whatsoever creating a GPU with much lower wattage for a given level of performance. If heat does become such a limitation, it will only be a temporary obstacle, resulting in a one-time drop in perf/mm².

As for Larrabee: it obviously has a lot of potential for math-heavy computations, and especially physics. I said as much several times in posts and news pieces. However, I'd still argue it's not a "CPU", because it isn't the center of the PC architecture. It is, at best, a general-purpose coprocessor with a bit of fixed-function hardware around it when the target market requires that.
Old 01-Dec-2007, 15:10   #39
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
Default

Quote:
Originally Posted by hoho View Post
In the case of a cache miss, the CPU can simply switch over to another thread without too big a performance hit.
Not really relevant, how does this lower the need for larger caches?
Old 01-Dec-2007, 18:09   #40
ShaidarHaran
hardware monkey
 
Join Date: Mar 2007
Posts: 3,900
Default

Quote:
Originally Posted by nAo View Post
Not really relevant, how does this lower the need for larger caches?
Assuming SoEMT, it's just as hoho said. In the case of a cache miss, a core can simply switch to another thread until the data for the previous thread is retrieved.

A side-effect of this is that smaller caches could theoretically be used. Otherwise, it can make a CPU core with a medium-sized cache deliver the performance of a core with a larger cache. This all assumes a multi-thread friendly environment, of course.
Old 01-Dec-2007, 18:19   #41
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,877
Default

Uhm, I think you're both thinking of two different advantages of caches: saving memory bandwidth and hiding memory latency. SoEMT will reduce the importance of caches for hiding memory latency, but may actually increase their importance in terms of saving bandwidth, as thrashing may go up. The same principles also apply to GPUs.
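
The bandwidth side of this can be illustrated with a toy model. The sketch below assumes a fully associative LRU cache and two threads that each loop over their own small working set, alternating one pass at a time. The sizes and function name are made up, and the model does not capture latency hiding at all; it only counts hits.

```python
from collections import OrderedDict

def count_hits(accesses, capacity):
    """Count hits for a sequence of cache-line addresses in an LRU cache."""
    cache = OrderedDict()
    hits = 0
    for line in accesses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = True
    return hits

PASSES = 100
set_a = list(range(0, 6))        # thread A's working set: 6 lines
set_b = list(range(100, 106))    # thread B's working set: 6 other lines

alone = set_a * PASSES               # thread A running by itself
shared = (set_a + set_b) * PASSES    # threads alternate, one pass each

print(count_hits(alone, 8))     # thread A alone, 8-line cache
print(count_hits(shared, 8))    # both threads, same 8-line cache
print(count_hits(shared, 12))   # both threads, cache grown to 12 lines
```

In this model thread A alone hits on everything after warm-up, the two threads sharing the same 8-line cache evict each other's lines on every pass, and the hits only come back once the cache is large enough for both working sets. Every one of those extra misses is memory traffic, which is the bandwidth cost being described here.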
Old 01-Dec-2007, 18:23   #42
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
Default

Quote:
Originally Posted by ShaidarHaran View Post
Assuming SoEMT, it's just as hoho said. In the case of a cache miss, a core can simply switch to another thread until the data for the previous thread is retrieved.
The other thread will at best re-use the same data the first thread was using. This means that it can potentially evict data from the cache that the first thread is going to need in the near future, at least in a worst case scenario.
In the general case 2 threads working on different data sets will require a larger cache, not a smaller one.
Quote:
A side-effect of this is that smaller caches could theoretically be used.
Again, how can a second thread increase the efficiency of your cache so that you might end up using a smaller one?
Old 01-Dec-2007, 18:28   #43
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,877
Default

I guess the question is how often you're bandwidth limited vs how often you're idling because of memory latency. For the quad-core Nehalem though, you're looking at a 192-bit DDR3 IMC... So I don't think latency is too much of a concern for once!
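
For a sense of scale, here is a small sketch of what a 192-bit interface means in peak bandwidth terms. The DDR3-1333 speed grade and the dual-channel DDR2-800 comparison point are assumptions, not figures from this thread, and the function name is hypothetical.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_per_second):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bus_width_bits / 8 * transfers_per_second / 1e9

# 192-bit bus = three 64-bit channels; DDR3-1333 is an assumed speed grade
nehalem = peak_bandwidth_gbs(192, 1333e6)
# For comparison, an assumed dual-channel (128-bit) DDR2-800 setup
ddr2 = peak_bandwidth_gbs(128, 800e6)

print(f"192-bit DDR3-1333: {nehalem:.1f} GB/s")
print(f"128-bit DDR2-800:  {ddr2:.1f} GB/s")
```

These are peak numbers only; the integrated controller mainly helps latency, while the wider bus is what helps bandwidth.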
Old 01-Dec-2007, 18:30   #44
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
Default

Quote:
Originally Posted by Arun View Post
Uhm, I think you're both thinking of two different advantages of caches: saving memory bandwidth and hiding memory latency. SoEMT will reduce the importance of caches for hiding memory latency, but may actually increase their importance in terms of saving bandwidth, as thrashing may go up. The same principles also apply to GPUs.
I see your point, but SoEMT is mostly a cheap way to put idle units to some use while paying a fairly small cost for it. It's not an opportunity to constantly trash your caches hoping that your improved ability to hide memory access latency will save your arse, not in the general case for sure.
Unless you never re-use your data, but then you don't need a cache in the first place.
Last edited by nAo; 01-Dec-2007 at 18:33. Reason: freudian slip: wrote texture latency instead of general memory latency
Old 01-Dec-2007, 19:02   #45
Silent_Buddha
Regular
 
Join Date: Mar 2007
Posts: 8,958
Default

Add to that, has even a 4-core Penryn come remotely close to the amount of math a years-old R580 can do?

If Folding at Home is any indication of relative real-world workload and potential physics performance, then it'll still be years (Larrabee perhaps?) until Intel matches R580, much less anything newer.

Then again I'm not an expert in this area so I could be attributing far too much importance to GPU performance in FAH.

And the amount of physics used in Crysis is indeed magnitudes higher than that used in Farcry. It's not just a simple evolutionary rise in usage. The complexity of the calculations might have only gone up slightly, but the sheer number of calculations going on in any given scene absolutely dwarfs that of Farcry.

It can, of course, be argued that games don't need that level of physics nor that number of calculations per scene. But I would argue it goes a long way towards immersion and the all-important WOW factor. I'll be a sad buddha if the trend reverts to doing less because it isn't "needed."

Regards,
SB
Old 03-Dec-2007, 01:13   #46
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

Arun, your arguments are correct but you're making the general mistake of only looking at the high-end. The average system sold today does not come with a GeForce 8800, but it does have dual-core. In fact more than half of all systems are laptops, so 200+ Watt GPUs will never become the norm. Lots of people, including occasional gamers, are even content with integrated graphics or a low-end card. Sure we'll see multi-teraflop GPUs in the not too distant future, but we'll see the average system equipped with quad-cores sooner.

So basically the average system has a potent CPU, but a modest GPU. For games this means that the GPU should be used only for what it's most efficient at: graphics. CPUs are more interesting for everything else. I come to the same conclusion when looking at GPGPU applications. Only with high-end GPUs do they achieve a good speedup, but actual work throughput versus available GFLOPS is often laughable.

So I believe that the arrival of teraflop GPUs will have an insignificant effect on the current balance. Advanced multi-core CPUs on the other hand will be adopted relatively quickly, making it ever more interesting not to offload things like physics anywhere else. The megahertz race is over, but the multi-core race has just begun and has some catching up to do.

From an architectural point of view, GPUs have only three limited types of data access:
- Registers: very fast, but not suited for storing actual data structures.
- Texture cache: very important to reduce texture sampling bandwidth. Close to useless for other access patterns.
- RAM: high bandwidth but high latency. Compression techniques only useful for graphics.

For CPUs this becomes:
- Registers: extremely fast and since x64 no longer a big performance limiter.
- L1 cache: very fast and practically an extension of the register set. Suited for holding actual data sets.
- L2 cache: a bit slower but can hold the major part of the working set.
- RAM: not that high bandwidth, but still tons of potential when multi-core increases the need.

So we'd have to see major changes in GPU architecture to make them more suited for non-graphics tasks, likely affecting their graphics performance. That might be ok for the high-end but the mid- and low-end have no excess performance for anything else. The CPU on the other hand is already well on its way to be able to handle larger workloads, and effectively ending up in every system.
Old 03-Dec-2007, 02:35   #47
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
Default

Quote:
Originally Posted by Nick View Post
From an architectural point of view, GPUs have only three limited types of data access:
- Registers: very fast, but not suited for storing actual data structures.
- Texture cache: very important to reduce texture sampling bandwidth. Close to useless for other access patterns.
- RAM: high bandwidth but high latency. Compression techniques only useful for graphics.
DX10 GPUs have constant buffers and associated caches.
G80 also exposes a fast on chip memory through CUDA
Old 03-Dec-2007, 08:43   #48
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,877
Default

Quote:
Originally Posted by Nick View Post
Arun, your arguments are correct but you're making the general mistake of only looking at the high-end. The average system sold today does not come with a GeForce 8800, but it does have dual-core.
And they'll play old games that already use the CPU for physics anyway. Your argument only stands if this dynamic remains true going forward, which it very possibly won't IMO (more below).

Quote:
In fact more than half of all systems are laptops
I don't know where you get your stats, but I suspect you should reconsider your source. Even the most optimistic estimates don't expect that to happen before 2010.

Quote:
Lots of people, including occasional gamers, are even content with integrated graphics or a low-end card.
There's the gaming market, and then there's the gaming market. I could have a lot of fun playing casual games and 3-5 years old games, but that's not what I'm doing. There will ALWAYS be a market for games that run on IGPs. I say IGPs because low-end cards will die within the next 2 years, as they'll become essentially senseless: if you look at AMD's and NVIDIA's upcoming DX10 IGPs, they're practically good enough for Windows 7 and for a few years after that. All you'll see after that are incremental increases in performance & video decoding quality, imo.

But it's not because there is a segment for games on what will be a $300 PC market that there won't be a market for games above that; and as has traditionally been the case, these two will be completely separate.

Quote:
Sure we'll see multi-teraflop GPUs in the not too distant future, but we'll see the average system equipped with quad-cores sooner.
I have massive doubts that more than 2-3 cores makes sense in the 'low-end commodity PC market'. If a game aims at the low-end, it should aim at that, and not at some artificial segment with average performance that nobody really fits in. Anyway, that's arguable, but to the next point now...

Quote:
So basically the average system has a potent CPU, but a modest GPU. For games this means that the GPU should be used only for what it's most efficient at: graphics.
You assume this to remain true: once again, it will not. Rather than looking at the present, it might be a good idea to try and look at the future instead. In the 2010-2011 timeframe (32nm), dual-cores will remain widely available for the low-end of the market. These will be paired with G86-level graphics performance in the ultra-low-end, with probably a higher ALU:TEX ratio. So in that segment of the market, you'll see maybe 200GFlops on the GPU and 75GFlops on the CPU.

And indeed, I can't really imagine any circumstance where offloading the physics to the GPU makes sense there, but GFlops still aren't massively in the CPU's favor and this market would mostly play casual and old games; amusingly, given the performance of current GPUs in DX10 games, they would presumably still play DX9 games!

Now, look at another segment of the market: $120 CPU, $120 GPU, $60 Chipset. In 2010-2011, that would probably correspond to a 3GHz+ quad-core on the CPU side of things (with a higher IPC than Penryn). That's about 150GFlops, maybe. On the GPU side of things, however, you'll easily have more than 1TFlop: just take RV670, which manages 500GFlops easily on 55nm at 190mm². It's not exactly hard to predict where things will go with 40nm and 32nm...

Quote:
Only with high-end GPUs do they achieve a good speedup, but actual work throughput versus available GFLOPS is often laughable.
Uhm, that's just wrong. In apps that make sense for their current architecture, and there are *plenty* of them, the efficiency in terms of either GFlops or bandwidth (whichever is the bottleneck) is perfectly fine.

Quote:
The megahertz race is over, but the multi-core race has just begun and has some catching up to do.
Yes, it has a lot of catching up to do in terms of, as you kinda said yourself, politics and hype. You don't seem to realise that when predicting the future configurations of PCs (i.e. are most consumers going to go with a $300 CPU with a $150 GPU, or a $100 CPU with a $350 GPU?) what matters is what decisions the developers take. If the CPU is never the bottleneck, then why would you want more than a $100 CPU anyway?

If physics acceleration on the GPU doesn't take off, then obviously you'll want more than a $100 CPU. But if it does, then who knows - and that's why NVIDIA and ATI are so interested in it. They want to increase their ASPs at the CPU's expense, and there is no fundamental reason why they cannot succeed. It's all about their execution against Intel's.

And if what happens is that GPUs capture more out of a PC's ASPs, then you're looking at $100 CPUs being paired with $500 GPUs. Heck, as I said I'm already a pioneer in that category - slightly overclocked E4300 ($150) with a $600 GPU. The difference in GFlops between the two is kind of laughable, really, and in this case GPU Physics would clearly make sense. *That* is the dynamic that NVIDIA and AMD are trying to encourage, and that's why it's a political question, not really a technical one (although perf/watt for CPUs vs GPUs for physics also matters).

I might be right, or you might be right, but we aren't personally handling any of these companies' mid-term strategies, so I wouldn't dare claim anything with absolute certainty given that it's not even really a technical debate from my POV. I do agree that a 100GFlops CPU is just fine for very nice physics, but I do not believe that closes the debate either.

Quote:
So we'd have to see major changes in GPU architecture to make them more suited for non-graphics tasks, likely affecting their graphics performance.
Those are already happening and barely affecting graphics performance, as they are mostly minor things and the major things can easily be reused for graphics. There is no fundamental reason why this will not keep happening.

P.S.: I just thought I'd point out that I do NOT consider Larrabee to be a CPU here, but that obviously it might be a very interesting target for GPGPU/Physics. A heterogeneous chip with Sandy Bridge and Larrabee cores on 32nm, if the ISA takes off, ought to be much better than a GPU for Physics and other similar workloads either way (assuming there's enough memory bandwidth). So this is another important dynamic of course, and arguably much nearer to the discussion subject of this thread...
Old 04-Dec-2007, 00:14   #49
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
Default

Quote:
Originally Posted by nAo View Post
I see your point, but SoEMT is mostly a cheap way to put idle units to some use while paying a fairly small cost for it. It's not an opportunity to constantly trash your caches hoping that your improved ability to hide memory access latency will save your arse, not in the general case for sure.
Unless you never re-use your data, but then you don't need a cache in the first place.
Wouldn't fine-grained round-robin or a hybrid scheme like Niagara be more effective?

SoEMT is rather pessimistic about data sharing and coherent thread behavior.
It's suited to long-latency events where speculation within the same thread is mostly pointless, but it also assumes that there's a somewhat limited amount of non-speculative work in other threads, which is why it will stick with a thread for quite a stretch between events.

If the workload has massive amounts of non-speculative work available in other threads with a high likelihood that they are working in the same place, why bother running with just one thread many instructions ahead of the pack when each instruction taken increases the chance of tripping up one of the other threads through a cache invalidation?
__________________
Dreaming of a .065 micron etch-a-sketch.
