|
|
#26 |
|
MSI Man
|
__________________
I miss you CJ, 1976 - 2010 |
|
|
|
|
|
#27 | |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
The biggest problem with physics today is that it's all hype and everyone thinks CPU performance is the problem.
Recent games like Crysis and Unreal Tournament 3 have almost exactly the same amount of physics as Far Cry and Unreal Tournament 2004. It's simply gameplay bound, and only for extreme synthetic, physics-heavy tech demos (read: CellFactor) might CPU performance become a problem. But it's not like ten times more performance is the solution for real games, at least not for long...
Quote:
CPUs are just highly underestimated, and horribly abused. The same people that use an interpreted scripting language for their games are the ones that call the CPU too slow for physics. I've done a bit of profiling on modern physics engines, and while I expected the bottlenecks to be lean SSE code, it was often old x87 code with incredibly slow square roots and divisions. In my opinion the "physics problem" just solves itself if CPUs keep scaling the way they do and physics engines are properly optimized (Intel's acquisition of Havok can only be a good thing).

I also think Larrabee is just an experiment for future generations of CPUs. More cores, but simpler ones. Larrabee has to prove whether or not lower single-thread performance is an option. All Intel wants to do is determine what has to end up in the future-generation CPUs they can sell to the masses. And if they have an intermediate product they can sell to the HPC market, they have nothing to lose. In this light AMD's strategy of buying ATI might even be more brilliant.

Intel has roadmaps with server chips up to 32 cores in 2010, a strategy that needs no question, but for desktop and mobile chips the decision is much harder. Nehalem's 4-core architecture with Hyper-Threading is in my eyes yet another experiment to see how many threads software developers can put to work. The decisions Intel has to make have a gigantic impact on what computers will look like over the next decade... |
|
|
|
|
|
|
#28 | |
|
hardware monkey
Join Date: Mar 2007
Posts: 3,900
|
Quote:
Remind us again what the price difference is between those two components, and which the average machine is more likely to have. |
|
|
|
|
|
|
#29 | |
|
Member
|
Quote:
|
|
|
|
|
|
|
#30 |
|
Mostly Harmless
|
The reek of geek musk and the ringing of nerd antlers clashing in here is getting a bit much. Dial it back a notch, fellas.
__________________
"We'll thrash them --absolutely thrash them."--Richard Huddy on Larrabee "Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future." --Pat Gelsinger, Intel ". . .its taking us longer than we would have liked to get a [Crossfire game] profiling system out there" --Terry Makedon, ATI, July 2006 "Christ, this is Beyond3D; just get rid of any f**ker talking about patterned chihuahuas! Can the dog write GLSL? No. Then it can f**k off." --Da Boss |
|
|
|
|
|
#31 |
|
aka Ratchet
|
See, Geo, you really are more personable than I am.
__________________
confident, cocky, lazy, dead. |
|
|
|
|
|
#32 | |
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
|
Quote:
Of course, it is noteworthy that two Conroe dies represent 2x143mm² (=286mm²) on 65nm, while G84 is about 170mm² on 80nm. So you might say the G84's perf/mm² (on the same process) would be about twice that of the Core 2 Quad, and that's before considering that Intel's process uses more advanced (and costly!) materials, increasing the cost per mm².

Of course, GFlops figures aren't everything, and for physics workloads you would probably expect the CPU to be at an efficiency advantage. However, I think if you look at a roadmap for Sandy Bridge (aka Gesher/new 32nm microarch), you'll see it likely won't go up to much more than 500GFlops or so in the desktop market (16 flops/core, 8 cores, 4GHz+). Suddenly, this will look a lot less impressive even compared to a mid-range GPU...
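
For anyone who wants to check the back-of-the-envelope numbers in this post, here is a minimal sketch of the arithmetic. `peak_gflops` is a hypothetical helper name, and it assumes "16 flops/core" means 16 flops per cycle per core. It gives theoretical peak only, not sustained throughput.

```python
def peak_gflops(flops_per_cycle_per_core: float, cores: int, clock_ghz: float) -> float:
    """Theoretical peak in GFlops: flops/cycle/core x cores x clock (GHz).

    Ignores memory stalls, instruction mix and anything else that keeps
    real code below peak.
    """
    return flops_per_cycle_per_core * cores * clock_ghz


# Two Conroe parts at 143 mm² each (65nm), as quoted in the post.
core2_quad_area_mm2 = 2 * 143

# Sandy Bridge guess from the post: 16 flops/cycle/core, 8 cores, 4 GHz.
sandy_bridge_peak = peak_gflops(16, 8, 4.0)

print(core2_quad_area_mm2)  # 286
print(sandy_bridge_peak)    # 512.0
```

So the "not much more than 500GFlops" figure is simply 16 x 8 x 4 = 512 at exactly 4GHz.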
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions." |
|
|
|
|
|
|
#33 | |
|
hardware monkey
Join Date: Mar 2007
Posts: 3,900
|
Quote:
|
|
|
|
|
|
|
#34 | ||
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Quote:
Quote:
Anyway, CPU prices are strongly exponential. So you can get an ever so slightly slower Q6600 for US$280 or less. Also, you need a CPU anyway, so instead of buying, say, an E6600 and two 8600s you could buy a Q6600 and one 8600 for roughly the same money. And this way you also accelerate games that are not that physics heavy but, for instance, A.I. heavy (using an interpreted scripting language)...

But feel free to disagree. This is just my personal opinion, at the moment. I believe that dedicated physics has no long-term future, and Larrabee will determine what future CPUs will look like. |
||
|
|
|
|
|
#35 | |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Quote:
Perf/trans also appears to be going up for CPUs, while for GPUs it's stagnating or even declining. Core 2 doubled the SIMD execution unit width, at a relatively small transistor cost. And if Intel chooses the path of 'more cores but simpler ones' then theoretical perf/trans is going to go up further. Larrabee's in-order cores achieve exactly that. Also note that Hyper-Threading can lower the need for huge caches and thus increase perf/trans as well. So whereas CPUs have plenty of options to catch up with Moore's law, GPUs are bumping into non-architectural limitations like heat dissipation. |
|
|
|
|
|
|
#36 |
|
Nutella Nutellae
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
|
What do you mean by that? An additional thread should give you more opportunities to thrash a cache.
__________________
[twitter] More samples, we need more samples! [Dean Calver] The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way |
|
|
|
|
|
#37 |
|
Senior Member
|
In the case of a cache miss the CPU can simply switch over to another thread without too big a performance hit. Of course cache thrashing will increase the memory bandwidth needed, but I guess 3-channel DDR3 with an IMC helps at least a bit.
|
|
|
|
|
|
#38 | |||
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
|
Quote:
Quote:
We'll see GPUs reaching 2TFlops on 45nm in a single-chip configuration, and likely for less than Sandy Bridge will come out at. I suspect this will be in late 2008, but I don't know NV and ATI's roadmaps enough to tell you that. Anyway, given TSMC's plans, I'd also expect the first 32nm GPUs at around the same time as Sandy Bridge or very slightly later, and you're likely talking 5+ TFlops there. That's an order of magnitude for roughly the same segment of the market...
Quote:
Realistically though, the only reason why GPUs are so hot is because perf/mm² remains very important. If you were willing to sacrifice a bit more perf/mm² in exchange for perf/watt, there should be no problem whatsoever creating a GPU with much lower wattage for a given level of performance. If heat does become such a limitation, it will only be a temporary obstacle, resulting in a one-time drop in perf/mm².

As for Larrabee: it obviously has a lot of potential for math-heavy computations, and especially physics. I said as much several times in posts and news pieces - however, I'd still argue it's not a "CPU", because it's not the center of the PC architecture. It is, at best, a general-purpose coprocessor with a bit of fixed-function hardware around it when the target market requires that.
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions." |
|||
|
|
|
|
|
#39 |
|
Nutella Nutellae
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
|
Not really relevant, how does this lower the need for larger caches?
__________________
[twitter] More samples, we need more samples! [Dean Calver] The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way |
|
|
|
|
|
#40 | |
|
hardware monkey
Join Date: Mar 2007
Posts: 3,900
|
Quote:
A side-effect of this is that smaller caches could theoretically be used. Otherwise, it can make a CPU core with a medium-sized cache deliver the performance of a core with a larger cache. This all assumes a multi-thread friendly environment, of course. |
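
To put rough numbers on the latency-hiding argument, here is a toy model. It is only a sketch under loud assumptions: every thread runs `run_cycles` cycles between misses, each miss costs `miss_latency` cycles, switching threads is free, and the threads do not evict each other's data. The function name and parameters are made up for illustration.

```python
def core_utilization(threads: int, run_cycles: float, miss_latency: float) -> float:
    """Fraction of cycles a switch-on-miss core spends doing useful work.

    Simplified model: each thread alternates run_cycles of work with
    miss_latency of waiting, and the core switches to another ready
    thread at zero cost. Utilization saturates at 1.0 once the other
    threads cover the whole miss latency.
    """
    if threads < 1 or run_cycles <= 0 or miss_latency < 0:
        raise ValueError("need threads >= 1, run_cycles > 0, miss_latency >= 0")
    return min(1.0, threads * run_cycles / (run_cycles + miss_latency))


# 50 cycles of work per miss, 200 cycles of memory latency (illustrative numbers).
for n in (1, 2, 5):
    print(n, core_utilization(n, 50, 200))
```

With these made-up numbers a single thread keeps the core busy 20% of the time, two threads 40%, and five threads saturate it. The model deliberately leaves out the cache interference discussed elsewhere in the thread, which works in the opposite direction.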
|
|
|
|
|
|
#41 |
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
|
Uhm, I think you're both thinking of two different advantages of caches: saving memory bandwidth and hiding memory latency. SoEMT will reduce the importance of caches for hiding memory latency, but may actually increase their importance in terms of saving bandwidth, as thrashing may go up. The same principles also apply to GPUs.
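
The bandwidth side of this can be illustrated with a tiny simulation. This is a sketch, not a model of any real CPU: it assumes a fully associative LRU cache, and the capacity, working-set size and access pattern are made-up numbers chosen so that one thread fits in the cache and two threads together do not.

```python
from collections import OrderedDict
from itertools import chain


def miss_rate(accesses, capacity):
    """Miss rate of a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    total = 0
    for line in accesses:
        total += 1
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used line
    return misses / total


def sweep(thread_id, lines, passes):
    """One thread repeatedly walking its own private working set."""
    return [(thread_id, i) for _ in range(passes) for i in range(lines)]


def interleave(a, b):
    """Alternate accesses from two threads, as if they shared one cache."""
    return list(chain.from_iterable(zip(a, b)))


CAPACITY, LINES, PASSES = 64, 48, 10  # illustrative sizes only

one_thread = miss_rate(sweep(0, LINES, PASSES), CAPACITY)
two_threads = miss_rate(
    interleave(sweep(0, LINES, PASSES), sweep(1, LINES, PASSES)), CAPACITY
)
print(one_thread, two_threads)
```

In this setup a single thread only misses on its first pass (10% overall), while two interleaved threads with separate 48-line working sets miss on every access, because 96 lines cycled through a 64-line LRU cache are always evicted before reuse. Every one of those misses is memory traffic, which is the bandwidth cost being described.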
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions." |
|
|
|
|
|
#42 | ||
|
Nutella Nutellae
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
|
Quote:
In the general case 2 threads working on different data sets will require a larger cache, not a smaller one.
Quote:
__________________
[twitter] More samples, we need more samples! [Dean Calver] The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way |
||
|
|
|
|
|
#43 |
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
|
I guess the question is how often you're bandwidth limited vs how often you're idling because of memory latency. For the quad-core Nehalem though, you're looking at a 192-bit DDR3 IMC... So I don't think latency is too much of a concern for once!
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions." |
|
|
|
|
|
#44 | |
|
Nutella Nutellae
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
|
Quote:
Unless you never re-use your data, but then you don't need a cache in the first place.
__________________
[twitter] More samples, we need more samples! [Dean Calver] The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way Last edited by nAo; 01-Dec-2007 at 18:33. Reason: freudian slip: wrote texture latency instead of general memory latency |
|
|
|
|
|
|
#45 |
|
Regular
Join Date: Mar 2007
Posts: 8,958
|
Add to that, has even a 4-core Penryn come remotely close to the amount of math the years-old R580 can do?
If Folding at Home is any indication of relative real-world workload and potential physics performance, then it'll still be years (Larrabee perhaps?) until Intel matches R580, much less anything newer. Then again I'm not an expert in this area, so I could be attributing far too much importance to GPU performance in FAH.

And the amount of physics used in Crysis is indeed orders of magnitude higher than that used in Far Cry. It's not just a simple evolutionary rise in usage. The complexity of the calculations might have only gone up slightly, but the sheer number of calculations going on in any given scene absolutely dwarfs those used in Far Cry.

It can, of course, be argued that games don't need that level of physics nor that number of calculations per scene. But I would argue it goes a long way towards immersion and the all-important WOW factor. I'll be a sad buddha if the trend reverts to doing less because it isn't "needed."

Regards, SB |
|
|
|
|
|
#46 |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Arun, your arguments are correct but you're making the general mistake of only looking at the high-end. The average system sold today does not come with a GeForce 8800, but it does have a dual-core. In fact more than half of all systems are laptops, so 200+ Watt GPUs will never become the norm. Lots of people, including occasional gamers, are even content with integrated graphics or a low-end card. Sure we'll see multi-teraflop GPUs in the not too distant future, but we'll see the average system equipped with quad-cores sooner.
So basically the average system has a potent CPU, but a modest GPU. For games this means that the GPU should be used only for what it's most efficient at: graphics. CPUs are more interesting for everything else.

I come to the same conclusion when looking at GPGPU applications. Only with high-end GPUs do they achieve a good speedup, but actual work throughput versus available GFLOPS is often laughable. So I believe that the arrival of teraflop GPUs will have an insignificant effect on the current balance. Advanced multi-core CPUs on the other hand will be adopted relatively quickly, making it ever more interesting not to offload things like physics anywhere else. The megahertz race is over, but the multi-core race has just begun and has some catching up to do.

From an architectural point of view, GPUs have only three limited types of data access:
- Registers: very fast, but not suited for storing actual data structures.
- Texture cache: very important to reduce texture sampling bandwidth. Close to useless for other access patterns.
- RAM: high bandwidth but high latency. Compression techniques only useful for graphics.

For CPUs this becomes:
- Registers: extremely fast and since x64 no longer a big performance limiter.
- L1 cache: very fast and practically an extension of the register set. Suited for holding actual data sets.
- L2 cache: a bit slower but can hold the major part of the working set.
- RAM: not that high bandwidth, but still tons of potential when multi-core increases the need.

So we'd have to see major changes in GPU architecture to make them more suited for non-graphics tasks, likely affecting their graphics performance. That might be ok for the high-end but the mid- and low-end have no excess performance for anything else. The CPU on the other hand is already well on its way to being able to handle larger workloads, and is effectively ending up in every system. |
|
|
|
|
|
#47 | |
|
Nutella Nutellae
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
|
Quote:
G80 also exposes a fast on-chip memory through CUDA.
__________________
[twitter] More samples, we need more samples! [Dean Calver] The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way |
|
|
|
|
|
|
#48 | ||||||||
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
|
Quote:
Quote:
Quote:
But just because there will be a segment for games in what will be a $300 PC market doesn't mean there won't be a market for games above that; and as has traditionally been the case, these two will be completely separate.
Quote:
Quote:
And indeed, I can't really imagine any circumstance where offloading the physics to the GPU makes sense there, but GFlops still aren't massively in the CPU's favor and this market would mostly play casual and old games; amusingly, given the performance of current GPUs in DX10 games, they would presumably still play DX9 games!

Now, look at another segment of the market: $120 CPU, $120 GPU, $60 Chipset. In 2010-2011, that would probably correspond to a 3GHz+ quad-core on the CPU side of things (with a higher IPC than Penryn). That's about 150GFlops, maybe. On the GPU side of things, however, you'll easily have more than 1TFlop: just take RV670, which manages 500GFlops easily on 55nm at 190mm². It's not exactly hard to predict where things will go with 40nm and 32nm...
Quote:
Quote:
If physics acceleration on the GPU doesn't take off, then obviously you'll want more than a $100 CPU. But if it does, then who knows - and that's why NVIDIA and ATI are so interested in it. They want to increase their ASPs at the CPU's expense, and there is no fundamental reason why they cannot succeed. It's all about their execution against Intel's.

And if what happens is that GPUs capture more out of a PC's ASPs, then you're looking at $100 CPUs being paired with $500 GPUs. Heck, as I said I'm already a pioneer in that category - slightly overclocked E4300 ($150) with a $600 GPU. The difference in GFlops between the two is kind of laughable, really, and in this case GPU Physics would clearly make sense. *That* is the dynamic that NVIDIA and AMD are trying to encourage, and that's why it's a political question, not really a technical one (although perf/watt for CPUs vs GPUs for physics also matters).

I might be right, or you might be right, but we aren't personally handling any of these companies' mid-term strategies so I wouldn't dare claim anything with absolute certainty given that it's not even really a technical debate from my POV. I do agree that a 100GFlops CPU is just fine for very nice physics, but I do not believe that closes the debate either.
Quote:
P.S.: I just thought I'd point out that I do NOT consider Larrabee to be a CPU here, but that obviously it might be a very interesting target for GPGPU/Physics. A heterogeneous chip with Sandy Bridge and Larrabee cores on 32nm, if the ISA takes off, ought to be much better than a GPU for Physics and other similar workloads either way (assuming there's enough memory bandwidth). So this is another important dynamic of course, and arguably much nearer to the discussion subject of this thread...
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions." |
||||||||
|
|
|
|
|
#49 | |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
|
Quote:
SoEMT is rather pessimistic about data sharing and coherent thread behavior. It's suited to long-latency events where speculation within the same thread is mostly pointless, but it also assumes that there's a somewhat limited amount of non-speculative work in other threads, which is why it will stick with a thread for quite a stretch between events. If the workload has massive amounts of non-speculative work available in other threads with a high likelihood that they are working in the same place, why bother running with just one thread many instructions ahead of the pack when each instruction taken increases the chance of tripping up one of the other threads through a cache invalidation?
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|