#376 | |
|
Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 987
|
Quote:
__________________
x: RCP_sat R2.x, R1.y y: RCP_sat ____, R1.y z: RCP_sat ____, R1.y |
|
|
|
|
|
|
#377 | |
|
Senior Member
|
Quote:
edit: And that's where the argument started, we can now move on with the primary topic
__________________
English is not my native tongue. Before flaming please consider the possibility that I did not mean to say what you might have read from my posts. Work | Recreation. Warning! This posting may contain unhealthy doses of gross humor, sarcastic remarks and exaggeration! Last edited by CarstenS; 29-Jun-2011 at 17:37. |
|
|
|
|
|
|
#378 | |
|
Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 987
|
But we can let him speak. The original question was about the power consumption of a cache, which was beefed up to serve the same usage profile as the register files of GPUs. So while this sentence is normally true:
Quote:
So where should the power consumption advantage come from if the actual memory arrays don't differ anymore? Add the few additional tasks a cache must be able to handle (however expensive or cheap they may be) and the simple fact that the cache is very likely physically further away from the units than the register files (which are even split, so each lane of a vector ALU has its own register file placed closer to the individual ALU), and that it costs energy to drive data over a distance, and it necessarily follows that the cache would have a higher power consumption if used as a register file. What you gain is some flexibility, and the performance will degrade more gracefully if you need more register space than the register file offers. Eventually, we may very well see something like the scheme proposed by Nvidia in that paper, where you have a few registers basically within each ALU, covering the operands for only 4 or 5 instructions, backed by a larger register file, which is in turn backed by a cache system. That way the data traffic to the levels further away from the ALU decreases, i.e. fewer transfers over larger distances are required, lowering the power consumption.
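A toy model of that argument, as a sketch only: the per-access energy figures are made-up relative units (not measurements, and not taken from the paper), the 60/30/10 access split is an assumed pattern, and `average_energy` is a hypothetical helper. It just shows how serving most operand reads from the level closest to the ALU lowers the average cost per access.

```python
# Hypothetical relative energy per operand access (arbitrary units).
# Assumption: cost grows with the distance the data has to travel.
COSTS = {"operand_regs": 1.0, "register_file": 4.0, "cache": 12.0}


def average_energy(fractions, costs=COSTS):
    """Average energy per access, given the share of accesses each level serves."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(share * costs[level] for level, share in fractions.items())


# Everything served from a cache standing in for the register file.
cache_only = average_energy({"cache": 1.0})
# Everything served from a conventional register file.
rf_only = average_energy({"register_file": 1.0})
# Three-level scheme: small operand registers, register file, then cache.
three_level = average_energy(
    {"operand_regs": 0.6, "register_file": 0.3, "cache": 0.1}
)

print(cache_only, rf_only, three_level)
```

With these assumed costs the ordering is cache-only > register-file-only > three-level; real ratios depend entirely on the physical design.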
__________________
x: RCP_sat R2.x, R1.y y: RCP_sat ____, R1.y z: RCP_sat ____, R1.y |
|
|
|
|
|
|
#379 |
|
Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 987
|
But they do behave "nice enough". If they stop doing that, they just create noise, which you normally want to avoid anyway, because the picture probably starts to look awful at that point. A bit of noise can improve the perceived realism, but that bit doesn't mean everything.
__________________
x: RCP_sat R2.x, R1.y y: RCP_sat ____, R1.y z: RCP_sat ____, R1.y |
|
|
|
|
|
#380 | |
|
Senior Member
Join Date: Jul 2008
Posts: 2,155
|
Quote:
Looking at APU + motherboard prices, it's almost like a €50 computing-capable graphics card is being offered for free. From a market standpoint, it should be a game changer, increasing software developers' interest in supporting OpenCL/DirectCompute. Especially if Llano's demand for laptops is as high as it's been rumoured to be. |
|
|
|
|
|
|
#381 |
|
Senior Member
|
CPUs run OCL only when specifically asked to do so.
fGPU will transparently run the code written for dGPU. |
|
|
|
|
|
#382 | |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Quote:
Intel is expected to support OpenCL on Ivy Bridge's IGP. For the sake of argument, let's assume performance will be horrendous. Then clearly the existence of OpenCL applications doesn't prove that GPGPU on an IGP is the future. Similarly there is no proof yet that Llano's architecture has any merit beyond mere graphics. In other words, just because Llano runs OpenCL, doesn't mean it's a convincing incentive for developers to invest more into OpenCL development. I'd love to see a software renderer written purely in OpenCL (not using any fixed-function hardware), and compare that against SwiftShader. Then we'd be able to get a true picture of the value of IGPs for computing... |
|
|
|
|
|
|
#383 |
|
Senior Member
Join Date: Jul 2008
Posts: 2,155
|
Why would an OpenCL-based software renderer be a better benchmark than many already-available image-editing, video-editing, video-encoding and password decrypting applications?
|
|
|
|
|
|
#384 | |||
|
Senior Member
|
Quote:
EDIT Quote:
EDIT Quote:
|
|||
|
|
|
|
|
#385 |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Because many GPGPU applications and benchmarks claim extraordinary speedups by comparing the results of high-end GPUs against a plain C implementation on the CPU.
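To put rough numbers on that, here is a sketch with made-up figures. It assumes an optimized CPU version could ideally scale with core count and SIMD width over the plain C baseline; `adjusted_speedup` and all inputs are hypothetical, not taken from any benchmark.

```python
def adjusted_speedup(claimed, cpu_cores, simd_lanes, efficiency=1.0):
    """Rescale a GPU speedup that was quoted against single-threaded scalar C.

    Assumes an optimized CPU version runs cpu_cores * simd_lanes * efficiency
    times faster than the scalar baseline (an idealization).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return claimed / (cpu_cores * simd_lanes * efficiency)


# Hypothetical claim: 100x against plain C on one core.
ideal = adjusted_speedup(100.0, cpu_cores=4, simd_lanes=8)
# Same claim if the CPU code only reaches half of its ideal scaling.
half = adjusted_speedup(100.0, cpu_cores=4, simd_lanes=8, efficiency=0.5)

print(ideal, half)
```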
|
|
|
|
|
|
#386 |
|
French frog
Join Date: Jun 2005
Location: France
Posts: 4,172
|
In regard to software renderer vs IGP, it would be imho more interesting to see an updated version of Unreal vs BF3 running on an IGP (whether Llano, Sandy Bridge or Ivy Bridge).
__________________
What's trying to be a bunch of presentations PS360 youtube channel Sebbbi about virtual texturing Tuned EADGCF and liking it :) |
|
|
|
|
|
#387 |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Unless Intel just went out of business and every consumer decided to upgrade today, nothing is going through the roof any time soon.
|
|
|
|
|
|
#388 |
|
Senior Member
|
If Intel's software graphics rendering power is as you claim, then why wouldn't their OpenCL computational power scale just as well? People can use an OpenCL CPU device just as easily as a GPU device.
__________________
I speak only for myself. |
|
|
|
|
|
#389 | ||
|
Meh
Join Date: Mar 2004
Location: New York
Posts: 9,809
|
Quote:
CPUs are also subject to cache thrashing due to TLP and frequent context switching, especially on larger data sets. There is no magic that will enable CPUs to feed much wider execution resources efficiently and do so for free. GPUs will rely less on DLP in the future but it's doubtful that x86+AVX will offer much competition in graphics workloads. IGPs may be doomed but anything more than that has the memory bandwidth and transistor/power budget to put CPUs to shame. Quote:
__________________
What the deuce!? |
||
|
|
|
|
|
#390 |
|
Senior Member
|
SB-class IGPs are doomed. I don't see any reason why strong alternatives like Llano's projected successors would be doomed as well.
|
|
|
|
|
|
#391 | |
|
Senior Member
Join Date: Jun 2003
Posts: 2,570
|
Quote:
__________________
Aaron Spink speaking for myself inc. |
|
|
|
|
|
|
#392 | |
|
Meh
Join Date: Mar 2004
Location: New York
Posts: 9,809
|
Quote:
Also, what's going to happen when games aren't based on 6 yr old console hardware any more? All IGPs will then resume their place in the trash bin.
__________________
What the deuce!? |
|
|
|
|
|
|
#393 | |
|
Senior Member
|
Quote:
|
|
|
|
|
|
|
#394 | |||||
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Quote:
So there are no compromises to legacy scalar execution, and it also exploits DLP in practically the same way as a GPU! Besides, there is no viable alternative. You said you agree they will converge but wonder whether CPUs or GPUs are more representative (i.e. closer to the result of the convergence)? GPUs have a very long way to go to offer acceptable sequential performance. Some form of out-of-order execution, and a comprehensive cache hierarchy are an absolute must to be able to compete with CPUs. For CPUs to compete with GPUs the only thing lacking is AVX-1024... Quote:
Quote:
Quote:
Actually it's a simple question of growing the IGP or growing the CPU cores to threaten the mid-end discrete GPU market. Given that AVX2 brings us everything to drastically speed up software rendering and other high throughput applications, and it's readily extendable to 1024-bit registers, Intel seems focused on increasing CPU DLP. They only have to keep an adequate IGP around for long enough to make the transition. Software rendering is not limited by the API so once developers start using the CPU more directly it would even compete with high-end discrete cards. It will take many years, but the convergence isn't stopping so this is bound to happen. Perhaps by the end of this decade buying a discrete graphics card may seem as silly as buying a discrete sound card. They'll still exist but for the majority of consumers won't offer any worthwhile benefit. Quote:
Last edited by Nick; 30-Jun-2011 at 12:41. |
|||||
|
|
|
|
|
#395 | |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Quote:
|
|
|
|
|
|
|
#396 |
|
Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 987
|
Due to the split nature of the register files (each slice serves just a single or only a few vector lanes), and because the units execute the same instruction for the same nominal register (just a different lane of the logical vector each clock) over 2 to 4 clocks, the register files do not need more ports than your typical L1. You can get away with a single read and a single write port.
__________________
x: RCP_sat R2.x, R1.y y: RCP_sat ____, R1.y z: RCP_sat ____, R1.y |
|
|
|
|
|
#397 | |
|
Senior Member
Join Date: Jul 2008
Posts: 2,155
|
Quote:
Those charts are a good representation of cumulative sales over the past 15 years. Regarding sales for the past 2-3 years (which is what matters the most for OEMs and developers of compute-demanding software), they're a bit useless, as the top 5 GPUs aren't even on the market anymore. I think you're downplaying graphics a bit too much, as if we were in ~2005. Even if the end customer is ignorant of that fact in 99% of cases, OEMs know that a better GPU drastically enhances gaming, video and web-browsing performance. That's why Brazos was sold out in Q1 2011, and has taken quite a chunk out of Atom shipments. So if OEMs value better performing iGPUs and prefer the option to bundle AMD APUs, more PCs with AMD APUs will be on the market, more people will buy AMD APUs, and more developers will put a nice, big and shiny stamp on their latest software claiming it takes full advantage of the iGPU in people's newly-bought PCs. |
|
|
|
|
|
|
#398 |
|
Meh
Join Date: Mar 2004
Location: New York
Posts: 9,809
|
Pretty trivial to pull off when they offer middling performance. Have you seen comparisons between desktop Llano and the lowly 6670?
__________________
What the deuce!? |
|
|
|
|
|
#399 |
|
Senior Member
|
Llano has a noticeable gap in its integration today. That needn't be the case with its successors.
Besides, if IB's DRAM stacking takes off, it might reduce the bandwidth advantage of discretes to a dead heat with integration benefits. |
|
|
|
|
|
#400 |
|
French frog
Join Date: Jun 2005
Location: France
Posts: 4,172
|
Sorry to derail the conversation, but I have some questions about Larrabee/Knights Corner.
I read news about Intel managing to get CMOS @32nm; they now expect this technology to scale along with their process progress (they were stuck @65nm until then). Are ring buses made using this technique, or are they done another way? About Larrabee/K'sC: basically, can we expect any change from the original Larrabee aside from the removal of the texture units? I have the feeling that K'sC is clearly a "filler product", something Intel is pushing out to compete somewhat with GPGPU and make some money out of its investments. As Nick is saying, Intel is putting its strength into the AVX2 instruction set (and a proper implementation). It doesn't make much sense to launch something next year that uses a completely different instruction set (hence my feeling about K'sC being a filler product). We know very little about Haswell, but I don't believe it's the architecture that will allow Intel to do it all. It may allow software rendering with acceptable results for casual gamers, and do marvels for physics, AI, etc. for the others, but that's it. GPUs (and GPGPUs) will still be a suitable target for the workloads that map well to their architectures. 500 GFLOPS won't cut it against modern GPUs. Intel needs a more throughput-oriented design if they want to stop GPUs from biting into their market share. It may also help them reach (or definitively secure) other markets. Honestly I don't know much, but after reading some material about the UltraSPARC CPU line and the upcoming IBM PowerPC A2, it looks to me like the way Larrabee was designed is no longer suited to the goals Intel may pursue now. Maybe it's nothing, but I noticed that in all those designs the cores can access a "shared L2" (as I understand it, the difference from Larrabee's local subset of the L2 is that they can read and write anywhere in the L2 cache, whereas Larrabee cores can only read and write their local subset of the L2 and read from the others). Could this be a wanted feature for the kind of work Larrabee's successors (after K'sC) might be intended for?
(Or Intel could/should scale back the number of cores and include an L3?) There is also the focus on power consumption: 16-wide SIMD may not be workable within the design; it supposedly consumes a lot and sets terrible constraints on the memory system. A move to AVX2, as Nick is proposing, sounds like a win to me; actually I wonder if it's worth it to have them push 4 FLOPS per cycle (Haswell is supposed to do 2 FMAC per cycle, so twice 2 FLOPS, right? I'm not sure I got this properly while reading). I also notice that in the PowerPC A2 the designers have put a huge focus on chip-to-chip communication. That seems absent from Larrabee/K'sC and looks like a huge lack. They need something that scales well. So what are your POV(s) on the matter: do you believe that Intel, after experimenting with Larrabee, with Itanium's crumbling support and a possible threat to its CPU dominance, could launch proper throughput cores? What could they look like? It could be a win for Intel, as Haswell might be awesome, but I don't believe the silicon budget will allow a proper do-it-all architecture (if it is to happen); they could have their way with heterogeneous designs: different cores but using the same ISA(s).
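On the FLOPS question, the usual back-of-the-envelope formula can be sketched as below. Assumptions: each FMA counts as 2 FLOPs, "2 FMAC per cycle" means two 256-bit FMA units per core, and the core count and clock are placeholders rather than announced specs. `flops_per_cycle` and `peak_gflops` are hypothetical helper names.

```python
def flops_per_cycle(fma_units, simd_bits, element_bits=32):
    """Peak FLOPs per cycle per core: each FMA lane does a multiply and an add."""
    lanes = simd_bits // element_bits
    return fma_units * lanes * 2


def peak_gflops(cores, clock_ghz, fma_units, simd_bits, element_bits=32):
    """Theoretical peak in GFLOPS; real code rarely sustains this."""
    return cores * clock_ghz * flops_per_cycle(fma_units, simd_bits, element_bits)


# Per lane: 2 FMA units x 2 FLOPs = 4 FLOPs per cycle.
# Per core with 8 single-precision lanes (256-bit): 4 x 8 = 32 FLOPs per cycle.
per_core = flops_per_cycle(fma_units=2, simd_bits=256)

# Placeholder chip: 4 cores at 3.9 GHz (assumed, not a real spec sheet).
chip = peak_gflops(cores=4, clock_ghz=3.9, fma_units=2, simd_bits=256)

print(per_core, round(chip, 1))
```

Under those assumptions a quad-core lands in the neighbourhood of the 500 GFLOPS figure mentioned above, single precision, peak only.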
__________________
What's trying to be a bunch of presentations PS360 youtube channel Sebbbi about virtual texturing Tuned EADGCF and liking it :) |
|
|
|