Perhaps the geomagnetic poles flipped, but SI are now NI
http://semiaccurate.com/2010/08/26/amd-spills-beans-northern-island-codenames/
Uhmm, ever had divergence between vector lanes in a "true" SIMD unit?
VLIW can do everything (SIMD) vector units could do and then some more
No, it's actually quite common and a recommended technique (it often even brings some benefit on nvidia GPUs, since you usually reduce the granularity of memory accesses). The hindrance is often only the effort the developer has to put in, but that's a minor inconvenience in my book if you get twice the performance.
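A minimal CUDA sketch of what that packing looks like in practice (the kernel names and the assumption that the element count is a multiple of four are mine, purely for illustration): processing four values per thread through float4 turns four 32-bit loads into a single 128-bit load per thread, which is the coarser memory-access granularity being described; on a VLIW4 part the four components should also map naturally onto the x/y/z/w slots.
Code:
// Scalar version: one element per thread, one 32-bit load and one 32-bit store each.
__global__ void scale_scalar(const float* in, float* out, int n, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = k * in[i];
}

// Packed version: four elements per thread via float4, so each thread
// issues one 128-bit load and one 128-bit store instead of four of each.
// n4 is the element count divided by 4; buffers must be 16-byte aligned.
__global__ void scale_vec4(const float4* in, float4* out, int n4, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 v = in[i];
        v.x *= k; v.y *= k; v.z *= k; v.w *= k;
        out[i] = v;
    }
}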
I think it's time to blow Rohit's mind:
Code:
46 x: LDS_ADD    ____, PV45.z, (0x00000001, 1.401298464e-45f).x
   y: ADD_INT    T1.y, (0x0000000C, 1.681558157e-44f).y, T0.w
   z: MULADD_e   T2.z, PV45.x, -0.5, 0.5
   w: MULADD_e   T1.w, PV45.w, 0.5, 0.5
   t: MUL_e      ____, R3.w, KC0[6].w
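For the record, that's a single VLIW instruction group (number 46) co-issuing five different operations across its x/y/z/w/t slots in one go: an LDS add, an integer add, two multiply-adds and a plain multiply. That kind of mixed issue per clock is exactly what a plain SIMD lane doesn't give you.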
He's basing that only on the "NI" in front of the codenames.
I'd put my money on "NI" just marking the architecture, but the family (as codenames suggest) is still SI
N.I. before S.I.
Something is wrong in that equation. If, as your analysis shows, AMD's cards are significantly faster when processing shaders then where does Nvidia catch up? With GT200 you could sorta pin it on the texturing but now Cypress has an advantage there too.
The texel fillrates with 16:1 tri-AF relative to the theoretical peak texel fillrate:
HD 5870: 0.381
HD 5850: 0.397
HD 5830: 0.402
HD 5770: 0.491
HD 4870: 0.510
GTX 470: 0.597
GTX 465: 0.595
GTX 460: 0.511 (1024 MiB) or 0.497 (768 MiB)
GTX 260: 0.509
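To put the top ratio into absolute numbers (assuming the usual HD 5870 figures of 80 TMUs at 850 MHz, i.e. a 68 GTex/s bilinear peak): 0.381 × 68 GTex/s ≈ 26 GTex/s actually sustained with 16:1 tri-AF.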
Radeon Mobility series:
Vancouver
Granville (N.I.)
Capilano (N.I.)
Robson CE (N.I.)
Ski Resorts:
Blackcomb (S.I.)
Whistler (S.I.)
Seymour (S.I.)
Robson LE (S.I.)
Lexington was Cypress Mobility, got cancelled (because?)
Desktop parts tape-out was some months ago.
For backwards-looking comparisons (e.g. RV770 v GT200) non-ALU factors are significant. Cypress has far more fillrate than Fermi (theoretical and measured). Are you suggesting that Fermi's Z-rate makes up for a deficiency in flops, texturing and fillrate?
And a smaller chip. I'm not trying to be difficult but I just don't see how you can have lots of flops + high utilization + higher texturing rate = equal performance.
GT200 had a huge fillrate advantage too.
Ooh, that looks like L2->L1 bandwidth and L2 size are key factors. The 20 cores of Cypress are each getting a meagre share of L2 bandwidth. The table shows the efficiency of the different GPUs with 16:1 AF compared to the theoretical fillrate. So you see that a GTX 470 is nearly 57% more efficient with 16:1 AF than a HD 5870. According to Gipsel this is related to the difference in bandwidth feeding the TMUs in the different architectures (as far as I understand it!).
It is the GPU-Z screenshot, however, which reveals more details. GPU-Z recognizes the sample as an "ATI Radeon HD 6800 Series". The GPU ID is 6718, which according to the Catalyst 10.8 codename list indicates a Cayman XT GPU - well in line with rumours thus far. The core clock is the same as the HD 5870's - 850 MHz. This suggests the performance boost comes from more functional units (perhaps 1920 SPs), improved performance per clock (using some of Northern Islands' units), or a combination of both. The GDDR5 speed is boosted by a whopping 33% to 1.6 GHz, or 6.4 GHz effective. The same 256-bit memory interface is retained, but the ultra-fast memory results in a massive 204.8 GB/s of memory bandwidth - well over the GTX 480's 177.4 GB/s. Of course, one possible explanation for such a high memory clock, as well as for the impressive benchmark result, is that the card was overclocked when benchmarked.
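For what it's worth, the bandwidth figure follows directly from those numbers: 6.4 Gbps effective per pin × 256 pins / 8 bits per byte = 204.8 GB/s.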
Well, I don't hold out any hopes for recursive algorithms. But apparently some do.
My doubts are over vectorizing code over VLIW lanes while managing branches and divergence efficiently.
Probably as with other chips like GF100, 106 and maybe even 108 as well: to have something to add besides some 50-100 MHz of clock speed when AMD refreshes this fall. GF104 really should be capable of matching HD5870 in games, I don't understand why NVidia hasn't even tried. Is the cost of testing to find the chips that will do that really so high?
The flatness of Larrabee is where it's at, in my view. But it's not a product.
It'll be really interesting to program that as a SIMD-16 (explicit parallelism, i.e. vectorisation) as well as scalar SIMD (implicit parallelism).
You are missing the point. The discussion is about two levels of vectorization (SIMD width - 64 and VLIW width - 4) in the same program vs one (just the SIMD width), and not about vectorization per se. It will be at least as difficult to vectorize such code using AVX on a CPU. My point was that the direction in CPUs seems to be towards more vectorization and parallelization.
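To make the two levels concrete, here is a rough CUDA-flavoured sketch (my own example, not from the thread; kernel name and the multiple-of-four assumption are illustrative). The outer level of vectorization is the hardware SIMD width (the threads of a warp or wavefront); the inner level is the four elements packed per thread, standing in for the VLIW lanes. Any data-dependent branch now has to be resolved per packed component by hand, which is the divergence-management burden being discussed.
Code:
// Outer level: one thread per group of four elements, so the hardware
// SIMD width (warp/wavefront) forms the first level of vectorization.
// Inner level: the four components of a float4, standing in for VLIW lanes.
// Assumes the element count is a multiple of 4 (n4 = count / 4).
__global__ void clamp_scale_vec4(const float4* in, float4* out, int n4, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n4) return;

    float4 v = in[i];

    // A data-dependent branch can no longer be written as one simple "if":
    // each packed component may take a different path, so it is handled
    // per component with select-style code.
    v.x = (v.x > 0.0f) ? v.x * k : 0.0f;
    v.y = (v.y > 0.0f) ? v.y * k : 0.0f;
    v.z = (v.z > 0.0f) ? v.z * k : 0.0f;
    v.w = (v.w > 0.0f) ? v.w * k : 0.0f;

    out[i] = v;
}
In the scalar one-element-per-thread version the same branch is just an if, and the hardware handles divergence across the SIMD width automatically; with the packed version that work lands on the programmer (or the compiler).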