AMD: R9xx Speculation

Comparing Cayman to Cypress - it's simply hard to tell how big Cayman is?

[Attached image: 79529783.jpg]


EDIT: It should be less than 400 mm² die size.
 
Comparing Cayman to Cypress - it's simply hard to tell how big Cayman is?

I'd say that board does not have the same number of memory chips (but 2x, as expected from 2x the memory), or there is a children's playground hidden under the case. :D
 
No, it's silly. There's obviously no way for ATI/NV to know what parts of the final image came from 3D rendering (and should have MLAA) vs not. The right solution is for the developers to just implement MLAA (or one of the numerous better alternatives like... MSAA or a hybrid) and apply it to the right places. None of this control panel BS.
I don't think you've understood what was said in that post. Obviously ATI/NV treat the full scene in post-processing; that's where alpha masks help. I see no reason why masks wouldn't work in post-processing with MLAA, since they work just fine with other post-processing filters. The only issue is the extra work involved.

Another solution is a smart algorithm. What AMD uses already discriminates between pixels to blur and pixels to leave alone. Think of a filter with semi-OCR features, as joked by Malo above ;) I'm not sure if it's doable, and even if it is, what if the fps hit is more than it's worth?

Of course it's best if devs do everything themselves, but as we can see, it's unreasonable to expect that either. There will always be devs who don't care about, or can't afford, implementing proper AA.
 
"Cormorant" code name?

Also that extension has a very short name, perhaps cl_khr_fp64? I'm not sure, but I think Evergreen doesn't support this, only cl_amd_fp64. Probably over-analysing this, though. I don't know how that app reports those things or orders the list.
 
Or maybe it's Cayman Pro with only 24 of 30 SIMDs active.
Clock looks too high to me for Cayman Pro. Even if that's the Pro, it seems unlikely so many simds would be deactivated; that is, XT would have fewer than 30 simds. Also, if that's 24 simds, what's the organization? 2x12? 3x8? 4x6?
24 simds is what I considered the absolute minimum for Cayman, though. With that clock the peak flop rate would be the same as HD5870, but that certainly wouldn't mean it can't be faster. IMHO though this would point to a quite "small" (not much larger than Cypress) chip.
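
For anyone who wants to check the "same peak flop rate" remark, here is a rough sketch. It assumes 64 lanes per VLIW-4 simd and the rumoured 890 MHz clock quoted elsewhere in this thread, and uses the HD5870's 1600 lanes at 850 MHz; the function name is made up for illustration.

```python
def peak_gflops(lanes: int, clock_mhz: float) -> float:
    """Peak single-precision GFLOPS, counting one multiply-add per lane per clock as 2 FLOPs."""
    return lanes * 2 * clock_mhz / 1000.0

# HD5870 (Cypress): 20 simds x 80 lanes at 850 MHz.
cypress = peak_gflops(20 * 80, 850)

# Rumoured part (assumption): 24 simds x 64 lanes at 890 MHz.
cayman_24 = peak_gflops(24 * 64, 890)

ratio = cayman_24 / cypress

print(f"HD5870:        {cypress:.0f} GFLOPS")
print(f"24-simd part:  {cayman_24:.0f} GFLOPS")
print(f"ratio:         {ratio:.3f}")
```

The two peak figures land within about 1% of each other, so any real-world gain would have to come from better utilisation rather than raw rate.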
 
indeed
(860*1920)/(890*1536) ≈ 1.208, i.e. about 20.8% more
A "Pro" with a higher core clock than a "XT"? Would be a first.

EDIT: ok, mczak was faster.

About the amount of RAM and clockspeeds, maybe it's only an early sample.


@mczak: The last rumor about die area I read said 360mm², then it would fit.
 
Clock looks too high to me for Cayman Pro. Even if that's the Pro, it seems unlikely so many simds would be deactivated; that is, XT would have fewer than 30 simds. Also, if that's 24 simds, what's the organization? 2x12? 3x8? 4x6?
If 1920 is viable for VLIW-4, then that would be 30 SIMDs of 64.

Which would appear to be organised as 3 shader engines, each with 10 SIMDs. Disabling 2 SIMDs in each would lead to 24 SIMDs.

One of my qualms here is that 3 shader engines doesn't mesh with 8 quads of ROPs. If screen space is tiled amongst shader engines, then it doesn't divide equally amongst 3 shader engines.

Alternatively 1920 ALU lanes was someone's mistaken interpretation of 24 SIMDs, each with 80 ALU lanes - way back in the mists of time. If it's really 64 lanes per SIMD, then we get 1536 lanes. The increase in SIMD count and the increased average utilisation of VLIW-4, in comparison with VLIW-5, would compensate for 1536 being lower than Cypress's 1600 lanes.
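
A quick sketch of the arithmetic behind these configurations, using only the numbers from this post (64 or 80 lanes per SIMD, 24 or 30 SIMDs, 8 ROP quads). The helper names are invented for illustration, and the candidate shader-engine counts of 2, 3 and 4 are an assumption.

```python
def total_lanes(simds: int, lanes_per_simd: int) -> int:
    """Total ALU lanes for a given SIMD count and SIMD width."""
    return simds * lanes_per_simd

def even_splits(total: int, engine_counts=(2, 3, 4)) -> list:
    """Shader-engine counts that divide `total` units evenly."""
    return [n for n in engine_counts if total % n == 0]

configs = {
    "30 SIMDs x 64 lanes": total_lanes(30, 64),
    "24 SIMDs x 80 lanes": total_lanes(24, 80),
    "24 SIMDs x 64 lanes": total_lanes(24, 64),
}
for name, lanes in configs.items():
    print(f"{name}: {lanes} lanes")

print("30 SIMDs split evenly over engines:", even_splits(30))
print("24 SIMDs split evenly over engines:", even_splits(24))
print("8 ROP quads split evenly over engines:", even_splits(8))
```

Both 30x64 and 24x80 give 1920, which is how the two readings could be confused, and 8 ROP quads divide over 2 or 4 engines but not 3.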
 
How much bandwidth would 6970 generate if it's 1536 @ 890 MHz with 1GB memory @ 4800 MHz?
Didn't understand you there, could you please rephrase the question?
If it's really 64 lanes per SIMD, then we get 1536 lanes. The increase in SIMD count and the increased average utilisation of VLIW-4, in comparison with VLIW-5, would compensate for 1536 being lower than Cypress's 1600 lanes.
If VLIW-5 lanes had a utilization rate of 60~80%, then VLIW-4 lanes would up that to 80~85% (maybe more), an improvement of 20% at least.

However, we still need to make up for the lost efficiency in processing transcendental operations, so 1536 ALUs might be a little insufficient for that, and the need for a higher ALU count would be more pressing.
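
To put rough numbers on that, here is a small sketch that multiplies lane count by the guessed utilization ranges above. The percentages are the estimates from this post, not measurements, and the function name is made up. It ignores the extra slots transcendentals take on VLIW-4.

```python
def effective_lanes(lanes: int, utilization: float) -> float:
    """Average number of lanes doing useful work per clock."""
    return lanes * utilization

# Cypress (VLIW-5): 1600 lanes at a guessed 60-80% utilization.
vliw5_low = effective_lanes(1600, 0.60)
vliw5_high = effective_lanes(1600, 0.80)

# Rumoured VLIW-4 part: 1536 lanes at a guessed 80-85% utilization.
vliw4_low = effective_lanes(1536, 0.80)
vliw4_high = effective_lanes(1536, 0.85)

print(f"VLIW-5, 1600 lanes: {vliw5_low:.0f} to {vliw5_high:.0f} busy lanes")
print(f"VLIW-4, 1536 lanes: {vliw4_low:.0f} to {vliw4_high:.0f} busy lanes")
```

Under these guesses the ranges overlap at the top end, so at equal clocks the gain depends heavily on where real workloads sit in the VLIW-5 range.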
 
According to Muropaketti, the early Cayman samples had 1680 stream processors. But the same article states that AMD has a habit of sending out the first samples with a deliberately reduced number of stream processors.
 
Which would appear to be organised as 3 shader engines, each with 10 SIMDs. Disabling 2 SIMDs in each would lead to 24 SIMDs.

One of my qualms here is that 3 shader engines doesn't mesh with 8 quads of ROPs. If screen space is tiled amongst shader engines, then it doesn't divide equally amongst 3 shader engines.
Yes, but I was thinking lately maybe efficiency drops if you have "too many" simds per shader engine. And if you don't like the 3 shader engines, what about 4 instead? Though I agree only 6 per dispatch processor would be quite low - I want the chip to have 28 simds in a 4x7 arrangement, with the pro being 4x6 instead :). (Barts also has only 7 simds in a group, though they are of course VLIW-5.)

Alternatively 1920 ALU lanes was someone's mistaken interpretation of 24 SIMDs, each with 80 ALU lanes - way back in the mists of time. If it's really 64 lanes per SIMD, then we get 1536 lanes. The increase in SIMD count and the increased average utilisation of VLIW-4, in comparison with VLIW-5, would compensate for 1536 being lower than Cypress's 1600 lanes.
This makes sense. Even if you assume you could get the same performance out of a VLIW-4 simd as out of a VLIW-5 one (which is a bit of a stretch), 24 is only 20% more simds, so performance improvements beyond that have to come from elsewhere. Also note there's an obvious difference between utilization of alu slots and alu instructions issued per clock: since transcendentals now require 3 slots, even serially dependent transcendentals have 75% utilization, but obviously they aren't any faster than the same sequence at 20% utilization in Evergreen.
24 simds also sounds low, though, if you consider that those vliw-4 units should be smaller than the vliw-5 ones. I have no good idea how much smaller (does distributing the tables from the t unit to xyz also make them smaller because they are backed by 3 alus instead of one?), but to me it sounds reasonable to assume 24 vliw-4 simds wouldn't need more die area than 20 vliw-5 ones.
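
The slot-utilization point can be sketched like this, assuming a chain of dependent transcendentals issues exactly one per bundle on both designs (1 of 5 slots on VLIW-5, 3 of 4 slots on VLIW-4, as described above). The names are invented for illustration.

```python
from fractions import Fraction

def slot_utilization(slots_used: int, slots_per_bundle: int) -> Fraction:
    """Fraction of a VLIW bundle's slots occupied by one instruction."""
    return Fraction(slots_used, slots_per_bundle)

# Serially dependent transcendentals: one per bundle on either design.
vliw5 = slot_utilization(1, 5)   # t unit only
vliw4 = slot_utilization(3, 4)   # spread over three of the four slots

print(f"VLIW-5: {float(vliw5):.0%} of slots busy, 1 transcendental per bundle")
print(f"VLIW-4: {float(vliw4):.0%} of slots busy, 1 transcendental per bundle")

# Lane totals behind the die-area comparison in this post.
lanes_24_vliw4 = 24 * 64
lanes_20_vliw5 = 20 * 80
print(f"24 vliw-4 simds: {lanes_24_vliw4} lanes, 20 vliw-5 simds: {lanes_20_vliw5} lanes")
```

Utilization jumps from 20% to 75% while throughput for that sequence stays the same, which is why slot utilization alone is a misleading measure here.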

L1 is 32KB then?
Isn't that just LDS size (would be same as Evergreen)?
 