What process node for the beastly GX6650?
Up to the customer but sensibly it's 28nm and lower.
The F16 ALUs can't be combined to do higher precision. Hopefully we talk about exactly what happens inside each pipe soon, I just couldn't swing that by the powers that be before this stuff needed to be released.
The F32 and F16 minions party separately in a given cycle.
Well, here's what the blog post says:
That seems to slightly contradict the diagram (i.e. 2x4 flops rather than 4x2 flops). Also the issue with sharing resources between FP16 and FP32 is that it's a ~4x difference for the multiplier's size (like FP32->FP64) not 2x.
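To put a rough number on that ~4x figure, here is a back-of-the-envelope sketch. It assumes multiplier area grows with the square of the significand width, which is only a first-order approximation for an array multiplier and ignores exponent handling, rounding and everything else in the ALU. The significand widths are the IEEE 754 ones (including the implicit leading bit); the function and variable names are made up for illustration.

```python
# Significand widths in bits, including the implicit leading 1 (IEEE 754).
SIGNIFICAND_BITS = {"FP16": 11, "FP32": 24, "FP64": 53}


def relative_multiplier_area(narrow: str, wide: str) -> float:
    """Rough area ratio between two multipliers.

    Assumption: area scales with the square of the significand width.
    This is a crude model, not a statement about any real design.
    """
    return (SIGNIFICAND_BITS[wide] / SIGNIFICAND_BITS[narrow]) ** 2


fp16_to_fp32 = relative_multiplier_area("FP16", "FP32")
fp32_to_fp64 = relative_multiplier_area("FP32", "FP64")

print(f"FP16 -> FP32: ~{fp16_to_fp32:.1f}x")
print(f"FP32 -> FP64: ~{fp32_to_fp64:.1f}x")
```

Under that assumption both steps come out at a bit under 5x, so doubling the format width costs far more than 2x in multiplier area.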
I'm glad I wasn't the only one confused. The diagram and narrative didn't match for me either. The Series 6 USC diagram clearly shows each FP16 ALU having 3 flops, whilst the Series 6XT USC diagram clearly shows 2 (but the narrative says 4).
I did ask this in the comments section of the blog.
It's OK not wanting to say things, but having a narrative and an associated diagram completely at odds with one another frankly just seems like poor proofreading.
I wanted the diagram to match the max "ALU core" count we want to put across for marketing. The text is closer to what actually happens in the pipe. Both add up to the same ops throughput.
One is for marketing, the other is for those that actually care about how the hardware works.
What I am basically asking is whether describing a Rogue core's GPU compute in terms of only its ALU32 count is fair, given that it has far more (albeit less useful) ALU16 cores.
Two reasons:
AnandTech's decision to only count 32-bit FP in their BogoFLOP chart seems strange to me. For the most part the GPU will process graphics, and the bulk of graphics operations seem as if they could be done in 16-bit (limited precision needed, minimal iteration), so why focus on 32-bit performance alone?