I'm shocked that this thread made it to two pages overnight!
Aren't you shocked that Barts is VLIW4 architecture in the first place? :smile:
Thanks for that answer mczak, I've asked the question a couple of times but never really got an answer.
So, in simple terms, VLIW4 works in groups of 4 while GCN uses single entities?
According to this book it does. Math does not lie.
Why does Barts XT perform so well in games against Cayman specs-wise if it's VLIW5 rather than VLIW4?

I don't know what's happening in your example, but a VLIW5 SIMD has more shader power than a VLIW4 SIMD, so it could perform faster if the shaders co-issue well and contain transcendental instructions. The advantage of VLIW4 is smaller area, and much of the time the extra unit doesn't provide an advantage.
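To make the co-issue point a bit more concrete, here's a toy bundle packer (my own sketch, not AMD's shader compiler) run over a made-up MUL/MADD/SQRT stream. It assumes a transcendental fits in the t-slot on VLIW5 but occupies roughly three of the four ALUs on VLIW4/Cayman:

```python
# Toy VLIW bundle packer -- a minimal sketch for illustration, not AMD's compiler.
# Assumptions (mine): a bundle holds up to `width` independent ops, at most one
# transcendental per bundle, and a transcendental costs `trans_cost` slots
# (1 on the VLIW5 t-unit, ~3 combined ALUs on VLIW4).

def pack(ops, width, trans_cost):
    """ops: list of (deps, is_trans); returns the number of bundles (~cycles)."""
    done, remaining, bundles = set(), list(range(len(ops))), 0
    while remaining:
        slots, trans_used, issued = width, False, []
        for i in remaining:
            deps, is_trans = ops[i]
            if not all(d in done for d in deps):
                continue                       # operands not ready yet
            cost = trans_cost if is_trans else 1
            if cost > slots or (is_trans and trans_used):
                continue                       # doesn't fit in this bundle
            slots -= cost
            trans_used = trans_used or is_trans
            issued.append(i)
        done.update(issued)
        remaining = [i for i in remaining if i not in issued]
        bundles += 1
    return bundles

# A small shader-ish stream: (dependencies, is_transcendental)
stream = [([], False), ([], False), ([], False), ([], False),   # 4 independent MULs
          ([0], False), ([1], False), ([2, 3], True),           # MADD, MADD, SQRT
          ([4, 5], False), ([6], False)]                        # MIN, MAX

print("VLIW5 bundles:", pack(stream, width=5, trans_cost=1))
print("VLIW4 bundles:", pack(stream, width=4, trans_cost=3))
```

With this particular made-up stream the VLIW5 packing comes out one bundle shorter; with a purely dependent chain (one op per bundle) the two tie, which is the "much of the time the extra unit doesn't provide an advantage" case.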
Sorry to quote myself, but I'm a bit confused about the parallel discussion going on in these two threads:
http://forum.beyond3d.com/showpost.php?p=1623427&postcount=156
Short excerpt, so that there's at least something posted on the internet.
A lengthy shader that doesn't do anything useful, made up of a roughly even mixture of MUL, MADD, MIN, MAX and SQRT (basically an AMD program from the HD 2900 launch):
HD 5870: 1206 GI/s (Giga-Instructions per second)
HD 6870: 893 GI/s
HD 6970: 877 GI/s
HD 7970: 1101 GI/s
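For what it's worth, a quick back-of-the-envelope normalization of those figures - my own arithmetic, assuming the stock ALU-lane counts and engine clocks, and assuming the GI/s numbers count per-lane instructions:

```python
# Rough normalization of the GI/s figures above -- my arithmetic, not part of
# the quoted measurement. Assumes stock lane counts/clocks and that the GI/s
# numbers count instructions per ALU lane (both assumptions on my part).
cards = {
    #           lanes, clock (GHz), measured GI/s
    "HD 5870": (1600, 0.850, 1206),
    "HD 6870": (1120, 0.900,  893),
    "HD 6970": (1536, 0.880,  877),
    "HD 7970": (2048, 0.925, 1101),
}
for name, (lanes, ghz, gi_per_s) in cards.items():
    peak = lanes * ghz   # peak GI/s at one instruction per lane per clock
    print(f"{name}: {gi_per_s / peak:.2f} instructions per lane per clock")
```

If those assumptions hold, the two VLIW5 parts land at almost the same per-lane rate, while Cayman and Tahiti come out lower - which would fit SQRT costing extra slots or issue cycles there - but that hinges entirely on what the benchmark counts as an instruction.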
As for HD 5830 - the GPU has a 256-bit memory interface, but as far as I remember, by disabling half of the ROPs, the interface between the ROPs and the memory controller (or was it the L2?) was effectively halved. All functional units, with the exception of the ROPs, could utilize all the available bandwidth.

The ROPs do in fact still have access to all the memory partitions, though one is now communicating with two partitions rather than just one.
Technically, GCN's shaders would qualify as scalars, yes. But they can get starved by the front end quite easily in graphics contexts, so you only get the full benefits of them being scalar with very long shader programs.

I think my past crusade against the ridiculousness of terming something "scalar vector unit" or "scalar SIMD unit" was not as successful as I wished. GCN's SIMDs are vector units of course (the Wavefront size is the vector size). And AMD also named them simply "vector ALUs". There is no reason to be confused about the terms as in the past (or with nVidia's terminology, they use vector units too). The GCN architecture actually features some real scalar units. But that is a single one in each CU (shared between 4 vector ALUs), which doesn't exactly qualify as "shader unit" by the usual terms.
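A toy illustration of the terminology point (my own sketch, nothing authoritative): a "vector" instruction is one instruction pointer applied across the whole wavefront, while the per-CU scalar unit executes one operation per wavefront rather than per lane:

```python
# Toy model of GCN's vector/scalar split -- my own sketch, not AMD's ISA spec,
# though the mnemonics mirror real ones. A vector instruction runs once per
# wavefront but produces one result per lane; a scalar instruction produces a
# single result shared by the whole wavefront.
WAVEFRONT_SIZE = 64   # GCN wavefront size = the vector length

def v_add_f32(dst, src0, src1):
    """Vector add: one instruction, 64 results (one per lane/work-item)."""
    for lane in range(WAVEFRONT_SIZE):
        dst[lane] = src0[lane] + src1[lane]

def s_add_u32(a, b):
    """Scalar add: one instruction, one result for the whole wavefront
    (think loop counters, base addresses, branch conditions)."""
    return a + b

v0 = [float(i) for i in range(WAVEFRONT_SIZE)]
v1 = [2.0] * WAVEFRONT_SIZE
v2 = [0.0] * WAVEFRONT_SIZE
v_add_f32(v2, v0, v1)           # the SIMD is a vector unit: vector length = wavefront size
base = s_add_u32(0x1000, 256)   # the scalar unit handles per-wavefront values
print(v2[:4], hex(base))
```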
Why does Barts XT perform so well in games against Cayman specs-wise if it's VLIW5 rather than VLIW4?

It doesn't. In some games one architecture is more efficient, and in others vice versa (which, BTW, is very clear evidence that Barts and Cayman have different architectures). Look at Crysis and Stalker, where the 6950 is 23-30% and 29-40% faster, respectively, than the 6870.
Why does Barts XT absolutely destroy HD 5850 and HD 5870, specs-wise, by a ridiculous margin?

In what world does that happen? Or are you normalizing performance to shader count?
Why does HD 6790 perform about the same as HD 5830 if the latter has 33% more shader and texturing power, with other specs being roughly the same - if BOTH are VLIW5?

The 5830 has always been an underperformer, taking a bigger hit vs the 5850 than the 6790 takes vs the 6850, despite similar handicaps. It's an outlier, so that comparison is meaningless.
This is very clear when you compare the 9600GT to the 9800GT. Both are 256-bit, 16 ROP cards with equal bandwidth and similar clocks. However, the 9600GT has only 64 SPs to the 8800GT/9800GT's 112, yet the former is almost as fast as the latter in games.
I think my past crusade against the ridiculousness of terming something "scalar vector unit" or "scalar SIMD unit" was not as successful as I wished. GCN's SIMDs are vector units of course (the Wavefront size is the vector size). And AMD also named them simply "vector ALUs". There is no reason to be confused about the terms as in the past (or with nVidia's terminology, they use vector units too). The GCN architecture actually features some real scalar units. But that is a single one in each CU (shared between 4 vector ALUs), which doesn't exactly qualify as "shader unit" by the usual terms.
Yeah sorry. I did it on purpose so you could storm in ranting about how dumb I am.
So: GCN cores are four 16-wide vector units, scheduled in a round-robin fashion for a four-cycle execution time for SPFP math (ADD, MUL, etc.), each working out of a private pool of up to 10 wavefronts. DPFP math and special functions take longer - 8 or 16 clocks, depending on whether or not there's a MUL involved. They can execute scalar workloads with no loss in efficiency under specific circumstances, supported by a variety of SRAM arrays (organized into register files, r/w caches and data shares) and a real scalar coprocessor, which can share resources over four GCN cores.
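A little back-of-the-envelope model of that cadence (my own toy sketch, ignoring stalls, the scalar pipe and everything the real scheduler does): four SIMD16s per CU, each pushing one vector instruction per four clocks for a 64-wide wavefront taken from its private pool of up to 10:

```python
# Toy model of a GCN CU's vector-issue cadence -- a sketch under the
# assumptions stated above, not a cycle-accurate simulator.
WAVEFRONT = 64
SIMD_WIDTH = 16
CADENCE = WAVEFRONT // SIMD_WIDTH    # 4 clocks per vector instruction on a SIMD16
SIMDS_PER_CU = 4
MAX_WAVES_PER_SIMD = 10

def simd_clocks(instrs_per_wave, waves):
    """Clocks for one SIMD to drain its wavefront pool, assuming every
    instruction is single-precision (4-clock cadence) and nothing stalls."""
    assert waves <= MAX_WAVES_PER_SIMD
    return instrs_per_wave * waves * CADENCE

# e.g. 100 vector instructions per wavefront, 10 wavefronts resident per SIMD:
clocks = simd_clocks(100, 10)
lane_ops = SIMDS_PER_CU * 10 * 100 * WAVEFRONT    # total per-lane operations in the CU
print(f"{clocks} clocks per SIMD, {lane_ops / clocks:.0f} lane-ops per clock for the CU")
```

The 64 lane-ops per clock it prints is just 4 SIMDs x 16 lanes - i.e. the cadence hides the 4-clock latency as long as each SIMD has wavefronts to rotate through.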
I guess I forgot the smiley in the post above.
I know that you know it. And also that it is kind of hard not to use these stupid (in my opinion) marketing-driven terms from time to time.
You know, they're not quite as marketing-driven as they might seem. Michael Shebanow, for instance, insists that SIMD lanes in Tesla/Fermi ought to be called cores.
Where does he do that? And what is the reasoning? I can't think of any reasonable ones (besides artificially inflating the "core" count). It is simply silly to call an SIMD lane a core.
Edit: I hope you don't mean the presentations linked there. I stopped reading when I saw that the definition of "SIMT" and "threads" is not even self-consistent ("threads" within a warp [vector] don't execute independently as claimed, because they all share a single instruction pointer, which is said in the exact same sentence - case closed). That terminology is just a huge pile of crap and confuses people.
Edit2: Hennessy and Patterson's "Computer Architecture: A Quantitative Approach" would be a good start for this nV fellow.
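To make the shared-instruction-pointer argument concrete, a toy divergence example (mine, not a description of any real chip's control logic): a branch doesn't give the lanes their own program counters, the single instruction stream just walks both paths with a per-lane execution mask:

```python
# Toy model of warp/wavefront branch divergence -- my own sketch, not real
# hardware. All "threads" share one instruction pointer; an if/else is handled
# by masking lanes off, not by letting lanes fetch instructions independently.
WARP = 32
regs = list(range(WARP))                          # one value per lane
cond = [lane % 2 == 0 for lane in range(WARP)]    # per-lane condition

# "if" path: the whole warp steps through it, results kept only where the mask is set
mask = cond
for lane in range(WARP):
    if mask[lane]:
        regs[lane] += 100

# "else" path: the same single instruction stream again, with the mask inverted
mask = [not c for c in cond]
for lane in range(WARP):
    if mask[lane]:
        regs[lane] -= 100

# Every lane marched through both paths' instructions; none executed "independently".
print(regs[:4])    # [100, -99, 102, -97]
```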
but fundamentally, as long as you have distinct execution units executing instructions from distinct threads, no matter how much logic and memory they may share, they're cores.

As said above, that's exactly what is missing.