First, let's look at the 128-bit behavior of the HD 5830 and get that out of the way, so that we can better understand the behavior of the HD 6790 (AMD would probably like us to believe that it had already decoupled the memory controllers from the ROPs efficiently by then, which does not seem to be the case).
If we look at the 5830 in the first bar chart on this page:
http://techreport.com/articles.x/18521/5
We see that the 3DMark Vantage color fill test is strongly correlated with available bandwidth. If the 5830 were truly 256-bit, it would have the same bandwidth as the 5850. However, the performance shows that this is not the case: it scores barely half of the 5850's result, and it is also much slower than the HD 4870, which also has only 16 ROPs, at a lower clock.
Next, if we look at BeHardware's ultimate scrutiny:
http://www.behardware.com/articles/783-3/preview-radeon-hd-5830.html
We see that the 5830 has far lower FP16 and FP32 GPixel/s write rates not only than the 5770, which has a slightly higher fill rate, but also, to a far greater degree, than the 4890. The test scales directly with available bandwidth: the 4890 is much faster than the 5770 here, let alone the 5830.
Both the 5830 and 6790 behave as if their memory is only 128 bits wide, so these cards should be called 128-bit. Otherwise, AMD could just as well say that the 5770 is 256-bit and we would believe them.
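To put numbers on the "effectively 128-bit" argument: peak memory bandwidth is bus width times per-pin data rate, divided by 8 bits per byte. The sketch below assumes the commonly published GDDR5 data rates of 4.0 Gbps for the 5830 and 4.8 Gbps for the 5770. Neither rate is stated in the post. Under those assumptions, a 128-bit 5830 would have about 64 GB/s, which is less than the 5770's 76.8 GB/s. That would fit the ordering seen in the BeHardware write tests.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin data rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8

# Assumed published GDDR5 data rates: HD 5830 at 4.0 Gbps, HD 5770 at 4.8 Gbps.
bandwidths = {
    "HD 5830 (advertised 256-bit)": peak_bandwidth_gbs(256, 4.0),
    "HD 5830 (if it were 128-bit)": peak_bandwidth_gbs(128, 4.0),
    "HD 5770 (128-bit)": peak_bandwidth_gbs(128, 4.8),
}

for name, gbs in bandwidths.items():
    print(f"{name}: {gbs:.1f} GB/s")
```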
One more thing: the Barts architecture really does look like it is based on VLIW4, not the traditional VLIW5. It seems that AMD wanted to save the "thunder" for Cayman's launch, reserving the announcement for the product that desperately needed as much thunder as possible. Just look at how close the 6870's performance is to the 5870's, comparing the 6870's 1120 SPs and 4.2 GHz memory to the 5870's 1600 SPs and 4.8 GHz memory.
__________
Now, let's compare the 6790 to the similarly-spec'ed 5830:
We can see that the 5830 has far higher numbers in these areas:
44.8 GT/s texture fillrate
1792 GFLOPS shader throughput
And the 6790 has only:
33.6 GT/s texture fillrate
1344 GFLOPS shader throughput
Both figures are 33% higher for the 5830.
By the way, the 6790 has only 5% more pixel fillrate and memory bandwidth than the 5830. A 5% gain in ROP pixel fillrate and bandwidth translates to no more than about 2% in overall performance.
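The theoretical figures above come from unit counts multiplied by the core clock. Peak GFLOPS is counted the usual way, as one multiply-add (2 FLOPs) per stream processor per clock.

The sketch below assumes the reference specs, which the post does not spell out:
- HD 5830: 1120 SPs, 56 TMUs, 16 ROPs, 800 MHz, 4.0 Gbps GDDR5.
- HD 6790: 800 SPs, 40 TMUs, 16 ROPs, 840 MHz, 4.2 Gbps GDDR5.
- Both cards use a 256-bit bus.

With those inputs it reproduces the 33% shader/texture gap and the 5% fillrate/bandwidth gap.

```python
from dataclasses import dataclass

@dataclass
class Gpu:
    name: str
    shaders: int      # stream processors
    tmus: int
    rops: int
    core_mhz: int
    bus_bits: int
    mem_gbps: float   # effective per-pin GDDR5 data rate

    def gflops(self) -> float:
        # one multiply-add (2 FLOPs) per stream processor per clock
        return self.shaders * 2 * self.core_mhz / 1000

    def gtexels(self) -> float:
        return self.tmus * self.core_mhz / 1000

    def gpixels(self) -> float:
        return self.rops * self.core_mhz / 1000

    def bandwidth_gbs(self) -> float:
        return self.bus_bits * self.mem_gbps / 8

# Assumed reference specs, not taken from the post itself.
hd5830 = Gpu("HD 5830", 1120, 56, 16, 800, 256, 4.0)
hd6790 = Gpu("HD 6790", 800, 40, 16, 840, 256, 4.2)

for metric in ("gflops", "gtexels", "gpixels", "bandwidth_gbs"):
    a = getattr(hd5830, metric)()
    b = getattr(hd6790, metric)()
    print(f"{metric:14s} 5830={a:7.1f}  6790={b:7.1f}  5830/6790={a / b:.3f}")
```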
If the 6790 is also VLIW5-based, why is the 5830 only 2-3% faster than the 6790 in all of the AnandTech benchmarks ( http://www.anandtech.com/show/4260/amds-radeon-hd-6790-coming-up-short-at-150 )? Why isn't the 5830 at least 30% faster, according to the numbers above?
Do you think all of the Barts-derived GPUs behave as if they're VLIW4 instead of VLIW5 (given that the 6870 comes so close to the 5870, which has 43% more shaders and TMUs)?
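The 43% figure compares unit counts. Once core clocks are factored in, the throughput gap is somewhat smaller.

The sketch below assumes the reference specs, which the post does not spell out:
- HD 5870: 1600 SPs, 80 TMUs, 32 ROPs, 850 MHz, 4.8 Gbps GDDR5.
- HD 6870: 1120 SPs, 56 TMUs, 32 ROPs, 900 MHz, 4.2 Gbps GDDR5.
- Both cards use a 256-bit bus.

On those assumptions, the 5870 has roughly 35% more peak shader and texture throughput and about 14% more bandwidth. The 6870 has about 6% more pixel fillrate.

```python
# Assumed reference specs (not stated in the post):
# HD 5870: 1600 SP, 80 TMUs, 32 ROPs, 850 MHz, 4.8 Gbps GDDR5 on 256-bit
# HD 6870: 1120 SP, 56 TMUs, 32 ROPs, 900 MHz, 4.2 Gbps GDDR5 on 256-bit
def peak(sp, tmus, rops, core_mhz, gbps, bus_bits=256):
    """Peak rates derived from unit counts and clocks."""
    ghz = core_mhz / 1000
    return {
        "GFLOPS": sp * 2 * ghz,       # one multiply-add per SP per clock
        "GTexels/s": tmus * ghz,
        "GPixels/s": rops * ghz,
        "GB/s": bus_bits * gbps / 8,
    }

hd5870 = peak(1600, 80, 32, 850, 4.8)
hd6870 = peak(1120, 56, 32, 900, 4.2)
for key in hd5870:
    print(f"{key:10s} 5870/6870 = {hd5870[key] / hd6870[key]:.3f}")
```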
____________
Well, as to the "WHY", my guess is that AMD just wanted to stick to its traditional naming structure, in which all x8xx cards (3870, 4850, 5850, etc.) were 256-bit. In the past, whenever a card had its ROPs cut in half, like the HD 4730, AMD acknowledged that it had 128-bit memory.
See:
http://www.hexus.net/content/item.php?item=19175&page=2
But the 5830 was not named "5730" the way the 4730, a derivative of RV770XT, was. The 5830 belonged to the Cypress (not Juniper) branch of Evergreen, since it was a cut-down version of the RV870XT chip.
Also, with no DX11 competition at the time, the 5830 was introduced at a much higher price than expected. Simply announcing a 256-bit bus helped justify the $240 price. It also helped the higher-end 5850 keep its inflated price, $30 over the original MSRP, by serving as a silly buffer. (Woe to those who had only $240 and took the plunge on a 5830 rather than the faster 4890, which was $75-100 less.)
I guess AMD decided that since it still used the same PCB as the 5870, with the same number of memory traces, it could still claim a "256-bit bus" with its high 100+ GB/s bandwidth. It did not remind us that the halved ROPs mean that access to the available bandwidth is effectively "halved". This 256-bit claim sure did get past most of us video card enthusiasts. Such sawed-off cards were a tad scarce anyway, so it did not create much of a fuss; the prices were outrageous anyway.
(Never mind that the numbering tradition has now changed with the 6790, 68xx and 69xx cards, etc. I would still prefer straight-up honesty, as with the HD 4730 and its 128-bit memory. In the link, do you see how we are told that the 4730 uses a 128-bit bus with its halved ROPs?)
About Barts really being VLIW4: I guess AMD just wanted to save the thunder for the Cayman cards, which still fell short of most AMD fans' expectations. I just remembered that double-precision (FP64) capability was left out of the Barts cards, so it's harder to prove whether it's really VLIW4 instead of VLIW5. But come on, just look at the numbers. If both are indeed VLIW5-based, why is the 6790 only 2-3% slower than the 5830? The 5830 has 40% more shaders and TMUs, while the rest of the specifications are practically identical.
___________________
But is it just a conspiracy theory? Let's look at the 5830 again.
Why is it actually slower than the 4890 in DX9/10 games, when the 5830 has 40% more shaders, 40% more TMUs, 32% more GFLOPS and 32% more texels per second,
while both have the same number of ROPs and the same "alleged 256-bit bus"? The 5830's memory is even clocked higher than the 4890's.
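A quick way to check the 32% figures is to compute each card's peak rates and print the percentage differences.

This sketch assumes the reference specs, which the post does not state:
- HD 5830: 1120 SPs, 56 TMUs, 16 ROPs, 800 MHz, 4.0 Gbps.
- HD 4890: 800 SPs, 40 TMUs, 16 ROPs, 850 MHz, 3.9 Gbps.
- Both cards use a 256-bit bus.

Under those assumptions it gives about +31.8% GFLOPS and GTexels/s and +2.6% bandwidth for the 5830. It also shows that the 4890's higher core clock gives it roughly 6% more pixel fillrate from the same 16 ROPs.

```python
# Assumed reference specs (not stated in the post):
# name: (stream processors, TMUs, ROPs, core MHz, GDDR5 Gbps on a 256-bit bus)
specs = {
    "HD 5830": (1120, 56, 16, 800, 4.0),
    "HD 4890": (800, 40, 16, 850, 3.9),
}

def theoretical(sp, tmu, rop, mhz, gbps, bus_bits=256):
    """Return peak GFLOPS, GTexels/s, GPixels/s and GB/s from unit counts and clocks."""
    ghz = mhz / 1000
    return {
        "GFLOPS": sp * 2 * ghz,        # 2 FLOPs (one multiply-add) per SP per clock
        "GTexels/s": tmu * ghz,
        "GPixels/s": rop * ghz,
        "GB/s": bus_bits * gbps / 8,
    }

a = theoretical(*specs["HD 5830"])
b = theoretical(*specs["HD 4890"])
for key in a:
    print(f"{key:10s} 5830 vs 4890: {100 * (a[key] / b[key] - 1):+.1f}%")
```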
____________
Ryan Smith posted a reply in the AnandTech comments section of the 6790 review:
Ryan Smith said:
"From a graphics point of view it's not possible to separate the performance of the ROPs from memory bandwidth. Color fill, etc are equally impacted by both. To analyze bandwidth you'd have to work from a compute point of view. However with that said I don't have any reason to believe AMD doesn't have a 256-bit; achieving identical performance with half the L2 cache will be harder though."
The wording is rather confusing. It reads as if a marketing PR person were trying to defend AMD with statements that favor the company but do not make clear sense.
1) If it's not possible to separate the performance from a "graphics" rather than "compute" point of view, then shouldn't the two be linked from every "graphics" point of view (it is a "graphics" card, after all)? Even the "compute" results (the FP16 and FP32 analysis at http://www.behardware.com/articles/783-3/preview-radeon-hd-5830.html ) show the card behaving as if it's 128-bit. So Ryan is only correct in that it's not possible to separate the performance of the sawed-off, halved ROPs from memory bandwidth.
2) Why does Ryan have no reason to believe otherwise... because AMD said so? Suppose an LCD panel manufacturer advertises a 1 ms G2G response time, but the panel looks like 16 ms. Would he still have no reason to believe it's 16 ms, just because the manufacturer said otherwise?
3) If the L2 cache is cut down in proportion to the castrated shaders/TMUs/ROPs, then it should not affect performance, let alone make matching performance "harder".
____________
Just ONE of the many proofs that Barts XT is based on VLIW4 architecture like its HD 6930 cousin:
HD 6930 has 14% more bandwidth than HD 6870, to begin with.
Both have the same 32 ROPs, but the HD 6870 runs at a higher 900 MHz core clock versus the HD 6930's 750 MHz. However, the HD 6870 has far fewer TMUs: only 56 versus the HD 6930's 80. Realistically, Barts XT already has plenty of ROPs for its capacity. See more specs here, such as the GPixels/s and GTexels/s figures derived from the ROPs and TMUs:
http://www.xbitlabs.com/articles/graphics/display/radeon-hd-6930.html
Now, let's look at the shader performance:
HD 6930 with 1280 SPs at 750 MHz yields 1920 GFLOPS
HD 6870 with 1120 SPs at 900 MHz yields 2016 GFLOPS
The 6870 actually has 5% more shader power.
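Here is the arithmetic behind these two figures, as a minimal sketch. It counts one multiply-add (2 FLOPs) per stream processor per clock, and uses only the SP counts and clocks given above. The results come out in GFLOPS (1.92 and 2.016 TFLOPS), with the 6870 ahead by exactly 5%.

```python
def peak_gflops(shaders: int, core_mhz: int) -> float:
    """Peak single-precision GFLOPS, counting one multiply-add (2 FLOPs) per SP per clock."""
    return shaders * 2 * core_mhz / 1000

hd6930 = peak_gflops(1280, 750)   # 1920.0 GFLOPS, i.e. 1.92 TFLOPS
hd6870 = peak_gflops(1120, 900)   # 2016.0 GFLOPS, i.e. 2.016 TFLOPS
print(hd6930, hd6870, f"6870 advantage: {100 * (hd6870 / hd6930 - 1):.1f}%")
```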
However, the 6930 comes out about 4-5% faster overall, according to the first chart in X-bit labs' review: http://www.xbitlabs.com/articles/graphics/display/radeon-hd-6930_8.html#sect0
It is already anticipated that the 6870 is moderately bandwidth-bottlenecked. With 14% more bandwidth, the 6930 is probably making up for the deficit in shader power with the bandwidth alone.
The number of ROPs is already ample for both cards. So the difference in theoretical maximum pixel fillrate should yield less than a 0.5% real-world performance difference. (Even if it were 16 ROPs, a 100% increase to 32 ROPs would have yielded at most about 5% overall.) As the "GPU 101" course teaches, pixel fillrate is not a factor here, since neither card is bottlenecked by its 32 ROPs in 99.8+% of cases.
The 6930 also has about 19% more texturing power (80 TMUs at 750 MHz versus 56 TMUs at 900 MHz). Together with that, the 14% increase in bandwidth more than allows the card to make up for its shader-power deficit against Barts XT. As a result, it actually ends up around 4-5% faster overall.
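The texture and pixel fillrates follow directly from the TMU/ROP counts and core clocks given in the post. Bandwidth additionally needs memory data rates: 4.2 Gbps for the 6870 (as quoted earlier in the thread) and an assumed 4.8 Gbps for the 6930, both on a 256-bit bus.

This sketch computes all three. It gives the 6930 roughly 19% more texel rate (60 vs 50.4 GTexels/s) and about 14% more bandwidth. It also gives the 6870 about 20% more pixel fillrate.

```python
# Unit counts and core clocks as given in the post; 4.8 Gbps for the HD 6930 is an assumption.
cards = {
    #          TMUs, ROPs, core MHz, bus bits, Gbps
    "HD 6930": (80, 32, 750, 256, 4.8),
    "HD 6870": (56, 32, 900, 256, 4.2),
}

def rates(tmus, rops, mhz, bus_bits, gbps):
    """Peak texel rate, pixel rate and memory bandwidth from unit counts and clocks."""
    return {
        "GTexels/s": tmus * mhz / 1000,
        "GPixels/s": rops * mhz / 1000,
        "GB/s": bus_bits * gbps / 8,
    }

r6930 = rates(*cards["HD 6930"])
r6870 = rates(*cards["HD 6870"])
for key in r6930:
    delta = 100 * (r6930[key] / r6870[key] - 1)
    print(f"{key:10s} 6930={r6930[key]:6.1f}  6870={r6870[key]:6.1f}  6930 vs 6870: {delta:+.1f}%")
```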
If Barts XT were VLIW5-based rather than VLIW4-based, Barts XT would have to have considerably more shader power just to maintain this position. It would have to have roughly the same number of shaders as HD 5850.
____
Let's see..
The HD 5850 has 1440 SPs, 72 TMUs and 32 ROPs, all clocked at 725 MHz, for 2.09 TFLOPS, 52.2 GTexels/s and 23.2 GPixels/s. The memory runs at 4.0 Gbps, for 128 GB/s of bandwidth. It's rated at 112 VP. Suppose we increase its clocks (both GPU and memory) by 8%, so that it would equal Barts XT (HD 6870) in performance. That would give it 2.26 TFLOPS, 56.4 GTexels/s, 25 GPixels/s and nearly identical bandwidth to the HD 6870.
That is 12% more shader power than Barts XT, with all of the other specs nearly identical (except that Barts XT still trails in the TMU department by nearly 12%), just for the HD 5850 to equal Barts XT in performance.
In short, a VLIW5-based card with identical specs to Barts XT (the HD 6870, with only 56 TMUs) would need about 14-15% more shaders just to match the HD 6870 in performance. How could that be if AMD claimed Barts XT to be VLIW5-based in one of its slideshows, and the rest of the world believed it?
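The scaling step can be checked by multiplying the HD 5850's figures by 1.08 and comparing them with the HD 6870. The 8% uplift is the post's own assumption for matching the 6870 in performance.

The HD 6870 reference values are computed as follows:
- Shader, texel and pixel rates: 1120 SPs, 56 TMUs and 32 ROPs at 900 MHz.
- Bandwidth: an assumed 4.2 Gbps on a 256-bit bus.

The sketch reproduces the roughly 12% shader and texel advantage. It also shows that the scaled card would still trail in pixel fillrate (about 25.1 vs 28.8 GPixels/s), a gap the post treats as irrelevant given ample ROPs.

```python
# HD 5850 figures as given above (2088 GFLOPS = 1440 SP x 2 x 725 MHz).
hd5850 = {"TFLOPS": 2.088, "GTexels/s": 52.2, "GPixels/s": 23.2, "GB/s": 128.0}
# HD 6870 reference: 1120 SP, 56 TMUs, 32 ROPs at 900 MHz; 4.2 Gbps on 256-bit is assumed.
hd6870 = {"TFLOPS": 2.016, "GTexels/s": 50.4, "GPixels/s": 28.8, "GB/s": 134.4}

uplift = 1.08  # the post's assumed clock increase for the HD 5850
scaled = {key: value * uplift for key, value in hd5850.items()}
for key, value in scaled.items():
    print(f"{key:10s} scaled 5850={value:7.2f}  6870={hd6870[key]:7.2f}  ratio={value / hd6870[key]:.3f}")
```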
I have already worked through several examples like this one in the other thread. Why am I the only one, out of thousands of video card enthusiasts posting on the interwebs, who has really pointed this out? I have repeatedly told ATF that it's VLIW4-based and provided proof, yet they still list it as VLIW5. :scratch:
(I could break these into separate posts, but I don't know if that's against the rules.. Heard that B3D was really strict with the banning and stuff..)
It's the nerdy stuff, so I thought you "aficionados" here would enjoy some hard-core stuff! :smile: