They also claim 4.9Bil transistors and 64 ROPs.
I'm just wondering why they are going with a supposed* rectangular die design if they are sticking to 384?
> They also claim 4.9Bil transistors and 64 ROPs.
> I'm just wondering why they are going with a supposed* rectangular die design if they are sticking to 384?

To match the rectangular heatsink?
Bah, they won't dare to go 512-bit interface (again)... unless there's a strategy to overtake NV in the HPC/Workstation markets. 64 ROPs is plausible, if the setup pipes are to double up (16 fragments x 4 pipes). AMD has been more consistent in keeping the setup : pixel ratio equal for the last few generations.
> Why does it seem that AMD's ROP config is so much more dependent on bandwidth than Nvidia's, specifically GTX680 vs 7970? Roughly equal spec-wise, but the 7970 has +40% bandwidth, yet they score roughly the same on 3DMark Vantage Pixel Fill. Is that synthetic just not accurate? Is there some inherent difference between their ROPs?

This particular sub-test uses alpha blending and writes the result to an R16G16B16A16 render target. There could be several limiting factors, depending on the architecture, though ALU throughput is not an issue here.
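To make the bandwidth dependence concrete, here is a back-of-the-envelope sketch (my own arithmetic, not from the thread) of the raw memory traffic that alpha blending into an R16G16B16A16 target implies, assuming each blended pixel costs one read plus one write of the render target and nothing is saved by caching:

```python
# Raw memory traffic implied by alpha blending into an R16G16B16A16 target.
# Assumption: every blended pixel costs one read plus one write of the
# render target (read-modify-write), with no cache reuse or compression.

BYTES_PER_PIXEL = 4 * 2   # four FP16 channels = 8 bytes per pixel

def blend_traffic_gbs(fillrate_gpixels):
    """GB/s of render-target traffic for a given blended fillrate."""
    read_write = 2 * BYTES_PER_PIXEL          # 16 bytes moved per pixel
    return fillrate_gpixels * read_write

# A GPU sustaining 16 GPixel/s of FP16 blending would need ~256 GB/s
# of bandwidth for the render target alone.
print(blend_traffic_gbs(16.0))   # 256.0
```

So at these fillrates the test is dominated by render-target traffic, which is why the comparison between the two cards comes down to how well each architecture uses its bandwidth.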
> Bah, they won't dare to go 512-bit interface (again)... unless there's a strategy to overtake NV in the HPC/Workstation markets. 64 ROPs is plausible, if the setup pipes are to double up (16 fragments x 4 pipes). AMD has been more consistent in keeping the setup : pixel ratio equal for the last few generations.

What about Bonaire? Dual setup/rasterizer like Pitcairn and Tahiti, but just 16 ROPs like Cape Verde.
> This particular sub-test uses alpha blending and writes the result to an R16G16B16A16 render target. There could be several limiting factors, depending on the architecture, though ALU throughput is not an issue here.

Kepler somehow manages to exceed the fillrate allowed by the external bandwidth for blending and 4xFP16 targets. Must be a caching phenomenon.
> In Anand's review of Tahiti they mentioned specifically that Cayman's ROPs were bandwidth starved and that for Tahiti they didn't really need to do anything but increase bandwidth to get the performance they wanted out of them.

They actually say the opposite: in Cayman the peak ROP throughput could not be reached, so increasing the count would not have been that useful; the solution was to increase efficiency. AMD did that by decoupling the ROPs from the memory controllers. So they kept the ROP count but increased the efficiency (and the bandwidth).
> Why does it seem that AMD's ROP config is so much more dependent on bandwidth than Nvidia's, specifically GTX680 vs 7970? Roughly equal spec-wise, but the 7970 has +40% bandwidth, yet they score roughly the same on 3DMark Vantage Pixel Fill. Is that synthetic just not accurate? Is there some inherent difference between their ROPs?

So you're saying AMD's ROPs are bandwidth limited, yet 40% more bandwidth does not result in 40% better fillrate? Sounds like you cancel out your own argument.
> They actually say the opposite: in Cayman the peak ROP throughput could not be reached, so increasing the count would not have been that useful; the solution was to increase efficiency. AMD did that by decoupling the ROPs from the memory controllers. So they kept the ROP count but increased the efficiency (and the bandwidth).

That's what LordEC911 said.

"With Tahiti AMD would need to improve their ROP throughput one way or another to keep pace with future games, but because of the low efficiency of their existing ROPs they didn't need to add any more ROP hardware, they merely needed to improve the efficiency of what they already had."
> Kepler somehow manages to exceed the fillrate allowed by the external bandwidth for blending and 4xFP16 targets. Must be a caching phenomenon.

Or compression.
> Or compression.

For non-MSAA render targets? How?
> That's what LordEC911 said.

He said they didn't need to do anything but increase the bandwidth, while the article says they didn't need to do anything other than increase the efficiency of the bandwidth usage. (I know that sounds picky, but in regard to why the GTX680 and 7970 perform the same, it might be important to note that both might be purely ROP limited.)
Per 128b chunk the Pitcairn PHY is about half the size and Tahiti has 3 of them.
> In Anand's review of Tahiti they mentioned specifically that Cayman's ROPs were bandwidth starved and that for Tahiti they didn't really need to do anything but increase bandwidth to get the performance they wanted out of them.

There was a bandwidth increase, but also a change in how many memory controllers the ROPs could send data to, which made them much more flexible in utilizing the bandwidth the chip had.
> Seems NVidia did something similar in GTX680 vs GTX580: same bandwidth, slightly lower theoretical ROP throughput, yet about 30% more efficient (at least those are the results).

FP10 and FP16 RT pixel writes are full-rate on Kepler's ROPs, though with blending enabled the rate falls to Fermi levels (half).
> Or the test metric is probably botched anyway.

An alternative to a really botched test (hardware.fr data shows this too, btw) would be that the setup/raster stage works differently on nV GPUs. AFAIK fillrate tests basically draw a lot of screen-filling quads on top of each other. All that would be needed is for Kepler to keep 4 triangles in flight (i.e. two of the quads) while ensuring by some means that the ROPs carry out the writes to the render target in the right order. That way, one would get some reuse of the data in the cache. One could test for this by using more triangles than just two to fill the screen.
> FP10 and FP16 RT pixel writes are full-rate on Kepler's ROPs, though with blending enabled the rate falls to Fermi levels (half). Texture fill-rates, L1 and L2 bandwidth are doubled over the previous generation. Also, the global atomics ops are an order of magnitude faster.

But just have a look here:
http://www.hardware.fr/medias/photos_news/00/35/IMG0035598.gif
The GTX680 manages 33.3 GPixel/s with RGBA8 blending and 16 GPixel/s with 4xFP16 blending. This requires a bandwidth of 266 GB/s or 256 GB/s, which the GTX680 clearly doesn't have (192 GB/s).

> The GTX680 manages 33.3 GPixel/s with RGBA8 blending and 16 GPixel/s with 4xFP16 blending. This requires a bandwidth of 266 GB/s or 256 GB/s, which the GTX680 clearly doesn't have (192 GB/s).

I think this is a proprietary benchmark they are testing with, and it could be using an optimized data set to align/fit the screen tiles nicely into the cache?
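Turning those numbers around: assuming blending costs one read plus one write of the render target per pixel and there is no cache reuse, the fillrate that 192 GB/s can sustain is easy to bound. A rough sketch (my own arithmetic, not a model of the actual hardware):

```python
# Upper bound on blended fillrate from external bandwidth alone, assuming
# one read plus one write of the render target per pixel (no cache reuse,
# no compression). 192 GB/s is the GTX 680's public bandwidth spec.

def max_blend_fillrate(bandwidth_gbs, bytes_per_pixel):
    """Best-case GPixel/s of blending the given bandwidth allows."""
    return bandwidth_gbs / (2 * bytes_per_pixel)

GTX680_BANDWIDTH = 192.0  # GB/s

print(max_blend_fillrate(GTX680_BANDWIDTH, 4))  # RGBA8:  24.0 GPixel/s
print(max_blend_fillrate(GTX680_BANDWIDTH, 8))  # 4xFP16: 12.0 GPixel/s

# The measured 33.3 and 16 GPixel/s both exceed these bounds, which is
# why some on-chip reuse (caching) or compression has to be involved.
```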
> He said they didn't need to do anything but increase the bandwidth, while the article says they didn't need to do anything other than increase the efficiency of the bandwidth usage. (I know that sounds picky, but in regard to why the GTX680 and 7970 perform the same, it might be important to note that both might be purely ROP limited.)

Anandtech said:

"As it turns out, there’s a very good reason that AMD went this route. ROP operations are extremely bandwidth intensive, so much so that even when pairing up ROPs with memory controllers, the ROPs are often still starved of memory bandwidth. With Cayman AMD was not able to reach their peak theoretical ROP throughput even in synthetic tests, never mind in real-world usage. With Tahiti AMD would need to improve their ROP throughput one way or another to keep pace with future games, but because of the low efficiency of their existing ROPs they didn’t need to add any more ROP hardware, they merely needed to improve the efficiency of what they already had.

The solution to that was rather counter-intuitive: decouple the ROPs from the memory controllers. By servicing the ROPs through a crossbar AMD can hold the number of ROPs constant at 32 while increasing the width of the memory bus by 50%. The end result is that the same number of ROPs perform better by having access to the additional bandwidth they need."

http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review/4
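One way to see the effect the quote describes: hold the ROP count at 32 and let the bus grow. Using the public board specs for the HD 6970 (176 GB/s) and HD 7970 (264 GB/s), which are my numbers rather than the article's, the bandwidth available per ROP rises by 50%:

```python
# Bandwidth available per ROP before and after Tahiti's crossbar change.
# Board specs: HD 6970 (Cayman) = 176 GB/s, HD 7970 (Tahiti) = 264 GB/s;
# both chips have 32 ROPs. With the ROPs decoupled from the memory
# controllers, all 32 ROPs can draw on the full (wider) bus.

ROPS = 32

def bandwidth_per_rop(total_gbs):
    return total_gbs / ROPS

cayman = bandwidth_per_rop(176.0)   # HD 6970: 5.5 GB/s per ROP
tahiti = bandwidth_per_rop(264.0)   # HD 7970: 8.25 GB/s per ROP

print(cayman, tahiti, tahiti / cayman - 1)   # 5.5 8.25 0.5 -> +50%
```

Same ROP hardware, 50% more bandwidth to feed it, which is the whole point of the crossbar.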