> Performance loss due to DDR3 should be even smaller for Kaveri as it has DDR3 2133 official support and DDR3 2400+ unofficial support.

True, but isn't the 7750's DDR3 dedicated while Kaveri's GPU obviously will share bandwidth with the CPU?
> True, but isn't the 7750's DDR3 dedicated while Kaveri's GPU obviously will share bandwidth with the CPU?
That's with 1600MHz DDR3, but Kaveri should support up to 2400MHz, though I don't expect to see more than 2133MHz in most systems. Plus it also has to feed the CPU.
> Kaveri's GPU is based on the same IP level as Bonaire and Hawaii, so the front-end is different from Cape Verde's. But I'm not sure that will do anything for bandwidth requirements. There's always the relatively straightforward option of making the L2 bigger.

Yes, I haven't seen anything that would indicate GCN 1.1 is somehow more bandwidth efficient - maybe it would help with compute tasks. To really make a difference with cache you'd need quite a lot more (and Cape Verde already had twice the cache per MC compared to other family members, which didn't seem to help much with the DDR3 version...), not to mention the ROPs don't even use it.
Why should it support DDR3-2400? The JEDEC spec goes up to DDR3-2133, and there is no sign that Kaveri comes with anything faster than DDR3-2133. I also don't think most systems are going to use DDR3-2133; DDR3-1600 is more likely for OEMs. Memory bandwidth is shared with the CPU, so I doubt there is a big advantage over a dedicated card with DDR3-1600.
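For what it's worth, the raw numbers behind that comparison are easy to sketch. A back-of-envelope calculation (the 128-bit/1600MT/s figure for the 7750 DDR3 board is my assumption for illustration):

```python
# Peak DRAM bandwidth: transfers per second times bus width in bytes.
def peak_bw_gb_s(mega_transfers, bus_bits):
    return mega_transfers * 1e6 * (bus_bits / 8) / 1e9

print(peak_bw_gb_s(1600, 128))  # 7750 DDR3 card (128-bit): 25.6 GB/s, all for the GPU
print(peak_bw_gb_s(1600, 128))  # Kaveri dual-channel DDR3-1600 (2x64-bit): 25.6 GB/s, shared
print(peak_bw_gb_s(2133, 128))  # Kaveri dual-channel DDR3-2133: ~34.1 GB/s, still shared
```

So even at DDR3-2133 the headroom over the dedicated card is modest once the CPU takes its share.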
> Conversely, if you try and replace the main memory interface with GDDR5 you hurt the CPU performance (pretty significantly @ 4GHz) due to increased latency.

GDDR5 latency is virtually the same as DDR3 latency; the interface doesn't change the DRAM latency.
> Yes, I haven't seen anything that would indicate GCN 1.1 is somehow more bandwidth efficient - maybe it would help with compute tasks. To really make a difference with cache you'd need quite a lot more (and Cape Verde already had twice the cache per MC compared to other family members, which didn't seem to help much with the DDR3 version...), not to mention the ROPs don't even use it.

While the bolded part is true, GCN1.1 GPUs appear to be more bandwidth efficient in fillrate tests. They can actually exceed the fillrate that would be allowed by the memory bandwidth (a clear difference to GCN1.0 GPUs), especially for blending operations in the tests of hardware.fr (Kepler can do the same, but is using its larger L2 for backing up the ROPs, iirc), which is a sign of changed ROP cache behaviour with GCN1.1. It shows increased bandwidth efficiency at least for that fillrate test, but no idea how this translates to real-world situations.
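To make "exceed the fillrate that would be allowed by the memory bandwidth" concrete, this is the ceiling one would compute if every blended pixel actually went through DRAM (a minimal sketch with illustrative numbers, not figures from the hardware.fr tests):

```python
# Bandwidth-limited blending fillrate ceiling, assuming no compression and
# no ROP cache hits: each blended pixel costs a read plus a write.
def blend_ceiling_gpix_s(bw_gb_s, bytes_per_pixel):
    traffic_per_pixel = 2 * bytes_per_pixel  # read destination + write result
    return bw_gb_s / traffic_per_pixel

print(blend_ceiling_gpix_s(96.0, 8))  # 96 GB/s card, FP16 pixels: 6.0 Gpix/s
```

A measured blending rate above that line means some traffic never reached DRAM, which is exactly what points at the ROP caches (or a compression scheme) on GCN1.1.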
> GDDR5 latency is virtually the same as DDR3 latency; the interface doesn't change the DRAM latency.

Most GDDR5 modules I've seen have roughly double (~15) the latency of typical DDR3 (~7-9) in terms of clocks. Obviously clock speeds vary (although if we're comparing to 2133/2400 memory, less so) and there may be more options out there, but as far as the "typical" GDDR5 that I could find specs for goes, it doesn't seem to come with low latency numbers.
> While the bolded part is true, GCN1.1 GPUs appear to be more bandwidth efficient in fillrate tests. They can actually exceed the fillrate that would be allowed by the memory bandwidth (a clear difference to GCN1.0 GPUs), especially for blending operations in the tests of hardware.fr (Kepler can do the same, but is using its larger L2 for backing up the ROPs, iirc), which is a sign of changed ROP cache behaviour with GCN1.1. It shows increased bandwidth efficiency at least for that fillrate test, but no idea how this translates to real-world situations.

Not in the benchmarks I've seen:
http://www.hardware.fr/articles/890-4/performances-theoriques-pixels.html
In fact, based on these numbers it would be less efficient (7790 has 33.3% more memory bandwidth than 7770, but the blending results come in at just below 30% faster), which seems unlikely, but at the very least not better.
If you're talking about the results from here though:
http://www.hardware.fr/articles/910-6/performances-theoriques-pixels.html
Then it indeed looks much better.
Not sure what to think of it. Either it's a measurement error or some new bits were enabled in a newer driver (somewhere in the area of color buffer compression or whatnot).
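As an aside, that 33.3% figure falls straight out of the board specs (assumed here: both cards with a 128-bit bus, 4.5 Gbps GDDR5 on the 7770 versus 6 Gbps on the 7790); a quick sanity check:

```python
# Memory bandwidth of the two boards (specs assumed as stated above).
bw_7770 = 4.5e9 * (128 / 8) / 1e9  # 72 GB/s
bw_7790 = 6.0e9 * (128 / 8) / 1e9  # 96 GB/s
print(bw_7790 / bw_7770 - 1)       # ~0.333, i.e. the 33.3% figure
```

If blending throughput scales by less than that, the GCN1.1 part indeed shows no extra bandwidth efficiency in that test.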
> Most GDDR5 modules I've seen have roughly double (~15) the latency of typical DDR3 (~7-9) in terms of clocks. Obviously clock speeds vary (although if we're comparing to 2133/2400 memory, less so) and there may be more options out there, but as far as the "typical" GDDR5 that I could find specs for goes, it doesn't seem to come with low latency numbers.

I think you've answered that yourself? Latency lower than 9 clocks for DDR3 is only the overclocked stuff at DDR3-1600 (there are plenty of modules with 11-11-11 latencies at DDR3-1600 too). If you compare that to 5GHz GDDR5 (so a base clock slightly more than 50% higher), which would have 15 clocks(*), that IS virtually the same latency.
> Most GDDR5 modules I've seen have roughly double (~15) the latency of typical DDR3 (~7-9) in terms of clocks. Obviously clock speeds vary (although if we're comparing to 2133/2400 memory, less so) and there may be more options out there, but as far as the "typical" GDDR5 that I could find specs for goes, it doesn't seem to come with low latency numbers.

As mczak already pointed out, the number of clocks on the memory interface is basically irrelevant when you claim a specific CPU would lose performance when its DDR3 interface is exchanged for a GDDR5 one (as you did). From the point of view of the CPU core, the maximum usable bandwidth would go up and the memory latency would stay the same (measured in time or in CPU clocks).
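Put into numbers (module timings assumed purely for illustration), the clocks-to-time conversion looks like this:

```python
# CAS latency in nanoseconds: latency clocks divided by the command clock.
# DDR3-1600 uses an 800 MHz command clock; 5 Gbps GDDR5 uses 1250 MHz.
def cas_ns(cl_clocks, command_clock_mhz):
    return cl_clocks / (command_clock_mhz * 1e6) * 1e9

print(cas_ns(11, 800))   # DDR3-1600 CL11: 13.75 ns
print(cas_ns(15, 1250))  # 5 Gbps GDDR5 CL15: 12.0 ns
```

So ~15 GDDR5 clocks and ~11 DDR3 clocks land in the same 12-14 ns window: measured in time, the latency barely moves.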
> Not in the benchmarks I've seen: [...] Not sure what to think of it. Either it's a measurement error or some new bits were enabled in a newer driver (somewhere in the area of color buffer compression or whatnot).

That's funny, I wasn't aware of the earlier tests and that it appears to have changed at some point in time. But the "new" results are shown in multiple hardware.fr tests. Maybe Tridam weighs in and can shed some light on whether they just changed their benchmark procedure or whether it is a real improvement introduced by some driver fix (which doesn't apply to retested GCN 1.0 GPUs).
> Somewhere here in the forum there should be a link to an exhaustive Hynix GDDR5 datasheet.

Guess I missed that link. If anyone has it I'd be curious. As mczak also noted, it's not typically something that is easy to find for GDDR5 modules, for some reason.
> That's funny, I wasn't aware of the earlier tests and that it appears to have changed at some point in time. But the "new" results are shown in multiple hardware.fr tests. Maybe Tridam weighs in and can shed some light on whether they just changed their benchmark procedure or whether it is a real improvement introduced by some driver fix (which doesn't apply to retested GCN 1.0 GPUs).

Actually, looking closer at them, I suspect this is indeed some new scheme for compressed (non-MSAA) color buffers. I believe nvidia has done the same for some time now (I don't think their results exceeding the theoretical max are due to unified caches), though it's often difficult to see with nvidia's chips due to the limited pixel export capability (and very slow FP32 blend capability).