Haswell vs Kaveri

True, but isn't the 7750's DDR3 dedicated while Kaveri's GPU obviously will share bandwidth with the CPU?

This is true, but previous AMD APUs showed that the GPU has priority access to memory bandwidth in most cases, and a lot of GPU-intensive apps (like games) don't need much memory bandwidth on the CPU side in the first place.
If you were to run a STREAM benchmark on the CPU while rendering data on the iGPU, things might look different, but I haven't tested that scenario yet.
Not having a very fast CPU with only 128-bit SIMD also puts less strain on the DDR3 controller from the CPU side of things, leaving more headroom for the GPU.
Intel's cores are a lot more demanding of memory, as they can process a lot of data per clock with 256-bit AVX and a much wider core.
This might be an interesting investigation if someone has the hardware on hand and is willing to test both camps' APUs in various CPU workloads under a constant GPU load, observing the resulting drop in performance against pure-CPU and pure-GPU baselines.
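Something along these lines (a rough sketch only, not the real STREAM benchmark; the array size and iteration count are arbitrary) could serve as the CPU-side bandwidth load while the GPU is kept busy, so the drop in either side's numbers can be observed:

Code:
import time
import numpy as np

N = 32 * 1024 * 1024               # doubles; large enough to blow through every cache level
a = np.zeros(N)
b = np.random.rand(N)
c = np.random.rand(N)

best = 0.0
for _ in range(10):
    t0 = time.perf_counter()
    np.add(b, c, out=a)            # STREAM-style "add": two reads + one write, no temporaries
    dt = time.perf_counter() - t0
    best = max(best, 3 * 8 * N / dt / 1e9)   # three 8-byte streams per element

print(f"sustained CPU bandwidth: {best:.1f} GB/s")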
 
That's with 1600MHz DDR3, but Kaveri should support up to 2400MHz, though I don't expect to see more than 2133MHz in most systems. Plus it also has to feed the CPU.


Why should it support DDR3-2400? The JEDEC spec goes up to DDR3-2133 and there is no sign that Kaveri supports anything faster than DDR3-2133. I don't think most systems are going to use DDR3-2133 either; DDR3-1600 is more likely for OEMs. Memory bandwidth is shared with the CPU, so I doubt there is a big advantage over a dedicated card with DDR3-1600.
 
Kaveri's GPU is based on the same IP level as Bonaire and Hawaii, so the front-end is different from Cape Verde's. But I'm not sure that will do anything for bandwidth requirements. There's always the relatively straightforward option of making the L2 bigger.
Yes I haven't seen anything that would indicate GCN 1.1 is somehow more bandwidth efficient - maybe it would help with compute tasks. To really make a difference with cache you'd need quite a lot more (and Cape Verde already had twice the cache per MC compared to other family members, didn't seem to help much with the ddr3 version...), not to mention the ROPs don't even use it.
I agree OEMs are unlikely to use more than ddr3-1600. And you can only hope they will actually use dual-channel...
 
Why should it support DDR3-2400? The JEDEC spec goes up to DDR3-2133 and there is no sign that Kaveri supports anything faster than DDR3-2133. I don't think most systems are going to use DDR3-2133 either; DDR3-1600 is more likely for OEMs. Memory bandwidth is shared with the CPU, so I doubt there is a big advantage over a dedicated card with DDR3-1600.

I thought I'd seen something about DDR3-2400 support in Kaveri somewhere, but I might be wrong. A bunch of FM2+ motherboards from AsRock support DDR3-2400/2600, for what it's worth.

I don't think that most systems will use DDR3-2133 either, just that it's usually going to be the upper bound. But memory prices are coming down again so DDR3-1866 isn't unrealistic.

Of course this depends a lot on the kind of system Kaveri will power. On a $1000~1500 machine, faster RAM isn't a problem, but that can only happen if CPU performance is deemed sufficient for high-end laptops, which I'm afraid isn't going to happen. Nevertheless, a shift to somewhat nicer systems vs. what Trinity/Richland ever achieved seems plausible. For the most part, I expect Kaveri to end up in cheap notebooks with DDR3-1600 or less. I can only join mczak in his prayer for dual-channel.
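For a back-of-the-envelope feel of the numbers (assuming the DDR3 version of the 7750 has a 128-bit interface at 1600 MT/s, and that Kaveri gets a normal dual-channel, i.e. 128-bit, DDR3 setup that it has to share with the CPU):

Code:
def peak_ddr3_gbs(mt_per_s, bus_bits=128):
    # peak bandwidth = transfer rate x bus width in bytes
    return mt_per_s * 1e6 * (bus_bits // 8) / 1e9

print(peak_ddr3_gbs(1600))   # 25.6 GB/s - DDR3 7750 (assumed) or dual-channel DDR3-1600 APU
print(peak_ddr3_gbs(1866))   # ~29.9 GB/s - dual-channel DDR3-1866
print(peak_ddr3_gbs(2133))   # ~34.1 GB/s - dual-channel DDR3-2133, minus whatever the CPU eats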
 
Conversely, if you try to replace the main memory interface with GDDR5, you hurt CPU performance (pretty significantly @ 4GHz) due to increased latency
GDDR5 latency is virtually the same as DDR3 latency; the interface doesn't change the DRAM latency.

edit:
Yes I haven't seen anything that would indicate GCN 1.1 is somehow more bandwidth efficient - maybe it would help with compute tasks. To really make a difference with cache you'd need quite a lot more (and Cape Verde already had twice the cache per MC compared to other family members, didn't seem to help much with the ddr3 version...), not to mention the ROPs don't even use it.
While the bolded part is true, GCN1.1 GPUs appear to be more bandwidth efficient in fillrate tests. They can actually exceed the fillrate that would be allowed by the memory bandwidth (a clear difference from GCN1.0 GPUs), especially for blending operations in the tests of hardware.fr (Kepler can do the same, but is using its larger L2 for backing up the ROPs iirc), which is a sign of changed ROP cache behaviour in GCN1.1. It shows increased bandwidth efficiency at least for that fillrate test, but no idea how this translates to real-world situations.
 
GDDR5 latency is virtually the same as DDR3 latency; the interface doesn't change the DRAM latency.
Most GDDR5 modules I've seen have roughly double (~15) the latency of typical DDR3 (~7-9) in terms of clocks. Obviously clock speeds vary (although if we're comparing to 2133/2400 memory, less so) and there may be more options out there, but the "typical" GDDR5 that I could find specs for doesn't seem to come with low latency numbers.

I agree that fundamentally there's no reason they need to be worse, but do you happen to have links to modules with lower latencies out of curiosity?
 
While the bolded part is true, GCN1.1 GPUs appear to be more bandwidth efficient in fillrate tests. They can actually exceed the fillrate that would be allowed by the memory bandwidth (a clear difference from GCN1.0 GPUs), especially for blending operations in the tests of hardware.fr (Kepler can do the same, but is using its larger L2 for backing up the ROPs iirc), which is a sign of changed ROP cache behaviour in GCN1.1. It shows increased bandwidth efficiency at least for that fillrate test, but no idea how this translates to real-world situations.
Not in the benchmarks I've seen:
http://www.hardware.fr/articles/890-4/performances-theoriques-pixels.html
In fact based on these numbers it would be less efficient (7790 has 33.3% more memory bandwidth than 7770 but the blending results come in at just below 30% faster), which seems unlikely but at the very least not better.
If you're talking about the results from here though:
http://www.hardware.fr/articles/910-6/performances-theoriques-pixels.html
Then it indeed looks much better.
Not sure what to think of it. Either it's a measurement error or some new bits were enabled in a newer driver (somewhere in the area of color buffer compression or whatnot).
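To make the "allowed by the memory bandwidth" ceiling explicit (assuming an RGBA8 target with blending, i.e. a 4-byte read plus a 4-byte write per pixel, and the usual ~72 GB/s / ~96 GB/s bandwidth figures for the 7770 / 7790):

Code:
def blend_fill_ceiling_gpix(bw_gbs, bytes_per_pixel=8):
    # blending forces a read-modify-write, so DRAM alone caps the fillrate at BW / 8
    return bw_gbs / bytes_per_pixel

print(blend_fill_ceiling_gpix(72))   # HD 7770: ~9 Gpixels/s ceiling
print(blend_fill_ceiling_gpix(96))   # HD 7790: ~12 Gpixels/s ceiling
# results above these ceilings mean the ROP caches are absorbing part of the traffic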
 
Most GDDR5 modules I've seen have roughly double (~15) the latency of typical DDR3 (~7-9) in terms of clocks. Obviously clock speeds vary (although if we're comparing to 2133/2400 memory, less so) and there may be more options out there, but the "typical" GDDR5 that I could find specs for doesn't seem to come with low latency numbers.
I think you've answered that yourself? Latency lower than 9 clocks for ddr3-1600 is only found on the overclocked stuff (there are plenty of modules with 11-11-11 latencies at ddr3-1600 too). If you compare that to gddr5 at 5GHz (so a base clock slightly more than 50% higher), which would have 15 clocks(*), that IS virtually the same latency.
(*) I never really could find useful latency numbers for gddr5 memory, datasheets seem to be scarce...
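Converting those clock counts into absolute time (rough numbers; I'm counting CAS against the 800MHz bus clock for ddr3-1600 and against the 1.25GHz command clock of 5Gbps gddr5):

Code:
def cas_ns(cas_cycles, clock_mhz):
    # CAS latency in nanoseconds = cycles / clock frequency
    return cas_cycles / clock_mhz * 1000

print(cas_ns(11, 800))    # ddr3-1600 CL11   -> 13.75 ns
print(cas_ns(9, 800))     # ddr3-1600 CL9    -> 11.25 ns (the fast/OC stuff)
print(cas_ns(15, 1250))   # gddr5 5Gbps CL15 -> 12.0 ns, i.e. right in the same range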
 
Most GDDR5 modules I've seen have roughly double (~15) the latency of typical DDR3 (~7-9) in terms of clocks. Obviously clock speeds vary (although if we're comparing to 2133/2400 memory, less so) and there may be more options out there, but the "typical" GDDR5 that I could find specs for doesn't seem to come with low latency numbers.
As mczak already pointed out, the number of clocks on the memory interface is basically irrelevant when you claim that a specific CPU would lose performance if its DDR3 interface were exchanged for a GDDR5 one (as you did). From the point of view of the CPU core, what would happen is that the maximum usable bandwidth would go up and the memory latency would stay the same (measured in time or in CPU clocks).
Somewhere here in the forum there should be a link to an exhaustive Hynix GDDR5 datasheet. I remember it was pointed out several times that all latencies in nanoseconds were virtually the same as with average DDR3-2133 (frequency doesn't play much of a role, as in absolute terms latency tends to be rather constant if you don't resort to OC modules with increased voltages).

Not in the benchmarks I've seen:
http://www.hardware.fr/articles/890-4/performances-theoriques-pixels.html
In fact based on these numbers it would be less efficient (7790 has 33.3% more memory bandwidth than 7770 but the blending results come in at just below 30% faster), which seems unlikely but at the very least not better.
If you're talking about the results from here though:
http://www.hardware.fr/articles/910-6/performances-theoriques-pixels.html
Then it indeed looks much better.
Not sure what to think of it. Either it's a measurement error or some new bits were enabled in a newer driver (somewhere in the area of color buffer compression or whatnot).
That's funny, I wasn't aware of the earlier tests or that the results appear to have changed at some point. But the "new" results show up in multiple hardware.fr tests. Maybe Tridam will weigh in and shed some light on whether they just changed their benchmark procedure or whether it's a real improvement introduced by some driver fix (one that doesn't apply to the retested GCN 1.0 GPUs).
 
Somewhere here in the forum there should be a link to an exhaustive Hynix GDDR5 datasheet.
Guess I missed that link. If anyone has it I'd be curious :) As mczak also noted, it's not typically something that is easy to find for GDDR5 modules for some reason.
 
That's funny, I wasn't aware of the earlier tests or that the results appear to have changed at some point. But the "new" results show up in multiple hardware.fr tests. Maybe Tridam will weigh in and shed some light on whether they just changed their benchmark procedure or whether it's a real improvement introduced by some driver fix (one that doesn't apply to the retested GCN 1.0 GPUs).
Actually, looking closer at them, I suspect this is indeed some new scheme for compressed (non-msaa) color buffers. I believe nvidia has done the same for some time now (I don't think their results exceeding the theoretical max are due to unified caches), though it's often difficult to see with nvidia's chips due to the limited pixel export capability (and very slow fp32 blend capability).
But HD 7790 beating HD 7870 sure has to be something along these lines; the fp16 results are especially telling. Doesn't seem to work for 128-bit render targets, though.
I guess that should indeed help Kaveri quite a bit then.
 


A10-7850K@4.9GHz: 547
http://wccftech.com/amd-kaveri-a107850k-benchmark-surfaced/

compare:
A10-6800K@5.1GHz: 403
http://hwbot.org/benchmark/cinebench_r15/rankings?hardwareTypeId=processor_2826#start=0#interval=20
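Normalising those two scores by clock (simple division of the quoted numbers, assuming both the leaked score and the clocks are accurate):

Code:
kaveri   = 547 / 4.9      # A10-7850K: ~111.6 Cinebench R15 points per GHz
richland = 403 / 5.1      # A10-6800K: ~79.0 points per GHz
print(kaveri / richland)  # ~1.41 -> roughly 40% more per clock, if the leak is genuine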
 
Extremely promising... assuming scaling continues to 8 cores/4 modules, a 4-module Steamroller looks to be quite a lot faster in this benchmark clock for clock than a quad Haswell. AMD really needs to get one of these babies out in 4-module form running at 5GHz if that kind of performance carries over to other benchmarks. It could easily topple Intel in the mainstream segment if that's the case (until Intel responds, of course).
 
There's been no indication that any product is going to clock that high, and hasn't that image's authenticity been debunked?
 
How much is it, and is there a 4 core version?

As an example, my i7-4770K running at 4.7GHz on all four cores (air cooled) does 947 in Cinebench, so it's at least getting AMD up to parity if they have more cores/threads, and above it if they can get to 5GHz at stock.
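As a purely hypothetical sanity check on that (it assumes the leaked 547 is real and that doubling the module count would double the score, which it won't quite do):

Code:
haswell_4c = 947             # i7-4770K @ 4.7GHz, score from the post above
kaveri_2m  = 547             # A10-7850K @ 4.9GHz, leaked 2-module score
kaveri_4m  = kaveri_2m * 2   # naive linear scaling to a 4-module part

print(kaveri_4m)                               # 1094 -> above the Haswell score in absolute terms
print((kaveri_4m / 4.9) / (haswell_4c / 4.7))  # ~1.11 -> ~11% more per clock under these assumptions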
 
CPU-world reported +20% EUs for GT1-GT3 Broadwell, so yes, the GenX diagram from HPG2012 could belong to Broadwell/Gen8. Based on 96 EUs @ 1GHz, we might be looking at 1.2-1.3 TFLOPS for the faster GT3 parts, assuming the frequency can go up to 1.2-1.3GHz. At least Broadwell-K with its high TDP should manage it.
 