Haswell vs Kaveri

Ray tracing workloads tend to be irregular (especially for non-primary rays), and they are likely to perform better on architectures that use a narrow SIMD width. This is just a hypothesis, but AMD's 64-wide vectors/wavefronts might be partially responsible for the observed performance in Luxmark.
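To illustrate the hypothesis, here's a toy Python sketch (my own simplified model, not a simulation of any real GPU): a wavefront keeps issuing as long as *any* of its lanes is still active, so a 64-wide machine drags many more idle lanes through divergent code than an 8-wide one once most secondary rays have terminated.

```python
import random

def mean_lane_utilization(simd_width, active_prob, batches=10000, seed=0):
    """Fraction of issued ALU slots doing useful work when each lane
    (ray) is independently still active with probability active_prob.
    A vector instruction issues whenever any lane is active, so wider
    vectors carry more idle lanes through divergent code."""
    rng = random.Random(seed)
    used = issued = 0
    for _ in range(batches):
        active = sum(rng.random() < active_prob for _ in range(simd_width))
        if active:
            issued += simd_width  # the whole wavefront occupies the ALUs
            used += active        # but only the live lanes do useful work
    return used / issued
```

With 5% of rays still alive (think mostly-terminated secondary bounces), the 8-wide machine keeps roughly 3x the utilization of the 64-wide one in this model, simply because it can retire entirely-dead vectors early.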
 
while others consider it a genuine performance regression vs. VLIW for Luxmark-like workloads.

Naah... the 7870 is 37% faster than the 6970, even with 6% less peak flops and 14% less bandwidth.
[Attachment: Luxmark.png]


edit: seems to have widened with newer drivers:
[Attachment: 07-Luxmark-2.png]
[Attachment: 0103-Luxmark.png]
 
Ray tracing workloads tend to be irregular (especially for non-primary rays), and they are likely to perform better on architectures that use a narrow SIMD width. This is just a hypothesis, but AMD's 64-wide vectors/wavefronts might be partially responsible for the observed performance in Luxmark.
That's probably part of the reason why Intel does so well, but the other is that AMD's APUs underperform equivalent discrete parts by a lot in this test. The A10-5800k should be 1/4 the 6970's performance, not 1/9th, and GCN makes a decent difference as well. It's a low BW test. It's possible that the APUs have an architectural omission causing this, but I'm thinking that this is a driver or throttling issue.
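As a rough sanity check on the "should be 1/4" figure, using the publicly listed peak numbers (I'm quoting the shader counts and clocks from memory, so treat them as assumptions):

```python
def peak_gflops(shaders, clock_ghz):
    # 2 flops per shader per clock (one multiply-add)
    return shaders * clock_ghz * 2

hd6970  = peak_gflops(1536, 0.880)  # ~2703 GFLOPS
hd7660d = peak_gflops(384, 0.800)   # the A10-5800K's IGP, ~614 GFLOPS
ratio   = hd7660d / hd6970          # ~0.23, i.e. roughly 1/4
```

So on paper the APU should land around 1/4.4 of the 6970, which makes the observed 1/9th look like something other than raw ALU throughput.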

Still, fantastic showing by Intel. Does anyone know what the CPU alone gets in that test?
 
The lack of global caching on all the pre-GCN architectures from AMD is also responsible for the poor RT/PT performance. Memory access in those algorithms is highly irregular, even with acceleration structures, and this is one of the weakest points for any massively parallel machine.

Remember, when Fermi hit the market, we were all blown away by its performance in SmallPT right here, in B3D. ;)
 
Still, fantastic showing by Intel. Does anyone know what the CPU alone gets in that test?
Some more numbers are here:
http://techreport.com/review/24879/intel-core-i7-4770k-and-4950hq-haswell-processors-reviewed/13

Interestingly, the OCL CPU ICD apparently does not yet support FMA (which casts some doubt on AVX2 support in general). Also interesting to see the eDRAM having a positive effect on a pure CPU workload - enough to bump it past the higher clocked desktop chip.
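For context on why a missing FMA path matters: FMA retires a multiply and an add per lane per cycle, so leaving it out halves the achievable peak in a simple throughput model. A sketch with made-up chip parameters (the 4-core/3.5GHz/8-wide figures are illustrative, not any specific SKU, and the model ignores Haswell's dual FMA ports):

```python
def cpu_peak_gflops(cores, clock_ghz, simd_lanes, fma=True):
    """Peak single-precision GFLOPS in a naive throughput model.
    With FMA each lane retires a multiply and an add per cycle
    (2 flops); without it, only one op per lane per cycle."""
    flops_per_lane = 2 if fma else 1
    return cores * clock_ghz * simd_lanes * flops_per_lane

# Hypothetical 4-core, 3.5 GHz, 8-wide AVX part:
with_fma    = cpu_peak_gflops(4, 3.5, 8, fma=True)   # 224 GFLOPS
without_fma = cpu_peak_gflops(4, 3.5, 8, fma=False)  # 112 GFLOPS
```

So an ICD that compiles mul+add pairs instead of fused ops leaves half the peak on the table before any other inefficiency.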

I'm also curious how the load balancing happens for the "CPU + GPU" modes on the IGPs... that's the case where significant TDP/thermal constraints are going to kick in for the i7-4950HQ. It still seems to do better overall than purely using the IGP, but I'd love to see what frequencies get chosen for the CPU/GPU/ring under such a load.

The whole TR review is worth a read - interesting stuff, and happy to see them actually test some of the new stuff in Haswell a bit (AVX2, FMA) rather than the other reviews so far that run some mostly single threaded code, scoff at the mere 10% or so IPC gains and call it disappointing... I feel like the memo on parallelism went out many years ago now ;)
 
Interesting to see how close the HD 4600 is to Trinity and the 5200 Pro in the Techreport's review. Hexus got very different numbers with Trinity thrashing HD 4600 to the tune of ~35% so we're still looking at pretty wildly different results depending on the game.

http://hexus.net/tech/reviews/cpu/56005-intel-core-i7-4770k-22nm-haswell/?page=13

AMD really needs to get the finger out though as clearly Richland isn't going to open up much more of a gap and a GT3 i3 could be a bit too close for comfort.
 
AMD really needs to get the finger out though as clearly Richland isn't going to open up much more of a gap and a GT3 i3 could be a bit too close for comfort.
TBH I don't even see the point of Richland; it just seems like wasted time/effort for little gain.

I hope AMD jumps on the 20nm train fast; how long it took them to get their CPUs/APUs onto 28nm is a joke.
 
Also interesting to see the eDRAM having a positive effect on a pure CPU workload - enough to bump it past the higher clocked desktop chip.
Still, it becomes clear that the caching that already exists is really great – an enormous L4 cache only gives relatively low gains. On the other hand, as I understood it, somebody already mentioned the option of incorporating a smaller eDRAM cache in the CPU die directly, where it would have less latency and maybe more bandwidth.
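A back-of-the-envelope average-memory-access-time model shows why even an enormous L4 only helps modestly once L1-L3 already absorb almost everything (all latencies and hit rates below are invented for illustration):

```python
def amat(hit_times, hit_rates):
    """Average memory access time for a cache hierarchy.
    hit_times[i] = access latency (cycles) of level i,
    hit_rates[i] = probability an access is satisfied at level i
                   (rates must sum to 1; the last level is memory)."""
    assert abs(sum(hit_rates) - 1.0) < 1e-9
    return sum(t * r for t, r in zip(hit_times, hit_rates))

# Invented but plausible latencies: L1=4, L2=12, L3=36,
# L4 (eDRAM)=90, DRAM=200 cycles.  If L1-L3 already catch 98%
# of accesses, routing most of the remaining misses through the
# L4 instead of DRAM moves the average only slightly:
no_l4   = amat([4, 12, 36, 200],     [0.90, 0.06, 0.02, 0.02])
with_l4 = amat([4, 12, 36, 90, 200], [0.90, 0.06, 0.02, 0.015, 0.005])
```

In this toy setup the L4 cuts average access time by under 20%, which is consistent with "relatively low gains" unless the working set blows past the L3 entirely.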

run some mostly single threaded code, scoff at the mere 10% or so IPC gains and call it disappointing... I feel like the memo on parallelism went out many years ago now ;)
Considering how much IPC Ivy already has, I'd call those 10% gains awesome, especially since they did not come at a disproportionate cost. I mean, look at ARM – quad-core A5 SoCs are a tier below dual-core A7 SoCs, probably because the die cost of more cores is cheaper than the die cost of cores with higher IPC; the same story as with A7/A15.
And clearly, parallelism has not been figured out yet. Which is actually why Intel can afford devoting more and more die space exclusively to the GPU. If there was a consumer application for 6 to 10 cores, Intel would have to think much harder about segmentation – i.e. there would be a market for consumer desktop chips with more cores, not higher performing GPUs.
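Amdahl's law makes the point concrete: unless almost everything in a workload is parallel, extra cores stop paying off quickly, which is exactly why consumer chips with 6 to 10 cores are a hard sell:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup from `cores` cores when only
    `parallel_fraction` of the work can run in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even a 90%-parallel workload tops out fast:
# 4 cores -> ~3.1x, 8 cores -> ~4.7x, infinite cores -> 10x max.
```

Doubling from 4 to 8 cores buys barely 50% here, and typical consumer software is well below 90% parallel.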

Also, it does not seem that Intel really cares about parallelism for consumers. I'm one of those people who would gladly buy an i7-4xxxR chip (BGA with GT3e on desktop), except that TSX is disabled on this chip – and if TSX does become very useful for consumer applications, it will be unavailable to me.
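For anyone wondering what TSX actually buys you: it lets a critical section run speculatively and fall back to a lock on conflict. Here's a purely conceptual software sketch of that optimistic pattern (this is not how real TSX is used, which needs the _xbegin/_xend RTM intrinsics, and this toy version is not actually race-free; it just shows the speculate/validate/fallback shape):

```python
import threading

class VersionedCounter:
    """Toy optimistic-concurrency loop, conceptually similar to what
    TSX/RTM provides in hardware: do the work speculatively, validate
    that nobody interfered, and retry or take a real lock on conflict.
    (Pure software sketch; not a correct lock-free structure.)"""
    def __init__(self):
        self.value = 0
        self.version = 0
        self._fallback = threading.Lock()  # like aborting to a real lock

    def increment(self, max_retries=3):
        for _ in range(max_retries):
            v0 = self.version
            new = self.value + 1        # speculative work
            if self.version == v0:      # validate: no concurrent writer
                self.value = new
                self.version += 1
                return
        with self._fallback:            # fallback path, like lock elision
            self.value += 1
            self.version += 1
```

The hardware version does the validation via the cache-coherence protocol instead of a version counter, which is why it can elide locks with no software bookkeeping at all.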
 
TBH I don't even see the point of Richland; it just seems like wasted time/effort for little gain.

I hope AMD jumps on the 20nm train fast; how long it took them to get their CPUs/APUs onto 28nm is a joke.

The point of Richland is to have something more competitive until Kaveri arrives; if Kaveri is due in Q4, it will miss the "back to school" season and maybe Christmas. And if Intel goes up 5-10% in performance due to IPC gains while AMD goes up 5-10% due to frequency gains at the same TDP (I'm referring to the CPU part only), then the gap will remain the same - but it's true that on the GPU side the gap will be much reduced, if it exists at all. I'm referring to the HD 4600; Iris Pro is too expensive for a comparison with Trinity/Richland. And price is again on AMD's side - comparing the new flagship from Intel with a mainstream APU is interesting from a theoretical point of view, but pointless from a market perspective, as they are not competing against each other.
Also, I'm not convinced that a respin (which Richland seems to be) is really a big effort for AMD.

@jimbo75: TR also tests the platforms with DDR3-2133, and I think everyone knows GPUs of the Trinity/Richland/HD 4600 rank are held back by bandwidth, too...
Also, AT tested the HD 4600 with DDR3-2400 and Trinity with DDR3-2133, which is not a completely fair comparison, but anyway...
 
Why not look at price points before sentencing Richland to an early death?

The most expensive Richland is priced at $150, so it's a competitor to the i3 line.
The Haswell line starts at $190.
Richland is not competing with Haswell. And since there's no desktop i3 Haswell, Richland will be competing with the Ivybridge i3 models.
And yes, Richland's iGPU is quite a bit faster than the i3's HD2500.

Reviewers who get samples for free tend to forget how important the price point is and like to sing victory to a product that costs twice as much as its nearest competitor.

AMD is simply not competing against the Haswell line with APUs at the moment, period.
 
Still, it becomes clear that the caching that already exists is really great – an enormous L4 cache only gives relatively low gains.
Agreed, it's really only a big win if you have a working set in the "dozens of MBs" sort of size range of course. That's quite common for graphics, but less common for CPU workloads; that said, some of that is because they have been optimized for current L3$ sizes :)

Considering how much IPC Ivy already has, I'd call those 10% gains awesome ... And clearly, parallelism has not been figured out yet. Which is actually why Intel can afford devoting more and more die space exclusively to the GPU.
Yep totally agreed, that was the point I was trying to make :) People keep unrealistically expecting their legacy stuff that uses 1-4 threads to keep getting faster when that is clearly not going to happen indefinitely... frankly I consider *any* IPC improvements at this stage to be minor miracles.

Also, it does not seem that Intel really cares about parallelism for consumers. I'm one of those people who would gladly buy an i7-4xxxR chip (BGA with GT3e on desktop), except that TSX is disabled on this chip – and if TSX does become very useful for consumer applications, it will be unavailable to me.
Yeah, the TSX segmentation seems like a poor idea to me, especially with it not being supported on the K-series parts. Don't get that at all. Granted, it is more important with more cores, but still.

Reviewers who get samples for free tend to forget how important the price point is and like to sing victory to a product that costs twice as much as its nearest competitor.

AMD is simply not competing against the Haswell line with APUs at the moment, period.
To consumers, sure, but the point is that in reality they simply set the price depending on how competitive it is in practice. You don't think they *want* to charge $300 for it? You think they are choosing to not have a high-end competitive part? Obviously not, especially with all of the noise they have been making about APUs.

So sure, they're not going to commit suicide by pricing their stuff above higher performing parts, but the retail prices are sort of incidental in an architectural discussion.
 
some of that is because they have been optimized for current L3$ sizes :)
Good point :D It is always interesting to find places where this kind of situation exists.

Yeah the TSX segmentation seems like a poor idea to me, especially with it not being supported on the K-series parts
Segmenting the K-series parts like this is, IMHO, fine. If you really need VT-d (and, it seems, TSX) right now, you're probably a server/HPC guy, and Intel probably wants server/HPC guys to buy more chips instead of overclocking.
 
VT-d would be useful to a very minor segment of the population that would like to do gaming in a VM (and that segment could become a tiny bit less minor).

It ought to be an "enterprise" feature needing VMWare Expensive Edition (tm) or an IBM mainframe but is technically available for free with Xen, like some people run FreeBSD or FreeNAS with ZFS at home.
 
VT-d would be useful to a very minor segment of the population that would like to do gaming in a VM (and that segment could become a tiny bit less minor).

It ought to be an "enterprise" feature needing VMWare Expensive Edition (tm) or an IBM mainframe but is technically available for free with Xen, like some people run FreeBSD or FreeNAS with ZFS at home.

Lack of VT-d is the reason I bought an 8350 instead of a 3770K. ESXi is free; anyone with a home "file server" should be running ESX/Hyper-V etc., it just makes things easy. Sure, it's a small part of the market, but it also tends to be the market that puts together the BOMs when enterprise and government go out to buy stuff :LOL:.
 
Lack of VT-d is the reason I bought an 8350 instead of a 3770K. ESXi is free; anyone with a home "file server" should be running ESX/Hyper-V etc., it just makes things easy. Sure, it's a small part of the market, but it also tends to be the market that puts together the BOMs when enterprise and government go out to buy stuff :LOL:.

Why did you avoid the 3770 (non-K version)?
Since no one wants instability in servers, you can rule out overclocking as a reason.

It has all features enabled including VT-d.
http://ark.intel.com/products/65719
 
Overclocking with an unlocked multiplier, on a CPU that can always do 1GHz over its default clock, isn't really overclocking. Such rigs usually have an aftermarket cooler and case fans, get memtested, get tested for stability and heat at prolonged 100% CPU use, and don't make a Serious Business lose one million dollars if they crash.

Not sure why you would want to run a file server on an overclocked 8350, though. (But making your main rig the NAS is something to do if you never need the NAS to be up while the main rig is down.)
 
Why did you avoid the 3770 (non-K version)?
Because with the workloads I run, a 3770 has a performance deficit.
Since no-one wants instability in servers you can rule out over-clocking as a reason.
Logical fallacy. My 8350 is overclocked to 4.6GHz and undervolted to 1.275V. It's rock-solid stable.
I pass through my RAID card to a freenas VM to run ZFS.
I run file, print, domain, firewall (Sidewinder), SSL VPN (F5 BIG-IP), an IPTV server with real-time transcoding to H.264, and a DLNA server (Mezzmo), again with real-time transcoding. I also run very large network simulations/development using things like Dynamips/virtual ASR/QEMU etc.

It spends a very large part of the day at 100% utilisation.

It has all features enabled including VT-d.
http://ark.intel.com/products/65719

I know that.

The price difference also allowed for an additional SSD, so I could mirror my guest OSes' drive.
 