Haswell vs Kaveri

HD 5000's base clock is halved (200 MHz) compared to the HD 4000 series (400 MHz). With twice as many EUs, the GPU can run at half the clocks and still provide the same performance. As everyone here knows, double clocks consume more than double the power (4x is closer to the general case). So Intel actually traded die area for power savings. I think it was worth it, as the MBA 2013 has outstanding 12 hour battery life (+5 hours over last year's model AND slightly better performance).
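A back-of-envelope sketch of that clock-vs-power trade-off (a minimal illustration assuming P ≈ C·V²·f and that voltage roughly tracks frequency; the scaling factors below are made up, not measured HD 4000/5000 figures):

```python
# Dynamic power model: P ~ C * V^2 * f, with voltage assumed to rise/fall
# alongside frequency. All factors are illustrative only.
def rel_power(freq_scale, volt_scale):
    return freq_scale * volt_scale ** 2

# Doubling the clock typically needs a higher voltage too -> roughly 4x the power:
print(rel_power(2.0, 1.4))        # ~3.9x

# Haswell's trade: twice the EUs at half the clock (and a lower voltage):
print(2 * rel_power(0.5, 0.8))    # ~0.64x the power for the same throughput
```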

Sadly we still don't have any real power usage numbers during gaming. I don't get why Anand can't just do that.

HD 5000 (GT3) can still turbo clock up to 1100-1300 MHz in cases where the (15W) TDP allows it. But that shouldn't occur as often as it did in the past (but at 650 MHz it should already offer similar performance).
But this isn't really happening, is it? If the GPU has a certain TDP it can boost to, it'll boost to it and the results should be seen in higher fps. If HD 5000 offers similar performance at 650 MHz compared to HD 4000 at ~1200 MHz, it should boost higher to increase performance.

Performance is higher and the TDP is a little bit lower, but it's nowhere near what the doubling of EUs should deliver without TDP restrictions.
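To make the "boosts until it hits TDP" argument concrete, here's a toy model of a TDP-limited turbo (the wattage figures and the power curve are assumptions for illustration, not Intel data; real turbo uses on-die power monitoring):

```python
# Toy TDP governor: raise the GPU clock until the estimated package power
# hits the limit, then stop.
TDP_W = 15.0
CPU_W = 5.0                              # assumed CPU share while gaming
BASE_MHZ, MAX_MHZ, STEP = 200, 1300, 50

def gpu_power(mhz):
    # assumed near-cubic scaling of dynamic power with clock (voltage tracks f),
    # pinned to ~4 W at the 650 MHz "sweet spot"
    return 4.0 * (mhz / 650.0) ** 3

clock = BASE_MHZ
while clock + STEP <= MAX_MHZ and CPU_W + gpu_power(clock + STEP) <= TDP_W:
    clock += STEP
print(f"Settled GPU clock: {clock} MHz")  # lands well below the 1300 MHz turbo ceiling
```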

Also without Crystalwell the GPU will be severely bandwidth bound at maximum clocks, lowering the potential performance gains even further. Anand's tests with GT3e showed that GT3 is TDP bound even at 47W. He increased the TDP to 55W (using Intel Extreme Tuning Utility) and it brought noticeable gains in games (but not that much in pure GPU synthetic benchmarks, as the CPU half is pretty much idling in those and gives its TDP to the GPU).
I'm not sure... I have a hard time believing sub-20W parts are bandwidth bound by that much even with their consistent yearly improvements, and I think we'll see much better gains (~30%) for Haswell at 28W with the same bandwidth.
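For a rough sense of the bandwidth question, here's a back-of-envelope roofline-style ratio for a GT3-class part on dual-channel DDR3 (the EU count, flops-per-EU figure and bandwidth are assumptions commonly cited for this class of hardware, not official specs):

```python
# Rough compute-vs-bandwidth ratio: higher clocks need a higher arithmetic
# intensity to stay compute-bound on the same memory bus.
EUS, FLOPS_PER_EU_CLK, BW_GBS = 40, 16, 25.6   # assumed figures

for mhz in (650, 1300):
    gflops = EUS * FLOPS_PER_EU_CLK * mhz / 1000
    ratio = gflops / BW_GBS
    print(f"{mhz} MHz: ~{gflops:.0f} GFLOPS -> ~{ratio:.0f} flops per byte of "
          f"DRAM traffic needed to stay compute-bound")
```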
 
Yeah as usual sebbbi nails it and his comments are in-line with my experience.

The most interesting thing about the 15W GT3 parts, in my opinion, is their ability to run more stuff v-sync'd without spinning up the fan (i.e. at ~600 MHz GT frequency or similar). In cases where GT2 has to turbo up and generate a lot of heat, GT3 can do it while keeping the system cool. Definitely useful for more casual gaming or running old stuff (I play a lot of Myth II still :)).

Anand's tests with GT3e showed that GT3 is TDP bound even at 47W. He increased the TDP to 55W (using Intel Extreme Tuning Utility) and it brought noticeable gains in games (but not that much in pure GPU synthetic benchmarks, as the CPU half is pretty much idling in those and gives its TDP to the GPU).
Right, if you're using a non-trivial amount of CPU (games tend to), it's even TDP bound at 55W, hence the 65W R-series version :)

It really is a power/cooling game at this point. The exact same chip can perform vastly differently depending on the quality of the chassis/cooling it is paired with. With configurable TDP and such large turbo ranges, there are definitely hills and valleys in the "user experience" landscape.

And let me say, running with v-sync off is a valley. While I often do this for FPS games on my desktop/discrete setup, it is absolutely not a good idea on any TDP/thermally-constrained parts. It's better to pick a target frame rate and hit it, console-style. We'll definitely have some game developer education to do here.
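A minimal sketch of that console-style approach (a real game would wait on the swap chain's v-blank/present interval rather than sleep(), and the frame work here is just a placeholder):

```python
# Pick a target frame rate and sleep out the rest of each frame instead of
# rendering flat-out.
import time

TARGET_FPS = 30
FRAME_BUDGET = 1.0 / TARGET_FPS

def simulate_and_render():
    pass  # placeholder for the real per-frame work

while True:
    start = time.perf_counter()
    simulate_and_render()
    elapsed = time.perf_counter() - start
    if elapsed < FRAME_BUDGET:
        # this idle time is what lets a TDP-constrained chip stay cool
        time.sleep(FRAME_BUDGET - elapsed)
```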

The oddity for benchmarking as well is that things like time-demos start to become a bad way to represent the quality of the user experience, since they just max out TDP and heat up the chip, often for no real gain over running a game simulation at "normal" rates v-synced. We're going to have to evolve benchmarking as well.
 
Yeah as usual sebbbi nails it and his comments are in-line with my experience.

The most interesting thing about the 15W GT3 parts, in my opinion, is their ability to run more stuff v-sync'd without spinning up the fan (i.e. at ~600 MHz GT frequency or similar). In cases where GT2 has to turbo up and generate a lot of heat, GT3 can do it while keeping the system cool. Definitely useful for more casual gaming or running old stuff (I play a lot of Myth II still :)).

I didn't know about this (v-sync) but it's similar to something I advocated that AMD might end up doing in order to save power with Llano on mobile, many moons ago on another forum. Kudos to you for making it happen first.

Right, if you're using a non-trivial amount of CPU (games tend to), it's even TDP bound at 55W, hence the 65W R-series version :)

It really is a power/cooling game at this point. The exact same chip can perform vastly differently depending on the quality of the chassis/cooling it is paired with. With configurable TDP and such large turbo ranges, there are definitely hills and valleys in the "user experience" landscape.
Yep, and the less you have to work with at the start, the less you can expect to gain. I'm not surprised by the smallish gains here even with the doubled resources; tbh, as an overall package it's pretty good for the same node.

The oddity for benchmarking as well is that things like time-demos start to become a bad way to represent the quality of the user experience, since they just max out TDP and heat up the chip, often for no real gain over running a game simulation at "normal" rates v-synced. We're going to have to evolve benchmarking as well.
I'm glad to see you say that because I've always felt like we're being had somewhat by some of these ULV benchmarks.
 
I didn't know about this (v-sync) but it's similar to something I advocated that AMD might end up doing in order to save power with Llano on mobile, many moons ago on another forum. Kudos to you for making it happen first.
It's nothing specific to Haswell really, just the fact that burning extra power beyond the display refresh rate (or whatever target frame rate) is really harmful on these thermally constrained platforms. Instead, letting things go idle to wait for the next v-blank vastly improves the overall end user experience.

A big offender currently is menus. Lots of games run the menu unthrottled at hundreds of FPS, so by the time you even get into the game you've already heated up the chip and have the fan spinning at maximum.

I'm glad to see you say that because I've always felt like we're being had somewhat by some of these ULV benchmarks.
Yeah, I've been a big proponent of the switch to more experience-based metrics like frame time variance, etc., but it goes even further for ULV-type stuff. Ultimately games need to be part of the solution here too, instead of their typical "run at max performance and use every last hardware resource I have", but a lot of that is driven outwards-in by reviews, so the change probably needs to start happening there first.
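A tiny sketch of what those experience-based metrics look like when computed from a frame-time log (the numbers in frame_times are made up purely to show the calculation):

```python
# Average, high-percentile and variance of per-frame times in milliseconds.
import statistics

frame_times = [16.7, 16.9, 16.6, 33.4, 16.8, 17.0, 16.7, 45.1, 16.6, 16.7]

avg   = statistics.mean(frame_times)
p95   = sorted(frame_times)[int(0.95 * (len(frame_times) - 1))]
stdev = statistics.pstdev(frame_times)

print(f"avg {avg:.1f} ms (~{1000 / avg:.0f} fps), 95th percentile {p95:.1f} ms, "
      f"frame-time stdev {stdev:.1f} ms")
```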
 
It's nothing specific to Haswell really, just the fact that burning extra power beyond the display refresh rate (or whatever target frame rate) is really harmful on these thermally constrained platforms. Instead, letting things go idle to wait for the next v-blank vastly improves the overall end user experience.

A big offender currently is menus. Lots of games run the menu unthrottled at hundreds of FPS, so by the time you even get into the game you've already heated up the chip and have the fan spinning at maximum.

Good point. Also, you maybe just gave away the reason why Intel recently powerpointed Silvermont with much better power consumption vs ARM during the Angry Birds menu (I noticed this straight away but until now was unsure of the exact reason behind it). Shh, I'll not say anything! :D

Yeah, I've been a big proponent of the switch to more experience-based metrics like frame time variance, etc., but it goes even further for ULV-type stuff. Ultimately games need to be part of the solution here too, instead of their typical "run at max performance and use every last hardware resource I have", but a lot of that is driven outwards-in by reviews, so the change probably needs to start happening there first.
Yeah, it's all about bigger numbers. To be frank, I don't even think people want to know the truth; they just want to see simple bars showing x is better than y. The tech press is mostly giving them what they want.
 
I tried playing some games on HD 4000 for the first time this weekend. I've never actually tried modern Intel graphics until now... It runs old Guild Wars 1 at 1080p with 4xAA very well. It can't handle Forged Alliance adequately though, even at 1360x768. That puts it pretty low end, since I know an old GF 8600 runs the game better.

I'd really like to play with GT3e.

I bought an open-box HP DV6t 7000: 15.6" 1080p, i7 3610, GT 650. Selling it back on eBay though, because it gets too damn hot when actually using those components heavily. WASD gets so hot my fingers feel a little burning sensation lol. Crap cooling design.

But playing with HD 4000 and seeing its relative competence has me thinking even more about APUs. So does the extreme heat of modern discrete GPUs and their challenges in sub-17" notebooks.
 
Another thing I just remembered is since I've been looking at notebooks lately I've read many reports of throttling issues caused by inadequate cooling capacity. It seems that notebook designers have found a new way to go cheap - by intentionally leaning on modern thermal features. The hardware won't fry like in the good old days, but in some cases people are seeing performance drop off a cliff. I've seen it reported with all mixes of hardware.
 
It's nothing specific to Haswell really, just the fact that burning extra power beyond the display refresh rate (or whatever target frame rate) is really harmful on these thermally constrained platforms. Instead, letting things go idle to wait for the next v-blank vastly improves the overall end user experience.
Yes, and properly waiting for v-sync will actually improve the game's (minimum) frame rate on new CPUs, since the CPU and GPU will not constantly be trying to run at the TDP limit. Intel's chips can momentarily run over TDP if needed, and thus prevent those occasional frame hiccups you often encounter in games (for example an explosion near the camera).

Haswell actually does slightly better with v-sync (or a capped frame rate in general) than previous CPUs, since it can enter and resume from the new S0ix power-saving states very quickly. If the game is not CPU bound, the CPU can spend most of the frame time in S0ix, saving a huge amount of power. For example (assuming 60 fps v-sync): if the CPU can process a frame in 5ms, it can spend the remaining ~8ms of the frame in S0ix (S0ix resume takes around 3ms). A quad core laptop Haswell CPU should easily crunch through the frames of current generation console ports in less than 5ms, allowing the CPU to save lots of power (that can be used to improve integrated GPU performance, as the TDP is shared).
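The budget arithmetic spelled out (a sketch using the figures from the post above; the 5ms CPU time and ~3ms resume latency are the assumed values, not measurements):

```python
# Frame-time budget at 60 Hz v-sync.
frame_ms  = 1000 / 60           # ~16.7 ms per frame
cpu_ms    = 5.0                 # assumed CPU simulation + submit time
resume_ms = 3.0                 # assumed S0ix exit latency

idle_ms = frame_ms - cpu_ms - resume_ms
print(f"Possible S0ix residency: {idle_ms:.1f} ms per frame "
      f"({idle_ms / frame_ms:.0%} of the frame)")   # ~8.7 ms, roughly half the frame
```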
 
A quad core laptop Haswell CPU should easily crunch through the frames of current generation console ports in less than 5ms, allowing the CPU to save lots of power (that can be used to improve integrated GPU performance, as the TDP is shared).
Do quad cores even support S0ix? Though even C-states should give a lot of power savings. Or are C-states too slow?
 
A quad core laptop Haswell CPU should easily crunch through the frames of current generation console ports in less than 5ms, allowing the CPU to save lots of power (that can be used to improve integrated GPU performance, as the TDP is shared).

Only the U and Y series chips have S0ix...
 
Is SwiftShader still maintained and updated? That would be interesting. It could be feasible to actually play existing games on a software rasterizer.

Likewise, llvmpipe is a software OpenGL implementation intended to be usable. It's the default on recent Linux distros if you don't have a suitable driver (or your computer is misconfigured). AVX2 llvmpipe could be fun to look at.
I ran llvmpipe on my Athlon II X2 2.9GHz, and it gave me a somewhat smooth 15 to 20 fps in Counter-Strike 1.6 (fun how it was consistent instead of spiking here and there). A couple of days ago I saw it on a dual-core Atom too (whose Linux driver is a 2D-only one, without even basic video acceleration support). It gives "near real time" performance in Google Earth (very slow but nominally usable).
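For anyone who wants to try it, a quick sketch of forcing Mesa's llvmpipe on a Linux box (LIBGL_ALWAYS_SOFTWARE is Mesa's standard switch for this; glxinfo -B just prints the renderer summary so you can confirm it took):

```python
# Launch a GL app (here just glxinfo) with Mesa forced to its software
# rasterizer; the reported renderer string should mention llvmpipe.
import os
import subprocess

env = dict(os.environ, LIBGL_ALWAYS_SOFTWARE="1")
subprocess.run(["glxinfo", "-B"], env=env, check=True)
```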

Not sure why you would want to do that on Haswell, unless you're a 3D dev or doing it for fun. Or you're bored and have an idle 24-core Haswell server (no GPU) - why not try to play a game and pipe it through VNC or RDP?
 
From the SwiftShader 2013 whitepaper: "Intel’s Haswell chips, available later this year, will include three 256-bit wide SIMD units per core, two of which are capable of a fused multiply-add operation. This arrangement will process up to 32 floating-point operations per cycle: with four cores on a mid-range version of this architecture, this provides about 450 raw GFLOPS at 3.5 GHz. Intel’s AVX2 instruction set offers room to increase the SIMD width size to 1024 bits, which would put the raw CPU GFLOPS at similar levels to the highest end GPUs currently available."


At least they are thinking about it.
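The 450 GFLOPS figure is easy to sanity-check; a quick sketch of the whitepaper's arithmetic (the 2-FMA-unit, 4-core and 3.5 GHz figures come straight from the quote):

```python
# Two FMA-capable 256-bit units per core
# -> 2 units * 8 floats * 2 ops (mul + add) = 32 flops per clock per core.
floats_per_reg = 256 // 32
fma_units      = 2
flops_per_clk  = fma_units * floats_per_reg * 2     # 32

cores, ghz = 4, 3.5
print(flops_per_clk * cores * ghz)                   # 448.0 -> the "about 450" GFLOPS
```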
 
Maybe I missed it, but I did not know that a GT1.5 exists:
http://static.myce.com/images_posts/2013/07/page-7.jpg
I wonder whether dual-core mainstream notebook CPUs will have HD4400 or HD4600. I am very happy with my current laptop, which has a full-voltage dual-core Sandy GT2, but it weighs 1.7 kg and lasts for around six hours on battery. So I hope that mainstream duallies will get HD4600, or the choice will be between ultrabooks (which are underpowered), quad cores (which are too heavy, large and expensive), or cheap dual cores (with weak GPUs) :(
 
Has anyone tried running games in software mode with DirectX SDK on Haswell? Does AVX 2 get utilized at all?

The WARP renderer tops out at SSE4.1, so that would be a no. As for SwiftShader, IIRC they're still stuck in DX9 land, so not hugely interesting for the time being...unless you're Google and are in need of an efficient software rasterizer for your browser.
 
Where was GT1.5 mentioned before? I must have missed it. Also, in the official developer's guide, HD4200 is said to have 20 EUs. And if the desktop HD4200 has GT1.5 while the mobile HD4200 has GT2, it gets even more confusing.
 