Intel Skylake Platform

The base clock is lower because you have 2 extra cores.

Almost all processors are TDP bound regardless of the segment.
 
Wow, that's crap. We need another Athlon64.

We need to create other universes with different physics laws to run our computers in, or time manipulation where time runs faster in a bubble containing the CPU/APU; and of course develop FTL travel for information to enable lower latencies. Intercontinental online gaming and other uses (even plain web browsing) will be so much better once it's done.
 
The base clock is lower because you have 2 extra cores.

Almost all processors are TDP bound regardless of the segment.

Sure, we can compare it to the Core i7 QM processors.
Some could get very "slow": the 2630QM at 2.0GHz; the 3720QM is better, at 2.6GHz. In effect, rather than "turbo" I think of it as underclocking when all the cores are pushed.
So for web browsing and the like, and most gaming, you'll be at 3.7GHz on that Haswell-E. If the underclocking at 100% use bothers you, just change the base CPU clock in the BIOS; then it's up to you to make sure the cooling is adequate.
 
The base clock is lower because you have 2 extra cores.

Almost all processors are TDP bound regardless of the segment.

I understand that, but giving up 16% of your clock speed for just the extra 2 cores is rubbish IMO, especially when Haswell is supposed to be a more efficient architecture. And I very much doubt it's a true technical limitation; it's more a lack of motivation to push hard in the desktop space.

Given you're getting 33% more cores with 16% less clock speed than Ivy Bridge-E, and given that core scaling is usually quite a bit worse than clock scaling, I'd say there'll be a lot of situations where the Ivy is faster, and in the few where Haswell is faster it likely won't be by very much at all (unless AVX2 is being fully leveraged).
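To put rough numbers on that, here's a back-of-the-envelope sketch (mine, not benchmarks), assuming illustrative 3.6 GHz/6-core and 3.0 GHz/8-core parts, Amdahl-style scaling on a 90% parallel workload, and ignoring Haswell's IPC gains:

```python
# Back-of-the-envelope sketch, not benchmark data: assumed 6 cores at 3.6 GHz
# (Ivy Bridge-E style) vs. 8 cores at 3.0 GHz (Haswell-E style), Amdahl-style
# scaling on an illustrative 90% parallel workload, ignoring IPC differences.
def relative_throughput(cores, clock_ghz, parallel_fraction=0.90):
    speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)
    return clock_ghz * speedup

print(f"6C @ 3.6 GHz: {relative_throughput(6, 3.6):.1f}")  # ~14.4
print(f"8C @ 3.0 GHz: {relative_throughput(8, 3.0):.1f}")  # ~14.1 -> slightly slower here
```

With these assumed inputs the extra cores don't quite pay for the lost clock; more parallel workloads (or AVX2-heavy ones) would tip it the other way.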
 
Given you're getting 33% more cores with 16% less clock speed than Ivy Bridge-E, and given that core scaling is usually quite a bit worse than clock scaling, I'd say there'll be a lot of situations where the Ivy is faster, and in the few where Haswell is faster it likely won't be by very much at all (unless AVX2 is being fully leveraged).
Let's be blunt though -- the "E" series has never been about being the fastest Intel desktop platform. The 3930k and the 3770k came out around the same time, but the "E" platform never did as well as the 3770k in what would be considered "normal" desktop apps.

Similarly with the 4930k and 4770k, the "E" platform is again really not competitive using standard desktop apps and games. The only time these "E" platforms really shine is if you have workloads that are massively threaded, need a ton of ram (and lots of usable bandwidth to it), and / or if you want to run a TON of PCI-E expansion cards (triple SLI plus a hardware raid card.)

The same will come with Haswell "E" -- that platform is all about epic memory bandwidth, massively threaded applications and the need for a shit-ton of PCI-E lanes at full speed. If your application load doesn't need that kind of hardware, then you're always better off to use the non-E platform.

The only reason I opted for the 3930k platform was to get all the PCI-E lanes and a ton more RAM. A hardware RAID card and a pair of video cards were what I was really aiming for, although I ended up with only one video card for now. And 64GB of RAM wasn't feasible on anything else outside of server boards.

Finally keep in mind that, at least in the ULV space, the Haswell 1.3GHz Core i5 series is performing at the same level as the Ivy Bridge 1.8GHz Core i5 series. That's a LOT of clock deficit to cover...
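To put a number on that deficit (just arithmetic on the clocks quoted above), matching 1.8 GHz from 1.3 GHz needs roughly 38% more work per clock, or turbo headroom to close the gap:

```python
# Per-clock uplift needed for a 1.3 GHz part to match a 1.8 GHz part (clocks quoted above)
ivb_clock = 1.8  # GHz, Ivy Bridge ULV Core i5
hsw_clock = 1.3  # GHz, Haswell ULV Core i5
print(f"{(ivb_clock / hsw_clock - 1) * 100:.0f}%")  # ~38%
```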
 
Let's be blunt though -- the "E" series has never been about being the fastest Intel desktop platform. The 3930k and the 3770k came out around the same time, but the "E" platform never did as well as the 3770k in what would be considered "normal" desktop apps.

Yes, but he was comparing two E-platforms. With just 3.0GHz the Haswell-E has its hands full against the 49xx models.

Finally keep in mind that, at least in the ULV space, the Haswell 1.3GHz Core i5 series is performing at the same level as the Ivy Bridge 1.8GHz Core i5 series. That's a LOT of clock deficit to cover...

Isn't that because the turbo evens out the difference in those thermally restricted cases? Not claiming, just asking. In any case I wouldn't worry too much about the stock clocks of Haswell-E at the moment. If it's a decent overclocker, one should be able to get good performance out of it.
 
Don't the goals of 512-bit AVX run counter to creating a better mobile chip?

I would think that Intel would prioritize other things with their transistor budget -- a larger GPU, integrating more -- or all -- of the PCH, perhaps greater optimization for power over performance...

In addition, won't a lot of structures in the CPU core have to enlarge in order to feed the vector units? Intel managed to pull off some tricks to prevent enlarging the caches with Haswell, but isn't it unlikely that they'll be able to do the same with Skylake? We'd probably need a larger L1D, right? Maybe a larger L2?
 
If mobile were Intel's only focus, there might be a point to that. As well, unless 512-bit AVX is coming to their low power line, it wouldn't be a factor anyway. And while something like Haswell has mobile low power parts, their true mobile low power CPU cores are Silvermont based at the moment (or soon will be). Haswell has to service everything from servers to mobile. Silvermont is for mobile and ultra mobile.

Regards,
SB
 
I know that it's not their only goal, however my point is that such a move would likely take up a pretty significant portion of the transistor budget. Granted, they have a LOT of die real estate to play with, given how incredibly tiny they've made their processors over the past few generations. However, I am just curious as to how they're going to balance things out. I think it's pretty clear that AVX2 was a pretty big contributor to the TDP increase from Ivy to Haswell. AVX3 would further raise peak power, at least compared to a theoretical Skylake with 256 bit AVX. Overall power should be down, unless clock speeds increase significantly.

That's actually another question I have. I know that Intel's 14nm process details haven't been released yet (please, please IEDM 2013), but will we see increased clock speeds at the high end? When Intel moved to FinFETs, they actually treaded water with transistor performance at voltages over 1V, and regressed past a certain point. Theoretically, this penalty only applies once -- after that, "traditional" electrical performance should come back at the high end. In other words, 14nm FinFETs would outperform 32nm planar at the higher end of the voltage spectrum, despite the penalty associated with multigate devices. [sources available upon request]

So, unless we see more changes that reduce power consumption at the expense of performance at the higher end of things, 14nm should allow for higher clock speeds on the desktop. Not much, but maybe a couple of 100 MHz bumps.

Back to AVX3, am I correct in assuming that cache sizes would need to increase in order to provide more bandwidth to the execution engines? If so, we could theoretically see pretty decent performance increases from going from 32KB L1D to 64KB, and from 256KB L2 to 512KB, right? Maybe a small handful of percentage points (2-5%, ish)?
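On the bandwidth side, the per-cycle arithmetic is at least easy to sketch; the load port count and widths below are my assumptions about a hypothetical two-FMA AVX3 core, not disclosed Skylake specs:

```python
# Peak L1D load bandwidth needed to keep two vector FMA pipes fed per cycle.
# Illustrative arithmetic only; port counts and widths here are assumptions, not Skylake specs.
def load_bytes_per_cycle(vector_bits, load_ports=2):
    return load_ports * vector_bits // 8

print(load_bytes_per_cycle(256))  # 64 B/cycle  - a Haswell-class AVX2 core
print(load_bytes_per_cycle(512))  # 128 B/cycle - a hypothetical AVX3/AVX-512 core
```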

I've been hypothesizing that Skylake could bring pretty substantial core changes. For one, the Israeli Intel team is likely to be at the helm of Skylake. Two, Intel has very small dies -- Broadwell is less than 115 mm2. Intel hasn't had a die that small since Cedar Mill, and if we exclude Cedar Mill, it is likely the smallest "mainstream" die since the Pentium III. Three, Haswell was a relatively conservative update on a CPU level.

Even if Intel integrated the PCH (which would take up less than 30 mm2 on 14nm), there would be tons of room for more to be added. Intel could theoretically create 6 core parts at a reasonable cost. Unlikely, but feasible. We could see L4 get integrated, which could have interesting implications. We could see more fixed function hardware. Perhaps we could see more in the way of power supply being integrated.

Or, we could just see Intel keeping things conservative, as they have the past two generations. I don't really know what the best option would be as far as keeping ARM at bay, but I'd imagine that Skylake is crafted to do just that.
 
No point integrating the PCH into a server platform CPU - you just end up with needlessly replicated hardware.
 
Going to a 64KB L1 dcache would require a move to 16 ways to get rid of aliases. The impact on power would be non-negligible.
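For context, the constraint here is (I believe) the usual VIPT one: with 4 KB pages, each way of a virtually indexed, physically tagged L1 has to fit within a page so the index bits never come from the translated part of the address. A quick sketch of that arithmetic:

```python
# VIPT alias-free condition: each way must fit in one page, i.e. cache_size / ways <= page_size
PAGE_SIZE = 4 * 1024  # 4 KB pages

def min_ways(cache_size_bytes, page_size=PAGE_SIZE):
    return max(1, cache_size_bytes // page_size)

print(min_ways(32 * 1024))  # 8  -> an 8-way 32 KB L1D stays alias-free
print(min_ways(64 * 1024))  # 16 -> a 64 KB L1D needs 16 ways, as stated above
```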
 
yes it did.
No, it did not. Let me hold your hand (again):
Back to AVX3, am I correct in assuming that cache sizes would need to increase in order to provide more bandwidth to the execution engines?
Here, I am asking if it is necessary to expand cache sizes to support 512-bit AVX. This question has remained unanswered. Also, note the absence of anything related to power in my question here.

Again, note the absence of anything power related here:
If so, we could theoretically see pretty decent performance increases from going from 32KB L1D to 64KB, and from 256KB L2 to 512KB, right? Maybe a small handful of percentage points (2-5%, ish)?
At the top of my post, I have already acknowledged the power issue, where I was questioning the addition of AVX3 due to the area and power concerns, both of which conflict with Intel's striving for a better mobile microarchitecture. Since I had already assumed that 512-bit AVX3 would result in higher power consumption than 256-bit AVX2, Laurent06's statement that power consumption will increase as a result of a potentially necessary expansion of the caches was redundant, obvious, and contributed nothing to the discussion.
You will also increase latency. You might even go backwards in performance on more workloads than you go forward.
Increasing cache sizes may increase latency -- not "will." Please understand the difference between correlation and causation.

For example, AMD's K10 architecture has a 3-cycle 64KB L1 cache. If we used your logic, AMD's K10 would be breaking the laws of physics, because it somehow has lower latency than Intel's 32KB L1 caches, but is even larger in size.

Different microarchitectures have different needs. Design goals change from μarch to μarch.

Another bit: when Intel moved from a 3-cycle latency to a 4-cycle latency during their switch from Penryn to Nehalem, the average performance loss was only 2-3%, despite the L1 caches staying the same size. Modern microarchitectures are heavily built upon hiding latency.
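A crude way to see why one extra cycle costs so little (a toy model with assumed numbers, not how anyone actually measured it): only a fraction of instructions are loads, and the out-of-order core hides most of the added latency.

```python
# Toy model of the cost of a 3-cycle -> 4-cycle L1D (every number here is an assumption)
load_fraction    = 0.25  # roughly 1 in 4 instructions is a load
exposed_fraction = 0.10  # share of the extra cycle the out-of-order core fails to hide
base_cpi         = 1.0   # baseline cycles per instruction
extra_cycles     = 1     # 3 -> 4 cycles

slowdown = load_fraction * exposed_fraction * extra_cycles / base_cpi
print(f"{slowdown * 100:.1f}%")  # 2.5% with these assumptions, in line with the 2-3% figure
```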
 
If you don't get that power has become a potentially blocking issue that limits micro-architecture design space and that it can't be ignored, then there's not much that can be added to the discussion ;)
 
Let me hold your hand (again):

And let me hold your hand: stop being an ass. I'm not sure where the aggression is coming from (bad week at work?), but it's clearly not warranted. Let's try for civilized discussions.
 
It's actually well warranted, and I will let you know why in a PM. Although I do find it interesting that you're giving me the "do as I say, not as I do" spiel.

I'm aware of my shortcomings in life. ;) But ignoring me for a second, I've been following this thread and I don't understand why Laurent06 or anyone else needs to be talked down to like that. If you have a problem with certain individuals, ignore them and report posts you feel are against forum policy.
 
If you don't get that power has become a potentially blocking issue that limits micro-architecture design space and that it can't be ignored, then there's not much that can be added to the discussion ;)
I understand that power is a problem. That's the entire point of my original post. Had you read it, you might have avoided this whole issue.
 
For example, AMD's K10 architecture has a 3-cycle 64KB L1 cache. If we used your logic, AMD's K10 would be breaking the laws of physics, because it somehow has lower latency than Intel's 32KB L1 caches, but is even larger in size.

Different microarchitectures have different needs. Design goals change from μarch to μarch.

All other things being equal (do I really need to 100% specify everything?) you will increase latency. You will also notice I said latency, not clock cycles. How fast can you run that K10 L1D vs. a Nehalem L1D? While I'm sure there were other bottlenecks, K10 (on 45nm) couldn't clock as high as a D0 Nehalem while being on a process that, all things being equal, gives a clock speed advantage.
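To make the latency-versus-cycles point concrete (the clocks below are example numbers picked only to show the conversion, not exact SKU figures): a cycle count only becomes a latency once you divide by the clock the design actually sustains, and the comparison moves with whichever clocks each L1 design permits.

```python
# Cycle counts only become comparable once converted to ns at the clock each design sustains.
def latency_ns(cycles, ghz):
    return cycles / ghz  # example clocks below are mine, not exact SKU figures

print(f"{latency_ns(3, 3.4):.2f} ns")  # 3-cycle L1D on an assumed 3.4 GHz core -> ~0.88 ns
print(f"{latency_ns(4, 4.2):.2f} ns")  # 4-cycle L1D on an assumed 4.2 GHz core -> ~0.95 ns
```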

Another bit: when Intel moved from a 3-cycle latency to a 4-cycle latency during their switch from Penryn to Nehalem, the average performance loss was only 2-3%, despite the L1 caches staying the same size. Modern microarchitectures are heavily built upon hiding latency

You're ignoring all the other improvements that help recover from that increase, and the L1 exists entirely for the purpose of things you can't hide latency for. If the prefetchers and predictors are already good enough to hide that latency increase, what do you gain by increasing the L1D? It's not going to help that pathological worst case; about the only thing I can think of is power, but then that needs to be compared to the increased power you're using in your L1D to increase its size while keeping all its other characteristics the same.

I would think getting scatter and gather down pat would be a far bigger performance improver for SIMD while hopefully not hurting legacy code.

The thing I reckon would lead Intel to increase the L1D and L2 is cache throughput. I'm sure there's some point where throughput in the cache structure gets high enough that, at the current cache sizes, data can't be held for as long as desired while hitting good memory system utilization, and then you would start wasting both power and performance.


You still holding my hand? Maybe you're suffering from your own confirmation bias :runaway:

/joke don't have an aneurysm
 
Yes, but he was comparing two E-platforms. With just 3.0GHz the Haswell-E has its hands full against the 49xx models.

Overclocked comparisons would be the real deal, because these are enthusiast platforms. The differences in clock at max overclocks might be a lot less than at base clocks.

I think it's pretty clear that AVX2 was a pretty big contributor to the TDP increase from Ivy to Haswell.

I think this also has to do with where they are focusing. It's increasingly about lower clocks than before, which benefits server and mobile, the markets with a lot more potential.

Examples:

U series: Core i5 4300U and Core i7 4600U
Y series: Core i5 4300Y and Core i7 4610Y

These parts have equal or higher clocks than their 3rd-gen Core predecessors, at lower TDP while including the PCH. In the case of the 4610Y, it's clocked 13% higher in base and 11.5% higher in turbo. That, when including perf/watt gains, works out to roughly a 25% power reduction with a 10% performance increase in U and a 28% power reduction with a 20% performance increase in Y!
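Spelling out what those last figures imply for perf/watt (just combining the quoted percentages, no new data):

```python
# Perf/watt implied by the percentages quoted above (illustrative arithmetic only)
def perf_per_watt_gain(perf_change, power_change):
    return (1 + perf_change) / (1 + power_change) - 1

print(f"U series: {perf_per_watt_gain(0.10, -0.25) * 100:.0f}%")  # ~47%
print(f"Y series: {perf_per_watt_gain(0.20, -0.28) * 100:.0f}%")  # ~67%
```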
 