Intel introduced AVX clock throttling with the high-core-count Haswell-E Xeons. And if my memory serves me right, they started with different frequencies for AVX256 / non-AVX256 with Broadwell? Or is it only with AVX512/Skylake-X?
Been slowly digesting the Anandtech article, there's an awful lot of doubling of stuff there that should at least help out with a bunch of corner cases.
Bunch of other improvements also.
There's a few places where it's not clear if they simplified the arrows, or there's something to be read into the diagram for the integer execution engine. The Load/Store block in particular has arrows that go to the retire queue, the forwarding mux, and register file.
TAGE branch predictor could be a big win. From what I've been reading it's apparently pretty bleeding-edge tech, much better than the perceptrons they've been using previously (vid above says it was intended for Zen3 but they brought it forward), though I did find a suggestion Intel has already been using this.
The TAGE predictor is a level-two predictor, meaning it is accessed after the initial prediction by the perceptron. Perhaps Zen3 has a similar arrangement, or the later addition with Zen2 meant it was easier to fit the larger TAGE one level further out from the inner prediction loop, since power was the supposed reason for keeping the perceptron as the initial predictor.
Regarding the AVX256: will they do double-rate 128bit?
The number of ports and dispatch width hasn't changed with the FPU, so I don't think it does.
Occurs to me this chiplet architecture is arguably a return to separate CPU-Northbridge-Southbridge.
I think this is the case, or at least I've not seen a strong enough distinction in terms of features or design behavior to make this appear any different from other cycles of integration and separation that happen over time.
According to the Anandtech article, apparently the scheduler tries to split them up to manage thermals, so there's no dedicated clock reduction like Intel has, though that doesn't rule out thermal throttling via the normal systems.
I think the cited mechanism is that the DVFS system uses activity monitors and built-in estimates for the power cost of instructions to determine what voltage and clock steps should be used, rather than a coarse change in clocking regime based on what category of instruction the decoder encounters.
There's a few places where it's not clear if they simplified the arrows, or there's something to be read into the diagram for the integer execution engine...
So could be they simplified/downgraded some bits that haven't been bottlenecked to help make space for the extra bits?
The number of ports and dispatch width hasn't changed with the FPU, so I don't think it does.
Yeah, that and it's probably the sort of thing they'd mention explicitly if it was double-rate.
This may help in certain cases where instructions that might be considered wide by the front end have internally lower costs for whatever reason. One possible area is using very wide AVX instructions to boost the performance of memory copies and clears, where a naive throttling of the core that makes sense for heavy ALU work hurts the memory optimization. However, I think more recent Intel cores have gotten better at subdividing AVX categories so that fewer optimizations are treated like very wide ALU ops.
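To make the difference between the two throttling styles concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the instruction classes, the relative power costs, the clock steps and the thresholds are made-up numbers, not anything AMD or Intel has published. It only shows why an activity-based estimate can leave a wide-but-cheap copy loop at full clock while a category-based rule would slow it down.

```python
# Hypothetical relative power cost per instruction class (made-up numbers).
POWER_COST = {"scalar": 1.0, "avx128": 1.5, "avx256_move": 1.2, "avx256_alu": 3.0}

# Hypothetical clock steps in MHz, fastest first, and the average-cost
# thresholds that push the core down one step each.
CLOCK_STEPS_MHZ = [4000, 3800, 3600, 3400]
THRESHOLDS = [1.3, 1.8, 2.4]


def category_clock(window):
    """Coarse policy: any 256-bit instruction in the window forces the lowest step."""
    if any(op.startswith("avx256") for op in window):
        return CLOCK_STEPS_MHZ[-1]
    return CLOCK_STEPS_MHZ[0]


def activity_clock(window):
    """Activity policy: choose a step from the estimated average cost of the window."""
    avg_cost = sum(POWER_COST[op] for op in window) / len(window)
    step = sum(avg_cost > t for t in THRESHOLDS)  # number of thresholds exceeded
    return CLOCK_STEPS_MHZ[step]


copy_loop = ["avx256_move"] * 8   # wide loads/stores, little ALU work
heavy_math = ["avx256_alu"] * 8   # sustained wide ALU work

for name, window in (("copy loop", copy_loop), ("heavy math", heavy_math)):
    print(name, category_clock(window), activity_clock(window))
```

With these invented numbers the coarse rule treats both loops the same, while the activity estimate only backs off for the heavy ALU case.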
Was it always possible to select the IF clock speed?
I don't believe so. It was never an option on my X370 but I don't have the high end OC board. Don't remember seeing it mentioned on OC threads ever.
IF in Zen1 was fixed at 1/2 DRAM transfer rate.
So, looks like only X500 series mobos will be able to set arbitrary IF divider. Legacy boards probably lack the dedicated clock generator for that. Dunno.
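For anyone who wants the numbers, the fixed 1/2 ratio works out as in this small Python sketch. The function names are made up for illustration, and the synchronous check simply assumes "synced" means the fabric clock equals the DRAM clock (half the DDR transfer rate).

```python
def zen1_fabric_clock_mhz(dram_transfer_rate_mts):
    """Fabric clock under the fixed 1/2-of-transfer-rate relationship."""
    return dram_transfer_rate_mts / 2


def is_synchronous(fabric_clock_mhz, dram_transfer_rate_mts):
    """True when the fabric clock equals the DRAM clock (half the transfer rate)."""
    return fabric_clock_mhz == dram_transfer_rate_mts / 2


for rate in (2133, 2666, 3200):
    print(f"DDR4-{rate}: fabric clock {zen1_fabric_clock_mhz(rate):.1f} MHz")
```

So DDR4-3200 puts the fabric at 1600 MHz, and any separately chosen fabric clock that doesn't match half the transfer rate would be running async.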
Async clock domains always incur a latency penalty during transition. Overclockers will probably try to keep IF and DRAM clocks synced as far as possible, for latency-sensitive benchmarks.
Pentium 4!
Kind of reminds me of the good old i875P chipset for the P4, which had a special "short path" mode when the FSB and DRAM were operating at the same clocks.
Memory latency and write speed are quite terrible. Hopefully it will perform much better on X570 with final BIOS.
Well we don't know the latency cost of having the separate I/O die.
It'd be interesting to see what this core could do with an onboard memory controller though.