Intel i9 7900x CPUs

3dilettante · Jul 20, 2017

CarstenS said:
Do prefetches generally feed into the L3 first and from there L2 is populated or is it a general rule that you prefetch into L2 directly even when you have an inclusive LLC behind it?

From the high level summary of the caching model, it would be undesirable for the L2 to have a line populated and accessible before the L3 and its coherence/snoop information is populated with a state consistent with the L2's status and core-use information.

The L2 prefetcher's fetches create an L2 miss, which in what I think is the more straightforward implementation would then go to the L3 to see if the data is there.
If it is not there, the L3 slice could generate an L3 miss that would then send a request to memory, or broadcast a request in an SMP setup.
The listed behavior would readily fall out of this chain if the sequence is maintained and the rule is that the prefetcher's miss can be discarded while the L3 slice's miss cannot.
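To make that chain concrete, here is a toy model of the sequence as I described it. It is only a sketch of the reasoning, not of what the hardware does: the class name `ToyHierarchy`, the `drop_l2_fill` flag, and the log strings are all made up for illustration, and real coherence state is reduced to two sets of line addresses.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHierarchy:
    """Hypothetical two-level model: an L2 backed by an inclusive L3."""
    l2: set = field(default_factory=set)
    l3: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def l2_prefetch(self, line, drop_l2_fill=False):
        """Follow a prefetch through L2 miss -> L3 lookup -> memory."""
        if line in self.l2:
            self.log.append("L2 hit")
            return
        self.log.append("L2 prefetch miss")
        if line in self.l3:
            self.log.append("L3 hit")
        else:
            # Assumed rule: once the L3 slice has missed, its request
            # always completes and the line is installed in the L3.
            self.log.append("L3 miss -> memory")
            self.l3.add(line)
            self.log.append("L3 fill")
        if drop_l2_fill:
            # Assumed rule: the prefetcher's own miss may be discarded.
            self.log.append("L2 fill dropped")
        else:
            self.l2.add(line)
            self.log.append("L2 fill")


h = ToyHierarchy()
h.l2_prefetch(0x40, drop_l2_fill=True)   # ends up in the L3 only
h.l2_prefetch(0x80)                      # ends up in both levels
print(h.log)
```

In this model the L3 is always populated before the L2, so the L2 contents stay a subset of the L3 contents whichever way the prefetcher's request ends.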

I'm not sure if that means a message or signal is sent out to actively cancel the L2's request or the L2 makes a note of ignoring or rejecting it. Ignoring it might work since Intel's level of inclusion is not total, and the cores can silently evict lines without telling the L3. This would be like a preemptive eviction of a line. At worst, that leads to redundant snoops or invalidates that yield nothing.

That might not be what the hardware necessarily does. There are events separated by variable amounts of time, and it may be possible to shift events around or bypass stages as long as the arbitrating hardware properly isolates intermediate states or recovers from problems. For Intel's inclusive L3, the slices each have agents that manage that arbitration, so it would seem like the L3 would be the nearest place to update when the caching agent starts processing the transaction.

Malo said:
In the Alerts menu (the flag on the top bar), it will state "[username] mentioned you in [thread title]"

I'm not sure if any of the mentions yesterday were meant to have gone through or if mentions don't archive, since my list doesn't seem to have any from this thread.

xEx · Jul 21, 2017

Be careful with the OC...

That was 1.25v on a 7800X

CarstenS · Jul 23, 2017

There's something else wrong with that CPU/board then. 1.25v is the default VID for 4.5 GHz 2c-TBM3 in SKX.

Clukos · Jul 23, 2017

Voxilla · Jul 28, 2017

Meanwhile I also got a retail 7820X octa core.
I can confirm it indeed has both FMA AVX 512 units enabled.

Here is an up-to-date version of an AVX 512 Julia/Mandelbrot real-time zoomer I made.
Computations are done in double precision.
Compared to a Titan XP, it runs twice as fast.

Warning: when running all 8 cores at 4 GHz, CPU power goes up to 208 W!

Edit
- Replaced it with a slightly less optimized version to reduce the heat: about 10% less heat and speed.
(My cooler can't cope, with the CPU at ~100 degrees Celsius.)
- Added a fallback to AVX2 if AVX512 is not present
- Added a missing libmmd.dll (I had to use an Intel compiler and didn't find a way to get rid of this DLL)

entity279 · Jul 28, 2017

Clukos said:

I'm at my post ..

http://www.eurogamer.net/articles/d...view-core-i9-7900x-i7-7820x-i7-7800x-i7-7740x

Alexko · Jul 30, 2017

Voxilla said:
Meanwhile I also got a retail 7820X octa core.
I can confirm it indeed has both FMA AVX 512 units enabled.

Here is an up-to-date version of an AVX 512 Julia/Mandelbrot real-time zoomer I made.
Computations are done in double precision.
Compared to a Titan XP, it runs twice as fast.

Warning: when running all 8 cores at 4 GHz, CPU power goes up to 208 W!

Edit
- Replaced it with a slightly less optimized version to reduce the heat: about 10% less heat and speed.
(My cooler can't cope, with the CPU at ~100 degrees Celsius.)
- Added a fallback to AVX2 if AVX512 is not present
- Added a missing libmmd.dll (I had to use an Intel compiler and didn't find a way to get rid of this DLL)

Very interesting! What happens if you keep the optimized code path, but downclock and undervolt the CPU a bit?

CarstenS · Jul 30, 2017

4 GHz under AVX512 load, with power left unlimited by the UEFI, sounds very much like the MSI X299 board. Other boards enforce the 140 W TDP, downclocking the 7900X, for example, to 3.1-3.2 GHz in AVX512 loads.

Voxilla · Jul 31, 2017

Alexko said:
Very interesting! What happens if you keep the optimized code path, but downclock and undervolt the CPU a bit?

I'll be adding the fully optimized version, so it can be tried on CPUs with safer settings.
The additional optimization is 4-way interleaving of computations. The fractal computations are one long dependency chain and the FMAs have 4 or 6 cycles of latency. Interleaving and SMT mitigate the dependencies.
The less optimized version does only 2-way interleaving.

I'd like to keep my CPU at 4 GHz for AVX512; it is only a problem for this extreme kind of code.
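For anyone who wants to see what interleaving means structurally, here is a minimal scalar sketch in Python. It is not the zoomer's code (that is AVX512 and not posted here); the function names and the escape test `|z|^2 > 4` are my assumptions for illustration. Each point's `z = z*z + c` update depends on its own previous value, so a single point is one serial chain, while several points stepped in the same loop body give the hardware independent work to overlap. Python itself gains nothing from this, it only shows the loop shape.

```python
def escape_count_single(c, max_iter=256):
    """Reference: one point, one long dependency chain."""
    zr = zi = 0.0
    for n in range(max_iter):
        if zr * zr + zi * zi > 4.0:
            return n
        zr, zi = zr * zr - zi * zi + c.real, 2.0 * zr * zi + c.imag
    return max_iter


def escape_counts_interleaved(cs, max_iter=256):
    """Step several independent points in lockstep (N-way interleaving)."""
    zr = [0.0] * len(cs)
    zi = [0.0] * len(cs)
    counts = [0] * len(cs)
    for _ in range(max_iter):
        any_active = False
        for k, c in enumerate(cs):
            r, i = zr[k], zi[k]
            if r * r + i * i > 4.0:
                continue  # this chain has escaped; leave it frozen
            # The update of chain k does not depend on any other chain.
            zr[k] = r * r - i * i + c.real
            zi[k] = 2.0 * r * i + c.imag
            counts[k] += 1
            any_active = True
        if not any_active:
            break
    return counts


points = [0j, 2 + 0j, 1 + 0j, -1 + 0j]   # a 4-way interleaved group
print(escape_counts_interleaved(points, 50))
```

Passing two points instead of four gives the 2-way variant; the results per point are the same either way, only the amount of independent work per loop iteration changes.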

Voxilla · Jul 31, 2017

CarstenS said:
4 GHz under AVX512 load, with power left unlimited by the UEFI, sounds very much like the MSI X299 board. Other boards enforce the 140 W TDP, downclocking the 7900X, for example, to 3.1-3.2 GHz in AVX512 loads.

Indeed, the board is an MSI X299 Tomahawk. I'm running with Enhanced Turbo on, which means all cores normally run at 4.3 GHz. AVX512 would not run at that frequency. To fix that I set 'AVX offset' to -3, which reduces the frequency to 4 GHz when running AVX/AVX512.
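The arithmetic behind those numbers, as a small sketch. It assumes the offset is counted in multiplier bins on a 100 MHz base clock; the post does not state the BCLK, so that value and the helper name `avx_clock_mhz` are assumptions.

```python
BCLK_MHZ = 100  # assumption: stock base clock, not stated in the post


def avx_clock_mhz(all_core_ratio, avx_offset_bins, bclk_mhz=BCLK_MHZ):
    """Clock under AVX load after subtracting the offset from the ratio."""
    return (all_core_ratio - avx_offset_bins) * bclk_mhz


# 43x all-core ratio (4.3 GHz) with an offset of 3 bins
print(avx_clock_mhz(43, 3))
```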

CarstenS · Jul 31, 2017

I see. 4.0 GHz is still all-core turbo and not what non-insane UEFIs use. No wonder you're having problems cooling that amount of heat with air.

Any chance you could make that more optimized torture version of your Mandelbrot/Julia renderer available again? And does it tax GPUs equally heavily? For now, your Waves3D is hammering GPUs the most, even though it largely depends on bandwidth.

Voxilla · Jul 31, 2017

CarstenS said:
I see. 4.0 GHz is still all-core turbo and not what non-insane UEFIs use. No wonder you're having problems cooling that amount of heat with air.
Any chance you could make that more optimized torture version of your Mandelbrot/Julia renderer available again? And does it tax GPUs equally heavily? For now, your Waves3D is hammering GPUs the most, even though it largely depends on bandwidth.

I'll be adding the fully optimized version tonight.
The CPU and GPU code are very similar. On GPUs there is no explicit interleaving of computations but I would think the inherent threading takes care of FMA dependencies, so it's likely optimal on GPUs too.

Voxilla · Jul 31, 2017

OK, I've updated the AVX2 / AVX512 / GPU fractal zoomer to include the fastest AVX512 computation.
This can be toggled on/off with the 'F' key. You may have to disable waiting for vsync to see the difference (V key).
Warning: this code can produce extreme heat, even more than prime95. Use at your own risk!

CarstenS · Jul 31, 2017

Thanks! If I find the time, I'll test it against my current worst case tomorrow (but with a tamer UEFI that honors the 140 W TDP; I'm measuring achieved clock rates here instead).

sebbbi · Aug 3, 2017

http://www.anandtech.com/show/11687/coffee-lake-not-supported-by-intels-200series-motherboards

The forthcoming Coffee Lake (6-core / 12-thread non-HEDT consumer chips) needs new motherboards. This makes HEDT 6/8-core and Ryzen much more appealing upgrade options for many consumers, since you can't simply plug the new Coffee Lake 6-core into your existing Skylake 6600K/6700K socket. Someone needs to update the Wikipedia page (https://en.wikipedia.org/wiki/LGA_1151).

I was also considering the highest clocked 6-core Coffee Lake as a cost-effective upgrade path for our non-programmers (we all have Skylake 6700K now). I will get myself a Threadripper in any case, but now it seems that Threadripper would be a pretty good upgrade path for all of us (that 12-core / 24-thread model at $799 is very aggressively priced).

Malo · Aug 3, 2017

Gee what a surprise.

Voxilla · Aug 3, 2017

sebbbi said:
http://www.anandtech.com/show/11687/coffee-lake-not-supported-by-intels-200series-motherboards

The forthcoming Coffee Lake (6-core / 12-thread non-HEDT consumer chips) needs new motherboards. This makes HEDT 6/8-core and Ryzen much more appealing upgrade options for many consumers, since you can't simply plug the new Coffee Lake 6-core into your existing Skylake 6600K/6700K socket. Someone needs to update the Wikipedia page (https://en.wikipedia.org/wiki/LGA_1151).

I was also considering the highest clocked 6-core Coffee Lake as a cost-effective upgrade path for our non-programmers (we all have Skylake 6700K now). I will get myself a Threadripper in any case, but now it seems that Threadripper would be a pretty good upgrade path for all of us (that 12-core / 24-thread model at $799 is very aggressively priced).

What do you use the large number of threads for?

sebbbi · Aug 3, 2017

Voxilla said:
What do you use the large number of threads for?

A UE4 code recompile takes 25 minutes on a 6700K. A shader recompile (console target) takes over an hour (UE4 has so many shader permutations). Data cooking is also slow on a quad-core (I have a fast SSD, obviously). There are many console platforms + PC + debug/release, so plenty of these operations happen. A quad-core loses 30+ minutes of your time every day, and more than an hour on bad days.

Alexko · Aug 4, 2017

What would non-programmers do with all that power?

BRiT · Aug 4, 2017

Play Crysis ...
