Intel i9 7900x CPUs

Aida cache benchmark on an overclocked 7820x from oc.net:

f6244d69_li3ffc28buaj.jpeg


6950x for comparison: http://cdn.overclock.net/5/5b/5b83c75c_69aida.png

Looks like
  • L1 cache about the same (maybe a bit faster, not sure what difference clocks make)
  • L2 cache performance up, latency up as well
  • L3 cache performance significantly worse, latency up too
  • Higher read/write memory performance, latency potentially higher (different timings though so who knows)
 
I've had someone else benchmark an ES-i7-7820X for me and this particular chip seemingly had full AVX512 throughput, as does the i9. Monday hopefully I'll get my hands on a retail i7-7800X and finally be able to check.

edit: Alex Yee's FLOPS benchmark seems to indicate full AVX512 throughput with the retail i7-7800X
 
The Xeon Skylakes have been announced, including more detail about AVX-512.
There are versions both with 1 and 2 FMAs per core.
That makes the same scenario for Skylake X more plausible again :(
 
From what I've seen at a quick glance over the Xeon specs: the 1-FMA SKUs are mostly in the 85W-105W range, so it is possibly not a question of the additional FMA unit being physically present but of optimizing for lower power. The SKL-X desktop SKUs are supposedly all rated at 140W, even though CPU-Z reads our retail sample at 122W - we will see if this survives the next revision of CPU-Z.

If I had to guess though, I'd bet that the 1-FMA Xeons in fact have both the coupled and the "external" FMA unit enabled, but the two are used alternately at half clocks. That would probably save even more power than disabling one and running the other at full clocks.
 
There are quite a few Xeons at 205W that have 2 FMAs enabled.
The ones that have 2 are the Platinum and the Gold 61xx:

Platinum 81xx up to 28 cores 2xFMA 512
Gold 61xx up to 22 cores 2xFMA 512
Gold 51xx up to 14 cores 1xFMA 512
Silver 41xx up to 12 cores 1xFMA 512
Bronze 31xx up to 8 cores 1xFMA 512

I'm still eager to know whether the 6/8-core Skylake-X parts have 1 or 2 FMAs enabled.
It's quite baffling that these CPUs are put on the market without any explicit specification.

For the Xeons the number of enabled FMA units is specified:
https://ark.intel.com/products/123546/Intel-Xeon-Bronze-3104-Processor-8_25M-Cache-1_70-GHz
https://ark.intel.com/products/120497/Intel-Xeon-Platinum-8153-Processor-22M-Cache-2_00-GHz

Not so for Skylake-X:
https://ark.intel.com/products/1236...-series-Processor-13_75M-Cache-up-to-4_30-GHz
 
I linked our test below - full AVX512 throughput (i.e. 2 FMA) with a retail-marked, retail-sold i7-7800X. Have no 7820X yet.
 
This may be a silly question, but I have been unable to find an answer:
If there is only one FMA unit, is floating point throughput identical to previous CPUs with AVX2? (and, with two FMA units, doubled?).
Or is it that with one FMA unit, FP throughput is doubled vs AVX2 and with two FMA units, quadrupled?
 
The former. The two 256-bit AVX2 units that were already present in Haswell/Broadwell/Skylake/Kaby Lake are combined to work on 512-bit registers for AVX512F, so with one FMA unit the throughput matches AVX2. One additional 512-bit unit, which is present in Skylake-X and only some of the higher-tier Xeon offerings, doubles the throughput compared to previous Intel generations.
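
To put numbers on that, here is a minimal sketch of the peak-throughput arithmetic. It assumes double precision (64-bit elements) and counts each FMA as 2 floating-point operations per lane per cycle; the function name `flops_per_cycle` is made up for illustration and the figures are theoretical per-core peaks, not measurements.

```python
def flops_per_cycle(fma_units, vector_bits, element_bits=64):
    """Theoretical peak FLOPs per core per cycle.

    Assumption: every FMA unit retires one fused multiply-add per lane
    per cycle, and one FMA counts as 2 floating-point operations.
    """
    lanes = vector_bits // element_bits  # elements per vector register
    return fma_units * lanes * 2

# Two 256-bit FMA units (Haswell through Kaby Lake, AVX2)
avx2 = flops_per_cycle(fma_units=2, vector_bits=256)

# One 512-bit FMA (the two 256-bit units fused)
avx512_one_fma = flops_per_cycle(fma_units=1, vector_bits=512)

# Two 512-bit FMAs (fused pair plus the additional unit)
avx512_two_fma = flops_per_cycle(fma_units=2, vector_bits=512)

print(avx2, avx512_one_fma, avx512_two_fma)
```

Under these assumptions the 1-FMA case comes out equal to AVX2 and the 2-FMA case comes out at twice that, which is the "former" reading of the question.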
 
L3 cache latency is worse, and L3 size per core and performance are lower, in comparison to Kaby Lake and Broadwell-E. This probably affects games more than the increase in L2 cache does.
 
So it seems the new 6-core i7 7800X is significantly worse than the i7 7700K in several games, notably Far Cry Primal, Warhammer, GTA V and even Doom. The postulated reason is the significantly worse latency of the L3 cache.

[IMG ]https://techspot-static-xjzaqowzxao...rticles-info/1445/bench/AverageSlide.png[/IMG ]

https://www.techspot.com/review/1445-core-i7-7800x-vs-7700k/page9.html
Is that really their conclusion? I mean, it's not like all games scale perfectly with even four cores, let alone 6 or more. And then there's the slight clock speed advantage the 7740 enjoys, in addition to its L3 cache not only having better latency but also running at north of 4 GHz, compared to the 2.4 GHz at which the L3 cache in SKL-X runs.

I see the 7800X also behind in our games-testing, but I would not try to scapegoat the L3 alone.
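
A rough sketch of why the cache clock alone matters: the same number of cache cycles takes longer in wall-clock time at 2.4 GHz than at 4 GHz. The cycle count below is a made-up placeholder, not a measured value for either chip, and 4.0 GHz is just the lower bound of "north of 4 GHz".

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_ghz

# Hypothetical cycle count, used only to illustrate the scaling.
HYPOTHETICAL_L3_CYCLES = 40

sklx_ns = cycles_to_ns(HYPOTHETICAL_L3_CYCLES, 2.4)   # SKL-X L3 clock from the post
quad_ns = cycles_to_ns(HYPOTHETICAL_L3_CYCLES, 4.0)   # assumed lower bound for the quad core

# The ratio depends only on the two clocks, not on the placeholder cycle count.
ratio = sklx_ns / quad_ns
print(f"{sklx_ns:.2f} ns vs {quad_ns:.2f} ns, ratio {ratio:.2f}x")
```

So even at an identical cycle count the clock difference by itself would account for a sizeable latency gap, before any difference in cache design comes into play.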
 
Games seem to prefer a large shared inclusive L3 cache over a smaller and slower L3 victim cache. Server software mostly uses data-independent threads (or combines data infrequently), while games frequently (every millisecond) move data between cores or access world state from multiple threads at once (= a big chunk of mostly immutable data that can be easily shared). A big shared L3 cache is great for this purpose. We can see similar performance issues with Ryzen. Ryzen also has a smaller and slower L3 victim cache (and it's also split between clusters).

Skylake-X gaming performance gives us more information about why Ryzen's gaming performance lags behind the i7 6900K and i7 7700K. Ryzen also has a 2x larger L2 cache than these Intel consumer chips, but apparently that's not a big deal for games, since Skylake-X has a 4x larger L2 cache and that doesn't help either. Games seem to really love a big and fast fully shared inclusive L3. Unfortunately you can't really scale up the core count and keep caches like this around. AMD vs Intel gaming performance (8+ core chips) is now much more comparable than it was with last-gen Intel chips. It looks like Zen 2 doesn't need huge changes after all to compete against modern Intel HEDT in games. But unfortunately this means that Intel's quad-core chips will remain the best chips for gaming. Skylake-X can't beat them, and Ryzen and Threadripper can't beat them. Hopefully next-gen consoles will have Ryzen in them (with 16+ threads), forcing game developers to design their systems in a way that scales properly to these 8+ core PC CPUs.
 