Intel i9-7900X CPUs

I actually expected Skylake-X to be based on the mesh or rings used in Knights-whatever-they-are-up-to-now. I was actually disappointed to see the same old ring bus still kicking around. Maybe we will see it on the XCC die?

I tend to believe that LCC, HCC and XCC share basically the same design (with differences in cache size) and are just binned by core count: 6/8/10/12 cores = LCC, 14/16/18 = HCC, and the top server die with the highest core counts (32 and lower?) = XCC. I don't think there will be that big a change between XCC and HCC (but who knows).
 
I also believe that the coming 6 and 8 core Intels will be the choice if you are looking for the best possible performance in games, though AMD still offers good value products as well. Competition is good, and more fast cores for the mainstream is even better. Now if only we could get rid of 2C/4T CPUs entirely, for starters.
 
I would guess that L2 size was increased because Intel needs to reduce their shared L3 traffic. Skylake-X has 18 cores (36 threads) feeding from the same shared L3, and the core counts keep increasing every year. A 4x larger L2 cache on each core means that each core accesses the L3 cache less often. Ryzen, on the other hand, doesn't have one big shared L3 cache: each four-core cluster has its own separate L3. Intel had to eventually increase their L2 size to make it possible to keep their big shared L3 cache.
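To put rough numbers on that argument, here is a back-of-envelope sketch. The per-core request count and the L2 miss rates are purely illustrative assumptions, not measurements; only the scaling matters.

```python
# Rough sketch of the argument above. Miss rates are made-up placeholders.
cores = 18
l1_misses_per_core = 1_000_000          # requests reaching L2 per core (hypothetical)

for label, l2_miss_rate in [("256 KB L2", 0.20), ("1 MB L2", 0.10)]:
    l3_requests = cores * l1_misses_per_core * l2_miss_rate
    print(f"{label}: {l3_requests:,.0f} aggregate requests hitting the shared L3")
```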

The nice thing about the small L2 in all previous Intel designs was that its low access latency complemented the L1 very well. During Nehalem's launch, Intel actually dubbed it an "L1.5" cache for a good reason.
I guess the scaling issue with the growing number of cores (Intel had to bridge two rings in their top-end Xeons) resulted in too much "horizontal" traffic along the ring stops. So bringing more data closer to the cores, by enlarging the L2, was inevitable. The non-inclusive L3 is still able to prefetch data, so it will probably grow slightly in size at some point in an upcoming architecture refresh.
 
I would guess that L2 size was increased because Intel needs to reduce their shared L3 traffic. Skylake-X has 18 cores

A big, shared and inclusive last level cache becomes prohibitively expensive at high core counts.

IMO, that's one of the reasons Intel's high-end desktop SKUs have stayed at four cores for so long. In normal desktop apps, games included, processes and threads bounce around from core to core all the time unless developers pay special attention to core affinity. A fast shared LLC remedies the cost of a process/thread migrating from one physical core to another.
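For what it's worth, that kind of affinity is easy to experiment with. A minimal Linux-only sketch using Python's os.sched_setaffinity; the core number 2 is an arbitrary choice for illustration.

```python
import os

# Pin the current process to logical core 2 (Linux-only call).
# Without pinning, the scheduler may migrate the process between cores,
# leaving its warm L1/L2 working set behind; the fast shared LLC is what
# softens that migration penalty.
os.sched_setaffinity(0, {2})
print("allowed cores:", os.sched_getaffinity(0))
```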

Cheers
 
I guess the scaling issue with the growing number of cores (Intel had to bridge two rings in their top-end Xeons) resulted in too much "horizontal" traffic along the ring stops. So bringing more data closer to the cores, by enlarging the L2, was inevitable. The non-inclusive L3 is still able to prefetch data, so it will probably grow slightly in size at some point in an upcoming architecture refresh.

The semi-distributed L3 connected with a fat ring makes a lot of sense when the number of cores is reasonable. Consider an N-core system: you have N agents probing N-1 remote L3 slabs. Fine for four cores, but for the 18-core case you get about 25 times the aggregate probe traffic. Aggregate L3 set-associativity also scales with the number of cores (see below), so you end up with parts of the coherency cost scaling with core count cubed. No wonder Intel uses multiple rings on high-core-count SKUs.
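Spelling out that probe-traffic arithmetic:

```python
# N agents each probing the N-1 remote L3 slabs -> traffic grows ~ N*(N-1).
def aggregate_probes(n_cores):
    return n_cores * (n_cores - 1)

print(aggregate_probes(4))                          # 12
print(aggregate_probes(18))                         # 306
print(aggregate_probes(18) / aggregate_probes(4))   # ~25x more probe traffic
```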

An inclusive L3 (naively) needs to scale its aggregate associativity with core count. Skylake and Kaby Lake are already cutting corners: the I$ and D$ are both 8-way set-associative, but the L2 is only 4-way set-associative. With lower associativity in your bigger caches, you run the risk of having to evict hot lines from the small caches because you run out of ways in the L2 (i.e. if every hot D$ line aliases into the same L2 set at the 64 KB way granularity, you effectively only have a 4-way, 16 KB D$). It works out in Skylake, probably because the stack is addressed fairly linearly, as is the instruction stream.
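A small sketch of that aliasing example, assuming the client Skylake geometry (32 KB / 8-way L1D, 256 KB / 4-way L2, 64-byte lines); the addresses are contrived to collide.

```python
LINE = 64
L1_SETS = (32 * 1024) // (8 * LINE)    # 64 sets in the 8-way L1D
L2_SETS = (256 * 1024) // (4 * LINE)   # 1024 sets in the 4-way L2

def l1_set(addr): return (addr // LINE) % L1_SETS
def l2_set(addr): return (addr // LINE) % L2_SETS

# Eight hot lines spaced 64 KB apart (one L2 way size): the 8-way L1D can
# hold all of them in its single conflicting set, but they all fall into
# one 4-way L2 set and thrash it.
addrs = [i * 64 * 1024 for i in range(8)]
print({l1_set(a) for a in addrs})   # {0} -> one L1 set, 8 ways available
print({l2_set(a) for a in addrs})   # {0} -> one L2 set, only 4 ways
```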

Cheers
 
In normal desktop apps, games included, processes and threads bounce around from core to core all the time unless developers pay special attention to core affinity. A fast shared LLC remedies the cost of a process/thread migrating from one physical core to another.
Yeah, a big shared LLC is definitely a benefit for multithreaded applications programmed without specifically optimizing for memory architecture.

Intel Xeons (since Ivy Bridge) have a Cluster on Die (CoD) mode. This splits the CPU into two halves, each having a separate LLC. This is similar to 8-core Ryzen, where the CPU has two 4-core clusters with their own LLCs. CoD mode decreases the L3 cache latency. It boosts performance in applications that do not need to share data between cores (for example a virtual machine running lots of separate processes). AMD's EPYC always has one LLC per 4 cores, so it is basically always running in "CoD mode". This is fine for most enterprise applications, but Ryzen performance clearly shows that consumer applications (including games) benefit much more from a big shared LLC than from slightly lower LLC latency. Future programmers need to be thinking more about memory locality. A big shared LLC is starting to become too expensive (and it gets slower as you add more cores and more capacity).
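As a toy example of what "thinking about memory locality" can look like in practice: split the work so each worker and its slice of data stay on one cluster's cores. The two-cluster layout and core numbering below are assumptions for illustration; real code should query the actual topology (e.g. with lscpu).

```python
import os
from multiprocessing import Process

# Hypothetical topology: two 4-core clusters, each with its own LLC
# (Ryzen CCX / Xeon CoD style). The core IDs are an assumption.
CLUSTERS = [{0, 1, 2, 3}, {4, 5, 6, 7}]

def worker(cluster_cores, data_slice):
    # Keep this worker on one cluster so its working set stays in that
    # cluster's LLC instead of bouncing between the two halves.
    os.sched_setaffinity(0, cluster_cores)
    print(sum(data_slice))               # stand-in for the real per-slice work

if __name__ == "__main__":
    data = list(range(1_000_000))
    half = len(data) // 2
    procs = [Process(target=worker, args=(CLUSTERS[0], data[:half])),
             Process(target=worker, args=(CLUSTERS[1], data[half:]))]
    for p in procs: p.start()
    for p in procs: p.join()
```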

More info here: http://frankdenneman.nl/2016/07/11/numa-deep-dive-part-3-cache-coherency/
 
2% difference once again, probably down to IPC. This OC is most probably with delid + liquid metal applied between the die and the IHS as the guy in this video is a professional LN2 overclocker.
Anandtech's two-year-old Broadwell->Skylake IPC test:
http://www.anandtech.com/show/9483/intel-skylake-review-6700k-6600k-ddr4-ddr3-ipc-6th-generation/9

Result = Skylake IPC is on average +2.4% vs Broadwell (same 3 GHz clock rate, same DDR3 memory).

This result was expected. Sad to see no gains from the 4x larger L2 cache vs consumer Skylake. This means that most likely the new L2 cache has higher latency, and the new smaller L3 cache is also hurting.
 
This result was expected. Sad to see no gains from the 4x larger L2 cache vs consumer Skylake. This means that most likely the new L2 cache has higher latency, and the new smaller L3 cache is also hurting.

That's just from one result, and it can easily vary that much depending on the site running the review.

Also, Cinebench never cared about L3 cache sizes. Who says the same won't be the case with the L2?
 
The 18 core Sky-X part won't be out until next year according to Raja from Asus: https://rog.asus.com/forum/showthre...1324fb0ae9fca7&p=653561&viewfull=1#post653561
I'm not sure if that means the 12/14/16 core parts as well (assuming they are using the same die).
Intel has two Skylake-X dies: LCC and HCC. The 12-core is the full LCC die.

Source:
http://www.anandtech.com/show/11464...ging-18core-hcc-silicon-to-consumers-for-1999
Quote: "Still covering the LCC core designs, the final processor in this stack is the Core i9-7920X. This processor will be coming out later in the year, likely during the summer, but it will be a 12-core processor on the same LGA2066 socket for $1199"

But Intel hasn't yet given any details about the 12-core clock rates or TDP. It requires perfect dies. Maybe Intel is waiting to see how good the yields are before deciding on the final clocks.
Also, Cinebench never cared about L3 cache sizes. Who says the same won't be the case with the L2?
Agreed. Cinebench doesn't represent real-world performance in generic (frequently cache-missing) code that well. But it's hard to believe that a 4x larger L2 cache could still maintain a 12-cycle latency. They can't drop associativity either.
 
Some new results from the 7900x:


2442 MT score at 4.5GHz, the 6950x at the same clock speed scores 2392: https://d1ebmxcfh8bf9c.cloudfront.net/u14296/image_id_1675076.png

2% difference once again, probably down to IPC. This OC is most probably with delid + liquid metal applied between the die and the IHS as the guy in this video is a professional LN2 overclocker.

Not sure what to make of his vid; in the text below the video he states (via Google Translate):
Here's a preview of his current performance in a light-OC to 4.5 GHz @ 1.15V down.

A 6950X needs around 1.2V minimum for all cores at 4GHz (which is its Boost 3 limit when not overclocked), so no idea why the guy talks about 1.15V down for the 7900X.
If true, that is a big surprise for all cores at 4.5 GHz; otherwise something is not quite right with the setup.

Der8auer's vid is a good example of benchmarking the 6950X (this is before delid); it does 4 GHz on all cores on air with 1.2V, as a comparison.


Cheers
 

The video is a bit dramatic towards the end but has some good info in it.
Well, it's pretty straightforward, isn't it, and he basically tells it like it is. He even softens the blow a bit by glossing over the financial aspects.
Intel's stratification of the market has gotten a bit disrupted in these segments, and they haven't come up with a good replacement strategy.
 

The video is a bit dramatic towards the end but has some good info in it.
Actually, that is (or should be) everyone's opinion. This HEDT lineup is awful... Like, why in the hell am I buying a top-end(?) motherboard with quad-channel memory, support for bootable RAID and all of that, and pairing it with a 4-core(!!!) CPU that actually can't use any of those features, just to have a CPU running 100 MHz higher? I don't know what the guy who came up with that idea is smoking, but I want some myself.

I know Intel will sell this just by inertia, but besides some specific cases I still don't see any major reason to spend a lot more on the Intel platform over AMD's. Maybe if you are hoping Optane drives will be much faster than SSDs, or you want the bootable RAID (with, of course, the RAID key... are we back in the early 2000s?), or you want specific software that runs way better on Intel and you're willing to pay the extra price.

Also, all Threadripper CPUs will have full access to their PCIe lanes, so you can have multiple GPUs and M.2 drives, while on Intel they get cut down.

If I recall correctly, the 14+ core parts will be available next year, so they will face Zen 2, and the difference in performance in that case will be smaller.
 
Are you sure about that Zen 2? I thought we would have to wait for 7 nm and if so, that's quite a way off.
In my line of thinking Intel Cannon Lake-X might be released before "Zen 2" Ryzen and Thread Ripper.
 
Anandtech's two-year-old Broadwell->Skylake IPC test:
http://www.anandtech.com/show/9483/intel-skylake-review-6700k-6600k-ddr4-ddr3-ipc-6th-generation/9

Result = Skylake IPC is on average +2.4% vs Broadwell (same 3 GHz clock rate, same DDR3 memory).

This result was expected. Sad to see no gains from the 4x larger L2 cache vs consumer Skylake. This means that most likely the new L2 cache has higher latency, and the new smaller L3 cache is also hurting.

This is why I was talking about taking the current Skylake as the IPC performance baseline: cache will only make a big difference in very specific instances (meaning benchmarks) and cases (meaning software). Yes, it can have a big impact, but from memory, I have never seen it outside some rough corner cases.

Are you sure about that Zen 2? I thought we would have to wait for 7 nm and if so, that's quite a way off.
In my line of thinking Intel Cannon Lake-X might be released before "Zen 2" Ryzen and Thread Ripper.

" might be released before "Zen 2" Ryzen and Thread Ripper"

Cannon Lake-X before what? ... do you mean Threadripper 2 and Ryzen 2?
 

The video is a bit dramatic towards the end but has some good info in it.
It seems that Intel really screwed up.
My God.
This is worse than I expected.
The DLC model returns.
Intel is desperate and fumbling badly.
Now I hope that AMD keeps following the path they are on, and that Intel improves.
That's the ironic thing.
 
Actually, that is (or should be) everyone's opinion. This HEDT lineup is awful... Like, why in the hell am I buying a top-end(?) motherboard with quad-channel memory, support for bootable RAID and all of that, and pairing it with a 4-core(!!!) CPU that actually can't use any of those features, just to have a CPU running 100 MHz higher? I don't know what the guy who came up with that idea is smoking, but I want some myself.

I know Intel will sell this just by inertia, but besides some specific cases I still don't see any major reason to spend a lot more on the Intel platform over AMD's. Maybe if you are hoping Optane drives will be much faster than SSDs, or you want the bootable RAID (with, of course, the RAID key... are we back in the early 2000s?), or you want specific software that runs way better on Intel and you're willing to pay the extra price.

Also, all Threadripper CPUs will have full access to their PCIe lanes, so you can have multiple GPUs and M.2 drives, while on Intel they get cut down.

If I recall correctly, the 14+ core parts will be available next year, so they will face Zen 2, and the difference in performance in that case will be smaller.
Also, there is no AVX-512 in Kaby Lake-X, only in Skylake-X. This means that we aren't getting low-core-count AVX-512 chips anytime soon. This is confusing, since Kaby Lake is a newer chip than Skylake. HEDT buyers need to be careful when picking their CPU model if they want AVX-512 support.
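If you want to sanity-check what you actually ended up with, here is a quick Linux-only sketch that looks for the AVX-512 foundation flag in /proc/cpuinfo:

```python
# Look for the "avx512f" feature flag in /proc/cpuinfo (Linux, x86 only).
with open("/proc/cpuinfo") as f:
    flags = next(line for line in f if line.startswith("flags")).split()

print("AVX-512F supported:", "avx512f" in flags)
```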
 
Also, you will need to read the motherboard's manual to spot which RAM slots will be disabled when you use the i7 and i5 parts on it.

That moment when you realize that if you buy a High End Desktop you may actually end up with a mid-range PC... it's sad and hilarious at the same time. :mad::D

AMD couldn't hope for a better reaction from Intel.
 