The Intel Execution in [2024]

What is this 'clock tree circuit', and how does it relate to vMin? Is the silicon of this circuit the part that's degrading, or is the whole chip degrading?
Clock tree is another term for the integrated on-die network which distributes clock ticks to the entire chip. The "trunk" of the tree starts at the clock generator, which is onboard the CPU itself. The "limbs" and "branches" of the tree speak to how those clock signals are distributed around the die. It's more a tree in logical representation than physical layout, of course.

They weren't very specific about how the clock tree plays into the vMin conversation. One interpretation might be the clock tree network itself is somehow damaged by voltage (it's made of silicon transistors like everything else in the die) and somehow this physical damage results in missing or otherwise skewed / corrupted clock signals being transmitted. If the discrete components of a CPU get out of lock-step with each other, everything immediately goes to hell. Think of latch and atomic operations where there are mandatory predecessor / successor steps for something to work, and then imagine when those get processed out of order. The user's perception of such a thing would be a hard lock, but it might come along with data corruption depending on what instructions were being processed when it stopped.
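To put rough numbers on the "out of lock-step" idea, here is a minimal sketch of the textbook setup/hold timing check between two flip-flops fed by different branches of a clock tree. Everything in it is an assumption for illustration: the `Path` class, the function names, and the picosecond values are made up and are not Intel's data or Intel's diagnosis. It only shows how a modest shift in clock arrival time (skew) can turn a path that works into one that captures bad data.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical register-to-register path; all delays in picoseconds."""
    clk_to_q_ps: float   # launch flop clock-to-output delay
    logic_max_ps: float  # slowest logic delay between the two flops
    logic_min_ps: float  # fastest logic delay between the two flops
    setup_ps: float      # capture flop setup requirement
    hold_ps: float       # capture flop hold requirement


def setup_slack(path: Path, period_ps: float, skew_ps: float) -> float:
    # skew_ps = capture clock arrival minus launch clock arrival.
    # Data must arrive setup_ps before the next capture edge.
    return period_ps + skew_ps - (path.clk_to_q_ps + path.logic_max_ps + path.setup_ps)


def hold_slack(path: Path, skew_ps: float) -> float:
    # New data must not race through before the capture flop has
    # finished holding the previous value.
    return path.clk_to_q_ps + path.logic_min_ps - (path.hold_ps + skew_ps)


def timing_ok(path: Path, period_ps: float, skew_ps: float) -> bool:
    return setup_slack(path, period_ps, skew_ps) >= 0 and hold_slack(path, skew_ps) >= 0


# Illustrative values only: a 200 ps period corresponds to a 5 GHz clock.
path = Path(clk_to_q_ps=30, logic_max_ps=140, logic_min_ps=20, setup_ps=20, hold_ps=10)
PERIOD_PS = 200

for skew in (0, -15, 45):
    print(skew, setup_slack(path, PERIOD_PS, skew), hold_slack(path, skew),
          timing_ok(path, PERIOD_PS, skew))
```

With these made-up numbers the path passes at zero skew, fails setup when the capture clock arrives 15 ps early, and fails hold when it arrives 45 ps late. Either failure means a flop latches the wrong value, which fits the kind of unpredictable, varied errors being described.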
 
As with past processors (such as the AMD Epyc 9684X), Phoronix released Linux HPC benchmarks for Granite Rapids. Specifically, the Xeon 6980P processors paired with MRDIMM 8800MT/s memory in an Intel AvenueCity reference platform. The 6980P is quite the heavyweight. Sporting 128 P-cores, a 2.0 GHz base clock, a 3.2 GHz all-core turbo clock, 504 MB L3 cache, and a 500 W TDP, Intel has brought a howitzer to a gunfight. AI and GPU performance notwithstanding, as indicated by the HPC benchmarks courtesy of Phoronix, Intel is back in the game in a big way.

The following results are some snapshots of popular HPC benchmarks. Consult the Phoronix article for a full account of all the results and test setups.
 
One interpretation might be the clock tree network itself is somehow damaged by voltage (it's made of silicon transistors like everything else in the die) and somehow this physical damage results in missing or otherwise skewed / corrupted clock signals being transmitted...
That would be my gut feeling based on the reported errors being unpredictable, diverse, more consistent over time and leading to total chip failure. But why would these bits be susceptible? Did Intel make them different? I guess they won't go into details, unless someone sues.
 
Yup, we're just left to guess WTF at this point. It's interesting they mentioned the clock tree at all, because it obviously brings up all the questions you just described. How did they isolate the cause specifically to the clock tree network? What makes that network of transistors more susceptible to this voltage issue than any other part might have been? Asking the same question in reverse: why isn't this vMin problem causing damage to other circuitry?

So many questions we'll likely never know the answers to. :(
 
Not surprised that Intel 3 is genuinely good. I'm just annoyed that they didn't release anything for consumers using it. Desktops have spent the last three years still on 10nm, and Intel has faced multiple levels of scrutiny as a result. For one, they're widely known as 'inefficient' compared to Ryzen competitors. Two, all the negative press over Intel's 14th gen refresh offering basically nothing for consumers. Three, spending an extra year releasing products that are pushed so hard they're creating widespread reliability issues.

I mean, I get it, desktop is not their primary money-making market, but it's still annoying all the same. And it's still hurting them in terms of PR and reputation. And now they're jumping to TSMC N3B for desktop parts, which is likely not good for their financials and probably means that they can't release anything else in the consumer space on an Intel node at all until 18A is in proper mass production at scale.

I do think Intel is getting back on track, but man there's a lot of pain in getting there. Hopefully they start doing better financially cuz we kind of really need them to.
 
My guess is they were physically examining/scanning failed CPUs and noticed that commonality?
I would assume that would narrow down the potential failure points but there is only so much information you can get from scans.
I think nanoprobing would give them the confirmation.
https://www.thermofisher.com/blog/m...-analysis-technique-for-todays-tech-industry/
https://en.wikipedia.org/wiki/Nanoprobing

Smallest I've dealt with in legacy probe is 5µm, so not even in the same ballpark as what they are using now.
 