Why would Microsoft purposely downclock Lockhart's GPU? If anything, Lockhart should have faster clocks than Anaconda because of the better thermals (or less heat generated from having a single chiplet design). Unless they went with a super-shitty vapor chamber/heatsink design on Lockhart.
Yield, power delivery, cooling, etc. all factor into price points. I'm not going to say this is legit, just some factors to consider.
My issue is with the RCC. It doesn't make much sense to me: even if it were real and it did do TF calculations, shouldn't it be closer to 60 TF?
Let's make a simple comparison to the RTX 2080, for instance. We see the performance with RTX on and RTX off. In DirectX Raytracing, there is a flow chart that effectively says: if you have hardware-accelerated intersection, go right; otherwise, use an intersection shader and go left.
If we consider going left for a second, the developer needs to write an intersection shader, or whatever the case may be. That shader has access to how much of the 2080's 14 TFLOPS? Surely that intersection shader can use up more than 1.2 TF of compute power. And assuming we see a difference of, say, 10 fps versus 50 or 60 fps, whatever that intersection hardware acceleration is doing, it's speeding up the process by roughly that amount. So 1.2 TF doesn't make a lot of sense to me.
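To make the "go left" path a bit more concrete, here's a minimal sketch of the kind of per-ray math a software fallback ends up grinding through, using a Möller–Trumbore-style ray/triangle test. This is purely illustrative (plain Python, not DXR/HLSL, and the function name and test values are mine), just to show the ALU work that fixed-function intersection hardware takes off the shader cores:

```python
# Rough sketch of the ALU work behind one ray/triangle test (Moller-Trumbore style).
# Illustrative only -- real DXR intersection code is HLSL, and RT cores also
# handle BVH traversal, not just the final triangle test.

def ray_triangle_intersect(orig, d, v0, v1, v2, eps=1e-7):
    """Return hit distance t along orig + t*d, or None if the ray misses."""
    def sub(a, b):   return (a[0]-b[0], a[1]-b[1], a[2]-b[2])
    def cross(a, b): return (a[1]*b[2]-a[2]*b[1],
                             a[2]*b[0]-a[0]*b[2],
                             a[0]*b[1]-a[1]*b[0])
    def dot(a, b):   return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

    e1, e2 = sub(v1, v0), sub(v2, v0)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:               # ray parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = sub(orig, v0)
    u = inv_det * dot(s, h)
    if u < 0.0 or u > 1.0:           # outside the triangle (barycentric u)
        return None
    q = cross(s, e1)
    v = inv_det * dot(d, q)
    if v < 0.0 or u + v > 1.0:       # outside the triangle (barycentric v)
        return None
    t = inv_det * dot(e2, q)
    return t if t > eps else None

# Example: ray from the origin along +Z hits a triangle sitting in the z=5 plane.
print(ray_triangle_intersect((0, 0, 0), (0, 0, 1),
                             (1, 0, 5), (-1, 1, 5), (-1, -1, 5)))   # -> 5.0
```

That's a couple dozen multiplies/adds per ray/triangle test, and a tracer runs tests like this (plus traversal) millions to billions of times per frame, which is exactly the budget the 1.2 TF number doesn't seem to cover.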
For a reference point:
The full-fat version of Turing (it’s not clear which GPU this specifically refers to) is capable of 14 TFLOPS of FP32, 110 FP16 tensor FLOPS (that’s the half-precision mode) and 78 RTX-OPS. That last metric isn’t really a metric at all since we don’t really know what an RTX-OP is, exactly, but presumably, that kind of information will be fleshed out at a later date. The current Titan X, in contrast, is capable of 12 RTX-OPS.
So the RT cores are doing about 6.5x more than what the intersection shaders on a Titan X are capable of, if my understanding is correct.
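Putting rough numbers on that (napkin math only, using the figures already quoted above; the 55 fps midpoint is just my illustrative pick for "50 or 60 fps"):

```python
# Napkin math on the quoted figures -- nothing here beyond the numbers above.
rtx_ops_2080_ti = 78      # full-fat Turing, per the quote
rtx_ops_titan_x = 12      # Pascal Titan X, per the quote
print(rtx_ops_2080_ti / rtx_ops_titan_x)     # -> 6.5

# Rough speedup implied by the frame-rate jump mentioned earlier.
fps_software, fps_hardware = 10, 55          # "10 fps to 50 or 60 fps"
print(fps_hardware / fps_software)           # -> 5.5
```

Both ratios land in the same 5-6x ballpark, which is why the 1.2 TF figure looks so far off.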
This is either a plain wrong measure or, if it's indeed true, our understanding of the bottlenecks around triangle intersection is way outside our domain knowledge. I'd almost argue that if figuring out intersections isn't a compute-power problem, then they should have invested in a lot more general compute power instead.
edit: Tom's Hardware made an attempt at a breakdown
https://www.tomshardware.com/reviews/nvidia-turing-gpu-architecture-explored,5801-10.html
So, given that…
FP32 compute = 4352 FP32 cores * 1635 MHz clock rate (GPU Boost rating) * 2 = 14.2 TFLOPS
RT core compute = 10 TFLOPS per gigaray, assuming GeForce GTX 1080 Ti (11.3 TFLOPS FP32 at 1582 MHz) can cast 1.1 billion rays using software emulation = ~100 TFLOPS on a GeForce RTX 2080 Ti capable of casting ~10 billion rays
INT32 instructions per second = 4352 INT32 cores * 1635 MHz clock rate (GPU Boost rating) * 2 = 14.2 TIPS
Tensor core compute = 544 Tensor cores * 1635 MHz clock rate (GPU Boost rating) * 64 floating-point FMA operations per clock * 2 = 113.8 FP16 Tensor TFLOPS
…we can walk Nvidia’s math backwards to see how it reached a 78 RTX-OPS specification for its GeForce RTX 2080 Ti Founders Edition card:
(14 TFLOPS [FP32] * 80%) + (14 TIPS [INT32] * 28% [~35 INT32 ops for every 100 FP32 ops, which take up 80% of the workload]) + (100 TFLOPS [ray tracing] * 40% [half of 80%]) + (114 TFLOPS [FP16 Tensor] * 20%) = 77.9
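For anyone who wants to plug the numbers in themselves, here's the same walk-backwards as a quick script. Every figure is taken straight from the Tom's Hardware breakdown above; the only thing added is the Python wrapping:

```python
# Re-deriving Tom's Hardware's numbers for the RTX 2080 Ti Founders Edition.
clock_ghz = 1.635                                  # GPU Boost clock

fp32_tflops   = 4352 * clock_ghz * 2 / 1000        # ~14.2 TFLOPS (FMA = 2 ops/clock)
int32_tips    = 4352 * clock_ghz * 2 / 1000        # ~14.2 TIPS
tensor_tflops = 544 * clock_ghz * 64 * 2 / 1000    # ~113.8 FP16 Tensor TFLOPS

# "10 TFLOPS per gigaray" proxy: GTX 1080 Ti casts ~1.1 gigarays in software at 11.3 TFLOPS.
tflops_per_gigaray = 11.3 / 1.1                    # ~10
rt_tflops_equiv = tflops_per_gigaray * 10          # 2080 Ti casts ~10 gigarays -> ~100

# Nvidia's weighted recipe, per the article: 80% FP32, 28% INT32, 40% RT, 20% Tensor.
rtx_ops = 14 * 0.80 + 14 * 0.28 + 100 * 0.40 + 114 * 0.20
print(round(fp32_tflops, 1), round(int32_tips, 1), round(tensor_tflops, 1),
      round(rt_tflops_equiv), rtx_ops)
# -> 14.2 14.2 113.8 103 77.92
```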
Yeah... 1.2 TF for an RCC seems stupid. One and done. That author should have googled Tom's Hardware.