Intel ARC GPUs, Xe Architecture for dGPUs [2018-2022]

They run packed math like every other RDNA2 part out there.

From the Linux drivers, it looks like RPM on DP4a also runs on every RDNA1 GPU, with the sole exception of Navi 10.
Which is good news for owners of Navi 12 and 14 (MacBook Pro?) GPUs.
 
No, 1.9 was just a shader pass. It had nothing to do with anything related to DLSS or XeSS.
That quote doesn't make any sense. But I guess when your software was developed prior to new hardware, you try to defend it. UE5 needs upscaling because no hardware can even run it at 1440p with more than 60 FPS. So having dedicated units for advanced upscaling should be applauded...
 
I am pretty sure there was no referencing at all for 1.9 - it was more just a TAAU pass. Hence why it cost next to nothing, unlike DLSS, which has a noticeable millisecond frame cost.
Inferencing?

Fair enough, then. I wrongly assumed Control's Deep Learning Super Sampling was using... ML inferencing.

Regardless, Intel's predictions on the performance deficit when using mixed precision dot products on the FP32 ALUs point to a pretty small frametime difference, even if the upscaling process itself is seemingly taking 2x longer.
Also, the fact that they're providing the DP4a path for their Xe LP iGPUs is another strong indicator that it runs fast enough using the shader processors.

It's still hardware that can't do anything else for gaming at the moment, so the question of whether the die area wouldn't be better spent on other execution units still stands. Especially as Intel's top-end offering, releasing only in Q1 2022 (i.e. a quarter away from RDNA3), seems to only be competitive with a Navi 22 or a GA104, at least in rasterization.
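For reference, DP4a is a packed dot-product instruction: four int8 multiplies accumulated into a 32-bit integer per operation, which is what lets int8 inference run on the regular shader ALUs. A minimal plain-C++ sketch of the semantics (illustrative only - the function name and packing layout are made up here, not any vendor's intrinsic):

Code:
#include <cstdint>
#include <cstdio>

// Scalar emulation of a DP4a-style operation: a dot product of four
// signed 8-bit lanes packed into 32-bit words, accumulated into a
// 32-bit integer. GPUs expose this as a single instruction; this
// sketch only shows the semantics.
int32_t dp4a_emulated(uint32_t a_packed, uint32_t b_packed, int32_t acc)
{
    for (int lane = 0; lane < 4; ++lane) {
        int8_t a = static_cast<int8_t>((a_packed >> (8 * lane)) & 0xFF);
        int8_t b = static_cast<int8_t>((b_packed >> (8 * lane)) & 0xFF);
        acc += static_cast<int32_t>(a) * static_cast<int32_t>(b);
    }
    return acc;
}

int main()
{
    uint32_t a = 0x04030201u;  // lanes {1, 2, 3, 4}
    uint32_t b = 0x08070605u;  // lanes {5, 6, 7, 8}
    std::printf("%d\n", dp4a_emulated(a, b, 0));  // 1*5 + 2*6 + 3*7 + 4*8 = 70
    return 0;
}

Dedicated XMX/tensor units do the same kind of low-precision multiply-accumulate, just on whole matrix tiles per instruction rather than four lanes at a time, which is where the throughput gap (and the die-area question) comes from.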
 
Considering DLSS 1.9 ran on the CUDA cores, probably using DP4a RPM as well, yes.

https://www.techspot.com/article/1992-nvidia-dlss-2020/


And there's also this:
I wouldn't say it makes little sense.
Just a view that the amount of die space used for tensor cores is excessive.
Doesn't mean they couldn't have a smaller, separate tensor accelerator that isn't implemented alongside each CU.

Sorry, I think I quoted the wrong thing.
It was in relation to @Andrew Lauritzen's quote, though.
 
I wouldn't say it makes little sense.
Just a view that the amount of die space used for tensor cores is excessive.
Doesn't mean they couldn't have a smaller, separate tensor accelerator that isn't implemented alongside each CU.

Yes, to me the comment is suggesting that there's too much die area dedicated to the tensor cores, not that they shouldn't exist. Probably because whatever GPU diagnostics tool or methodology Epic is using to light up the tensor cores shows that their occupancy isn't great when DLSS is used.


Though it's also a good indicator that those hundreds of TOPs are probably not really necessary for ML-based upscaling in real-time rendering.
 
Inferencing?

Fair enough, then. I wrongly assumed Control's Deep Learning Super Sampling was using... ML inferencing.

Regardless, Intel's predictions on the performance deficit when using mixed precision dot products on the FP32 ALUs point to a pretty small frametime difference, even if the upscaling process itself is seemingly taking 2x longer.
Also, the fact that they're providing the DP4a path for their Xe LP iGPUs is another strong indicator that it runs fast enough using the shader processors.

It's still hardware that can't do anything else for gaming at the moment, so the question of whether the die area wouldn't be better spent on other execution units still stands. Especially as Intel's top-end offering, releasing only in Q1 2022 (i.e. a quarter away from RDNA3), seems to only be competitive with a Navi 22 or a GA104, at least in rasterization.
Yes, inferencing. Lovely Phone.
 
Yes, to me the comment is suggesting that there's too much die area dedicated to the tensor cores, not that they shouldn't exist. Probably because whatever GPU diagnostics tool or methodology Epic is using to light up the tensor cores shows that their occupancy isn't great when DLSS is used.

Problem is that Epic is an ISV. They don't care about hardware limitations. Tensor cores are cheap. They run for 1 ms on a 3090 for super resolution. That is basically 1/16 of the time a 3090 takes to render a 1080p frame. Yet this 1 ms provides at least twice the image quality of a 16x more compute-intensive frame.

Using Matrix engines for upscaling is highly efficient.
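Taking those numbers at face value (they are the post's claims, not measurements), the budget argument is just this fraction, sketched here for clarity:

Code:
#include <cstdio>

int main()
{
    // Figures quoted from the post above, not measured here.
    const double upscale_cost_ms = 1.0;   // claimed tensor-core super resolution cost on a 3090
    const double frame_time_ms   = 16.0;  // the "1/16" claim implies roughly a 16 ms 1080p frame
    const double fraction        = upscale_cost_ms / frame_time_ms;

    std::printf("Upscaling uses %.1f%% of the frame time\n", fraction * 100.0);  // ~6.3%
    return 0;
}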
 
Spending these ~10% of transistors on something like shading units would be completely pointless for anyone. The added 10% of shading units would bring an added 10% of power with them, leading to a 10% clock reduction and what would be the exact same performance we have now with tensor cores - but without said tensor cores, DLSS and all the fancy ML stuff they enable.

I'd suggest leaving the h/w design to the h/w designers here. They generally know what they are doing.
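A rough sketch of the scaling argument above, assuming the simplistic model that performance scales linearly with both unit count and clock speed (the ±10% figures are the post's hypothetical, not measured data):

Code:
#include <cstdio>

int main()
{
    // Simple model: performance ~ number_of_units * clock.
    const double units_scale = 1.10;  // ~10% more shading units instead of tensor cores
    const double clock_scale = 0.90;  // ~10% lower clocks to stay inside the same power budget
    const double relative_performance = units_scale * clock_scale;

    std::printf("Relative performance: %.2fx\n", relative_performance);  // ~0.99x, i.e. a wash
    return 0;
}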
 
I kinda wonder what backend a console would run SYCL code through? Is there anything in modern console APIs that is compatible with SYCL compilers?

If XeSS is "cross-compatible" across vendors then I'd wager that SYCL is the least likely possibility to begin with and it most probably runs on more widely supported standards like Direct3D or Vulkan ...

Consoles don't support SYCL at all, but GNM supports advanced C++ features like templates, so you can write your compute kernels in a single file; it thus supports the single-source programming model just like CUDA. Consoles are in too good of a state to support subpar standards like SYCL ...
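For context on what "single source" means here: the host code and the device kernel live in the same C++ file and are compiled together. A minimal SYCL 2020 sketch (assumes a desktop implementation such as DPC++ is installed; purely illustrative, not console code):

Code:
#include <sycl/sycl.hpp>
#include <vector>
#include <cstdio>

int main()
{
    std::vector<int> data(16, 1);

    sycl::queue q;  // picks a default device
    {
        sycl::buffer<int> buf(data.data(), sycl::range<1>(data.size()));

        // Host code and the device kernel (the lambda below) sit in the
        // same translation unit - the "single source" model CUDA also uses.
        q.submit([&](sycl::handler& h) {
            sycl::accessor acc(buf, h, sycl::read_write);
            h.parallel_for(sycl::range<1>(data.size()), [=](sycl::id<1> i) {
                acc[i] *= 2;
            });
        });
    }  // buffer destructor waits for the kernel and copies results back

    std::printf("data[0] = %d\n", data[0]);  // prints 2
    return 0;
}

Direct3D and Vulkan, by contrast, keep shader code in separately compiled HLSL/GLSL (DXIL/SPIR-V), which is the distinction the posts above are drawing.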
 
If XeSS is "cross-compatible" across vendors then I'd wager that SYCL is the least likely possibility to begin with and it most probably runs on more widely supported standards like Direct3D or Vulkan ...
These don't have any ML capabilities though. So the DP4a version may use them I suppose but the XMX one likely can't.
 

Cross-posting from RT thread...

[Image: Alchemist slice diagram (Alchemist-Slice2.jpg)]


Nice, I like "core" much better than "subslice". Intel is sticking to its guns and referring to self-contained execution units as cores. This is arguably more accurate but pretty useless for comparison to AMD's and Nvidia's "cores". This picture also makes it seem like the RT and texture units can be accessed from any of the cores. That would be really interesting, but the picture is probably just misleading. The rasterizer and ROPs inside the slice are very similar to Ampere and RDNA.

Intel Slice = Nvidia GPC = AMD Shader Array
Intel Core = Nvidia SM = AMD WGP
Intel Vector engine = Nvidia Partition = AMD SIMD

Did I get that right?
 
[Image: Intel Arc graphics card (sqZgubF.jpg)]

Is that card just a mock-up for marketing purposes, or is the GPU really that big? It looks F**kin huge, or am I just behind the times and that's what size GPUs are these days?
 