Intel ARC GPUs, Xe Architecture for dGPUs [2018-2022]

They run packed math like every other RDNA2 part out there.

From the Linux drivers, it looks like RPM on DP4a also runs on every RDNA1 GPU, with the sole exception of Navi 10.
Which is good news for owners of Navi 12 and 14 (MacBook Pro?) GPUs.
 
No, 1.9 was just a shader pass. It had nothing to do with anything related to DLSS or XeSS.
That quote doesn't make any sense. But I guess when your software was developed prior to new hardware, you try to defend it. UE5 needs upscaling because no hardware can even run it at 1440p with more than 60 FPS. So having dedicated units for advanced upscaling should be applauded...
 
I am pretty sure there was no referencing at all for 1.9 - it was more just a TAAU pass. Hence why it cost next to nothing, unlike DLSS, which has a noticeable millisecond frame cost.
Inferencing?

Fair enough, then. I wrongly assumed Control's Deep Learning Super Sampling was using... ML inferencing.

Regardless, Intel's predictions on the performance deficit when using mixed precision dot products on the FP32 ALUs point to a pretty small frametime difference, even if the upscaling process itself is seemingly taking 2x longer.
Also, the fact that they're providing the DP4a path for their Xe LP iGPUs is another strong indicator that it runs fast enough using the shader processors.

It's still hardware that can't do anything else for gaming at the moment, so the question of whether the die area wouldn't be better spent on other execution units still stands. Especially as Intel's top-end offering, releasing only in Q1 2022 (i.e. a quarter away from RDNA3), seems to only be competitive with a Navi 22 or a GA104, at least in rasterization.
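For reference, DP4a is a packed dot-product instruction: four int8 multiplies accumulated into a 32-bit integer per operation, which is what lets int8 inference run on the regular shader ALUs. A minimal plain-C++ sketch of the semantics (illustrative only - the function name and packing layout are made up here, not any vendor's intrinsic):

Code:
#include <cstdint>
#include <cstdio>

// Scalar emulation of a DP4a-style operation: a dot product of four
// signed 8-bit lanes packed into 32-bit words, accumulated into a
// 32-bit integer. GPUs expose this as a single instruction; this
// sketch only shows the semantics.
int32_t dp4a_emulated(uint32_t a_packed, uint32_t b_packed, int32_t acc)
{
    for (int lane = 0; lane < 4; ++lane) {
        int8_t a = static_cast<int8_t>((a_packed >> (8 * lane)) & 0xFF);
        int8_t b = static_cast<int8_t>((b_packed >> (8 * lane)) & 0xFF);
        acc += static_cast<int32_t>(a) * static_cast<int32_t>(b);
    }
    return acc;
}

int main()
{
    uint32_t a = 0x04030201u;  // lanes {1, 2, 3, 4}
    uint32_t b = 0x08070605u;  // lanes {5, 6, 7, 8}
    std::printf("%d\n", dp4a_emulated(a, b, 0));  // 1*5 + 2*6 + 3*7 + 4*8 = 70
    return 0;
}

Dedicated XMX/tensor units do the same kind of low-precision multiply-accumulate, just on whole matrix tiles per instruction rather than four lanes at a time, which is where the throughput gap (and the die-area question) comes from.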
 
Considering DLSS 1.9 ran on the CUDA cores, probably using DP4a RPM as well, yes.

https://www.techspot.com/article/1992-nvidia-dlss-2020/


And there's also this:
I wouldn't say it makes little sense.
Just a view that the amount of die space used for tensor cores is excessive.
Doesn't mean they couldn't have a smaller, separate tensor accelerator that isn't implemented alongside each CU.

Sorry, I think I quoted the wrong thing.
It was in relation to @Andrew Lauritzen's quote, though.
 
I wouldn't say it makes little sense.
Just a view that the amount of die space used for tensor cores is excessive.
Doesn't mean they couldn't have a smaller, separate tensor accelerator that isn't implemented alongside each CU.

Yes, to me the comment is suggesting that there's too much die area dedicated to the tensor cores, not that they shouldn't exist. Probably because whatever GPU diagnostics tool or methodology Epic is using to light up the tensor cores shows that their occupancy isn't great when DLSS is used.


Though it's also a good indicator that those hundreds of TOPs are probably not really necessary for ML-based upscaling in real-time rendering.
 
Inferencing?

Fair enough, then. I wrongly assumed Control's Deep Learning Super Sampling was using... ML inferencing.

Regardless, Intel's predictions on the performance deficit when using mixed precision dot products on the FP32 ALUs point to a pretty small frametime difference, even if the upscaling process itself is seemingly taking 2x longer.
Also, the fact that they're providing the DP4a path for their Xe LP iGPUs is another strong indicator that it runs fast enough using the shader processors.

It's still hardware that can't do anything else for gaming at the moment, so the question of whether the die area wouldn't be better spent on other execution units still stands. Especially as Intel's top-end offering, releasing only in Q1 2022 (i.e. a quarter away from RDNA3), seems to only be competitive with a Navi 22 or a GA104, at least in rasterization.
Yes, inferencing. Lovely Phone.
 
Yes, to me the comment is suggesting that there's too much die area dedicated to the tensor cores, not that they shouldn't exist. Probably because whatever GPU diagnostics tool or methodology Epic is using to light up the tensor cores shows that their occupancy isn't great when DLSS is used.

Problem is that Epic is an ISV. They don't care about hardware limitations. Tensor cores are cheap. They run for 1 ms on a 3090 for super resolution. That is basically 1/16 of the time a 3090 takes to render a 1080p frame. Yet this 1 ms provides at least twice the image quality of a 16x more compute-intensive frame.

Using Matrix engines for upscaling is highly efficient.
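Taking those numbers at face value (they are the post's claims, not measurements), the budget argument is just this fraction, sketched here for clarity:

Code:
#include <cstdio>

int main()
{
    // Figures quoted from the post above, not measured here.
    const double upscale_cost_ms = 1.0;   // claimed tensor-core super resolution cost on a 3090
    const double frame_time_ms   = 16.0;  // the "1/16" claim implies roughly a 16 ms 1080p frame
    const double fraction        = upscale_cost_ms / frame_time_ms;

    std::printf("Upscaling uses %.1f%% of the frame time\n", fraction * 100.0);  // ~6.3%
    return 0;
}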
 
Spending these ~10% of transistors on something like shading units would be completely pointless for anyone. The added 10% of shading units would bring an added 10% of power with them, leading to a 10% clock reduction and what would be the exact same performance we have now with tensor cores - but without said tensor cores, DLSS and all the fancy ML stuff they enable.

I'd suggest leaving the h/w design to the h/w designers here. They generally know what they are doing.
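A rough sketch of the scaling argument above, assuming the simplistic model that performance scales linearly with both unit count and clock speed (the ±10% figures are the post's hypothetical, not measured data):

Code:
#include <cstdio>

int main()
{
    // Simple model: performance ~ number_of_units * clock.
    const double units_scale = 1.10;  // ~10% more shading units instead of tensor cores
    const double clock_scale = 0.90;  // ~10% lower clocks to stay inside the same power budget
    const double relative_performance = units_scale * clock_scale;

    std::printf("Relative performance: %.2fx\n", relative_performance);  // ~0.99x, i.e. a wash
    return 0;
}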
 
I kinda wonder what backend a console would run SYCL code through? Is there anything in modern console APIs that is compatible with SYCL compilers?

If XeSS is "cross-compatible" across vendors then I'd wager that SYCL is the least likely possibility to begin with and it most probably runs on more widely supported standards like Direct3D or Vulkan ...

Consoles don't support SYCL at all, but GNM supports advanced C++ features like templates, so you can write your compute kernels in a single file; it thus supports the single-source programming model just like CUDA. Consoles are in too good of a state to support subpar standards like SYCL ...
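For context on what "single source" means here: the host code and the device kernel live in the same C++ file and are compiled together. A minimal SYCL 2020 sketch (assumes a desktop implementation such as DPC++ is installed; purely illustrative, not console code):

Code:
#include <sycl/sycl.hpp>
#include <vector>
#include <cstdio>

int main()
{
    std::vector<int> data(16, 1);

    sycl::queue q;  // picks a default device
    {
        sycl::buffer<int> buf(data.data(), sycl::range<1>(data.size()));

        // Host code and the device kernel (the lambda below) sit in the
        // same translation unit - the "single source" model CUDA also uses.
        q.submit([&](sycl::handler& h) {
            sycl::accessor acc(buf, h, sycl::read_write);
            h.parallel_for(sycl::range<1>(data.size()), [=](sycl::id<1> i) {
                acc[i] *= 2;
            });
        });
    }  // buffer destructor waits for the kernel and copies results back

    std::printf("data[0] = %d\n", data[0]);  // prints 2
    return 0;
}

Direct3D and Vulkan, by contrast, keep shader code in separately compiled HLSL/GLSL (DXIL/SPIR-V), which is the distinction the posts above are drawing.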
 
If XeSS is "cross-compatible" across vendors then I'd wager that SYCL is the least likely possibility to begin with and it most probably runs on more widely supported standards like Direct3D or Vulkan ...
These don't have any ML capabilities though. So the DP4a version may use them I suppose but the XMX one likely can't.
 

Cross-posting from RT thread...

[Image: Alchemist slice diagram (Alchemist-Slice2.jpg)]


Nice, I like "core" much better than "subslice". Intel is sticking to its guns and referring to self-contained execution units as cores. This is arguably more accurate but pretty useless for comparison to AMD's and Nvidia's "cores". This picture also makes it seem like the RT and texture units can be accessed from any of the cores. That would be really interesting, but the picture is probably just misleading. The rasterizer and ROPs inside the slice are very similar to Ampere and RDNA.

Intel Slice = Nvidia GPC = AMD Shader Array
Intel Core = Nvidia SM = AMD WGP
Intel Vector engine = Nvidia Partition = AMD SIMD

Did I get that right?
 
[Image: Intel Arc graphics card (sqZgubF.jpg)]

Is that card just a mock-up for marketing purposes, or is the GPU really that big? It looks F**kin huge, or am I just behind the times and that's what size GPUs are these days?
 