Intel ARC GPUs, Xe Architecture for dGPUs [2018-2022]

This is a really weird take. Why should there be faster hardware at all for offline processes if you can just “take as much time as you want”? Obviously this is not true in the real world.
I think that's just a legit consumer perspective. For me, as a consumer, it does not matter how the neural networks are trained behind the scenes, only that my hardware is quick enough to apply its learnings to my data (inference).
 
This is a really weird take. Why should there be faster hardware at all for offline processes if you can just “take as much time as you want”? Obviously this is not true in the real world.

It doesn't really matter to the end user how exactly the model was trained, does it? What does it matter to the consumer, who can't even appreciate a process that's largely invisible to them? Why should we care about what exact process was used behind the scenes to achieve the end result? Does it directly affect the enjoyment of our product?
 
It doesn't really matter to the end user how exactly the model was trained, does it? What does it matter to the consumer, who can't even appreciate a process that's largely invisible to them? Why should we care about what exact process was used behind the scenes to achieve the end result? Does it directly affect the enjoyment of our product?

Did someone say that consumers should care about datacenter hardware?

If ML inferencing was fast enough on regular ALUs then Intel wouldn’t have included their XMX thingies on consumer GPUs.
 
The only non-graphics usage of neural network inference in games that I know of is the muscle deformation in PS5's Miles Morales. It's running on mixed dot product RPM on the PS5's GPU and the performance hit was negligible according to Insomniac.
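(For context, "mixed dot product RPM" here means 4-wide int8 multiply-accumulates into a 32-bit integer, i.e. DP4a-style packed math on the regular shader ALUs. A purely illustrative CPU-side sketch of a tiny quantized fully-connected layer evaluated that way, with made-up names and sizes and no relation to Insomniac's actual code:)

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical sizes for a tiny per-vertex corrective layer.
constexpr int kInputs  = 16;  // e.g. pose features feeding the layer
constexpr int kOutputs = 8;   // e.g. corrective deformation offsets

// One int8-quantized fully-connected layer, accumulated in int32.
// Each group of four multiply-adds in the inner loop is what a single
// DP4a / packed-math instruction would do on the GPU's ALUs.
void fc_layer_int8(const int8_t* x, const int8_t* w,
                   const int32_t* bias, int32_t* y) {
    for (int o = 0; o < kOutputs; ++o) {
        int32_t acc = bias[o];
        for (int i = 0; i < kInputs; i += 4)
            for (int k = 0; k < 4; ++k)  // these four MACs map to one DP4a
                acc += int32_t(x[i + k]) * int32_t(w[o * kInputs + i + k]);
        y[o] = acc;  // a real layer would rescale and apply an activation here
    }
}

int main() {
    int8_t  x[kInputs] = {};
    int8_t  w[kOutputs * kInputs] = {};
    int32_t bias[kOutputs] = {};
    int32_t y[kOutputs] = {};
    x[0] = 3; w[0] = 7; bias[0] = 1;  // trivial smoke test: expect y[0] == 22
    fc_layer_int8(x, w, bias, y);
    std::printf("y[0] = %d\n", y[0]);
    return 0;
}
```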


Try to find the slide deck titled "Deep Learning: The Future of Real-Time Rendering?", presented by Marco Salvi from Nvidia in 2017.
I just did. It's here:
https://slideplayer.com/slide/12757526/

It's ~15 slides talking about deep learning in general and then 30 slides about antialiasing, reconstruction (DLSS) and denoising. Basically what we already have.


If ML inferencing was fast enough on regular ALUs then Intel wouldn’t have included their XMX thingies on consumer GPUs.
Why did Intel include AVX-512 on dual-core 9W Ice Lake CPUs?
I mean at this point, why even insist on AVX-512 on all consumer CPUs at all?
And why are there TMUs and ROPs on A100? Why are there Ray Tracing units in Ponte Vecchio?


I understand that these chips' designs get locked around ~3 years before they go into production. Some of what goes in there might be a "just in case" option. ATi / AMD GPUs shipped tessellation blocks for years before DX11 arrived, and they went largely unused. And remember the VirtualLink port on Turing GPUs, and even on the reference 6800/6900XT?
And then there's the fact that designing these chips is hard: once an execution block is validated, for time/cost reasons it often gets reused unchanged across GPUs for different tiers and markets.


Yeah sure. Hope you’re singing the same tune when AMD inevitably caves to those same plebs.
Keyword here being "when". And "when" it comes, the GPU in question might not be performant enough to enable those features without being bottlenecked by bandwidth / compute / fillrate / etc.
 
If it's SYCL then it will run on OpenCL/CUDA/OneAPI/whatever AMD has.

I wonder what Intel used for its DP4a test, assuming it’s not just made up. SYCL doesn’t seem anywhere near ready for prime time, and it’s just additional overhead on top of CUDA/OpenCL. OpenCL seems like the better option since it’s supported on all relevant hardware already. Can’t find any reference to DP4a support in OpenCL though.

Intel’s claim of broad hardware support is missing some key details. On the same note, if Nvidia wanted to “open source” DLSS, it’s also not clear what API they would use.
 
Some of what goes in there might be a "just in case" option.

Maybe. But at the end of the day Intel had the benefit of learning from Nvidia’s attempts and still decided to allocate a considerable budget to ML acceleration on their brand new gaming architecture. If they thought they could beat DLSS without XMX, that’s what they would’ve done.

Keyword here being "when". And "when" it comes, the GPU in question might not be performant enough to enable those features without being bottlenecked by bandwidth / compute / fillrate / etc.

You are presuming that the same future will arrive whether or not there’s inferencing hardware in consumer GPUs today. That’s not how it works. The availability of the hardware will accelerate software research and innovation.
 
I wonder what Intel used for its DP4a test, assuming it’s not just made up. SYCL doesn’t seem anywhere near ready for prime time, and it’s just additional overhead on top of CUDA/OpenCL. OpenCL seems like the better option since it’s supported on all relevant hardware already. Can’t find any reference to DP4a support in OpenCL though.
My current expectation is for the DP4a version to just be a DX shader (with the NN model supplied as a precompiled DLL?)

Intel’s claim of broad hardware support is missing some key details. On the same note, if Nvidia wanted to “open source” DLSS, it’s also not clear what API they would use.
Yeah, the XMX version especially is rather interesting in this regard.

Do we have any info on how Turing/Ampere runs DP4a by the way?
 
The only non-graphics usage of neural network inference in games that I know of is the muscle deformation in PS5's Miles Morales. It's running on mixed dot product RPM on the PS5's GPU and the performance hit was negligible according to Insomniac.

Well that must be the most ’meh’ implementation of ML so far. Probably could be done on the PS4 as well.

Without acceleration we’re probably limited to some uninspiring muscle ’deformation’. The end user doesn’t even notice this.
 
My current expectation is for the DP4a version to just be a DX shader (with the NN model supplied as a precompiled DLL?)

Good call.

https://docs.microsoft.com/en-us/wi...lsl-shader-model-6-4-features-for-direct3d-12
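For anyone who doesn't want to click through: that page covers the SM 6.4 packed dot product intrinsics (dot4add_i8packed / dot4add_u8packed, if I've got the names right), which is presumably how a plain compute shader gets at DP4a. Semantically it's just the following, written here as a scalar C++ stand-in rather than the actual HLSL:

```cpp
#include <cstdint>
#include <cstdio>

// Plain-C++ approximation of what a DP4a-style packed dot product computes:
// each 32-bit word holds four signed 8-bit lanes; multiply lane-wise and
// accumulate into a 32-bit integer. Not vendor code, just the semantics.
int32_t dot4add_i8packed_ref(uint32_t a, uint32_t b, int32_t acc) {
    for (int lane = 0; lane < 4; ++lane) {
        int8_t ai = int8_t((a >> (8 * lane)) & 0xFF);  // sign-extend each lane
        int8_t bi = int8_t((b >> (8 * lane)) & 0xFF);
        acc += int32_t(ai) * int32_t(bi);
    }
    return acc;
}

int main() {
    // Pack (1, -2, 3, -4) and (5, 6, -7, 8), lane 0 in the low byte.
    uint32_t a = 0x01u | (uint32_t(uint8_t(-2)) << 8) | (0x03u << 16) | (uint32_t(uint8_t(-4)) << 24);
    uint32_t b = 0x05u | (0x06u << 8) | (uint32_t(uint8_t(-7)) << 16) | (0x08u << 24);
    // 1*5 + (-2)*6 + 3*(-7) + (-4)*8 = -60
    std::printf("%d\n", dot4add_i8packed_ref(a, b, 0));
    return 0;
}
```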

Yeah, the XMX version especially is rather interesting in this regard.

Do we have any info on how Turing/Ampere runs DP4a by the way?

I would be shocked if Intel lifts a finger to make the dense matrix XMX version compatible with other hardware.
 
I wonder what Intel used for its DP4a test, assuming it’s not just made up. SYCL doesn’t seem anywhere near ready for prime time, and it’s just additional overhead on top of CUDA/OpenCL. OpenCL seems like the better option since it’s supported on all relevant hardware already. Can’t find any reference to DP4a support in OpenCL though.

Intel’s claim of broad hardware support is missing some key details.
Their main hardware target for the DP4a path is the Xe LP GPUs, but in fact they haven't talked much about the software stack. Direct Compute?



You are presuming that the same future will arrive whether or not there’s inferencing hardware in consumer GPUs today. That’s not how it works. The availability of the hardware will accelerate software research and innovation.

This isn't linear. There are features that showed up in PC GPU hardware and were never adopted by software, and there are features that showed up because of demand from software development.
Besides, we still need proof that these implementations, when done through DP4a / RPM on the shader processors of discrete GPUs, actually cause a performance drop so large that dedicated hardware is needed.
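Just to show the shape of the estimate that would settle it, here's a back-of-envelope sketch. Every figure in it is an invented placeholder, since nobody outside Intel knows the real network cost or the DP4a throughput they're targeting:

```cpp
#include <cstdio>

int main() {
    // All figures are hypothetical placeholders, not measurements or leaks.
    const double macs_per_pixel   = 2000.0;           // assumed network cost per upscaled pixel
    const double pixels_per_frame = 3840.0 * 2160.0;  // 4K output
    const double frames_per_sec   = 60.0;
    const double gpu_int8_tops    = 40.0;             // assumed DP4a throughput of the GPU

    // Convention: one multiply-accumulate counts as two operations.
    const double needed_tops  = macs_per_pixel * pixels_per_frame * frames_per_sec * 2.0 / 1e12;
    const double share        = needed_tops / gpu_int8_tops;
    const double ms_per_frame = share * (1000.0 / frames_per_sec);

    std::printf("Network: ~%.2f int8 TOPS, ~%.1f%% of DP4a throughput, ~%.2f ms/frame at ideal utilisation\n",
                needed_tops, share * 100.0, ms_per_frame);
    return 0;
}
```

Whether the resulting fraction of a frame counts as negligible obviously depends on how much it contends with the rest of the rendering work, which is exactly the measurement we don't have.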
 