Intel ARC GPUs, Xe Architecture for dGPUs [2018-2022]

This is a really weird take. Why should there be faster hardware at all for offline processes if you can just “take as much time as you want”? Obviously this is not true in the real world.
I think that's just a legit consumer perspective. For me, as a consumer, it does not matter how the neural networks are trained behind the scenes, only that my hardware is quick enough to apply its learnings to my data (inference).
 
This is a really weird take. Why should there be faster hardware at all for offline processes if you can just “take as much time as you want”? Obviously this is not true in the real world.

It doesn't really matter to the end user how exactly the model was trained, does it? What does it matter to the consumer, who can't even appreciate a process that's largely invisible to them? Why should we care about what exact process was used behind the scenes to achieve the end result? Does it directly affect the enjoyment of our product?
 
It doesn't really matter to the end user how exactly the model was trained, does it? What does it matter to the consumer, who can't even appreciate a process that's largely invisible to them? Why should we care about what exact process was used behind the scenes to achieve the end result? Does it directly affect the enjoyment of our product?

Did someone say that consumers should care about datacenter hardware?

If ML inferencing was fast enough on regular ALUs then Intel wouldn’t have included their XMX thingies on consumer GPUs.
 
The only non-graphics usage of neural network inference in games that I know of is the muscle deformation in PS5's Miles Morales. It's running on mixed dot product RPM on the PS5's GPU and the performance hit was negligible according to Insomniac.
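(For context, "mixed dot product RPM" here means 4-wide int8 multiply-accumulates into a 32-bit integer, i.e. DP4a-style packed math on the regular shader ALUs. A purely illustrative CPU-side sketch of a tiny quantized fully-connected layer evaluated that way, with made-up names and sizes and no relation to Insomniac's actual code:)

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical sizes for a tiny per-vertex corrective layer.
constexpr int kInputs  = 16;  // e.g. pose features feeding the layer
constexpr int kOutputs = 8;   // e.g. corrective deformation offsets

// One int8-quantized fully-connected layer, accumulated in int32.
// Each group of four multiply-adds in the inner loop is what a single
// DP4a / packed-math instruction would do on the GPU's ALUs.
void fc_layer_int8(const int8_t* x, const int8_t* w,
                   const int32_t* bias, int32_t* y) {
    for (int o = 0; o < kOutputs; ++o) {
        int32_t acc = bias[o];
        for (int i = 0; i < kInputs; i += 4)
            for (int k = 0; k < 4; ++k)  // these four MACs map to one DP4a
                acc += int32_t(x[i + k]) * int32_t(w[o * kInputs + i + k]);
        y[o] = acc;  // a real layer would rescale and apply an activation here
    }
}

int main() {
    int8_t  x[kInputs] = {};
    int8_t  w[kOutputs * kInputs] = {};
    int32_t bias[kOutputs] = {};
    int32_t y[kOutputs] = {};
    x[0] = 3; w[0] = 7; bias[0] = 1;  // trivial smoke test: expect y[0] == 22
    fc_layer_int8(x, w, bias, y);
    std::printf("y[0] = %d\n", y[0]);
    return 0;
}
```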


Try to find the slide deck titled "Deep Learning: The Future of Real-Time Rendering?", presented by Marco Salvi from Nvidia in 2017.
I just did. It's here:
https://slideplayer.com/slide/12757526/

It's ~15 slides talking about deep learning in general and then 30 slides about antialiasing, reconstruction (DLSS) and denoising. Basically what we already have.


If ML inferencing was fast enough on regular ALUs then Intel wouldn’t have included their XMX thingies on consumer GPUs.
Why did Intel include AVX-512 on dual-core 9W Ice Lake CPUs?
I mean at this point, why even insist on AVX-512 on all consumer CPUs at all?
And why are there TMUs and ROPs on A100? Why are there Ray Tracing units in Ponte Vecchio?


I understand that these chips' designs get locked around ~3 years before they go into production. Some of what goes in there might be a "just in case" option. ATi / AMD GPUs shipped tessellation blocks for years before DX11 arrived, and they went largely unused. And remember the VirtualLink port on Turing GPUs, and even on the reference 6800/6900XT?
And then there's the fact that designing these chips is hard: once an execution block is validated, for time/cost reasons it often gets reused unchanged across GPUs for different tiers and markets.


Yeah sure. Hope you’re singing the same tune when AMD inevitably caves to those same plebs.
Keyword here being "when". And "when" it comes, the GPU in question might not be performant enough to enable those features without being bottlenecked by bandwidth / compute / fillrate / etc.
 
If it's SYCL then it will run on OpenCL/CUDA/OneAPI/whatever AMD has.

I wonder what Intel used for its DP4a test, assuming it’s not just made up. SYCL doesn’t seem anywhere near ready for prime time, and it’s just additional overhead on top of CUDA/OpenCL. OpenCL seems like the better option since it’s supported on all relevant hardware already. Can’t find any reference to DP4a support in OpenCL though.

Intel’s claim of broad hardware support is missing some key details. On the same note, if Nvidia wanted to “open source” DLSS, it’s also not clear what API they would use.
 
Some of what goes in there might be a "just in case" option.

Maybe. But at the end of the day Intel had the benefit of learning from Nvidia’s attempts and still decided to allocate a considerable budget to ML acceleration on their brand new gaming architecture. If they thought they could beat DLSS without XMX, that’s what they would’ve done.

Keyword here being "when". And "when" it comes, the GPU in question might not be performant enough to enable those features without being bottlenecked by bandwidth / compute / fillrate / etc.

You are presuming that the same future will arrive whether or not there’s inferencing hardware in consumer GPUs today. That’s not how it works. The availability of the hardware will accelerate software research and innovation.
 
I wonder what Intel used for its DP4a test, assuming it’s not just made up. SYCL doesn’t seem anywhere near ready for prime time, and it’s just additional overhead on top of CUDA/OpenCL. OpenCL seems like the better option since it’s supported on all relevant hardware already. Can’t find any reference to DP4a support in OpenCL though.
My current expectation is for the DP4a version to just be a DX shader (with the NN model supplied as a precompiled DLL?)

Intel’s claim of broad hardware support is missing some key details. On the same note, if Nvidia wanted to “open source” DLSS, it’s also not clear what API they would use.
Yeah, the XMX version especially is rather interesting in this regard.

Do we have any info on how Turing/Ampere runs DP4a by the way?
 
The only non-graphics usage of neural network inference in games that I know of is the muscle deformation in PS5's Miles Morales. It's running on mixed dot product RPM on the PS5's GPU and the performance hit was negligible according to Insomniac.

Well that must be the most ’meh’ implementation of ML so far. Probably could be done on the PS4 as well.

Without acceleration we’re probably limited to some uninspiring muscle ’deformation’. The end user doesn’t even notice this.
 
My current expectation is for the DP4a version to just be a DX shader (with the NN model supplied as a precompiled DLL?)

Good call.

https://docs.microsoft.com/en-us/wi...lsl-shader-model-6-4-features-for-direct3d-12
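For anyone who doesn't want to click through: that page covers the SM 6.4 packed dot product intrinsics (dot4add_i8packed / dot4add_u8packed, if I've got the names right), which is presumably how a plain compute shader gets at DP4a. Semantically it's just the following, written here as a scalar C++ stand-in rather than the actual HLSL:

```cpp
#include <cstdint>
#include <cstdio>

// Plain-C++ approximation of what a DP4a-style packed dot product computes:
// each 32-bit word holds four signed 8-bit lanes; multiply lane-wise and
// accumulate into a 32-bit integer. Not vendor code, just the semantics.
int32_t dot4add_i8packed_ref(uint32_t a, uint32_t b, int32_t acc) {
    for (int lane = 0; lane < 4; ++lane) {
        int8_t ai = int8_t((a >> (8 * lane)) & 0xFF);  // sign-extend each lane
        int8_t bi = int8_t((b >> (8 * lane)) & 0xFF);
        acc += int32_t(ai) * int32_t(bi);
    }
    return acc;
}

int main() {
    // Pack (1, -2, 3, -4) and (5, 6, -7, 8), lane 0 in the low byte.
    uint32_t a = 0x01u | (uint32_t(uint8_t(-2)) << 8) | (0x03u << 16) | (uint32_t(uint8_t(-4)) << 24);
    uint32_t b = 0x05u | (0x06u << 8) | (uint32_t(uint8_t(-7)) << 16) | (0x08u << 24);
    // 1*5 + (-2)*6 + 3*(-7) + (-4)*8 = -60
    std::printf("%d\n", dot4add_i8packed_ref(a, b, 0));
    return 0;
}
```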

Yeah, the XMX version especially is rather interesting in this regard.

Do we have any info on how Turing/Ampere runs DP4a by the way?

I would be shocked if Intel lifts a finger to make the dense matrix XMX version compatible with other hardware.
 
I wonder what Intel used for its DP4a test, assuming it’s not just made up. SYCL doesn’t seem anywhere near ready for prime time, and it’s just additional overhead on top of CUDA/OpenCL. OpenCL seems like the better option since it’s supported on all relevant hardware already. Can’t find any reference to DP4a support in OpenCL though.

Intel’s claim of broad hardware support is missing some key details.
Their main hardware target for the DP4a path is the Xe LP GPUs, but in fact they haven't talked much about the software stack. Direct Compute?



You are presuming that the same future will arrive whether or not there’s inferencing hardware in consumer GPUs today. That’s not how it works. The availability of the hardware will accelerate software research and innovation.

This isn't linear. There are features that showed up in PC GPU hardware and were never adopted by software, and there are features that showed up because of demand from software development.
Besides, we still need proof that these implementations, when done through DP4a / RPM on the shader processors of discrete GPUs, actually cause a performance drop so large that dedicated hardware is needed.
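Just to show the shape of the estimate that would settle it, here's a back-of-envelope sketch. Every figure in it is an invented placeholder, since nobody outside Intel knows the real network cost or the DP4a throughput they're targeting:

```cpp
#include <cstdio>

int main() {
    // All figures are hypothetical placeholders, not measurements or leaks.
    const double macs_per_pixel   = 2000.0;           // assumed network cost per upscaled pixel
    const double pixels_per_frame = 3840.0 * 2160.0;  // 4K output
    const double frames_per_sec   = 60.0;
    const double gpu_int8_tops    = 40.0;             // assumed DP4a throughput of the GPU

    // Convention: one multiply-accumulate counts as two operations.
    const double needed_tops  = macs_per_pixel * pixels_per_frame * frames_per_sec * 2.0 / 1e12;
    const double share        = needed_tops / gpu_int8_tops;
    const double ms_per_frame = share * (1000.0 / frames_per_sec);

    std::printf("Network: ~%.2f int8 TOPS, ~%.1f%% of DP4a throughput, ~%.2f ms/frame at ideal utilisation\n",
                needed_tops, share * 100.0, ms_per_frame);
    return 0;
}
```

Whether the resulting fraction of a frame counts as negligible obviously depends on how much it contends with the rest of the rendering work, which is exactly the measurement we don't have.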
 