Intel ARC GPUs, Xe Architecture for dGPUs [2018-2022]

So that something like 1.5 ms on the top SKU with XMX would translate into 1.5 * 2.2 (DP4A) * 10 (10x less shading horsepower) = 33 ms on Xe LP?
Not going to happen; even for 1080p it would still take a crazy 8 ms

No, it will be equal on every GPU. I mean, when you don't need Tensor Cores, why would e.g. a 3090 be faster than a 2060 Super?
Obviously some people here believe that XeSS doesn't scale with more compute performance. Otherwise, claiming that "DP4a would be very fast" is a useless claim...
 
Obviously some people here believe that XeSS doesn't scale with more compute performance. Otherwise, claiming that "DP4a would be very fast" is a useless claim...
Lossless performance scaling for low-end Xe LP with zero presence on the market, that's what people want; who are we to judge them, lol
 
So that something like 1.5 ms on the top SKU with XMX would translate into 1.5 * 2.2 (DP4A) * 10 (10x less shading horsepower) = 33 ms on Xe LP?
Not going to happen; even for 1080p it would still take a crazy 8 ms

Where are you getting the 1.5 ms, the "2.2x" and the "10x" from?
That's a whole lot of assumptions out of a handful of bars without any scale on a slide that says "for conceptual illustration purposes only".

Furthermore, questioning the possibility of running XeSS on Xe LP using DP4A isn't a productive argument, considering that's exactly what Intel said they want to apply the DP4A path on.

Ryan Smith said:
With that said, Intel has gone one step further and is also developing a version of XeSS that doesn’t require dedicated matrix math hardware. Owing to the fact that the installation base for their matrix hardware is starting from 0, that they’d like to be able to use XeSS on Xe-LP integrated graphics, and that they want to do everything possible to encourage game developers to adopt their XeSS technology, the company is developing a version of XeSS that instead uses the 4-element vector dot product (DP4a) instruction.
 
What Intel wants and what Intel will actually be able to achieve aren't the same thing. We have zero details which would suggest that XeSS will be usable on Xe-LP. Intel has a history of providing support while not providing nearly enough performance for said support to be actually usable.
 
That's a whole lot of assumptions out of a handful of bars without any scale on a slide that says "for conceptual illustration purposes only"
You can measure the length of the bars precisely: the DP4A bar is 2.2x longer, so DP4A is 2.2x slower.
The 10x comes from the shading, texturing and other specs of the top Xe HPG configuration with 4096 SPs at a 1.5x higher frequency, mentioned on Architecture Day.
Of course, the XMX bar can be anything from 1 ms in the best case (DLSS Performance on a 3090 at 4K) to 3 ms in the worst, so the lower bound for Xe LP is 22 ms and the upper bound is 66 ms. Even the 5.5 ms lower bound at 1080p is a lot.
5.5 ms might be feasible for the heaviest games with a 4x upscaling factor, but it's still pretty heavy; something like TSR or TAAU would likely do a better job of reconstructing from a higher resolution in fewer milliseconds.
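For readability, here is the same back-of-the-envelope estimate as a small Python sketch. The 2.2x bar ratio, the 10x throughput ratio and the 1-3 ms XMX cost range are my assumptions read off the slide, not numbers published by Intel.

```python
# Back-of-the-envelope estimate from the post above, as a script.
# All inputs are assumptions read off the "conceptual illustration" slide,
# not numbers published by Intel.

XMX_COST_4K_MS_BEST = 1.0    # assumed best case (~DLSS Performance cost on a 3090 at 4K)
XMX_COST_4K_MS_WORST = 3.0   # assumed worst case
DP4A_PENALTY = 2.2           # DP4A bar measured as ~2.2x longer than the XMX bar
HPG_VS_LP_RATIO = 10.0       # assumed top Xe HPG vs Xe LP throughput ratio

def xe_lp_cost_ms(xmx_cost_4k_ms, pixel_fraction=1.0):
    """Scale a top-SKU XMX cost to Xe LP via the DP4A penalty and the
    throughput ratio, then scale by output resolution (4K = 1.0, 1080p = 0.25)."""
    return xmx_cost_4k_ms * DP4A_PENALTY * HPG_VS_LP_RATIO * pixel_fraction

print(xe_lp_cost_ms(XMX_COST_4K_MS_BEST))          # ~22 ms lower bound at 4K
print(xe_lp_cost_ms(XMX_COST_4K_MS_WORST))         # ~66 ms upper bound at 4K
print(xe_lp_cost_ms(XMX_COST_4K_MS_BEST, 0.25))    # ~5.5 ms lower bound at 1080p
```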

Furthermore, questioning the possibility of running XeSS on Xe LP using DP4A isn't a productive argument, considering that's exactly what Intel said they want to apply the DP4A path on.
Nobody will stop questioning the sensibility of running XeSS on Xe LP until there are actual games running it on Xe LP and performance/quality tests; until then, it's pretty questionable.
 
You can measure the length of the bars precisely: the DP4A bar is 2.2x longer, so DP4A is 2.2x slower.

[Attachment: Intel Architecture Day 2021_Pressdeck_93.jpg]
[Attachment: Untitledq.png]

You're trying to take precise calculations out of a graph whose author says it's not precise.
That's a pointless exercise. ¯\_(ツ)_/¯



The 10x comes from the shading, texturing and other specs of the top Xe HPG configuration with 4096 SPs at a 1.5x higher frequency, mentioned on Architecture Day.

Iris Xe 96 = 96 EUs * 8 ALUs = 768 shader units
Fully enabled Alchemist: 8 slices * 4 Xe-cores * 16 vector engines * 8 ALUs = 4096 shader units

4096 / 768 = 5.3(3)x more units.
5.3(3)x more units at 1.5x higher frequency = 5.3(3) * 1.5 = 8x.
8x, not 10x.
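
A tiny sketch of the above, assuming the unit counts and the 1.5x clock advantage quoted from Architecture Day:

```python
# Unit-count comparison from the post above (configurations as stated there).
iris_xe_96_alus = 96 * 8           # 96 EUs * 8 ALUs = 768 shader units
alchemist_alus = 8 * 4 * 16 * 8    # 8 slices * 4 Xe-cores * 16 vector engines * 8 ALUs = 4096
freq_ratio = 1.5                   # assumed clock advantage of the top dGPU

print(alchemist_alus / iris_xe_96_alus)               # ~5.33x more units
print(alchemist_alus / iris_xe_96_alus * freq_ratio)  # 8.0x total
```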


Not only are you making calculations out of counting pixels from graph bars that are meant for "conceptual illustration purposes only", you're also doing said calculations wrong.
Perhaps because you're confusing the Xe MAX discrete GPU with 80 EUs with the integrated Iris Xe with 96 EUs.


Nobody will stop questioning the sensibility of running XeSS on Xe LP until there are actual games running it on Xe LP and performance/quality tests; until then, it's pretty questionable.
Out of everything that Intel has said and shown, claiming they're lying about their deep-learning upscaling method running efficiently on their integrated Iris Xe using RPM DP4A truly is a very specific flex.
Almost like someone's feeling threatened by that possibility. Imagine running XeSS on the popular GTX 1650 / Ti on laptops, and where that would put the newly released RTX 3050.
 
Try to find the slide deck titled "Deep Learning: The Future of Real-Time Rendering?", presented by Marco Salvi from NVIDIA. It was in 2017.

I just did. It's here:
https://slideplayer.com/slide/12757526/

It's ~15 slides talking about deep learning in general and then 30 slides about antialiasing, reconstruction (DLSS) and denoising. Basically what we already have.

I am mildly upset by someone re-hosting my presentation while erasing notes and destroying the quality of the images in it. You can get a better version here:

https://openproblems.realtimerendering.com/s2017/
 
You're trying to take precise calculations out of a graph whose author says it's not precise.
Yes, I highlighted this yesterday when that slide was posted here - https://forum.beyond3d.com/threads/...itecture-for-dgpus.60999/page-25#post-2219524
But that's all we've got, and yes, I prefer calculations to belief (in someone's probably misquoted, misspelled, misunderstood or overthought words).
Moreover, the slide says "Subject to revision with further testing", so the graph results are based on initial testing.
That means the graph should illustrate the measured proportions in execution time between DP4A and XMX.

Iris Xe 96 = 96 EUs * 8 ALUs = 768 shader units
I looked at the specs for the discrete SKU tested here - https://www.tomshardware.com/features/intel-xe-dg1-benchmarked
I have no idea what configs other DG1 SKUs have, or whether they are all salvaged or fully enabled parts.

With the THG specs, that's 4096 * 1.5 / 640 = 9.6 times, or 10 when rounded to the nearest integer.
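
The same sketch as before, but against the 80-EU / 640-ALU DG1 board from the THG article (the frequency ratio is still the assumed 1.5x):

```python
# Same comparison against the 80-EU DG1 board (640 ALUs) from the THG test.
dg1_alus = 80 * 8                      # 80 EUs * 8 ALUs = 640 shader units
ratio = 4096 * 1.5 / dg1_alus          # 9.6x
print(ratio, round(ratio))             # 9.6, rounds to 10
```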

Not only are you making calculations out of counting pixels from graph bars that are meant for "conceptual illustration purposes only", you're also doing said calculations wrong.
These graphs are labeled "for conceptual illustration purposes only" because they don't contain any performance numbers (are they too shy to share them?), but the graphs themselves are based on real measurements, so they should be perfectly fine for calculating proportions; the scale is linear anyway.
Also, nice spin on the calculations.

Perhaps because you're confusing the Xe MAX discrete GPU with 80 EUs with the integrated Iris Xe with 96 EUs.
Yes, I was talking about discrete solutions the whole time here.

claiming they're lying about their deep-learning upscaling method running efficiently on their integrated Iris Xe using RPM DP4A truly is a very specific flex.
I've not seen any quote from Intel about a "deep-learning upscaling method running efficiently on their integrated Iris Xe using RPM DP4A"; can you share the quotes from the Intel guys?
Sorry, but I don't buy it; someone's retellings (which can be based on wrong assumptions) are not Intel's claims.
 
4K native is only ~2x as long as 1080p native on that graph, which doesn't seem right.
So is the scaling in the Valley Of The Ancient Demo - https://gamegpu.com/test-video-cards/unreal-engine-5-valley-of-the-ancient-demo-test-gpu-cpu
Intel's XeSS demo is on UE5 too.
2x scaling going from 1080p to 4K is a common case in many games today, even though the geometry load shouldn't be the same across resolutions (UE5 should tune LODs per resolution, but it doesn't look like that affects performance much, if at all).
I can imagine that, in addition to the mostly constant geometry workload, many devs use lower-resolution effects at higher rendering resolutions. Capcom, for example, rendered RT AO and reflections at 1080p and reused the same 1080p RT AO and reflections for 4K at maximum settings in RE Village (you can see how it doesn't scale linearly with RT enabled). Epic probably does the same for Lumen in UE5.
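
A toy model of that argument, with made-up costs purely for illustration: when a chunk of the frame (geometry, fixed-resolution RT effects) doesn't scale with output resolution, 4K ends up well under 4x the 1080p frame time.

```python
# Toy model: why 4K can end up only ~2x the 1080p frame time.
# All costs are made-up illustration numbers, not measurements.

def frame_time_ms(pixel_scale):
    geometry_ms = 3.0                # roughly constant across resolutions
    fixed_rt_ms = 2.5                # e.g. RT AO/reflections rendered at a fixed 1080p
    shading_ms = 4.0 * pixel_scale   # scales with the output pixel count
    return geometry_ms + fixed_rt_ms + shading_ms

t_1080p = frame_time_ms(1.0)         # 9.5 ms
t_4k = frame_time_ms(4.0)            # 21.5 ms
print(t_4k / t_1080p)                # ~2.3x instead of 4x
```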
 
Looking forward to Arc's mining performance. 256-bit GDDR6 is nothing amazing, but if Intel tries to be competitive against the duopoly, we may get RTX 3070-level mining performance at reasonable prices and very efficient cards thanks to TSMC 6nm.
 
Woah, AI performance is much higher than on Ampere. That is impressive!
Much higher on dense matrices in comparison with the 3070; but with a little bit of sparsity optimization, it can end up lower than the 3070's ~160 TFLOPS with structured sparsity.
 
NVIDIA downgraded Tensor OPS with Ampere over Turing. And Intel is on TSMC 6nm vs Samsung 8nm. Looks normal to me.
 
Should we consider that a good thing?
Sure we should. Modern GPUs are used for more than just games: there are many pro apps with DirectML acceleration support, sophisticated neural-net denoising in OptiX, all kinds of ML-based animation systems, ML apps, STEM apps, etc.
AMD is traditionally passive here, since developing this market would require SW effort, but it looks like Intel wants to make a difference in this growing market in the same way as NVIDIA and push GPU usage beyond simple gaming.
 
NVIDIA downgraded Tensor OPS with Ampere over Turing.
"Downgraded" is the wrong word for what NVIDIA did with the Tensor Cores. Dense tensor performance is the same per SM in both Ampere and Turing, while on sparse matrices Ampere is 2x faster per SM. Obviously, using structured sparsity requires additional optimizations, but when done properly it should provide a huge boost in tensor perf/mm² and perf/W.
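
A rough sketch of the per-SM comparison, using the FP16 tensor rates commonly cited for Turing and GA10x-class Ampere (treat the exact figures as an approximation):

```python
# Per-SM FP16 tensor FMA/clock, as commonly cited for Turing and GA10x Ampere.
turing_sm_dense = 8 * 64                 # 8 Tensor Cores * 64 FP16 FMA/clk = 512
ampere_sm_dense = 4 * 128                # 4 Tensor Cores * 128 FP16 FMA/clk = 512
ampere_sm_sparse = ampere_sm_dense * 2   # 2:4 structured sparsity doubles the effective rate

print(ampere_sm_dense / turing_sm_dense)   # 1.0 -> same dense throughput per SM
print(ampere_sm_sparse / turing_sm_dense)  # 2.0 -> 2x per SM with structured sparsity
```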
 