Intel ARC GPUs, Xe Architecture for dGPUs [2018-2022]

Wow, Intel has some really strong opinions on RT architecture, and they mostly seem to be saying that AMD is doing it all wrong:
  1. Don't do BVH traversal on vector units.
  2. DXR 1.0-style PSOs are better than DXR 1.1 ubershaders doing inline RT.
While Intel's approach is very similar to Nvidia's, they're doing some things differently that could give them a big advantage. SIMD execution is only 8-wide when running RT, meaning fewer and shorter stalls compared to Nvidia's 32-wide warps. Intel is also sorting rays after each bounce, which would further reduce losses due to divergence.
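
To make that concrete, here's a toy C++ sketch (my own illustration, not Intel's actual scheduler; the ray data and material IDs are made up) of what sorting rays by hit material before shading buys an 8-wide machine: rays sharing a material end up contiguous, so SIMD8 groups are far more coherent than in launch order.

```cpp
// Toy model of ray sorting before shading (hypothetical, for intuition only).
#include <algorithm>
#include <cstdio>
#include <vector>

struct Ray {
    int id;          // original launch index
    int materialId;  // material hit on this bounce (known after traversal)
};

int main() {
    std::vector<Ray> rays = {
        {0, 2}, {1, 0}, {2, 2},  {3, 1},  {4, 0},  {5, 1},  {6, 2},  {7, 0},
        {8, 1}, {9, 0}, {10, 2}, {11, 1}, {12, 0}, {13, 2}, {14, 1}, {15, 0},
    };

    // Group rays that will run the same shader on the next bounce.
    std::stable_sort(rays.begin(), rays.end(), [](const Ray& a, const Ray& b) {
        return a.materialId < b.materialId;
    });

    // Dispatch in SIMD8 groups: after sorting, groups are mostly single-material.
    const size_t simdWidth = 8;
    for (size_t base = 0; base < rays.size(); base += simdWidth) {
        printf("SIMD group %zu materials:", base / simdWidth);
        for (size_t i = base; i < base + simdWidth && i < rays.size(); ++i)
            printf(" %d", rays[i].materialId);
        printf("\n");
    }
}
```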

Alchemist could make things really interesting!
The thing is, Intel were among the first to demonstrate ray tracing back in the day, and when they did, RT was still science fiction to say the least, so they have some experience in this regard, that's for sure.
 

Question: can it keep its grouping efficiency when it has to call very divergent materials, as in ray-traced reflections?
In the video he says that target shaders (signatures) are sorted. Textures that make up materials are still an opportunity for memory divergence.

The sorting (backed by spilling state to the cache hierarchy) looks pretty funky. The key is what happens after 2 or 3 ray bounces, and whether there are enough targets to sort into "meaningfully full" hardware threads.
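
A quick back-of-envelope C++ sketch of that concern (the per-target histogram is invented, nothing from the video): pad each shader target's rays out to whole SIMD8 groups and see how much of the machine the small buckets waste.

```cpp
// Toy occupancy estimate after sorting: small per-target buckets leave lanes idle.
#include <cstdio>
#include <map>

int main() {
    // Hypothetical rays remaining per hit-shader target after bounce 3.
    std::map<int, int> raysPerTarget = {{0, 37}, {1, 5}, {2, 2}, {3, 19}, {4, 1}};
    const int simdWidth = 8;

    int usedLanes = 0, paddedLanes = 0;
    for (const auto& [target, count] : raysPerTarget) {
        int groups = (count + simdWidth - 1) / simdWidth;  // round up to whole groups
        usedLanes += count;
        paddedLanes += groups * simdWidth;
        printf("target %d: %2d rays -> %d group(s), %3.0f%% full\n",
               target, count, groups, 100.0 * count / (groups * simdWidth));
    }
    printf("overall SIMD utilization: %.0f%%\n", 100.0 * usedLanes / paddedLanes);
}
```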
 

The Intel approach is not too dissimilar to that described by James McCombe at SIGGRAPH 2013 (See first video in https://dlnext.acm.org/doi/abs/10.1145/2504435.2504444 )
 

Sorting may also incur unnecessary overhead for more trivial cases where there isn't a lot of shader divergence, e.g. RT sun shadows. Hopefully it's smart enough to know when to turn it off.
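
Something like this toy heuristic could do it; pure speculation on my part, not anything Intel has described: peek at the batch's target histogram and skip the sort when one target already dominates, as it would for sun-shadow rays.

```cpp
// Hypothetical "is sorting worth it?" check (my speculation, not a documented feature).
#include <algorithm>
#include <cstdio>
#include <unordered_map>
#include <vector>

bool worthSorting(const std::vector<int>& targetIds, double dominanceCutoff = 0.9) {
    std::unordered_map<int, int> counts;
    int maxCount = 0;
    for (int id : targetIds) maxCount = std::max(maxCount, ++counts[id]);
    // If one target already dominates the batch, sorting buys little coherence.
    return static_cast<double>(maxCount) / targetIds.size() < dominanceCutoff;
}

int main() {
    std::vector<int> shadowRays(1000, 7);           // every ray hits the same shader
    std::vector<int> bounceRays = {1, 4, 2, 7, 3};  // divergent secondary rays
    printf("shadow pass: sort? %s\n", worthSorting(shadowRays) ? "yes" : "no");
    printf("bounce pass: sort? %s\n", worthSorting(bounceRays) ? "yes" : "no");
}
```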

Nvidia says this about the benefits of separate shaders. They haven't talked about sorting, so I wonder how it helps on their hardware.

“In particular, avoid übershaders that manually switch between material models. When different material models are required, I recommend implementing each in a separate hit shader. This gives the system the best possibilities to manage divergent hit shading.”
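
A crude way to see why they recommend this: on a predicated SIMD machine, an übershader's switch costs roughly one masked pass per distinct material present in the group, while a single-material group pays once. Toy C++ cost model below (my simplification, not how any real GPU accounts for it):

```cpp
// Toy divergence cost model: one masked shading pass per distinct material in a group.
#include <cstdio>
#include <set>
#include <vector>

int shadingPasses(const std::vector<int>& groupMaterials) {
    // A switch over materials executes every case that any lane needs.
    return static_cast<int>(
        std::set<int>(groupMaterials.begin(), groupMaterials.end()).size());
}

int main() {
    std::vector<int> mixedGroup  = {0, 3, 1, 2, 0, 3, 1, 2};  // launch order
    std::vector<int> sortedGroup = {0, 0, 0, 0, 0, 0, 0, 0};  // after material sort
    printf("mixed group:  %d passes\n", shadingPasses(mixedGroup));   // 4
    printf("sorted group: %d passes\n", shadingPasses(sortedGroup));  // 1
}
```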
 
VideoCardz is leaking specs of the A770M, which is perhaps launching tomorrow?

Update on this since I can’t edit.

Article posted by VideoCardz.

Only the A300M series will be available from tomorrow; the A500M and A700M arrive in early summer.
https://videocardz.com/newz/intel-a...res-and-16gb-g6-memory-to-launch-early-summer
[Image: Intel Arc Alchemist specs table]
 
At 25-35 W for the A350M, if it has really good video decode performance I'd be really interested in that product as a discrete video card, depending on its price.

Regards,
SB
 
The 6000-series APUs seem to be in extremely tight supply, only showing up in high-price, high-margin laptops for the moment ... it seems that, for now, Apple is the only one who can afford EUV for the mass market. Intel will likely have a far easier time producing A300 GPUs.
 
If they turn out to be good, I might return to Intel, whose "GPUs" were my favourites at the time. I want to buy a laptop in the future. From 2005 to 2017 I always bought laptops, all of them with integrated Intel GPUs (and one with a GTX 1050 Ti too). I feel nostalgia and a warm affection for Intel's unfortunately basic :yep2: GPUs.

I had a great time playing at low detail settings (frustrating at times, especially when I purchased Diablo 3 day one), but at least it was glorious to watch your games run on a laptop.

Some more interesting info:

Intel Reveals Full Details for Its Arc A-Series Mobile Lineup | Tom's Hardware (tomshardware.com)

We suspect Intel will break 2GHz on desktop cards, but the mobile parts appear to top out at around 1.55GHz on the smaller chip and 1.65GHz on the larger chip. Do the math and the smaller ACM-G11 should have a peak throughput of over 3 TFLOPS FP32, with 25 TFLOPS of FP16 deep learning capability. The larger ACM-G10 will more than quadruple those figures, hitting peak throughput of 13.5 TFLOPS FP32 and 108 TFLOPS FP16.
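
The arithmetic works out; here's the same math as a quick sketch (Xe-core counts taken from the leaked specs; the 2 FLOPs/lane/clock FMA and the 8x XMX FP16 ratio are my assumptions):

```cpp
// Reproducing the quoted throughput math (assumptions flagged in comments).
#include <cstdio>

int main() {
    // FP32 TFLOPS = Xe cores * 128 lanes * 2 FLOPs (FMA) * GHz / 1000.
    auto tflopsFp32 = [](int xeCores, double ghz) {
        const int lanesPerXeCore = 16 * 8;  // 16 vector engines x 8 FP32 lanes
        return xeCores * lanesPerXeCore * 2.0 * ghz / 1000.0;
    };
    double g11 = tflopsFp32(8, 1.55);   // ACM-G11 at its mobile peak clock
    double g10 = tflopsFp32(32, 1.65);  // ACM-G10 at its mobile peak clock
    printf("ACM-G11: %.1f TFLOPS FP32, %.0f TFLOPS FP16 (assumed 8x via XMX)\n", g11, g11 * 8);
    printf("ACM-G10: %.1f TFLOPS FP32, %.0f TFLOPS FP16 (assumed 8x via XMX)\n", g10, g10 * 8);
}
```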
 
They've always been capable of playing some oldies. I remember playing with the i810 and being fairly satisfied with what it could do. ;) Sandy Bridge was when things started to get interesting.

I suppose I'm most interested in how they behave with SteamVR. Nvidia has a lot of VR functionality: some games support DLSS, and they have VRSS foveated-rendering capability too, which might become very important. AMD has much less going for it. I don't know if Intel has anything going for VR.
 
Intel is really stirring the pot with their more accurate definition of a core. How are we supposed to compare with the other guys, where every SIMD lane is a "core"?
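
For illustration (my numbers, using the leaked ACM-G10 configuration), the same chip yields wildly different "core" counts depending on which definition you pick:

```cpp
// Same silicon, two "core" counts (illustrative only).
#include <cstdio>

int main() {
    const int xeCores = 32;             // ACM-G10 per the leaked specs
    const int lanesPerXeCore = 16 * 8;  // 16 vector engines x 8 FP32 lanes
    printf("Intel-style count: %d Xe cores\n", xeCores);
    printf("lane-style count:  %d \"cores\"\n", xeCores * lanesPerXeCore);  // 4096
}
```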

Still HDMI 2.0. I hope the entry-level desktop cards are 2.1.
 
Did you really have the Intel 740 from 1998? Isn't that the only dedicated GPU they made before this?
 